# CE 93: Engineering Data Analysis
# LAB 06 Probability Plots

**Full Name:** *replace text here*

## Instructions 

Welcome to Lab 06! 

Please save your work after every question! At the end, you will have to submit your Jupyter Notebook as a PDF file in the bCourses quiz. The notebook should be consistent with your quiz answers. Not submitting a PDF file will result in a grade of 0 on the lab assignment. You will also receive a 0 if your answers to the quiz are inconsistent with your PDF.

If you see cells with "..." make sure to replace the "..." with your code even if they are not listed with a "Question". 
Please remember to label all axes with the quantity and units being plotted. 

Any part listed as a "<font color='red'>**Question**</font>" should be answered in the bCourses quiz to receive credit.

We will use the following Python packages:

* NumPy
* Matplotlib
* scipy.stats
* pandas
* random

## Load the required libraries 

The following code loads the required libraries. Run this cell first.

In [None]:
# import python library / packages 
import numpy as np                           # ndarrays for gridded data
import matplotlib.pyplot as plt              # plotting
from scipy.stats import *                    # common distributions
import pandas as pd                          # DataFrames for tabular data
import random                                # random sampling

## About Lab 06

Treasure Island lies in the middle of San Francisco Bay, visible to the right of the Bay Bridge as you cross from Oakland to San Francisco. The redevelopment of Treasure Island has been an ongoing project since 2002, transitioning the island from a former military base into a new neighborhood and publicly accessible land. Your company has been asked to study the distribution of the bedrock depth across Treasure Island, which is critical for designing the high-rise buildings planned for the island.

<img src="Treasure_Island.jpg" width="600">

*Source: https://merrill-morris.com/projects/project/treasure-island-redevelopment*


### Load the data

In Lab 06, we will work with a bedrock depth data set from many tests performed in and around the Bay Area. The file is named `site_data.csv`. Note that this is a synthetic data set.

Let's load the provided data set `site_data.csv`. It has two features:

|Feature|Units|Description|
|:-|:-|:-|
|depth|m|depth to bedrock at different locations|
|elevation|m|ground elevation at test location|

* load using the Pandas `read_csv()` function
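
As a reminder of the call pattern, here is a minimal sketch of `read_csv()` and `head()`. It reads a tiny, made-up CSV from an in-memory string rather than `site_data.csv`, so the numbers are purely illustrative. The names `toy_csv`, `toy_df`, and `toy_depth` are hypothetical, and the column headers are assumed to match the ones named in the next section.

```python
import io
import pandas as pd

# Hypothetical three-row CSV, used only to illustrate the call pattern
toy_csv = io.StringIO("depth(m),elevation(m)\n80.5,3.2\n65.1,4.0\n92.3,2.7\n")

toy_df = pd.read_csv(toy_csv)   # for a file on disk, pass its path instead
print(toy_df.head())            # head() shows the first rows (5 by default)

# Selecting one column by its header returns a pandas Series
toy_depth = toy_df['depth(m)']
print(toy_depth.mean())
```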

In [None]:
# edit the code cell below to read the .csv file and print the first few rows of the data set. 
# If you do not know how to do this, refer to Lab 01 or Lab 02

...

# Print the first few rows of the data set

### Create Variables from the DataFrame

We want to generate a data vector for depth and a data vector for elevation. Add your code below to take columns `depth(m)` and `elevation(m)` from the DataFrame you loaded above and save them as `population_depth` and `population_elevation`, respectively.

In [None]:
# create variables
# replace ... with your code

population_depth = ...
population_elevation = ...

## Bedrock Depth Parameters

Assume that the bedrock depth is a random variable $X$ in m, and that it represents the full population of bedrock depths in this area. Let's try to generate numerical summaries for $X$, which is saved as `population_depth`.

<font color='red'>**Question 1.**</font> What is the population mean bedrock depth? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

<font color='red'>**Question 2.**</font> What is the population standard deviation of the bedrock depth? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

In [None]:
# Add your code below

# Calculate and output mean(X)
population_mean = ...
print('E(X) = '+ str(round(population_mean,3))+ ' m')

# Calculate and output sigma(X)
population_stdev = ...
print('Stdev(X) = '+ str(round(population_stdev,3))+ ' m')

## Bedrock Distribution

You want to investigate whether the bedrock depth can be modeled using one of the common distributions. Plot a **density** histogram of the bedrock depth population, `population_depth`, using `plt.hist()` with `bins=25`. Based on the histogram and the shape of the different distributions we have discussed, try to think of a common distribution that is likely to represent this data set. You do not have to answer any questions, just think of one or a few.
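
To see what "density" means here, the sketch below uses `np.histogram()` on made-up values (not the lab data). With `density=True`, the bar heights are rescaled so that the total bar area equals 1, which is what later allows a pdf to be overlaid on the same axes. `plt.hist(..., density=True)` applies the same scaling. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)                        # seeded generator for repeatability
toy_values = rng.normal(loc=50, scale=10, size=1000)  # made-up data, not the lab data set

# density=True rescales the counts so that the bar areas sum to 1
heights, edges = np.histogram(toy_values, bins=25, density=True)
bin_widths = np.diff(edges)

total_area = np.sum(heights * bin_widths)
print(total_area)
```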


In [None]:
# Add your code below

...

## Sampling

We assumed that this data set represents the entire population of bedrock depth. In practice, we generally do not have access to the full population, but rather a subset of the population. We will select a random sample that has a size equal to 10% of the population. 

<font color='red'>**Question 3.1.**</font> What is the size of the data set `population_depth`? There are different ways to get the size of this data set. If you are not sure how to proceed, refer to previous labs or surf the Internet. Add your answer in bCourses.

<font color='red'>**Question 3.2.**</font> What should our sample size be if we want to sample 10% of the population? Save this number as `sample_size`. Add your answer in bCourses.

In [None]:
# Add your code below

# get size of data set
population_size = ...
print('Size of data set: '+ str(population_size))

# get sample size
sample_size = ...

# Make sure sample_size is an integer
sample_size = int(sample_size)
print('Sample size: '+ str(sample_size))

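Let's simulate a random sample from the population. Recall that in Simple Random Sampling, every sample has an equal chance of being selected. Think of this as putting all the values in a bowl and then selecting a few at random. `Python` can help us easily select a random sample using `random.choices(sequence, k=...)`. Note that `random.choices` draws **with replacement** (each value goes back into the bowl before the next draw), so the same value can be selected more than once, and that `k` must be passed by keyword. This function takes two inputs: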
* `sequence`: the data set you want to sample from (`population_depth` in this case)
* `k`: the sample size (`sample_size` in this case)

**Note that** the data `sequence` should be a plain sequence such as a list, not a pandas object. In our case, `population_depth` is a column taken from a DataFrame. To convert it to an acceptable data format, we use `sorted(population_depth)`, which returns the values as a sorted list. Also, `k` must be an integer.

Run the code cell below to see how we are selecting a random sample of size `sample_size` from `population_depth`. The sample is saved as `sample_depth`.

Plot a density histogram of `sample_depth` using `plt.hist()` with `bins=25` to see how the sample differs from the population histogram you plotted above. You do not have to answer any questions (yet).

Note that we have used `random.seed(99)` so that everyone gets the same answer. If you delete `random.seed(99)` and rerun the code multiple times, your sample histogram will change. That's sampling variation! 

**Make sure you add `random.seed(99)` back at the top of the code cell and rerun it before proceeding. Otherwise, your answers might differ from the correct answers.**
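
The effect of the seed can be checked on a toy example. This is a minimal sketch with made-up values (`toy_population` is hypothetical, not the lab data): resetting the seed before each call replays the same draws, and `k` may exceed the population size because `random.choices` samples with replacement.

```python
import random

toy_population = [10, 20, 30, 40, 50]   # made-up values, not the lab data

random.seed(99)
first_draw = random.choices(toy_population, k=8)

random.seed(99)                          # resetting the seed replays the same draws
second_draw = random.choices(toy_population, k=8)

print(first_draw == second_draw)         # True: same seed, same sample

# k can exceed the population size because choices() samples with replacement
print(len(first_draw), len(toy_population))
```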

In [None]:
#set the random seed equal to 99
random.seed(99)

# select a random sample
sample_depth = random.choices(sorted(population_depth), k=sample_size)

# Modify ... to plot a density histogram of the sample.
# Plot density histogram of the sample
...

### Bedrock Depth Statistics

Now, we will try to use the sample we have selected to estimate the population parameters and to assess if the data plausibly come from some theoretical distribution such as normal, lognormal, exponential, etc.

Unless otherwise noted, you should use `sample_depth` in all of the remaining questions.
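
When you compute a standard deviation with NumPy, the `ddof` argument controls the divisor. The sketch below uses made-up values to show the difference; which divisor is appropriate for a given question is something to confirm against the lecture and earlier labs.

```python
import numpy as np

toy_data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # made-up values

mean_value = np.mean(toy_data)
pop_std = np.std(toy_data)              # default ddof=0: divide by n
sample_std = np.std(toy_data, ddof=1)   # ddof=1: divide by n - 1

print(mean_value, pop_std, round(sample_std, 3))
```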

<font color='red'>**Question 4.**</font> What is the sample mean bedrock depth? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

<font color='red'>**Question 5.**</font> What is the sample standard deviation of the bedrock depth? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

In [None]:
# Add your code below

...

## Normal Distribution

Let's investigate whether the bedrock depth data plausibly come from a normal distribution. The normal distribution has two parameters, mean and standard deviation: $N(\mu_X,\sigma_X)$.

We will assume that the sample statistics are estimates of the population parameters.

In the code cell below, do the following **in a single plot**:
1. Plot a **density** histogram of the simulated values of `sample_depth`, using `plt.hist()` with `bins=25`.
2. Plot a theoretical normal pdf with parameters equal to your answers for Questions 4 and 5. Recall that we can use `Python` functions to calculate the pdf of a normal distribution. For this question, you should use: [`norm.pdf(x, loc=sample_mean, scale=sample_stdev)`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html). Note that you have to define `x`, which should be an array of possible values for the bedrock depth. Make sure that `x` covers the range of bedrock depths (see the histograms above to determine the range) and uses a small step (this is a continuous distribution). 
3. Label your axes correctly. Also, add a title. 
4. After generating the plot, right click on it and click 'Save image as' to download your figure as an image.

We did something very similar in Lab 05. You can refer to that lab for an example.

<font color='red'>**Question 6.**</font> Upload your figure to bCourses using the instructions there.

In [None]:
# Add your code below

...

### Q-Q Plot of Normal Distribution

The plot above can be used to judge, in a rough way, whether the bedrock depth data plausibly come from a normal distribution. A more formal graphical check of the fit is the quantile-quantile (Q-Q) plot. A Q-Q plot looks like a scatter plot. To generate a Q-Q plot, we first sort our data in ascending order. We then plot the sorted data versus quantiles calculated from a theoretical distribution. The number of quantiles is selected to match the size of the sample data. I will show you how to create a Q-Q plot for the assumed normal distribution. You will then do the same for a lognormal and then an exponential distribution.

The theoretical quantiles based on a normal distribution can be calculated using: `norm.ppf(q, loc, scale)`, where `q` is an array of the cumulative probabilities (decimals between 0 and 1) at which you want to compute quantiles.

Finally, to check if our data possibly come from a normal distribution, we would create a scatter plot of the sorted data versus the theoretical quantiles. If the data fall on a straight line (45$^\circ$ angle), the assumed distribution is reasonable.
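
The quantile construction described above can be sketched on a tiny example. The sample size and the normal parameters below are made up for illustration. The sketch shows that the plotting positions $(i-0.5)/n$ for $i=1,\dots,n$ are the same values that `np.linspace` produces between the first and last position.

```python
import numpy as np
from scipy.stats import norm

n = 5                                       # tiny hypothetical sample size
q = (np.arange(1, n + 1) - 0.5) / n         # 0.1, 0.3, 0.5, 0.7, 0.9
q_linspace = np.linspace((1 - 0.5) / n, (n - 0.5) / n, n)

print(np.allclose(q, q_linspace))           # both constructions agree

# Theoretical quantiles for an assumed N(mu=100, sigma=15); values are made up
theoretical = norm.ppf(q, loc=100, scale=15)
print(theoretical)
```

The middle position, 0.5, maps to the assumed mean, and the other quantiles are symmetric around it, as expected for a normal distribution.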

In [None]:
# sort the sample data in ascending order using np.sort()
sample_depth_sorted = np.sort(sample_depth)

# define the sequence of quantiles to compute. These should be decimals (not percentage)
# the first quantile corresponds to (1-0.5)*(1/sample_size): Refer to lecture slides
# the last quantile corresponds to (n-0.5)*(1/sample_size), where n is the sample size
# the number of quantiles is sample_size
# so, we can create the array of quantiles using np.linspace(start, end, n)
q = np.linspace((1-0.5)*(1/sample_size), (sample_size-0.5)*(1/sample_size), sample_size)

# calculate theoretical quantiles using normal distribution
theoretical_quantiles = norm.ppf(q, loc=sample_mean, scale=sample_stdev)

# Plot sorted data vs theoretical quantiles
plt.scatter(theoretical_quantiles, sample_depth_sorted)

# plot a 45 degrees line and specify its color as red, 'r'
plt.axline([0, 0], [1, 1] , c='r')

# label axes
plt.xlabel('Theoretical Quantiles from Normal Distribution')
plt.ylabel('Sample Quantiles of Bedrock Depth (m)')
plt.title('Q-Q Plot based on Normal Distribution ')

plt.show()

## Lognormal Distribution

Let's investigate whether the bedrock depth data plausibly come from a lognormal distribution. The lognormal distribution has two parameters: $LN(\mu,\sigma)$. Here, $\mu$ and $\sigma$ are the mean and standard deviation of the natural logarithm of $X$, $\ln(X)$, **not** of $X$ itself.

To correctly estimate the parameters $\mu$ and $\sigma$ in this case, follow these steps:
1. Transform the sample values `sample_depth` by taking their natural logarithm using `np.log()`. Save that as `log_sample_depth`
2. Calculate the sample mean of `log_sample_depth` just like you did in Question 4.
3. Calculate the sample standard deviation of `log_sample_depth` just like you did in Question 5.


Note that **THIS IS NOT THE SAME AS TAKING THE log OF THE MEAN AND STANDARD DEVIATION YOU COMPUTED IN QUESTIONS 4 AND 5.** If you are not convinced, compare these values to the log of the values you calculated in Questions 4 and 5.
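
A quick way to convince yourself is to try it on a few made-up positive numbers. In this sketch (the values and variable names are hypothetical), the mean of the logs comes out smaller than the log of the mean, which holds for any positive values that are not all equal.

```python
import numpy as np

toy_depths = np.array([20.0, 50.0, 80.0, 200.0])   # made-up positive values

mean_of_logs = np.mean(np.log(toy_depths))   # the quantity the lognormal parameter mu needs
log_of_mean = np.log(np.mean(toy_depths))    # NOT the same quantity

print(round(mean_of_logs, 3), round(log_of_mean, 3))
```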

<font color='red'>**Question 7.**</font> What is the sample mean of the natural log of bedrock depth? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

<font color='red'>**Question 8.**</font> What is the sample standard deviation of the natural log of bedrock depth? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

In [None]:
# Add your code below

# transform the sample data by taking their natural log
log_sample_depth = ...

# calculate sample mean of the log of depths
log_sample_mean = ...

# calculate sample standard deviation of the log of depths
log_sample_stdev = ...

# output results
print('Sample Mean = '+ str(round(log_sample_mean,3)))
print('Sample Stdev = '+ str(round(log_sample_stdev,3)))

### Q-Q Plot of Lognormal Distribution

Let's make a Q-Q plot to see if the bedrock depth data plausibly come from a lognormal distribution. Copy the code that was provided to you to create a Q-Q plot for a normal distribution. Modify the code to calculate theoretical quantiles based on a lognormal distribution. You only have to edit the following:

* `theoretical_quantiles = norm.ppf(q, loc=sample_mean, scale=sample_stdev)`
* Any titles/labels

Instead of `norm.ppf(q, loc, scale)`, you should use `lognorm.ppf(q, s=..., scale=...)`. Replace `...` with the correct values (read the next line for details).

Recall that the inputs for a lognormal distribution are defined as follows:
[`scipy.stats.lognorm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html)(s=$\sigma$, scale=$e^\mu$), where $\mu$ and $\sigma$ are the values you calculated in Questions 7 and 8.
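
The parameterization above can be sanity-checked with hypothetical values of $\mu$ and $\sigma$ (the numbers below are made up, not answers to Questions 7 and 8). If `s` and `scale` are set as described, lognormal quantiles should equal the exponential of the corresponding normal quantiles of $\ln(X)$, and the median should equal $e^\mu$.

```python
import numpy as np
from scipy.stats import lognorm, norm

mu, sigma = 4.0, 0.5                 # hypothetical parameters of ln(X)
q = np.array([0.1, 0.5, 0.9])

ln_quantiles = lognorm.ppf(q, s=sigma, scale=np.exp(mu))

# Equivalent route: take normal quantiles of ln(X), then exponentiate
via_normal = np.exp(norm.ppf(q, loc=mu, scale=sigma))

print(np.allclose(ln_quantiles, via_normal))
print(lognorm.median(s=sigma, scale=np.exp(mu)), np.exp(mu))
```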

In [None]:
# Add your code below

...

## Exponential Distribution

Let's investigate whether the bedrock depth data plausibly come from an exponential distribution. The exponential distribution has one parameter, rate of occurrence: $Exp(\lambda)$. 

If we have a sample, we can estimate $\lambda$ as:

$$\hat{\lambda} = \dfrac{n}{\sum{X_i}}$$

where:
* $n$ is the sample size
* $\sum{X_i}$ is the sum of all the values in the sample

<font color='red'>**Question 9.**</font> What is the estimate of the rate of bedrock depth based on the random sample? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

Use the equation above and the sample bedrock depth data: `sample_depth`. You can use the function `np.sum()` to get the sum of an array.
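
As a small illustration of the estimator on made-up numbers (not the lab sample), note that $n/\sum X_i$ is simply the reciprocal of the sample mean. The names `toy_sample` and `rate_hat` are hypothetical.

```python
import numpy as np

toy_sample = np.array([2.0, 4.0, 6.0, 8.0])   # made-up values

n = len(toy_sample)
rate_hat = n / np.sum(toy_sample)             # n divided by the sum of the values

print(rate_hat, 1 / np.mean(toy_sample))      # both print 0.2
```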

In [None]:
# Add your code below 

rate_estimate = ...

# output results
print('Estimated Rate = '+ str(round(rate_estimate,3))+ ' /m')

### Q-Q Plot of Exponential Distribution

Let's make a Q-Q plot to see if the bedrock depth data plausibly come from an exponential distribution. Copy the code that was provided to you to create a Q-Q plot for a normal distribution. Modify the code to calculate theoretical quantiles based on an exponential distribution. You only have to edit the following:

* `theoretical_quantiles = norm.ppf(q, loc=sample_mean, scale=sample_stdev)`
* Any titles/labels

Instead of `norm.ppf(q, loc, scale)`, you should use `expon.ppf(q, scale=...)`. 

Recall that the input for an exponential distribution is defined as follows:
[`scipy.stats.expon`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html)(scale=1/$\lambda$), where $\lambda$ is the value you calculated in Question 9.
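
The `scale=1/λ` convention can be checked against the closed-form inverse CDF of the exponential distribution, $x_q = -\ln(1-q)/\lambda$. The rate below is a made-up value for illustration, not the answer to Question 9.

```python
import numpy as np
from scipy.stats import expon

rate = 0.2                                   # hypothetical rate
q = np.array([0.1, 0.5, 0.9])

quantiles = expon.ppf(q, scale=1 / rate)

# Closed-form inverse CDF of the exponential: -ln(1 - q) / rate
closed_form = -np.log(1 - q) / rate

print(np.allclose(quantiles, closed_form))
print(expon.mean(scale=1 / rate))            # the mean is 1/rate
```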

In [None]:
# Add your code below

...

## Best Distribution

You should now have three Q-Q plots for the sample bedrock depth data using normal, lognormal, and exponential distributions.

<font color='red'>**Question 10.**</font> By comparing the Q-Q plots, which distribution best represents the sample bedrock depth data? Add your answer in bCourses.

## Covariance and Correlation

Recall that the data set includes bedrock depth and ground elevation. So, we have two jointly distributed random variables. Generally, as the ground elevation increases, the bedrock depth decreases (bedrock becomes shallower). So, let's investigate the strength of the relationship between ground elevation and bedrock depth. In the lecture, we discussed different parameters to quantify how dependent two random variables are:
1. Covariance
2. Correlation

### Covariance

The covariance of two random variables $X$ and $Y$ is defined as:

$$Cov(X,Y) = E(XY)-E(X)E(Y)$$

In the lecture, we only discussed the covariance between two random variables: $Cov(X,Y)$. We can also compute something known as the covariance **matrix**. `Python` provides a direct way to compute the covariance **matrix** using `np.cov(X,Y)`, which returns:

$$\begin{bmatrix}
Var(X) & Cov(X,Y)\\
Cov(Y,X) & Var(Y)
\end{bmatrix}$$

The diagonal elements of the matrix contain the variances of the variables, and the off-diagonal elements contain the covariances between all possible pairs of variables. Recall that $Cov(X,Y)=Cov(Y,X)$. So, the off-diagonal values will be equal.

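Note that `np.cov()` has an optional input parameter `ddof`, which plays a similar role to the `ddof` in `np.std()`. Thus, `ddof=0` returns a population covariance matrix, whereas `ddof=1` returns a sample covariance matrix. Be aware that the defaults differ: `np.std()` uses `ddof=0` by default, whereas `np.cov()` returns the sample covariance (dividing by $n-1$) unless you specify otherwise.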
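
The link between the definition and `np.cov()` can be sketched on two short, made-up arrays. With `ddof=0`, the off-diagonal entry matches $E(XY)-E(X)E(Y)$ computed directly from the data, and the diagonal holds the population variances.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])     # made-up values
y = np.array([2.0, 1.0, 4.0, 3.0])

# Population covariance from the definition E(XY) - E(X)E(Y)
cov_by_definition = np.mean(x * y) - np.mean(x) * np.mean(y)

cov_matrix_pop = np.cov(x, y, ddof=0)   # divide by n
cov_matrix_default = np.cov(x, y)       # default divides by n - 1

print(cov_by_definition, cov_matrix_pop[0, 1], cov_matrix_default[0, 1])
```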

Compute the covariance matrix between `population_depth` and `population_elevation`.

<font color='red'>**Question 11.**</font> What is the covariance between bedrock depth, `population_depth`, and the ground elevation, `population_elevation`? Round your answer to the nearest integer. Add your answer in bCourses.

<font color='red'>**Question 12.**</font> What can you say about bedrock depth and ground elevation based on their covariance? Select your answer(s) from the options in bCourses.

In [None]:
# Add your code below

...

### Correlation

The correlation of two random variables $X$ and $Y$ is defined as:

$$\rho_{XY} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$$

`Python` also provides a direct way to compute the correlation **matrix** using `np.corrcoef(X,Y)`, which returns:

$$\begin{bmatrix}
1 & \rho_{XY}\\
\rho_{YX} & 1
\end{bmatrix}$$

The diagonal elements of the matrix contain the correlation of the variables with themselves (which is 1), and the off-diagonal elements contain the correlation coefficients between all possible pairs of variables.
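
As a check on the definition, the sketch below computes $\rho_{XY}$ by hand for two short, made-up arrays and compares it with the off-diagonal entry of `np.corrcoef()`. Population quantities (`ddof=0`) are used consistently in the hand calculation.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])     # made-up values
y = np.array([2.0, 1.0, 4.0, 3.0])

# Correlation from the definition, using population (ddof=0) quantities throughout
cov_xy = np.cov(x, y, ddof=0)[0, 1]
rho_by_definition = cov_xy / (np.std(x) * np.std(y))

rho_matrix = np.corrcoef(x, y)

print(rho_by_definition, rho_matrix[0, 1])
```

Unlike the covariance, the correlation coefficient has no units and always lies between -1 and 1.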

Compute the correlation matrix between `population_depth` and `population_elevation`.

<font color='red'>**Question 13.**</font> What is the correlation coefficient between bedrock depth, `population_depth`, and the ground elevation, `population_elevation`? Round your answer to 3 digits after the decimal point. Add your answer in bCourses.

<font color='red'>**Question 14.**</font> What can you say about bedrock depth and ground elevation based on their correlation coefficient? Select your answer(s) from the options in bCourses.

In [None]:
# Add your code below

...

## Testing Cost

Now that you have a better understanding of the distribution of bedrock depth across Treasure Island, you want to plan a site investigation. A geotechnical investigation is performed by engineers to obtain information on the properties of the underground soil and rock, which will support any proposed structures. Usually, a site investigation includes drilling deep vertical holes with a small diameter into the ground. These are known as boreholes. Drilling boreholes can be quite expensive, and the cost increases significantly with depth. Therefore, it is important to carefully plan boreholes so as to obtain the required information with minimum possible cost.

Assume that we want to drill a borehole until we reach bedrock below the ground surface. Assume also that the cost of performing one borehole is a random variable $Z$ in $\$$ that is a function of bedrock depth, $X$, as shown here:

$$ Z\,(\$) = X^{2} + 5000 $$

Thus $Z$ is also a random variable, because it is a function of $X$, which is a random variable.

Note that $Z$ is a **nonlinear** function of $X$ (it depends on $X^2$).
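
Nonlinearity matters because the mean of a squared quantity is generally not the square of its mean. The sketch below illustrates this on three made-up values (`toy_x` is hypothetical): the gap between the two is the population variance.

```python
import numpy as np

toy_x = np.array([10.0, 20.0, 30.0])     # made-up values

mean_of_square = np.mean(toy_x**2)       # E(X^2)
square_of_mean = np.mean(toy_x)**2       # [E(X)]^2

# The gap between the two equals the population variance (ddof=0)
print(mean_of_square, square_of_mean, np.var(toy_x))
```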

### Expected Testing Cost
<font color='red'>**Question 15.**</font> How can we calculate the expected value of the cost of one borehole, $E(Z)$, based on the parameters of the bedrock depth $(X)$? Select your answer(s) from the options in bCourses.

While you should be able to answer this question based on the lecture concepts, here are a few things you could do to confirm your answer.
1. Create a new random variable, name it `population_cost`, which is equal to `population_depth**2 + 5000`. This is simply the equation of $Z$.
2. Compute the mean of `population_cost`. This is $E(Z)$.
3. You already calculated $E(X)$ in Question 1 and can compute $Var(X)$ based on Question 2. You can also compute $E(X^2)$ by simply taking the mean of `population_depth**2`.
4. Try the different equations from bCourses below and see which ones match the value of $E(Z)$.

In [None]:
# Add any code below

...

## Sampling Variation (Relevant to Project 1)

In Project 1, you will investigate the influence of the sample size on sampling variation. So read the next parts carefully.

All of the analyses you have performed thus far were based on a sample that has a size equal to 10% of the population size.

Let's try to estimate the sample mean using sample sizes of [1%, 2.5%, 5%, 10%, 25%] of the population size.

In the code cell below, we are taking samples of different sizes, and for each size, we are calculating the sample mean. Then, we are plotting the calculated sample means versus the sample sizes. The true population mean is shown using a solid red line, and the +/- 5% range of the population mean is shown using dotted red lines.

In the lecture, we mentioned that the expected value of the sample mean is equal to the population mean, regardless of the sample size.
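
This property can be illustrated with a small simulation. The sketch below is not part of the lab analysis: it uses a made-up population (`toy_population`) and averages many sample means, which should land close to the population mean even though each individual sample is small.

```python
import random
import statistics

random.seed(0)
toy_population = list(range(1, 101))            # made-up population, mean 50.5
true_mean = statistics.mean(toy_population)

# Draw many small samples and record each sample mean
sample_means = [
    statistics.mean(random.choices(toy_population, k=10))
    for _ in range(2000)
]

print(true_mean, statistics.mean(sample_means))
```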

Read, then run the code cell below.

In [None]:
#set the random seed equal to 6
random.seed(6)

# define an array of different sample sizes corresponding to [1%, 2.5%, 5%, 10%, 25%]
N_values = np.array([0.01, 0.025, 0.05, 0.1, 0.25])*population_size

# make sure N_values has integers
N_values = N_values.astype(int)

# loop through every value in the array N_values
# so, we will first take a sample that has a size = 1% of the population
# then, we will take a sample that has a size = 2.5% of the population
# etc. until we go through every value in N_values
for N in N_values:
    
    # take a random sample of size N
    sample_depth = random.choices(sorted(population_depth), k=N)
    
    # plot the sample mean versus the sample size
    plt.plot(N, np.mean(sample_depth), 'ob')

# plot a horizontal solid line for the population mean and set its color as red, 'r'
plt.axhline(population_mean, c='r', label=r'Population Mean $\pm 5\%$')

# plot a horizontal dotted line for +/- 5% of population mean and set its color as red, 'r'
plt.axhline(0.95*population_mean, linestyle = ':', c='r')
plt.axhline(1.05*population_mean, linestyle = ':', c='r')

# add legend
plt.legend()

# label axes
plt.xlabel('Sample Size')
plt.ylabel('Bedrock Depth Mean (m)')
plt.title('Variation of Sample Mean with Sample Size')

# set the y-axis limit
plt.ylim(60, 95)

# convert the x-axis to log scale so that the plot looks better
plt.xscale("log")  

# specify x-axis tick marks
plt.xticks(ticks=N_values, labels=N_values)

# display plot
plt.show()

### Standard Deviation Sampling Variation

We can similarly investigate the accuracy of the sample standard deviation as a function of sample size.

The instructions below ask you to edit specific lines of the code cell above. If you do not have line numbers showing next to your code, you can enable them by typing `Shift-L` or clicking on `View` from the Menu bar, and then `Toggle Line Numbers`.

Copy the code from above and paste it in the code cell below. Then, edit the following:
1. Modify Line 20 to plot the **sample standard deviation** of `sample_depth` versus the sample size
2. Modify Line 23 to show the population standard deviation (you already calculated this in Question 2 and saved it as `population_stdev`)
3. Similarly, modify Lines 26 and 27 based on the population standard deviation
4. Update axes labels and titles accordingly
5. Modify the y-axis limit as needed (Line 38) to show all the data (you can simply delete it)
6. Make sure to include `random.seed(6)` at the beginning of the code cell below

<font color='red'>**Question 16.**</font> Based on the plot, which sample sizes result in a sample standard deviation that is within +/- 5\% of the true population standard deviation? Select your answer(s) from the options in bCourses.

In [None]:
...

## Estimator Precision

You have completed all questions needed for the lab. However, please read and understand the next code, as you will be asked to do the same analysis for Project 1.

In the code above, we just took 1 sample of each sample size. If we want to investigate the sampling variation, we can take many samples of the same size, and then evaluate, for each sample size, the precision of the sample mean. Accuracy evaluates how close the estimate is to the true value. Precision evaluates how much variation there is in the estimate when the sample changes. Ideally, you want an estimate to be both accurate and precise.
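
Precision can also be summarized numerically as the spread of the estimate across repeated samples. The following sketch uses a made-up population and a hypothetical helper, `spread_of_sample_means`, to compare the standard deviation of the sample mean for a small and a large sample size.

```python
import random
import statistics

random.seed(0)
toy_population = list(range(1, 1001))   # made-up population of 1000 values

def spread_of_sample_means(sample_size, trials=200):
    """Standard deviation of the sample mean across repeated samples (hypothetical helper)."""
    means = [
        statistics.mean(random.sample(toy_population, sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)    # 1% of the population
spread_large = spread_of_sample_means(250)   # 25% of the population

print(spread_small > spread_large)
```

A smaller spread means a more precise estimator, so the larger sample size should give the smaller value here.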

In the code below, we are taking 20 different samples for each sample size (i.e., 20 samples of size 1% of the population, 20 samples of size 2.5% of the population, etc.), and then plotting the sample mean for each of these samples. This is achieved using a for-loop (Line 13) that simply repeats everything within the loop 20 times. 

Run the code and investigate the output. It should be evident that the sampling variation and the precision change as a function of the sample size.

While you don't have to submit anything, understand what was done so that you can replicate it in your project. 

In [None]:
#set the random seed equal to 6
random.seed(6)

# loop through every value in the array N_values
# so, we will first take a sample that has a size = 1% of the population
# then, we will take a sample that has a size = 2.5% of the population
# etc. until we go through every value in N_values
for N in N_values:
    
    estimate=[]
    
    # for each sample size, take 20 samples
    for trial in range (20):
        
        # take a random sample of size N
        sample_depth = random.sample(sorted(population_depth), N)

        # plot the sample mean versus the sample size
        plt.plot(N, np.mean(sample_depth), 'ob')
        estimate = np.append(estimate,np.mean(sample_depth))

# plot a horizontal line for the population mean and set its color as red, 'r'
plt.axhline(population_mean, c='r', label='Population Mean')

# add legend
plt.legend()

# label axes
plt.xlabel('Sample Size')
plt.ylabel('Bedrock Depth Mean (m)')
plt.title('Variation of Sample Mean with Sample Size')

# set the y-axis limit
plt.ylim(60, 95)

# convert the x-axis to log scale so that the plot looks better
plt.xscale("log")  

# specify x-axis tick marks
plt.xticks(ticks=N_values, labels=N_values)

# display plot
plt.show()

## Submit your work!

<font color='red'>**Question 17.** </font> Submit your PDF file.

I recommend that you save your .ipynb file and keep a copy of it so that you can refer to it in the future (e.g., when working on the project). 

Once done with answering ALL questions and you are ready to submit the quiz, follow these steps:

1. Run all cells in the notebook. You can do this by going to Cell > Run All. This makes sure that all your visuals and answers show up in the file you submit.

2. Then, go to "File > Download as > PDF via LaTeX (.pdf)" or "PDF via HTML (.html)" to generate a PDF file. Name the PDF file with your last name, "Lastname.pdf". Even if you click on PDF via HTML (.html), make sure that the downloaded file is '.pdf'.

3. If you have trouble generating the PDF file from Jupyter notebook, use [datahub.berkeley.edu](http://datahub.berkeley.edu). Log in with your CalNet credentials. Upload the .ipynb file with your outputs and results to Jupyter. Then follow step 2.

4. Upload the PDF file to the bCourses quiz (more instructions there).


**Not submitting a PDF file will result in a grade of 0 on this lab assignment.**
**You will also receive a 0 if your answers to this quiz are inconsistent with your PDF.**