# Lab 03 Prelab, Part 2 - Analysis preparation and data collection practice

Please complete Part 1 of the prelab on Canvas before working through this notebook.

In [3]:
%reset -f 
# Clear all variables, start with a clean environment.
import numpy as np
import data_entry2

This prelab activity introduces useful features of our spreadsheet tool, `data_entry2`, and then shows you how to use Python to calculate the quantities _mean_ (_average_), _standard deviation_, and _(standard) uncertainty of the mean_. 

## Simple calculations in `data_entry2` cells

It is possible to do some simple calculations directly in the `data_entry2` sheet. In general we want you to do calculations in their own notebook cells, but for some tasks, most notably recording your uncertainties, it is very convenient to use this feature of the sheet.

As an example, if you measure a mass of 497 g, and estimate a 95% confidence interval of [477, 516] g, you can record the mass and its uncertainty $dm$ in your spreadsheet like this:

| m | dm|
| ------ | ------- |
| g | g |
| 497 | = (516-477)/4|

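As a quick check outside the sheet, the same arithmetic can be done in a code cell. This sketch just reuses the numbers from the example above; the variable names `ci95_low` and `ci95_high` are illustrative choices, not names the sheet creates.

```python
# 95% confidence interval estimated by eye, in grams (values from the example above)
ci95_low = 477
ci95_high = 516

# For a Gaussian PDF the 95% CI spans about 4 standard deviations,
# so the standard uncertainty is the interval width divided by 4.
dm = (ci95_high - ci95_low) / 4
print(f"dm = {dm} g")  # prints dm = 9.75 g
```
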

Alternatively, if a balance with a 10 g resolution gives you a rectangular PDF, you might use something like:

| m | dm |
| ------ | ------- |
| g | g |
| 142 | = 10/(2 * np.sqrt(3))|

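The rectangular-PDF formula can be checked the same way. In this sketch, `resolution` is an illustrative variable name holding the 10 g from the example; the standard uncertainty is the half-width of the rectangle divided by $\sqrt{3}$.

```python
import numpy as np

# Balance resolution in grams (value from the example above)
resolution = 10

# A rectangular PDF of full width `resolution` has standard uncertainty
# resolution / (2 * sqrt(3)), i.e. its half-width divided by sqrt(3).
dm = resolution / (2 * np.sqrt(3))
print(f"dm = {dm:.2} g")  # prints dm = 2.9 g
```
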

**Your turn #1:**

Use the sheet below to try calculating both of these forms of uncertainty within the sheet itself.
- Enter a variable name, $m$ (in grams) for the first column, and $dm$ in the second column for the uncertainty. 

- In the next two rows, enter the measurements and expressions to calculate uncertainties as shown in the two examples above.

- Notice that in the sheet interface you see the formulas you've entered, but when you press `Generate Vectors`, the expressions are evaluated and the generated uncertainty vector contains the results of the calculations.

- Alter one of the expressions in the uncertainty column so that it contains an error - perhaps add an extra ')' at the end of the expression to see what happens.

- To get rid of unused rows and columns, re-run (Shift+Enter) the cell that you used to create the ```data_entry2``` sheet.

In [4]:
de0 = data_entry2.sheet("test_formulas")

Sheet name: test_formulas.csv



## Summary of Part 1 of the prelab

Here is a summary of the statistics concepts covered or reviewed in Part 1 of this prelab.

For a distribution of **$N$ data points**:

1. The **average** (or mean) is defined by:
     
$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i.$$

2. The **standard deviation** is defined by:

$$ \sigma = \sigma(x) = \sqrt{\frac{1}{N-1}\sum_{i=1}^N \left(x_i - \bar{x}\right)^2} $$

3. For variables that follow a Gaussian probability density function (PDF), 
    - the **68 \% confidence interval** ($\mathrm{CI_{68 \%}}$) is defined as $[\bar{x} - \sigma,~\bar{x} + \sigma]$, so $\sigma = \frac{\mathrm{CI_{68 \%}}}{2}$, where $\mathrm{CI_{68 \%}}$ in the fraction stands for the width of the interval;
    - the **95 \% confidence interval** ($\mathrm{CI_{95 \%}}$) is defined as $[\bar{x} - 2\sigma,~\bar{x} + 2\sigma]$, so $\sigma = \frac{\mathrm{CI_{95 \%}}}{4}$, again using the width of the interval.

4. As we have done in Labs 01 and 02, we use the **standard deviation** as an indicator of the uncertainty in a **single measurement**, and this value does **not** depend on the number of measurements taken. 

5. So far, we have been estimating the standard deviation by estimating the $\mathrm{CI_{95 \%}}$ by eye. However, in Labs 03 and 04, you will actually make multiple measurements of the same quantity and will thus be able to calculate $\sigma$ explicitly using the formula above.

6. The **uncertainty in the mean** (often called **standard error of the mean**) is given by:

    $$\delta\bar{x} = \frac{\sigma(x)}{\sqrt{N}}$$

7. We use the **uncertainty in the mean** as an indicator of the uncertainty in the **average of multiple measurements**, and it does improve (get smaller) as we increase the number of measurements.

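To connect these definitions with the NumPy functions used later, here is a minimal sketch that evaluates items 1, 2, and 6 directly from their formulas and compares them with `np.mean()` and `np.std(..., ddof=1)`. As a small test set it uses only the first five distance values from the hypothetical data table further down; the variable names are illustrative.

```python
import numpy as np

# Small test set: first five distance values (mm) from the hypothetical data table
x = np.array([439.3, 431.6, 434.6, 433.3, 439.3])
N = len(x)

# 1. Average: sum of the values divided by N
x_avg = np.sum(x) / N

# 2. Standard deviation: note the N - 1 in the denominator
x_std = np.sqrt(np.sum((x - x_avg) ** 2) / (N - 1))

# 6. Uncertainty in the mean: standard deviation divided by sqrt(N)
x_avg_unc = x_std / np.sqrt(N)

# The hand-coded definitions should agree with NumPy's built-in functions
print(np.isclose(x_avg, np.mean(x)), np.isclose(x_std, np.std(x, ddof=1)))
print(f"average = {x_avg:.1f} mm, std = {x_std:.2} mm, uncertainty in mean = {x_avg_unc:.2} mm")
```
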
## Developing your Python skills

Let's import a spreadsheet of the distance data you considered in Part 1 of your prelab. 

_Note: We have labeled the distance data as ```x``` instead of ```d``` as in the prelab, which keeps our Python variable names more readable._

In [None]:
# Run me to import the spreadsheet, `prelab03_1`, which is found in the same directory as `Lab03-prelab.ipynb`
de1 = data_entry2.sheet('prelab03_1')

Sheet name: prelab03_1.csv



**Your turn #2:** Double-check that you have the correct number of data points by looking at the row numbers in the data table above. There are 25 data points, but remember that Python indexing starts at 0, so the last row should be numbered 24!

Below is a table of the hypothetical data that should appear in the above data table.

### Hypothetical data

| x (mm) |
| ------ |
| 439.3  |
| 431.6  |
| 434.6  |
| 433.3  |
| 439.3  |
| 442.6  |
| 428.6  |
| 441.6  |
| 431.2  |
| 427.6  |
| 433.2  |
| 441.3  |
| 436    |
| 437.6  |
| 434.7  |
| 433.2  |
| 433.1  |
| 431.3  |
| 436    |
| 432.9  |
| 436.5  |
| 437.2  |
| 435.7  |
| 432.6  |
| 434.7  |

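If you want to cross-check the sheet independently, the hypothetical table can also be entered directly as a NumPy array. This is only a sketch for checking; `x_check` is a hypothetical name chosen so it does not overwrite the `xVec` created by `Generate Vectors`.

```python
import numpy as np

# Hypothetical distance data from the table above, in mm
x_check = np.array([
    439.3, 431.6, 434.6, 433.3, 439.3,
    442.6, 428.6, 441.6, 431.2, 427.6,
    433.2, 441.3, 436.0, 437.6, 434.7,
    433.2, 433.1, 431.3, 436.0, 432.9,
    436.5, 437.2, 435.7, 432.6, 434.7,
])

print("Number of data points:", len(x_check))
# Python indexing starts at 0, so the last of the 25 values has index 24
print("Last value (index 24):", x_check[24], "mm")
```

`np.mean(x_check)` and `np.std(x_check, ddof=1)` should then reproduce the values you get from `xVec` in the next section.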
## Calculating average and standard deviation using NumPy functions

**Your turn #3:** 

Press the `Generate Vectors` button at the top of your spreadsheet to load the data into your notebook. Then, use the cell below to calculate the average and standard deviation using the `np.mean()` and `np.std()` functions, respectively. Notice that `np.mean()` has a single *argument*, which is the vector of values over which to calculate the average. We discuss the second argument in `np.std()` below.

_Note: If it is not working correctly, double-check above that you have correctly titled the single spreadsheet column as `x` and that there is a resulting generated vector `xVec`._

In [6]:
# Run me to calculate average and standard deviation. 
# - Notice how we're able to include descriptive text and units in the print commands.

x_avg = np.mean(xVec)
print("Average of x =", x_avg, "mm")

x_std = np.std(xVec, ddof=1)
print("Standard deviation of x =", x_std, "mm")

Average of x = 435.028 mm
Standard deviation of x = 3.8362872676586677 mm


You should find that the average is 435.028 mm, which is consistent with our estimate of 435 mm from the histogram in Part 1 of the prelab. The standard deviation should be 3.8362872676586677 mm, which would be 3.8 mm if we were to round it to 2 significant figures when we report it. This is also consistent with our estimate of 4 mm using the 95% confidence interval with the histogram earlier.

Note that in `np.std()` we are supplying a second argument, `ddof=1` ("delta degrees of freedom"). This additional argument is needed because `np.std()` implements a general formula that can be used for a number of related calculations, and by default it uses `ddof=0`. In particular, the formula it uses is:

$$ \textrm{np.std()} = \sqrt{\frac{1}{N-\textrm{ddof}}\sum_{i=1}^N \left(x_i - \bar{x}\right)^2}. $$

We want $N-1$ in the denominator as per our definition of standard deviation (which we discussed in Lab 01), so we need to use `ddof = 1`:

$$ \sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N \left(x_i - \bar{x}\right)^2}. $$

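To see why `ddof=1` matters, here is a short sketch comparing NumPy's default (`ddof=0`, dividing by $N$) with `ddof=1` (dividing by $N-1$) on the first five values of the hypothetical data:

```python
import numpy as np

x = np.array([439.3, 431.6, 434.6, 433.3, 439.3])  # first five hypothetical values, mm
N = len(x)

std_pop = np.std(x)              # default ddof=0: divides by N
std_sample = np.std(x, ddof=1)   # ddof=1: divides by N - 1, our definition of sigma

# The two differ by a factor of sqrt(N / (N - 1))
print(std_pop, std_sample, np.isclose(std_sample, std_pop * np.sqrt(N / (N - 1))))
```

The ratio between the two is $\sqrt{N/(N-1)}$, so the difference shrinks as $N$ grows: about 12 % for $N=5$ and about 2 % for the $N=25$ distance measurements.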
If you want to control the number of significant figures displayed, you can modify the print statement to use an f-string, as follows. Recall that we first encountered f-strings in Lab 00.

Within the curly braces, the `:.2` tells Python to format the variable to the left of the colon, in this case `x_std` (the standard deviation of `x`), with two significant figures. 

In [6]:
# Run me to print the standard deviation with 2 significant figures "{x_std:.2}"

print(f"Standard deviation to 2 sig figs = {x_std:.2} mm")

Standard deviation to 2 sig figs = 3.8 mm

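Be aware that `:.2` counts significant figures, not decimal places, and that with this specifier Python switches to scientific notation when the integer part of the number has at least as many digits as the requested precision. The sketch below (values copied from the outputs above) contrasts it with the fixed-point specifier `:.1f`, which sets the number of decimal places instead:

```python
x_avg = 435.028              # average from the output above, mm
x_std = 3.8362872676586677   # standard deviation from the output above, mm

print(f"{x_std:.2}")   # 3.8      (2 significant figures)
print(f"{x_avg:.2}")   # 4.4e+02  (2 significant figures forces scientific notation)
print(f"{x_avg:.1f}")  # 435.0    (1 decimal place)
```

A common convention is to round the value to the same decimal place as its uncertainty, which is why `:.1f` suits the average here.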

Let's step back for a moment and think about what the standard deviation represents. Twenty-five measurements were made using the same experimental procedure, so the standard deviation we calculated from these twenty-five measurements represents the *variability* we can expect when making any one measurement. 

In the language we are using in this course, this standard deviation is the standard uncertainty in a **single** distance measurement. In other words, if we wanted to report the value and uncertainty for one of our measurements of $x$, say $x_1=434.7 \,\text{mm}$, we would report it as:

$$ x_1 = (434.7 \pm 3.8) \, \textnormal{mm} $$

The subscript '1' is being used here to emphasize that we are talking about a single measurement and not the average. **Overall, the key take-home message is that the standard deviation of many repeated measurements describes how confident we should be in any one of the individual measurements.**

**Note:** By definition, you can't calculate the standard deviation from a single measurement (look at what happens if you put $N=1$ in the standard deviation equation!). This might be confusing to you since, in the first two labs, we estimated the standard deviation by making just a single measurement of the scale reading or the spring length. While it's true that you only recorded a single measurement for those quantities in your notebook, your eyes and brain considered a whole distribution of values! For example, when making a single measurement for $m_2$, your eyes were "measuring" many fluctuating values and picking out the rough min and max to estimate your $\mathrm{CI_{95 \%}}$. You were essentially using your eyes to sample the underlying Gaussian probability density function to estimate the standard deviation.

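By contrast, item 6 of the summary says that the uncertainty in the *average* of all 25 measurements is smaller than $\sigma$ by a factor of $\sqrt{N}$. A minimal sketch using the numbers printed above (in your notebook you could instead write `np.std(xVec, ddof=1) / np.sqrt(len(xVec))`):

```python
import numpy as np

x_std = 3.8362872676586677  # standard deviation from the output above, mm
N = 25                      # number of distance measurements

# Uncertainty in the mean (standard error of the mean): sigma / sqrt(N)
x_avg_unc = x_std / np.sqrt(N)
print(f"Uncertainty in the mean = {x_avg_unc:.2} mm")  # prints 0.77 mm
```

So while a single measurement carries an uncertainty of 3.8 mm, the average would be reported as $\bar{x} = (435.03 \pm 0.77)\,\text{mm}$.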
# Familiarizing yourself with Lab 03

In Lab 03, you will make measurements to determine the period of a pendulum.  In this section of the pre-lab, you will use a simulation of the Lab 03 experimental equipment so that you can familiarize yourself with the lab equipment and the sort of calculations you will be expected to carry out in your lab notebook. 

*Notes: You may find it helpful to add some notes about your observations in the "Part C.1 - Familiarization notes" section of your Lab03.ipynb notebook as you do this section.*


#### Pendulum simulation

**Your turn #4a:** Please open the Pendulum simulation ([link](https://phas.ubc.ca/~sqilabs/Lab03-Pendulum.html)). First take a minute to play around with all the features of the simulation so that you understand how the pendulum and the timer work. 

In the cell below, brainstorm some ideas about how you'd go about measuring the period of the pendulum with the timer if you release it at an initial amplitude of $15^\circ$. Then read the hidden "Answer and discussion" cell.

*Remember that the period, $T$, is defined as the time taken for one complete cycle of the pendulum’s motion (i.e., the time it takes for the pendulum to swing out **and** back to its initial position).*


*My initial measurement strategy for measuring the period $T$ at a starting amplitude of $15^\circ$ is:*

- ...
- 

**Answer and discussion:**

You might have realized that there are many ways to experimentally measure the period. The first approach most people think of is to directly measure the period from **one** cycle of the pendulum.  However, you will soon discover that it is more advantageous to calculate the period from the time interval taken for the pendulum to go through **multiple** pendulum cycles. Unsurprisingly, repeating your experiment over multiple trials will also yield a better estimate of the period.

**Your turn #4b:** If during one trial of your experiment you measured that it took $\Delta t = 18.1 \, \text{s}$ for the pendulum to complete $M_{\text{cycles}}=10$ cycles, what is the period of the pendulum?

**Answer:**

The period $T$ is simply: $$T = \frac{\Delta t}{M_{\text{cycles}}} = \frac{18.1 \, \text{s}}{10} = 1.81 \, \text{s}. $$

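The same calculation as a code cell, using the numbers from Your turn #4b (`delta_t` and `M_cycles` follow the variable names suggested later in this prelab):

```python
delta_t = 18.1   # time to complete M_cycles cycles, s (value from Your turn #4b)
M_cycles = 10    # number of complete cycles timed

# Period = total time / number of complete cycles
T = delta_t / M_cycles
print(f"T = {T:.3} s")  # prints T = 1.81 s
```
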

#### Experimental design

You now have an experimental design choice to make when planning an experiment to measure the pendulum period given some starting amplitude: 
* how many pendulum cycles, $M_{\text{cycles}}$, will you count in each of your trials? 
* how many trials, $N_{\text{trials}}$, will you complete? 

**Your turn #4c:**  Follow the instructions below to take some preliminary data with the simulation:

1. Start a fresh spreadsheet below for data collection. In the new spreadsheet, you will record the time interval $\Delta t$ (in Python, `delta_t`) taken by your pendulum to complete $M_{\text{cycles}}$ cycles (remember to record this number as an appropriately named Python variable like `M_cycles = ` in a code cell).
   
2. Set an external timer and give yourself **5 minutes** total to collect as much data as you can:
   1. Set the initial release amplitude to $15^\circ$.
   2. Start your first trial and record $\Delta t$ **directly in your spreadsheet in this notebook**. 
   3. Repeat your $\Delta t$ measurement **as many times as you can** in 5 minutes, and record each measurement in a new row of the spreadsheet. The number of data points you collected for $\Delta t$ is your number of trials, $N_{\text{trials}}$ (which you should also record as the Python variable `N_trials = ` in a code cell).
   
3. After your 5 minutes of data collection are finished, press `Generate Vectors` to create a vector with your data.

In [7]:
# Use this cell to create a new spreadsheet, prelab03_2, for data collection
de2 = data_entry2.sheet('prelab03_2')


Creating undo file
Sheet name: prelab03_2.csv



In [None]:
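# Use this cell to record the number of pendulum cycles per trial (M_cycles) and the number of trials (N_trials)

M_cycles = None  # replace None with the number of cycles you timed in each trial

N_trials = None  # replace None with your number of trials, e.g. len(delta_tVec)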

#### Calculating the period and its uncertainty

Below are some useful relationships to help you calculate the period and its uncertainty based on the data you collected above. 

* Our best estimate for the pendulum period is simply given by the average period: $$T=\overline T.$$
*  As discussed in ```Your turn #4b```, the pendulum period $T_i$ of your $i$-th trial, consisting of $M_{\text{cycles}}$ cycles is related to the measured time interval taken to achieve those cycles $\Delta t_i$ by:
$$T_i=\frac{\Delta t_i}{M_{\text{cycles}}}.$$
* So you can find the average pendulum period $\overline{T}$ by averaging over all the $T_i$ collected over $N_{\text{trials}}$ trials: $$\overline{T} = \frac{1}{N_{\text{trials}}} \sum_{i=1}^{N_{\text{trials}}} T_i.$$ ❗Remember that you can just use the ```np.mean()``` function to calculate this.
  
* Given that we know how the period $T$ relates to the time interval $\Delta t$, we can also rewrite the average period as: $$\overline{T}=\frac{\overline{\Delta t}}{M_{\text{cycles}}}.$$ So to find the average period you can either: 
  * take the average of your $\Delta t_i$ data and then divide by $M_{\text{cycles}}$, or
  * divide your $\Delta t_i$ data by $M_{\text{cycles}}$ first to get $T_i$ data and then average over that. 
  
  Both approaches are exactly equivalent; pick the one that works for you!
  
* The **uncertainty** in your best estimate for the pendulum period is given by the standard uncertainty in the mean of the period (which we discussed at the start of the prelab), and is given by: $$\delta T = \delta \overline{T} = \frac{\sigma(T)}{\sqrt{N_{\text{trials}}}},$$ where the standard deviation in the period data is given by: $$ \sigma(T) = \sqrt{\frac{1}{N_{\text{trials}}-1}\sum_{i=1}^{N_{\text{trials}}} \left(T_i - \overline{T} \right)^2} $$ ❗Remember that you can just use the ```np.std()``` function to calculate this.
  
* Using the above equation for $ \delta T $ and given that we again know how the period $T$ relates to the time interval $\Delta t$, we can also re-write the uncertainty in the average period as: $$ \delta T = \delta \overline{T} = \frac{\sigma(\Delta t)}{M_{\text{cycles}} \sqrt{N_{\text{trials}}}}.$$ ❗This is a **very** important relationship for you to consider when deciding on how to design your experiment.

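The relationships above can be collected into a small helper. This is a sketch, not required code: the function name `period_stats` is hypothetical, and the usage line assumes your sheet column was named `delta_t` so that `Generate Vectors` produced `delta_tVec`. It computes the average period both ways and checks that the two expressions for $\delta T$ agree.

```python
import numpy as np

def period_stats(delta_t, M_cycles):
    """Hypothetical helper: return (T, dT, dT_rel) from measured intervals.

    delta_t  : array of time intervals in seconds, one per trial
    M_cycles : number of complete pendulum cycles timed in each trial
    """
    delta_t = np.asarray(delta_t, dtype=float)
    N_trials = len(delta_t)

    # Route 1: convert each interval to a period first, then average
    T_i = delta_t / M_cycles
    T = np.mean(T_i)

    # Route 2: average the intervals first, then divide by M_cycles
    T_alt = np.mean(delta_t) / M_cycles

    # Uncertainty in the mean period from the T_i data ...
    dT = np.std(T_i, ddof=1) / np.sqrt(N_trials)
    # ... and equivalently from the delta_t data
    dT_alt = np.std(delta_t, ddof=1) / (M_cycles * np.sqrt(N_trials))

    # Both routes agree up to floating-point rounding
    assert np.isclose(T, T_alt) and np.isclose(dT, dT_alt)

    return T, dT, dT / T
```

In this prelab you might call it as `T, dT, dT_rel = period_stats(delta_tVec, M_cycles)`. Because $\sigma(\Delta t)$ is divided by both $M_{\text{cycles}}$ and $\sqrt{N_{\text{trials}}}$, doubling the number of cycles per trial halves $\delta T$ (provided $\sigma(\Delta t)$ stays about the same), whereas doubling the number of trials only reduces it by a factor of $\sqrt{2}$.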

**Your turn #4d:**

Using the data you just collected and the relationships detailed above, calculate in the cell below:
- the pendulum period $T$, 
- the uncertainty in the period $\delta T$, 
- and the relative uncertainty $\delta T / T$.

There are a couple of ways to do this, as you saw above; pick the approach that works for you. Remember to use ```np.mean()``` and ```np.std()```.

In [None]:
# Use this cell (and additional ones if you like) to calculate and print T, dT, dT_rel



## Preparing your Lab 03 notebook
You should now prepare your Lab 03 notebook for data collection and analysis.

**Your turn #5:**
1. Open the Lab 03 Instructions on Canvas and take a couple minutes to read through them so that you have a sense of how you will be spending your time during the lab.
2. Focusing on Part C.1, open up your Lab 03 notebook and notice that we have again provided you with a ready-to-go spreadsheet with two columns for data entry. Instead of just `delta_t` from the prelab (for 15°), we have specified `delta_t_10` and `delta_t_20` since in the lab you will be collecting data at two different angles.
3. In the provided spreadsheet, make up a few rows of test data for these two angles and press ```Generate Vectors```.
4. Copy in and modify your code as needed from this prelab so that you can calculate the average periods `T_10` and `T_20`, as well as their uncertainties `dT10` and `dT20`, and relative uncertainties `dT10_rel` and `dT20_rel`. Note that you will need to specify or extract your values for `Mcycles` and `Ntrials` to be able to do these calculations.
5. Test your code in your Lab 03 notebook using the data from this prelab to ensure you are getting the same values in that notebook as in this one.

You should now be ready for data collection and data analysis in the lab.

# Submit

Steps for submission:

1. Click: Run => Run_All_Cells
2. Read through the notebook to ensure all the cells executed correctly and without error.
3. Correct any errors you find.
4. File => Save_and_Export_Notebook_As => HTML
5. Upload the HTML document to the lab submission assignment on Canvas.

In [None]:
display_sheets()