# Lab 04 Prelab - Practice Making Calculations With Python and Numpy

The goal of this prelab activity is to give you additional practice using Python and Numpy. What we will do is reproduce the calculations done by ```np.mean()``` and ```np.std()``` step by step as per the equations we introduced previously.

**To be clear, for your actual lab, you should still calculate the mean and standard deviation of your data using the built-in Numpy functions ```np.mean()``` and ```np.std()```.**  However, as you will see here, you can also do so without these built-in functions. While it does require several steps of calculations, we want to give you the opportunity to learn how to do it in this "long way" since:

1. Practicing and maintaining your coding skills is useful beyond just this lab course 
2. Many of the calculations we perform later in this course will not correspond to built-in functions, so it is useful to learn how to do more complicated calculations.
3. Breaking down complicated calculations into several lines of code, as we do in these "long way" calculations, is the strategy that we will be encouraging you to use for most of your coding work going forward in this course. It is often easier to find problems or errors in your calculations if you can look at intermediate steps and values.
4. This notebook will also allow you to learn a few more very useful Python and Numpy functions

In [None]:
%reset -f 
# Clear all variables, start with a clean environment.
import numpy as np
import data_entry2

### Import data

Let's import a spreadsheet of the distance data, $x$, that you considered in your last pre-lab (Prelab 03).  

* Recall that we can always copy a spreadsheet that already exists somewhere into a new file.  Here, the spreadsheet we want to copy is in your Lab03 folder and we want to copy it into a new spreadsheet in your current folder (Lab04).  We will call our new spreadsheet file: ```prelab03_copy```.  
  
* If you forget how to use ```data_entry2``` to create a new spreadsheet or to copy an existing one, you can always go to the Canvas page **Python Tips and Troubleshooting Jupyter Issues** for a quick guide on how to do it.

Remember to click on `Generate Vectors` after you've run the cell to import the $x$ data into the `xVec` variable.

In [None]:
# Run me to import the spreadsheet, `prelab03_1`, which is found in the Lab03 directory
de1 = data_entry2.sheet_copy('../Lab03/prelab03_1', 'prelab03_copy')

### Calculating an average without ```np.mean()```

Let's revisit our equation for calculating an average,

$$\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i.$$

We will break the operation of calculating the average into steps. We will first sum up all the $x_i$ values, then count how many values there are (i.e. $N$), and then finally calculate the quotient.

**Your turn #1a:** Similar to `np.mean()` and `np.std()`, there is a NumPy function for calculating a sum, `np.sum()`. Use this function in the code cell below to define a variable `xSum`, which is the sum over the elements in `xVec`.

In [None]:
# Use this cell to define your variable xSum

xSum = 

Next, the built-in Python function `len()` calculates how "long" a vector is, i.e. it counts up the number of elements within the supplied variable. For instance, if you run the code cell below you can see `len()` returns `3` when we supply it with the three-element vector `foo`:

In [None]:
# Run me to see how len() works

foo = np.array([1, 2, 3])
len(foo)

**Your turn #1b:** Use `len()` in the cell below to define variable `N` which is the result of counting the number of elements in `xVec`.

In [None]:
# Use this cell to define a variable N

N = 

**Your turn #1c:** Finally, define the variable `xAvg`, which is calculated by dividing `xSum` by `N` to arrive at the average of `xVec` the "long way". Print out the value of `xAvg`.

In [None]:
# Use this cell to define xAvg. The print statements below display the value

xAvg = 

print(f"Average (long way) = {xAvg:.4f} mm")
print(f"Average (short way) = {np.mean(xVec):.4f} mm")

You should find that you calculated an average distance of 435.028 mm just like when using the `np.mean()` function.
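
As a quick sanity check of the three steps above, here is a minimal sketch on a made-up array. The name `demoVec` and its values are hypothetical and are not your lab data. The sketch uses `np.isclose()` to compare the two results, since floating-point numbers can differ slightly in their last digits.

```python
import numpy as np

# Hypothetical demo data (not the Prelab 03 measurements)
demoVec = np.array([2.0, 4.0, 6.0, 8.0])

demoSum = np.sum(demoVec)   # step 1: add up all elements
demoN = len(demoVec)        # step 2: count the elements
demoAvg = demoSum / demoN   # step 3: divide the sum by the count

print(f"Average (long way) = {demoAvg:.4f}")
print(f"Average (short way) = {np.mean(demoVec):.4f}")
print("Do they agree?", np.isclose(demoAvg, np.mean(demoVec)))
```

Keeping `demoSum` and `demoN` as separate variables means you can print either one if the final average looks wrong.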

### Calculating standard deviation the "long way"

Recall the equation for the standard deviation:

$$ \sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N \left(x_i - \bar{x}\right)^2}.$$

This equation is a little more involved, but, as we did with the average, we will divide our process into smaller steps.

Our steps, in order, are as follows:

1. Find the average (done!)
2. For each value $x_i$, find the difference between it and the average.
3. Find the square of that difference for each value and then sum up all of those squared differences.
4. Finally, we need to divide that sum by $N-1$ and take the square root.
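
To see the four steps with numbers, here is a small worked example using three made-up values, $x = 2, 4, 6$ (chosen only for illustration, so $N = 3$):

$$\bar{x} = \frac{2 + 4 + 6}{3} = 4$$

$$x_i - \bar{x}: \quad -2,\; 0,\; 2$$

$$\sum_{i=1}^{3} \left(x_i - \bar{x}\right)^2 = 4 + 0 + 4 = 8$$

$$\sigma = \sqrt{\frac{8}{3-1}} = \sqrt{4} = 2$$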

Let's start with calculating $x_i - \bar{x}$ for each data point. What we want Python to do is take each data point in `xVec` and subtract `xAvg`. Thankfully, this can be done in a single, intuitive line of code. If we were to do this on a calculator, we'd have to make 25 calculations, one for each data point in `xVec`. However, when we supply a 25-element NumPy vector like `xVec` and ask to subtract a single number (a scalar) like `xAvg`, NumPy subtracts `xAvg` from each data point in `xVec`.

Run the cell below to test this behaviour:

In [None]:
# Run me to see an example of subtracting a single number from a vector

bar = np.array([1, 2, 3, 4, 5])
print('Dummy data = ', bar)

barMinusOne = bar - 1
print('Dummy data subtracted by 1 = ', barMinusOne)
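
To make the "one line instead of many calculations" point concrete, the sketch below compares the single vectorized line with an explicit loop that does one subtraction per element. The variable names are hypothetical and the data are dummy values. NumPy's documentation calls this element-by-element behaviour "broadcasting".

```python
import numpy as np

demo = np.array([1, 2, 3, 4, 5])  # hypothetical dummy data

# Vectorized: NumPy subtracts the scalar from every element at once
vectorized = demo - 1

# Equivalent explicit loop, one subtraction per element
looped = np.array([value - 1 for value in demo])

print("Vectorized:", vectorized)
print("Looped:    ", looped)
print("Same result?", np.array_equal(vectorized, looped))
```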

**Your turn #2a:** Using the example above, define a new Python variable `diffFromAvg` below which subtracts `xAvg` from each element of `xVec`.

In [None]:
# Use this cell to define diffFromAvg

diffFromAvg = 

Going back to the standard deviation formula, we see that we now need to _square_ each of these differences from the average. In Python, the operator that raises a number to a power is `**`. Again, this works element by element: when we ask to square a NumPy vector, each element within the vector is squared.

Run the cell below to test this behaviour:

In [None]:
# Run me to see an example of squaring numbers

some_numbers = np.array([1, 2, 3, 4])
some_numbers_squared = some_numbers**2
print('The numbers squared are', some_numbers_squared)
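
One caveat worth knowing: this element-wise squaring is a feature of NumPy arrays, not of ordinary Python lists. The minimal sketch below, using made-up values and hypothetical variable names, shows that `**` works on an array but raises a `TypeError` on a plain list. It assumes your data are stored as NumPy arrays, as in the examples in this notebook.

```python
import numpy as np

plain_list = [1, 2, 3, 4]          # ordinary Python list (hypothetical data)
as_array = np.array(plain_list)    # same values as a NumPy array

print(as_array**2)                 # element-wise square works for arrays

try:
    plain_list**2                  # lists do not support the ** operator
except TypeError as err:
    print("Squaring a list fails:", err)
```

If you ever see this error, converting the list with `np.array()` is usually the fix.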

**Your turn #2b:** Complete the cell below to define the new variable `diffFromAvgSquared`, which squares your previous result.

In [None]:
# Use this cell to define diffFromAvgSquared, the square of each element from the vector diffFromAvg

diffFromAvgSquared = 

**Your turn #2c:** Our next step is to sum up these squared differences. You already learned how to perform sums in Python using `np.sum()` earlier in calculating the average the "long way". Use `np.sum()` to define a new variable `sumSquaredDiffs` which is the result of summing all the elements from the vector `diffFromAvgSquared`.

In [None]:
# Use this cell to define sumSquaredDiffs

sumSquaredDiffs = 

**Your turn #2d:** We are almost done. We need to divide the sum of the squared differences, `sumSquaredDiffs`, by $N-1$ and then take the square root of the result, for example with NumPy's `np.sqrt()` function. Complete the cell below to do this.

In [None]:
# Complete and run me to finish the "long way" calculation of the standard deviation and compare it to the "short way"

xStdLong = 

print(f"Standard deviation (long way) = {xStdLong:.4f} mm")
print(f"Standard deviation (short way) = {np.std(xVec, ddof=1):.4f} mm")

If all went well, you should see identical results for calculating the standard deviation of `xVec` the long or short way.
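
Note that the comparison above passes `ddof=1` to `np.std()`. By default, `np.std()` divides by $N$ rather than $N-1$, so the default result will not match the formula used here. The sketch below illustrates the difference on made-up data; `demoVec` and the other `demo...` names are hypothetical and are not your lab data.

```python
import numpy as np

demoVec = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical demo data
demoN = len(demoVec)
demoAvg = np.sum(demoVec) / demoN

# Square each difference from the average, then sum the squares
demoDiffSquared = (demoVec - demoAvg)**2
demoSumSquaredDiffs = np.sum(demoDiffSquared)

# Divide by N-1 and take the square root
demoStdLong = np.sqrt(demoSumSquaredDiffs / (demoN - 1))

print(f"Long way (divide by N-1)  = {demoStdLong:.4f}")
print(f"np.std with ddof=1        = {np.std(demoVec, ddof=1):.4f}")
print(f"np.std default (ddof=0)   = {np.std(demoVec):.4f}")
```

The first two printed values should agree, while the third is smaller because it divides by $N$.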

## Preparing your Lab 04 notebook
As usual, you should now prepare your Lab 04 notebook for data collection and analysis.

1. Open the Lab 04 Instructions on Canvas and take a couple minutes to read through them so that you have a sense of how you will be spending your time during the lab.
2. Focusing on Parts C and D, copy in and modify your code as needed from the last lab so that you can calculate the average periods `T_10` and `T_20`, as well as their uncertainties `dT10` and `dT20`, and relative uncertainties `dT10_rel` and `dT20_rel`.
3. You will also need the code to compute the $t'$-score to compare your two averages. Think about how to correctly interpret what the $t'$-score is telling you.
4. You may wish to test your code with some data from Lab 03 to make sure everything works as expected. To do so, you can make a copy of the spreadsheet you made in Lab 03, as we did earlier in this pre-lab.  **However, make sure to create a new spreadsheet for the start of Lab 04 to collect new data.**

# Submit

Steps for submission:

1. Click: Run => Run_All_Cells
2. Read through the notebook to ensure all the cells executed correctly and without error.
3. Correct any errors you find.
4. File => Save_and_Export_Notebook_As => HTML
5. Upload the HTML document to the lab submission assignment on Canvas.

In [None]:
display_sheets()