## Index 

1. Intended learning outcomes
2. Using Jupyter Notebooks
3. Literals, operators, and data types
4. Variables and keywords
5. NumPy, SciPy and arrays
6. Basic statistics with arrays
7. Reading and writing files
8. Matplotlib: plotting data
9. Optional: Log plots
10. Coding practices


## 2. Using Jupyter Notebooks<a id="notebooks"></a>
We will be using Jupyter Notebooks as interactive lab scripts. These notebooks include both text and code. Once you have saved a copy on your own hard drive, you can type code in the code cells (the cells preceded by "In [ ]:").

You can execute a cell by pressing **ctrl+enter** (sometimes **cmd+enter** on Mac), **shift+enter** (run and jump to the cell below), or **alt+enter** (run and create a new cell below; **option+enter** on Mac), or by pressing the **'run cell'** button (play symbol) in the toolbar above. You can add more code cells by pressing the 'insert cell below' button (plus symbol).

By default any new cells you add are code cells; however you can change these to Markdown  in the drop-down list in the toolbar to allow you to make your own notes in the lab scripts. For a cheat sheet on Markdown, see [here](http://assemble.io/docs/Cheatsheet-Markdown.html).

Make sure you add new code cells every time you want to try out something new, instead of editing your previous code. This way you have a record of everything you have done, which both you and your demonstrator can refer to. Save your work regularly - sometimes you may be forced to close and reopen your file, and you don't want to lose any of your work!

If you double-click on a Markdown cell it will change into edit mode. Run the cell to turn it back into markdown.

The notebooks will use colour-coded cells:

<div style="background-color: #00FF00">
This is a core exercise cell. These are the most important exercises you will encounter in these notebooks and  cover all of the intended learning outcomes.

<div style="background-color: #FFF8C6">

This is a non-core cell, which includes additional information or extra exercises. You can complete these if you want extra practice.

## 3. Literals, operators, and data types<a id="operators"></a>

On a very basic level, Python can be used as a calculator. 
<div style="background-color: #00FF00">
    
**Exercise 2: in the cell below, type:**

<span style="color:blue">8 + 2</span>

**Now run the cell.**

The numbers <span style="color:blue">8</span> and <span style="color:blue">2</span> you typed above are called literals. Literals are data inserted directly into your code. Here, we have used integer literals (e.g. 1, 2, -3, ... ), but more frequently we will use float literals (numbers with decimal points or given in scientific notation, even if they represent integers: 1.3, 10.0, 1e10, ...). These numbers are called “floating point” because of the manner in which they are encoded in the computer’s binary memory. 

The <span style="color:blue">+</span> symbol in the first code cell is called an *operator*. Operators operate on the code on either side of them (called the operands), and produce some sort of result (e.g. the integer 10 in your code above). Other examples of arithmetic operators are:

- <span style="color:blue">\- </span>for subtraction
- <span style="color:blue">\*</span> for multiplication
- <span style="color:blue">/</span> for division
- <span style="color:blue">\*\*</span> for exponent
- <span style="color:blue">%</span> for modulus (returns the remainder of a division)
- <span style="color:blue">//</span> for floor division (returns the result of the division rounded down to an integer). 

<div style="background-color: #00FF00">
    
**Exercise 3: in the cells below, try out each of the operators given above. What happens if you mix the data types of the operands in your calculation, for example add an integer to a float?**

The order of operations is as usual, i.e. multiplication and division before addition and subtraction. For operations on the same level, Python reads code from left to right, i.e. <span style="color:blue">20/5\*2</span> will give 8.0. To change the order, or make the order explicit (and hence code more readable) use round brackets, for example <span style="color:blue">20/(5\*2)</span> to give 2.0. The shorthand for scientific notation is  <span style="color:blue">1.234e5</span>, which means $1.234\times10^5$.
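
As a quick check of the rules above, the short sketch below evaluates the same numbers with and without brackets and writes one value in scientific notation. The results are stored under illustrative names (variables are introduced properly in Section 4) so that they can be printed together.

```python
# Operators on the same level are evaluated left to right: (20 / 5) * 2
left_to_right = 20 / 5 * 2

# Brackets force the multiplication to happen first: 20 / 10
with_brackets = 20 / (5 * 2)

# Multiplication is carried out before addition: 2 + (3 * 4)
mixed = 2 + 3 * 4

# A number written in scientific notation is always a float
big_number = 1.234e5

print(left_to_right, with_brackets, mixed, big_number)
```

Note that the division operator returns a float even when the result is a whole number, whereas adding and multiplying integers returns an integer.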

So far we have encountered integer and float data types - another type of literal is a string literal, which is a piece of text that does not constitute any code. You specify a string literal by surrounding it in matched single or double quotation marks, e.g:

In [None]:
"This is a string"

You can find out the data type of any literal by using the <span style="color:blue">type()</span> command. 

<div style="background-color: #00FF00">
    
**Exercise 4: run the examples in the following code cells:**

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type("1")

In [None]:
type(1+1j)

In [None]:
type([1,2])

The cells above illustrate some of the most common data types in Python.

When you are coding, it is common to make errors, for example a typo or trying to do something that Python can't do. When this happens, Python will throw up an error message. Here are two hints to help you interpret error messages:
1. Start at the bottom of the error message: it normally gives a one-line summary of the problem. If you can't immediately solve it, read the text above it to find out where in your code the error occurred.
2. If the one-line summary doesn't make sense, copy and paste it into Google. Chances are many people have encountered it before and will have asked a question about it!

<div style="background-color: #00FF00">
    
**Exercise 5: In order to get used to error messages, run the following cell, and read the error message *before* correcting the code. The cell is supposed to output the number 20!**

In [None]:
10 + 1o

The other main type of mistake is more pernicious: you may write some code that is perfectly valid, but doesn't do what you actually want it to do. The only way of catching this, is by paying very careful attention to the output at all times, and checking it makes sense. We will discuss this later - however it is important to foster the habit of checking your output from the very start of your coding career.

**A final important note**: you can expect your system to grind to a halt when you do calculations with extremely large numbers. This also occasionally happens due to coding errors. Don't worry - this is not a problem! If your system becomes unresponsive, restart your kernel by clicking 'Kernel'  --> 'Restart' in the menu bar at the top of the page. Make sure to change any offending code cells to markdown so that they won't be executed again, click on the next code cell, and choose 'Cell' --> 'Run all above' to re-run all your code cells so far.

Let's move on! Experiment with operators, brackets, scientific notation, and different data types in the code cell below. From now on, the notebook will only display one code cell when it is time for you to try your coding skills; it is up to you to add as many cells as you need. It is strongly recommended you don't delete any of your code but keep as many examples in different cells as possible so both you and your demonstrator can easily refer back to what you have tried.
<p>
<div style="background-color: #00FF00">

**Exercise 6: coding has many quirks that you will get used to with practice. Try and answer the following questions for yourself whilst experimenting with arithmetic operators:**

- **Do all answers make sense?**
- **Can you use numbers as big as you like? Is there an upper limit to the exponent in scientific notation?**
- **Can you use numbers as small as you like? Is there a lower limit to the exponent?**
- **Can numbers be as precise as you like? How many zeros do you need before 1.00000000000000000001 gets truncated?**
- **What happens if you don’t balance your brackets?**
- **Can you apply arithmetic operators to strings? If so, which, and what is the result?**


<div style="background-color: #FFF8C6">

You may have noticed floats do not have unlimited accuracy. This is an important feature of computer programming, not just a bug in Python. To read more about why this happens, have a look at [this tutorial.](https://docs.python.org/3/tutorial/floatingpoint.html) 
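
A classic illustration of this limited accuracy is sketched below. The use of `math.isclose()` from the standard library is a suggestion that goes beyond this worksheet: it is one common way of comparing floats without relying on exact equality.

```python
import math

total = 0.1 + 0.2

print(total)                     # very close to 0.3, but not exactly 0.3
print(total == 0.3)              # an exact comparison fails
print(math.isclose(total, 0.3))  # a comparison within a small tolerance succeeds
```

The small discrepancy arises because 0.1 and 0.2 cannot be represented exactly in binary, just as 1/3 cannot be written exactly as a finite decimal.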

#### Converting between data types

So far we have met data of different types: integers, floats, and strings. Data do not have to stay the same type, however: they can be converted from one type to another. For example, below we convert an integer to a float, and a string consisting of a number to an integer.

In [None]:
a = 2
b = float(a)
print(a,b)

c = '69'
d = int(c)
print(c,d)

<div style="background-color: #FFF8C6">
Can you see the difference between the value of variable c and variable d when they are printed in the example above? How can you check which variable is of which type?
<p>
    
**Exercise: try converting different types of data below. Answer the following questions:**
- **Can you turn a string of letters into a float or integer?**
- **What happens when you convert a float with non-zero decimal points into an integer? Make sure to try different values of decimals!**
- **Can you turn a complex number into a string, float, or integer?**

## 4. Variables and keywords<a id="variables"></a>

The <span style="color:blue">=</span> operator (sometimes called the assignment operator) allows you to store data in a *variable*. Variables are ubiquitous in computer programming, and are much like variables in maths. For example, if we want to create a variable x which has the value 4 we can simply write <span style="color:blue">x = 4</span>. Note that, unlike in algebra, the operator <span style="color:blue">=</span> is directional: the variable on the left of the <span style="color:blue">=</span> is always assigned the value of what is on the right of the <span style="color:blue">=</span>, not the other way around.  

<div style="background-color: #00FF00">
    
**Exercise 7: run the cells below and see what happens:**

In [None]:
x = 4
print(x)

In [None]:
x + 2

In [None]:
y = x + 2
print(y)

Note that in the code above we used the <span style="color:blue">print()</span> command to print the value of the variables to screen. In Python, you can also simply type the name of a variable to do this. However, this should be used with caution as this only works properly when a cell has only one output. 

<div style="background-color: #00FF00">
    
**Exercise 8: to illustrate this, try running the  code cells below and pay careful attention to the output of each cell.**

In [None]:
a = 1
b = 2
a

In [None]:
a
b

In [None]:
print(a)
print(b)

So far, we have used single letters to name variables. This is usually not good practice - if you simply assigned every variable a letter of the alphabet, it would be very hard to decipher your code at a later date and understand what each variable stands for. It is therefore important to choose your variable names carefully. 
<p>
<div style="background-color: #00FF00">

**Exercise 9: below, assign variable names to data provided by literals, or to the results of computation using operators. Try and find out the answers to the following questions:**

- **Can you identify the rules that govern the possible names? Some names to try: my_glorious_variable_3, True, 1value, my favourite value, A#B...**
- **Are the names case sensitive, i.e., is 'name' the same as 'naMe'?**
- **What happens when you give the same name to two different values?**
- **What happens when you give two different names to the same value?**
- **What happens if you store the result of a calculation involving a particular variable as that very variable?**

The reason why <span style="color:blue">A#B</span> didn’t work as a name was that <span style="color:blue">#</span> is Python’s comment character, which means “Ignore everything after this character until the end of the line”. Comments are used to annotate  code to make it more human-readable, for example to describe in natural language what a complicated line of code does, to make it easier to understand.

The reason why <span style="color:blue">True</span> didn’t work as a variable name is that it is a Python keyword, one of the few words that have a special meaning to the language. In Python 3.7 and later, the keywords are:
```python 
False      class      from       or
None       continue   global     pass
True       def        if         raise
and        del        import     return
as         elif       in         try
assert     else       is         while
async      except     lambda     with
await      finally    nonlocal   yield
break      for        not
```

Note that this list changes with different versions of Python. Jupyter Notebooks helpfully change the colour of a keyword to bold green, so you will immediately notice it if you use a keyword inadvertently. We will cover some of these keywords in this course, but not all of them.
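
Because the list depends on the version, it can be useful to ask Python itself which words are reserved. A minimal sketch, using the standard-library `keyword` module (the variable names are only for illustration):

```python
import keyword

# All keywords for the version of Python that is running this cell
print(keyword.kwlist)

# Check individual words
is_true_keyword = keyword.iskeyword("True")    # a keyword in Python 3
is_print_keyword = keyword.iskeyword("print")  # a function, not a keyword
print(is_true_keyword, is_print_keyword)
```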

## 5. NumPy, SciPy and arrays<a id="arrays"></a>

Everything we have covered so far has been part of the core Python programming language. However, the core Python programming language does not include many mathematical functions that you might expect to use. So, for example, if you needed to use trigonometric functions such as sin, cos, etc., you would have to write your own code to implement these. If you wanted to do numerical integration you would have to write the code. If you wanted to display results as plots, you would have to write (quite a lot of) code to do it, and so on. This would be tiresome and very time consuming. Fortunately, there are libraries of code that provide for most of these common requirements, and much more!

NumPy and SciPy are large collections of open-source libraries and tools brought together to give a powerful high-level environment for mathematical and scientific computing. NumPy provides functions for basic mathematical operations (sin, cos, tan, etc.) as well as functions which handle arrays, vectors, matrices and operations upon them. SciPy provides more specialised functions that are useful for scientific programming, such as special functions, numerical integration, ordinary differential equation (ODE) solvers, optimization, and others.

In this session we will use functions from NumPy. To be able to use it, you first need to import the package as follows: 

In [1]:
import numpy as np

Generally you will need to include this line at the top of your code or notebook (and make sure to run the cell). Be careful: when you reopen your notebook at a later time to continue or review your work, you will have to run the cell above again to be able to use NumPy's functions again. A good way of resuming work is to select the "Run All Above" option from the Cell menu, to ensure all previous cells have been executed before you carry on.  

You can now use all of NumPy's routines by calling them by their name preceded by <span style="color:blue">np.</span> - for example to create an array A comprising the numbers 10 to 100 in steps of 10:

In [None]:
A = np.array([10,20,30,40,50,60,70,80,90,100])

As you can see, an array is a series of objects of the same type (integers in the example above). Each individual object in the array is called an element. You can access individual elements of an array by specifying the index of the element in the array within square brackets. For example, the cell below first prints the entire array A and subsequently only the element with index 1:

In [None]:
print(A)
print(A[1])

When you run the above cell, the second output line might not be what you expected! This is because indices start from 0, so the first element has index 0, the second element index 1, and so forth. You can access any selection of elements from an array using indices; this is called slicing. 
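
The zero-based counting can be seen in the small sketch below, which uses a separate illustrative array B rather than the array A defined above.

```python
import numpy as np

B = np.array([5, 6, 7, 8])  # illustrative array with four elements

first = B[0]           # index 0 is the first element
last = B[len(B) - 1]   # the last valid index is one less than the length
first_two = B[0:2]     # a slice containing the elements with index 0 and 1

print(first, last, first_two)
```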

<div style="background-color: #00FF00">
    
**Exercise 10: look at the following list - can you predict what the result will be before running these statements? Pay careful attention to which elements are included in the slices.**
- <span style="color:blue">A[9]</span>
- <span style="color:blue">A[10]</span>
- <span style="color:blue">A[-1]</span>
- <span style="color:blue">A[1:3]</span>
- <span style="color:blue">A[1:]</span>
- <span style="color:blue">A[:5]</span>
- <span style="color:blue">A[0:6:2]</span>
- <span style="color:blue">A[::2]</span>

**Slicing is a very important concept in programming, so take some time experimenting with this in the cell below (again, add as many cells as you like in the notebook). Can you figure out the rules of slicing? Challenge: use slice notation to reverse array A in one line.**

You can even create 2D (or higher-dimensional) arrays, by nesting several 1D arrays within one array: one for each row. This is a very useful data structure - it can for example represent data tables or even images. Run the following cell to see an example:

In [None]:
twoDarray=np.array([[1,2],[10,20],[100,200]])
print(twoDarray)

We can now access individual elements by using the index [row_index,column_index]. For example, the command below will print the element that is in the third row and second column of the 2D array above:

In [None]:
print(twoDarray[2,1])

<div style="background-color: #FFF8C6">
Arrays can be extended to an arbitrary number of dimensions. Implementing them in the above manner, however, would become very tedious, especially trying to keep track of all of the square brackets you would need. NumPy has the np.zeros command, which allows you to generate an array of zeros with any number of dimensions:

```python
Arbitrary_array=np.zeros((dim1,dim2,dim3,.......,dimn))
```

Note the double set of brackets within the call to zeros: the function takes the shape of the array as its first argument, so the sizes of all dimensions must be grouped together in a single set of round brackets (a tuple).
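
As a concrete sketch, the cell below creates a small 3D array of zeros and inspects it. The sizes 2, 3 and 4 are arbitrary choices for illustration.

```python
import numpy as np

shape = (2, 3, 4)       # the size of each dimension, grouped in one set of brackets
cube = np.zeros(shape)  # equivalent to np.zeros((2, 3, 4))

print(cube.ndim)        # the number of dimensions
print(cube.shape)       # the size along each dimension
```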

<div style="background-color: #FFF8C6">
**Exercise: experiment with taking slices out of a 2D array - these can be 2D arrays, 1D arrays, or single elements. Can you take it a step further and create a 3D array (i.e. a data cube) and take slices from it?**

The <span style="color:blue">np.array()</span> method works fine for short arrays, or to input a small number of measurement data points by hand, but would become tedious for creating longer arrays. A quicker way of defining arrays with a fixed increment is using <span style="color:blue">np.arange()</span> or <span style="color:blue">np.linspace()</span>. 

<div style="background-color: #00FF00">
    
**Exercise 11: use Google or the help documentation on these two functions (by running e.g. <span style="color:blue">help(np.arange)</span>) to find out how they work. Subsequently create an array which includes the numbers 0 - 100 (make sure to store your array in a variable). Next, try and create an array consisting of the numbers 100 - 200 in steps of 0.01. What is the difference between using <span style="color:blue">np.arange()</span> and <span style="color:blue">np.linspace()</span>?**

Arrays are very useful data structures, which we will use throughout this course. When you analyse the data you take in lab with Python, you should always store your data in arrays. One reason arrays are so powerful within Python is because you can do arithmetic operations on them. For example, to multiply every element of array A by two, simply run the following:

In [None]:
2*A

You can perform element-wise multiplication on two arrays, for example:

In [None]:
A*A
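
To see why this matters, the sketch below contrasts an array with a plain Python list holding the same illustrative numbers. The comparison with lists is an addition to this worksheet: multiplying a list by an integer repeats the list instead of multiplying its elements.

```python
import numpy as np

values = [1, 2, 3]            # a plain Python list
as_array = np.array(values)   # the same numbers stored in an array

doubled_array = 2 * as_array         # element-wise: every element is doubled
squared_array = as_array * as_array  # element-wise: every element is squared
doubled_list = 2 * values            # the list is repeated, not doubled

print(doubled_array)
print(squared_array)
print(doubled_list)
```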

## 6. Statistics with arrays<a id="stats"></a>

Here is a set of values obtained for the wavelength of a sound wave.

* 0.76 m
* 0.79 m
* 0.84 m
* 0.75 m
* 0.80 m
* 0.79 m

We can now calculate the mean of these data with a simple set of commands:

In [None]:
x=np.array([0.76,0.79,0.84,0.75,0.80,0.79])
mean_value=np.mean(x)
print('The mean value is:', mean_value, 'm')

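Similarly, we can calculate the standard deviation by calling the function <span style="color:blue">np.std()</span>. The sample standard deviation $s$ is defined by the following formula (note that, by default, <span style="color:blue">np.std()</span> divides by $n$ rather than $n-1$, so it only matches this formula when called with the keyword <span style="color:blue">ddof=1</span>):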

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\overline{x})^2$$ 

Here $x_i$ are the individual data points, $\overline{x}$ is the mean value of the data set, and $n$ is the number of data points.
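
The role of the denominator can be checked numerically. The sketch below uses a small made-up data set (not the wavelength data) and compares the default behaviour of `np.std()` with the `ddof=1` version and with the formula written out by hand.

```python
import numpy as np

sample = np.array([1.0, 2.0, 3.0, 4.0])  # made-up numbers for illustration

default_std = np.std(sample)         # default: divides by n
sample_std = np.std(sample, ddof=1)  # divides by n - 1

# The formula above, written out step by step
n = len(sample)
squared_deviations = (sample - np.mean(sample))**2
manual_std = np.sqrt(np.sum(squared_deviations) / (n - 1))

print(default_std, sample_std, manual_std)
```

The last two numbers should agree, and both should be slightly larger than the first.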

<div style="background-color: #00FF00">
    
**Exercise 13: below, calculate the standard deviation of our data set using the <span style="color:blue">std()</span> function which is in the NumPy package - i.e. call the function by typing <span style="color:blue">np.std()</span>. Once again, use the inbuilt help or Google to help you on your way. Help files are always a long read -  in this case focus on the keyword <span style="color:blue">ddof</span> to ensure you use the *sample* standard deviation. You will need to set this to <span style="color:blue">ddof=1</span> when you call the <span style="color:blue">np.std()</span> function. This ensures the denominator in the equation above is set to $n-1$ rather than $n$.**

## 7. Reading and writing files<a id="files"></a>

In the previous section you have seen that Python is a powerful tool for statistics, once you have stored your measurement data in arrays - particularly because you can execute the same block of code to calculate the mean, standard deviation, and standard error of the mean on any data set you take in your different labs. However, you won't usually *record* your data directly in Python. Normally, you will save your data in a file, which you can read in using your Python code.
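
One possible way of packaging such a reusable block of code is sketched below. The function name `summarise` and the data are hypothetical, and the sketch uses the `def` keyword, which has not been covered yet; the standard error of the mean is taken to be the sample standard deviation divided by $\sqrt{n}$.

```python
import numpy as np

def summarise(data):
    """Return the mean, sample standard deviation and standard error of the mean."""
    data = np.asarray(data, dtype=float)
    mean = np.mean(data)
    std = np.std(data, ddof=1)       # sample standard deviation (n - 1 denominator)
    sem = std / np.sqrt(len(data))   # standard error of the mean
    return mean, std, sem

# Made-up measurements, purely for illustration
measurements = [2.0, 4.0, 4.0, 6.0]
mean, std, sem = summarise(measurements)
print("Mean:", mean)
print("Standard deviation:", std)
print("Standard error of the mean:", sem)
```

The same function can then be called on any array of measurements without rewriting the calculation.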

There are numerous ways in Python to read in data from files, each with their own pros and cons. One method is to use the <span style="color:blue">loadtxt()</span> function, which is included in the NumPy package. It offers a straightforward way of reading in data sets which consist of columns of measurement points. 
<div style="background-color: #00FF00">
    
**Exercise 15: to try this out, open the file [Resistivity.txt](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.txt). This file (as well as all other data files) is stored in the 'Data' folder that you have downloaded from Blackboard for this session. Click on the link to inspect this data file within your browser: you will see it includes 3 columns, separated by spaces. The first column is temperature, the second the measured resistivity of copper, and the third that of aluminium. You may recognise these from your Measurements & Uncertainties tutorial if you have already done this! To read in this data file and print the data, run the following cell:**

In [None]:
data=np.loadtxt("Data/Resistivity.txt")
print(data)

Note that to read the file, we have to include the subfolder name in the file name, i.e. <span style="color:blue">"Data/Resistivity.txt"</span>. If we do not include the subfolder name, the computer would look for the file in the same folder as where the script for this code is stored (i.e. this Jupyter Notebook). Since usually you would have your data stored in a different place than your programming scripts, it is good to get used to specifying folders (or 'path names') right from the start. <p>
You will see that the data has been read into a 2D array. If we wish, we can now slice the 2D array to create three 1D arrays, each representing a separate physical quantity (T for temperature, R_Cu for the copper resistivity, and R_Al for the aluminium resistivity):

In [None]:
T = data[:,0]
R_Cu = data[:,1]
R_Al = data[:,2]
print('Temperature:', T)
print('Copper resistivity', R_Cu)
print('Aluminium resistivity', R_Al)

<div style="background-color: #FFF8C6">
The <span style="color:blue">loadtxt()</span> function incorporates more features: it can also skip header rows and deal with different types of delimiters (how the data is separated). As an example, look at the file [Resistivity.csv](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.csv), which is a csv (comma-separated values) file of the same data. You will need to download this file and open it in Excel to be able to inspect it. Note that now the data is delimited by commas, and includes two header rows. Now run the cell below:

In [None]:
T, R_Cu, R_Al = np.loadtxt("Data/Resistivity.csv", skiprows=2, delimiter=',', unpack=True)
print('Temperature:', T)
print('Copper resistivity', R_Cu)
print('Aluminium resistivity', R_Al)

<div style="background-color: #FFF8C6">
We have changed three things: we use the <span style="color:blue">skiprows</span> keyword to skip the first two header rows, we use the <span style="color:blue">delimiter</span> keyword to set the delimiter to comma (a tab would be <span style="color:blue">delimiter='\t'</span>), and we have used the <span style="color:blue">unpack</span> keyword to store each column of the data in a separate array (T, R_Cu, and R_Al in this case).

**Exercise: try reading [Resistivity.txt](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.txt) with the keyword <span style="color:blue">delimiter=','</span> and see what happens. Next try it with  <span style="color:blue">delimiter=' '</span>. Finally, try reading [Resistivity.csv](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.csv) without the <span style="color:blue">skiprows</span> keyword. Open up the .csv file in a text editor such as WordPad and look at how the data are stored.**

When reading in more complex data, the function <span style="color:blue">genfromtxt()</span> in the NumPy package can be more appropriate. It incorporates more flexibility, such as dealing with missing data.
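
A minimal sketch of the missing-data behaviour is given below. To keep it self-contained it reads from an in-memory text buffer (`io.StringIO`) standing in for a file, and the numbers are made up.

```python
import io
import numpy as np

# A small in-memory stand-in for a csv file in which one entry is missing
text = io.StringIO("1.0,2.0,3.0\n4.0,,6.0")

table = np.genfromtxt(text, delimiter=",")
print(table)  # the missing entry is read in as nan ("not a number")
```

The same call works with a file name in place of the text buffer.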

Conversely, at some point you will want to write data to an output file. To do this, we can use the <span style="color:blue">savetxt()</span> function included in NumPy. Below is a simple example of how to use this. 

<div style="background-color: #00FF00">
    
**Exercise 16: run the cell below and have a look at the resulting file.**

**Tip: to do this, you need to make an 'Output' folder in the folder where this worksheet is located, and click on the newly created file to see a preview in your browser window.**

In [None]:
np.savetxt('Output/Test_outputfile1.txt',data)

Note that this time we specified the output file to be saved to the subfolder 'Output'. This folder must already exist when you run the cell: <span style="color:blue">savetxt()</span> does not create missing folders and will raise an error instead. It is good practice to keep your files organised in a sensible folder structure, instead of dumping all output in the same folder as your script.
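
If you would rather let Python create the folder, one possible approach (an addition to this worksheet, using the standard-library `os` module) is sketched below. The file name `Example_table.txt` and the data are hypothetical.

```python
import os
import numpy as np

# Create the 'Output' folder if it does not exist yet; do nothing if it does
os.makedirs("Output", exist_ok=True)

small_table = np.array([[1.0, 2.0], [3.0, 4.0]])  # illustrative data
np.savetxt("Output/Example_table.txt", small_table)

# Read the file back in to confirm that it was written correctly
read_back = np.loadtxt("Output/Example_table.txt")
print(read_back)
```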

<div style="background-color: #FFF8C6">
Note that <span style="color:blue">savetxt()</span> only takes one argument for the data to be printed, so if you want to print various 1D arrays (instead of the single 2D array in the example above), you need to combine them into a single 2D array. To do this successfully, you will need to use the NumPy <span style="color:blue">column_stack()</span> function. 

**Exercise: the two examples below illustrate this - extend the code to print out the two arrays and have a look at the result.**

In [None]:
combined_1 = [T,R_Cu,R_Al]
combined_2 = np.column_stack([T,R_Cu,R_Al])

<div style="background-color: #00FF00">
    
**Exercise 17: in the cell below, find the mean of the resistivity of copper and subtract it from the copper resistivity data set, so you are left with the residuals. Do the same for aluminium. Now save your new data set (which includes temperature, copper resistivity residuals, and aluminium resistivity residuals) to a file. Make sure to choose a sensible name for your output file! Use the <span style="color:blue">help()</span> function to find out what arguments and keywords the <span style="color:blue">savetxt()</span> function takes. Can you include a header line with the column names and space the columns with tabs?**

## 8. Plotting data<a id="plotting"></a>

When you take measurements in lab, you will want to create a graph of your data. Python has a package that specialises in plotting: <span style="color:blue">matplotlib</span>. To use the plotting routines of this package, we only need to import the <span style="color:blue">pyplot</span> part of the matplotlib package. 

In [None]:
import matplotlib.pyplot as plt

Within Jupyter Notebooks the plots are created within the notebook itself. In most other environments (such as the Spyder IDE which you will use in Session 3) plots will be created in a separate window.

We can now use the <span style="color:blue">plot()</span> function to create a plot of our resistivity data:

In [None]:
plt.plot(T,R_Cu)
plt.show()

In essence the first line in the cell above creates a plot object, and the second line shows the plot (akin to creating a variable and printing it to screen with the <span style="color:blue">print()</span> command). To be able to use a figure in a report, you will want to save it as an image or pdf file. To do this, instead of the <span style="color:blue">plt.show()</span> command, use the <span style="color:blue">plt.savefig()</span> command.

Looking at the plot above, you may guess that the resistivity increases linearly with temperature, but there is some scatter in the data. A line plot is therefore not the most suitable graph; instead we want to use a scatter plot. In fact, even if the data did follow a perfect line, we would still want to plot the data points themselves as well - otherwise we would not be able to tell if the graph is the result of two data points or two hundred! We can do that by specifying a plotting symbol as a third argument in the <span style="color:blue">plot()</span> function. Below we plot the resistivity for both copper and aluminium using different plotting symbols. We also save the image as a png file in the subfolder 'Output' (check this for yourself!).

In [None]:
plt.plot(T,R_Cu,'x')
plt.plot(T,R_Al,'+')
plt.savefig("Output/Resistivity_plot.png")
plt.show()

**Important note:** Make sure you put the <span style="color:blue">plt.show()</span> command *after* the <span style="color:blue">plt.savefig()</span> command. If you save the figure after showing it on screen, it will create an empty file!

<div style="background-color: #FFF8C6">
There are many more features you can change if you have time to research them. For example, you may have noticed that the 'tick marks' on the x-axis (i.e. where the vertical grid lines are placed) in the example plot are different from the default values. 
<p>
    
**Exercise: can you find out how to change your tick marks accordingly? Another thing to try is changing the font to Arial using the <span style="color:blue">rcParams</span> data structure.**

In [None]:
plt.style.use("default")

<div style="background-color: #FFF8C6">
The <span style="color:blue">plt.style.use("default")</span> function loads the default style sheet. There are other pre-defined style sheets that change the look of your plots. To list all available style sheets, run: 

```python
print(plt.style.available)
```

**Exercise: experiment with a few different style sheets. You can even create your own style sheet and load it in this way!**

## 10. Coding practices<a id="codingpractices"></a>

In the previous sections we have written brief snippets of Python code to achieve a specific task, i.e. defining a variable or data structure and printing out the values of these, or creating a plot from existing data. However from the next session we will start to write longer pieces of code, and there are some best practices that should be used to keep your code and results legible.

#### Comments

Comments use the following syntax:

```python
# This is a python comment
```

Anything that follows the <span style="color:blue">#</span> symbol on a line is not executed as a piece of Python code and is purely for human benefit. **You should *always* comment your code extensively**. Comments don't have to be incredibly detailed but should give a brief description of what you are doing. Consider a snippet of code that calculates the hypotenuse of a right-angled triangle:

In [None]:
#This snippet of code calculates the hypotenuse of a triangle, given the height and width
a = 3 #Height in cm
b = 4 #Width in cm
c = np.sqrt(a**2 + b**2)
print(c) #Print the result

The comments are short but clearly describe what is happening in the code. As your work becomes more complicated, good comments will save you a lot of hassle in the future. When you write full programmes, you may want to include comments at the top with your name and the date, to act as a record of when the work was done and what the purpose of your programme is.


#### Print statements and formatting

When printing variables to the screen thus far, we sometimes printed only the variables themselves, with no further information about what they represent physically. As with comments, adding extra text will make it easier to keep track of what is going on when you return to your work at a later date. Consider the following example:

In [None]:
#This snippet of code calculates the hypotenuse of a triangle,given the height and width
a=3 #Height in cm
b=4 #Width in cm
c=np.sqrt(a**2+b**2)
print(c) #Just printing the number
#Isn't the output of the statement below much more useful?
print("The hypotenuse of a triangle with height", a, "cm and width", b, "cm is", c, "cm")

Looking at the above outputs, which one is more descriptive and will be more useful when you return to the code later on? Note the use of comments to help you understand what the code is trying to achieve. Also note the units included in the print statement; these help to improve legibility.

<div style="background-color: #FFF8C6">
Having control of how you present your results is also very important. As scientists we should only quote results to the relevant number of significant figures. Look at the following code:

In [None]:
#This snippet of code calculates the hypotenuse of a triangle,given the height and width
a=1 #Height in cm
b=2 #Width in cm
c=np.sqrt(a**2+b**2)
print("The hypotenuse of a triangle with height", a, "cm and width", b, "cm is", c, "cm")

<div style="background-color: #FFF8C6">
Here we have taken the above hypotenuse code but this time given it inputs that do not give an integer output. When we take measurements as scientists, we are unlikely to be able to quote results to the precision that Python gives us by default. When presenting our work we therefore need to use an appropriate format that reflects our confidence in the result. To do this, when we print our results, we can use format statements. Run the code cell below and look at the output of the revised print statement.

In [None]:
print("The hypotenuse of a triangle with height %d cm and width %d cm is %.2f cm" % (a,b,c))

<div style="background-color: #FFF8C6">
This method of creating a print statement may look more complicated, but it gives you greater control over your outputs. Let's break down what is used in this statement. Firstly, in between the print brackets we have replaced:
```python 
, a,
, b,
, c,
``` 
with the commands:
```python
"%d"
"%d"
"%.2f"
```
within the string itself. The % symbol here acts as a placeholder: it reads as "insert variable value here". The 'd' means that the variable should be printed as an integer. The '.2f' means that the variable should be printed as a float with two decimal places.

Now we need to specify which variables should be printed. To do this, we add the command:

```python
% (a,b,c)
```

after the string. Note that the variables need to be given in the order in which they should appear in the text.
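
As an aside, Python also offers f-strings, a newer formatting syntax that uses the same format codes (`d`, `.2f`) but places the variable names directly inside the string. The sketch below is not part of the exercise; it uses the standard `math` module so that it runs on its own, and the sentence being printed is just an example.

```python
import math

a = 1  # height in cm
b = 2  # width in cm
c = math.sqrt(a**2 + b**2)  # hypotenuse in cm

# The %-style format statement described above
old_style = "The hypotenuse is %.2f cm (sides %d cm and %d cm)" % (c, a, b)

# The equivalent f-string: note the f before the opening quote
new_style = f"The hypotenuse is {c:.2f} cm (sides {a:d} cm and {b:d} cm)"

print(old_style)
print(new_style)
```

Both print statements should give identical output, so you can use whichever style you find easier to read.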

**Exercise: look at the code below, which calculates the surface area and volume of a sphere for a given radius in units of metres. Interpret this code, and make it legible by including comments, string outputs and format statements.**

In [None]:
r=1
a=4*np.pi*r**2
v=4*np.pi/3*r**3
print(r,a,v)

<div style="background-color: #FFF8C6">
    
#### Formatting file output
Open the file you created in Exercise 17. You may notice that the formatting of your file is not exactly to your liking. For example, the header names might not line up with your data columns, and the data may have too many decimal places. All this can be fixed by changing the formatting of the output that is written to file. For example, we can set the format of the three columns of data with the following keyword in the <span style="color:blue">savetxt()</span> function: <span style="color:blue">fmt="%d %.2e %.2e"</span>.
This ensures the first column is printed as an integer, and the other two columns are printed using scientific notation with two decimal places. If you have time, experiment with changing the format of the data in your output file. 
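
A minimal sketch of this kind of formatted output is shown below. The numbers, the header text and the file name `formatted_output.txt` are all made up for illustration, and the file is written to the system's temporary folder rather than to your working folder.

```python
import os
import tempfile

import numpy as np

# Made-up values standing in for three columns of data
T = np.array([200, 220, 240])
R_1 = np.array([1.234e-8, 1.347e-8, 1.458e-8])
R_2 = np.array([2.0e-8, 2.2e-8, 2.4e-8])
data = np.column_stack([T, R_1, R_2])  # one row per measurement

# Hypothetical output file in the system's temporary folder
outfile = os.path.join(tempfile.gettempdir(), "formatted_output.txt")

# One format code per column: integer, then two columns in scientific notation
np.savetxt(outfile, data, fmt="%d %.2e %.2e", header="T(K) R_1 R_2")

# Read the file back in as plain text to inspect the result
with open(outfile) as f:
    contents = f.read()
print(contents)
```

Note that `savetxt()` starts the header line with a `#` symbol by default, so that `loadtxt()` skips it when the file is read back in.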

For a more comprehensive guide on formatting output, have a look at [this tutorial](https://www.python-course.eu/python3_formatted_output.php). Knowing how to format Python output appropriately will come in very handy in the future.

# Non-Array Data Structures

Previously, we restricted ourselves to working with NumPy array data structures. As physicists this is natural; arrays are n-dimensional structures that can hold integers or floats and allow statistics to be performed on them. Python, however, offers multiple types of data structure that each serve a unique and useful purpose when programming.

### Lists

Lists in Python are defined using square brackets surrounding zero or more comma separated literals: 
```python
some_primes = [2,3,5,7,11,13]
names_of_cats = ["Ginger", "Princess", "Zorxo the Clawful"]
```

The elements of a list don't even have to be of the same type:

```python
Mixed_list = [2,"Python",16.5]
```

is an allowed list. Moreover, lists don't need to contain any elements; they can be initialised as an empty list:

```python
Empty_list=[]
```
This is particularly useful when you will be adding elements to the list as your code progresses. To add elements you can call the <span style="color:blue">.append</span> method on a list:

```python
Empty_list.append(2)
```
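
As a short sketch of how a list grows with repeated calls to `.append` (the variable name and values are made up for illustration):

```python
measurements = []            # start with an empty list
measurements.append(2.5)     # add elements one at a time
measurements.append(3.1)
measurements.append("n/a")   # the elements may be of different types

print(measurements)          # [2.5, 3.1, 'n/a']
print(len(measurements))     # 3
print(measurements[0])       # 2.5 (indexing starts at 0, as for arrays)
```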


### Tuples
Tuples behave very similarly to lists, but are immutable (i.e. they cannot be changed). Tuple literals are created by writing a sequence of items separated by commas, optionally surrounded by parentheses. To get a tuple with only one element, you need to have a comma after the element.<br>
```python
my_tuple = 1,2,3
my_tuple = (1,2,3)        # equivalent
not_a_tuple = (1)         # no comma, so not a tuple; same as: not_a_tuple = 1
a_tuple = 1,
a_tuple = ("first!",)     # here the first and only element of the tuple is "first!".
```
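
That tuples are immutable can be checked directly. The sketch below changes an element of a list, then tries the same with a tuple; the `try`/`except` construction (which you may not have met yet) simply catches the resulting error so that the code keeps running.

```python
my_list = [1, 2, 3]
my_list[0] = 99          # allowed: lists can be changed
print(my_list)           # [99, 2, 3]

my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 99     # not allowed: tuples cannot be changed
except TypeError as error:
    print("TypeError:", error)

print(my_tuple)          # (1, 2, 3), still unchanged
```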

Many aspects of Python are implicit tuples. For instance, the assignment operator = will happily assign tuples of names to tuples of values:<br>
```python
A,B,C = 1,2,3
```
which is the same as:
```python
(A,B,C) = (1,2,3)
```
which is the same as:
```python
A = 1
B = 2
C = 3
```

This behaviour can be used to swap the values of two variables:<br>
```python
A,B = 1,2
A,B = B,A
print(A,B)   # prints 2 1
```

### Dictionaries
The third most common collection type used in Python is the dictionary, or dict, which stores mappings from keys to values. For every key, there is a value, which can be almost any Python object. Keys are usually strings, but it is possible to use certain other objects as keys. Dictionary literals are written as a comma-separated list of key:value pairs, with a colon separating key from value, surrounded by (curly) braces. Dict items are accessed using the key within square brackets.<br>
```python
student_grades = {"Simon": 60, "Jenny":68, "Laura":112}
student_grades["Laura"] = 100 # Change Laura's grade.
student_grades["Pug"] = 58    # New student!
print(student_grades["Jenny"])  # prints 68
```
<br>
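
A few other common dictionary operations are sketched below, using the same made-up grades. Asking for a key that does not exist with square brackets raises a `KeyError`, so it is often useful to check first with `in`, or to use the `.get()` method with a default value.

```python
student_grades = {"Simon": 60, "Jenny": 68, "Laura": 100}

print("Jenny" in student_grades)       # True: the key exists
print("Pug" in student_grades)         # False: no such key
print(student_grades.get("Pug", 0))    # 0: the default is returned instead of an error
print(list(student_grades.keys()))     # ['Simon', 'Jenny', 'Laura']
print(list(student_grades.values()))   # [60, 68, 100]
```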

### When should you use each data type?

An important question to ask yourself when first thinking about a problem is: which data structure is the best one to use? There is no universal answer; it really depends on the situation. When to use a dictionary is fairly straightforward: dictionaries are used when you want to link one set of values to another. A phonebook, which links a person's name to their number, is a good use of a dictionary.

When to use a list or a tuple can be harder to decide. The key difference between lists and tuples is that tuples are immutable, i.e. once created they cannot be altered. A tuple is therefore a good choice if you don't expect (or indeed don't want) the sequence to change. However, if you want to add or remove elements from a sequence while your code runs, then a list is the data structure to go with.

For most physics applications, we will be dealing with 2D, 3D or even higher dimensionality data that we need to operate on. This is best achieved using arrays contained within the numpy module.

# Converting data types and data structures

It was shown previously that variables can be converted from one data type to another. The exact same can be done with data structures, whether it is to change the data types of the structure, or change the actual structure itself. The first thing to consider is: What is my data structure type and what do I want it to be? For physicists the most common change will be to go between a list and an array. To find the type of a data structure, use the <span style="color:blue">type</span> command.

In [2]:
import numpy as np
Array=np.array([1,2,3,4])
print(type(Array))
List=[1,2,3,4]
print(type(List))

<class 'numpy.ndarray'>
<class 'list'>


Knowing the data structure is crucial to ensuring you handle your data in the most efficient way. For example, the NumPy library is optimised to run operations on arrays. We can convert a list to an array by calling the <span style="color:blue">np.array</span> function and passing it the list as an argument.

In [3]:
Array_of_list=np.array(List)
print(Array_of_list,type(Array_of_list))

[1 2 3 4] <class 'numpy.ndarray'>


To go the other way, we use the <span style="color:blue">np.ndarray.tolist</span> function:

In [4]:
List_of_array=np.ndarray.tolist(Array)
print(List_of_array,type(List_of_array))

[1, 2, 3, 4] <class 'list'>
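
The same conversion is more commonly written as a method call on the array itself, and the data type of the elements can be changed with the `astype` method. A short sketch (the name `Float_array` is made up for illustration):

```python
import numpy as np

Array = np.array([1, 2, 3, 4])

List_of_array = Array.tolist()       # same result as np.ndarray.tolist(Array)
print(List_of_array, type(List_of_array))

Float_array = Array.astype(float)    # new array with the elements converted to floats
print(Float_array, Float_array.dtype)
```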


# Advanced Data Structures - The Pandas DataFrame

## Opening and examining a file with Pandas

The first thing to do is to import the pandas library using:

```python
import pandas as pd
```

To open a file with pandas, the most common function is the read_csv command, where csv stands for comma-separated values. To open the resistivity data used previously, we use:

```python
df=pd.read_csv('Data/Resistivity2.csv')
```

(Note that this is a slightly different .csv file)
<div style="background-color: #00FF00">
    
**Exercise: open the Resistivity2.csv data file using the pd.read_csv command and store the result in a variable called df. Use the type command to print what data structure has been generated, and then print the data itself. What does this return?**

In some cases, the data you are reading in might not be separated by commas; the values may instead be separated by whitespace or semicolons. To ensure that the data are read in properly, you will need to pass the sep keyword (for example sep=";" for semicolons, or sep="\s+" for whitespace) to tell the function how the data are separated. The older delim_whitespace keyword also handles whitespace, but it is deprecated in recent versions of pandas. Look at the read_csv documentation to learn about the other options available.
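
A minimal sketch of reading differently separated data is shown below. To keep the example self-contained, the made-up data are held in strings and wrapped in `io.StringIO` so that pandas can read them as if they were files; with a real file you would pass the file name instead.

```python
import io

import pandas as pd

# Made-up data standing in for files on disk
semicolon_text = "T;R\n200;1.0\n220;1.2\n"
whitespace_text = "T   R\n200 1.0\n220   1.2\n"

# sep tells read_csv what separates the values on each line
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_whitespace = pd.read_csv(io.StringIO(whitespace_text), sep=r"\s+")

print(df_semicolon)
print(df_whitespace)
```

Both calls should produce the same DataFrame, with columns named T and R.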

## The DataFrame

**(Note: for students taking the MRes in Machine Learning and Big Data in the Physical Sciences, Pandas will be covered on the course in week X.)**

The data structure that has been generated by read_csv is called a DataFrame; this is at the heart of how Pandas stores and manipulates data. A DataFrame is a 2D data structure that is composed of the following components:

1) The data

2) The index

3) The columns

Looking at the DataFrame printed above, the data should be obvious: it is all of the numbers that were contained in the Resistivity2.csv file. The index, which is the row number of the DataFrame, labels each instance at which data were taken. In this case data were taken every 20 K between 200 K and 360 K. The columns contain the data that were taken at each index, which for this data set are the resistivities of copper and aluminium.

The top of the DataFrame contains what was in the top of the file that was loaded in, in this instance it was the labels of the resistivity data. These are known as headers, and will allow you to access your data without needing to use indices. You should be careful when loading in data that it has header data, or else Pandas will place your first data row into that slot. 

The first column of the DataFrame contains the numbers 0-9. These are the index labels that can be used to access rows of data. These indices were generated automatically because we did not tell Pandas what to use. Looking at our data, however, we can see a more convenient index to use: the temperature! To set the index of our DataFrame to be the temperature, we can use the index_col keyword argument when reading in our data, or use the set_index function on our DataFrame. Look up these two methods to understand how they work.

<div style="background-color: #00FF00">
    
**Exercise: using the set_index function, make a new DataFrame df2 that has the temperature column as the index. Then, create a new DataFrame df3 that sets the index column to temperature during read in. Print these two DataFrames to look at their structure.**

For larger data sets, it can be disadvantageous to print the whole DataFrame to the screen. We can look at only parts of the data using the .head() and .tail() functions on the DataFrames. Try it yourself, if you wish.

Sometimes we don't need to look at the data in the DataFrame itself; we only want a top-level summary. This is achieved using the DataFrame.info() command.
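
The three commands can be tried out on any DataFrame. The sketch below uses a made-up DataFrame with 100 rows (the name `big_df` and its columns are hypothetical), so that printing all of it would be unwieldy.

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with 100 rows and two columns
big_df = pd.DataFrame({"x": np.arange(100), "y": np.arange(100) ** 2})

print(big_df.head())    # the first 5 rows (the default number)
print(big_df.tail(3))   # the last 3 rows
big_df.info()           # column names, data types and number of non-null entries
```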

### Creating a DataFrame
Often, we are dealing with data that is not in a format that can immediately be turned into a DataFrame as it may be missing headers or an index. It is then down to the user to create the necessary information to turn the data into a DataFrame compatible format. To do this we use the pd.DataFrame function. For example:

In [15]:
import pandas as pd
import numpy as np

d=[[2,3,'e',5],
   [4,3,'f',5],
   [5,3,'g',4]]
Headers=['a','b','c','d']
df=pd.DataFrame(data=d,columns=Headers)
print(df)

   a  b  c  d
0  2  3  e  5
1  4  3  f  5
2  5  3  g  4


Notice that the values in one of the columns are strings and not numbers; this is one of the big advantages of using a DataFrame to store your data.

## Plotting with Pandas

Being able to store heterogeneous data in a single data structure is useful, but the real power of Pandas comes when it comes to plotting using a DataFrame. This is accomplished by the following line of code:

```python
df['column name'].plot()
```

Alternatively you can use:

```python
df.plot('x column name','y column name')
```
Note how we only reference the name of the column, we don't need to know its index. For the first method we didn't set an x-axis; with that plotting nomenclature Pandas will use whatever the index is as an x-axis. Let's look at these methods of plotting:

In [None]:
import matplotlib.pyplot as plt # this needs to be imported to show the plot

df=pd.read_csv('Data/Resistivity2.csv')
df['Resistivity Cu (ohm/m)'].plot()
df.plot('Temperature (K)','Resistivity Cu (ohm/m)')
plt.show()

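In the first plotting style, Pandas takes the DataFrame index and uses it as the x-axis. Because df still has the default integer index, the x-axis here is simply the row number; if temperature had been set as the index (as in df2 from the exercise above), the x-axis would be the temperature. In the second style we name the x column explicitly, so it works even though df does not have temperature set as the index. Plotting data without having to reference column numbers is more intuitive and will make your code easier to understand.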

**This does, however, rely on the programmer choosing sensible column names, so take care!**

On a more general note, we can access columns by querying their column name, for example:

In [None]:
print(df['Temperature (K)'])

## Filtering DataFrames

So far we have made use of the whole DataFrame. A powerful feature of the DataFrame is that, when you have a large amount of data, you can analyse only a small subset of it based on a condition on the data themselves. As an example, we can filter the above DataFrame df to only contain temperatures below 300 K:

In [None]:
df3=df[df['Temperature (K)'] < 300]
print(df3)

Notice that compared to the original DataFrame, we have 4 fewer values. This becomes powerful when we have a large 2D DataFrame that contains lots of data that are grouped by the value contained in a certain column. Look at the following DataFrame:

In [None]:
car_df=pd.read_csv('Data/Car_Data.csv')
print(car_df.info())
print('')
print(car_df.head())

This DataFrame contains information about various aspects of 392 cars. There is a lot of information here; however, we will focus on only a few columns to illustrate some key features of DataFrames. Let's plot the weight of the car versus the mpg, the miles per gallon it can achieve.

In [None]:
car_df.plot.scatter(x='weight',y='mpg')
plt.show()

Note that we have used the scatter plot function and not just the plot function. This is because the DataFrame is not sorted in ascending weight. Looking at the plot, we can see a clear trend that is not surprising: heavier cars get worse mileage. Looking at the summary of the DataFrame, we see that there is an origin column that says where the car was made. To see all the different values contained in the origin column, we use the following command:

In [None]:
print(car_df.origin.unique())

Note that using the command:

```python
print(car_df.mpg.unique())
```

would be a bad idea as there are lots of different values of mpg; we can see this from the graph we plotted! Just as we filtered our resistivity DataFrame based on temperature, we can filter our car data based on region. The syntax for this is a little different from above, and looks like:

```python
New_df=Old_df[Old_df['column_name'].str.contains('condition')]
```

So to extract the US car data from the DataFrame, we would use:

```python
car_df_US=car_df[car_df['origin'].str.contains('US')]
```
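
Filtering works by building a column of True/False values (a 'mask') and using it to select rows. For an exact match, the `==` operator is an alternative to `.str.contains()`, which also matches text that merely contains the condition. A sketch on a small made-up DataFrame (the name `toy_df` and its values are hypothetical and are not taken from Car_Data.csv):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "origin": ["US", "Europe", "Asia", "US"],
    "mpg": [18.0, 30.0, 33.0, 15.0],
})

mask = toy_df["origin"] == "US"   # True/False value for every row
print(mask)

toy_df_US = toy_df[mask]          # keep only the rows where the mask is True
print(toy_df_US)

# Conditions can be combined with & (and) or | (or); note the parentheses
efficient_US = toy_df[(toy_df["origin"] == "US") & (toy_df["mpg"] > 16)]
print(efficient_US)
```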


<div style="background-color: #00FF00">

**Exercise: extract the data for each origin into a separate DataFrame, like the code snippet above. Then plot the weight vs mpg of each car region on the same graph in different colours to answer the question: which region produces cars with the worst mpg?**

**Hint: To get each scatter plot on a common axis, you will need to use the ax keyword argument. Consult Google, the Pandas documentation, and Stack Overflow about how to go about this!**

## Index

1. Intended learning outcomes
2. Histograms
3. Plotting error bars
4. Line fitting
5. Optional: other fitting models

## 2. Histograms<a id="histograms"></a>
In this section you will practise the array statistics you learnt last session, as well as learn to create histograms. Histograms are frequently used when displaying a set of repeated measurements, and are therefore an incredibly useful tool for your labs.

With the matplotlib pyplot package it is straightforward to create a histogram of a data array. For the following example, we use the "Dataset.txt" file, which you can find in the 'Data' folder. This data file contains 20 measurements of the speed of light (in units of $10^8\,\mathrm{m\,s}^{-1}$). The code below creates a histogram of the data. Inspect the code and subsequently run the code cell.

In [None]:
data = np.loadtxt('Data/Dataset.txt')  # read the data from file
plt.ylabel("Number of measurements")  # set the y-label
plt.xlabel("Speed ($10^8$ m/s)")  # set the x-label (the data are in units of 10^8 m/s)
plt.hist(data)  # create a histogram of the data
plt.show()  # show the plot

You should see that one of the measurements lies well away from the others; this is an outlier. We can use Python to make a new data sample without this data point using the NumPy <span style="color:blue">delete()</span> function (remember that the elements in the array start counting at 0). Note however that we do not 'delete' the data point from our data set altogether - we still record the outlier and keep it in our data file (it would be very bad practice to simply discard data that doesn't match our expectations!). We use the <span style="color:blue">delete()</span> function to create a new array without the outlier, so we can do further statistics on this sample.

In [None]:
clean_data=np.delete(data,5)
clean_data
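
That `np.delete()` returns a new array and leaves the original untouched can be checked with a small made-up array (the names `sample` and `trimmed` are chosen only for this illustration):

```python
import numpy as np

sample = np.array([10, 20, 30, 40])   # made-up values
trimmed = np.delete(sample, 1)        # new array without the element at index 1

print(sample)    # [10 20 30 40]  (the original is unchanged)
print(trimmed)   # [10 30 40]
```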

<div style="background-color: #00FF00">
    
**Display a histogram of the new data sample.**

<div style="background-color: #FFF8C6">

We often would like to compare two datasets or plots visually; for this purpose it is helpful to plot two (or more) graphs side by side. This can be done using the <span style="color:blue">plt.subplot()</span> function to present them in a grid structure that contains $n \times m$ plots, where $n$ is the number of rows and $m$ is the number of columns. You call the <span style="color:blue">plt.subplot()</span> function like this:

```python
plt.subplot(n,m,k)
```

where n is the number of rows you want, m is the number of columns and k is the number of the plot you are creating at this moment in time.
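
As a sketch of the syntax, the code below places two line plots of made-up data side by side (one row, two columns). Note that the plot number k starts counting at 1, not 0. The `figsize` values and the call to `plt.tight_layout()` are optional choices made here to keep the labels from overlapping.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)   # made-up data

fig = plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)      # 1 row, 2 columns, first plot (left)
plt.plot(x, np.sin(x))
plt.title("sin(x)")

plt.subplot(1, 2, 2)      # 1 row, 2 columns, second plot (right)
plt.plot(x, np.cos(x))
plt.title("cos(x)")

plt.tight_layout()        # adjust the spacing so that labels do not overlap
plt.show()
```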

**Exercise: Use the <span style="color:blue">plt.subplot()</span> function to plot the two histograms you created above side by side.**

## 3. Plotting error bars<a id="errorbars"></a>
So far we have learnt to plot our data using linear plots, scatter plots, and histograms. However, normally all data we take will have errors associated with it. Our plots should include these errors in the form of error bars. Fortunately this is straightforward with matplotlib as it includes the function <span style="color:blue">errorbar()</span> which creates a plot with error bars for us. 

<div style="background-color: #00FF00">
    
**Exercise 4: have a look at the help for the <span style="color:blue">errorbar()</span> function to see which input arguments it takes. Pay particular attention to the keywords <span style="color:blue">yerr</span>, which takes an array that includes the y-error bars, and <span style="color:blue">fmt</span>, with which you specify the plotting symbol.**

**Exercise 5: load the resistivity data we used last session ("Resistivity.txt" - included in this session's 'Data' folder), create an array containing 5% errors on the resistivity data, and plot a scatter plot of the data including error bars.**

Note: the exercise above includes little guidance - this is to help you get used to finding programming solutions to problems you have not encountered before. Start by looking through the help file, and possibly googling example uses of the function. Also feel free to discuss with your fellow students! If you still find you have trouble getting started (don't worry - this is normal for novice coders), skip ahead to the blue box below and after showing your previous results ask your demonstrator to help you on your way.

<div style="background-color: #FFF8C6">
You can customise many features on your error bar plot. For example, can you find out how to put caps on your error bars (so they are displayed as **T** rather than **I**)? Also, your independent variable may have error bars too, which you would need to add to your plot. 

**Exercise: add fixed temperature error bars of 2 K to your plot. Check the <span style="color:blue">errorbar()</span> documentation for this!**

## 4. Line fitting<a id="linefitting"></a>
Previously, we noticed that the resistivity data appear to show a linear relationship with temperature. In order to find this relationship, we want to fit a straight line to our data. We will do this by using the NumPy routine <span style="color:blue">polyfit()</span>. This routine takes as arguments an array of the x values of the data, an array of the y values, and the *order* of the polynomial (the highest power of x) - in this case 1 for a straight line. We can then tell it to weight each data point by the inverse of its error with the <span style="color:blue">w</span> keyword, and ask it to return the uncertainty of the fit parameters by setting <span style="color:blue">cov=True</span>. Look at the documentation to see all of the available keyword arguments.

The routine returns an array which contains the best fit values for the coefficients of the polynomial ($P[0]$ and $P[1]$) 

$$ f(x) = P[1] + P[0] x $$

and a *covariance matrix* which contains the information on the uncertainties on the fit parameters *i.e.* how well we have measured the slope ($P[0]$) and intercept ($P[1]$).

$$ \left( 
\begin{array}{cc}
C_{00} & C_{01} \\
C_{10} & C_{11} 
\end{array}\right)$$

We are interested in the diagonal elements of this matrix. For instance, the uncertainty on fit parameter P[1] (the intercept) is:

$$\sigma_{P[1]}=\sqrt{C_{11}}$$

The other two elements of the covariance matrix describe the covariance between the two different parameters, which is something we do not need to use for our error analysis.
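
The order in which the coefficients are returned can be checked with a quick sketch on made-up, noise-free data that lie exactly on the line $y = 2x + 1$. The fit should then return a slope of 2 in $P[0]$ and an intercept of 1 in $P[1]$ (up to rounding errors).

```python
import numpy as np

# Made-up, noise-free data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

P = np.polyfit(x, y, 1)   # first-order (straight line) fit
slope, intercept = P      # P[0] is the slope, P[1] is the intercept
print(slope, intercept)
```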

The code below returns the linear fit to the Aluminium resistivity data. Carefully look through this code and make sure you understand it.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

T,R_Cu,R_Al = np.loadtxt('Data/Resistivity.txt',unpack=True)# Read in the data
errors_Al = 0.05*R_Al# Calculate 5% errors
errors_Cu = 0.05*R_Cu

# The line below stores the fit coefficients in the fit_Al variable, and the covariance matrix in the cov_Al variable.
# Note that the input arguments for polyfit() below are:
# (1) the independent variable (T)
# (2) the dependent variable (R_Al)
# (3) the order of the polynomial to be fitted (1)
# (4) the weights of each data point (w = 1/errors_Al)
# (5) whether or not to return the covariance matrix (cov = True)
fit_Al,cov_Al = np.polyfit(T,R_Al,1,w=1/errors_Al,cov=True)
print('Aluminium fit coefficients')
print(fit_Al)
print('covariance matrix')
print(cov_Al)

sig_0 = np.sqrt(cov_Al[0,0]) #The uncertainty in the slope
sig_1 = np.sqrt(cov_Al[1,1]) #The uncertainty in the intercept

print('Slope = %.3e +/- %.3e' %(fit_Al[0],sig_0))# Note the %.3e forces the values to be printed in scientific notation with 3 decimal places.
print('Intercept = %.3e +/- %.3e' %(fit_Al[1],sig_1))

We can now use the convenient <span style="color:blue">poly1d()</span> routine from the NumPy package, which takes the fit parameters returned by <span style="color:blue">polyfit()</span> and returns a function which calculates the corresponding fit values at any given point. We then plot the linear fit on top of our data. 

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Calculate the fit 
pAl=np.poly1d(fit_Al)
print('Aluminium polynomial')
print(pAl)

# Create the original data figure with error bars
plt.grid()
plt.xlabel("Temperature (K)") 
plt.ylabel("Resistivity (Ohm m)") 
plt.title("Resistivity Plot") 
plt.errorbar(T,R_Cu,yerr=errors_Cu, fmt='o', mew=2, ms=3, capsize=4)
plt.errorbar(T,R_Al,yerr=errors_Al, fmt='o', mew=2, ms=3, capsize=4)
plt.legend(["Copper", "Aluminium"], loc=2 ) 
plt.xticks(np.arange(200, 400, 50))

# Overlay the linear fit
# Note that we create the y-coordinates for the fit data points by calling pAl 
# (which was the return value of poly1d), with the x-coordinates stored in T as the input argument.
plt.plot(T,pAl(T))
plt.show()

The above is quite a complicated piece of code. It is therefore important to understand what exactly is going on. Answer the below questions to help you gain a better understanding - make sure to note the answers down in your Computing Notebook.

<div style="background-color: #00FF00">
    
**Exercise 6: To understand what exactly is returned by the function <span style="color:blue">polyfit()</span>, find out the data types of <span style="color:blue">fit_Al</span> and <span style="color:blue">cov_Al.</span>**

**Next, to check what the command <span style="color:blue">pAl(T)</span>  does, plot the linear fit with crosses as symbols, rather than a line. In order to do this, edit the following line in the code cell above:**
```python
plt.plot(T,pAl(T))
```
**Use the <span style="color:blue">pAl()</span> command to calculate the predicted value of the Resistivity at 250 K and 400 K. Check that your answer is sensible by inspecting the above plot of the fit together with the data.**

**Now explain the <span style="color:blue">pAl(T)</span> command to a neighbour and/or a demonstrator.**

<div style="background-color: #00FF00">
    
**Exercise 7: now we have created and plotted the linear fit to the Aluminium resistivity data, can you do the same for the Copper resistivity data? Your final output should be a scatter plot of the data including error bars and both linear fits.**

### Practical example

Let's consider an experiment where the resistance of a pair of identical resistors is to be found. In this experiment, a measurement is made of the voltage difference across the two resistors and the current running through them is also measured. The resistance of each resistor can be described by the equation:<p>
$$R= \frac{1}{2}\frac{V_1 - V_2}{I},$$<p>
where $V_1$ and $V_2$ are the voltages at the two ends of the resistors and $I$ is the current through them.

Consider two approaches to finding the value of $R$:

1. We take one measurement of each of $V_1$, $V_2$ and $I$ and accept the equipment manufacturer’s error estimates giving the following values: $V_1 = 6.9 \pm 0.5 \rm\, V$, $V_2 = 0.7 \pm 0.1 \rm \,V$ and $I = 0.43 \pm 0.03\rm\, A$. Find a value for $R$ and its error $\sigma_R$ using the appropriate methods for combining errors. <p>

2. We take a series of measurements of $V_1$, $V_2$ and $I$ with results as given in [Resistors.csv](https://cclewley.github.io/ComputingYr1/Data2/Resistors.csv). Plot $(V_1 - V_2)$ against $I$ and use a linear fit to find $R$ and $\sigma_R$.
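
For the first approach, one standard way to combine the errors is sketched below. It assumes that the errors on $V_1$, $V_2$ and $I$ are independent and small compared with the measured values; the symbols $\sigma_{V_1}$, $\sigma_{V_2}$ and $\sigma_I$ are introduced here to denote the quoted errors. The errors on the two voltages are added in quadrature to give the error on the difference, and the fractional errors on the voltage difference and the current are then added in quadrature:

$$ \sigma_{V_1 - V_2} = \sqrt{\sigma_{V_1}^2 + \sigma_{V_2}^2}, \qquad \frac{\sigma_R}{R} = \sqrt{\left(\frac{\sigma_{V_1 - V_2}}{V_1 - V_2}\right)^2 + \left(\frac{\sigma_I}{I}\right)^2} $$

The factor of $\frac{1}{2}$ is exact, so it does not contribute to the fractional error on $R$.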

<div style="background-color: #00FF00">
    
**Exercise 8: use both methods to calculate the resistance $R$ and its associated error with Python. Do the two approaches give the same results?**

<div style="background-color: #FFF8C6">
    
## 5. Optional: other fitting models<a id="nonlinear"></a>
### Higher-order polynomial fits
So far we have only considered fitting straight lines to a dataset. More complex relationships might require the relationship to be represented by e.g. a higher-order polynomial or an exponential function. In the following exercise we will try to fit a second-order polynomial to the atmospheric CO$_2$ concentration, which can be found in "CO2_data.csv" (in the 'Data' folder).

**Exercise: read in the CO$_2$ data set and plot it.** 

<div style="background-color: #FFF8C6">
The CO$_2$ concentration varies periodically; this is caused by the change in uptake of CO$_2$ by vegetation during the seasons. However, there is also a year-on-year increase in the CO$_2$ concentration. 


**Exercise: fit the trend with a straight line, and overplot the result.**

<div style="background-color: #FFF8C6">
It doesn't look like a straight line is the appropriate fit to the trend. This is even more clearly seen when a residuals plot is created, which shows the data minus the fit. 

**Exercise: below, subtract the fitted values from the measured data points and show the residuals plot. Important note: store your residuals in a variable called 'residuals', for further use later on.**

<div style="background-color: #FFF8C6">
This shows there is clearly further structure in the trend. We can use the <span style="color:blue">polyfit()</span> function to fit a second order polynomial to the CO$_2$ concentration data. This will give a fit of the form:

$$ f(x) = P[2] + P[1] x + P[0] x^2. $$

Here $P[0]$, $P[1]$, and $P[2]$ are the fit parameters. A third order fit would give:

$$ f(x) = P[3] + P[2] x + P[1] x^2 + P[0] x^3. $$

The <span style="color:blue">polyfit()</span> function can create arbitrarily high orders of polynomial fits; however it is best to use the lowest order of polynomial that gives a good fit in order to avoid 'overfitting'.
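
The coefficient ordering for a second-order fit can be checked with a sketch on made-up, noise-free data that lie exactly on $y = 0.5x^2 - 2x + 3$. This example does not use the CO$_2$ data; the name `p_quad` is chosen only for this illustration.

```python
import numpy as np

# Made-up, noise-free data lying exactly on y = 0.5 x^2 - 2 x + 3
x = np.linspace(-3, 3, 13)
y = 0.5 * x**2 - 2.0 * x + 3.0

P = np.polyfit(x, y, 2)   # second-order fit: P[0] x^2 + P[1] x + P[2]
print(P)

p_quad = np.poly1d(P)     # callable polynomial built from the coefficients
print(p_quad(1.0))        # value of the fitted polynomial at x = 1
```

The highest power of x always comes first in the returned array, so $P[0]$ should be close to 0.5 here.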

**Exercise: fit a polynomial of a higher order to the data and recreate your data plot with the fit overplotted. Also recreate the residual plot. Which order of polynomial would you pick to fit the trend?**

<div style="background-color: #FFF8C6">
    
### Non-linear fits

If you plot your residuals as a line plot, you can see the periodic variability in the data. To test whether this really is a yearly cycle, we are going to fit a sine function to the residuals, of the form:

$$ f(t) = A\sin\Big({\frac{2\pi}{T}t+\phi}\Big) $$

Here $A$ is the amplitude, $T$ is the period, $t$ is time, and $\phi$ is the phase offset of the sine function. This is an example of a non-linear function because one of the coefficients, in this case $T$, is within the sine function (see the Advanced worksheet for a full definition of linear vs non-linear functions). 

There is no pre-built routine that fits a sine function; instead we have to use the generic <span style="color:blue">curve_fit()</span> function which is in the scipy.optimize package. The <span style="color:blue">curve_fit()</span> function allows us to define our own fit function. Below is the example code to create our sine function fit. Carefully read through the code before running it; note how the fit function is defined in a function that we have called <span style="color:blue">my_sin</span>, and we have to specify an initial guess for the fit parameters. We do this because the fit is non-linear; all non-linear fits have to be done via iteration, where an initial guess is refined until it converges onto an answer.

In [None]:
from scipy.optimize import curve_fit  # Import the function curve_fit from the optimize package in SciPy

year, CO2 = np.loadtxt('Data/CO2_data.csv', skiprows=2, delimiter=',', unpack=True)  # Load the data

# This is the function we want to fit - you will learn how to create a function from scratch in the next session.
def my_sin(t, period, amplitude, phase):
    return amplitude*np.sin(t * 2*np.pi/period + phase)

# Our initial guess for the parameters
guess_period = 1  # Period in years
guess_amplitude = 2
guess_phase = np.pi

p0 = [guess_period, guess_amplitude, guess_phase]  # Array of initial parameter values

# now do the fit
# curve_fit arguments: 
# 1. the name of the function to fit (my_sin)
# 2. the independent function values (year)
# 3. the dependent function values to be fitted 
#    (note that here we are fitting to the data stored in a variable named 'residuals' which you evaluated earlier)
# 4. an array with the initial parameter values (p0 = p0)
fit = curve_fit(my_sin, year, residuals, p0=p0)
# The fit variable contains the optimized parameters as its first element, and the covariance matrix as its second element.
print('The fit parameters are: ',fit[0])

# recreate the fitted curve using the optimized parameters
# The *fit[0] notation 'unpacks' the first element of the fit. It is the same as saying: 
# fit[0][0], fit[0][1], fit[0][2] (each of which contains one of the optimized variables)
data_fit = my_sin(year, *fit[0])

# Create a plot of the fit with the residuals overlaid.
plt.xlabel('Year')
plt.ylabel('Residuals')
plt.plot(year,data_fit)
plt.plot(year,residuals)
plt.show()


<div style="background-color: #FFF8C6">
You can see that the peaks and troughs in the residuals are well matched by the fit (although our fit has a fixed amplitude whereas the residuals clearly are more variable). If you pay attention to the printed fit parameters, you will see that the fitted period is very close to 1, i.e. there is indeed a yearly cycle in the data. Our initial guess of a period of 1 year was very good, which allowed the fitting routine to find the optimized parameters rapidly and accurately. It is very important to have good starting values, otherwise the fit might make no sense at all. You often only notice this when you actually plot the fit on top of the data! 
<p>
    
**Exercise: our particular fit function is extremely sensitive to the starting value of the period: try and change the initial guess for the period in the code above and see what happens. What happens when you change the starting guess for the other fit parameters?**
<p>
We have now fitted the CO$_2$ concentration data with a sine function to represent the yearly variability and a polynomial to represent the overall trend. The two fits added together are our best fit for the data as a whole. 
<p>
    
**Exercise: plot a graph of the full CO$_2$ concentration data with the total fit overlaid.**

<div style="background-color: #FFF8C6">
Normally when we try and fit a function, we first decide what kind of function makes physical sense. When you expect a linear relationship, it is sensible to fit a straight line. In the data above however, we fitted a polynomial of arbitrary order to the trend, without a theoretical reason for it. In the case of the atmospheric CO$_2$ concentration, it is more common to fit an exponential curve (a good fit would then indicate "exponential growth"). 
<p>
    
**Exercise: create an exponential fit to the trend by using the <span style="color:blue">curve_fit()</span> function. To do this, first copy the <span style="color:blue">my_sin</span> function from the code cell above, change its name to <span style="color:blue">my_exp</span> and alter it to return a function of the form:**

$$f(t) = P[2] + P[1]e^{P[0]t}$$

**Here $P[0]$, $P[1]$, and $P[2]$ are the fit parameters, and $t$ is the time in years. Make sure your <span style="color:blue">curve_fit()</span> function calls your new <span style="color:blue">my_exp</span> function instead of <span style="color:blue">my_sin</span>!**

Hint: subtract the starting year from your time array so that it starts at 0 instead of around 1960. Also, as you have seen when we tried to fit a sine function, the initial guess of the fit is very important. To make an educated guess of the initial parameters, first plot the data and your guess by hand so you can choose an offset and exponent that are at least of the right order of magnitude.

## Index of Session 3

1. Intended learning outcomes
2. General coding practices
3. Functions
4. For loops
5. Mathematical inequalities and Boolean logic
6. If statements
7. While loops
8. Further examples

## 2. The Spyder IDE environment<a id="spyder"></a>

Previously, all the coding was carried out within the Jupyter Notebook environment. Notebooks are useful when executing small blocks of code for quick analysis, and using them as a record of results is highly recommended. However a common method of handling larger coding projects is through using an Integrated Development Environment, or an IDE. An IDE provides a platform with which to develop and execute code, similar to Jupyter Notebooks. 

For using Python, Anaconda comes with its own IDE application, known as Spyder.

## 4. Functions <a id="functions"></a>

Functions are fundamental in producing a program and are where you will spend most of your time when coding in Python. Functions are pieces of code that accept one or more inputs and return an output. Some functions that you have already encountered include the trigonometric functions such as <span style="color:blue">cos(x)</span> or the functions you had to create when using <span style="color:blue">curve_fit</span>. 

When you create a new function in Python, it should be laid out like this:

```python
def function_name(input_arguments):
    "Your code here"
    return [expression]
```

The first line defines the function: it specifies the name and the input parameters the function will accept. Below that is any number of lines of code that 'do the work' of the function. The final line is the return statement: if your function needs to return information to the main program, it is entered here after the <span style="color:green">**return**</span> keyword.

Two important things to note when writing a function are the colon at the end of the <span style="color:blue">def</span> statement and the indentation of the code that follows it. A common mistake of first-time programmers is to forget these key features. Code written without indentation will not be part of the function.

An example function is shown below:

In [1]:
# An example function that takes two input variables x and y, adds them together and then
# squares their sum.
# Different ways the function can be called and how arguments are passed to it are shown.

# Define the function here:
def square_two_numbers(x,y):
    z=(x+y)**2 # This is the arithmetic calculation
    return z # The value of z is returned to the main program

# Below this is the main program.

# One way of calling the function is with two integers.
# Note: the result of the function (i.e. the value of z) is stored in the variable result_ints.
result_ints=square_two_numbers(1,1) 
print(result_ints)

# Now we call the function with two floats. 
result_floats=square_two_numbers(4.2,2.3)
print(result_floats)

# This time we create two variables, one a float and the other an integer.
num1=5.3
num2=10
# We call the function using the variables we created, instead of using literals. 
result_mixed=square_two_numbers(num1,num2)
print(result_mixed)


4
42.25
234.09000000000003


Notice that although the function was defined using input variables x and y, you can put any variables or numerical data types within this function and it will perform the same operation. Pay particular attention to the variable types of the result when we call the function with integers, floats, or mixed variable types! 
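
To make the point about result types concrete, the short sketch below repeats the function and checks the type of each result with the built-in `type()` function. The names `type_ints`, `type_floats` and `type_mixed` are illustrative only.

```python
def square_two_numbers(x, y):
    z = (x + y)**2  # add the inputs and square the sum
    return z

# Store the type of the result for each combination of input types
type_ints = type(square_two_numbers(1, 1))        # two integers
type_floats = type(square_two_numbers(4.2, 2.3))  # two floats
type_mixed = type(square_two_numbers(5.3, 10))    # one float and one integer

print(type_ints, type_floats, type_mixed)
```

Two integers give an integer result, whereas any calculation that involves a float returns a float.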

We could also have chosen any name for this function; <span style="color:blue">square_two_numbers</span> is not a special name. 
<p>
<div style="background-color: #00FF00">

**Exercise 1: copy the above function into the Spyder script environment, change the name of the function and execute it for different inputs. Is there any limitation as to what variables you can pass to the function?** 

**Remember to also change the name of the function when calling it!**

Now that we have seen how functions work, let's start creating our own! Remember to use the Spyder IDE for these exercises (creating new code blocks for each exercise).

<div style="background-color: #00FF00">

**Exercise 2: Newton's Law of Gravitation states that the force, $F$, felt between two objects is proportional to their masses, $m_1, m_2 $, and inversely proportional to the square of their displacement, $r$. In 1D, the magnitude of this force is written as (replacing r with $x_2 - x_1$):**

$$F=\frac{Gm_1m_2}{(x_2 - x_1)^2}$$

**Where $G=6.67\times10^{-11}\rm\,m^{3}\,kg^{-1}\,s^{-2}$ is the gravitational constant. Write a function that calculates and returns the force felt by masses of 1kg at arbitrary positions (i.e. $x_1$ and $x_2$ should be your input parameters). Remember to think about code layout with regards to constants, and in particular what type of variables (integers or floats) everything should be. Try this for different distances. What happens if the masses are in the same position? We will return to this later.**

**Hint: You might want to change the value of $m_1$ or $m_2$, therefore these should be variables that are passed to the function.**
 


<div style="background-color: #FFF8C6"> 
    
#### Optional Keywords 

Functions in Python can be created with default arguments; this means that unless specified they will assume a value that is set in the function definition itself. In this case, the function would look like:

```python
def add_two_number(x,y=5):
    z=x+y
    return z
```
Copy this function into Spyder, and run it with one and two inputs. With this in mind, try the following:
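
As a quick illustration of how the default value behaves, the sketch below calls the function in three ways. The variable names `one_input`, `two_inputs` and `by_keyword` are chosen here for illustration.

```python
def add_two_number(x, y=5):
    z = x + y
    return z

one_input = add_two_number(3)        # y takes its default value of 5
two_inputs = add_two_number(3, 10)   # the default is overridden by 10
by_keyword = add_two_number(3, y=1)  # the optional argument can also be passed by name

print(one_input, two_inputs, by_keyword)
```

Arguments with default values must come after the arguments without defaults in the function definition.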
 

## 5. For loops <a id="forloops"></a>

When writing programs, it is often required to repeat a code block multiple times within the execution of your Python script. The <span style="color:blue">for</span> loop allows you to execute the same line of code a fixed number of times, and is laid out like this:

```python
for [variable] in [range of variables]:
    [Execute code]
```

Notice the similarities in layout between a <span style="color:blue">for</span> loop and the functions we saw earlier, with the indents and the colon. Since we execute the for loop within the main body of code, there is no need for a return statement to return any results. The same rule for indents applies to loops as to functions: any code not indented is not part of the loop. Below you will find a simple example of a <span style="color:blue">for</span> loop:

In [128]:
for i in range(0,5):
    print("Hello World")

Hello World
Hello World
Hello World
Hello World
Hello World


In practical applications involving a <span style="color:blue">for</span> loop, you may be operating over an array of values. To accommodate this, Python allows multiple ways to interface <span style="color:blue">for</span> loops with arrays in the following manner:

In [5]:
y=['One','Two','Three','Four','Five']

# This for loop loops over the indices of the elements of array y
for i in range(0,5):
    print(y[i])

print('') # Whitespace to separate code executions

#This for loop queries array y directly
for x in y:
    print(x)


One
Two
Three
Four
Five

One
Two
Three
Four
Five


In the first example we iterate over indices and then use those indices to access the elements of the array. Here i is a counter that starts off with value 0 and finishes at value 4, as defined by the range(0,5) command. The counter can then be used to specify which element of the array should be printed.

In the second example we query the array directly. This means that x takes the value of each element of array y in turn.

You should recognise that the square brackets are used both when creating the y array and when calling an index of y. 

Loops can also be nested one inside another as follows:

In [130]:
for i in range(0,2):
    for j in range(0,2):
        print("i index =",i,"j index =",j,)

i index = 0 j index = 0
i index = 0 j index = 1
i index = 1 j index = 0
i index = 1 j index = 1


Attempt the following examples:
<p>
<div style="background-color: #00FF00">

**Exercise 4: the following array contains the names of the Fellowship of the Ring:**
```python
Fellowship=["Gandalf","Aragorn","Boromir","Legolas","Gimli","Merry","Pippin","Samwise","Frodo"]
```
**Write a for loop that greets each member with the following message:**
```python
Hello [Name], welcome to the Fellowship of the Ring!
```
**Note: you can query the array itself directly or access the array using its index.**

Now onto a numerical example:

<div style="background-color: #FFF8C6"> 

#### Enumerate
When writing a for loop, you can access the elements of a list directly or using an index. There are situations when you want to do both. To accomplish this, Python has the built-in enumerate function, which is included in a for loop as:

```python
for number,value in enumerate(list):
    ---Write some code---
```
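
A minimal sketch of this pattern is shown below, using a hypothetical list of planet names (`planets` and `lines` are names invented for this illustration):

```python
planets = ['Mercury', 'Venus', 'Earth']  # hypothetical example list

lines = []
for number, value in enumerate(planets):
    # number is the index (starting at 0), value is the element itself
    lines.append(str(number) + ': ' + value)
    print(number, value)
```

Note that the counter starts at 0, just like array indices in Python.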

**Exercise: modify exercise 4 to print the following:**

```python
Hello [value], you are Fellowship member number [number]!
```

**Where 'value' is the name in the Fellowship list.**

## 6. Mathematical inequalities and Boolean logic<a id="inequalities"></a>

In the sections to follow, we will develop the methodology to execute code only if a certain criterion is met. This criterion is given as a condition which is either <span style="color:red">TRUE</span> or <span style="color:red">FALSE</span> (written <span style="color:blue">True</span> and <span style="color:blue">False</span> in Python). These <span style="color:red">TRUE</span> or <span style="color:red">FALSE</span> outcomes will usually be formed by comparing two quantities; this is done through mathematical inequalities. The types of mathematical inequality operators are:

- ==      equal to
- <       less than
- \>      greater than
- <=      less than or equal to
- \>=      greater than or equal to
- !=       not equal to

Inequality statements are therefore formed by comparing quantities with these operators, such as:

<span style="color:blue">1 == 1</span>

<span style="color:blue">5 < 3</span>

<span style="color:blue">2 != 2</span>

<div style="background-color: #00FF00">
    
**Try each of these inequality operators.**



Try assigning the results to variables and look at their type in the variable explorer window; this type of variable is known as a <span style="color:blue">Boolean</span> variable and is a critical part of program flow. Boolean expressions can be formed out of more than one comparison of values if a more complex situation needs to be evaluated. This is done by using the Boolean operators

- <span style="color:blue">and</span>
- <span style="color:blue">or</span>
- <span style="color:blue">not</span>

An example of this would be:

<span style="color:blue">2 < 3 and 5 > 3</span>

<span style="color:blue">20 == 20 or 20 <= 25</span> 

Although these examples were done for numbers directly, they work in exactly the same way when comparing variable values. Try out more complex mathematical inequality statements. The <span style="color:blue">not</span> keyword is unique in that it reverses the condition that is to be satisfied; for example, the following two are equivalent:

- <span style="color:blue">not 5 == 3</span> and <span style="color:blue">5 != 3</span>
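
The statements above can be checked with a short sketch such as the one below; the variable names `a` to `d` are arbitrary.

```python
a = 2 < 3 and 5 > 3      # both comparisons are true
b = 20 == 20 or 20 > 25  # only the first comparison is true
c = not 5 == 3           # equivalent to 5 != 3
d = 2 != 2               # false, because 2 is equal to 2

print(a, b, c, d)
print(type(a))  # the result of a comparison is a Boolean
```

An `and` expression is only true if both sides are true, whereas an `or` expression is true as soon as one side is true.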

<span style="color:red">**IMPORTANT NOTE**</span>

One very important point to note is the difference between = and ==. Confusing the two is a very common error among first-time programmers and is frequently the reason why code does not work as intended. The first assigns a value to a variable, whereas the second compares two values and returns a Boolean.



## 7. If statements<a id="ifstatements"></a>

As seen in the mathematical inequalities section above, you can form Boolean variables by comparing the values of numerical variables. These become useful when combined with a conditional clause to modify the behaviour of the code: the <span style="color:blue">if</span> statement. An <span style="color:blue">if</span> statement will only trigger when the condition it is evaluating returns <span style="color:red">TRUE</span> (or <span style="color:red">FALSE</span> if the condition is preceded by <span style="color:blue">not</span>) and is written in a similar fashion to a <span style="color:blue">for</span> loop:

```python
if [STATEMENT RETURNS TRUE]:
    Execute code
```

A simple example of an <span style="color:blue">if</span> statement would look like:

In [134]:
x=5
if x > 3:
    print(x, "is greater than 3")

5 is greater than 3


Just as with a <span style="color:blue">for</span> loop, function calls can be included as part of an <span style="color:blue">if</span> statement.

Related to the <span style="color:blue">if</span> statements are the <span style="color:blue">else</span> and <span style="color:blue">elif</span> (short for else if) statements. These extend the original <span style="color:blue">if</span> statement to return different results depending on whether the boolean being evaluated returns <span style="color:red">TRUE</span> or <span style="color:red">FALSE</span>. An extension of the above code to include these extra statements would be:

In [None]:
x=5
if x > 3:
    print(x, "is greater than 3")
elif x < 3:
    print(x, "is less than 3")
else:
    print(x, "is equal to 3")

<div style="background-color: #00FF00">
    
**Execute the above code for different values of x.** 


Notice that the <span style="color:blue">elif</span> statement requires an additional expression to evaluate whereas the <span style="color:blue">else</span> statement does not. In the above we could have used the expression <span style="color:blue">elif x == 3</span>. It would then be good practice to use the <span style="color:blue">else</span> statement for any other possibility we might not have thought of (e.g. if x is NaN, 'not a number', none of the three comparisons is true). This set of expressions can be a powerful modifier of the behaviour and flow of a code. Just as <span style="color:blue">if</span> statements can be used in a block of standalone code, they can just as easily be integrated into functions and <span style="color:blue">for</span> loops.
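
As a sketch of how these statements can be combined, the cell below places an if/elif/else chain inside a function and calls that function from a for loop. The function name `compare_to_three`, the list `results` and the test values (including NaN, 'not a number') are assumptions made for illustration.

```python
def compare_to_three(x):
    if x > 3:
        return 'greater than 3'
    elif x < 3:
        return 'less than 3'
    elif x == 3:
        return 'equal to 3'
    else:
        # Reached only when none of the comparisons above is true
        return 'not comparable to 3'

results = []
for value in [5, 1, 3, float('nan')]:
    results.append(compare_to_three(value))
    print(value, 'is', results[-1])
```

The last value fails all three comparisons, so only the final `else` branch catches it.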


<div style="background-color: #00FF00">
    
**Exercise 6: the array from exercise 4 is extended to include other characters from the Lord of the Rings franchise, together with a 'yes' or 'no' indicating whether the character is part of the fellowship:**

```python
Fellowship=[["Gandalf","Yes"],["Theoden","No"],["Aragorn","Yes"],["Boromir","Yes"],["Galadriel","No"],["Arwen","No"],["Legolas","Yes"],["Gimli","Yes"],["Merry","Yes"],["Pippin","Yes"],["Samwise","Yes"],["Frodo","Yes"]]
```

**Once again use the for loop to print the hello statement as in exercise 4 for all characters that are part of the fellowship. For those that are *not* part of the fellowship, print the following statement:**

```python
Sorry [Member], you are not part of the fellowship. Have a good day.
```
**Note that in order to do this you will need to include an if statement in your for loop.**

Below are further exercises to practise for loops and if statements, together with the use of random number generators. Attempt these if you would like more practice.

<div style="background-color: #FFF8C6">
    
**Exercise: the flip of a coin can have two results: heads or tails. If the coin was fair, then you would expect an equal number of heads and tails to appear over a large sample size. Test out whether or not a computer makes a fair coin tosser by writing a code block that flips a coin 10000 times.**

**Hint: Assign heads a value of 1 and tails a value of 0. Then use the np.random.randint(2) function to get either a 0 or a 1 randomly**

<div style="background-color: #FFF8C6">

**Exercise: write a function that compares your input to the roll of a six-sided dice, containing the integers 1-6, and tells you whether or not you guessed correctly.**

**Hint: Use the np.random.randint(1,7) function to generate the integers 1-6 randomly**

<div style="background-color: #FFF8C6"> 

**Exercise: generate a 2D array of random integers between 1 - 3 using the command:**

```python
np.random.randint(1,4,size=[n,m])
```
**Here n and m are the dimensions of the array. Write a code block that will loop over this 2D array and count the number of distinct integers it contains. Separately, produce a histogram of the distribution of the numbers in the 2D array. What are some checks that can be included to ensure every value has been counted (are the number of counts in your histogram equal to the number of elements in your array)?**

<div style="background-color: #FFF8C6"> 
    
**Exercise: the Rydberg formula, which gives the wavelength of an atom's spectral lines is given as:**

$$\frac{1}{\lambda}=R_{D}\left(\frac{1}{n_{1}^2}-\frac{1}{n_{2}^2}\right)$$

**Where $\lambda$ is the wavelength, $R_{D}=1.097\times10^{7}\rm\,m^{-1}$ is the Rydberg constant and $n_{1} < n_{2}$ are the integer energy levels that exist within an atom. Using the formula, calculate the wavelength of the first 5 transitions to the ground state of the atom (i.e. $n_2 = 2 \rightarrow n_{1}=1$, $n_2 = 3 \rightarrow n_{1}=1$ etc, known as the Lyman series) and the first 4 transitions to the 1st excited state $n_{1}=2$ (i.e. $n_2 = 3 \rightarrow n_{1}=2$, $n_2 = 4 \rightarrow n_{1}=2$ etc, known as the Balmer series). For the ground state, what happens as $n_{2}\rightarrow\infty$?**

## 8. While loops<a id="whileloops"></a>

The final construct that modifies the flow of code execution is the <span style="color:blue">while</span> loop. This will execute code as long as the prescribed condition returns <span style="color:red">TRUE</span>. The <span style="color:blue">while</span> loop follows the same syntax as the <span style="color:blue">if</span> statement:

```python
while [STATEMENT RETURNS TRUE]:
    Execute code
```

Two simple examples of this are below:

In [None]:
# Declare two variables
x=1
y=10
# Start the while loop: compare the values of x and y
while x < y:
    print(x)  # while x is smaller than y, print its value
    x += 1  # increment the value of x by 1

In [None]:
x=6
while x >= 0:
    print(x)
    x-=1

Run the code blocks above for different values of x and y. Note that care must be taken when using a <span style="color:blue">while</span> loop. There exists the possibility for your code to become locked in an infinite loop if it is given a condition that can never become false. In the first example above, remove the x+=1 statement and rerun the code. You will need to stop running the cell, using the stop button, or by restarting the kernel.
Unlike the <span style="color:blue">if</span> statement, the <span style="color:blue">while</span> loop has no <span style="color:blue">elif</span> branch; it simply repeats its code block until the condition becomes <span style="color:red">FALSE</span>. 
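
One possible way to guard against a locked loop is to add a safety counter to the condition. The sketch below is only an illustration: the names `iterations` and `max_iterations`, and the limit of 100, are assumptions rather than part of the examples above.

```python
x = 1
y = 10
iterations = 0
max_iterations = 100  # hypothetical safety limit, chosen for illustration

# x is never changed inside the loop, so 'x < y' on its own would stay true forever.
# The second condition stops the loop once the safety limit is reached.
while x < y and iterations < max_iterations:
    iterations += 1

print('Loop stopped after', iterations, 'iterations')
```

Here the loop ends only because of the counter, which is a hint that the original condition needs to be looked at.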

Notice in the above example the command <span style="color:blue">x+=1</span>. In Python this is a shorthand representation of <span style="color:blue">x = x + 1</span>. The most common shorthand commands of this kind are:

 - x+=dx is equivalent to x=x+dx
 - x-=dx is equivalent to x=x-dx
 - x\*=dx is equivalent to x=x\*dx
 - x/=dx is equivalent to x=x/dx
 
This type of statement may seem confusing to first-time programmers; how can x be equal to itself plus an additional amount? In a computer program, the computer first evaluates the section of code on the right-hand side of the <span style="color:blue">=</span> operator and then assigns it to the variable on the left-hand side. The fact that the same variable appears on both sides does not affect how it works.
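
The shorthand commands can be tried out with a short sketch like the one below. The starting values and the names `after_add`, `after_sub`, `after_mul` and `after_div` are arbitrary choices for illustration.

```python
x = 10
dx = 4

x += dx        # same as x = x + dx
after_add = x
x -= dx        # same as x = x - dx
after_sub = x
x *= dx        # same as x = x * dx
after_mul = x
x /= dx        # same as x = x / dx; division always returns a float
after_div = x

print(after_add, after_sub, after_mul, after_div)
```

Note that the division changes x from an integer into a float, even though the result is a whole number.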

With these tools at our disposal, we can use them to create more complicated functions.

<div style="background-color: #00FF00">
    
**Exercise 7: create a function that calculates the factorial of a positive integer. The factorial function is defined as:** 

 $$n! = n\times(n-1)! $$
 
**with the end conditions $1!=1$ and $0!=1$.**

**Hint: Use a while loop with the condition that the integer n remains greater than 0.**

<div style="background-color: #FFF8C6">
    
**Example: create a function that generates the first n Fibonacci numbers. The Fibonacci sequence is defined as:**

$$f(n)=f(n-1)+f(n-2)$$

**with the initial conditions $f(0)=0$ and $f(1)=1$.**

**Hint: You will need if statements to catch special cases. What are they?**

Think carefully about the input and how to handle any exceptions that may arise; for example, what if a negative number is used? For a greater challenge, try and calculate the Fibonacci sequence via recursion, which means that a function is called from within the function itself.


### Break and Continue Statements

Sometimes when in a loop you may want to exit it prematurely. This is handled by using the <span style="color:blue">break</span> statement. This exits the loop at that point and does not execute any further code that was indented with it. The cell below shows an example.

In [10]:
import numpy as np

numbers = np.arange(1, 101)  # create an array holding the integers 1 to 100

cumulative = 0  # This will be the cumulative value of the summed array
for i in range(len(numbers)):  # start the for loop over every index of the array
    cumulative += numbers[i]  # calculate the cumulative value so far
    if cumulative > 100:
        break  # stop the for loop when the cumulative value exceeds 100

print(i)  # print the index at which the loop stopped

As you can see when you run the code cell above, the threshold is reached very soon. This means the for loop does not unnecessarily run over the entire array, using excessive computing power. Note that the above example could have also been solved by using a while loop. This highlights that there are multiple ways to achieve the same result when coding up problems, and the implementation depends on the problem at hand.

The <span style="color:blue">continue</span> statement serves a similar function to the break statement, in that when it is called, whatever code is below it in the loop is not executed. One major difference is that instead of breaking from the loop entirely, it returns to the top of the loop and starts the next iteration.
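
A minimal sketch of the continue statement is given below. The list `measurements` is hypothetical data invented for this illustration, in which negative values are treated as invalid and skipped.

```python
measurements = [2.5, -1.0, 3.0, -4.2, 1.5]  # hypothetical data; negative values are treated as invalid

total = 0.0
for value in measurements:
    if value < 0:
        continue  # skip the rest of this iteration and move on to the next value
    total += value  # only reached for values that are not negative

print(total)
```

The loop still visits every element, but the summation line is skipped whenever the continue statement is triggered.
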
<div style="background-color: #00FF00">
    
**Exercise 8: write a code block that uses a continue statement to only print the odd numbers from 1-30.**

<div style="background-color: #FFF8C6"> 
    
## 10. Further examples<a id="examples"></a>

Now that we are familiar with the basics of workflow in codes, the time has come to combine that knowledge towards building complete all-inclusive programs that accomplish a set task. These will incorporate aspects of all 3 sessions and will test your skills. 

**Exercise: imagine a circle of radius $r=1$ m enclosed within a square of side length 2 m such that the circle tangentially touches the square. The area of the circle is $\pi r^2$ whereas the square has area $4r^2$. The ratio of these areas is therefore $\pi/4$, which can be used to estimate $\pi$. For $N_{\rm tot}$ points randomly and uniformly distributed within the square, $N_{\rm inner}$ will fall within the circle, with the ratio of those quantities giving an approximation of the ratio of the areas and therefore of $\pi$:**
 
 $$\frac{N_{\rm inner}}{N_{\rm tot}}\approx\frac{\pi}{4}$$
 
**Write a program that will approximate $\pi$ via this method.  How does the accuracy of your answer change as you increase the number of points? Represent this convergence graphically (i.e. plot your value of $\pi$ as a function of $N_{\rm tot}$).**


# Advanced Error Handling

When using functions from the numpy, scipy and matplotlib libraries, if you use them improperly by passing the wrong arguments or the wrong data type, they will stop and raise an error, letting you know that something has gone wrong. The functions we have written so far do not have such a feature, and therefore may be used improperly without letting you know. The process of writing your function so that it can handle these types of errors is known as error handling or exception catching. There are 3 classes of errors that occur in Python:

- Syntax Error
- Logic Error
- Exceptions

Syntax errors are the most basic errors and are the easiest to detect. These occur when Python cannot interpret a line of code, and are most likely down to human error. For example, run the following cell:

In [2]:
whille x >2:

SyntaxError: invalid syntax (<ipython-input-2-dcf3d76a4e10>, line 1)

We have misspelled while, causing Python to stop with an error. Logic errors occur when the program returns an erroneous result owing to something unexpected happening in the code, which does not by itself create any explicit errors. These are the hardest to catch and will be covered later. For now we will focus on Exceptions, which occur when Python encounters an error or an unusual condition.
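
To give a flavour of a logic error before moving on, the sketch below calculates a mean incorrectly. The list `values` and the names `wrong_mean` and `right_mean` are made up for this illustration.

```python
values = [2.0, 4.0, 6.0]  # hypothetical data

# Logic error: the sum is divided by a hard-coded 2 instead of the number of values.
# Python runs this line without complaint, but the answer is wrong.
wrong_mean = sum(values) / 2

# Corrected version: divide by the actual number of values
right_mean = sum(values) / len(values)

print(wrong_mean, right_mean)
```

No error message is produced in either case, which is why logic errors can only be found by checking results against values you expect.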

### Exception Handling

 As an example, run this square root function and the cells below.

In [2]:
def square_root(x):
    return x ** 0.5

In [3]:
square_root(4)

2.0

In [4]:
square_root(9.0)

3.0

In [5]:
square_root('hello')

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'float'

When the function was passed a string, it crashed and gave a TypeError, which is related to the type of data that it received. In this case the ** operator works on floats and ints, but not on strings. We can rewrite our code to handle these sorts of errors by using the try-except framework. This works by first trying to execute the code block, and if it fails it will default to the except clause. To try this out, we can rewrite our code as:

In [1]:
def square_root_error(x):
    try:
        return x ** 0.5
    except:
        print("The value given must be an int or float")

In [2]:
square_root_error(4)

2.0

In [8]:
square_root_error(9.0)

3.0

In [9]:
square_root_error("Hello")

The value given must be an int or float


Notice how instead of raising the TypeError exception, the function printed the message from the except clause. Having these except clauses in our functions will allow us to catch these errors and then provide an error message that is informative.
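
An except clause without a named exception catches every kind of error, which can hide problems you did not anticipate. A possible refinement, sketched below, is to name the exception type you expect and to report the original error message. The function name `square_root_checked` and the variables `ok` and `bad` are hypothetical.

```python
def square_root_checked(x):  # hypothetical name, used for illustration only
    try:
        return x ** 0.5
    except TypeError as err:
        # Only TypeError is caught; any other exception would still be raised as normal
        print("The value given must be an int or float:", err)
        return None

ok = square_root_checked(16)        # a valid input returns the square root
bad = square_root_checked("Hello")  # an invalid input prints the message and returns None

print(ok, bad)
```

Returning `None` explicitly makes it clear to the caller that no valid result was produced.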

<div style="background-color: #00FF00">

**Exercise - Modify the gravitational force function in exercise X with the try-except statements to ensure that the function will only accept ints and floats as its inputs.**

While the try-except clauses are useful for general error handling, there are times when you want to catch a specific condition that will not necessarily raise an exception. For example, say that we want our square root function to only return a value if it is an integer (i.e. the number is a perfect square). We can do this by using the raise command to tell Python to stop execution and raise an exception.

In [10]:
def square_root_error_int(x):
    try:
        if (x ** 0.5) % 1 != 0.0:
            raise ValueError("Number provided must be a perfect square")
        return x ** 0.5
    except TypeError:
        # Catch only TypeError, so the ValueError raised above is not replaced
        raise TypeError("The value given must be an int or float")

In [11]:
square_root_error_int(4.0)

2.0

In [12]:
square_root_error_int(5.0)

ValueError: Number provided must be a perfect square

<div style="background-color: #00FF00">

**Exercise - Further modify your gravitational force function to raise an exception if the masses are at the same position (this will stop your function from calculating infinity).**

# Session 4

## Index 
2. Algorithms
3. Roots of polynomials
4. Numerical equation solving
5. Optimisation
6. Integration and differentiation
7. Optional: non-analytic integration and differentiation

## 2. Algorithms<a id="critical"></a>

### Note: The exercises in section 2 are merely meant to make you think about how to tackle problems. There are easier in-built Python solutions that will accomplish the same task.

Before we move on to the numerical analysis tools already available in Python, we will first attempt to code up some common functionality ourselves. Although this may feel redundant, the thought process and code structure required to solve these problems are invaluable practice.

<div style="background-color: #00FF00">
    
**Exercise: One common matrix operation is calculating the transpose, which swaps the rows and columns of the matrix. Take the following matrix:**

\begin{equation}
\begin{bmatrix}
    1 & 2 & 3 \\
    4 & 5 & 6 \\
    7 & 8 & 9
\end{bmatrix}
\end{equation}

**And calculate its transpose.**

<div style="background-color: #00FF00">
    
**Exercise: One common function of a Python list is the sort command, which takes an unordered sequence of numbers and returns an ordered list like:** 

$$ \mathrm{list1}=[2,4,3,6,8,7,9,5] \rightarrow \mathrm{list2}=[2,3,4,5,6,7,8,9] $$

**Take list1 and write an algorithm that will return the ordered list2. This is a non-trivial task and there are multiple ways that the problem can be attempted.**

**Hint: Ordering the list will involve comparing values, so a double for loop will be required. To start with, write a single loop that will go through the list and identify which elements need to be swapped. Remember, you can assign multiple variables in a single line of python with code like:**

$$b,a=a,b$$

**This will swap the values of b and a.**
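The swap mentioned in the hint also works for list elements. The short sketch below uses a made-up three-element list purely to illustrate the syntax; it is not a solution to the exercise.

```python
values = [2, 4, 3]

# Swap the elements at indices 1 and 2 in a single line using tuple assignment
values[1], values[2] = values[2], values[1]

print(values)
```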

## 3. Roots of polynomials<a id="polynomials"></a>
Previously, we worked with polynomial functions to fit data. In numerical modelling, we can encounter problems where we need to find the roots of polynomials. To this end, NumPy includes the function <span style="color:blue">roots()</span> which finds solutions to polynomials very quickly. The only calling argument for <span style="color:blue">roots()</span> is an array containing the coefficient in front of each polynomial term; e.g. array $P$ where the array elements represent the following:

$$ f(x) = P[3] + P[2]x + P[1]x^2 + P[0]x^3$$

The function uses the length of the array to determine the order of the polynomial. The output is an array with the solutions.

Below is an example to solve $x^2-2=0$:

In [3]:
import numpy as np
import matplotlib.pyplot as plt # For later use

In [None]:
a=[1,0,-2]
result = np.roots(a)
print(result)
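
To make the coefficient ordering concrete, here is a minimal sketch for a cubic whose roots are known in advance. The polynomial $x^3-6x^2+11x-6=(x-1)(x-2)(x-3)$ is chosen only for illustration, and the variable names are assumptions of this example.

```python
import numpy as np

# Coefficients of x^3 - 6x^2 + 11x - 6, highest power first
coefficients = [1, -6, 11, -6]
roots = np.roots(coefficients)

# Evaluate the polynomial at each root; the values should be close to zero
residuals = np.polyval(coefficients, roots)

print("Roots:", np.sort(roots))
print("Polynomial evaluated at the roots:", residuals)
```

The residuals are not exactly zero because the roots are found numerically, but they should be very small compared to the coefficients.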

<div style="background-color: #00FF00">
    
**Exercise 1: Find the roots of $2x^5+3x^2=1$. Check that the roots are valid solutions (i.e. the polynomial $2x^5+3x^2-1$ is zero at these points). Can you plot the polynomial and include an <span style="color:blue">if</span> statement that checks whether each solution is real, and if so overplot that solution on your graph? For this, the NumPy function <span style="color:blue">isreal()</span> is useful to determine if a number is real or complex.**

**Hint: the NumPy function <span style="color:blue">polyval()</span> calculates the y-values of a polynomial with given coefficients and x-values - this can be helpful to check if the solutions are valid**.



<div style="background-color: #FFF8C6">
Below is a more complicated example for you to attempt.
<p>
    
**Exercise: use a random number generator to generate a 50th-order polynomial with integer coefficients between -5 and 5. Find the roots, print all valid solutions, and indicate them on a plot of the polynomial.**

## 4. Numerical equation solving<a id="solving"></a>
There are some equations which you cannot solve analytically. These include so-called transcendental equations such as $x=\cos{x}$. Fortunately there are graphical and numerical approaches to solving such equations.
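As a minimal sketch of the numerical approach, the example below solves $x=\cos{x}$ with the SciPy function <span style="color:blue">fsolve()</span>. The equation is first rearranged into the form $f(x)=0$. The function name `cos_equation` and the initial guess of 1 are assumptions made for this illustration.

```python
import numpy as np
from scipy import optimize as op

# Rearrange x = cos(x) into the form f(x) = 0
def cos_equation(x):
    return x - np.cos(x)

initial_guess = 1.0
solution = op.fsolve(cos_equation, initial_guess)  # returns an array with one element

print("Solution:", solution[0])
print("Checking the solution:", cos_equation(solution[0]))
```

Plotting $y=x$ and $y=\cos{x}$ on the same axes is a useful graphical check: the curves should cross at the value that is printed.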

<div style="background-color: #FFF8C6">
    
### Solving functions of multiple variables
The function <span style="color:blue">fsolve()</span> can be used to solve sets of non-linear equations with multiple variables. For example, if you wish to solve the set of equations:

$$x+y=y\cos{y} \\ x^2+y^2=3$$

you can do so with the script below:

In [4]:
import numpy as np
from scipy import optimize as op

In [5]:
# Define the function - note that there are now 2 variables, given in 1 array argument
def two_equations(variables):
    #print(variables)# If you print variables, you will see which values the algorithm tries for x and y
    x,y = variables[0],variables[1]
    output = [x+y-y*np.cos(y),x*x+y*y-3]# The function returns an array: 1 element per equation
    return output

initial_guess=[1,2]

solution = op.fsolve(two_equations,initial_guess)
print("Solution:", solution)
print("Checking the solution:", two_equations(solution))


Solution: [-1.07364009  1.35915303]
Checking the solution: [5.126177260450504e-12, 2.2531310150952777e-11]


<div style="background-color: #FFF8C6">
By printing the output of the function when called with the numerical solution you can see how close the numerical solution actually is to the true solution. If you uncomment the print statement in the function you can see how quickly the algorithm converges.
<p>
It is important to understand if the solution makes sense and if it is the only solution. The first solution to which <span style="color:blue">fsolve()</span> iterates may be mathematically sound but might not be the solution that represents the physical situation you are modelling. Often graphing the solution can help (especially if you are only working in 2 variables). Below is a graph showing the two functions that we have been looking at and you can see that the solution found is only one of two possible solutions. 

![simultaneous_equations](https://cclewley.github.io/ComputingYr1/Images4/simultaneous_equations.png)
<p>
    
**Exercise: can you alter the example code above to find the other solution?**

## 5. Optimisation<a id="optimisation"></a>

You will often encounter complex systems that cannot be differentiated analytically. For these you need numerical techniques. In practice, numerical optimisation almost always means numerical minimisation. If your problem requires you to maximise a function then, in general, you redefine the problem so that you end up minimising something else.

Numerical optimisation is an interesting and complicated area. SciPy has an optimize module that contains a plethora of different methods. However, until you become an expert, we suggest that you only use some of the main functions. A basic minimisation function is <span style="color:blue">fmin()</span>. It does not use derivatives, so it can be used in cases where you don't know the analytical form of the function or its derivative. The numerical optimisation starts at an initial guess and then iterates towards a minimum. A simple 1-dimensional example illustrates this. The code below minimises the function $f(x) = x^2 + x$. Run the code cell - does it give the answer you would expect?

In [None]:
# Define the function to be minimised
def func1(x):
    return x**2 + x

# Now minimise with a starting guess of 1
initial_guess = 1
x = op.fmin(func1, initial_guess)

print(x)

You may have noticed that <span style="color:blue">fmin()</span> works along similar lines to <span style="color:blue">fsolve()</span>.
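
Because <span style="color:blue">fmin()</span> only minimises, a maximum is found by minimising the negative of the function. The sketch below illustrates this for a simple made-up function with a known maximum at $x=2$; the names `hill` and `negative_hill` are assumptions of this example.

```python
from scipy import optimize as op

# Function with a maximum value of 3 at x = 2
def hill(x):
    return -(x - 2)**2 + 3

# fmin() only minimises, so minimise the negative of the function instead
def negative_hill(x):
    return -hill(x)

x_max = op.fmin(negative_hill, 0.5, disp=False)  # disp=False suppresses the convergence report

print("Maximum found at x =", x_max[0])
print("Value at the maximum:", hill(x_max[0]))
```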
<div style="background-color: #00FF00">
    
**Exercise 5: modify <span style="color:blue">func1</span> in the code above so that it also prints out the value x that it is called with and run the optimization again.** 

The exercise above should give you insight into how <span style="color:blue">fmin()</span> finds the minimum. For further details of how <span style="color:blue">fmin()</span> works use <span style="color:blue">help(op.fmin)</span>.

<div style="background-color: #00FF00">
    
**Exercise 6: now write a program to minimise the function $x^4-5x^3-20x^2+50x$ with an initial guess of -4. Try the same process with an initial guess of 3. Plot the function and mark on it the initial guess and the minimum found in each case. Explain what has happened. What conclusions can you draw about how to use <span style="color:blue">fmin()</span>?**

<div style="background-color: #FFF8C6">
You can also minimise functions of multiple parameters by passing an array to <span style="color:blue">fmin()</span>, just like we did for <span style="color:blue">fsolve()</span>.  

**Exercise: find the *maximum* of the 2-D function: $f(x,y) = \sin(xy+y)\,e^{-(x^2+y^2)}$**

<div style="background-color: #FFF8C6">
So far we have used <span style="color:blue">fmin()</span> only to minimise (or optionally, maximise) given functions. Let's now apply it to a hypothetical problem.

**Exercise: a cyclist is biking along a path when she witnesses a child falling into the river. At this moment, the cyclist is at point $A$ and the child at point $B$, as shown in the picture below. The cyclist's maximum speed is 30 km/h on the path, but only 15 km/h on the grass separating the path from the river. At which point should the cyclist cross the grass to reach the child in the shortest time? How long does it take the cyclist to get to the child?**

![cyclist](https://cclewley.github.io/ComputingYr1/Images4/cyclist.jpg)

<div style="background-color: cyan"> 

**Discuss your work on optimisation with a demonstrator.**

## 6. Integration and Differentiation<a id="integration"></a>
You may be wondering how differentiation and integration work in Python. One of the most important tools in a physicist's toolkit is the ability to integrate and differentiate data, whether with analytic functions or with data that cannot be described by an equation. The SciPy package has a wealth of ways with which to accomplish this, but we will only focus on a select few to introduce you to important concepts.

### Numerical integration of functions

There are many integrals that turn out to be very difficult and tedious to evaluate analytically and many that are simply impossible. For these SciPy has a range of numerical integrators; the default is a function called <span style="color:blue">quad()</span>, which is part of the <span style="color:blue">scipy.integrate</span> set of functions (if you want to know how it works, do a web search for QUADPACK, on which it is based). The first calling argument is the function to be integrated and the second and third arguments are the limits of integration. What is returned is a tuple which has the value of the integral as its first element and an estimate of the absolute error as the second. The example below calculates the integral $ \int_0^{\frac{\pi}{2}}{\sin{(x)}}dx $:

In [None]:
import scipy.integrate as spi

results=spi.quad(np.sin,0.,0.5*np.pi)# Integrate sin(x) from 0 to 0.5 pi
print(results)

You can of course supply your own function to integrate. If the function you want to integrate takes more than one argument, it is integrated with respect to its first argument. You should use the keyword argument <span style="color:blue">args</span> to specify the values of the other arguments to the integrand. For example, the code below evaluates the following integral for $n = 2$ and $c = 3$:

$$ \int_0^1({-nx^2+c})dx $$

In [None]:
# Define the function to be integrated
def my_func(x, n, c):
    # print(x, n, c)
    return -n*x*x + c

result = spi.quad(my_func, 0, 1, args=(2, 3))  # Integrate my_func from 0 to 1 with n = 2 and c = 3
print(result)

Note: if you add a print statement inside the function being integrated you can see the values with which it is being called. If you include the variable over which you are integrating you can begin to see how the algorithm actually works.
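
It is good practice to compare a numerical result with an analytic one whenever the latter is available. The sketch below repeats the integral above and compares it with the analytic value $-n/3+c$; the variable names `value`, `error_estimate` and `analytic` are assumptions of this example.

```python
import scipy.integrate as spi

def my_func(x, n, c):
    return -n * x * x + c

n, c = 2, 3
value, error_estimate = spi.quad(my_func, 0, 1, args=(n, c))

# Analytic result: the antiderivative is -n x^3 / 3 + c x, evaluated between 0 and 1
analytic = -n / 3 + c

print("Numerical:", value, "+/-", error_estimate)
print("Analytic: ", analytic)
```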
<p>
<div style="background-color: #00FF00">

**Exercise 7: use <span style="color:blue">quad()</span> to integrate the following functions:**

**1. $\frac{1}{x}$, with different sets of integration limits [1,10], [0,1]**

**2. $\frac{1}{x^2}$, [1,$\infty$] (Hint: read the help for <span style="color:blue">quad()</span> to find out how to deal with the limit of $\infty$; remember that NumPy provides a constant representing infinity)**

**3. $\frac{1}{x^2}$, [-1,1]**

**4. $\tan(x)$, with different limits: [0,1], [1, $\pi \over 2$], [0,$\pi$]**

**Compare your results with what you would expect. Do you encounter any problems with any of these integrals? What are your conclusions about how and when to use <span style="color:blue">quad()</span>?**

<div style="background-color: #FFF8C6">

To perform double integrals, we need to use the <span style="color:blue">dblquad()</span> function in the <span style="color:blue">scipy.integrate</span> module.
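
The argument order of <span style="color:blue">dblquad()</span> is easy to get wrong, so here is a minimal sketch for a simple integrand with a known answer, $\int_0^2 \int_0^1 x y^2 \, dy \, dx = 2/3$. The integrand and the name `integrand` are chosen only for illustration. Note that the integrand takes the inner variable as its first argument, and the inner limits are supplied as functions of the outer variable.

```python
import scipy.integrate as spi

# Integrand f(x, y) = x * y^2; dblquad() expects the inner variable (y) as the first argument
def integrand(y, x):
    return x * y**2

# Outer integral over x from 0 to 2; inner integral over y from 0 to 1
value, error_estimate = spi.dblquad(integrand, 0, 2, lambda x: 0, lambda x: 1)

print(value)  # the analytic result is 2/3
```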

**Exercise: use <span style="color:blue">dblquad()</span> to perform the following double integral:**

$$\int_0^1 \int_0^1 \cos(x^2+y^3)dxdy$$

**Hint: you may want to look up how to use lambda functions in Python.**

### Numerical differentiation: the finite difference method

In contrast to integration, SciPy does not provide general functionality to compute derivatives. The simplest way to compute derivatives numerically is to employ a finite difference method. You may recall that the formal definition of the derivative of a function $f(x)$ is:

$$f^\prime(x) = \lim_{h\to 0} \frac{f(x+h)-f(x)}{h}$$

The forward finite difference method approximates the derivative at $x$ by using the formula above with a suitably small value of $h$. The code below is an example of how to use this method to numerically differentiate $f(x) = x^3$.


In [None]:
# Function for the forward finite difference method which returns the derivative of an array
# Input parameters: fx, the array of function values; h, the step size
def forward_finite(fx, h):
    Delta_fx = fx[1:] - fx[0:-1]
    return Delta_fx/h  # Note that the returned array is 1 element shorter than the original array

h = 0.1 # Define the step size
x = np.arange(0,10,h) # create an array of x-values between x = 0 and x = 10

# Calculate the function values (f(x) = x^3)
y = x**3

# Calculate the numerical derivative
dy = forward_finite(y,h)

# Plot the result (excluding the final element in x)
plt.plot(x[:-1],dy,'.')
plt.xlabel('x')
plt.ylabel('df/dx')
plt.show()
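
To see how the step size enters, it helps to look at a case that can be worked out by hand. For $f(x)=x^2$ the forward difference is $\frac{(x+h)^2-x^2}{h} = 2x+h$, so the estimate differs from the true derivative $2x$ by exactly $h$. The sketch below checks this at $x=1$; the helper `forward_difference` is a hypothetical function written for this illustration and acts on a function rather than on an array.

```python
def forward_difference(f, x, h):
    """Hypothetical helper: forward finite difference estimate of f'(x)."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x**2

# For f(x) = x^2 the forward difference is 2x + h, so the error equals h (up to rounding)
for h in [0.1, 0.01, 0.001]:
    estimate = forward_difference(square, 1.0, h)
    print("h =", h, " estimate =", estimate, " error =", estimate - 2.0)
```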

<div style="background-color: #00FF00">
    
**Exercise 8: run the code cell above and extend the code to calculate the analytic derivative of $f(x) = x^3$ and plot it in the same plot as the numerical derivative. Next, calculate the error in the numerical derivative (i.e. the difference between the analytic and numerical derivative) and plot it in a separate plot. What happens to the error when you change the value of the step size $h$?**

<div style="background-color: #FFF8C6">
    
Let's inspect how well our method works for a couple of other functions.
<p>
    
**Exercise: use the forward finite difference method to calculate the derivative of:**

**1. $f(x) = \sin(x)$**

**2. $f(x) = 1/x$**

**on the interval $[0,10]$ with $h=0.01$.**

**For both of these functions, create a plot of your numerical derivative and the analytic derivative, as well as a plot of the error in your numerical derivative. What happens to the error when you reduce $h$?**

<div style="background-color: #FFF8C6"> 
A somewhat more sophisticated version of the finite difference method is the central finite difference method. This is calculated using the function value on both sides of $x$:

$$ f^\prime(x) = \lim_{h\to 0}\frac{f(x+\frac{1}{2}h)-f(x-\frac{1}{2}h)}{h}$$

**Exercise: write a new function that calculates the numerical derivative according to the central finite difference method. Using this method, calculate the numerical derivative of the same three functions as above ($x^3$, $\sin(x)$ and $1/x$) and compare its output with the forward finite difference method.**

<div style="background-color: cyan"> 
    
**Discuss your work on integration and differentiation with a demonstrator.**

<div style="background-color: #FFF8C6">
    
## 7. Optional: non-analytic integration and differentiation<a id="non_analytic"></a>

Data you will work with as a physicist may not be well described by an analytic form. This may be because noise makes the data very hard to fit, or because the data simply cannot be represented by an equation. In these circumstances we need to return to the fundamentals of what an integral or derivative actually means if we want to calculate one.

### Integration of non-analytic data

When you integrate a function, you are calculating the area under some curve that the function defines. Calculating areas leads to some basic methods for non-analytic integration. As an example, we have collected data points that describe the velocity of a car driving on the motorway. These are shown in the graph below:

![diff_example](https://cclewley.github.io/ComputingYr1/Images4/Car_velocity.png)


We want to calculate the total distance that the car travelled. Looking at the data, we can see that it is not well described by an analytic function. We will therefore need to tackle this challenge using some numerical methods.

#### Rectangle Rule

One of the simplest approximations we can make with our data set is that the velocity is constant at the observed values and changes instantaneously at some point between the measurements. Intuitively this is a non-physical scenario, but the underlying maths provides a useful tool. Graphically, this approximation would look like:

![rectangle_rule](https://cclewley.github.io/ComputingYr1/Images4/Rectangle_integration.png)

Notice that we have centred our observations so that they sit in the middle of our boxes. The integral of the dataset therefore becomes

$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} f(c_i)\,\Delta x_i = \Delta x \times \sum_{i=1}^{n} f(c_i)$$

where $c_i$ are the $n$ observation points, under the assumption of constant spacing $\Delta x$ between them. The integral is approximated by the sum of the areas of these rectangles, which is why this method is called the rectangle rule. With that in mind, let's integrate our data set.
<p>
 
**Exercise: load in the velocity data from the file [car_velocity.txt](https://cclewley.github.io/ComputingYr1/Data4/car_velocity.txt) (which is in this session's 'Data' folder). Plot the data and reproduce the figure shown above. Calculate the interval at which the measurements were taken and hence integrate the data.**

**Note: The rectangle rule assumes constant spacing around each point, but we have points defined at t=0 and t=end. How do you handle the boundaries?**


<div style="background-color: #FFF8C6">
    
#### Trapezoidal (or Trapezium) Rule 

Above we made the approximation that our data was constant in between observation points. This is obviously a crude approximation and as physicists we can do better! The next level of approximation we can make is a linear variation of the data in between observations. Graphically this looks like:

![trap_int](https://cclewley.github.io/ComputingYr1/Images4/Trap_integration.png)

Looking at the graph, this seems to be a much more accurate representation of our data. We are again calculating the area underneath the graph, now with the integral over the $n$ observation points $c_i$ approximated by:

$$\int_a^b f(x)\,dx \approx \frac{1}{2}\sum_{i=1}^{n-1} (c_{i+1}-c_i)\times(f(c_{i+1})+f(c_{i})) $$
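
The sum above translates almost directly into array slicing. The sketch below is one possible implementation, tested on made-up data for the straight line $y=2x$, for which the trapezoidal rule is exact. The helper name `trapezoid_integral` and the sample points are assumptions of this example, not part of the car data set.

```python
import numpy as np

def trapezoid_integral(x, y):
    """Hypothetical helper: integrate sampled data y(x) with the trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    widths = x[1:] - x[:-1]    # c_{i+1} - c_i
    heights = y[1:] + y[:-1]   # f(c_{i+1}) + f(c_i)
    return 0.5 * np.sum(widths * heights)

# Made-up test data: y = 2x sampled at unevenly spaced points between 0 and 4
x_samples = np.array([0.0, 0.5, 1.5, 3.0, 4.0])
y_samples = 2 * x_samples

area = trapezoid_integral(x_samples, y_samples)
print(area)  # the exact area under y = 2x from 0 to 4 is 16
```

Because the widths are computed for each interval, this version does not require evenly spaced data.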

**Exercise: using the data from [car_velocity.txt](https://cclewley.github.io/ComputingYr1/Data4/car_velocity.txt) (which is in this session's 'Data' folder), apply the trapezoidal rule to integrate the data and calculate the total distance travelled by the car.**

Calculating the above integrals, we are left with just a single number: the total area under the curve (in our case the total distance travelled). Sometimes, however, we want to know the running total of the integral as more points are integrated.

**Exercise: use the trapezoidal rule to calculate the distance travelled as a function of time. At what time has the car travelled half the total distance?**

<div style="background-color: #FFF8C6">
    
### Differentiation of non-analytic data

In the above data set we recorded the speed of a car as it travelled along the motorway and calculated the distance it had travelled. This time we do the opposite: we have taken measurements of the distance travelled by a car and want to calculate the speed it was travelling. The data looks like:

![distance2](https://cclewley.github.io/ComputingYr1/Images4/Car_distance2.png)

We know that:

$$v=\frac{dx}{dt}$$

for an infinitesimal change in position and time. However, we have discrete data, so we approximate this exact derivative as:

$$v=\frac{dx}{dt}\approx\frac{\Delta x}{\Delta t} = \frac{x_{i+1}-x_{i}}{t_{i+1}-t_{i}}$$

for our data points $x_i$ and $t_i$. Note that when we represent the derivative this way, we are calculating a value that is defined in the centre of the time interval.
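
As a minimal sketch of this idea, the example below differentiates made-up position data for an object moving at a constant 3 m/s, using the NumPy function `diff()` to form the differences between neighbouring points. The arrays `t`, `x`, `v` and `t_mid` are assumptions of this example and are unrelated to the car data set.

```python
import numpy as np

# Made-up data: positions (m) recorded at the times (s) below for a constant speed of 3 m/s
t = np.array([0.0, 1.0, 2.0, 4.0])
x = 3.0 * t

# Finite difference between neighbouring points: (x_{i+1} - x_i) / (t_{i+1} - t_i)
v = np.diff(x) / np.diff(t)

# Each velocity value is defined at the midpoint of its time interval
t_mid = 0.5 * (t[1:] + t[:-1])

print(t_mid)
print(v)
```

Both output arrays are one element shorter than the input arrays, since each value belongs to an interval rather than to a single measurement.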

**Exercise: load in the dataset [car_distance.txt](https://cclewley.github.io/ComputingYr1/Data4/car_distance.txt) (which is in this session's 'Data' folder). Plot the data, remembering to label the axes, and then differentiate this data. Plot the resulting velocity, remembering to create a new time array that is defined on the half intervals.**

You may have noticed that if your data is spaced at regular intervals, $t_{i+1}-t_{i} = h$, the above method works in the same way as the finite difference method we used to differentiate analytic functions! The only difference is that here we are using data points instead of function values, and for our analytic function we can reduce the numerical error by making $h$ arbitrarily small, whereas here we are required to use the interval at which our measurements are taken.