# Session 1

## Index 

1. Intended learning outcomes
2. Using Jupyter Notebooks
3. Literals, operators, and data types
4. Variables and keywords
5. NumPy, SciPy and arrays
6. Basic statistics with arrays
7. Reading and writing files
8. Matplotlib: plotting data
9. Optional: Log plots
10. Coding practices


## 1. Intended learning outcomes<a id="outcomes"></a>
After this session, you should be able to:
- create variables and carry out arithmetic operations on them;
- import the NumPy and Matplotlib packages;
- create and manipulate arrays, and do basic statistics on them;
- read in a data file and save your data to an output file;
- create line plots and scatter plots;
- amend the plot format to create a figure of a publishable standard;
- comment your code.

## 2. Using Jupyter Notebooks<a id="notebooks"></a>
Throughout your four computing sessions we will be using Jupyter Notebooks as interactive lab scripts. These notebooks include both text and code. Once you have saved a copy on your own hard drive, you can type code in the code cells (the cells preceded by "In [ ]:") and execute it either by pressing shift+enter or by pressing the 'run cell' button (play symbol) in the toolbar above. You can add more code cells by pressing the 'insert cell below' button (plus symbol). By default any new cells you add are code cells; however, you can change these to Markdown in the drop-down list in the toolbar to allow you to make your own notes in the lab scripts. For a cheat sheet on Markdown, see [here](http://assemble.io/docs/Cheatsheet-Markdown.html).

Make sure you add new code cells every time you want to try out something new, instead of editing your previous code. This way you have a record of everything you have done, which both you and your demonstrator can refer to. Save your work regularly - sometimes you may be forced to close and reopen your file, and you don't want to lose any of your work!

A couple of notes on usage: when you run a cell, the notebook will automatically scroll to the next cell. Note however that you must first click the next cell to explicitly highlight it before you start typing (otherwise unexpected things will happen!). 

### If you double-click on a Markdown cell it will change into edit mode. Press shift+enter to run the cell and turn it back into markdown.

**Exercise 1: Try double-clicking this markdown block. To revert to the normal view you can run the cell in the same way as you would run a code cell by pressing shift+enter.**

The lab scripts use various colour-coded cells:

<div style="background-color: #00FF00">
This is a core exercise cell. These are the most important exercises you will encounter in these notebooks and  cover all of the intended learning outcomes. You should complete these first, before moving on to the non-core exercises. 

<div style="background-color: #FFF8C6">

This is a non-core cell, which includes additional information or extra exercises. The extra exercises are relevant to the intended learning outcomes and are meant to enhance your understanding; however you should only attempt these if you are clearly ahead in time, or have extra time to spend on this at home. Your first priority should be to complete the entire core lab script for each session before the next session begins.

<div style="background-color: cyan"> These cells indicate locations where it is good to pause, review your work and discuss your findings. Do you understand what you have done? Do you have any questions to ask the demonstrators or to discuss with your colleagues?

Every now and then the lab script will remind you to make notes of what you have learnt in your lab book. Your computing lab book should be a running commentary of your work: for example, answers to questions posed in the lab script, new functions you have found, salient points you have learnt, etc. Later on it should also include block diagrams of the programmes you write, and solutions to common bugs you encounter. You should be making notes in your computing lab book not only when you encounter a reminder, but continuously as you go along.

When you finish working on each worksheet, please make sure to fill out the (anonymous) Mentimeter Poll! The link to the poll is given at the end of each worksheet. Whether or not you finished the work, we want to know how it went - this is vital information for us to ensure we are providing a high-standard course for all of our students. 

## 3. Literals, operators, and data types<a id="operators"></a>

On a very basic level, Python can be used as a calculator. 
<div style="background-color: #00FF00">
    
**Exercise 2: in the cell below, type:**

<span style="color:blue">8 + 2</span>

**Now run the cell by pressing shift+enter or by clicking the 'run cell button' (play symbol) in the toolbar at the top of this notebook.**

The numbers <span style="color:blue">8</span> and <span style="color:blue">2</span> you typed above are called literals. Literals are data inserted directly into your code. Here, we have used integer literals (e.g. 1, 2, -3, ... ), but more frequently we will use float literals (numbers with decimal points or given in scientific notation, even if they represent integers: 1.3, 10.0, 1e10, ...). These numbers are called “floating point” because of the manner in which they are encoded in the computer’s binary memory. 

The <span style="color:blue">+</span> symbol in the first code cell is called an *operator*. Operators operate on the code on either side of them (called the operands), and produce some sort of result (e.g. the integer 10 in your code above). Other examples of arithmetic operators are:

- <span style="color:blue">\- </span>for subtraction
- <span style="color:blue">\*</span> for multiplication
- <span style="color:blue">/</span> for division
- <span style="color:blue">\*\*</span> for exponent
- <span style="color:blue">%</span> for modulus (returns the remainder of a division)
- <span style="color:blue">//</span> for floor division (returns the result of the division rounded down to an integer). 

Some operators treat the values on either side of them the same way, e.g. <span style="color:blue">8 + 2</span> gives the same result as <span style="color:blue">2 + 8</span>. Others however are directional, for example the result of <span style="color:blue">3\*\*2</span> is different from <span style="color:blue">2\*\*3</span>. 
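
As a quick illustration of the operators listed above, the sketch below applies each one to the same pair of integer literals. The numbers 7 and 2 and the variable names are arbitrary choices made for this example only.

```python
# Apply each arithmetic operator to the same two integer literals.
total = 7 + 2          # addition
difference = 7 - 2     # subtraction
product = 7 * 2        # multiplication
quotient = 7 / 2       # division always returns a float
power = 7 ** 2         # exponent
remainder = 7 % 2      # modulus: remainder after division
floored = 7 // 2       # floor division: rounded down to an integer

print(total, difference, product, quotient, power, remainder, floored)
```

Swapping the operands changes the result for every operator here except addition and multiplication.
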
<div style="background-color: #00FF00">
    
**Exercise 3: in the cells below, try out each of the operators given above - use one cell for each arithmetic statement. You can add as many cells as you like by using the 'insert cell below' (plus symbol) button. Make sure you understand how each operator works (if not, do ask one of your demonstrators - they will be happy to explain). What happens if you mix the data types of the operands in your calculation, for example add an integer to a float?**

The order of operations is as usual, i.e. multiplication and division before addition and subtraction. For operations on the same level, Python reads code from left to right, i.e. <span style="color:blue">20/5\*2</span> will give 8.0. To change the order, or make the order explicit (and hence code more readable) use round brackets, for example <span style="color:blue">20/(5\*2)</span> to give 2.0. The shorthand for scientific notation is  <span style="color:blue">1.234e5</span>, which means $1.234\times10^5$.
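
The ordering rules above can be checked directly. This is a minimal sketch using the two expressions from the text plus one extra mixed expression; the variable names are chosen for illustration.

```python
# Operators on the same level are evaluated from left to right.
left_to_right = 20 / 5 * 2      # evaluated as (20 / 5) * 2
bracketed = 20 / (5 * 2)        # brackets force the multiplication first
mixed = 2 + 3 * 4               # multiplication before addition

# Scientific notation always produces a float.
large = 1.234e5

print(left_to_right, bracketed, mixed, large)
```
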

So far we have encountered integer and float data types - another type of literal is a string literal, which is a piece of text that does not constitute any code. You specify a string literal by surrounding it with matched single or double quotation marks, e.g.:

In [None]:
"This is a string"

You can find out the data type of any literal by using the <span style="color:blue">type()</span> command. 

<div style="background-color: #00FF00">
    
**Exercise 4: run the examples in the following code cells:**

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type("1")

In [None]:
type(1+1j)

In [None]:
type([1,2])

The cells above illustrate some of the most common data types in Python. For a bit more information on lists and some other data types, have a look at the optional part at the end of the next section.

It is now almost time to combine what we have learned so far and put it into action. Before we do that however, a note about errors. When you are coding, it is common to make errors, for example a typo or trying to do something that Python can't do. When this happens, Python will throw up an error message. These error messages can be very difficult to understand owing to their technical nature. Here are two hints to help you interpret error messages:
1. Start at the bottom of the error message: it normally gives a one-line summary of the problem. If you can't immediately solve it, read the text above it to find out where in your code the error occurred.
2. If the one-line summary doesn't make sense, copy and paste it into Google. Chances are many people have encountered it before and will have asked a question about it!
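
To see what the one-line summary at the bottom of an error message looks like, the sketch below triggers a deliberate error (a division by zero, chosen for this example) and prints only the error type and its description. It uses the try and except keywords, which are not covered in this session; here they only serve to catch the error so that the cell keeps running.

```python
# Deliberately cause an error and capture its one-line summary.
try:
    result = 10 / 0
except ZeroDivisionError as error:
    # The last line of an error message has the form "ErrorType: description".
    summary = type(error).__name__ + ": " + str(error)

print(summary)
```

The error type at the start of that line is usually the most useful thing to search for.
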

<div style="background-color: #00FF00">
    
**Exercise 5: In order to get used to error messages, run the following cell, and read the error message *before* correcting the code. The cell is supposed to output the number 20!**

In [None]:
10 + 1o

The other main type of mistake is more pernicious: you may write some code that is perfectly valid, but doesn't do what you actually want it to do. The only way of catching this, is by paying very careful attention to the output at all times, and checking it makes sense. We will discuss this further in Session 3 - however it is important to foster the habit of checking your output from the very start of your coding career.

**A final important note**: it is expected that you will cause your system to grind to a halt when you do calculations with extremely large numbers. This also occasionally happens due to coding errors. Don't worry - this is not a problem! If your system becomes unresponsive, restart your kernel by clicking 'Kernel' --> 'Restart' in the menu bar at the top of the page. Make sure to change any offending code cells to markdown so that they won't be executed again, click on the next code cell, and choose 'Cell' --> 'Run all above' to re-run all your code cells so far.

Let's move on! Experiment with operators, brackets, scientific notation, and different data types in the code cell below. From now on, the notebook will only display one code cell when it is time for you to try your coding skills; it is up to you to add as many cells as you need. It is strongly recommended you don't delete any of your code but keep as many examples in different cells as possible so both you and your demonstrator can easily refer back to what you have tried.
<p>
<div style="background-color: #00FF00">

**Exercise 6: coding has many quirks that you will get used to with practice. Try and answer the following questions for yourself whilst experimenting with arithmetic operators:**

- **Do all answers make sense?**
- **Can you use numbers as big as you like? Is there an upper limit to the exponent in scientific notation?**
- **Can you use numbers as small as you like? Is there a lower limit to the exponent?**
- **Can numbers be as precise as you like? How many zeros do you need before 1.00000000000000000001 gets truncated?**
- **What happens if you don’t balance your brackets?**
- **Can you apply arithmetic operators to strings? If so, which, and what is the result?**

**Make sure to note down the answers (along with anything interesting or unexpected you find out) in your lab book!**

<div style="background-color: #FFF8C6">
You may have noticed floats do not have unlimited accuracy. This is an important feature of computer programming, not just a bug in Python. To read more about why this happens, have a look at [this tutorial.](https://docs.python.org/3/tutorial/floatingpoint.html) 
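
A short sketch of this limited accuracy is given below. The numbers are standard illustrative choices, and the comparison with `math.isclose()` is one common way of working around the problem rather than the only one.

```python
import math
import sys

# 0.1, 0.2 and 0.3 cannot be stored exactly in binary, so the sum is slightly off.
total = 0.1 + 0.2
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True: compare floats with a tolerance instead

# The smallest relative step between neighbouring floats near 1.0.
print(sys.float_info.epsilon)

# An increment well below that step is lost entirely.
print(1.0 + 1e-17 == 1.0)         # True
```
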

#### Converting between data types

The examples so far used values of different data types: integers, floats and strings. A value does not have to stay in its original type, however; it can be converted to another. For example, below we convert an integer to a float, and a string consisting of a number to an integer. The results are stored under the names a, b, c and d using the <span style="color:blue">=</span> operator, which is explained in the next section.

In [None]:
a = 2
b = float(a)
print(a,b)

c = '69'
d = int(c)
print(c,d)

<div style="background-color: #FFF8C6">
Can you see the difference between the value of variable c and variable d when they are printed in the example above? How can you check which variable is of which type?
<p>
    
**Exercise: try converting different types of data below. Answer the following questions:**
- **Can you turn a string of letters into a float or integer?**
- **What happens when you convert a float with a non-zero decimal part into an integer? Make sure to try different decimal values!**
- **Can you turn a complex number into a string, float, or integer?**
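
As a starting point for these experiments, here is a minimal sketch of a few conversions that do work. The values are arbitrary examples; the cases that fail are left for you to find.

```python
# int() discards the decimal part (truncates towards zero); it does not round.
print(int(3.9))      # 3
print(int(-3.9))     # -3

# A string can be converted to a number if its text looks like one.
print(float("2.5"))  # 2.5

# A complex number can be turned into a string.
print(str(1 + 2j))   # (1+2j)

# type() reports the data type of a value before and after conversion.
print(type("69"), type(int("69")))
```
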

## 4. Variables and keywords<a id="variables"></a>

The <span style="color:blue">=</span> operator (sometimes called the assignment operator) allows you to store data in a *variable*. Variables are ubiquitous in computer programming, and are much like variables in maths. For example, if we want to create a variable x which has the value 4 we can simply write <span style="color:blue">x = 4</span>. Note that, unlike in algebra, the operator <span style="color:blue">=</span> is directional: the variable on the left of the <span style="color:blue">=</span> is always assigned the value of what is on the right of the <span style="color:blue">=</span>, not the other way around.  
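
The directional nature of assignment can be sketched as follows. The name `count` is a hypothetical example; the second line would be nonsense as an algebraic equation but is perfectly valid code.

```python
# The right-hand side is evaluated first, then stored under the name on the left.
count = 4
count = count + 1    # takes the old value (4), adds 1, stores the result (5)
print(count)

# Writing the literal on the left is not allowed:
# 4 = count          # would raise a SyntaxError
```
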

<div style="background-color: #00FF00">
    
**Exercise 7: run the cells below and see what happens:**

In [None]:
x = 4
print(x)

In [None]:
x + 2

In [None]:
y = x + 2
print(y)

Note that in the code above we used the <span style="color:blue">print()</span> command to print the value of the variables to screen. In Python, you can also simply type the name of a variable to do this. However, this should be used with caution as this only works properly when a cell has only one output. 

<div style="background-color: #00FF00">
    
**Exercise 8: to illustrate this, try running the  code cells below and pay careful attention to the output of each cell.**

In [None]:
a = 1
b = 2
a

In [None]:
a
b

In [None]:
print(a)
print(b)

So far, we have used single letters to name variables. This is usually not good practice - if you simply assigned every variable a letter of the alphabet, it would be very hard to decipher your code at a later date and understand what each variable stands for. It is therefore important to choose your variable names carefully. 
<p>
<div style="background-color: #00FF00">

**Exercise 9: below, assign variable names to data provided by literals, or to the results of computation using operators. Try and find out the answers to the following questions:**

- **Can you identify the rules that govern the possible names? Some names to try: my_glorious_variable_3, True, 1value, my favourite value, A#B...**
- **Are the names case sensitive, i.e., is 'name' the same as 'naMe'?**
- **What happens when you give the same name to two different values?**
- **What happens when you give two different names to the same value?**
- **What happens if you store the result of a calculation involving a particular variable as that very variable?**

The reason why <span style="color:blue">A#B</span> didn’t work as a name was that <span style="color:blue">#</span> is Python’s comment character, which means “Ignore everything after this character until the end of the line”. Comments are used to annotate  code to make it more human-readable, for example to describe in natural language what a complicated line of code does, to make it easier to understand.
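
A small sketch of commented code is shown below. The variable names and the value are made up for illustration; the point is only where comments can be placed.

```python
# A comment on its own line describes the code that follows.
side_length = 0.25    # a comment can also follow code on the same line (metres)

# Area of a square with the side length defined above.
area = side_length ** 2
print(area)
```

Python ignores everything from the `#` to the end of the line, so the comments have no effect on the result.
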

The reason why <span style="color:blue">True</span> didn’t work as a variable name is that it is a Python keyword, one of the few words that has a special meaning to the language. The keywords in Python 3 (version 3.7 onwards) are:
```python
False      await      else      import    pass
None       break      except    in        raise
True       class      finally   is        return
and        continue   for       lambda    try
as         def        from      nonlocal  while
assert     del        global    not       with
async      elif       if        or        yield
```

Note that this list changes with different versions of Python. Jupyter Notebooks helpfully change the colour of a keyword to bold green, so you will immediately notice it if you use a keyword inadvertently. We will cover some of these keywords in this course, but not all of them.
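
Because the list depends on the Python version, it can be useful to ask Python itself. The sketch below uses the standard-library `keyword` module and the string method `isidentifier()`; the names being tested are taken from the exercise above.

```python
import keyword

# The full list of keywords for the Python version you are running.
print(keyword.kwlist)

# Check individual names: keywords cannot be used as variable names.
print(keyword.iskeyword("True"))                     # True
print(keyword.iskeyword("my_glorious_variable_3"))   # False

# isidentifier() tests whether a name obeys the naming rules.
print("1value".isidentifier())                       # False
print("my_glorious_variable_3".isidentifier())       # True
```
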

<div style="background-color: cyan">Now that we've covered the very basics of coding in Python, pause and discuss your findings with your colleagues or a demonstrator.

## 5. NumPy, SciPy and arrays<a id="arrays"></a>

Everything we have covered so far has been part of the core Python programming language. However, the core Python programming language does not include many mathematical functions that you might expect to use. So, for example, if you needed to use trigonometric functions such as sin, cos, etc., you would have to write your own code to implement these. If you wanted to do numerical integration you would have to write the code. If you wanted to display results as plots, you would have to write (quite a lot of) code to do it, and so on. This would be tiresome and very time consuming. Fortunately, there are libraries of code that provide for most of these common requirements, and much more!

NumPy and SciPy are large collections of open-source libraries and tools brought together to give a powerful high-level environment for mathematical and scientific computing. NumPy provides functions for basic mathematical operations (sin, cos, tan, etc.) as well as functions which handle arrays, vectors, matrices and operations upon them. SciPy provides more specialised functions that are useful for scientific programming, such as special functions, integration, ordinary differential equation (ODE) solvers, optimization, and others.

In this session we will use functions from NumPy. To be able to use it, you first need to import the package as follows: 

In [1]:
import numpy as np

Generally you will need to include this line at the top of your code or notebook (and make sure to run the cell). Be careful: when you reopen your notebook at a later time to continue or review your work, you will have to run the cell above again to be able to use NumPy's functions again. A good way of resuming work is to select the "Run All Above" option from the Cell menu, to ensure all previous cells have been executed before you carry on.  

You can now use all of NumPy's routines by calling them by their name preceded by <span style="color:blue">np.</span> - for example to create an array A comprising the numbers 10 to 100 in steps of 10:

In [None]:
A = np.array([10,20,30,40,50,60,70,80,90,100])

As you can see, an array is a series of objects of the same type (integers in the example above). Each individual object in the array is called an element. You can access individual elements of an array by specifying the index of the element in the array within square brackets. For example, the cell below first prints the entire array A and subsequently only the element with index 1:

In [None]:
print(A)
print(A[1])

When you run the above cell, the second output line might not be what you expected! This is because indices start from 0, so the first element has index 0, the second element index 1, and so forth. You can access any selection of elements from an array using indices; this is called slicing. 
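
The indexing rules can be sketched on a shorter array. The array `B` and its values are made up for this illustration, so that array A is left for you to explore.

```python
import numpy as np

# A short example array (values chosen for this illustration only).
B = np.array([2, 4, 6, 8, 10])

print(B[0])      # first element: 2
print(B[-1])     # negative indices count from the end: 10
print(B[1:4])    # start index included, stop index excluded: [4 6 8]
print(B[::2])    # every second element: [ 2  6 10]
```
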

<div style="background-color: #00FF00">
    
**Exercise 10: look at the following list - can you predict what the result will be before running these statements? Pay careful attention to which elements are included in the slices.**
- <span style="color:blue">A[9]</span>
- <span style="color:blue">A[10]</span>
- <span style="color:blue">A[-1]</span>
- <span style="color:blue">A[1:3]</span>
- <span style="color:blue">A[1:]</span>
- <span style="color:blue">A[:5]</span>
- <span style="color:blue">A[0:6:2]</span>
- <span style="color:blue">A[::2]</span>

**Slicing is a very important concept in programming, so take some time experimenting with this in the cell below (again, add as many cells as you like in the notebook). Can you figure out the rules of slicing? Challenge: use slice notation to reverse array A in one line.**

You can even create 2D (or higher-dimensional) arrays, by nesting several 1D arrays within one array: one for each row. This is a very useful data structure - it can for example represent data tables or even images. Run the following cell to see an example:

In [None]:
twoDarray=np.array([[1,2],[10,20],[100,200]])
print(twoDarray)

We can now access individual cells by taking the slice [row_index,column_index]. For example, the command below will print the element that is in the third row and second column of the 2D array above:

In [None]:
print(twoDarray[2,1])
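
Whole rows and columns can be selected in the same way by using a colon for one of the two indices. The sketch below rebuilds the same 2D array under the hypothetical name `table` so that it can be run on its own.

```python
import numpy as np

table = np.array([[1, 2], [10, 20], [100, 200]])

print(table.shape)    # (3, 2): three rows, two columns
print(table[0, :])    # first row: [1 2]
print(table[:, 1])    # second column: [  2  20 200]
print(table[1:, :])   # last two rows, still a 2D array
```
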

<div style="background-color: #FFF8C6">
Arrays can be extended to an arbitrary number of dimensions. Implementing them in the above manner, however, would become very tedious, especially when trying to keep track of all of the square brackets you would need. NumPy has the np.zeros command, which allows you to generate an array of zeros with any number of dimensions:

```python
Arbitrary_array=np.zeros((dim1,dim2,dim3,.......,dimn))
```

Note the double set of brackets within the zeros command: the first argument of the zeros function is the shape of the array, which is passed as a single bracketed list of sizes, one for each dimension.
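
As a concrete instance, the sketch below creates a 3D array of zeros. The sizes 2, 3 and 4 and the name `cube` are arbitrary choices for this example.

```python
import numpy as np

# The shape is passed as one bracketed group: here 2 x 3 x 4.
cube = np.zeros((2, 3, 4))

print(cube.ndim)     # 3 dimensions
print(cube.shape)    # (2, 3, 4)
print(cube.size)     # 24 elements in total

# Individual elements can then be filled in by index.
cube[1, 2, 3] = 5.0
print(cube[1, 2, 3])
```
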

<div style="background-color: #FFF8C6">
**Exercise: experiment with taking slices out of a 2D array - these can be 2D arrays, 1D arrays, or single elements. Can you take it a step further and create a 3D array (i.e. a data cube) and take slices from it?**

The <span style="color:blue">np.array()</span> method works fine for short arrays, or to input a small number of measurement data points by hand, but would become tedious for creating longer arrays. A quicker way of defining arrays with a fixed increment is using <span style="color:blue">np.arange()</span> or <span style="color:blue">np.linspace()</span>. 
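
A minimal sketch of both functions on a small range is given below. The ranges and variable names are chosen for illustration only; the help documentation describes further options.

```python
import numpy as np

# np.arange(start, stop, step): the stop value itself is not included.
by_step = np.arange(0, 10, 2)
print(by_step)               # [0 2 4 6 8]

# np.linspace(start, stop, num): num evenly spaced points, stop included by default.
by_count = np.linspace(0, 1, 5)
print(by_count)              # [0.   0.25 0.5  0.75 1.  ]
```
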

<div style="background-color: #00FF00">
    
**Exercise 11: use Google or the help documentation on these two functions (by running e.g. <span style="color:blue">help(np.arange)</span>) to find out how they work. Subsequently create an array which includes the numbers 0 - 100 (make sure to store your array in a variable). Next, try and create an array consisting of the numbers 100 - 200 in steps of 0.01. What is the difference between using <span style="color:blue">np.arange()</span> and <span style="color:blue">np.linspace()</span>?**

Arrays are very useful data structures, which we will use throughout this course. When you analyse the data you take in lab with Python, you should always store your data in arrays. One reason arrays are so powerful within Python is because you can do arithmetic operations on them. For example, to multiply every element of array A by two, simply run the following:

In [None]:
2*A

You can even multiply two arrays, for example:

In [None]:
A*A
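
These operations act element by element, which is easiest to see on a very short array. The array `values` below is a made-up example.

```python
import numpy as np

values = np.array([1, 2, 3])

print(2 * values)        # each element doubled: [2 4 6]
print(values * values)   # element by element, not a matrix product: [1 4 9]
print(values / 2)        # division gives floats: [0.5 1.  1.5]
```
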

<div style="background-color: #00FF00">
    
**Exercise 12: refer back to section 3 and experiment using the various arithmetic operators on arrays, both with scalar values and arrays. For example, try:**
- <span style="color:blue">A + 10</span>
- <span style="color:blue">A + A</span>
- <span style="color:blue">A**3</span>

**Can you use all arithmetic operators on arrays, both with scalar values and two arrays? What happens if you try and apply an operator to two arrays of different lengths (i.e. a different number of elements)?**

<div style="background-color: cyan"> Take a moment to discuss the rules of slicing and array arithmetics.</div>

## 6. Statistics with arrays<a id="stats"></a>

Here is a set of values obtained for the wavelength of a sound wave.

* 0.76 m
* 0.79 m
* 0.84 m
* 0.75 m
* 0.80 m
* 0.79 m

We can now calculate the mean of these data with a simple set of commands:

In [None]:
x=np.array([0.76,0.79,0.84,0.75,0.80,0.79])
mean_value=np.mean(x)
print('The mean value is:', mean_value, 'm')

Note that when writing a lab report, you would not simply copy this value into your report. You would need to choose the correct number of significant figures to quote! 

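Similarly, we can calculate the standard deviation of the sample by calling the function <span style="color:blue">np.std()</span>. The sample standard deviation $s$ is defined by the following formula (note that by default <span style="color:blue">np.std()</span> divides by $n$ rather than $n-1$, so it needs to be told explicitly to use this definition):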

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\overline{x})^2$$ 

Here $x_i$ are the individual data points, $\overline{x}$ is the mean value of the data set, and $n$ is the number of data points.
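
To connect the formula to code, the sketch below writes it out term by term for the wavelength data and compares the result with NumPy's built-in function. The names `s_squared`, `s_manual` and `s_numpy` are chosen for this example.

```python
import numpy as np

x = np.array([0.76, 0.79, 0.84, 0.75, 0.80, 0.79])
n = len(x)

# Sample variance written out: sum of squared deviations divided by n - 1.
s_squared = np.sum((x - np.mean(x)) ** 2) / (n - 1)
s_manual = np.sqrt(s_squared)

# The same quantity from NumPy; ddof=1 selects the n - 1 denominator.
s_numpy = np.std(x, ddof=1)

print(s_manual, s_numpy)
```

The two printed values should agree, which is a useful check that the keyword has been set correctly.
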

<div style="background-color: #00FF00">
    
**Exercise 13: below, calculate the standard deviation of our data set using the <span style="color:blue">std()</span> function which is in the NumPy package - i.e. call the function by typing <span style="color:blue">np.std()</span>. Once again, use the inbuilt help or Google to help you on your way. Help files are always a long read -  in this case focus on the keyword <span style="color:blue">ddof</span> to ensure you use the *sample* standard deviation. You will need to set this to <span style="color:blue">ddof=1</span> when you call the <span style="color:blue">np.std()</span> function. This ensures the denominator in the equation above is set to $n-1$ rather than $n$.**

We have calculated the mean $\overline{x}$ which is an estimate of the true value of the quantity we are measuring.  How accurate is this estimate?  The sample standard deviation $s$ is *not* a measure of the error in this estimate.  We need the standard error of the mean, written as $\sigma_m$, which tells you the accuracy with which the mean of the data points gives the *true* value of the quantity you are measuring. The standard error of the mean reduces with the number of data points taken, and is given by:

$$\sigma_m=\frac{s}{\sqrt{n}}$$

To calculate $\sigma_m$ with Python you need to know how many data points there are in the array, which you can get using the Python function <span style="color:blue">len()</span>. You also need to use the <span style="color:blue">sqrt()</span> function, which is in the NumPy package. 
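
The calculation can be sketched on a small set of made-up numbers. The array `measurements` is hypothetical and is not the wavelength data above.

```python
import numpy as np

# Hypothetical measurements, used only to illustrate the formula.
measurements = np.array([1.0, 2.0, 3.0, 4.0])

n = len(measurements)                # number of data points
s = np.std(measurements, ddof=1)     # sample standard deviation
standard_error = s / np.sqrt(n)      # standard error of the mean

print(n, s, standard_error)
```
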

<div style="background-color: #00FF00">
    
**Exercise 14: calculate the standard error of the mean for data set $x$.**

## 7. Reading and writing files<a id="files"></a>

In the previous section you have seen that Python is a powerful tool for statistics, once you have stored your measurement data in arrays - particularly because you can execute the same block of code to calculate the mean, standard deviation, and standard error of the mean on any data set you take in your different labs. However, you won't usually *record* your data directly in Python. Normally, you will save your data in a file, which you can read in using your Python code.

There are numerous ways in Python to read in data from files, each with their own pros and cons. For now, we will use the <span style="color:blue">loadtxt()</span> function, which is included in the NumPy package. It offers a straightforward way of reading in data sets which consist of columns of measurement points. 
<div style="background-color: #00FF00">
    
**Exercise 15: to try this out, open the file [Resistivity.txt](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.txt). This file (as well as all other data files) is stored in the 'Data' folder that you have downloaded from Blackboard for this session. Click on the link to inspect this data file within your browser: you will see it includes 3 columns, separated by spaces. The first column is temperature, the second the measured resistivity of copper, and the third that of aluminium. You may recognise these from your Measurements & Uncertainties tutorial if you have already done this! To read in this data file and print the data, run the following cell:**

In [None]:
data=np.loadtxt("Data/Resistivity.txt")
print(data)

Note that to read the file, we have to include the subfolder name in the file name, i.e. <span style="color:blue">"Data/Resistivity.txt"</span>. If we do not include the subfolder name, the computer would look for the file in the same folder as where the script for this code is stored (i.e. this Jupyter Notebook). Since usually you would have your data stored in a different place than your programming scripts, it is good to get used to specifying folders (or 'path names') right from the start. <p>
You will see that the data has been read into a 2D array. If we wish, we can now slice the 2D array to create three 1D arrays, each representing a separate physical quantity (T for temperature, R_Cu for the copper resistivity, and R_Al for the aluminium resistivity):

In [None]:
T = data[:,0]
R_Cu = data[:,1]
R_Al = data[:,2]
print('Temperature:', T)
print('Copper resistivity', R_Cu)
print('Aluminium resistivity', R_Al)
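
If the data file is not to hand, the same steps can be tried on a small stand-in file. The sketch below writes made-up numbers to a temporary folder and reads them back; the file name `example.txt` and the values are hypothetical, and the `with open(...)` lines are only there to create the file.

```python
import os
import tempfile

import numpy as np

# Write a small space-separated file to a temporary folder (made-up values).
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")
with open(path, "w") as f:
    f.write("1.0 10.0 100.0\n2.0 20.0 200.0\n3.0 30.0 300.0\n")

# Each row of the file becomes a row of the 2D array.
table = np.loadtxt(path)
print(table.shape)        # (3, 3)

# table[:, k] selects every row of column k.
first_column = table[:, 0]
print(first_column)       # [1. 2. 3.]
```
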

<div style="background-color: #FFF8C6">
The <span style="color:blue">loadtxt()</span> function incorporates more features: it can also skip header rows and deal with different types of delimiters (how the data is separated). As an example, look at the file [Resistivity.csv](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.csv), which is a csv (comma-separated values) file of the same data. You will need to download this file and open it in Excel to be able to inspect it. Note that now the data is delimited by commas, and includes two header rows. Now run the cell below:

In [None]:
T, R_Cu, R_Al = np.loadtxt("Data/Resistivity.csv", skiprows=2, delimiter=',', unpack=True)
print('Temperature:', T)
print('Copper resistivity', R_Cu)
print('Aluminium resistivity', R_Al)

<div style="background-color: #FFF8C6">
We have changed three things: we use the <span style="color:blue">skiprows</span> keyword to skip the first two header rows, we use the <span style="color:blue">delimiter</span> keyword to set the delimiter to comma (a tab would be <span style="color:blue">delimiter='\t'</span>), and we have used the <span style="color:blue">unpack</span> keyword to store each column of the data in a separate array (T, R_Cu, and R_Al in this case).
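
The effect of the three keywords can also be seen without a file on disk. The sketch below feeds made-up csv text to `loadtxt()` through `StringIO`, which makes a string behave like a file; the text and the names `x` and `y` are assumptions for this example.

```python
from io import StringIO

import numpy as np

# A stand-in for a csv file with two header rows (made-up values).
csv_text = StringIO("Example data\nx,y\n1.0,10.0\n2.0,20.0\n3.0,30.0\n")

# unpack=True returns one 1D array per column instead of a single 2D array.
x, y = np.loadtxt(csv_text, skiprows=2, delimiter=",", unpack=True)
print(x)    # [1. 2. 3.]
print(y)    # [10. 20. 30.]
```
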

**Exercise: try reading [Resistivity.txt](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.txt) with the keyword <span style="color:blue">delimiter=','</span> and see what happens. Next try it with  <span style="color:blue">delimiter=' '</span>. Finally, try reading [Resistivity.csv](https://cclewley.github.io/ComputingYr1/Data1/Resistivity.csv) without the <span style="color:blue">skiprows</span> keyword. Open up the .csv file in a text editor such as WordPad and look at how the data are stored.**

When reading in more complex data, the function <span style="color:blue">genfromtxt()</span> in the NumPy package can be more appropriate. It incorporates more flexibility, such as dealing with missing data.
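
A minimal sketch of that flexibility, assuming a csv input with one empty entry, is shown below. The data are made up, and `StringIO` is used so that no file is needed.

```python
from io import StringIO

import numpy as np

# The second row has an empty middle entry (made-up values).
csv_text = StringIO("1.0,2.0,3.0\n4.0,,6.0\n")

# genfromtxt fills the gap with nan ("not a number") instead of raising an error.
table = np.genfromtxt(csv_text, delimiter=",")
print(table)
print(np.isnan(table[1, 1]))    # True
```
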

Conversely, at some point you will want to write data to an output file. To do this, we can use the <span style="color:blue">savetxt()</span> function included in NumPy. Below is a simple example of how to use this. 

<div style="background-color: #00FF00">
    
**Exercise 16: run the cell below and have a look at the resulting file.**

**Tip: to do this, you need to make an 'Output' folder in the folder where this worksheet is located, and click on the newly created file to see a preview in your browser window.**

In [None]:
np.savetxt('Output/Test_outputfile1.txt',data)

Note that this time we specified the output file to be saved to the subfolder 'Output', which must already exist: <span style="color:blue">savetxt()</span> will not create the folder for you. It is good practice to keep your files organised in a sensible folder structure, instead of dumping all output in the same folder as your script.

<div style="background-color: #FFF8C6">
Note that <span style="color:blue">savetxt()</span> only takes one argument for the data to be printed, so if you want to print various 1D arrays (instead of the single 2D array in the example above), you need to combine them into a single 2D array. To do this successfully, you will need to use the NumPy <span style="color:blue">column_stack()</span> function. 

**Exercise: the two examples below illustrate this - extend the code to print out the two arrays and have a look at the result.**

In [None]:
combined_1 = [T,R_Cu,R_Al]
combined_2 = np.column_stack([T,R_Cu,R_Al])

<div style="background-color: #00FF00">
    
**Exercise 17: in the cell below, find the mean of the resistivity of copper and subtract it from the copper resistivity data set, so you are left with the residuals. Do the same for aluminium. Now save your new data set (which includes temperature, copper resistivity residuals, and aluminium resistivity residuals) to a file. Make sure to choose a sensible name for your output file! Use the <span style="color:blue">help()</span> function to find out what arguments and keywords the <span style="color:blue">savetxt()</span> function takes. Can you include a header line with the column names and space the columns with tabs?**

## 8. Plotting data<a id="plotting"></a>

When you take measurements in lab, you will want to create a graph of your data. Python has a package that specialises in plotting: <span style="color:blue">matplotlib</span>. To use the plotting routines of this package, we only need to import the <span style="color:blue">pyplot</span> part of the matplotlib package. 

In [None]:
import matplotlib.pyplot as plt

Within Jupyter Notebooks the plots are created within the notebook itself. In most other environments (such as the Spyder IDE which you will use in Session 3) plots will be created in a separate window.

We can now use the <span style="color:blue">plot()</span> function to create a plot of our resistivity data:

In [None]:
plt.plot(T,R_Cu)
plt.show()

In essence the first line in the cell above creates a plot object, and the second line shows the plot (akin to creating a variable and printing it to screen with the <span style="color:blue">print()</span> command). To be able to use a figure in your report, you will want to save it as an image or pdf file. To do this, instead of the <span style="color:blue">plt.show()</span> command, use the <span style="color:blue">plt.savefig()</span> command.

Looking at the plot above, you may guess that the resistivity increases linearly with temperature, but there is some scatter in the data. A line plot is therefore not the most suitable graph; instead we want to use a scatter plot. In fact, even if the data did follow a perfect line, we would still want to plot the data points themselves as well - otherwise we would not be able to tell if the graph is the result of two datapoints or two hundred! We can do that by specifying a plotting symbol as a third argument in the <span style="color:blue">plot()</span> function. Below we plot the resistivity for both copper and aluminium using different plotting symbols. We also save the image as a png file in the subfolder 'Output' (check this for yourself!).

In [None]:
plt.plot(T,R_Cu,'x')
plt.plot(T,R_Al,'+')
plt.savefig("Output/Resistivity_plot.png")
plt.show()

**Important note:** Make sure you put the <span style="color:blue">plt.show()</span> command *after* the <span style="color:blue">plt.savefig()</span> command. If you save the figure after showing it on screen, it will create an empty file!
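The file type written by <span style="color:blue">plt.savefig()</span> follows from the file extension, so the same figure can be saved in more than one format. The sketch below uses made-up data (`x_demo`, `y_demo`) and made-up file names, and assumes you are happy for it to create the 'Output' folder if it is missing.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

os.makedirs("Output", exist_ok=True)  # Make sure the output folder exists

# Made-up data for illustration only
x_demo = np.linspace(0, 10, 50)
y_demo = x_demo**2

plt.plot(x_demo, y_demo, 'x')
plt.savefig("Output/Example_plot.pdf")           # Vector format, scales without blurring
plt.savefig("Output/Example_plot.png", dpi=300)  # Image format; dpi sets the resolution
plt.close()  # Close the figure once it has been saved
```

In a notebook you would normally replace the final `plt.close()` with `plt.show()`, keeping it after the save commands.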

In general, when you create plots they can have three different functions:
1. A quick look at your data or model in the middle of your work to see what it looks like (i.e. work in progress).
2. A graph that you can show to someone else or save for later use.
3. A graph that you can use in a presentation, report, or publication.

For point 1, the graph we created above might suffice - it's a 'quick and dirty' plot that shows your data but doesn't include any explanatory information such as axis labels. For point 2, you would need to include at the very least enough information so that someone else (or yourself at a later point) can make sense of it. For example, the plot below includes axis titles and a legend for this purpose.

![Resistivity_plot_example](https://cclewley.github.io/ComputingYr1/Images1/Resistivity_plot.png)

For point 3, the layout of your graphs is incredibly important: they are the main showcase of your results! In your lab reports, you will need to create figures to a publishable standard (following the IEEE guidelines). In presentations, you may want to create bigger images with oversized fonts and thick markers of contrasting colours so that they will be projected legibly. An example of an improved Resistivity figure for this purpose is below. 

![Resistivity_plot](https://cclewley.github.io/ComputingYr1/Images1/Resistivity_plot_improved_final.png)

<div style="background-color: #00FF00">

**Exercise 18: take some time to try and recreate the figure above as exactly as possible. To get you started, some code is given below to change the size of the plotting symbols, the various text elements, and the figure itself. Experiment with changing each of the values in turn. Next, add lines of code to add the title, axes labels, legend, and grid lines to the plot. Tip: use the following functions from the matplotlib library: <span style="color:blue">title()</span>, <span style="color:blue">xlabel()</span>, <span style="color:blue">ylabel()</span>, <span style="color:blue">legend()</span>, and <span style="color:blue">grid()</span>.**

In [None]:
# Plot parameters: experiment with different values
params = {
   'axes.labelsize': 8,
   'font.size': 8,
   'legend.fontsize': 8,
   'xtick.labelsize': 8,
   'ytick.labelsize': 8,
   'figure.figsize': [6, 4]
   } 
plt.rcParams.update(params)

# Try to find out what happens when you vary the values of the mew and ms keywords
plt.plot(T,R_Cu, 'x', mew=1, ms=5, color='red') 
plt.plot(T,R_Al, '+', mew=1, ms=5, color='blue')
#plt.savefig("Output/Resistivity_plot_improved.png")


<div style="background-color: #FFF8C6">
There are many more features you can change if you have time to research them. For example, you may have noticed that the 'tick marks' on the x-axis (i.e. where the vertical grid lines are placed) in the example plot are different from the default values. 
<p>
    
**Exercise: can you find out how to change your tick marks accordingly? Another thing to try is changing the font to Arial using the <span style="color:blue">rcParams</span> data structure.**

For your lab report, a useful way of proceeding is to first determine the size your figure will be in your report (e.g. the width of one column) and then create a figure of exactly that size in your code, using the same font size and type as the rest of your report. This ensures that you will not need to rescale your figure, which can lead to undesirable results. 
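As a minimal sketch of this approach, the code below creates a figure with an explicit size in inches. The values used here (a 3.5 inch column width, a 2.5 inch height, and 8 pt text) are assumptions chosen for illustration; you should check the actual dimensions and font of your own report template.

```python
import matplotlib.pyplot as plt

# Assumed dimensions: replace with the values from your own report template
column_width_in = 3.5
figure_height_in = 2.5

# figsize is given as (width, height) in inches
fig = plt.figure(figsize=(column_width_in, figure_height_in))

plt.plot([0, 1, 2], [0, 1, 4], 'x')               # Made-up data points
plt.xlabel("x (arbitrary units)", fontsize=8)     # Assumed 8 pt report font size
plt.ylabel("y (arbitrary units)", fontsize=8)

width, height = fig.get_size_inches()  # Check the size that will be used when saving
print(width, height)
plt.close(fig)
```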

Note that once you have run <span style="color:blue">plt.rcParams.update(params)</span>, your subsequent plots will retain the same parameters. You can reset the plotting style in your notebook by running the following:

In [None]:
plt.style.use("default")
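If you only want to change the parameters for a single plot, one alternative is to apply them temporarily. The sketch below uses <span style="color:blue">plt.rc_context()</span> for this; the font size of 14 and the data points are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

size_before = plt.rcParams['font.size']

# Settings passed to rc_context only apply inside the 'with' block
with plt.rc_context({'font.size': 14}):
    size_inside = plt.rcParams['font.size']
    plt.plot([1, 2, 3], [2, 4, 6], 'x')  # Made-up data points
    plt.xlabel("x (arbitrary units)")
    plt.close()

size_after = plt.rcParams['font.size']  # Back to the value from before the block
print(size_before, size_inside, size_after)
```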

<div style="background-color: #FFF8C6">
The <span style="color:blue">plt.style.use("default")</span> function loads the default style sheet. There are other pre-defined style sheets that change the look of your plots. To list all available style sheets, run: 

```python
print(plt.style.available)
```

**Exercise: experiment with a few different style sheets. You can even create your own style sheet and load it in this way!**

<div style="background-color: cyan"> Do you have any questions about making figures, array statistics and reading / writing files?

The next section looks at creating plots with logarithmic scales. This is optional, and if you have less than 20 minutes left we recommend you skip straight to section 10: Coding Practices.

<div style="background-color: #FFF8C6">
    
## 9. Optional:  Log plots<a id="logplots"></a>

Two other types of plots you will need to create frequently are logarithmic plots and histograms. Logarithmic plots are particularly good at highlighting different types of relationships between two parameters. Three examples are:
1. Sound intensity $\beta$ in dB: $$\beta = 10 \log_{10}{\frac{I}{I_0}},$$ where $I$ is the intensity in $\rm W\,m^{-2}$ and $I_0 = 1.00\times10^{-12}\,\rm W\,m^{-2}$ for air.
<p>
2. Radioactive decay: $$N = N_0 e^{-t/\tau},$$ 
<p>where $N$ is the number of particles in the sample, $N_0$ is the initial number of particles, and $\tau$ is the lifetime of the particles.
<p>
3. The period of oscillation $T$ of a pendulum (for small amplitudes): $$T = 2\pi \sqrt{\frac{L}{g}}.$$ Here $L$ is the length of the pendulum and $g$ is the acceleration due to gravity.

Below is the code to create a plot for example 1. 

In [None]:
I_0 = 1e-12
I = np.linspace(0.1,10,20) # Array with 20 intensities spanning 0.1 - 10 Wm^-2 
beta = 10*np.log10(I/I_0) # The sound intensity in dB (note: base-10 logarithm)
plt.plot(I,beta)
plt.plot(I,beta,'x')
plt.xlabel("Intensity (W/m^2)")
plt.ylabel("Sound intensity beta (dB)")
plt.xscale('log')
plt.show()

<div style="background-color: #FFF8C6">
When you run the cell above, you will notice that the x-axis is scaled logarithmically to allow the relationship between $\beta$ and $I$ to be displayed as a straight line. This is done using the <span style="color:blue">xscale('log')</span> command from the pyplot package. The sound intensity is plotted both as a line and as crosses. You will see that the data points are all clustered in the top right of the graph. This is because we have spaced the intensity linearly. It would be better to space the datapoints evenly in log space: to do this we can use the <span style="color:blue">logspace()</span> function that is included in NumPy. 
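To see the difference between the two kinds of spacing, the short sketch below compares five linearly spaced points with five logarithmically spaced points over the same range. The number of points and the variable names are arbitrary choices for illustration. Note that <span style="color:blue">logspace()</span> takes the *exponents* of the start and end values (base 10 by default), not the values themselves.

```python
import numpy as np

linear_points = np.linspace(0.1, 10, 5)  # Equal differences between neighbours
log_points = np.logspace(-1, 1, 5)       # From 10**-1 to 10**1, equal ratios between neighbours

print(linear_points)
print(log_points)

# For log spacing, the ratio between neighbouring points is constant
ratios = log_points[1:] / log_points[:-1]
print(ratios)
```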

**Exercise: below, recreate the above plot but with data points evenly spaced in log space.**

<div style="background-color: #FFF8C6">
For examples 2 and 3 we need two different types of logarithmic plots in order to display the relationships as a straight line. 
<p>
    
**Exercise: create both plots yourself below.**

<div style="background-color: #FFF8C6">
Often in axis labels you might want to use Greek symbols or superscripts (e.g. m$^2$). This can be achieved by using LaTeX's math text commands. For a simple example, replace the label commands of the first example plot in this section (sound intensity $\beta$ versus $I$) with:

```python
plt.xlabel(r"Intensity (W/m$^2$)")
plt.ylabel(r"Sound intensity $\beta$ (dB)")

```
Note the insertion of the 'r' in front of the label string, and the dollar signs around the parts of the string that use math text commands. For a more comprehensive introduction, take a look at [this guide](https://matplotlib.org/users/mathtext.html).

## 10. Coding practices<a id="codingpractices"></a>

In the previous sections we have written brief snippets of Python code to achieve a specific task, e.g. defining a variable or data structure and printing out its values, or creating a plot from existing data. However, from the next session we will start to write longer pieces of code, and there are some best practices that should be used to keep your code and results legible.

#### Comments

In this course, your experimental labs and your time as a physicist, you will be working on projects that can span days, weeks or even months. There may be substantial time between when you first write the code and when you next look at it. Additionally, you may want to send this code to a collaborator to work on, and they may not understand all of the steps you have taken (it is notoriously hard to read and interpret someone else's code). To that end, Python (like most programming languages) allows you to comment your code, so that you can write reminders and any important information that is needed when running your code. As we've already seen, comments use the following syntax:

```python
# This is a python comment
```

Anything that follows the <span style="color:blue">#</span> symbol on a line is not executed as a piece of Python code and is purely for human benefit. A comment can therefore take up a whole line, or follow a statement on the same line. **You should *always* comment your code extensively**. Comments don't have to be incredibly detailed but should give a brief description of what you are doing. Consider a snippet of code that calculates the hypotenuse of a right-angled triangle:

In [None]:
#This snippet of code calculates the hypotenuse of a triangle, given the height and width
a=3 #Height in cm
b=4 #Width in cm
c=np.sqrt(a**2+b**2)
print(c) #Print the result

The comments are short but clearly describe what is happening in the code. As your work becomes more complicated, the use of comments will save a lot of hassle in the future. When you write full programmes, you may want to include comments at the top with your name and the date, to act as a fossil record of when work was implemented, and what the purpose of your programme is.


#### Print statements and formatting

When printing variables to the screen thus far, we sometimes only printed the variables themselves, with no further information about what they represent physically. Like the idea of comments above, adding extra text will make it easier to keep track of what is going on when you return to work at a later date. Consider the following example:

In [None]:
#This snippet of code calculates the hypotenuse of a triangle, given the height and width
a=3 #Height in cm
b=4 #Width in cm
c=np.sqrt(a**2+b**2)
print(c) #Just printing the number
#Isn't the output of the statement below much more useful?
print("The hypotenuse of a triangle with height", a, "cm and width", b, "cm is", c, "cm") 

Looking at the above outputs, which one is more descriptive and will be useful when you return to the code later on? Note the use of comments to help you understand what the code is trying to achieve. Also note that we include units in the print statement, which helps to improve legibility.

<div style="background-color: #FFF8C6">
Having control of how you present your results is also very important. As scientists we should only quote results to the relevant number of significant figures. Look at the following code:

In [None]:
#This snippet of code calculates the hypotenuse of a triangle, given the height and width
a=1 #Height in cm
b=2 #Width in cm
c=np.sqrt(a**2+b**2)
print("The hypotenuse of a triangle with height", a, "cm and width", b, "cm is", c, "cm")

<div style="background-color: #FFF8C6">
Here we have taken the above hypotenuse code but this time given it inputs that do not give an integer output. As scientists, when we take measurements we are unlikely to be able to quote results to the precision that Python gives us by default. When presenting our work we therefore need to use an appropriate format that reflects our confidence in the result. To do this, when we print our results, we can use format statements. Run the code cell below and look at the output of the revised print statement.

In [None]:
print("The hypotenuse of a triangle with height %d cm and width %d cm is %.2f cm" % (a,b,c))

<div style="background-color: #FFF8C6">
This method of creating a print statement may look more complicated, but it gives you greater control over your output. Let's break down what is used in this statement. Firstly, in between the print brackets we have replaced:
```python 
, a,
, b,
, c,
``` 
with the commands:
```python
"%d"
"%d"
"%.2f"
```
within the string itself. The % symbol here acts as a placeholder: it reads as "insert variable value here". The 'd' means that the variable should be printed as an integer. The '.2f' means that the variable should be printed as a float with two decimal places.

Now we need to specify which variables should be printed. To do this, we add the command:

```python
% (a,b,c)
```

after the string. Note that the variables need to be given in the order in which they should appear in the text.
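To get a feel for the placeholders, the sketch below applies a few common format codes to arbitrary example values. The variable names are made up for illustration, and each result is stored as a string before printing.

```python
import numpy as np

three_decimals = "%.3f" % np.pi       # Float with three decimal places
scientific = "%.2e" % 12345.678       # Scientific notation with two decimal places
padded_integer = "%5d" % 42           # Integer, padded with spaces to a width of 5 characters
several = "%d apples cost %.2f pounds" % (3, 1.5)  # Several values, given in order

print(three_decimals)   # 3.142
print(scientific)       # 1.23e+04
print(padded_integer)   # '   42' (three leading spaces)
print(several)          # 3 apples cost 1.50 pounds
```

Fixing the width of each value, as in the third example, is one way of getting columns of numbers to line up.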

**Exercise: look at the code below that calculates the area and volume of a sphere for a variable radius, which is in units of meters. Interpret this code, and make the code legible by including comments, string outputs and format statements.**

In [None]:
r=1
a=4*np.pi*r**2
v=4*np.pi/3*r**3
print(r,a,v)

<div style="background-color: #FFF8C6">
    
#### Formatting file output
Open the file you created in Exercise 17. You may notice that the formatting of your file is not exactly to your liking. For example, the header names might not line up with your data columns, and the data may have too many decimal places. All this can be fixed by changing the formatting of the output that is written to file. For example, we can force the number of decimal places in the three columns of data by setting the following keyword in the <span style="color:blue">savetxt()</span> function: <span style="color:blue">fmt="%.d %.2e %.2e"</span>.
This ensures the first column is printed as an integer, and the other two columns are printed using scientific notation with two decimal places. If you have time, experiment with changing the format of the data in your output file. 
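The sketch below applies this format string to a small made-up data set, writes it to a temporary file, and reads the text back so you can see the result. The arrays `T_demo`, `R1_demo` and `R2_demo` and the file name are invented for this example; they are not the course data.

```python
import os
import tempfile
import numpy as np

# Made-up values standing in for a temperature column and two resistivity columns
T_demo = np.array([100.0, 200.0, 300.0])
R1_demo = np.array([3.5e-9, 1.05e-8, 1.72e-8])
R2_demo = np.array([4.4e-9, 1.6e-8, 2.7e-8])
table = np.column_stack([T_demo, R1_demo, R2_demo])

# Write to a temporary folder so that no files are left behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Formatted_example.txt")
    np.savetxt(path, table, fmt="%.d %.2e %.2e")  # One format code per column
    with open(path) as f:
        contents = f.read()

print(contents)  # Each row: an integer followed by two values in scientific notation
```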

For a more comprehensive guide on formatting output, have a look at [this tutorial](https://www.python-course.eu/python3_formatted_output.php). Knowing how to format Python output appropriately will come in very handy in the future.

### How to tackle coding problems

In general, when faced with a coding problem, writing code right away isn't always the most efficient route. Taking the time to think about the problem and prepare yourself is key to getting your code running as quickly as possible. This can come in many forms, but some of the most important methods are:

- Writing Flow Diagrams
- Writing Pseudo-code
- Rubber Duck Programming (i.e. say what you see)

**Writing flow diagrams** is useful when you are planning how the workflow of your program will progress. This will become crucial when we look at concepts such as for loops and if/while statements in session 3, but a basic flow diagram can help you think about how you want your code to be structured.
 

![flow_diagram](https://cclewley.github.io/ComputingYr1/Images1/flow-diagram.PNG)

The above flow diagram is very general and lays out how most coding will be tackled. In reality, your flow diagrams will be centred around the 'perform calculation' section.


**Pseudo-code** is a collection of code snippets and plain text that are meant to describe *how* you will implement code. It comes after designing the structure of your code with a flow diagram, and is not meant to be executable code; rather it will help clarify your thoughts and give you a methodology with which to build your code. Look at the pseudo-code below of the triangle example we have used previously.

In [None]:
import numpy as np

Define length and height of triangle

calculate hypotenuse - sqrt(length**2 + height**2)

Print out results

Finally, if you are having issues with the execution of your code, sometimes it is worth talking about it. **Rubber duck programming** is an affectionate term for talking through your code line by line. You will find that demonstrators often ask you for a rubber duck explanation when you have a problem with your code (although they won't usually use this term)! The idea is that you describe exactly what each line of your code is supposed to do. By saying out loud what the code is *meant* to do, and really focussing on what you have actually *written*, it will be easier to identify issues with your code.

You have now finished the core material of this session. If you have time left, and have not yet attempted the optional material in yellow, this is a good moment to go back and look at it. There are many tips and exercises in there that will be useful to your coding in the future. Alternatively, if you have already completed *all* the material in this worksheet, you may want to attempt the advanced worksheet. This deals with different data types and introduces you to the pandas package, which is designed for the analysis of large data sets.

## Please complete the [Mentimeter Poll](https://www.menti.com/jpihm6f36h) for this session! 