In [1]:
# this is some code to get pretty highlighted cells in the notebook - ignore this
from IPython.display import HTML
style1 = "<style>div.warn { background-color: #fcf2f2;border-color: #dFb5b4; border-left: 5px solid #dfb5b4; padding: 0.5em;}</style>"
HTML(style1)

In [2]:
# this is some code to get pretty highlighted cells in the notebook - ignore this
style2 = "<style>div.tip { background-color: #D6EAF8;border-color: #D6EAF8 ; border-left: 5px solid #5DADE2; padding: 0.5em;}</style>"
HTML(style2)

# Getting Started 

This notebook will help you get familiar with using Python in the Jupyter Notebook environment. This is important not only as practice for the assessments, but also as a way to go through some of the basic concepts of the course. Some core concepts of Data Analysis will only be taught via the Jupyter notebooks and live sessions (and of course the notes).

Jupyter notebooks allow us to write text in cells using the **Markdown** format.

They also allow us to write code in cells; we will use **Python** in this course.

Please ensure you have watched the Chapter 0 video, set up Google Colab and read through Chapter 0.

So how do we use Python to work with data? Useful things include reading in data files from experiments or telescopes, plotting data, plotting error bars, calculating some useful statistics (and perhaps probabilities), and fitting models (lines/curves) to the data. Python is really useful because we can *import* packages that already have built-in tools to make publication-quality plots, or estimate the mean or median and so on. These packages include Numerical Python `numpy` and Scientific Python `scipy`, as well as plotting packages such as `matplotlib`.

This introduction and the weekly guides are to support you in carrying out scientific analysis using data. You do not need to be an experienced programmer, since there are some easy hints and tips that will allow you to do things in a simple way. For example, knowing that you write an equation in Python as `y = x**2` and not `y = x^2`.
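As a small illustration of why this matters (a sketch, with the variable names `y` and `z` chosen only for this example): Python does accept `^`, but for integers it means bitwise XOR rather than "to the power of", so it quietly gives a different number.

```python
x = 5

# ** is the power operator
y = x**2
print(y)   # 25

# ^ is bitwise XOR for integers, so this is NOT x squared
z = x ^ 2
print(z)   # 7
```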

## You will learn the following things this week:

- How to use Jupyter (Colaboratory) notebooks
- How to set up your Google Colaboratory 
- How to access and submit Jupyter Notebook assignments
- How to write a Python program and run it

## How to get started - python

If Python is new to you, we recommend getting started with [Cardiff Python tutorials](https://alexandria.astro.cf.ac.uk/Joomla-python/index.php/labs). This website also contains other resources (in left hand menu), such as a quick overview of python for coders. This url may require you to click Advanced to bypass the security exception.


For further help with Python, these sites are good: 
- [The Python Tutorial](https://docs.python.org/3/tutorial/)
- [problem-solving with python](https://problemsolvingwithpython.com/)
- [short guide to python/data analysis](https://www.authorea.com/users/18589/articles/304710-a-short-guide-to-using-python-for-data-analysis-in-experimental-physics) 
- and an online book from one of the best educators in python: [Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/).

Also this notebook will help you get to grips with the basics. Any questions, please do raise them at the weekly live sessions or at office hours.

## How to get started -  Jupyter Notebooks


You are already using a Jupyter notebook if you are reading this! They are an excellent way to interactively write and execute code in cells, while you plot your data, discuss your results, fit a model and write the equations and interpretation all in one document. No need to cut and paste code from files into Word documents.

The first thing you should do when you open a Jupyter Notebook is go to Runtime in the top menu and select Run all (Colab), or go to Cell and select Run All (Anaconda). This will ensure you have everything formatted correctly. Do this now for this notebook.

Jupyter notebook files end in .ipynb.

Open your notebook. Hopefully it will not look too strange to you, since a Jupyter Notebook is simply a word processor and code runner in one. Take some time to look at the menus and see what different options there are.

The notebook contains cells. A cell is a place to either display text or to write some code to be executed by the notebook’s kernel. A kernel is the engine that runs the code in the notebook.

**Using the notebook to do some analysis** 

When you open the notebook, you can see that the first cell is automatically configured to be a code cell. But we want to write a bit of introduction at the top of our notebook, to remind ourselves what this notebook is for.

To do this we want to change from a code cell to a text cell. Select the drop down menu in the toolbar that currently says Code and select Markdown instead. Markdown is a writing language with plain text formatting syntax which is nice and simple but can also be converted to HTML. You should notice that the In []: that previously appeared to the left of the cell has disappeared. Now we can type directly into the cell using normal text. Type in something like:

This is my first Jupyter notebook

Let’s now insert a new cell by clicking on the Cell > insert cell below. This time we’ll keep it in the code format and do some basic maths in python.

Enter the following text to this new cell:

`x=5
y = x**2.
print(y)`

Your cell will look like this:

In [3]:
x = 5
y = x**2.

print(y)

25.0


And we can see that the cell has output the answer y.  But we only get the output by running the cell (often called compiling the cell in this notebook), which means we have to click ▶Run or Cell > Run Cells.  

Note that to run a cell on its own, select it with the mouse and click ▶Run in the toolbar menu. Or you can go to the Cell menu and click Run, Run All, Run All Above or Run All Below etc. To stop a
cell from compiling/running we can click the ◼︎ button to stop the notebook or interrupt it. This is useful if we have complicated code or lots of data being plotted, as it can be very slow at times.

The notebook remembers every cell that has already been run. This is both good and bad: you only need to
define things/import packages once, but you will need to be careful when writing code with similar looking parameters in one notebook! What matters is the order in which the cells were run, not their position on the page.

### Saving and exporting Jupyter Notebooks

Now is a good time to **Save** your notebook. Do this by clicking File > Save and Checkpoint, pressing Ctrl + S, or selecting the floppy disk symbol below the File menu.

It is best practice to save your changes regularly, as computers or browsers can crash or 'hang'.  

Jupyter notebooks use a Checkpoint which is hidden in a subdirectory called .ipynb_checkpoints. Jupyter autosaves to the checkpoint without altering your notebook file every 120 seconds. When you click "Save and checkpoint", it updates your notebook and the checkpoint files. 

This means that you can recover your unsaved work if something goes wrong by reverting to the checkpoint from the menu via "File > Revert to Checkpoint"

### Submitting Assessed Coursework

For submitting assessments for this course, you will submit a Jupyter Notebook to Turnitin on Learning Central. 

You can export your files to HTML or PDF under the menu File > Download As.  

As practice for the Assessments, try the following: File > Download as > HTML.

This will download an HTML version of your notebook to your own computer. 
Don’t forget to double check that **all the cells are compiled** (as we need them to be compiled to output the answers or make the Markdown text readable) **before** you export to HTML. 

Make sure you name your file something sensible eg Assessment1_studentno.ipynb and save it to a folder called something like Module Code/Notebooks/Coursework.  

Open your html file and check everything looks like it should!

***

# Getting started with Markdown

Markdown is a really simple way to write text in notebooks. We can use it to describe our methods, state equations that we're using and to write our results and interpretation.   So in that sense we wish to use it as a word processor - ie with titles, subheadings, text, formatted equations etc. 

We can use the following in a Markdown cell to bold text by writing \__quantum__ or \**quantum\** 

to make __quantum__ or **quantum**

Here are a few examples to get you started with Markdown, you won't need to do anything more complicated than these examples for this course.

#### <div class="warn">Example</div>

<div class="warn">How to write a title, major heading, a subheading and an even smaller heading in Markdown
    
</div>

To do the following we write:
\# for titles

\## for major headings

\### for subheadings

\#### for 4th level subheadings 


When compiled this looks like:
# for titles
## for major headings
### for subheadings
#### for 4th level subheadings 

#### <div class="warn">Example</div>

<div class="warn">How to write a table in Markdown
    
</div>

A table example is given below <br>
\|a \|b \| c\| <br>
\|--\|--\|--\| <br>
\|1\|2\|3\| <br>

This outputs a nicely formatted table like such:

|a |b | c|
|--|--|--|
|1|2|3|

#### <div class="warn">Example</div>

<div class="warn">How to write an equation in Markdown
    
</div>

 `$\alpha = 10^{2}$` produces $\alpha = 10^{2}$.

`$y=x^2+4$` produces $y=x^2+4$.

`$y = a+sin(bx)+c$` produces $ y =a + sin(bx)+c$

`$y = \dfrac{x}{x+3}+\left(\dfrac{z}{z^2-1}\right)$` produces $y = \dfrac{x}{x+3} + \left(\dfrac{z}{z^2-1}\right)$


#### <div class="warn">Example</div>

<div class="warn">How to write a list in Markdown.
    
</div>

Sometimes we want to include lists.
* Which can be indented.

Lists can also be numbered.

1. Item a
2. Item b

Or:
 
* Item 1
* Item 2

One can also use "-" to make a bullet point list.

#### <div class="tip"> Tips</div>

- You can attach image files (jpg, png) directly to a notebook in Markdown cells by dragging and dropping it into the cell.

- Paragraphs must be separated by an empty line. 

- For additional markdown tips, see this [blog](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) and this [blog](https://guides.github.com/features/mastering-markdown/). 

***

# Getting started with Python

## Importing packages

In general, it is good practice to start each notebook with a cell that explains what the notebook is for (select Markdown in drop menu at top) and a cell that imports useful python libraries e.g.

In [4]:
import numpy as np
import scipy as sp
import pylab as plt
%matplotlib inline

What does the above cell do? 

`numpy` is a package consisting of lots of numerical tools. 
`scipy` is a package consisting of lots of useful scientific tools.
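`pylab` is a module of the `matplotlib` plotting package, used here for scientific plotting (the matplotlib documentation now recommends `import matplotlib.pyplot as plt` for new code; the plotting commands used in this notebook are the same either way).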
`%matplotlib inline` is a 'magic command' used when running Python within a Jupyter notebook. It allows the display of data plots within the notebook. 

Other useful packages to be used in this course include

`import scipy.stats as st`

`import pandas`

and `import math as m`

The `math` package includes tools like trig functions, e.g. $\sin$.  Once we've imported the packages, we can use tools inside them by writing `mean = np.mean(x)` to get the mean of a set of values $x$, calling `numpy` to do the work for us. Similarly we can use `y = np.sqrt(x)` to get the square root of $x$, `y = np.log10(x)` for the log (base 10) of $x$, `y = np.log(x)` for the natural log of $x$, and `np.pi` to call $\pi$.
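A minimal sketch of these `numpy` calls in action is given below. The three values in `x` are made up for illustration; note in particular that `np.log` is the natural log while `np.log10` is the base 10 log.

```python
import numpy as np

# made-up values, chosen so the logs are easy to check by eye
x = np.array([1.0, 10.0, 100.0])

mean = np.mean(x)        # arithmetic mean of the values
root = np.sqrt(x)        # square root of each value
ln_x = np.log(x)         # natural log (base e) of each value
log10_x = np.log10(x)    # log base 10 of each value

print('mean =', mean)
print('sqrt =', root)
print('ln =', ln_x)
print('log10 =', log10_x)
print('pi =', np.pi)
```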

***

## Using Python as a calculator

You can use Python as a calculator. Below are some examples.

In [5]:
(1+2+3+4+5+6+7)/5

5.6

In [6]:
9.**2

81.0

In [7]:
(3./1.) + (4+5+3+2)**2.


199.0

In Python 3 (the version used by Google Colab), the `/` operator always gives a floating point result, so `7/4` is `1.75` and `3/3` is `1.0`. If Python spits out a zero when it is not expected, or a number with no decimal places, check whether you have used `//` (floor division, which rounds down to the nearest integer) or are running old Python 2 code, where `7/4` was rounded down to `1` unless you typed `7./4.`. Writing the decimal point never hurts, since it lets Python know the value is a floating point number and not an integer.  
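A quick check of how division behaves, assuming Python 3 (the variable names are only for this sketch):

```python
a = 7 / 4     # true division always gives a float
b = 7 // 4    # floor division rounds down to the nearest integer
c = 7 % 4     # the remainder left over after floor division

print(a)   # 1.75
print(b)   # 1
print(c)   # 3
```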

#### <div class="tip">Tip</div> 

If something seems slightly "off" with your numbers, then go back and check the positioning of your brackets - often a misaligned bracket in an equation is responsible for all sorts of mishap.

***

## Printing out results and text

You can get Python to do some calculations and then print results out (or even print out text if you'd like). We can do this by using `print("some text here")`.  

To print out a result $y$ and some explanation text, we can write `print("some text here",y)`.

Note that we can use words in Python code by defining them as strings. Strings are written inside double quotes `" "` or single quotes `' '`. You can use strings and do manipulations on them - see examples later on in the notebook.
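As a short sketch of what manipulating strings can look like (the names `name` and `sentence` are chosen just for this illustration):

```python
name = "lemur"                      # double quotes
sentence = 'I have a pet ' + name   # single quotes work too; + joins strings

print(sentence)       # I have a pet lemur
print(len(name))      # 5
print(name.upper())   # LEMUR
print(name[0])        # l
```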

#### <div class="warn">Example </div>

<div class="warn">Make a cell print out Hello World.
</div>

In [8]:
print('Hello World')

Hello World


#### <div class="warn">Example </div>

<div class="warn">You've found a model $y(t) = \dfrac{1}{2}gt^2+v_0t+y_0$ explains the experiment you've been working on in the lab where you've been searching for an equation that describes the position of a falling body $y$ as a function of time $t$ in free-fall. $g$ is the acceleration due to gravity and $v_0$ and $y_0$ are the initial conditions.  
    
What is the value of $y(t)$ for $t = 2.56$s if $v_0 =1.26$m/s and $y_0 = 1.35$m?
    
</div>

In [9]:
t=2.56
v_0 = 1.26
y_0 = 1.35
g = 9.81

y = 0.5*g*t**2+v_0*t+y_0
print(y)

36.721008000000005


To explain our result to the reader of the notebook, we can write a little more information:

In [10]:
print("the value of y is",y)

the value of y is 36.721008000000005


Note that I didn't have to define the same numbers and equations again, because the Jupyter notebook *remembers all of the definitions in the cells that have already been run*.

Now we want to be even more scientific and include units and formatting in our result, since the large number of significant figures above is not realistic (we very rarely know parameters to this level of precision).

This brings us to formatting numbers.  There are many ways to do this; here we format numbers using Python's `str.format()`. To do this, we can use 

`print("FORMAT".format(NUMBER))` 

where `FORMAT` is what we want our output to look like. For example, two decimal places is `"{:.2f}"` and no decimal places is `"{:.0f}"`, while three significant figures is `"{:.3g}"`. `NUMBER` is either a number which needs formatting or your result (in the case above we would replace this with `y`).
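To make the difference between decimal places and significant figures concrete, here is a small sketch using a rounded copy of the value from above (the variable names are only for illustration):

```python
y = 36.721008

two_dp = "{:.2f}".format(y)     # 2 digits after the decimal point
zero_dp = "{:.0f}".format(y)    # no digits after the decimal point
three_sf = "{:.3g}".format(y)   # 3 significant figures
sci = "{:.2e}".format(y)        # scientific notation, 2 digits after the point

print(two_dp)     # 36.72
print(zero_dp)    # 37
print(three_sf)   # 36.7
print(sci)        # 3.67e+01
```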

In [11]:
print('This is a bad example: the value of y is',y) 
print()

print("This is a formatted example:","{:.2f}".format(y))
print()

print('Now this is a great example:')
print('The position of a falling body $y$ at time $t=2.56$s in free-fall is {:.2f} m.'.format(y)) # {:.2f} gives 2 digits after the decimal point

This is a bad example: the value of y is 36.721008000000005

This is a formatted example: 36.72

Now this is a great example:
The position of a falling body $y$ at time $t=2.56$s in free-fall is 36.72 m.


You might have seen above that you can also put *comments in the code cells* by writing what your line of code is doing after a #.  This is really good practice so that when you return to your code later, you can tell why that line of code is there.

***

## Lists

You can create a list of objects in Python. This is handy because we can do all sorts of cool things with them, such as adding two separate lists together or pulling out values from the list.   *The elements of a list are numbered from zero.* So if you have a list of 5 values, the elements in that list are numbered 0, 1, 2, 3 and 4.

In [12]:
a = [1,2,3,4,5]
b = [6,7,8,9,10]

c = a+b
print(c)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [13]:
print('first element in c is',c[0])
print('fifth element in c is',c[4])
print('the number of data points in c is',len(c))

first element in c is 1
fifth element in c is 5
the number of data points in c is 10


***

## Iterating over values - for loops

Python can iterate over the items in a list. For loops are great for repeating something a fixed number of times. Python has a `range` function so that `range(10)` will give a set of 10 integers from 0 to 9.  The stop value is never included: `range(1,5)` will give the integers from 1 to 4, and `range(6,10,2)` will give the integers from 6 up to (but not including) 10 in steps of 2, i.e. 6 and 8.
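One way to see exactly which integers `range` produces is to turn it into a list and print it (a small sketch; the variable names are only for illustration):

```python
first = list(range(10))         # start defaults to 0, stop is excluded
second = list(range(1, 5))      # start at 1, stop before 5
third = list(range(6, 10, 2))   # start at 6, stop before 10, step of 2

print(first)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(second)   # [1, 2, 3, 4]
print(third)    # [6, 8]
```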


#### <div class="warn">Example </div>

<div class="warn">Write a for loop to print out "I have a pet...." for all of the animals in the list animals = ["lemur","panda","shark","bat"].  
    
Write another loop to print out all the letters in "sloth".
</div>

In [14]:
animals=["lemur","panda","shark","bat"]

for item in animals:
    print("I have a pet " + item)

print()
for item in "sloth":
    print(item)


I have a pet lemur
I have a pet panda
I have a pet shark
I have a pet bat

s
l
o
t
h


The for loop needs a colon (:) and everything after that (related to the for loop) must be indented.

Instead of only having one value of $t$ to calculate $y(t)$, what happens if we have lots of values of $t$ and want to check the position $y$ for each of those times?  We can make an array of times and then figure out what $y(t)$ is for each time.  

#### <div class="warn">Example </div>

<div class="warn"> Calculate the position of a falling body in free-fall from $t=0$s to 1000s in steps of 50s.  
</div>

In [15]:
t=range(0,1000,50) 
# set up a sequence of times which starts at t=0 and goes up in steps of 50s;
# range stops before 1000, so the last time is t=950s

# now call the equation for each time by using the for loop:
for i in range(0,len(t)):   # each value of t is called as t[i] where i is an index
                            # from t[start] to t[end]
    y = 0.5*g*t[i]**2+v_0*t[i]+y_0
    print(y)

1.35
12326.85
49177.35
110552.85
196453.35
306878.85
441829.35
601304.85
785305.35
993830.85
1226881.35
1484456.85
1766557.35
2073182.85
2404333.35
2760008.85
3140209.35
3544934.85
3974185.35
4427960.85


If instead we used the following code, we would only get the last value of $y$.

In [16]:
# now call the equation for each time by using a for loop:
for i in range(0,len(t)):   
    y = 0.5*g*t[i]**2+v_0*t[i]+y_0

print(y)
print('so we see we only get one value of y if we print outside the for loop')

print()
# though we should use the following for our print statement:
print('Note: the correct way to write this would be something like:')
print('The position of body in free fall at t=950s is {:.2e} m'.format(y)) 

4427960.85
so we see we only get one value of y if we print outside the for loop

Note: the correct way to write this would be something like:
The position of body in free fall at t=950s is 4.43e+06 m


One way to get around this issue (where you can only get the values of $y$ for all $t$ inside the for loop and not outside) is to use the list `append` method.

In [17]:
y = [] # tell the code that y will have more than one value

for i in range(0,len(t)):   
    y.append(0.5*g*t[i]**2+v_0*t[i]+y_0)

print(y)

[1.35, 12326.85, 49177.35, 110552.85, 196453.35, 306878.85, 441829.35, 601304.85, 785305.35, 993830.85, 1226881.35, 1484456.85, 1766557.35, 2073182.85, 2404333.35, 2760008.85, 3140209.35, 3544934.85, 3974185.35, 4427960.85]


*Yay - we have basically saved all the values of y into one list.*

A much simpler alternative way of calculating $y$ for every value of $t$, without having to write out the for loop over each $t$ value, is shown below. This is called a list comprehension, and it does the job in one line:

In [18]:
y = [0.5*g*x**2+v_0*x+y_0 for x in t]

print(y)

[1.35, 12326.85, 49177.35, 110552.85, 196453.35, 306878.85, 441829.35, 601304.85, 785305.35, 993830.85, 1226881.35, 1484456.85, 1766557.35, 2073182.85, 2404333.35, 2760008.85, 3140209.35, 3544934.85, 3974185.35, 4427960.85]


Now there may be times when you wish to take your list of $y$ values and do some mathematical operations on them, for example what is $y^2$?  

#### <div class="warn">Example </div>

<div class="warn"> Calculate the square of the position of a body in free fall.   
</div>

Naively we might think we can simply just multiply all the values in our $y$ list by itself.

In [19]:
print(y*y)

TypeError: can't multiply sequence by non-int of type 'list'

Obviously this has not worked. This is because `y` is a list, and we can't multiply lists together this way (we'd have to do another for loop). We can convert our Python list into a NumPy array using `numpy` and then do mathematical calculations with our data - let's say $y^2$ for example:

In [None]:
y_arr = np.array(y)
y_sq = y_arr*y_arr
print(y_sq)

To find out if you have a list or an array you can use the `type` command.  We will return to `numpy` arrays later.

#### <div class="warn">Example </div>

<div class="warn"> Find out the types of the following `a=1`, `b=1.0`, and `c="1.0"`. 
</div>

In [None]:
a=1; b=1.0; c='1.0' 

print(type(a))
print(type(b))
print(type(c))

So what is the type of our data $y (t)$?

In [None]:
print(type(y))

It's a list!

***

## If Statements

If statements allow us to have conditions. For example, if the speed of the falling body goes above a critical speed $v_{\rm crit}$, then perhaps a different equation applies.
   

#### <div class="warn">Example </div>

<div class="warn">You are asked whether you have any symptoms of a fever or a continuous cough. Use Python `if` statements to print an appropriate message if the answer to this question is yes, no, or not sure.
</div>

In [None]:
your_symptoms = "ill"

if your_symptoms == 'ill':
    print('You are '+ your_symptoms + ' - You need to go home and rest in isolation.')
elif your_symptoms == 'not ill':
    print('You are '+ your_symptoms +' - OK, take care of yourself.')
elif your_symptoms == 'not sure':
    print('You are '+ your_symptoms + ' - Best to be safe and go home and rest.')
else: 
    print('I am sorry, I do not know what to say.')

Note that here we are making use of strings `" "` instead of numbers.

#### <div class="warn">Example </div>

<div class="warn">Let's suppose we want to estimate the position of a body falling in free fall $y$ only when $t \leq 100$s. At any other times, we set it to equal to zero (ie they hit the ground at 100s). We can do this using Python `if` statements.
</div>

In [None]:
y = []

for i in range(0,len(t)):
    if (t[i] <= 100):
        y.append(0.5*g*t[i]**2+v_0*t[i]+y_0)
    else:
        y.append(0.)

print(y)

In these examples you will see that Python uses `==` to test for equality, to distinguish it from the single `=` used to assign variables, as in `a = [1,2,3,4,5]`. We can also use `!=`, `<`, `<=`, `>=` and `>` in Python code.
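A brief sketch of the difference between assigning and comparing (the value of `x` and the variable names are arbitrary choices for this example):

```python
a = [1, 2, 3, 4, 5]   # a single = assigns a value to a name
x = 3

is_three = (x == 3)            # == asks a question: is x equal to 3?
at_most_two = (x <= 2)         # is x less than or equal to 2?
not_four = (x != 4)            # is x different from 4?
big_list = (len(a) > 4)        # comparisons work on any expression

print(is_three)      # True
print(at_most_two)   # False
print(not_four)      # True
print(big_list)      # True
```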

***

## Arrays


In science we often have data $x_0, x_1, x_2, ..., x_{n-1}$ and related $y_0, y_1, y_2, ..., y_{n-1}$.  We can use a Python list `a` for the $x$ values, with elements `a[0]`, `a[1]`, `a[2]`, ..., `a[n-1]`.  Python lists can contain any type of Python object, but sometimes we want one to contain numbers only. We can use arrays, which can be multidimensional (e.g. a 3x3 array instead of the 1D list). NumPy arrays hold objects of the same type and can make your code faster if you use them.

#### <div class="warn">Example </div>

<div class="warn"> Convert the $y(t)$ values from above (which are currently in a Python list) into an array and check it has worked. 
</div>

In [None]:
y = np.array(y)

print(type(y))

It works! We have used `numpy` to convert our list of values into an array that can be manipulated mathematically. 

Just as we used `range` earlier to make a sequence of time values, we can use `np.arange()` to make an array of time values. Examples are below:

In [None]:
a = np.arange(10) # make an array with 10 values (produces integers)

print(a)

In [None]:
a = np.arange(0,10,1) # from 0 up to (but not including) 10 in steps of 1 (produces integers)
print('a=',a)


b = np.arange(0,10,0.5)  # from 0 up to (but not including) 10 in steps of 0.5 (produces floats)
print('b=',b)



How can we check the types of the numbers in a and b? Are they integers or floats? If we use `type(a)` this gives us the type of a (an array), so we need to pick out one of the numbers in the array and check its type. We will do this using the 1st value in the array, `a[0]`.

In [None]:
# what is the type of the 1st number in a 
# what is the type of the 1st number in b 
print('The type of a[0] is', type(a[0]), '-- is it an integer? Yes.')
print('The type of b[0] is', type(b[0]), '-- is it a float? Yes.')

#### Slicing data, getting subsets of data

Below are some examples of slicing data samples, where `a` is an array of data with length `len(a)`, a starting data point `a[0]` and an end data point `a[len(a)-1]`.  We call the starting and ending data points we wish to pull out of the array `start` and `stop` in the examples below.

`a[start:stop]`  ----  items start through stop-1

`a[start:] ` ---- items start through the rest of the array

`a[:stop]`   ---- items from the beginning through stop-1

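`a[:]  `    ---- the whole array (for a list this is a copy; for a `numpy` array it is a view of the same data, so use `a.copy()` if you need an independent copy)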

`a[start:stop:step]` ---- can have a step size to pull out, say, every 100th point from the array
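The slicing patterns above can be sketched on a small array. The array of ten integers and the variable names are assumptions made for this example; the last few lines show the difference between a view and a copy of a `numpy` array.

```python
import numpy as np

a = np.arange(10)        # the integers 0 to 9

middle = a[2:5]          # items 2 through 4
tail = a[7:]             # items 7 through the end
head = a[:3]             # items from the beginning through 2
every_third = a[::3]     # every 3rd item across the whole array

print(middle)            # [2 3 4]
print(tail)              # [7 8 9]
print(head)              # [0 1 2]
print(every_third)       # [0 3 6 9]

# a[:] of a numpy array is a view: it shares its data with a
view = a[:]
independent = a.copy()   # copy() makes a separate array
a[0] = 99
print(view[0])           # 99, the view sees the change
print(independent[0])    # 0, the copy does not
```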

#### <div class="tip">Tip</div>

You can use `numpy` to get the min, max, mean and standard deviation of your arrays using `np.min(a)`, `np.max(a)`, `np.mean(a)` and `np.std(a)`.
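Here is a minimal sketch of these summary statistics on a made-up set of eight values. One detail worth knowing: by default `np.std` divides by the number of points $N$; passing `ddof=1` divides by $N-1$ instead, which gives the sample standard deviation.

```python
import numpy as np

# made-up data, chosen so the answers are round numbers
a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

a_min = np.min(a)
a_max = np.max(a)
a_mean = np.mean(a)
a_std = np.std(a)                  # divides by N (the default)
a_std_sample = np.std(a, ddof=1)   # divides by N-1

print('min = {:.1f}, max = {:.1f}'.format(a_min, a_max))
print('mean = {:.1f}'.format(a_mean))
print('std (N) = {:.3f}, std (N-1) = {:.3f}'.format(a_std, a_std_sample))
```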

***

## Equations and functions

If you have to use the same function a few times, or like tidy code, it's easiest to define a function.  Here is an example function for $y=x^2+2$.

In [None]:
def my_equation(x):
    return x**2+2

In [None]:
a = 3.4
b = my_equation(a)

print(b)

In [None]:
print('for a = 3.4, the value of b is {:.2f}'.format(b)) # no units given in question.

#### <div class="warn">Example </div>

<div class="warn">Write a function for the position of a body falling in free fall for time $t=30$s. The initial speed and position are 3m/s and 1m.
</div>

In [None]:
def position(time,accel,v_init,y_init):
    value = 0.5*accel*time**2+v_init*time+y_init
    return value

t = 30.
g = 9.81
v_0 = 3.
y_0 = 1.

y = position(t,g,v_0,y_0)

print('at t =  30s, the value of position is {:.2f} m'.format(y))

#### <div class="warn">Example </div>

<div class="warn">Now get values of the position of a body falling in free fall for times from $t=0$s to $t=100s$ in steps of 10s. The initial speed and position are 3m/s and 1m.
</div>

In [None]:
t = np.arange(0,101,10) # set up time values; the stop value is excluded, so use 101 to include t=100

y = position(t,g,v_0,y_0)

print('time in secs',t)
print('position in m',y)

Now we can use a for loop to print this out nicely so that we have $t$ and $y$ values in columns:

In [None]:
for i in range(0,len(t)):
    y = position(t[i],g,v_0,y_0)
    print(t[i],y)

***

## Importing data files

#### <div class="warn">Example </div>

<div class="warn">Read in the data file DataAnalysis_testfile.dat.
</div>

In [None]:
import numpy as np

data = np.genfromtxt('DataAnalysis_testfile.dat')

The data file must be in the same directory as the notebook, or you will need to tell the notebook where to look for the file.   In this example we use the `numpy` package (shortened to `np`) and call the `genfromtxt` tool to read the data file. The first thing I would do is print out the data to see what it looks like.

In [None]:
print(data)

So we can see we have 3 columns of data.  Let's check if there are column headings:

In [None]:
named_data = np.genfromtxt('DataAnalysis_testfile.dat',names=True)
print(named_data.dtype.names) # this gives us our column names from the data file
# a separate variable is used so that data stays a plain 2-D array for the cells below

It looks like our data file contains $x$, $y$ data and error bars on the $y$ data.

#### <div class="tip">Tip for looking at your data </div>

We can pull out some numbers and take a look at our data.  We'll look at the first value, the number of data points we have, the second row and the first column:

In [None]:
# print first value in first column
print('first value in first column:')
print(data[0,0])
print()

# print length of data
print('number of datapoints:')
print(len(data))
print()

#Print second row
print('all of the second row:')
print(data[1,:])
print()

# print first column
print('all of the first column:')
print(data[:,0])
print()

Sometimes an error occurs with `np.genfromtxt`. If this happens, you might want to skip the first row (often in data files there are words in the first row, known as the header information). To get around this you can use:

In [None]:
data = np.genfromtxt('DataAnalysis_testfile.dat',skip_header=1)

Sometimes you need to specify the delimiter in the data file (tab/space/comma). In this case it looks like:

In [None]:
data = np.genfromtxt('DataAnalysis_testfile.dat',delimiter=' ')
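To try these options without needing the real file, the sketch below reads from a small block of text instead. The column names and numbers in `text` are invented purely for illustration and are not the contents of DataAnalysis_testfile.dat; `io.StringIO` simply lets `np.genfromtxt` treat the text as if it were a file.

```python
import io
import numpy as np

# hypothetical file contents, invented for this example only
text = """x y y_error
1.0 2.1 0.3
2.0 3.9 0.4
3.0 6.2 0.5
"""

# names=True takes the column names from the first line
named = np.genfromtxt(io.StringIO(text), names=True)
print(named.dtype.names)   # the column names
print(named['y'])          # one column, selected by name

# skip_header=1 ignores the first line and gives a plain 2-D array
plain = np.genfromtxt(io.StringIO(text), skip_header=1)
print(plain.shape)         # (rows, columns)
print(plain[:, 0])         # first column, selected by position
```

The named version is convenient when the file has a header row, while the plain 2-D version is the one that supports indexing such as `plain[0, 0]`.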

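Now if we have a header row in our data file and we want to know what the columns are, we can use the `names=True` option of `np.genfromtxt` and print `dtype.names` of the result, as in the column-heading check earlier in this section.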



***

## Plotting data

The most commonly used library for plotting scientific data in Python is matplotlib, which can be used by importing the `pylab` module.  There is a wealth of [online examples](https://matplotlib.org/gallery.html) for making plots, some of publication quality and some more simple. In this course, we don't need to spend time on making publication-quality plots, but rather on making scientific data appear clear, i.e. labelled axes, large enough font size, legends.

Plotting in Python is very intuitive: we basically use `plt.plot(x,y)`, where x and y are lists of numbers or arrays. You can call `plt.plot(x2,y2)` again to add more lines/data points to the same plot.  We can put our plots on log scales if we wish using `plt.loglog(x,y)` or `plt.semilogy(x,y)`. We can show scatter plots (data points) using `plt.scatter(x,y)`.  Histograms can be drawn using `plt.hist(x,bins)` and so on.

#### <div class="warn">Example </div>

<div class="warn">Plot $y=sin(x)$ for $x$ ranging from 0 to $6\pi$ in steps of $0.1\pi$. 
</div>

In [None]:
x = np.arange(0,6*np.pi,0.1*np.pi)
plt.plot(x,np.sin(x))
plt.xlabel('x')
plt.ylabel('y')

We can do more advanced stuff, such as 

In [None]:
plt.plot(x,np.sin(x),label='sin(x)',lw=2) #lw = thickness of line, label = the name of curve fn or datafile
plt.plot(x,np.cos(x),label='cos(x)',lw=2,c='magenta',alpha=0.6) # c=colour,alpha=transparency
plt.xlabel('x',fontsize=16)
plt.ylabel('y(x)',fontsize=16)
plt.legend(loc='upper right')  # makes the legend
plt.title('The variation of sin(x) and cos(x)',fontsize=16) # add a title

We can also limit the range of $x$ and $y$ plotted using `plt.xlim(0,15)` and `plt.ylim(-1,1)` for example. This is useful for zooming in on areas of the plot.

#### <div class="warn">Example </div>

<div class="warn">Plot $y={\rm exp}\left(\dfrac{-x}{5}\right)$
</div>

In [None]:
import numpy as np
import pylab as plt # for plotting
# the line below makes the plot appear in the jupyter notebook
%matplotlib inline  

# define function to calculate y
def func(x):
    return np.exp(-x / 5.0)

t = np.arange(0.01, 20.0, 0.01)
plt.plot(t, func(t),c='magenta',lw=2)
plt.xlabel('x (units)',fontsize=16)
plt.ylabel('y (units)',fontsize=16)


#### <div class="warn">Example </div>

<div class="warn">Plot $y={\rm exp}\left(\dfrac{-x}{5}\right)$ on different logscales.
</div>

In [None]:
plt.subplot(221) # makes a row=2 x column=2 grid of plots, 3rd number is index of each subplot 1-4
plt.plot(t, func(t),c='magenta',lw=2,label='linear scale')
plt.legend(loc='upper right')

plt.subplot(222)
plt.semilogy(t, func(t),label='semilog y')
plt.legend(loc='lower left')

plt.subplot(223)
plt.semilogx(t, func(t),label = 'semilog x')
plt.legend(loc='lower left')

plt.subplot(224)
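plt.loglog(t,func(t),label='log base 2')
plt.xscale('log',base=2) # base 2 on the x axis; matplotlib versions before 3.3 used basex=2 in loglog instead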
plt.legend(loc='lower left')

#### <div class="warn">Example </div>

<div class="warn">Plot the data in the data file DataAnalysis_testfile.dat.
</div>

In [None]:
data = np.genfromtxt('DataAnalysis_testfile.dat',names=True)
plt.scatter(data['x'],data['y'])
plt.xlabel('x',fontsize=16)
plt.ylabel('y',fontsize=16)
plt.title('Comparison of data $x$ and $y$ from DataAnalysis_testfile.dat')

We could also plot the above if we don't have any names for columns, see below:

In [None]:
data = np.genfromtxt('DataAnalysis_testfile.dat')
plt.scatter(data[:,0],data[:,1])

### Error bars

Often in physics, we have errors in our measurements. These can be easily added using the `errorbar` function in `matplotlib`.

In [None]:
data = np.genfromtxt('DataAnalysis_testfile.dat',names=True)
plt.errorbar(data['x'],data['y'],yerr=data['y_error'],fmt='o')
plt.xlabel('x',fontsize=16)
plt.ylabel('y',fontsize=16)
plt.title('Comparison of data $x$ and $y$ from DataAnalysis_testfile.dat with errorbars added',fontsize=16)

#### <div class="tip">Tip</div>

All of the plotting options that can be used in matplotlib can be found [here](https://matplotlib.org/3.3.0/api/pyplot_summary.html).

***

### <div class="tip">General Tips</div>

* Don't forget to save your notebook regularly.

* Use spaces and empty lines to make your program clearer.

* Add comments to your code so you know why you're doing something.

* Use sensible names for variables and functions, choose names you would be able to understand if you came back to your code a year later.

* With equations and calculations, use same variables as your equations in the notes/lectures so you will be able to directly compare.  Do all of the calculation in the python cells, don't do part of it in your head/in your notes.

* Always have a markdown box before and after code cells explaining what you're going to do, and discussing what the results show.

* Don't forget to click run to get cells to compile, and run all before submitting assessed work.

Now you are ready to tackle the **Chapter 1 quiz** on Learning Central and the **Chapter 1 Introduction to Python for Data Analysis-Your-Turn** notebook.

***