# The QSMSC computing course presents...
![Python logo](python-logo.png)

## Last time we covered...
    
### Printing

In [None]:
print("Hello world")
print(5)
print("You can print", "more than", 1, "thing at a time")

### Variables

In [None]:
a = 12
my_variable = "a string"
print(a, my_variable)

### Maths

In [None]:
add = 4 + 5
sub = 6 - 4
mul = 2.5 * 2
div = 9 / 2
mod = 3 % 2
exp = 4 ** 2
who = 9 // 2

print('4 + 5 =', add)
print('6 - 4 =', sub)
print('2.5 * 2 =', mul)
print('9 / 2 =', div)
print('3 % 2 =', mod)
print('4 ** 2 =', exp)
print('9 // 2 =', who)

### Lists

In [None]:
# making a list
vegetables = ['potato', 'carrot']
print('vegetables = ', vegetables)

# appending to the list
vegetables.append('turnip')
print('after appending, vegetables =', vegetables)

# accessing items
print('the first item is', vegetables[0])
print('the last item is', vegetables[-1])

# taking an item out of the list
vegetables.pop(1)
print('after pop(1), vegetables = ', vegetables)

### The `if` statement

In [None]:
my_variable = 5
my_other_variable = 2

if my_variable >= 4:
    print('my_variable is >= 4')
else:
    print('my_variable is not >= 4')

In [None]:
my_variable = 5
my_other_variable = 2

if (my_variable < 3) and (my_other_variable >= 4):
    print('both conditions True in the second if statement')
else:
    print('one or other condition was False in the second if statement')

### The `for` loop

In [None]:
# range lets us loop over a sequence of numbers
for i in range(10):
    print(i)

In [None]:
# you can also use a for loop on lists
for country in ['France', 'Spain', 'Italy']:
    print(country)

### The `while` loop

In [None]:
x = 0
while x < 10:
    print(x)
    x = x + 2

### Defining functions

In [None]:
def say_hello(name):
    print('Hello', name)
    
say_hello('Jon')

In [None]:
def add_two_numbers(x, y):
    return x + y

add_two_numbers(4, 3)

## This week

This week we're going to quickly go over another type, the [dictionary](https://docs.python.org/2/tutorial/datastructures.html#dictionaries).  Then we're going to solve an example problem using Python's `csv` library to read and analyse a CSV file of data.


## The dictionary

<img alt="Safe deposit boxes" src="safe_deposit_sm.jpg" style="width: 400px;">

Dictionaries are a little like the safe deposit boxes at a bank.  They hold a collection of items and you need a key to get each item.  This is very useful for collecting related values together into records, e.g. a patient's name, age and gender.

You use the [`dict()`](https://docs.python.org/2/library/stdtypes.html#dict) function to create a dictionary but there's a useful shortcut:

    my_dictionary = {
        'key1': 'value1',
        'key2': 'value2'
    }
    
The values in a dictionary can be any Python type and you can mix them, so a dict can contain e.g. strings, numbers, lists and even other dictionaries.  The keys must be immutable (this means that they can't change).  Immutable types include strings and numbers; lists and dictionaries are not immutable, so they can't be used as keys.  You access the values in a dictionary using square brackets:

    my_dictionary['key1']

In [None]:
# make a dictionary describing a single patient
patient = {
    'name': 'Alice',
    'age': 32,
    'symptoms': ['nausea', 'headache']
}

print(patient['name'])
print(patient['age'])
print(patient['symptoms'])

In [None]:
# make a dictionary with patient IDs as keys and another dictionary as the values
all_patients = {
    10001: {'name': 'Barry', 'age': 20, 'symptoms': ['headache']},
    22003: {'name': 'Emma', 'age': 41, 'symptoms': ['headache', 'nausea']},
    32004: {'name': 'Emma', 'age': 41, 'symptoms': ['ataxia']},
}

print(all_patients[10001])
print(all_patients[22003])
print(all_patients[32004])
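
The points above about keys and square brackets can be sketched a little further.  This is a minimal illustration rather than part of the original exercise: the `gender` field, the `height` key and the `visits` dictionary are made up for the example, and it uses a tuple (an immutable sequence written with round brackets) to show a key type that does work.

```python
# a patient record like the one above
patient = {
    'name': 'Alice',
    'age': 32,
    'symptoms': ['nausea', 'headache']
}

# square brackets also let you change a value or add a new key
patient['age'] = 33
patient['gender'] = 'F'   # hypothetical extra field for this example

# `in` checks whether a key exists
print('gender' in patient)
print('height' in patient)

# get() returns a default instead of raising an error when the key is missing
print(patient.get('height', 'not recorded'))

# a list can't be used as a key because lists are mutable
visits = {}
try:
    visits[['02', '001']] = 'baseline'
except TypeError as error:
    print('could not use a list as a key:', error)

# a tuple is immutable, so it works as a key
visits[('02', '001')] = 'baseline'
print(visits)
```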

**Try this out yourself in the box below.**

## Imports

<img alt="XKCD Python" src="xkcd_python.png" style="width: 500px;">

A lot of Python's power comes from its extensive [standard library](https://docs.python.org/3/library/index.html) (things included when you download it) and the large number of [third party libraries](https://pypi.python.org/pypi) available.

You can access the functions in a library using the [`import`](https://docs.python.org/3/reference/import.html) statement.  To import a whole library use `import foo` (where `foo` is the name of the library you want).  You can also import just a part of a library using `import foo.bar`.  To use the function `hello()` from the `foo` library you can then do: `foo.hello()`.

In [None]:
# In this example we import the math library from the standard library and then call the factorial function

import math

math.factorial(5)
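
The `import foo.bar` form can be sketched with the standard library's `os.path`.  The file path here is only an illustrative string (nothing is read from disk), and the `as` shortcut at the end is an extra not covered above.

```python
# import just the path part of the os library
import os.path

# the full dotted name is needed to call its functions
print(os.path.basename('data/example_data.csv'))

# `as` gives a library a shorter name within your code
import math as m

print(m.sqrt(16))
```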

It's also possible to just import a single function from a library.  If you use this method you can access the function without using the library name.  This can make your code more compact but it can become hard to know where a function came from if you overuse this.

In [None]:
# We'll import just the shuffle() function from the random library

from random import shuffle

x = [1, 2, 3, 4]
shuffle(x)
print(x)

Third party libraries must be installed before you can use them.  The tool to do this is called [pip](https://pip.pypa.io/en/stable/).  You need your own installation of Python before you can try this for yourself ([Anaconda](https://www.continuum.io/downloads) is probably the easiest way to get going).  We might have time to cover this in a later session but please ask if you need any help trying this before then.

## Working with CSV files

We're going to work through an example of extracting some data from a CSV file.  In the Jupyter Notebook you can run external commands by preceding the command name with an `!`.  Let's use the Unix `head` utility to have a look at the start of the file.

In [7]:
!head example_data.csv

site,subject,timepoint,dob,brain_vol,lesion_vol
02,001,baseline,1983-05-16,1678.97,14.46
02,001,week_12,1976-03-19,1252.54,163.27
02,001,week_24,1988-10-11,1518.09,90.54
02,002,baseline,1973-07-09,1451.16,35.29
02,002,week_12,1983-07-21,1406.57,185.77
02,002,week_24,1973-01-15,1551.55,133.57
02,003,baseline,1988-07-21,1666.17,78.20
02,003,week_12,1989-07-12,1406.01,83.15
02,003,week_24,1980-04-02,1654.53,116.87


### Reading the file with the csv library

In [9]:
import csv

csv_file = open('example_data.csv', 'r')
csv_reader = csv.reader(csv_file)
for r in csv_reader:
    print(r)

['site', 'subject', 'timepoint', 'dob', 'brain_vol', 'lesion_vol']
['02', '001', 'baseline', '1983-05-16', '1678.97', '14.46']
['02', '001', 'week_12', '1976-03-19', '1252.54', '163.27']
['02', '001', 'week_24', '1988-10-11', '1518.09', '90.54']
['02', '002', 'baseline', '1973-07-09', '1451.16', '35.29']
['02', '002', 'week_12', '1983-07-21', '1406.57', '185.77']
['02', '002', 'week_24', '1973-01-15', '1551.55', '133.57']
['02', '003', 'baseline', '1988-07-21', '1666.17', '78.20']
['02', '003', 'week_12', '1989-07-12', '1406.01', '83.15']
['02', '003', 'week_24', '1980-04-02', '1654.53', '116.87']
['02', '005', 'baseline', '1987-05-08', '1658.24', '63.46']
['02', '005', 'week_12', '1982-02-20', '1585.39', '56.18']
['02', '005', 'week_24', '1974-11-02', '1407.46', '94.47']
['03', '001', 'baseline', '1971-05-05', '1365.18', '73.98']
['03', '001', 'week_12', '1978-01-01', '1318.63', '17.54']
['03', '001', 'week_24', '1976-01-06', '1519.73', '106.48']
['03', '002', 'baseline', '1978-03-12'
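
Notice that the first row printed is the header, not data.  One possible way to handle that, sketched below, is to take the header off with `next()` before looping.  The sketch also uses a `with` block so the file is closed automatically.  To keep it runnable on its own, the first few rows of the file are held in a string (`sample_data`, a name chosen for this example) instead of being read from disk.

```python
import csv
import io

# the first rows of example_data.csv, held in memory so the example runs without the file
sample_data = """site,subject,timepoint,dob,brain_vol,lesion_vol
02,001,baseline,1983-05-16,1678.97,14.46
02,001,week_12,1976-03-19,1252.54,163.27
02,001,week_24,1988-10-11,1518.09,90.54
"""

# `with` closes the file automatically at the end of the block
# (for the real file you would use open('example_data.csv', 'r', newline='') here)
with io.StringIO(sample_data) as csv_file:
    csv_reader = csv.reader(csv_file)
    header = next(csv_reader)   # take the first row off the reader
    rows = []
    for r in csv_reader:
        rows.append(r)

print('columns:', header)
print('number of data rows:', len(rows))
print('first timepoint:', rows[0][2])
```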

### Calculating the mean brain volume

In [17]:
import csv

total_brain_vol = 0.0
num_brains = 0

csv_file = open('example_data.csv', 'r')
csv_reader = csv.DictReader(csv_file)
for r in csv_reader:
    brain_vol_str = r['brain_vol']
    brain_vol_float = float(brain_vol_str)
    total_brain_vol += brain_vol_float
    num_brains += 1
mean_brain_vol = total_brain_vol / num_brains
print('mean brain volume:', mean_brain_vol)

mean brain volume: 1457.4947916666667


### Calculating the mean brain volume for site 03

In [18]:
import csv

total_brain_vol = 0.0
num_brains = 0

csv_file = open('example_data.csv', 'r')
csv_reader = csv.DictReader(csv_file)
for r in csv_reader:
    if r['site'] != '03':
        continue
    brain_vol_str = r['brain_vol']
    brain_vol_float = float(brain_vol_str)
    total_brain_vol += brain_vol_float
    num_brains += 1
mean_brain_vol = total_brain_vol / num_brains
print('mean brain volume:', mean_brain_vol)

mean brain volume: 1442.3575
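
Filtering works well for one site, but dictionaries offer a way to get the mean for every site in a single pass.  The following is a rough sketch under a few assumptions: it uses only four rows copied from the file (held in the string `sample_data`), and the names `totals`, `counts` and `site_means` are invented for this example.

```python
import csv
import io

# a few rows copied from example_data.csv so the example runs without the file
sample_data = """site,subject,timepoint,dob,brain_vol,lesion_vol
02,001,baseline,1983-05-16,1678.97,14.46
02,001,week_12,1976-03-19,1252.54,163.27
03,001,baseline,1971-05-05,1365.18,73.98
03,001,week_12,1978-01-01,1318.63,17.54
"""

totals = {}   # site -> sum of brain volumes
counts = {}   # site -> number of rows

csv_reader = csv.DictReader(io.StringIO(sample_data))
for r in csv_reader:
    site = r['site']
    # the first time we see a site, start its total and count at zero
    if site not in totals:
        totals[site] = 0.0
        counts[site] = 0
    totals[site] += float(r['brain_vol'])
    counts[site] += 1

# looping over a dictionary gives you its keys
site_means = {}
for site in totals:
    site_means[site] = totals[site] / counts[site]
    print('site', site, 'mean brain volume:', site_means[site])
```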


### Removing the date of birth

In [19]:
import csv

csv_file = open('example_data.csv', 'r')
csv_reader = csv.DictReader(csv_file)
for r in csv_reader:
    # print every column except dob, separated by commas
    print(r['site'], r['subject'], r['timepoint'], r['brain_vol'], r['lesion_vol'], sep=',')
csv_file.close()

02,001,baseline,1678.97,14.46
02,001,week_12,1252.54,163.27
02,001,week_24,1518.09,90.54
02,002,baseline,1451.16,35.29
02,002,week_12,1406.57,185.77
02,002,week_24,1551.55,133.57
02,003,baseline,1666.17,78.20
02,003,week_12,1406.01,83.15
02,003,week_24,1654.53,116.87
02,005,baseline,1658.24,63.46
02,005,week_12,1585.39,56.18
02,005,week_24,1407.46,94.47
03,001,baseline,1365.18,73.98
03,001,week_12,1318.63,17.54
03,001,week_24,1519.73,106.48
03,002,baseline,1264.60,140.90
03,002,week_12,1568.30,66.12
03,002,week_24,1418.35,143.30
03,003,baseline,1662.61,51.98
03,003,week_12,1662.12,101.27
03,003,week_24,1238.56,169.19
03,005,baseline,1471.65,191.47
03,005,week_12,1202.63,98.45
03,005,week_24,1615.93,20.72
04,001,baseline,1680.93,169.55
04,001,week_12,1440.62,159.16
04,001,week_24,1393.31,91.01
04,002,baseline,1548.13,141.07
04,002,week_12,1551.23,119.59
04,002,week_24,1392.54,24.38
04,003,baseline,1583.70,190.51
04,003,week_12,1350.02,117.40
04,003,week_24,1590.40,158.17
04,005,baseline
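
Printing drops the header row and only shows the result on screen.  If you want to produce a new CSV instead, one option is `csv.DictWriter`, sketched below.  The assumptions here are that two sample rows stand in for the real file and that an in-memory `io.StringIO` stands in for the output file; `columns_to_keep` and `output_lines` are names chosen for this example.

```python
import csv
import io

# two rows copied from example_data.csv so the example runs without the file
sample_data = """site,subject,timepoint,dob,brain_vol,lesion_vol
02,001,baseline,1983-05-16,1678.97,14.46
02,001,week_12,1976-03-19,1252.54,163.27
"""

columns_to_keep = ['site', 'subject', 'timepoint', 'brain_vol', 'lesion_vol']

# in-memory stand-in for an output file opened with open(..., 'w', newline='')
output = io.StringIO()

csv_reader = csv.DictReader(io.StringIO(sample_data))
# extrasaction='ignore' tells the writer to skip keys (here 'dob') that aren't in fieldnames
csv_writer = csv.DictWriter(output, fieldnames=columns_to_keep, extrasaction='ignore')
csv_writer.writeheader()
for r in csv_reader:
    csv_writer.writerow(r)

output_lines = output.getvalue().splitlines()
for line in output_lines:
    print(line)
```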

## A much easier way using Pandas

[Pandas](http://pandas.pydata.org/) is a third-party library for data analysis in Python.  It includes functions for reading many types of data and performing analysis.  It can make some of what we did above a lot easier.
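
As a rough sketch of what that might look like (assuming Pandas is installed), the example below reads the nine rows shown by `head` earlier from a string and repeats two of the tasks from the previous section.  With the real file you would pass `'example_data.csv'` to `read_csv` instead of the `io.StringIO` stand-in, and the mean would then cover the whole file, not just these rows.

```python
import io

import pandas as pd

# the rows shown by `head` earlier, held in memory so the example runs without the file
sample_data = """site,subject,timepoint,dob,brain_vol,lesion_vol
02,001,baseline,1983-05-16,1678.97,14.46
02,001,week_12,1976-03-19,1252.54,163.27
02,001,week_24,1988-10-11,1518.09,90.54
02,002,baseline,1973-07-09,1451.16,35.29
02,002,week_12,1983-07-21,1406.57,185.77
02,002,week_24,1973-01-15,1551.55,133.57
02,003,baseline,1988-07-21,1666.17,78.20
02,003,week_12,1989-07-12,1406.01,83.15
02,003,week_24,1980-04-02,1654.53,116.87
"""

# read site and subject as strings so the leading zeros ('02', '001') are kept
data = pd.read_csv(io.StringIO(sample_data), dtype={'site': str, 'subject': str})

# mean brain volume: the numeric column is converted to floats for us
mean_brain_vol = data['brain_vol'].mean()
print('mean brain volume:', mean_brain_vol)

# removing the date of birth is a single call
without_dob = data.drop(columns=['dob'])
print(without_dob.head())
```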

### Image credits
Safe deposit boxes: Public domain

Python comic: [XKCD](https://xkcd.com/353/) Creative Commons Attribution-NonCommercial 2.5 License