
<img src = 'https://i.imgur.com/t9igjIU.png' width = "200" height = "" >




# Week 2: A whirlwind tour of python

Here we will go quickly through some basic python expressions and touch briefly on some of the main packages for scientific computing. When you see a stop sign, let me know and we can discuss. 

## Assignment, types & lists

Here we create the variable x and assign it a value of 1.

In [None]:
x=1

We can access the type as follows

In [None]:
type(x)

It is an integer.

Note that python allows us to assign multiple values to multiple variables in one line, as follows.

In [None]:
a,b,c=1,2,3

We can assign values to a list, as follows

In [None]:
mylist=[a,b,c,4,5]
print(mylist)

We can access the length of the list with the len function

In [None]:
len(mylist)

We can access the first item of the list and last item of the list as follows. Note that indexing starts at 0 in python and [-1] is a smart way to access the last element.

In [None]:
mylist[0], mylist[-1]

We could equally do the following to get the last element, although [-1] doesn't require us to know the number of elements in the list.

In [None]:
mylist[4]
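
As a minimal sketch of the point above: negative indices count back from the end of a list, so `[-1]` and `[len(...) - 1]` refer to the same element. The list `letters` is a made-up example, not part of the data used elsewhere in this notebook.

```python
# Hypothetical example list, used only for illustration
letters = ['a', 'b', 'c', 'd', 'e']

last_by_negative_index = letters[-1]          # counts back from the end
last_by_length = letters[len(letters) - 1]    # requires knowing the length
second_to_last = letters[-2]                  # -2 is the element before the last

print(last_by_negative_index, last_by_length, second_to_last)
```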

We can access the attributes of an object like this with dir()

In [None]:
dir(mylist)

For instance, let's see what the .append method does.

In [None]:
help(mylist.append)

Here, for instance, we append the string 'A' to the end of the list. This illustrates that a single list can contain values of different types.

In [None]:
mylist.append('A')
mylist

Multiplying a list by an integer N (with the * operator) repeats the list N times.

In [None]:
mylist*5

<img src = 'https://upload.wikimedia.org/wikipedia/commons/8/81/Stop_sign.png' width = "200" height = "" >

## For loops, list comprehension and enumerate.



Let's say that I want to add each element of the following list of strings to the second string and return the result in a new list.

In [None]:
list_of_strings=['For breakfast I eat','For lunch I eat', 'For dinner I eat']
string2=' BACON'

I could do so like this

In [None]:
mynewlist=[] # This makes an empty list

for cstring in list_of_strings:
  mynewlist.append(cstring+string2)# + is used to join strings.
mynewlist

Python also has something really handy called list comprehension, which allows us to do this all in one line.

In [None]:
mynewlist=[cstring+string2 for cstring in list_of_strings]
mynewlist

Sometimes it is useful to access both an element within a list and the index of that element. For instance, here we use enumerate to return both the index within the list and the corresponding element.

In [None]:
for index,element in enumerate(list_of_strings):
  print(index,element)

In [None]:
times=[' at 9 oclock',' at midday',' at 7PM']

mynewlist=[element+string2+times[index] for index,element in enumerate(list_of_strings)]
mynewlist
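
To make the comprehension above easier to read, here is a sketch of the same operation written as an explicit loop. The inputs are repeated so that the block runs on its own, and the names `looped` and `comprehended` are only for illustration.

```python
# Same inputs as above, repeated so the block runs on its own
list_of_strings = ['For breakfast I eat', 'For lunch I eat', 'For dinner I eat']
string2 = ' BACON'
times = [' at 9 oclock', ' at midday', ' at 7PM']

# Explicit loop: enumerate yields (index, element) pairs
looped = []
for index, element in enumerate(list_of_strings):
    looped.append(element + string2 + times[index])

# Equivalent list comprehension
comprehended = [element + string2 + times[index]
                for index, element in enumerate(list_of_strings)]

print(looped == comprehended)
```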

## Functions

Functions tend to look like this:

In [None]:
def return_food(mealtime):
  if mealtime=='breakfast':
    food='BACON'
  elif mealtime=='lunch':
    food='pot noodle'
  elif mealtime=='dinner':
    food='pasta'

  return food


In [None]:
return_food('lunch')

We can also specify default inputs and input types. Here mealtime defaults to lunch unless we change it. This is useful for setting default parameters for an analysis, for instance. 

In [None]:
def return_food(mealtime: str='lunch') -> str:
  if mealtime=='breakfast':
    food='BACON'
  elif mealtime=='lunch':
    food='pot noodle'
  elif mealtime=='dinner':
    food='pasta'

  return food

In [None]:
return_food()

If you do not know how many keyword arguments will be passed into your function, adding ** before a parameter name in the function definition allows the function to receive them as a dictionary, whose items you can then access as usual:



In [None]:
def return_food(mealtime: str='lunch',**kwargs)-> tuple: # returns (food, message)
  if mealtime=='breakfast':
    food='BACON'
  elif mealtime=='lunch':
    food='pot noodle'
  elif mealtime=='dinner':
    food='pasta'
  if 'message' not in kwargs:
    message='Enjoy your meal'
  else:
    message=kwargs['message']
  return food, message

In [None]:
return_food()

In [None]:
return_food(message='I hope this tastes bad')
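
As a small sketch of what happens behind the scenes: inside the function, `kwargs` is an ordinary dictionary, so `.get()` can be used to supply a fallback value. The helper `describe_kwargs` and the `drink` argument are hypothetical and exist only for this illustration.

```python
def describe_kwargs(**kwargs):
    """Hypothetical helper: returns the collected keyword arguments and a message."""
    # kwargs is an ordinary dict, so .get() can supply a fallback value
    message = kwargs.get('message', 'Enjoy your meal')
    return kwargs, message

collected, message = describe_kwargs(message='Bon appetit', drink='tea')
print(type(collected), collected, message)

# With no keyword arguments, the fallback message is used
_, default_message = describe_kwargs()
print(default_message)
```

Using `.get()` is equivalent to the `if 'message' not in kwargs` check above, just written in one line.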

<img src = 'https://upload.wikimedia.org/wikipedia/commons/8/81/Stop_sign.png' width = "200" height = "" >

## Using the os (operating system) library

In [None]:
import os

We can see what system we are running on, like this (note that os.uname is only available on Unix-like systems, such as Linux and macOS)

In [None]:
os.uname()

And our current directory

In [None]:
os.getcwd()

We can use os to make new directories.

In [None]:
os.makedirs('content/my_newfolder',exist_ok=True) # exist_ok=True avoids an error if the folder already exists

We can make formatted strings as follows

In [None]:
'My name is {name} and for {meal} I like to eat {food}'.format(name='Nick',meal='Breakfast',food='BACON')
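
For comparison, here is a sketch of the same string built with an f-string (available from Python 3.6), which reads variables directly from the surrounding code. The variable names are chosen just for this example.

```python
name, meal, food = 'Nick', 'Breakfast', 'BACON'

# str.format fills the named placeholders from keyword arguments
with_format = 'My name is {name} and for {meal} I like to eat {food}'.format(
    name=name, meal=meal, food=food)

# An f-string fills the placeholders from variables in scope
with_fstring = f'My name is {name} and for {meal} I like to eat {food}'

print(with_fstring)
print(with_format == with_fstring)
```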

So we can leverage this and list comprehension to create new directories for several different types of food.

In [None]:
foods=['BACON','SAUSAGE','EGGS']

[os.makedirs('content/my_newfolder_{food}'.format(food=food),exist_ok=True) for food in foods]

We can send any command to the terminal with os.system.

In [None]:
os.system('mkdir terminal_made_this_folder') # Making a new folder

In [None]:
os.system('rm -rf terminal_made_this_folder') # Deleting the new folder

A useful os function is os.path.join, which allows us to construct path-like strings.

In [None]:
os.path.join('data','data1','new_data')

Also useful are os.path.split and os.path.splitext. The first splits a string into the directory and the filename, and the second splits it into everything before the extension and the extension itself.

In [None]:
os.path.split('/home/myfile.txt'),os.path.splitext('/home/myfile.txt')
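
Here is a short sketch that chains these functions together: we build a path and then take it apart again. The folder and file names are hypothetical.

```python
import os

# Hypothetical path components, for illustration only
path = os.path.join('data', 'data1', 'results.csv')

directory, filename = os.path.split(path)      # directory part and final component
root, extension = os.path.splitext(filename)   # name without extension, and the extension

print(directory, filename, root, extension)
```

The separator that os.path.join inserts depends on the operating system, which is the main reason to prefer it over gluing strings together by hand.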

## Numpy

[Numpy](https://numpy.org/) is the main numerical library. It has a ridiculous amount of functionality, so I will just cover some useful stuff here.

In [None]:
import numpy as np
my_array=np.array([1,2,3,4,5,6,7])

With numpy we can illustrate some more indexing. Here we take every 2nd element. 

In [None]:
my_array[::2]

We can also do boolean array indexing. Here the comparison my_array>3 produces an array of True/False values, which we use to select only the values greater than 3.

In [None]:
my_array[my_array>3]

Note that when we applied * to a list earlier, it repeated the list rather than performing arithmetic. For a numpy array, * performs elementwise multiplication.

In [None]:
my_array*2

In [None]:
my_array**2
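
A side-by-side sketch of the difference, using a small made-up list:

```python
import numpy as np

values = [1, 2, 3]

repeated_list = values * 2             # list repetition: [1, 2, 3, 1, 2, 3]
doubled_array = np.array(values) * 2   # elementwise multiplication: [2, 4, 6]

print(repeated_list)
print(doubled_array)
```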

We can use vstack and hstack to join arrays together vertically or horizontally, respectively.

In [None]:
my_array2=np.vstack([my_array*2,my_array])

The number of rows and columns of a numpy array can be returned by .shape

In [None]:
my_array2.shape

We can return the mean of an array as follows, as well as the columnwise means (axis=0) and rowwise means (axis=1) by specifying an axis.

In [None]:
np.mean(my_array2),np.mean(my_array2,axis=0),np.mean(my_array2,axis=1)
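
The axis argument is easy to mix up, so here is a sketch with a small made-up array where the answers can be checked by eye. The axis you specify is the one that gets collapsed.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])   # 2 rows, 3 columns

overall_mean = np.mean(small)           # single value: 3.5
column_means = np.mean(small, axis=0)   # collapses the rows: [2.5, 3.5, 4.5]
row_means = np.mean(small, axis=1)      # collapses the columns: [2.0, 5.0]

print(overall_mean, column_means, row_means)
```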

We can also do matrix multiplication

In [None]:
np.dot(my_array,my_array2.T)
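
To see why the transpose is needed, here is a sketch that tracks the shapes with a smaller made-up vector. It also shows the `@` operator, which gives the same result as np.dot for these inputs.

```python
import numpy as np

vector = np.array([1, 2, 3])                # shape (3,)
matrix = np.vstack([vector * 2, vector])    # shape (2, 3)

product = np.dot(vector, matrix.T)          # (3,) dot (3, 2) -> shape (2,)
same_product = vector @ matrix.T            # @ is the matrix multiplication operator

print(product.shape, product)
```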

Repeat can be used to repeat each element of an array N times and tile can be used to repeat the array itself N times. 

In [None]:
np.repeat(my_array,2),np.tile(my_array,2)

We can save out and load numpy arrays to/from a npy file as follows:

In [None]:
np.save('content/my_newfolder/my_array.npy',my_array2)
my_array3=np.load('content/my_newfolder/my_array.npy')
np.array_equal(my_array2,my_array3)

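Linspace and logspace allow us to obtain a set of N linearly and logarithmically spaced values between two extremes. Note that for logspace the two extremes are exponents of the base (10 by default), so the call below spans 10**1 to 10**200.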

In [None]:
np.linspace(1,200,13),np.logspace(1,200,13)
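
A sketch with smaller numbers makes the difference between the two easier to see. The last line shows one way to get log-spaced values between two actual values (here 1 and 200), by passing the log10 of each endpoint.

```python
import numpy as np

linear = np.linspace(1, 100, 3)   # endpoints are the values themselves
log = np.logspace(0, 2, 3)        # endpoints are exponents: 10**0 up to 10**2

# To get log-spaced values between two actual values, take log10 of the endpoints
log_between = np.logspace(np.log10(1), np.log10(200), 13)

print(linear, log)
print(log_between[0], log_between[-1])
```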

We can also set up arrays of random numbers using the random module

In [None]:
random_array=np.random.rand(30,10)
random_array

And create an array of random integers within a given range.

In [None]:
np.random.randint(1, 10, (40,2))
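
The arrays above will differ every time the cell is run. If you want the same random numbers each time, one option is a seeded generator, sketched below with numpy's `default_rng`. The seed value 42 is arbitrary, and this is an alternative to the calls above rather than what the rest of this notebook uses.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # 42 is an arbitrary seed

uniform = rng.random((30, 10))                  # uniform values in [0, 1)
integers = rng.integers(1, 10, size=(40, 2))    # integers from 1 to 9 (upper bound excluded)

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((30, 10))))
```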

<img src = 'https://upload.wikimedia.org/wikipedia/commons/8/81/Stop_sign.png' width = "200" height = "" >

## Pandas

[Pandas](https://pandas.pydata.org/) is a library that allows us to interact with tabular data. It allows us to replicate most of the functionality of the 'data.frame' that exists in R.

In [None]:
import pandas as pd

my_frame=pd.DataFrame(random_array,columns=['variable_'+ str(i) for i in range(10)])
my_frame.head()

We can select out certain rows as follows

In [None]:
my_frame.loc[1:4]
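
One thing to watch out for: unlike list slicing, .loc slices by index label and includes both endpoints. The sketch below compares it with .iloc, which slices by position and excludes the end. The frame here is a small made-up example.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with the default integer index 0..5
frame = pd.DataFrame({'value': np.arange(6) * 10})

by_label = frame.loc[1:4]       # label-based: includes both 1 and 4
by_position = frame.iloc[1:4]   # position-based: stops before position 4

print(len(by_label), len(by_position))
```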

We can select certain columns as follows:

In [None]:
my_frame[['variable_1', 'variable_2']].head()

We can filter according to various conditions as follows:

In [None]:
my_frame[(my_frame['variable_1']<.5) & (my_frame['variable_2']<.5)]

We can assign constants as follows:

In [None]:
my_frame=my_frame.assign(Participant='Participant 1', condition='condition 1')
my_frame.head()

We can also construct pandas columns from numpy expressions

In [None]:
my_frame['time']=np.array(range(my_frame.shape[0]))

I won't go into pandas in any more detail, but if you are coming from R and want to learn about the equivalent functions, you can refer to [this handy guide](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_r.html#compare-with-r).

## Scipy and matplotlib

[Scipy](https://scipy.org/) is the main scientific python library. Its functions are too diverse to summarise here, but they include statistics, curve fitting, optimisation, algebra and so on. [Matplotlib](https://matplotlib.org/) is the main plotting library. I don't particularly like it, which is why I prefer to use ggplot (in R) for plotting. Note that a form of [ggplot](https://plotnine.readthedocs.io/en/stable/) also exists for python, though there is somewhat less functionality. I won't discuss it here as it is not really a 'core' python package.




Let's introduce a different flavour of importing now.

Note that here I import only a couple of functions from the stats submodule of scipy 

In [None]:
from scipy.stats import zscore, ttest_1samp
import matplotlib.pyplot as plt

The stats module contains lots of statistical utilities, like zscoring data for instance.

In [None]:
zscored_data=zscore(my_frame['variable_1'])

In fact, if we have an array, we can zscore the entire thing over a given axis. Here for instance, we zscore everything about the columnwise mean and standard deviation.

In [None]:
zscore(random_array,axis=0) # axis=0 standardises each column using its own mean and standard deviation
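
To show what the z-score is doing, here is a sketch that computes it by hand for each column and checks the result against scipy. The data are random and the seed is arbitrary. With axis=0, each column is standardised using its own mean and standard deviation.

```python
import numpy as np
from scipy.stats import zscore

rng = np.random.default_rng(0)   # arbitrary seed
data = rng.random((30, 10))      # 30 rows (observations), 10 columns (variables)

# Subtract each column's mean and divide by each column's standard deviation
manual = (data - data.mean(axis=0)) / data.std(axis=0)
from_scipy = zscore(data, axis=0)

print(np.allclose(manual, from_scipy))
print(from_scipy.mean(axis=0).round(6))   # approximately 0 for every column
```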

Here we plot the original and zscored data. I find this all pretty clunky and inelegant as compared to ggplot. 😫

In [None]:
plt.plot(my_frame['time'],my_frame['variable_1'],label='Original_data')
plt.plot(my_frame['time'],zscored_data,label='zscored_data')
plt.legend()

[Composing subplots](https://matplotlib.org/stable/gallery/subplots_axes_and_figures/subplots_demo.html) is a science unto itself, but here is an example of vertically stacked plots.

In [None]:
fig, axs = plt.subplots(2)
fig.suptitle('Vertically stacked subplots')
axs[0].plot(my_frame['time'],my_frame['variable_1'],label='Original_data')
axs[1].plot(my_frame['time'],zscored_data,label='zscored_data')

Here we use scipy to perform a one-sample t-test against a population mean and plot the result.

In [None]:
my_ttest=ttest_1samp(my_frame['variable_1'],popmean=.3)
plt.hist(my_frame['variable_1'])
plt.xlabel("Variable 1")
plt.ylabel("Frequency")
plt.axvline(x=.3,c='k')

plt.text(x=0.6, y=6,s='t={tval} p={pval}'.format(tval=round(my_ttest[0],3),pval=round(my_ttest[1],3)))

Note again, we can perform a t test for every column of our array specified before.

In [None]:
ttest_1samp(random_array,popmean=0,axis=0) # axis=0 runs one test per column
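
As a sanity check on what the test computes, here is a sketch that calculates the one-sample t statistic by hand for each column and compares it with scipy. The data are random, and both the seed and the population mean of 0.3 are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)   # arbitrary seed
data = rng.random((30, 10))
popmean = 0.3                    # hypothetical population mean

n = data.shape[0]
# t = (sample mean - population mean) / standard error of the mean
manual_t = (data.mean(axis=0) - popmean) / (data.std(axis=0, ddof=1) / np.sqrt(n))
result = ttest_1samp(data, popmean=popmean, axis=0)

print(np.allclose(manual_t, result.statistic))
print(result.statistic.shape)   # one t value per column
```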

As a signal processing example, we can detect some peaks in one of our variables and save the figure out to a png. This would be useful if we wanted to detect heartbeats, for instance. 

In [None]:
from scipy.signal import find_peaks

x,y=my_frame['time'],my_frame['variable_1']
peaks = find_peaks(y, height = 0, threshold = .1, distance = 1)
height = peaks[1]['peak_heights'] #list of the heights of the peaks
peak_pos = x[peaks[0]] #list of the peaks positions
#Finding the minima
y2 = y*-1
minima = find_peaks(y2)
min_pos = x[minima[0]] #list of the minima positions
min_height = y2[minima[0]] #list of the mirrored minima heights
#Plotting
fig = plt.figure(figsize=(4, 3), dpi=150)
ax = fig.subplots()
ax.plot(x,y)
ax.scatter(peak_pos, height, color = 'r', s = 15, marker = 'D', label = 'Maxima')
ax.scatter(min_pos, min_height*-1, color = 'gold', s = 15, marker = 'X', label = 'Minima')
ax.legend()
ax.grid()
plt.show()

In [None]:
fig.savefig('content/my_newfolder/my_peaks.png')

## Summary

We have covered some very basic python here. This has just been a whirlwind tour of some of the main packages. We have barely even scratched the surface.

If you want to learn a bit about how we go about using all this stuff to perform fMRI analysis in python, I prepared these notebooks as part of the PYM0FM course:

https://colab.research.google.com/drive/1dMgMQQddOPPCs7sBOnH9YNFQk7SsWtb-?usp=sharing

https://colab.research.google.com/drive/1I8cB-2IbxsFuMNjVHMUWvf0T61MDxRMZ?usp=sharing#scrollTo=3NO1v8Ssiyhm

https://colab.research.google.com/drive/1Ng8tH0JSSj57ACUkLieRlv0KAU2xCT72?usp=sharing

## Homework

1. Search the web to find out how to create a dictionary. In this dictionary, set the keys to be meals of the day (breakfast, lunch, dinner) and the values to be what you had to eat for the last instance of these meals. Save this to a yaml file. 

2. Create a function named 'diffslopes' that takes two sets of data (x1,y1,x2,y2), performs a scipy-based linear regression on both and then subtracts the two estimated slopes. 

3. One module I did not touch on was [sklearn](https://scikit-learn.org/stable/index.html), a machine learning library. Try out some of the examples there, based on what takes your interest.

4. Search the web to try and find a way of sending an email from python. Send me an email (nhedger1@gmail.com) with a picture of your favourite celebrity attached.