# Major Differences: Python vs. R

In R, whitespace is ignored, for loops and if statements are determined by curly braced {..} and brackets (..). In python, everything is determined by indentation, so whitespace becomes more important! This makes it very neat :). All ifs and fors follow the same general pattern; the for/if, the statement, a colon :, then the main body is indented. Similarly function definitions use `def`, followed by the function and its args, then a colon; the main body is indented.

## Examples:

In [30]:
# First declare a function is_even which decides if an integer is even or not
def is_even(x):
    # Check the input is an integer
    if (not isinstance(x, (int,long))):
        print "Usage:\nis_even(x)\n\tx -- integer"
        return None
    # If remainder is 0 then it's even (% is remainder operator)
    elif x % 2 == 0:
        return True
    else:
        return False


# Set variable and try out function
x = 3
# Print using python string formatting (\n is newline, \t is newtab)
print "{0} is even: {1}\n".format(x, is_even(x))
x = 4
print "{0} is even: {1}\n".format(x, is_even(x))

# Try some invalid input
x = "I'm even I promise!"
is_even(x)

3 is even: False

4 is even: True

Usage:
is_even(x)
	x -- integer


# Data Structures

Python has all the standard data structures: Int, String, Float etc. It also has collections similar to the R vector and list. These are as follows:
- **List**, this is like the R vector, but can be manipulated a lot easier.
   - `[2,"a",4]` -- creates a list with elems `2`, `"a"` and `4`.
   - `[].append(2)` -- adds the element `2` to a list.
   - **power command** list comprehension: `[x^2 for x in range(10)]`. Creates a list of $x^2$ for $x = 0,\dots, 9$.
- **Dicts**, like an R list.
   - `d = {"a": 1, "b": 2}` -- creates a dict with keys (a,b) and elems (1,2)
   - `d["b"]` access elem with key "b"

There are lots that I've missed. See resources at the end for a full list. 
### Things to be aware of:
- Lists are indexed from 0, setting `li = [1,2,3]` and then typing `print li[1]` will return `2`.
- Unless a decimal point is involved, python will assume we are talking Ints. `4/3` returns 1 while `4.1/3` returns decimals.
- The list is not as high performance as the numpy array because it's easy to edit. We talk about the numpy array in the next section.

## Examples:

In [10]:
import string
import re

###### Dictionary comprehensions #####
# Get string of letters a-n (note a string is treated as a list of characters).
an = string.ascii_lowercase[:14]
print an
d = {char: num for (char,num) in zip(an,range(14))}
print d

##### Looping methods for lists ####
an = string.ascii_lowercase[:14]
nums = []
# Loop over list elements AND index
for i, char in enumerate(an):
    nums.append(i)
print nums

#### Regex of chars using list comprehension ####
chars = string.ascii_lowercase + string.ascii_uppercase
regexes = ['[A-Z]','[abd]','[Adf]']
print [re.findall(regex, chars) for regex in regexes]

abcdefghijklmn
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'n': 13}
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'], ['a', 'b', 'd'], ['d', 'f', 'A']]


# Linear Algebra in Python -- numpy

The main library for performing matrix calculations and numerical operations in python is [numpy](https://docs.scipy.org/doc/numpy/user/). To install it, go to a 
terminal and type `sudo pip install numpy`, you may need to install pip first using `sudo apt-get install pip`. Numpy is pretty fast, note it automatically uses multiple CPUs if possible, so be careful on computing clusters like STORM.

Numpy is good for the following:
- Excellent multi-dimensional arrays support
- Linear algebra
- Random sampling

By default operations are performed element wise, e.g. multiplying to n x n matrices together using `*` will be performed element wise.

## Examples:

In [17]:
# Import numpy and refer to it as np
import numpy as np

print "Checking elem wise calcs"

# To get any numpy function (f), write np.f. Lets create an array of zeros using the np function `zeros`:
x = np.zeros((5,5,5))   # 3d array of zeros of size 5x5x5

# Check what I said about element wise calculations:
print 5 * np.ones((2,2,2)) * .2 * np.ones((2,2,2))

#------------------

# Simulate from a mixture of Gaussians using random numbers

print "\nSimulating from mixture of Normals:"

## 1 -- set seed and number of components
np.random.seed(13)
K = 5

## 2 -- simulate weights from Dirichlet(1) distn
w = np.random.dirichlet(np.ones(K))
print w

## 3 -- simulate means from N(0,10)
mu = np.random.normal(loc = 0, scale = np.sqrt(10), size = K)
print mu

## 4 -- simulate 1000 data points, save to variable x
n = 1000
component_alloc = np.random.choice(K, p = w, size = n)
x = np.random.normal(loc = mu[component_alloc], scale = 1)

#--------------------------------------
# Sample from 100 of Guassians using all possible combinations of mus and sds list

print "\nSimulating from 100 Normals:"
mus = np.random.normal(scale = 10, size = 10)
sds = np.random.gamma(4, size = 10)
sample = np.array([np.random.normal(loc = mu,scale = sd) for mu in mus for sd in sds])
print sample

Checking elem wise calcs
[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]

Simulating from mixture of Normals:
[0.14341671 0.02586593 0.16584061 0.32179434 0.34308242]
[ 6.97799401 -2.97893925  2.00035731  3.89194591 -0.99323723]

Simulating from 100 Normals:
[-11.25470097 -11.76129401 -17.0368939  -11.86471199 -13.67285722
  -4.63577472 -13.94681591 -15.1003006  -10.67049708 -13.51276385
  -7.7084057   -5.34457634  -6.96064507   1.60806683  -5.67238383
  -4.31070303  -4.16896513  -6.06959117  -2.10084512  -4.8943548
  -8.40217951 -12.9919248  -11.66539523 -13.5178066   -9.15686799
 -13.2818723   -8.29786206  -7.20210401  -6.62538113 -16.51349696
  11.90366846  14.25950447   8.31545678  18.16708616  12.89878907
  19.01127382  12.31168403  22.10941967  13.45809078   2.09120536
   4.3426393    5.20106453  10.99472925   5.48025145   6.81149751
  14.63411549   3.03766789   6.62801665   5.19443151  11.47337726
   0.16981478   1.90140177   1.72646207  -8.93625946   2.17581425
  15.85427349   2.33

# More Stats (HOORAY) -- Scipy

scipy.stats implements loads of useful stats functions, including CDFs and PDFs etc. Take a look! It also has a lot of other useful maths functions.

## Examples:

In [38]:
import scipy.stats
import numpy as np

#------------------------------
# Calc log likelihood of data from MVt-distribution using misspecificed Normal.

# Assume mvt has identity scale and 0 mean
def rmvt(df = 3, n = 1000, d = 3):
    # Use method of simulation using chisq followed by mvnorm (see Wikipedia page of MVT)
    u = scipy.stats.chi2.rvs(df, size = n)
    Sigma_list = [np.eye(3) * df / u_i for u_i in u]
    return np.array([scipy.stats.multivariate_normal.rvs(cov = Sigma) for Sigma in Sigma_list])

X = rmvt()
ll = scipy.stats.multivariate_normal.logpdf(X, mean = np.zeros(3)).sum()

# Calculate avg Kolmogorov-Smirnov distance of each component from Normality
ksav = 0
# Make sample slightly bigger so KStest works properly!
X = rmvt(n = 10^4)
for j in range(3):
    x = X[:,j]
    ksav += scipy.stats.kstest(x, scipy.stats.norm.cdf).statistic / 3
print ksav

0.26889048152739814


# More Topics

Here are some interesting topics/links about python I recommend you read and play with if you want more:
- `R` Style data frames in python: https://pandas.pydata.org/
- Lots of implementations of ML algorithms: http://scikit-learn.org/stable/
- Useful tools for iterating over data: https://docs.python.org/2/library/itertools.html
- The 'pythonic way' is, rather than try to catch bad input to functions, instead catch errors elegantly as they crop up. Python has really nice error handling and this makes it quite a beauty to write code: https://wiki.python.org/moin/HandlingExceptions (quick example underneath)
- Elegant handling of time data: https://docs.python.org/2/library/datetime.html
- Pickle/cPickle for binary file saves: https://docs.python.org/2/library/pickle.html
- Generators can be used to create streaming tools for datasets: https://wiki.python.org/moin/Generators
- Objects and classes: https://www.learnpython.org/en/Classes_and_Objects
- Simple parallelisation: https://docs.python.org/2/library/multiprocessing.html
- Web Parsing: https://www.crummy.com/software/BeautifulSoup/

## Quick example of error handling:

In [43]:
# Quick demo: Rather than trying to catch bad input, sort it in the error handler

def sq(x):
    try:
        x^2
    except TypeError:
        print "Usage: x should be numeric"
    
sq('th')

Usage: x should be numeric
