# Python Tutorial, Session 1 exercises

# Zombie Apocalypse

=========================================================================

Dear Scientist

We have briefly introduced you to the basics of Python programming. Before accepting you on the team, we will test you on this knowledge. If you pass the test, we will start providing you with the data on the zombie outbreak.

Regards,

The Earth Leaders

=========================================================================

Let's start by importing some necessary modules:

In [None]:
import numpy as np
import scipy
%matplotlib inline
import matplotlib.pyplot as plt
import scipy.optimize as opt

## Part 1: test your Python basics

Imagine we have a typical six-sided die that gives us a value in a finite range, each with equal probability. We can model this using <tt>np.random.randint</tt>.

In [None]:
np.random.randint?

We can model a throw by using this function to randomly return an integer between 1 and 6 inclusive (the upper bound passed to <tt>randint</tt> is exclusive, hence the 7):

In [None]:
print(np.random.randint(1,7))

Executing the cell again will give us another random answer (try it).
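
Because each call draws fresh random numbers, results change from run to run. If you want repeatable output while debugging, one option is to seed the generator first. The short sketch below does that, and also checks that all rolls stay within 1 to 6. The seed value 42 and the variable names are arbitrary choices made for illustration, not part of the exercise.

```python
import numpy as np

np.random.seed(42)                      # arbitrary seed, chosen only for repeatability
rolls = np.random.randint(1, 7, size=1000)

# low is inclusive, high is exclusive: every roll lies in 1..6
print(rolls.min(), rolls.max())

# re-seeding with the same value reproduces exactly the same rolls
np.random.seed(42)
rolls_again = np.random.randint(1, 7, size=1000)
print(np.array_equal(rolls, rolls_again))
```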

To roll the die $n = 20$ times, we can pass $n$ to the function through the keyword <tt>size</tt>.

In [None]:
n = 20
print(np.random.randint(1,7,size=n))

Now imagine that we have two dice that we roll together, each time noting the sum of their two values:

In [None]:
ntimes = 50 
d1 = np.random.randint(1,7,ntimes)
d2 = np.random.randint(1,7,ntimes)
sumdice = d1+d2 # we can do this, as d1 and d2 are each a np.array
print(sumdice)

If we repeat the rolling $n$ times, we will obtain a distribution of sums. Plotting it will give us some indication of its shape (if we happened to skip that day of stats class), as will noting the mean and standard deviation.

## Reminder: watch the indentation!
To define a function (comments are optional but often prove useful):
```python
def squared(x):
    # this comment is inside the function
    a = x**2
    return a
```

Using loops:
```python 
for jj in range(10):
    print(jj)
print('this is not part of the loop')
```

**Q1.1 Begin by plotting the histogram of the summed distribution, and printing the mean and standard deviation of the distribution.**

You will probably want to use the following functions (look them up if you're not sure what they do):
```python
    plt.hist(values,range=(?,?),bins=?)
    values.mean()   # values is a NumPy array (np.ndarray)
    values.std()
```

In [None]:
#write your code here

**Q1.2 Automating the dice throwing** can be performed by creating two functions:

(1)
```python
    get_summed_dice(ntimes)
```
that generates the distribution of the summed dice for $n$ throws, and

(2)
```python
    throw_dice(ntimes)
```
a wrapper function that takes in $n$ and creates, plots and prints the distribution.

**Then for $n$ = 50, 500, 50000 throws, run the code and examine the resulting plots.**

* Advanced: consider that you may want to extend the code. Think about having more than two dice, or dice that have more (or fewer) than 6 faces, or taking the product rather than the sum. How would you write your code so that it can be extended to these possibilities?

In [None]:
#write your code here

**Q1.3 For large $n$, the histogram of the sum of two uniformly distributed dice is roughly approximated by a Gaussian (normal) distribution**.

If we have die 1 and die 2 giving $X_1$ and $X_2$, respectively, and if $Y$ is the sum of $X_1$ and $X_2$, then 

* $\mu_y =\mu_{x_1} + \mu_{x_2}$
* $\sigma^2_y =\sigma^2_{x_1} + \sigma^2_{x_2}$
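
As a concrete check of these two rules, the mean and variance of a single fair six-sided die can be computed directly from its six equally likely faces and then added. This is only a small sketch: it assumes the two dice are independent, and the variable names are our own.

```python
import numpy as np

faces = np.arange(1, 7)                 # the six equally likely outcomes
mu_x = faces.mean()                     # mean of one die
var_x = ((faces - mu_x) ** 2).mean()    # population variance of one die

# For two independent dice, means add and variances add
mu_y = mu_x + mu_x
var_y = var_x + var_x
sigma_y = np.sqrt(var_y)

print(mu_x, var_x)
print(mu_y, var_y, sigma_y)
```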


Repeat the code from above, but also plot two Gaussian curves on top of the histogram: one using the theoretical (analytical) mean and standard deviation, and one using the mean and standard deviation measured from your samples (numerical). **Hint:** you may want to have a look at norm.pdf from the scipy.stats package.
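
To illustrate how `norm.pdf` is called, here is a minimal sketch that evaluates a Gaussian density on a grid. The values of `mu` and `sigma` are made up for illustration and are not the answer to the question.

```python
import numpy as np
import scipy.stats as scist

mu, sigma = 0.0, 2.0                    # illustrative values only
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 201)
y = scist.norm.pdf(x, loc=mu, scale=sigma)   # loc = mean, scale = standard deviation

# A probability density integrates to 1, so compare it with a
# normalised histogram rather than with raw counts.
area = np.sum(y) * (x[1] - x[0])        # simple Riemann-sum estimate of the integral
print(y.max(), area)
```

Since the density integrates to 1, a histogram drawn with `plt.hist(values, density=True)` is on the same vertical scale as the curve.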

In [None]:
import math
import scipy.stats as scist
scist.norm.pdf?
### some other hints
#x = np.linspace(a,b,n) # takes n sample points between a and b
#plt.plot(x,y1,'r--',label='sample1') # will plot y1 as a dashed red line
#plt.plot(x,y2,'k-',label='sample2') # will plot y2 as a black solid line
#plt.legend() # will turn on the legend so that labels 'sample1' and 'sample2' are visible

In [None]:
#write your code here

**Advanced Q1.4:** extend the code by creating a new function that will allow us to throw $d$ dice $n$ times.

**Super advanced Q1.5**: further extend it by allowing the user to define whether they'd like to take the sum, mean or product of each throw.

In [None]:
#write your code here


## Part 2: Fitting to zombie data

=========================================================================

Dear Scientist

Congratulations on passing the test! We now rely on you to combine this knowledge with your extraordinary intelligence and help us find a way to defeat the zombie outbreak! You will receive three assignments today and work towards a strategy to defeat the zombies next session!

We count on you!

Regards,

The Earth Leaders

=============================================================================================

**Assignment One**

The number of people alive is given by a variable S. The number of zombies is given by the variable Z. We have so far collected data in different areas during periods when the number of zombies Z was constant.

Because people can be born, can die naturally, or can become infected with the zombie virus, the value of S can change. We hypothesize that the change in S is given by a polynomial in S of maximum order 3 (see below), but we want to find a more accurate estimate!

$$ a + bS+ cS^2 + dS^3 $$

The data is found in the file

File: <tt>noisy1.dat</tt>

Your tasks:

1) Can you find the parameters?

2) Comparing the parameter magnitudes, can we simplify our polynomial?

3) Don't forget to plot the result!

===============================================================================================
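
One possible route to the polynomial parameters is `np.polyfit` (the `scipy.optimize` module imported at the top would work too). The sketch below runs on synthetic stand-in data, not on <tt>noisy1.dat</tt>; the coefficients, noise level and variable names are all invented for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)          # fixed seed so the sketch is repeatable
x = np.linspace(-2.0, 2.0, 50)
true_coeffs = [0.5, 0.0, -1.0, 2.0]     # made-up d, c, b, a (highest power first)
y = np.polyval(true_coeffs, x) + rng.normal(scale=0.1, size=x.size)

# np.polyfit returns coefficients from the highest power down to the constant
d_fit, c_fit, b_fit, a_fit = np.polyfit(x, y, deg=3)
print(a_fit, b_fit, c_fit, d_fit)

y_fit = np.polyval([d_fit, c_fit, b_fit, a_fit], x)
rms = np.sqrt(np.mean((y - y_fit) ** 2))    # root-mean-square residual
print(rms)
```

A fitted coefficient that is tiny compared with the others (here the made-up `c`) is a candidate for dropping from the polynomial.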

In [None]:
# read in the file
data = np.loadtxt('noisy1.dat')
xs = data[0,:]
ys = data[1,:]

In [None]:
# write your code here

Thanks to your work, we have been able to adapt our model. The rate of change of people alive is given by

$$\frac{dS}{dt} = P - B S Z - d S$$

where:

    S: the number of susceptible victims
    Z: the number of zombies
    P: the population birth rate
    d: the chance of a natural death
    B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
    
Based on our incoming data from New York, we estimated a remaining population size of S=10000 yesterday. Today, our estimate points to only 9051 survivors. The data show no natural deaths. Using the equation above, can you estimate the number of zombies Z that were in New York?

Hint: the derivative $dS/dt$ can be approximated by $\Delta S / \Delta t$, where the denominator $\Delta t$ is 1 day.
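
Replacing the derivative by the finite difference and solving the model equation for Z gives $Z \approx (P - dS - \Delta S/\Delta t)/(B S)$. A minimal sketch of that rearrangement follows. The function name is our own, the right-hand side is evaluated at yesterday's S (one possible choice), and the values of `P` and `B` in the call are placeholders invented for illustration, since the text above does not state them.

```python
def estimate_zombies(S_prev, S_now, P, B, d=0.0, dt=1.0):
    """Solve dS/dt = P - B*S*Z - d*S for Z, using dS/dt ~ (S_now - S_prev)/dt."""
    dS_dt = (S_now - S_prev) / dt
    # The right-hand side is evaluated at S_prev (a forward-difference choice)
    return (P - d * S_prev - dS_dt) / (B * S_prev)

# P and B below are hypothetical placeholders, NOT values from the exercise
Z = estimate_zombies(S_prev=10000, S_now=9051, P=1.0, B=1e-4, d=0.0)
print(Z)
```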

In [None]:
# write your code here

=============================================================================================

**Assignment Two**

We have received new data showing the number of infected people during the past 25 days in Australia!

The data is found in the file

File: <tt>noisy2.dat</tt>

We don't know whether the infection is contained, following a sinusoidal function, or whether it is spreading exponentially. Can you test which of the two gives the better fit?

Your tasks:

1) Try to fit the data with a cosine function.

2) Try to fit the data with an exponential function.

3) Plot the results. Which one matches the data better?

===============================================================================================
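
One way to compare two candidate models is to fit each with `scipy.optimize.curve_fit` and compare the sum of squared residuals. The sketch below uses synthetic stand-in data rather than <tt>noisy2.dat</tt>; the model parameterisations, initial guesses and helper name are assumptions made for illustration, and other choices are equally valid.

```python
import numpy as np
import scipy.optimize as opt

def cosine(x, amp, freq, phase, offset):
    return amp * np.cos(freq * x + phase) + offset

def exponential(x, scale, rate, offset):
    return scale * np.exp(rate * x) + offset

# Synthetic stand-in data (NOT noisy2.dat): an exponential plus noise
rng = np.random.RandomState(1)
x = np.linspace(0.0, 5.0, 60)
y = exponential(x, 2.0, 0.5, 1.0) + rng.normal(scale=0.2, size=x.size)

def fit_and_score(model, p0):
    # p0 is the initial guess; nonlinear fits can be sensitive to it
    try:
        params, _ = opt.curve_fit(model, x, y, p0=p0, maxfev=10000)
    except RuntimeError:
        return None, np.inf             # no convergence from this initial guess
    return params, np.sum((y - model(x, *params)) ** 2)

exp_params, exp_ssr = fit_and_score(exponential, p0=[1.0, 0.3, 0.0])
cos_params, cos_ssr = fit_and_score(cosine, p0=[5.0, 1.0, 0.0, 5.0])
print(exp_ssr, cos_ssr)                 # the smaller value indicates the closer fit
```

Note that the two models here have different numbers of parameters, so a residual comparison is best backed up by plotting both fits against the data.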

In [None]:
data = np.loadtxt('noisy2.dat')
xs = data[0,:]
ys = data[1,:]

In [None]:
# write your code here

=============================================================================================

**Assignment Three (Advanced)**

Top secret data just arrived. We are currently trying to analyse it. We know the data should contain multiple sinusoidal terms; however, we don't know the parameters.

The data is found in the file

File: <tt>noisy3.dat</tt>

Your task:

1) Find the best fit using multiple sinusoidal terms.

Hint: you may want to consider inspecting the data first to see whether you can get an indication as to how many frequencies there seem to be, as well as some initial values.

===============================================================================================
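
One possible way to inspect data for dominant frequencies is a discrete Fourier transform. The sketch below applies `np.fft.rfft` to a synthetic stand-in signal, not to <tt>noisy3.dat</tt>; the sample spacing, frequencies and amplitudes are invented, and it assumes evenly spaced samples.

```python
import numpy as np

# Synthetic stand-in signal (NOT noisy3.dat): two sinusoids plus noise
rng = np.random.RandomState(2)
n, dt = 512, 0.01                       # number of samples and spacing (made up)
t = np.arange(n) * dt
signal = (1.5 * np.sin(2 * np.pi * 5.0 * t)
          + 0.8 * np.sin(2 * np.pi * 12.0 * t)
          + rng.normal(scale=0.3, size=n))

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n, d=dt)        # frequency axis in cycles per unit of t

# Keep local maxima of the spectrum, then take the two largest
is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
peak_idx = np.where(is_peak)[0] + 1
strongest = peak_idx[np.argsort(spectrum[peak_idx])[-2:]]
guesses = np.sort(freqs[strongest])
print(guesses)                          # rough starting guesses for the fit
```

The peak positions are only accurate to about one frequency bin, so they serve as initial values for a subsequent fit rather than as final answers.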

In [None]:
data = np.loadtxt('noisy3.dat')
xs = data[0,:]
ys = data[1,:]

In [None]:
#write your code here

-----------------------------------------
Created by S.Jarvis, Nov 2013  
Revised by J.Bono and G. Pernelle, Nov 2015, Nov 2016