### Numpy basics

In [1]:
import numpy as np

1\. Find the row, column and overall means for the following matrix:

```python
m = np.arange(12).reshape((3,4))
```

In [9]:
m = np.arange(12).reshape((3,4))
print(m)
print("means of rows: ", np.mean(m, axis=1))
print("means of columns: ", np.mean(m, axis=0))
print("overall mean: ", np.mean(m))

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
means of rows:  [1.5 5.5 9.5]
means of columns:  [4. 5. 6. 7.]
overall mean:  5.5


2\. Find the outer product of the following two vectors:

```python
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])
```

Do this in the following ways:

   * Using the function outer in numpy
   * Using a nested for loop or list comprehension
   * Using numpy broadcasting operations


In [23]:
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])

# using np.outer()
product_a = np.outer(u,v)
print("with np.outer():\n", product_a)

# using list comprehension
product_b = np.array([[u[i]*v[j] for j in range(len(v))] for i in range(len(u))])
print("with list comprehension:\n", product_b)

# using numpy broadcasting
product_c = u.reshape(4,1)*v
print("with broadcasting:\n", product_c)

with np.outer():
 [[ 2  4  6  8]
 [ 6 12 18 24]
 [10 20 30 40]
 [14 28 42 56]]
with list comprehension:
 [[ 2  4  6  8]
 [ 6 12 18 24]
 [10 20 30 40]
 [14 28 42 56]]
with broadcasting:
 [[ 2  4  6  8]
 [ 6 12 18 24]
 [10 20 30 40]
 [14 28 42 56]]


3\. Create a 10 by 6 matrix of random uniform numbers. Set all rows with any entry less than 0.1 to be zero

Hint: Use the following numpy functions - np.random.random, np.any as well as Boolean indexing and the axis argument.

In [25]:
import numpy.random as npr

In [27]:
m = npr.random(size=(10,6))
# flag every row that contains at least one entry below 0.1 ...
rows_to_zero = np.any(m < 0.1, axis=1)
# ... and set those rows entirely to zero via Boolean indexing
m[rows_to_zero] = 0
print(m)

[[0.         0.         0.         0.         0.         0.        ]
 [0.61164819 0.66076927 0.40157452 0.49836969 0.79859828 0.62761793]
 [0.         0.         0.         0.         0.         0.        ]
 [0.48545931 0.80235746 0.44667801 0.35211232 0.36364841 0.14998289]
 [0.10891312 0.6507387  0.93854998 0.17436298 0.10750251 0.8437671 ]
 [0.89169113 0.21890957 0.61511601 0.62466743 0.4295479  0.38463562]
 [0.94946686 0.57876595 0.77270596 0.3923074  0.32319007 0.52048071]
 [0.55997448 0.54844262 0.57131979 0.4691439  0.60112895 0.88213989]
 [0.         0.         0.         0.         0.         0.        ]
 [0.24152947 0.18551876 0.97899254 0.42890777 0.13710392 0.39124154]]


4\. Use np.linspace to create an array of 100 numbers between 0 and 2π (inclusive).

  * Extract every 10th element using slice notation
  * Reverse the array using slice notation
  * Extract elements where the absolute difference between the sine and cosine functions evaluated at that element is less than 0.1
  * Make a plot showing the sin and cos functions and indicate where they are close
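
One possible way to approach the numerical parts of this exercise is sketched below. The variable names (`every_10th`, `reversed_x`, `close`, `x_close`) are illustrative choices, and the plotting step is left out so that the block depends only on numpy.

```python
import numpy as np

# np.linspace includes the endpoint by default, so 2π is the last element
x = np.linspace(0, 2 * np.pi, 100)

# every 10th element: indices 0, 10, ..., 90
every_10th = x[::10]

# a negative step walks the array backwards
reversed_x = x[::-1]

# Boolean mask that is True where sin and cos differ by less than 0.1
close = np.abs(np.sin(x) - np.cos(x)) < 0.1
x_close = x[close]

print(every_10th)
print(x_close)
```

For the plot, one option is to draw `np.sin(x)` and `np.cos(x)` with matplotlib's `plt.plot` and overlay the points `x_close` with a marker.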

5\. Create a matrix that shows the 10 by 10 multiplication table.

 * Find the trace of the matrix
 * Extract the anti-diagonal (this should be ```array([10, 18, 24, 28, 30, 30, 28, 24, 18, 10])```)
 * Extract the diagonal offset by 1 upwards (this should be ```array([ 2,  6, 12, 20, 30, 42, 56, 72, 90])```)
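
A minimal sketch of one approach, assuming the table is built by broadcasting a column vector against a row vector. The names `table`, `anti_diag` and `upper_diag` are illustrative.

```python
import numpy as np

n = np.arange(1, 11)

# (10, 1) * (10,) broadcasts to the full (10, 10) multiplication table
table = n[:, np.newaxis] * n

# the trace is the sum of the main diagonal, i.e. 1*1 + 2*2 + ... + 10*10
trace = np.trace(table)

# flipping the columns turns the anti-diagonal into the main diagonal
anti_diag = np.diag(np.fliplr(table))

# k=1 selects the diagonal one step above the main one
upper_diag = np.diag(table, k=1)

print(trace)
print(anti_diag)
print(upper_diag)
```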

6\. Use broadcasting to create a grid of distances

Route 66 crosses the following cities in the US: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff, Los Angeles.
The corresponding positions in miles are: 0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448.

  * Construct a 2D grid of distances between every pair of cities along Route 66
  * Convert the grid to km (those savages...)
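
The grid can be sketched with broadcasting as follows, assuming the distance between two cities is the absolute difference of their positions along the route. The constant name `MILES_TO_KM` is an illustrative choice; the value 1.609344 is the standard definition of the international mile in kilometres.

```python
import numpy as np

cities = ["Chicago", "Springfield", "Saint-Louis", "Tulsa", "Oklahoma City",
          "Amarillo", "Santa Fe", "Albuquerque", "Flagstaff", "Los Angeles"]
miles = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448])

# (10, 1) - (10,) broadcasts to a (10, 10) grid of pairwise differences
dist_miles = np.abs(miles[:, np.newaxis] - miles)

# 1 mile = 1.609344 km
MILES_TO_KM = 1.609344
dist_km = dist_miles * MILES_TO_KM

print(dist_miles)
print(dist_km.round(1))
```

Entry `[i, j]` of either grid is the distance between `cities[i]` and `cities[j]`, so the grid is symmetric with zeros on the diagonal.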

7\. Prime numbers sieve: compute the prime numbers in the 0-N (N=99 to start with) range with a sieve (mask).
  * Construct a boolean array of shape (100,), the mask
  * Identify the multiples of each number starting from 2 and set the corresponding mask elements accordingly
  * Apply the mask to obtain an array of ordered prime numbers
  * Check the performance (timeit); how does it scale with N?
  * Implement the optimization suggested in the [sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)
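
The steps above could be sketched as two small functions. The names `primes_simple` and `primes_eratosthenes` are hypothetical, and the second one assumes the optimization meant here is the usual one: only sieve with numbers that are still marked prime, start at i², and stop at √N.

```python
import math
import numpy as np

def primes_simple(n):
    """Primes in 0..n, crossing out the multiples of every number >= 2."""
    mask = np.ones(n + 1, dtype=bool)   # shape (100,) for n = 99
    mask[:2] = False                    # 0 and 1 are not prime
    for i in range(2, n + 1):
        mask[2 * i::i] = False          # 2i, 3i, 4i, ... are composite
    return np.flatnonzero(mask)

def primes_eratosthenes(n):
    """Same result, sieving only with primes up to sqrt(n)."""
    mask = np.ones(n + 1, dtype=bool)
    mask[:2] = False
    for i in range(2, math.isqrt(n) + 1):
        if mask[i]:                     # skip numbers already crossed out
            mask[i * i::i] = False      # smaller multiples were handled earlier
    return np.flatnonzero(mask)

print(primes_simple(99))
print(primes_eratosthenes(99))
```

In a notebook, the two versions can be compared with `%timeit primes_simple(N)` and `%timeit primes_eratosthenes(N)` for increasing values of `N`.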

**N.B. the following exercises are meant to be solved only if you are familiar with the numpy random library. If not you can skip them (postponed for one of the next exercise sessions)**


8\. Diffusion using random walk

Consider a simple random walk process: at each step in time, a walker jumps right or left (+1 or -1) with equal probability. The goal is to find the typical distance from the origin of a random walker after a given amount of time. 
To do that, let's simulate many walkers and create a 2D array with each walker as a row and the time steps as columns.

  * Take 1000 walkers and let them walk for 200 steps
  * Use randint to create a 2D array of size walkers x steps with values -1 or 1
  * Build the actual walking distances for each walker (i.e. another 2D array obtained by cumulatively "summing along each row")
  * Take the square of that 2D array (elementwise)
  * Compute the mean of the squared distances at each step (i.e. the mean along the columns)
  * Plot the average distances (sqrt(distance\*\*2)) as a function of time (step)
  
Did you get what you expected?
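
A possible sketch of the simulation is given below, without the plotting step. It assumes `np.random.randint` is the `randint` meant in the exercise; the fixed seed and the variable names are illustrative choices made only for reproducibility and readability.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable

n_walkers, n_steps = 1000, 200

# randint(0, 2) yields 0 or 1 (upper bound excluded); map to -1 or +1
steps = 2 * np.random.randint(0, 2, size=(n_walkers, n_steps)) - 1

# position of each walker after each step: cumulative sum along each row
positions = np.cumsum(steps, axis=1)

# squared distance from the origin, elementwise
sq_distance = positions ** 2

# average over walkers (down the columns) at each time step
mean_sq_distance = sq_distance.mean(axis=0)

# typical distance from the origin as a function of time
rms_distance = np.sqrt(mean_sq_distance)

print(rms_distance[-1], np.sqrt(n_steps))
```

For an unbiased walk the mean squared distance grows linearly with the number of steps, so `rms_distance` should roughly follow the square root of the step number, which the final `print` lets you compare.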

9\. Analyze a data file 
  * Download the population of hares, lynxes and carrots at the beginning of the last century.
    ```python
    ! wget https://www.dropbox.com/s/3vigxoqayo389uc/populations.txt
    ```

  * Check the content by looking within the file
  * Load the data (use an appropriate numpy method) into a 2D array
  * Create arrays out of the columns, the arrays being (in order): *year*, *hares*, *lynxes*, *carrots* 
  * Plot the 3 populations over the years
  * Compute the main statistical properties of the dataset (mean, std, correlations, etc.)
  * Which species has the highest population each year?

Do you feel there is some evident correlation here? [Studies](https://www.enr.gov.nt.ca/en/services/lynx/lynx-snowshoe-hare-cycle) tend to believe so.
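
The loading and statistics steps might look like the sketch below. To keep it runnable without the download, it reads a few made-up placeholder rows from an in-memory buffer; these numbers are NOT the real data, and the assumed layout (whitespace-separated columns with a `#` header line) should be checked against the actual file. With the real file, `np.loadtxt("populations.txt")` would replace the buffer.

```python
import io
import numpy as np

# Made-up placeholder rows, NOT the real data. Assumed column layout:
# year, hares, lynxes, carrots. Lines starting with '#' are skipped by loadtxt.
sample = io.StringIO(
    "# year hare lynx carrot\n"
    "1 10 5 30\n"
    "2 20 8 25\n"
    "3 15 12 40\n"
    "4 5 6 35\n"
)
data = np.loadtxt(sample)

# one array per column, in order
year, hares, lynxes, carrots = data.T

# statistics over the three population columns
populations = data[:, 1:]
means = populations.mean(axis=0)
stds = populations.std(axis=0)
corr = np.corrcoef(populations, rowvar=False)   # 3x3 correlation matrix

# species with the highest population in each year
species = np.array(["hares", "lynxes", "carrots"])
dominant = species[np.argmax(populations, axis=1)]

print(means, stds)
print(corr)
print(dominant)
```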