# Week Nine: Statistics

The goal of filling in the requested pieces is twofold: your notebook should run and produce the requested answer with the given dataset, and it should also pass hidden tests that use different datasets. These tests often check unusual inputs, so try to make sure your code handles every possible input dataset.

To be graded, your notebook must be runnable start to finish. If you can't make an in-notebook test pass, comment it out so that the rest of the notebook still runs and you can attempt to get partial credit. You should replace the `...` markers with your code. Do not change the names of the pre-defined variables and functions.

Plots should have the required elements of a plot: axis labels, units where applicable, and a legend if more than one marker or line type is present. Titles are not required.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
import scipy.optimize
import numba

In [None]:
# EID is your 6+2 UC Electronic ID
EID = 'sixplus2'
NAME = 'Joe Smith'

## Problem 1: Confidence intervals

We select the following events out of a larger, normally distributed sample. Compute the 95% confidence interval on the mean of the larger sample:

In [None]:
sample = np.array([2.14, 1.91, 1.96, 2.08, 2.27, 2.19, 2.09, 2.12, 2.13, 2.11])
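
As a reminder, one standard construction for this kind of interval is the Student-$t$ interval. This is a sketch under two assumptions: the population variance is unknown, and the confidence level is the 95% implied by the coverage check later in this problem. Here $n$ is the sample size, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation computed with $n - 1$ in the denominator:

$$
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
$$

For $n = 10$ the critical value is $t_{0.975,\,9} \approx 2.262$, noticeably wider than the large-sample normal value of $1.96$.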

In [None]:
def compute_ci(sample):
    '''
    Return (min, max) of confidence interval
    '''

    ...

In [None]:
np.testing.assert_allclose(compute_ci(sample), (2.026041336336207, 2.173958663663793), rtol=1e-04) 

Draw `N = 1000` independent samples from a normal distribution with known `mu` and `sigma`, and count how often your CI contains the true mean:

In [None]:
def compute_number_contained(mu=1, sigma=0.1, samples=10, N=1000):
    total_contained = 0
    for i in range(N):
        ...
        
    return total_contained

In [None]:
compute_number_contained(N=1000) # should be roughly 95% of 1000
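
To put a number on "roughly", here is a small sketch that assumes each of the `N` trials independently contains the true mean with probability 0.95, so the count follows a binomial distribution. The variable names are illustrative and not part of the assignment.

```python
import math

N = 1000          # number of repeated experiments, as in the call above
coverage = 0.95   # assumed nominal coverage of the interval

# Each experiment either contains the true mean or not, so the count is binomial.
expected = N * coverage
spread = math.sqrt(N * coverage * (1 - coverage))

print(f"expected count: {expected:.0f} +/- {spread:.1f}")
```

Under these assumptions, a result within a couple of standard deviations of 950 is consistent with a correct interval, while a count far below that suggests the interval is too narrow.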

## Problem 2: MCMC

Use the Metropolis algorithm (MCMC, but simpler because no posterior is computed) to produce samples from $\left(1 + x^2\right)^{-1}$. See <https://theclevermachine.wordpress.com/2012/10/05/mcmc-the-metropolis-sampler/> if you need a hint.
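
To illustrate the propose/accept/reject cycle on a different target, here is a minimal sketch. It assumes a symmetric Gaussian random-walk proposal, and it samples an unnormalized Gaussian rather than the function asked for here. The names `metropolis_step` and `unnormalized_gaussian` are hypothetical helpers, not part of the assignment.

```python
import numpy as np

def metropolis_step(x, target, sigma, rng):
    """One Metropolis update (hypothetical helper, not part of the assignment)."""
    proposal = x + rng.normal(0.0, sigma)            # symmetric random-walk proposal
    alpha = min(1.0, target(proposal) / target(x))   # acceptance probability
    return proposal if rng.random() < alpha else x   # on rejection, stay put

def unnormalized_gaussian(x):
    # Illustrative target only; the algorithm never needs the normalization.
    return np.exp(-0.5 * x**2)

rng = np.random.default_rng(0)
chain = np.empty(20_000)
chain[0] = 0.0
for i in range(1, len(chain)):
    chain[i] = metropolis_step(chain[i - 1], unnormalized_gaussian, 1.0, rng)

# Discard the early part of the chain; mean and std should be near 0 and 1.
print(chain[1000:].mean(), chain[1000:].std())
```

Only the ratio of target values enters `alpha`, which is why an unnormalized function is enough. A rejected proposal still contributes a sample: the current point is recorded again.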

In [None]:
def p(x):
    return 1/(1 + x**2)

def metropolis(p, samples = 50_000, sigma=1, min_value=-20, max_value=20):
    x = np.zeros(samples+1)
    x[0] = np.random.rand()

    for i in range(samples):

        # suggest new position
        ...

        # Compute alpha - the fractional chance of moving to a new point
        ...

        # Accept/reject based on alpha
        ...

        # Add the current (moved?) point
        ...

    return x

In [None]:
vals = metropolis(p)

Note: I was able to get this to go from 220 ms to about 2 ms by adding `@numba.njit` in front of both functions above. Feel free to try it out.
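
As a sketch of what decorating "both functions" looks like, the toy example below compiles a small function and a loop that receives it as an argument. It is unrelated to the sampler itself: `square` and `midpoint_integral` are made-up names, and the `try`/`except` fallback is only there so the sketch still runs where Numba is not installed.

```python
try:
    import numba
    njit = numba.njit
except ImportError:          # fall back to plain Python so the sketch still runs
    def njit(func):
        return func

@njit
def square(x):
    return x * x

@njit
def midpoint_integral(f, a, b, n):
    # Explicit loop: slow in plain Python, compiled to machine code by Numba.
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        total += f(a + (i + 0.5) * h)
    return total * h

result = midpoint_integral(square, 0.0, 1.0, 1000)
print(result)  # close to 1/3
```

The first call includes compilation time, so time a second call if you want to compare against the undecorated version.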

In [None]:
plt.figure(figsize=(10,4))
plt.plot(vals[:500], 'r', label='first 500 steps')
plt.plot(np.arange(500,len(vals)), vals[500:], 'g', label='remaining steps')
plt.xlabel('step')
plt.ylabel('x')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(10,3))
x = np.linspace(-10,10,200)
plt.hist(vals[500:], bins=400, range=(-20,20), density=True);
plt.plot(x, p(x) / np.pi, lw=3)
plt.xlim(-10,10)
plt.show()
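
The division by `np.pi` in the cell above normalizes the target, since

$$
\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \Big[\arctan x\Big]_{-\infty}^{\infty} = \pi ,
$$

so $p(x)/\pi$ has unit area. The histogram, by contrast, is normalized only over the plotted range $(-20, 20)$, where the integral is $2\arctan 20 \approx 3.04$. If your samples follow the target, the histogram may therefore sit roughly 3% above the curve, which is not a sign of a bug.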

## Problem 3: Performance

In class I revisited the fractals from the first lecture of week 3 and accelerated the classic integral fractal. Take the continuously colored version (below) and accelerate it too. You can use any method, though Numba is the easiest and has an existing example. Note that you may be doing the opposite of the normal "vectorization": you might end up taking array-at-a-time syntax and rewriting it with loops, which is okay in Numba. Numba does support some array-at-a-time calculations, but it does not support indexing with boolean arrays.

Double click on this cell for hints.

<!--
* You can use two nested for loops, like in the example in the lectures
* You can use if statements instead of the boolean arrays. Numba doesn't like boolean indexing, and if statements let you do a better job anyway.
* You will need to use [i,j] inside the loop to access one element at a time instead of whole arrays at a time
-->
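
The general pattern of trading a boolean mask for explicit loops can be sketched on a toy problem unrelated to the fractal. Both function names below are hypothetical, and the `try`/`except` fallback only exists so the sketch runs without Numba installed.

```python
import numpy as np

try:
    import numba
    njit = numba.njit
except ImportError:          # fall back to plain Python so the sketch still runs
    def njit(func):
        return func

def sqrt_where_positive_masked(a):
    # Array-at-a-time version using a boolean mask.
    out = np.zeros_like(a)
    mask = a > 0
    out[mask] = np.sqrt(a[mask])
    return out

@njit
def sqrt_where_positive_loop(a):
    # Element-at-a-time version: the mask becomes an if statement.
    out = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] > 0:
                out[i, j] = np.sqrt(a[i, j])
    return out

a = np.linspace(-1.0, 1.0, 12).reshape(3, 4)
np.testing.assert_allclose(sqrt_where_positive_masked(a), sqrt_where_positive_loop(a))
```

Checking the loop version against the original with `np.testing.assert_allclose`, as in the last line, is a cheap way to confirm a rewrite did not change the result.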

In [None]:
def make_fractal_cc(size, iterations):
    x = np.linspace(-2,2,size[0]).reshape(1,-1)
    y = np.linspace(-2,2,size[1]).reshape(-1,1)
    c = x + y*1j
    z = np.zeros(size, np.complex128)  # np.complex_ was removed in NumPy 2.0
    it_matrix = np.zeros(size, dtype=np.double)
    for n in range(iterations):
        z[it_matrix == 0] = z[it_matrix == 0]**2 + c[it_matrix == 0]
        filt = (it_matrix == 0) & (np.abs(z) > 2)
        it_matrix[filt] =  n + 1 - np.log(np.log(np.abs(z[filt])))/np.log(2)
    return it_matrix

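In [None]:
# Define the inputs before the first timing cell so the notebook runs top to bottom
size = (500, 500)
iterations = 50

In [None]:
%%timeit
make_fractal_cc(size, iterations)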

In [None]:
@numba.njit
def make_fractal_cc_fast(size, iterations):
    x = np.linspace(-2,2,size[0]).reshape(1,-1)
    y = np.linspace(-2,2,size[1]).reshape(-1,1)
    c = x + y*1j
    z = np.zeros(size, np.complex128)  # np.complex_ was removed in NumPy 2.0
    it_matrix = np.zeros(size, dtype=np.double)
    for n in range(iterations):
        
        # You'll probably start making changes here:
        z[it_matrix == 0] = z[it_matrix == 0]**2 + c[it_matrix == 0]
        filt = (it_matrix == 0) & (np.abs(z) > 2)
        it_matrix[filt] =  n + 1 - np.log(np.log(np.abs(z[filt])))/np.log(2)
        
    return it_matrix

In [None]:
size = (500, 500)
iterations = 50

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(make_fractal_cc_fast(size, iterations));

In [None]:
%%timeit
make_fractal_cc_fast(size, iterations)