# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your GT login and the GT logins of any of your collaborators below. (The GT logins are worth 1 point per notebook, so don't miss the opportunity to get a free point!)

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Is monthly rainfall in Atlanta stationary and homogeneous?

In this part of the lab, you will download historical monthly rainfall data for Atlanta and, treating it as a random process, determine whether it appears to be (a) stationary and (b) homogeneous.

First, let's download and plot the data, which is taken from [weather.gov](http://weather.gov) and covers the twenty-one-year period from 1996 through 2016. Here is some code to load and inspect it. The main results are two variables, `months` and `rainfall`, which store the time, measured as the number of months since January 1, 1996, and the monthly precipitation in inches, respectively.

In [None]:
import numpy as np
import pandas as pd

rain_atl_raw = pd.read_table ('https://raw.githubusercontent.com/rvuduc/cx4230sp17labs/master/lab8/rain-atl-raw.txt', comment='#')
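# Column labels of the raw table; renamed to the integers 1..12 so they can be used in arithmetic.
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
month_ids = [i+1 for i in range (len (month_names))]
rain_atl_raw.rename (columns={m: i for m, i in zip (month_names, month_ids)},
                     inplace=True)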
rain_atl = pd.melt (rain_atl_raw,
                    id_vars=['Year'],
                    value_vars=month_ids,
                    var_name='Month',
                    value_name='Inches')
rain_atl['MonthsFrom1996'] = (rain_atl['Year']-1996)*12 + rain_atl['Month']
rain_atl.sort_values (by='MonthsFrom1996', inplace=True)

months = np.array (rain_atl['MonthsFrom1996'])
rainfall = np.array (rain_atl['Inches'])

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure (figsize=(15, 5))
plt.plot (months, rainfall, 'o')
plt.xlabel ('Months since January 1, 1996')
plt.title ('Monthly rainfall in Atlanta (inches)')

**Exercise 1** (2 points). Check whether "learning" is happening by fitting a straight line to `rainfall` as a function of `months` and testing whether the slope of that line differs significantly from zero. Write your code below and explain your results in the Markdown cell that follows it.

> What does it mean for the slope to differ from zero "significantly?" If you use [`scipy.stats.linregress()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html), it performs a diagnostic statistical test for you. In particular, the null hypothesis is that the slope *is* zero; you would reject that hypothesis only if the $p$-value reported by `linregress()` is below a pre-selected significance level. You can assume a significance level of 0.05 to start, which is a common choice. For a more detailed explanation of how such an analysis goes, see [this tutorial](http://stattrek.com/regression/slope-test.aspx?Tutorial=AP).
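
As an illustration of the test described above, here is a minimal sketch on synthetic data (not the Atlanta rainfall). The helper name `slope_is_significant` and the two made-up series are assumptions introduced only for this example; the only library call relied on is `scipy.stats.linregress()`, whose result exposes `slope` and `pvalue`.

```python
import numpy as np
from scipy import stats

def slope_is_significant(t, y, alpha=0.05):
    """Hypothetical helper: fit y vs. t and report (slope, p-value, reject null?)."""
    fit = stats.linregress(t, y)
    # Null hypothesis: the slope is zero. Reject it only if p < alpha.
    return fit.slope, fit.pvalue, fit.pvalue < alpha

# Synthetic stand-in data: 120 "months" with Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(1, 121)
noise = rng.normal(scale=1.5, size=t.size)
y_trend = 0.5 * t + noise   # a built-in upward trend
y_flat = 4.0 + noise        # no built-in trend

for label, y in [("trend", y_trend), ("flat", y_flat)]:
    slope, p, reject = slope_is_significant(t, y)
    print(f"{label}: slope = {slope:.4f}, p-value = {p:.4g}, reject zero slope? {reject}")
```

The same pattern should carry over to `months` and `rainfall`, though the conclusion there depends on the data and on the significance level you choose.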

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

**Exercise 2** (2 points). Write a function to compute the "disjoint windowed means" of a sequence.

That is, let $x_0, x_1, x_2, x_3, \ldots, x_{n-1}$, denote the data sequence. Then to compute its disjoint windowed mean at a window size of $w$, you would do the following:

1. Divide the sequence into $\left\lfloor \frac{n}{w} \right\rfloor$ contiguous groups of $w$ consecutive elements each, i.e., the first group is $\{x_0, \ldots, x_{w-1}\}$, the second group is $\{x_w, \ldots, x_{2w-1}\}$, the third group is $\{x_{2w}, \ldots, x_{3w-1}\}$, and so on. (If fewer than $w$ elements remain at the end of the sequence, exclude them.)
2. Compute the mean of each group.
3. Return the list of group means.
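
To make the three steps concrete, here is a small worked illustration on a made-up sequence of seven values with $w = 3$. The variable names are assumptions for this example only, and it spells out the definition step by step rather than prescribing how `disjoint_windowed_means()` should be written.

```python
# Made-up data: n = 7 values, window size w = 3.
x_demo = [1, 2, 3, 4, 5, 6, 7]
w_demo = 3

# Step 1: floor(n / w) = 2 full groups; the trailing element 7 is excluded.
num_groups = len(x_demo) // w_demo
groups = [x_demo[g * w_demo:(g + 1) * w_demo] for g in range(num_groups)]

# Step 2: mean of each group.
group_means = [sum(g) / w_demo for g in groups]

# Step 3: the list of group means is the result.
print(groups)       # [[1, 2, 3], [4, 5, 6]]
print(group_means)  # [2.0, 5.0]
```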

In [None]:
def disjoint_windowed_means (x, w):
    n = len (x)
    assert 1 <= w <= n
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
x = rainfall[:12]
assert (disjoint_windowed_means (x, 1) == x).all ()

y1 = disjoint_windowed_means (x, 4)
assert np.allclose (y1, [5.3525, 2.6550, 3.1425], atol=1e-5)

y2 = disjoint_windowed_means (x, 5)
assert np.allclose (y2, [4.706, 2.742, 3.680], atol=1e-4)

print ("\n(Passed.)")

plt.figure (figsize=(15, 5))
plt.plot (months, rainfall, 'o')
plt.plot ([min (months), max (months)], [np.mean (rainfall), np.mean (rainfall)])
plt.plot (months[::4], disjoint_windowed_means (rainfall, 4), '*--')
plt.plot (months[::12], disjoint_windowed_means (rainfall, 12), 'v--')

**Exercise 3** (3 points). For lags between 0 and 24, inclusive, compute the lag correlation and create a stem-and-dot plot of it. Place your code in the code cell below. Then, in the Markdown cell that follows, explain based on your result whether it is likely that the rainfall each month is independent of other months.
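
The exercise does not spell out a formula for the lag correlation. The sketch below assumes one common definition: the Pearson correlation between the sequence and a copy of itself shifted by $k$ steps. The function name `lag_correlation_sketch` and the synthetic seasonal series are hypothetical and serve only as an illustration; if your course notes define lag correlation differently, follow that definition instead.

```python
import numpy as np

def lag_correlation_sketch(x, k):
    """Assumed definition: Pearson correlation of (x_0..x_{n-k-1}) with (x_k..x_{n-1})."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Slice with n - k (not -k) so that k = 0 keeps the whole sequence.
    return np.corrcoef(x[:n - k], x[k:])[0, 1]

# Synthetic stand-in data: a 12-step cycle plus a little noise.
rng = np.random.default_rng(1)
t_demo = np.arange(240)
x_demo = np.sin(2 * np.pi * t_demo / 12) + 0.1 * rng.normal(size=t_demo.size)

lags = np.arange(0, 25)
corr = np.array([lag_correlation_sketch(x_demo, k) for k in lags])
print(np.round(corr[[0, 6, 12]], 2))
```

For this artificial series the correlation is strongly positive at lag 12 and strongly negative at lag 6, which is the signature of a 12-step cycle. A sequence of independent values would instead show correlations near zero at every nonzero lag. The arrays `lags` and `corr` could be passed to `plt.stem()` to draw the plot.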

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE