## Q1.

Add an `__iter__` method to your project's time series class that iterates over values, along with three more methods: `itertimes` to iterate over times, `itervalues` to iterate over values, and `iteritems` to iterate over time-value pairs. (This interface is similar to that of Python dictionaries.) To test these methods, check both the types of the results and the values you expect.

In [1]:
#your code here
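
A minimal sketch of the requested interface, assuming a hypothetical `TimeSeries` class that stores its data in two parallel lists, `_times` and `_values`. Your project class may store its data differently, so treat the names and the constructor here as assumptions rather than the required design.

```python
class TimeSeries:
    """Hypothetical stand-in for the project class: parallel lists of times and values."""

    def __init__(self, times, values):
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        # Iterating over the series itself yields the values.
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        # zip is lazy in Python 3, so (time, value) tuples are produced one at a time.
        return zip(self._times, self._values)


ts = TimeSeries([1, 2, 3], [10.0, 20.0, 30.0])
assert list(ts) == [10.0, 20.0, 30.0]
assert list(ts.iteritems()) == [(1, 10.0), (2, 20.0), (3, 30.0)]
```

Each method returns a fresh iterator, so a test can check both the type (the result has a `__next__` method) and the contents (by materializing it with `list`).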


## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [1]:
from random import normalvariate, random
from itertools import count
def make_data(m, stop=None):
    # Yields values near 1e9 with random noise; if `stop` is given, yields stop + 1 values.
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())
        

Here is an implementation of an online mean algorithm. See http://www.johndcook.com/blog/standard_deviation/ and, linked from that page, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ for background. (Convince yourselves of the formulas.)

In [2]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu
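
To convince yourself of the update used in `online_mean`, write $\mu_n$ for the mean of the first $n$ values $x_1, \dots, x_n$ (this notation is introduced here for illustration). Splitting off the last term of the sum gives:

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{(n-1)\,\mu_{n-1} + x_n}{n} = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}
$$

In the code, `delta` is $x_n - \mu_{n-1}$. A similar manipulation for the sum of squared deviations $S_n = \sum_{i=1}^{n} (x_i - \mu_n)^2$ leads to the update described in the first linked post, with the sample standard deviation given by $\sqrt{S_n/(n-1)}$ for $n > 1$:

$$
S_n = S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n)
$$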

We use our generator functions to implement iterators:

In [27]:
g = make_data(5, 10)
list(g)

[999999999.8979864,
 1000000004.7052081,
 999999999.3375891,
 999999993.4103054,
 1000000003.0717053,
 1000000002.0171043,
 1000000004.1583248,
 1000000002.3734239,
 1000000000.2192163,
 999999999.7236252,
 999999999.7616198]
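
Because `make_data` is a generator, leaving `stop` as `None` gives an unbounded stream, and calling `list` on it would never return. A small sketch of how one might take just a few values with `itertools.islice` follows; the block repeats the generator so that it runs on its own.

```python
from itertools import count, islice
from random import normalvariate, random


def make_data(m, stop=None):
    # Mirrors the generator above; repeated so this block is self-contained.
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m * random())


# islice stops after three items, so the infinite generator is never exhausted.
first_three = list(islice(make_data(5), 3))
assert len(first_three) == 3
```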

In [5]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[1000000004.8620105,
 1000000002.0241961,
 1000000001.3887969,
 1000000001.1003666,
 1000000001.5273584,
 1000000001.2801799,
 1000000001.2685615,
 1000000001.3385491,
 1000000001.2621979,
 1000000001.2276483,
 1000000000.6857018,
 1000000000.6189383,
 1000000000.430508,
 1000000000.5606848,
 1000000000.6328887,
 1000000000.6879442,
 1000000000.6449937,
 1000000000.6270849,
 1000000000.5959257,
 1000000000.6022533,
 1000000000.6465746,
 1000000000.6815833,
 1000000000.6383125,
 1000000000.691703,
 1000000000.7164993,
 1000000000.6624789,
 1000000000.8695464,
 1000000000.8481997,
 1000000000.9888165,
 1000000000.9767468,
 1000000000.8017282,
 1000000000.759307,
 1000000000.7913662,
 1000000000.8496752,
 1000000000.7700454,
 1000000000.6941131,
 1000000000.6313045,
 1000000000.5900815,
 1000000000.650765,
 1000000000.684251,
 1000000000.6806076,
 1000000000.615181,
 1000000000.64058,
 1000000000.6316932,
 1000000000.6041749,
 1000000000.494791,
 1000000000.4891216,
 1000000000.4410318,
 

### 2.1

Implement the standard deviation algorithm as a generator function with the following shape:

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [96]:
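# your code here
import math
import numpy as np

def online_mean_dev(iterator):
    n = 1
    mu = 0
    dev_accum = 0  # running sum of squared deviations from the mean
    stddev = np.nan  # the sample standard deviation is undefined for n = 1
    for value in iterator:
        delta = value - mu  # deviation from the previous mean
        mu = mu + delta/n
        # Welford's update: deviation from the old mean times deviation from the new mean
        dev_accum += delta * (value - mu)
        if n > 1:
            stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
        n += 1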

Here we generate a stream of about 100000 values and run this generator over it (imagine running this on a time series being read slowly from disk).

In [97]:
data_with_stats = online_mean_dev(make_data(5, 100000))
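
One way to check an implementation of 2.1 is to compare its final output with a two-pass calculation on a small materialized sample. The sketch below is self-contained, so it carries its own copy of the generator under the hypothetical name `running_stats`; the fixed seed and the sample size of 1000 are arbitrary choices made for illustration. `statistics.mean` and `statistics.stdev` come from the standard library.

```python
import math
import statistics
from random import normalvariate, seed


def running_stats(iterator):
    """Yield (n, value, mean, sample stddev) after each value (Welford's update)."""
    n = 0
    mu = 0.0
    dev_accum = 0.0
    for value in iterator:
        n += 1
        delta = value - mu
        mu += delta / n
        dev_accum += delta * (value - mu)
        stddev = math.sqrt(dev_accum / (n - 1)) if n > 1 else math.nan
        yield (n, value, mu, stddev)


seed(0)  # fixed seed so the check is repeatable
sample = [1.0e09 + normalvariate(0, 5) for _ in range(1000)]

# Keep only the last tuple produced by the generator.
*_, (n, last, mu, stddev) = running_stats(sample)

# The streaming result should agree with the two-pass standard-library result.
assert n == 1000
assert math.isclose(mu, statistics.mean(sample), rel_tol=1e-12)
assert math.isclose(stddev, statistics.stdev(sample), rel_tol=1e-6)
```

The large offset of `1.0e09` is what makes this a meaningful check: a method based on a running sum of squares tends to lose precision on such data, while the update above works with deviations from the mean.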

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns `True` if the value lies within `level` standard deviations ($\sigma$) of the mean.

In [98]:
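#your code here
def is_ok(level, t):
    n, value, mu, stddev = t
    # Always returns a bool; a nan stddev (first point) compares as False,
    # so that point is reported as an anomaly.
    return abs(value - mu) < level * stddev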

We use this function to build a predicate that is passed to `itertools.filterfalse`, which gives us an iterator over the anomalies.

In [99]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [100]:
list(anomalies)#materialize

[(1, 1000000001.9179087, 1000000001.9179087, nan),
 (6383, 1000000014.994646, 1000000000.0006934, 2.8144615482098523),
 (7878, 1000000015.2353746, 1000000000.0039496, 2.8147031976729346),
 (11604, 1000000014.9912527, 1000000000.039969, 2.8378875607442344),
 (14341, 999999985.7664379, 1000000000.0612235, 2.821197248486268),
 (14867, 1000000016.7098643, 1000000000.0483481, 2.8309315220262055),
 (15167, 999999985.2206017, 1000000000.0474732, 2.8307767137928437),
 (15363, 999999984.1827582, 1000000000.0444572, 2.8345800505676375),
 (18456, 1000000017.2355269, 1000000000.0321716, 2.845078574893959),
 (27623, 1000000014.2942101, 1000000000.0212021, 2.845948126291442),
 (27866, 999999984.7801061, 1000000000.0250461, 2.8478582132032706),
 (29020, 999999985.2117698, 1000000000.0238507, 2.854080311418395),
 (30816, 1000000014.5256214, 1000000000.0208899, 2.857404183704058),
 (36277, 999999979.555514, 1000000000.0040451, 2.8571896606153153),
 (37078, 1000000014.5571408, 1000000000.0056968, 2.8579

## To think about, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, say 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that incoming data might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
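
As one possible starting point for the window question, the sketch below keeps the most recent `window` values (including the current one) in a `collections.deque` with `maxlen` set, and recomputes the mean and sample standard deviation over that buffer for each new value. The name `windowed_mean_dev` and the recompute-per-step strategy are assumptions made for illustration. Recomputing costs O(window) work per value, which may be acceptable because the window is small compared to the series.

```python
import math
from collections import deque


def windowed_mean_dev(iterator, window=100):
    """Yield (n, value, mean, stddev) computed over the last `window` values only."""
    buf = deque(maxlen=window)  # appending to a full deque drops the oldest value
    for n, value in enumerate(iterator, start=1):
        buf.append(value)
        k = len(buf)
        mu = math.fsum(buf) / k
        if k > 1:
            stddev = math.sqrt(math.fsum((x - mu) ** 2 for x in buf) / (k - 1))
        else:
            stddev = math.nan  # undefined for a single point
        yield (n, value, mu, stddev)


stats = list(windowed_mean_dev([1.0, 2.0, 3.0, 4.0], window=2))
# The last tuple only sees the values 3.0 and 4.0.
assert stats[-1][2] == 3.5
```

The yielded tuples have the same layout as those in 2.1, so a predicate such as `is_ok` could be applied to them unchanged.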