## Q1.

Add a method `__iter__` to your project's time series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This interface is similar to that of Python dictionaries.) To test these, check both the types of the results and the values you expect.

In [1]:
#your code here
# See team project repo
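
Since the actual solution lives in the team repository, the following is only a minimal sketch of one way the interface could look. The class name `TimeSeries`, its constructor, and the private attributes `_times` and `_values` are assumptions made for illustration, not the project's real code.

```python
class TimeSeries:
    """Hypothetical minimal time series backed by two parallel lists."""

    def __init__(self, times, values):
        if len(times) != len(values):
            raise ValueError("times and values must have the same length")
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        # Iterating over the series itself yields values, as the exercise asks.
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        # zip lazily pairs each time with its value.
        return zip(self._times, self._values)


ts = TimeSeries([0.0, 0.5, 1.0], [10, 20, 30])
```

A test along the lines the question suggests would check both that each method returns an iterator (it has `__next__`) and that materializing it gives the expected list.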

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [1]:
from random import normalvariate, random
from itertools import count
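def make_data(m, stop=None):
    """Yield noisy values near 1e9; runs forever when stop is None."""
    for i in count():
        # i starts at 0, so a finite stop yields stop + 1 values.
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m * random())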

Here is an implementation of an online mean algorithm. See http://www.johndcook.com/blog/standard_deviation/ and the post it links to, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas.)

In [2]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu
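
To see where the update `mu = mu + delta/n` comes from, write the mean of the first $n$ values in terms of the mean of the first $n-1$. Here $x_n$ stands for `value` and $\mu_n$ for `mu` after the $n$-th step; this notation is introduced only for this derivation.

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i
      = \frac{(n-1)\,\mu_{n-1} + x_n}{n}
      = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}
$$

The last form only ever adds a small correction to the running mean, which is why it behaves well for data such as ours that sits near $10^9$.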

We use our generator functions to implement iterators:

In [3]:
g = make_data(5, 10)
list(g)

[1000000000.2483191,
 1000000001.7175666,
 999999999.2208488,
 999999999.2710557,
 999999997.4430383,
 999999999.9566939,
 1000000000.9763582,
 1000000001.0733516,
 999999999.9241843,
 999999998.7275048,
 999999999.9380003]

In [4]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[999999998.7560065,
 1000000004.2296506,
 1000000002.640017,
 1000000001.6089094,
 1000000001.2274193,
 1000000001.3009561,
 1000000001.4638871,
 1000000002.5257026,
 1000000002.7656804,
 1000000002.3871646,
 1000000002.2381872,
 1000000001.8489997,
 1000000001.5907165,
 1000000001.2823277,
 1000000000.9260416,
 1000000000.686166,
 1000000000.6798712,
 1000000000.9366128,
 1000000000.7683005,
 1000000000.7819194,
 1000000000.709597,
 1000000000.6445792,
 1000000000.5735127,
 1000000000.5824695,
 1000000000.7244146,
 1000000000.6694156,
 1000000000.5137767,
 1000000000.5250639,
 1000000000.4312708,
 1000000000.3267772,
 1000000000.3246696,
 1000000000.3182658,
 1000000000.3309765,
 1000000000.3203361,
 1000000000.2750416,
 1000000000.4469855,
 1000000000.426442,
 1000000000.3733248,
 1000000000.4074972,
 1000000000.3686476,
 1000000000.1829406,
 1000000000.095577,
 1000000000.019109,
 1000000000.1051184,
 1000000000.0332004,
 1000000000.0313668,
 999999999.9322939,
 999999999.930284,
 9

### 2.1

Implement the standard deviation algorithm as a generator function of the following form:

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [16]:
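# your code here
import numpy as np
import math

def online_mean_dev(iterator):
    """Yield (n, value, running mean, running sample stddev)."""
    n = 0
    mu, dev_accum = None, None
    stddev = None
    for num in iterator:
        n += 1

        if n == 1:
            # One sample: the mean is the sample; the sample stddev is undefined.
            mu = num
            dev_accum = 0
            stddev = np.nan
        else:
            old_mu = mu
            mu = mu + (num - mu) / n
            # Accumulate the sum of squared deviations using the old and new means.
            dev_accum = dev_accum + (num - old_mu) * (num - mu)
            stddev = math.sqrt(dev_accum / (n - 1))
        yield (n, num, mu, stddev)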
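
One way to sanity-check an online update like this is to compare its final output with `statistics.mean` and `statistics.stdev` on a small list. The sketch below restates the update in a compact helper so that it runs on its own; the name `welford` and the sample values are assumptions chosen for illustration.

```python
import math
import statistics


def welford(values):
    """Yield (n, mean, sample stddev); stddev is nan while n == 1."""
    n, mu, dev_accum = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mu              # deviation from the old mean
        mu += delta / n
        dev_accum += delta * (x - mu)  # old deviation times new deviation
        stddev = math.sqrt(dev_accum / (n - 1)) if n > 1 else math.nan
        yield n, mu, stddev


# Small offsets on top of 1e9, mimicking the data in this notebook.
sample = [1.0e9 + d for d in (0.5, -1.25, 2.0, 0.75, -0.5)]
n, mu, stddev = list(welford(sample))[-1]

mean_matches = math.isclose(mu, statistics.mean(sample), rel_tol=0, abs_tol=1e-5)
stddev_matches = math.isclose(stddev, statistics.stdev(sample), rel_tol=1e-5)
```

Both comparisons use a tolerance because floating-point values near 1e9 carry only about seven digits after the decimal point.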

Here we make a data stream of about 100000 elements and run this iterator on it (imagine running this on a time series being slowly read from disk).

In [19]:
data_with_stats = online_mean_dev(make_data(5, 100000))
# print(data_with_stats[0])
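
A generator such as `data_with_stats` cannot be indexed, so an expression like `data_with_stats[0]` raises `TypeError`. To peek at the first few items, `itertools.islice` is one option. The sketch below uses a hypothetical stand-in generator, `squares`, and shows that whatever is peeked at is consumed from the stream.

```python
from itertools import count, islice


def squares():
    """Hypothetical stand-in for a long-running data stream."""
    for i in count():
        yield i * i


stream = squares()
first_three = list(islice(stream, 3))  # consumes exactly three items
next_item = next(stream)               # the stream resumes after those three
```

Because the peeked items are gone, peeking at `data_with_stats` before filtering would remove those items from the anomaly search.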

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns `True` if the value lies within `level` $\sigma$ of the mean.

In [20]:
#your code here
def is_ok(level, t):
    n, num, mu, stddev = t
    return (-level * stddev + mu) <= num <= (level * stddev + mu)
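
One property of this check is worth seeing on concrete tuples: any comparison with `nan` is false, so a tuple whose standard deviation is still undefined (the first one, with `n == 1`) is reported as not OK. The sketch below repeats `is_ok` so that it runs on its own; the three tuples are made-up illustrative values.

```python
import math


def is_ok(level, t):
    n, num, mu, stddev = t
    return (mu - level * stddev) <= num <= (mu + level * stddev)


first = (1, 10.0, 10.0, math.nan)  # stddev undefined for a single sample
inlier = (50, 10.5, 10.0, 1.0)     # 0.5 sigma from the mean
outlier = (51, 17.0, 10.0, 1.0)    # 7 sigma from the mean

results = [is_ok(5, t) for t in (first, inlier, outlier)]
```

Here `results` is `[False, True, False]`, so the first tuple of a stream is flagged as an anomaly even though nothing is unusual about it.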

We use this function to create a predicate that is passed to `itertools.filterfalse`, which then gives us an iterator over the anomalies.

In [21]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [22]:
list(anomalies)  # materialize

[(1, 999999999.8438935, 999999999.8438935, nan),
 (3250, 1000000017.4806058, 999999999.9445951, 2.8618371303328543),
 (3712, 1000000015.282047, 999999999.9473315, 2.8614212998204924),
 (4510, 1000000014.9706957, 999999999.9714713, 2.8995781131968394),
 (6462, 1000000014.5301795, 999999999.967669, 2.895039700137576),
 (7129, 999999981.1367793, 999999999.9767876, 2.902447704549819),
 (11578, 999999985.1587152, 999999999.9761664, 2.907613608242672),
 (11796, 1000000015.3252591, 999999999.9796087, 2.908895229689659),
 (12792, 999999985.3967189, 999999999.9842292, 2.904821794040266),
 (18642, 999999984.8977883, 999999999.9821388, 2.880160124206725),
 (26419, 999999983.3929456, 999999999.9966364, 2.8908546264597046),
 (28126, 999999985.5261078, 999999999.9923877, 2.8915093148400635),
 (28457, 1000000015.1930368, 999999999.9914454, 2.893201648755465),
 (30050, 999999985.3163742, 999999999.9905875, 2.894027969360338),
 (34428, 999999983.7703326, 999999999.9933141, 2.8957794433063615),
 (42559,

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, such as 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that incoming data might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
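
As a starting point for the window-based question, here is a minimal sketch that keeps only the most recent values in a `collections.deque` with `maxlen` set. The function name `windowed_mean_dev` and the `window` parameter are assumptions. It uses a trailing window (the last `window` points) rather than points on both sides of the time in question, and it recomputes the statistics from scratch at each step, which is affordable only because the window is small.

```python
from collections import deque
import math


def windowed_mean_dev(iterator, window=100):
    """Yield (n, value, mean, stddev) computed over the last `window` values."""
    buf = deque(maxlen=window)  # the oldest value is dropped automatically
    for n, value in enumerate(iterator, start=1):
        buf.append(value)
        k = len(buf)
        mu = sum(buf) / k
        if k > 1:
            stddev = math.sqrt(sum((x - mu) ** 2 for x in buf) / (k - 1))
        else:
            stddev = math.nan   # undefined for a single sample
        yield (n, value, mu, stddev)


out = list(windowed_mean_dev([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
```

The last tuple in `out` is `(5, 5.0, 4.0, 1.0)`: only 3.0, 4.0, and 5.0 remain in the window. The tuples have the same shape as those from `online_mean_dev`, so the same `is_ok` predicate could be applied to them.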