## Q1.

Add a method `__iter__` to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over (time, value) pairs. (This interface is similar to that of Python dictionaries.) To test these, check both the types of the results and the values you expect.

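```python
# Methods of your Time Series class; they assume it stores parallel
# sequences in self._times and self._values.
def __iter__(self):
    # Iterating over the series itself yields its values
    yield from self._values

def itertimes(self):
    yield from self._times

def itervalues(self):
    yield from self._values

def iteritems(self):
    # (time, value) pairs, analogous to dictionary items
    yield from zip(self._times, self._values)
```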
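A minimal sketch of the requested tests. It assumes a hypothetical `TimeSeries(times, values)` constructor that stores the two sequences as `_times` and `_values`, which is what the methods above expect; your project class will have its own constructor.

```python
import types

# Hypothetical minimal class, only so the Q1 methods can be exercised
class TimeSeries:
    def __init__(self, times, values):
        # Assumes times and values are equal-length sequences
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        yield from self._values

    def itertimes(self):
        yield from self._times

    def itervalues(self):
        yield from self._values

    def iteritems(self):
        yield from zip(self._times, self._values)


ts = TimeSeries([1, 2, 3], [10.0, 20.0, 30.0])

# Check the types: each method is a generator function, so it returns a generator
for it in (iter(ts), ts.itertimes(), ts.itervalues(), ts.iteritems()):
    assert isinstance(it, types.GeneratorType)

# Check the answers
assert list(ts) == [10.0, 20.0, 30.0]
assert list(ts.itertimes()) == [1, 2, 3]
assert list(ts.itervalues()) == [10.0, 20.0, 30.0]
assert list(ts.iteritems()) == [(1, 10.0), (2, 20.0), (3, 30.0)]
```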


## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

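```python
from random import normalvariate, random
from itertools import count

def make_data(m, stop=None):
    # Values scattered around 1e9; each draw uses a random standard
    # deviation in [0, m). With stop=None the stream is infinite;
    # otherwise indices 0..stop are produced (stop + 1 values).
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m * random())
```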
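Because `make_data(m)` without `stop` never ends, a finite prefix can be taken lazily with `itertools.islice` instead of materializing the stream. A small sketch (the generator is repeated so the block runs on its own):

```python
from itertools import count, islice
from random import normalvariate, random

def make_data(m, stop=None):
    for i in count():
        if stop is not None and i > stop:
            break
        yield 1.0e09 + normalvariate(0, m * random())

# Take the first 5 values of an infinite stream; nothing beyond them is generated
first_five = list(islice(make_data(5), 5))
print(len(first_five))  # 5

# With stop=10 the stream has 11 values (indices 0 through 10)
print(len(list(make_data(5, 10))))  # 11
```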

Here is an implementation of an online mean algorithm; see http://www.johndcook.com/blog/standard_deviation/ and the article it links to, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas.)

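```python
def online_mean(iterator):
    n = 0        # number of values seen so far
    mu = 0.0     # running mean of those values
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta / n   # move the mean 1/n of the way toward the new value
        yield mu
```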
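A short derivation of the update used in `online_mean`: writing $\mu_n$ for the mean of the first $n$ values $x_1, \dots, x_n$,

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{(n-1)\,\mu_{n-1} + x_n}{n} = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n},
$$

which is `mu = mu + delta/n` with `delta` $= x_n - \mu_{n-1}$. Only $n$ and $\mu_{n-1}$ have to be stored, never the stream itself, and the running mean is corrected by a small amount at each step instead of accumulating the large sum $\sum_i x_i$.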

We use our generator functions to implement iterators:

```python
g = make_data(5, 10)
list(g)
```

```
[999999999.2710783,
 1000000000.0729389,
 1000000001.6190716,
 1000000000.6817493,
 999999994.2990803,
 999999999.1130251,
 1000000000.4452161,
 1000000004.7200706,
 999999997.7068175,
 999999999.6564506,
 999999999.9888841]
```

```python
g = online_mean(make_data(5, 10))
print(type(g))
list(g)
```

```
<class 'generator'>

[1000000003.5073863,
 1000000000.8165691,
 1000000000.305948,
 999999998.942487,
 999999999.7109706,
 1000000000.0458524,
 999999999.3149323,
 999999999.8643061,
 999999999.6569027,
 999999999.8564799,
 999999999.908443]
```

### 2.1

Implement the online standard deviation algorithm as a generator function of the following shape:

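```python
import math

def online_mean_dev(iterator):
    # ... initialize n, mu and dev_accum ...
    for value in iterator:
        # ... update n, mu and dev_accum with the new value ...
        if n > 1:
            stddev = math.sqrt(dev_accum/(n-1))
            yield (n, value, mu, stddev)
```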
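One standard way to fill in the skeleton is Welford's recurrence, the method presented on the first johndcook.com page linked above. With $\mu_n$ the running mean as before and $S_n = \sum_{i=1}^{n} (x_i - \mu_n)^2$ (the `dev_accum` of the skeleton),

$$
S_1 = 0, \qquad S_n = S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n), \qquad s_n = \sqrt{\frac{S_n}{n-1}} \quad (n > 1).
$$

As with the mean, each new value costs constant work and memory. Note that the update for $S_n$ uses the mean both before and after $x_n$ is included.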

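```python
import math

def online_mean_dev(iterator):
    # Welford's method: dev_accum is the running sum of squared deviations
    n = 0
    mu = 0.0
    dev_accum = 0.0
    for value in iterator:
        n += 1
        mu_prev = mu
        mu += (value - mu_prev) / n
        # Uses the mean both before and after including `value`
        dev_accum += (value - mu_prev) * (value - mu)
        if n > 1:
            # Sample (n - 1) standard deviation of the values seen so far
            stddev = math.sqrt(dev_accum / (n - 1))
            yield (n, value, mu, stddev)
```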
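A quick check of the streaming result against the standard library's batch `statistics.mean` and `statistics.stdev` on a finite list. The data size, seed and noise level are arbitrary choices for illustration, and the generator is repeated so the block runs on its own.

```python
import math
import random
import statistics

def online_mean_dev(iterator):
    # Same Welford update as above, repeated so this block is self-contained
    n, mu, dev_accum = 0, 0.0, 0.0
    for value in iterator:
        n += 1
        mu_prev = mu
        mu += (value - mu_prev) / n
        dev_accum += (value - mu_prev) * (value - mu)
        if n > 1:
            yield (n, value, mu, math.sqrt(dev_accum / (n - 1)))

random.seed(0)
data = [1.0e9 + random.normalvariate(0, 3) for _ in range(1000)]

# Run the stream to the end; the last tuple summarizes all of the data
for stats in online_mean_dev(data):
    pass
n, last_value, mu, std = stats

# Compare with the batch results
print(n == len(data))  # True
print(math.isclose(mu, statistics.mean(data), rel_tol=1e-12))
print(math.isclose(std, statistics.stdev(data), rel_tol=1e-5))
```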

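Here we make a stream of about 100,000 values and run this iterator on it (imagine running it on a time series being read slowly from disk). Because both `make_data` and `online_mean_dev` are generators, this line does no work yet; values are produced only as they are consumed.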

```python
data_with_stats = online_mean_dev(make_data(5, 100000))
```
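To make the "read from disk" picture concrete, here is a sketch that feeds values from a file-like object line by line. `io.StringIO` stands in for a real file, and the one-number-per-line format and the helper name `read_values` are assumptions for illustration.

```python
import io
import math
from itertools import islice

def online_mean_dev(iterator):
    # Welford update, repeated so this block runs on its own
    n, mu, dev_accum = 0, 0.0, 0.0
    for value in iterator:
        n += 1
        mu_prev = mu
        mu += (value - mu_prev) / n
        dev_accum += (value - mu_prev) * (value - mu)
        if n > 1:
            yield (n, value, mu, math.sqrt(dev_accum / (n - 1)))

def read_values(f):
    # Hypothetical helper: lazily parse one float per non-blank line
    for line in f:
        if line.strip():
            yield float(line)

f = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
stats = online_mean_dev(read_values(f))

# Only the first few lines have been parsed when these tuples are produced
first_two = list(islice(stats, 2))
print(first_two[0])  # (2, 4.0, 3.0, 1.4142135623730951)

# Draining the generator continues from where it stopped
*_, last = stats
print(last[0])  # 8
```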

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t):
    ...
```

which takes a tuple like the one yielded by your code above and returns `True` if the value lies within `level` standard deviations of the mean, i.e. if $\mu - \text{level}\,\sigma < x < \mu + \text{level}\,\sigma$.

```python
def is_ok(level, t):
    # t is an (n, value, mean, stddev) tuple from online_mean_dev
    n, val, mu, std = t
    return (mu - level * std) < val < (mu + level * std)
```
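A quick sanity check of `is_ok` with hand-made tuples (the numbers are made up for illustration, not taken from the stream). Note that the comparison is strict, so a value exactly `level` standard deviations away counts as an anomaly.

```python
def is_ok(level, t):
    # t is an (n, value, mean, stddev) tuple
    n, val, mu, std = t
    return (mu - level * std) < val < (mu + level * std)

print(is_ok(5, (10, 103.0, 100.0, 1.0)))  # True: 3 sigma from the mean
print(is_ok(5, (10, 106.0, 100.0, 1.0)))  # False: 6 sigma above the mean
print(is_ok(5, (10, 95.0, 100.0, 1.0)))   # False: exactly 5 sigma is not strictly inside
```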

We use this function to build a predicate for `itertools.filterfalse`, which yields only the items for which the predicate is false; applied to `data_with_stats`, it gives an iterator over the anomalies.

```python
from itertools import filterfalse

pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)
```
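`filterfalse` is the complement of the built-in `filter`: it keeps exactly the items for which the predicate is false, and it is lazy like the generators above. A small illustration on made-up tuples, with the equivalent list comprehension for comparison:

```python
from itertools import filterfalse

def is_ok(level, t):
    n, val, mu, std = t
    return (mu - level * std) < val < (mu + level * std)

# Made-up (n, value, mean, stddev) tuples
stream = [(2, 100.5, 100.0, 1.0), (3, 110.0, 100.0, 1.0), (4, 99.0, 100.0, 1.0)]

anomalies = filterfalse(lambda t: is_ok(5, t), stream)
print(list(anomalies))  # [(3, 110.0, 100.0, 1.0)]

# Equivalent, but builds the whole list eagerly
same = [t for t in stream if not is_ok(5, t)]
print(same)  # [(3, 110.0, 100.0, 1.0)]
```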

We materialize the anomalies; only at this point is the whole stream actually generated and processed:

```python
list(anomalies)  # materialize
```

```
[(1268, 999999984.7188612, 999999999.8752137, 2.9145943229430418),
 (2224, 1000000014.833243, 999999999.9735227, 2.89941967972494),
 (3945, 1000000014.6750737, 1000000000.0183676, 2.8863392516372315),
 (4121, 999999980.7254833, 1000000000.0210831, 2.8968806490296592),
 (17646, 1000000014.6138165, 1000000000.0236162, 2.8892674409126755),
 (18256, 999999983.7184129, 1000000000.0207312, 2.88641351445875),
 (18264, 1000000014.957017, 1000000000.021213, 2.8882388462086324),
 (18725, 1000000014.8059732, 1000000000.0199852, 2.8891700144631915),
 (24038, 1000000014.6281406, 1000000000.0178963, 2.8948375628233056),
 (24303, 1000000016.449877, 1000000000.019978, 2.898233346402561),
 (26572, 1000000014.5998565, 1000000000.0240046, 2.899570131997866),
 (29480, 999999984.5476906, 1000000000.01035, 2.9016545830695843),
 (35852, 1000000015.586729, 1000000000.01269, 2.898443583098256),
 (39559, 1000000014.7310599, 1000000000.0056897, 2.9000578868760143),
 (52085, 999999984.6970662, 1000000000.0080813,
```

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, such as the 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the size of the time series.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that incoming streaming data might get backed up while you handle an anomaly.

(You may find some inspiration in the documentation for `collections.deque`.)
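One possible direction for the window question, as a sketch only: keep the most recent `window` values in a `collections.deque(maxlen=window)` and compute the mean and standard deviation over just those. Assumptions: the helper name `windowed_stats` is hypothetical; the window trails the current point, since future values of a stream are not yet available; and the statistics are recomputed from the window at each step, which costs O(window) per value rather than the O(1) running update used above.

```python
import statistics
from collections import deque

def windowed_stats(iterator, window=100):
    # Hypothetical helper: yields (n, value, mean, stddev) like online_mean_dev,
    # but the statistics cover only the last `window` values seen
    buf = deque(maxlen=window)  # appending to a full deque discards the oldest item
    for n, value in enumerate(iterator, start=1):
        buf.append(value)
        if len(buf) > 1:
            yield (n, value, statistics.mean(buf), statistics.stdev(buf))

data = [10.0, 10.0, 10.0, 10.0, 50.0]
results = list(windowed_stats(data, window=3))
print(len(results))    # 4
print(results[-1][0])  # 5
```

The last tuple reflects only the window `[10.0, 10.0, 50.0]`, so the jump to 50 shifts its mean and standard deviation far more than it would shift statistics computed over the whole stream.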