## Q1.

Add an `__iter__` method to your project's time series class that iterates over values, along with a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is similar to the interface of Python dictionaries.) To test these, check both the types of the results and the values you expect.

In [1]:
#See test_timeseries.py in project repository
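
As a rough illustration of the interface described above, here is a minimal sketch. The class below is a hypothetical stand-in, not the project's actual class: the constructor signature and the `_times` / `_values` attribute names are assumptions made only for this example.

```python
class TimeSeries:
    """Hypothetical minimal time series; the real project class may differ."""

    def __init__(self, times, values):
        # Assumed storage: two parallel lists of equal length.
        self._times = list(times)
        self._values = list(values)

    def __iter__(self):
        # Iterating over the series itself yields values.
        return iter(self._values)

    def itertimes(self):
        return iter(self._times)

    def itervalues(self):
        return iter(self._values)

    def iteritems(self):
        # Lazily pairs each time with its value.
        return zip(self._times, self._values)


ts = TimeSeries([0.0, 0.5, 1.0], [10, 20, 30])
print(list(ts))
print(list(ts.itertimes()))
print(list(ts.iteritems()))
```

A test in this spirit would check both that each method returns an iterator (for instance, that `iter(it) is it`) and that the materialized contents match the data passed in.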


## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [2]:
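from random import normalvariate, random
from itertools import count

def make_data(m, stop=None):
    for i in count():
        # With stop set, indices 0..stop are emitted, i.e. stop + 1 values.
        if stop and i > stop:
            break
        # Large offset (1e9) plus noise whose standard deviation is drawn from [0, m).
        yield 1.0e09 + normalvariate(0, m*random())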

Here is an implementation of an online mean algorithm. See http://www.johndcook.com/blog/standard_deviation/ and the post linked from it, http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/. (Convince yourselves of the formulas...)

In [3]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu
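
One way to convince yourself of the mean update used in `online_mean`: write $x_n$ for the $n$-th value and $\mu_n$ for the mean of the first $n$ values (these symbols are introduced here for the derivation; in the code they correspond to `value` and `mu`). Then

$$
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i
      = \frac{(n-1)\,\mu_{n-1} + x_n}{n}
      = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}.
$$

The term $x_n - \mu_{n-1}$ is the `delta` in the code, so `mu = mu + delta/n` is exactly the last expression.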

We use our generator functions to implement iterators:

In [4]:
g = make_data(5, 10)
list(g)

[1000000001.3868843,
 1000000002.5404531,
 1000000000.2014244,
 999999999.0167351,
 999999999.3345492,
 1000000001.5854721,
 999999997.2193824,
 1000000000.5018299,
 999999998.8449419,
 1000000000.0446758,
 999999998.7717435]

In [5]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

<class 'generator'>


[1000000004.3856759,
 1000000001.3662616,
 1000000004.3036283,
 1000000002.3150502,
 1000000001.8291456,
 1000000001.4553989,
 1000000001.6176692,
 1000000002.766257,
 1000000002.6875873,
 1000000002.1945325,
 1000000001.9402399,
 1000000001.5488362,
 1000000001.4518881,
 1000000001.1655737,
 1000000001.1500425,
 1000000001.041397,
 1000000001.0411717,
 1000000001.0847863,
 1000000000.8547621,
 1000000000.6169966,
 1000000000.4858509,
 1000000000.4145086,
 1000000000.4007676,
 1000000000.3341949,
 1000000000.306874,
 1000000000.2965688,
 1000000000.2755674,
 1000000000.2599816,
 1000000000.2871659,
 1000000000.2735342,
 1000000000.4387435,
 1000000000.415412,
 1000000000.4940624,
 1000000000.2997793,
 1000000000.2955313,
 1000000000.3437393,
 1000000000.4881462,
 1000000000.4334329,
 1000000000.4946506,
 1000000000.4144443,
 1000000000.5364456,
 1000000000.5528824,
 1000000000.516369,
 1000000000.4178096,
 1000000000.4097847,
 1000000000.3802783,
 1000000000.3737718,
 1000000000.380963

### 2.1

Implement the standard deviation algorithm as a generator function of the following form:

```python
def online_mean_dev(iterator):
    # ... update n, mu, and dev_accum for each value ...
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```
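
A short derivation may help with the accumulator in the template. Assume `dev_accum` is meant to hold $S_n = \sum_{i=1}^{n} (x_i - \mu_n)^2$, the sum of squared deviations from the current mean (the symbols $S_n$, $x_n$, and $\mu_n$ are notation introduced here, not names from the code). Expanding the square gives

$$
S_n = \sum_{i=1}^{n} x_i^2 - n\,\mu_n^2 ,
$$

so that

$$
S_n - S_{n-1} = x_n^2 - n\,\mu_n^2 + (n-1)\,\mu_{n-1}^2 .
$$

Substituting $x_n = n\,\mu_n - (n-1)\,\mu_{n-1}$ and simplifying yields

$$
S_n = S_{n-1} + (x_n - \mu_{n-1})(x_n - \mu_n),
$$

and the sample standard deviation is $\sqrt{S_n/(n-1)}$ for $n > 1$. The update needs both the old mean $\mu_{n-1}$ and the new mean $\mu_n$, which is why the deviation from the old mean must be computed before the mean is updated.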

In [11]:
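import math

def online_mean_dev(iterator):
    n = 0
    mu = 0
    dev_accum = 0  # running sum of squared deviations from the current mean
    for value in iterator:
        if n >= 1:
            n += 1
            delta = value - mu  # deviation from the previous mean
            mu = mu + delta/n   # updated mean
            dev_accum = dev_accum + delta*(value - mu)  # uses old and new mean
            stddev = math.sqrt(dev_accum/(n-1))  # sample standard deviation
            yield (n, value, mu, stddev)
        else:
            # First point: the mean is the value itself and there is no spread yet.
            n += 1
            mu = value
            yield (n, value, mu, 0)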
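
The data in this exercise carries a large offset (`1.0e09`), which is the situation discussed in the linked post comparing methods of computing the standard deviation. The sketch below is one way to see why the online recurrence is preferred. The helper names `welford_stddev` and `naive_variance` are hypothetical and introduced only for this comparison, and the four data points are hand-picked rather than drawn from `make_data`.

```python
import math
import statistics

def welford_stddev(values):
    """Sample standard deviation via the online recurrence."""
    n, mu, dev_accum = 0, 0.0, 0.0
    for value in values:
        n += 1
        delta = value - mu                 # deviation from the previous mean
        mu += delta / n                    # updated mean
        dev_accum += delta * (value - mu)  # old-mean deviation times new-mean deviation
    return math.sqrt(dev_accum / (n - 1))

def naive_variance(values):
    """Textbook one-pass formula: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return (total_sq - total * total / n) / (n - 1)

# Deviations from the mean are -6, -3, 3, 6, so the exact sample variance is 30.
data = [1.0e9 + 4, 1.0e9 + 7, 1.0e9 + 13, 1.0e9 + 16]

print(welford_stddev(data))    # online recurrence
print(statistics.stdev(data))  # reference value from the standard library
print(naive_variance(data))    # compare this with the exact variance, 30
```

The naive formula subtracts two numbers of order $10^{18}$ whose difference is of order $10^{2}$, so it can lose most of its significant digits in floating point. The result is returned as a variance rather than a standard deviation because rounding can even make it negative.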

Here we make a data stream of roughly 100000 elements and run this iterator on it (imagine running this on a time series being slowly read from disk).

In [58]:
data_with_stats = online_mean_dev(make_data(5, 100000))
#list(data_with_stats)

## Q3.

Let's do anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t): ...
```

which takes a tuple like the one yielded by your code above and returns True if the value lies within `level` $\sigma$ of the mean.

In [59]:
def is_ok(level, t):
    n, value, mu, stddev = t
    diff = value - mu
    if stddev == 0:
        # No spread yet (e.g. the first point): only an exact match with the mean is OK.
        return diff == 0
    # Distance from the mean in units of the standard deviation.
    return abs(diff/stddev) < level

We wrap this function in a predicate and pass it to `itertools.filterfalse`, which returns an iterator over the anomalies, that is, the tuples for which the predicate is false.

In [60]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)
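
To make the filtering step concrete, here is a small self-contained sketch on hand-made tuples. The helper `within_sigma` is a hypothetical stand-in for `is_ok`, the records are invented for illustration and are not output from the generators above, and `functools.partial` is used in place of the lambda purely as a stylistic alternative.

```python
from functools import partial
from itertools import filterfalse

def within_sigma(level, t):
    """Hypothetical stand-in for is_ok: True when value is within level sigma of mu."""
    n, value, mu, stddev = t
    if stddev == 0:
        return value == mu
    return abs(value - mu) / stddev < level

# Invented (n, value, mu, stddev) records.
records = [
    (10, 10.5, 10.0, 1.0),  # 0.5 sigma from the mean: OK
    (11, 16.0, 10.0, 1.0),  # 6 sigma from the mean: anomaly at level 5
    (12, 9.0, 10.0, 1.0),   # 1 sigma from the mean: OK
]

pred = partial(within_sigma, 5)
found = list(filterfalse(pred, records))  # keeps records where pred is False
print(found)  # → [(11, 16.0, 10.0, 1.0)]
```

Because `filterfalse` is lazy, nothing is computed until the result is consumed, which is what lets the same pattern run over a stream without holding it in memory.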

We materialize the anomalies...

In [61]:
list(anomalies)  # materialize

[(5252, 999999984.8679287, 999999999.9661534, 2.891469236584855),
 (5497, 999999984.1350139, 999999999.9599606, 2.898139491881727),
 (6025, 1000000016.8132517, 999999999.9464177, 2.9056799314006336),
 (8973, 1000000015.2248049, 999999999.9548907, 2.895293880393277),
 (15247, 1000000016.2028623, 999999999.986835, 2.9035825088703655),
 (15989, 1000000014.7596158, 999999999.9862708, 2.904171573648156),
 (17120, 1000000017.3254101, 999999999.9884906, 2.8988612062544994),
 (17476, 999999984.6569301, 999999999.9881663, 2.9003633959223984),
 (18403, 1000000017.1114132, 999999999.9961855, 2.9058853541300897),
 (19045, 1000000015.2624334, 999999999.9968764, 2.909635994402627),
 (25908, 999999982.1915729, 1000000000.0080341, 2.9083065250407443),
 (27849, 999999983.8051348, 1000000000.0173457, 2.9135274387824226),
 (39033, 999999983.9595014, 1000000000.0094726, 2.904076328731311),
 (39802, 999999985.2516277, 1000000000.0090587, 2.904027786256553),
 (43033, 1000000018.4832474, 1000000000.0146494, 

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would anomaly detection over a shorter "window", say 100 points around the time in question, pick up? How might you create an algorithm that does window-based averaging? (Hint: the window size is small compared to the time series size.)

Finally, think a bit about how you might implement all of this in a production environment. Remember that data streaming in might get backed up while you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).
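
As one possible starting point for the window-based idea, here is a minimal sketch of a sliding-window mean built on `collections.deque`. The function name `windowed_mean` and its `window` parameter are hypothetical, and this is only one of several reasonable designs, not a reference solution.

```python
from collections import deque

def windowed_mean(iterator, window):
    """Yield the mean of the most recent `window` values seen so far."""
    buf = deque(maxlen=window)  # oldest value is dropped automatically when full
    total = 0.0
    for value in iterator:
        if len(buf) == window:
            total -= buf[0]     # value about to be evicted by the append below
        buf.append(value)
        total += value
        yield total / len(buf)

print(list(windowed_mean([1, 2, 3, 4, 5], 3)))  # → [1.0, 1.5, 2.0, 3.0, 4.0]
```

Memory use is bounded by the window size rather than the length of the series. One caveat: a running total that is repeatedly added to and subtracted from can accumulate rounding error on long streams of large-offset data, so recomputing the sum over the buffer is a safer (if slower) alternative.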