## Q1.

Add methods `__iter__` to your project Time Series class to iterate over values, a method `itertimes` to iterate over times, a method `itervalues` to iterate over values, and a method `iteritems` to iterate over time-value pairs. (This is a similar interface to python dictionaries). To test these, check both the types of the results and the answers you expect.

In [1]:
def __iter__(self):
    for v in self._values:
        yield v

    def itertimes(self):
        for t in self._times:
            yield t

    def itervalues(self):
        for v in self._values:
            yield v

    def iteritems(self):
        for (t, v) in zip(self._times, self._values):
            yield (t, v)

## Q2.

An online mean and standard deviation algorithm.

Below is a function to generate a potentially infinite stream of 1-D data.

In [2]:
from random import normalvariate, random
from itertools import count
def make_data(m, stop=None):
    for _ in count():
        if stop and _ > stop:
            break
        yield 1.0e09 + normalvariate(0, m*random())

Here is an implementation of an online mean algorithm..see http://www.johndcook.com/blog/standard_deviation/ and the link to http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ in-between. (Convince yourselves of the formulas...)

In [109]:
def online_mean(iterator):
    n = 0
    mu = 0
    for value in iterator:
        n += 1
        delta = value - mu
        mu = mu + delta/n
        yield mu

We use out generator functions to implement iterators:

In [None]:
g = make_data(5, 10)
list(g)

In [None]:
g = online_mean(make_data(5, 100))
print(type(g))
list(g)

### 2.1

Implement the standard deviation algorithm as a generator function as

```python
def online_mean_dev(iterator):
    BLA BLA
    if n > 1:
        stddev = math.sqrt(dev_accum/(n-1))
        yield (n, value, mu, stddev)
```

In [129]:
import math

def online_mean_dev(iterator):
    n = 0
    stddev = 0  
    mu = 0
    mu_prev = 0
    for x in iterator:
        n += 1
        delta = x - mu_prev
        mu = mu_prev + delta / n
        stddev += (x - mu_prev) * (x  - mu)
        mu_prev = mu
        if (n > 1):
            stddev = math.sqrt(stddev / (n-1))
            yield (n, x, mu, stddev)

Here we make 100000 element data, and run this iterator on it (imagine running this on a time-series being slowly read from disk

In [152]:
data_with_stats = online_mean_dev(make_data(5, 100000))

In [134]:
for i in online_mean_dev(make_data(5, 10)):
    print(i)

(2, 1000000001.1383768, 999999999.8354645, 1.8425963222739445)
(3, 1000000002.7623073, 1000000000.8110788, 1.943390776961208)
(4, 1000000000.0625712, 1000000000.6239519, 0.8876163850328967)
(5, 1000000000.780305, 1000000000.6552225, 0.47622826001511487)
(6, 999999998.8091618, 1000000000.3475457, 0.8143928428897316)
(7, 999999998.6351215, 1000000000.1029137, 0.744745590327787)
(8, 1000000007.0182407, 1000000000.9673296, 2.466598988631139)
(9, 1000000000.7069416, 1000000000.9383976, 0.5620128299456385)
(10, 999999997.6324264, 1000000000.6078005, 1.0748909048138058)
(11, 999999999.3906026, 1000000000.4971461, 0.492115166769422)


## Q3.

Let's do Anomaly detection. Write a routine `is_ok`:

```python
def is_ok(level, t)
```

which takes a tuple like the one yielded by your code above and returns True if the value is inbetween `level`-$\sigma$ of the mean.

In [142]:
def is_ok(level, t):
    _, x, mu, stddev = t
    tol = level - stddev
    return (mu - tol < x < mu + tol)

We use this function to create a predicate passed through to `itertools.filterfalse` which is then used to obtain an iterator on the anomalies.

In [154]:
from itertools import filterfalse
pred = lambda t: is_ok(5, t)
anomalies = filterfalse(pred, data_with_stats)

We materialize the anomalies...

In [155]:
anomaly_list = list(anomalies)
print(len(anomaly_list))
anomaly_list  # materialize

9432


[(2, 999999993.668907, 999999997.1454622, 4.916591385519283),
 (5, 1000000006.8063874, 1000000000.196701, 3.7337646917130316),
 (19, 999999994.3843958, 1000000000.0517331, 1.3790551740134611),
 (20, 1000000006.5172698, 1000000000.37501, 1.4706257466845123),
 (22, 1000000008.1340752, 1000000000.5921197, 1.6944561177220614),
 (33, 1000000006.0135381, 1000000000.5386285, 0.9848084343305032),
 (37, 1000000007.1887525, 1000000000.7186396, 1.0988792818028763),
 (43, 1000000008.1489433, 1000000000.6390294, 1.178396768339386),
 (44, 1000000006.7925471, 1000000000.778882, 0.9423324774844044),
 (58, 999999989.3760073, 1000000000.4685622, 1.482495254362478),
 (62, 1000000005.6040199, 1000000000.566396, 0.6511569922386762),
 (73, 1000000005.8065561, 1000000000.5435818, 0.6267492162242675),
 (80, 999999992.7377859, 1000000000.3485179, 0.8620040547245812),
 (94, 1000000006.0758184, 1000000000.3349707, 0.5994705296032061),
 (124, 999999993.268937, 1000000000.0943075, 0.6202635065278119),
 (128, 99999

## To think of, but not hand in

What kinds of anomalies will this algorithm pick up? What kinds would a shorter "window" of anomaly detection, like 100 points around the time in question pick? How might you create an algorithm which does window based averaging? (hint: the window size is small compared to the time series size). 

Finally think a bit of how you might implement all of this in a production environment..remember that data streaming in might get backed up when you handle an anomaly.

(Some inspiration might accrue if you look at the docs for `collections.deque`).