# Understanding Iterators and (mostly) Generators
Seetha Krishnan
<br>
ASPP - Asia Pacific 2018

## Iterators
Iterators are everywhere. 
An iterator is simply an object that can be iterated upon, say using a `for` loop

In this extremely simple example, the __range(4)__ is the iterable object which at each iteration provides a different value to the __"i"__ variable.

In [None]:
for i in range(4):
    print(i)

You can iterate over strings, lists, files, dictionaries etc

In [None]:
# Iterate over lines of a file
filename = 'sometextIwrote.txt'
f = open(filename, 'r')
for linenumber, lines in enumerate(f):  
    print('{} > {}'.format(linenumber, lines))
f.close()
# Enumerate is one of those super cool in-built python functions that 
# allows you to loop over an object and have an automatic counter

__Sidenote__ :  The proper way to open and close a file is not like the above example, but using a `with` statement, which takes care of opening and closing a file <br>(These are called Context Managers, we will talk more about them later)

In [None]:
with open(filename) as f:
    for linenumber, lines in enumerate(f):
        print('{} > {}'.format(linenumber, lines))

#### Underneath the covers is a specific protocol:
iter : This returns the iterator object itself
<br>next() : This returns the next value. 
<br>_StopIteration_ error once all the objects have been looped through.

If you want to know how to write an iterator from scratch, refer to some of these tutorials
<br>https://www.programiz.com/python-programming/iterator
<br>http://anandology.com/python-practice-book/iterators.html

In [None]:
it = iter(range(4))
print(it)

In [None]:
print(next(it))  # Run this multiple times

## Generators
Generators are a simple, yet elegant type of iterators. A generator produces a sequence of results, not just a single value

__To create generators:__ 
- Define a function
- instead of the return statement, use the __yield__ keyword. 

In [None]:
# Count number of words per line of csv file
def read(filename):
    with open(filename) as fin:
        for line in fin:
            yield line

def countwords(linearray):
    for i, line in enumerate(linearray):
        yield i, len(line.split())

You will typically use generator functions as an __iterator object__

In [None]:
filename = 'sometextIwrote.txt'
for i, n in countwords(readcsv(filename)):
    print('Number of words in line {} is {}'.format(i, n))

In [None]:
g = readcsv(filename)
print(g)

In [None]:
next(g)

### Whats so great about a generator? 
- Generators allow you to iterate over some data __lazily__ without loading the entire data source into memory at once.  (Great for large datasets!)
- When functions `return`, they are done for good. Not generators.
- Functions always start from the first line, generators start where you left off : at __yield__ 

### Exercise 1 
Multiple CSV files stored in a directory, contain information of x-y position of a swimming zebrafish across time.
<br>__The task:__
1. Loop through each csv file, acquire the x and y position and find distance travelled by the fish at each time point.
2. To find distance travelled between two timepoints, you need to get the x and y position of fish at two consecutive frames.
3. Using the acquired distance travelled, print time spent by the fish at a speed below the threshold. 

  <img src="files/fish.png"  width="500" >

In [84]:
import math
import csv
import os

def CSVfileGrabber(dirname):
    for filename in os.listdir(dirname):
        print('Working on: {}'.format(filename[:5])) #Print name of fish
        yield os.path.join(dirname, filename)


def readcsv(filename):
    with open(filename) as f:
        csvreader = csv.reader(f)
        for i, line in enumerate(csvreader):
            if i < 10:  # Skip header frame and frames used to calculate background
                continue
            else:
                yield line


def getxy(linearray):
    for i in linearray:
        # Columns containing x and y coordinates are in the 3rd and 4th column respectively
        yield [int(i[2]), int(i[3])]


def consecutivexy(linearray):
    # Here we want to get two consecutive xy to get speed/frame
    # Make use of the next keyword and Stop Iteration to quit
    try:
        for i, line in enumerate(linearray):
            if i == 0:
                prevxy = line
            else:
                prevxy = nextxy
            nextxy = next(linearray)
            yield prevxy, nextxy

    except StopIteration:
        yield prevxy, line


def getdist(xy):
    #Calculate euclidean distance
    for prevxy, nextxy in xy:
        dist = [(a - b)**2 for a, b in zip(prevxy, nextxy)] #zip allows you to iterate two lists parallely
        dist = math.sqrt(sum(dist))
        yield dist


def getframes(dist, threshold, frames_per_sec):
    dist_count = 0
    for i, d in enumerate(dist):
        if d < threshold:
            dist_count += 1
    print('Of {:0.3f} seconds recording time, time spent with speed less than {} is {:0.3f} seconds'.format(
        i / frames_per_sec, threshold, dist_count / frames_per_sec))

In [85]:
dirname = '/Users/seetha/Desktop/Microbetest/Collective/'
for files in CSVfileGrabber(dirname):
    getframes(
        getdist(
            consecutivexy(
                getxy(readcsv(files)))), threshold=10, frames_per_sec=30)

Working on: Fish1
Of 8.133 seconds recording time, time spent with speed less than 10 is 3.033 seconds
Working on: Fish6
Of 8.133 seconds recording time, time spent with speed less than 10 is 4.600 seconds
