# Scenario 

<div style='color: #3366ff'>You want to download tweets and analyze their sentiment: </div>
    - "I have a @Microsoft Surface Pro 4 and LOVE it!"
    - "#surfacepro @surface Why is this so expensive?"


<div style='color: #3366ff'>How do we solve this problem? </div>

* Strategy 1: Download all tweets and then process them 
    - `download, loop [ process ]` 

* Strategy 2: Download tweets in small batches and process each batch as it arrives 
    - `loop [ download tweet, process ]`

We want code that is written like strategy 1 (downloading and processing stay separate steps) but that does not need to keep everything in memory! 

The same design pattern shows up in many other places. In machine learning, for instance, you load images and train a model on them: 

* There are thousands (or even millions) of images, so you can't load them all into memory. 
* Both steps are complicated, so you don't want to couple the loading logic with the learning logic. 
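As a rough sketch of where this is heading, the snippet below keeps downloading and processing separate while holding only one batch at a time. It relies on the `yield` keyword explained in the next section. The names `download_tweets`, `sentiment`, and `fake_batches` are made up for illustration; no real Twitter API is called, and the scorer is a toy.

```python
def download_tweets(batches):
    """Yield tweets one at a time; `batches` stands in for paged API results."""
    for batch in batches:        # only the current batch is held in memory
        for tweet in batch:
            yield tweet


def sentiment(tweet):
    """Toy scorer (hypothetical): 1 if the word 'love' appears, else 0."""
    return 1 if "love" in tweet.lower() else 0


# Stand-in data: each inner list plays the role of one downloaded batch.
fake_batches = [
    ["I have a @Microsoft Surface Pro 4 and LOVE it!"],
    ["#surfacepro @surface Why is this so expensive?"],
]

# The processing loop never sees batches, only a stream of tweets.
scores = [sentiment(t) for t in download_tweets(fake_batches)]
print(scores)   # [1, 0]
```

The processing code reads like strategy 1, but the data arrives the way strategy 2 describes.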

# Generators 

Let's first see what we're trying to do and then define generators. 

In [None]:
for i in [0, 1, 2, 3, 4]:   # keeps the list in memory 
    print(i)                # process each item 

In [None]:
for i in range(5):  # does not generate all five elements in memory! 
    print(i) 

In [None]:
range(5)     # NOT a list but a lazy range object: values are computed on demand 
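Strictly speaking, `range` returns a lazy sequence rather than a generator. Both produce values on demand, but they differ in what else they support. A small comparison, using only built-ins:

```python
r = range(5)
print(len(r), r[2], 3 in r)    # 5 2 True: a range supports len, indexing, membership
print(list(r), list(r))        # a range can be iterated as many times as you like

g = (i for i in range(5))      # a true generator
first_pass = list(g)
second_pass = list(g)          # generators are one-shot, so this is empty
print(first_pass, second_pass) # [0, 1, 2, 3, 4] []
```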

In [None]:
def myrange(n): 
    x = 0
    while x < n: 
        yield x   # 'yield' turns a function into a generator 
        x += 1 

In [None]:
type(myrange(5))

In [None]:
for i in myrange(5): 
    print(i)

In [None]:
def countdown(n): 
    while n > 0: 
        print("Computing next number ... ")
        yield n
        n -= 1 

In [None]:
for i in countdown(5): 
    print(i)

In [None]:
v = countdown(5) 

In [None]:
next(v)
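Each call to `next` runs the generator body up to the next `yield` and then pauses. When the body finishes, Python raises `StopIteration`, which is what a `for` loop catches behind the scenes. A minimal sketch, with `countdown` redefined here without the `print` call so the block is self-contained:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


v = countdown(2)
print(next(v))   # 2
print(next(v))   # 1

try:
    next(v)      # the while loop has ended, so there is nothing left to yield
except StopIteration:
    print("generator exhausted")
```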

In [None]:
import random

def random_gen(low, high, num):
    i = 0
    while i < num:
        yield random.randrange(low, high)
        i += 1 

In [None]:
r = random_gen(0, 100, 5)

In [None]:
r

In [None]:
list(r)

In [None]:
def random_gen_inf(low, high):
    while True:
        yield random.randrange(low, high)

In [None]:
r = random_gen_inf(0, 100)

In [None]:
next(r)
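Calling `list()` on an infinite generator would never return. One way to take a bounded number of items is `itertools.islice` from the standard library, sketched here with `random_gen_inf` repeated so the block runs on its own:

```python
import itertools
import random


def random_gen_inf(low, high):
    while True:
        yield random.randrange(low, high)


# Take only the first five values from the endless stream.
first_five = list(itertools.islice(random_gen_inf(0, 100), 5))
print(first_five)
```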

# Generator Syntax 

In [None]:
%time v = [ i**2 for i in range(10000000) ]

In [None]:
print(v[:10])

In [None]:
%time g = (i**2 for i in range(10000000))

In [None]:
g
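The generator expression returns almost instantly because it computes nothing until it is iterated. The sketch below compares the two forms with `sys.getsizeof`, using a smaller range to keep it quick. Note that `getsizeof` on a list counts only the list's own pointer array, not the integers it refers to, so the real gap is even larger.

```python
import sys

squares_list = [i**2 for i in range(100000)]   # all values computed up front
squares_gen = (i**2 for i in range(100000))    # nothing computed yet

print(sys.getsizeof(squares_list))   # hundreds of kilobytes
print(sys.getsizeof(squares_gen))    # a small, fixed size

first_total = sum(squares_gen)       # values are produced one at a time here
second_total = sum(squares_gen)      # the generator is now exhausted
print(first_total == sum(squares_list), second_total)   # True 0
```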

# Real World Example

In [None]:
with open("access-log") as wwwlog:   # close the file once we are done peeking
    for line in wwwlog: 
        print(line)
        break                        # look at the first line only

In [None]:
wwwlog = open("access-log")
total = 0

In [None]:
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]   # last whitespace-separated field: bytes sent
    if bytestr != '-':                 # '-' means no byte count was logged
        total += int(bytestr)

print("Total", total)

The generator way: 

In [None]:
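wwwlog      = open("access-log")
bytecolumn  = (line.rsplit(None,1)[1] for line in wwwlog)   # last field of each line
bytecounts  = (int(x) for x in bytecolumn if x != '-')      # skip entries without a count

print("Total", sum(bytecounts))    # nothing is read until sum() pulls on the pipeline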
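To try the pipeline without an `access-log` file, the sketch below feeds it a list of made-up lines instead. The sample entries are invented for illustration and only assume that the byte count is the last field. Any iterable of lines can act as the source, which is what makes each stage reusable.

```python
# Made-up sample lines; the last field is the byte count ('-' when absent).
sample_log = [
    '10.0.0.1 - - [01/Jan/2020:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 1000',
    '10.0.0.2 - - [01/Jan/2020:00:00:02 +0000] "GET /b.html HTTP/1.1" 304 -',
    '10.0.0.3 - - [01/Jan/2020:00:00:03 +0000] "GET /c.png HTTP/1.1" 200 250',
]

bytecolumn = (line.rsplit(None, 1)[1] for line in sample_log)   # stage 1: extract field
bytecounts = (int(x) for x in bytecolumn if x != '-')           # stage 2: filter, convert
total = sum(bytecounts)                                         # stage 3: consume
print("Total", total)   # Total 1250
```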

# Tailing a File

In [None]:
import time
def follow(thefile):
    thefile.seek(0, 2)         # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)    # Sleep briefly
            continue
        yield line

In [None]:
logfile  = open("test-log")

In [None]:
loglines = follow(logfile)

In [None]:
for line in loglines:
    print(line, end='')      # line already ends with a newline

    if line[:1] == '.':      # stop following once a line starts with '.'
        break 