To open this notebook in Google Colab and start coding, click on the Colab icon below.

<table style="border:2px solid orange" align="left">
  <td style="border:2px solid orange ">
    <a target="_blank" href="https://colab.research.google.com/github/neuefische/ds-meetups/blob/main/01_Python_Workshop_Revisiting_Some_Fundamentals/generators_and_lazy_evaluation.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
 </table>

# Revisiting... Generators and lazy evaluation

## Learning goals for this Notebook
At the end of this notebook you should:
- have a better understanding of generators and have seen several ways to use them
- understand what lazy evaluation means and why it's an advantage
- understand the different syntaxes for creating generators


## How to use this
This notebook is supposed to be *follow-along*. Feel free to change stuff and experiment as much as you want, though.
Ideally, you should look at each cell and try to predict the result. Afterwards you can run it and see if you were right.

## Importing stuff
We barely have to import anything here; this is just basic Python.
We just define a few helper functions:

In [None]:
import numpy as np
import timeit
from datetime import datetime

def n_say(s):
    print(f"Nico:    {s}")
def l_say(s):
    print(f"Larissa: {s}")
def p_say(s):
    print(f"Python:  {s}")       
    
def is_prime(number):
    """Function to check if a number is prime. Not very sophisticated but working.

    Args:
        number (int): the number to check

    Returns:
        bool: True when number is prime, False if not
    """
    if number <2: return False
    if number in [2,3,5,7]: return True
    
    for i in range(2,int(number**0.5)+1):
        if number%i==0:
            return False
    return True    

In [None]:
n_say("Hi, I'm Nico. ")
l_say("Hi, I'm Larissa.")
p_say("I'm the sole voice of reason here. Don't trust the others!")

n_say(f"Speaking of reason: Did you know 2021 is{' ' if is_prime(2021) else ' not '}a prime number?")


## Introduction: what are Generators?

Let's just assume you suddenly find yourself in dire need of some prime numbers. Then
1. There is a chance of about 99% that you are currently doing a coding tutorial or a coding challenge.
2. We can easily get those by using the is_prime() function from above.

So... let's use a list comprehension to generate a list full of prime numbers. And let's not be shy and directly get all primes up to 1,000,000.

In [None]:
start= datetime.now()
prime_list=[i for i in range(1000000) if is_prime(i)]
duration =round((datetime.now()-start).total_seconds(),2)

p_say(f"Done. result: ...{prime_list[-5:]}")
p_say(f"Computation took {duration} s")

That was easy! 

But instead of just stopping, you listen to your pet python that whispers in your ear: "just make a small change: Use round brackets () instead of square brackets []".

In [None]:
start =datetime.now()
prime_list_gen=(i for i in range(1000000) if is_prime(i))
duration =round((datetime.now()-start).total_seconds(),2)

p_say(f"Done. result: {prime_list_gen}")
p_say(f"Computation took {duration} s")

This is performed almost instantly. But why? What does it do now?

**The expression is no longer a list comprehension, but a generator expression instead!**

-- Let's go down that rabbit hole. Thanks, Python!


## Comparing list comprehensions and generators


**First the list comprehension**
```python
prime_list=[i for i in range(1000000) if is_prime(i)]
```

This tells Python you want a list that contains all the primes in range(1000000). Python accepts the syntax **and** prepares the list. This is referred to as **eager** evaluation.

On my machine this takes about 2.4 seconds. After this, the whole list is prepared and stored in memory.

**Now the generator expression**
```python
prime_list_gen=(i for i in range(1000000) if is_prime(i))
```

This tells Python you want all the primes in range(1000000), just like before. Python accepts the syntax **and** ... that's it.

On my machine this takes about 0.0000003 seconds. After this, nothing has been computed yet!

Instead, Python will only compute the results from the generator when you ask for them. This is what is called **lazy** evaluation.
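
One visible consequence of lazy evaluation is memory use. The following is a minimal sketch, not part of the original notebook, that compares the size of a list object with that of an equivalent generator object via sys.getsizeof(). The names squares_list and squares_gen are made up for illustration, and the exact byte counts depend on your Python version.

```python
import sys

# Eager: the list holds references to all 100,000 results at once.
squares_list = [n * n for n in range(100_000)]
# Lazy: the generator only stores its current state, not the results.
squares_gen = (n * n for n in range(100_000))

list_bytes = sys.getsizeof(squares_list)  # size of the list object itself
gen_bytes = sys.getsizeof(squares_gen)    # size of the generator object

print(f"list:      {list_bytes} bytes")
print(f"generator: {gen_bytes} bytes")
```

The generator stays small no matter how large the range is, because no result exists before it is requested.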
 
#### Retrieve items from the generator


In [None]:
# First let's set up the generator:
prime_list_gen=(i for i in range(1000000) if is_prime(i))

Let's get the first prime! For this we use the next() function. It takes any iterator and returns its next item.

If you run the same cell again, you will get the next one, hence the function's name.


In [None]:
next(prime_list_gen) # run this cell several times

Let's get the first 10 primes only!

In [None]:
prime_list_first_10=[next(prime_list_gen) for i in range(10)] #subsequent calls of next within a list comprehension? easy!
p_say(prime_list_first_10)

Wait, those are actually not the first ten primes (two is missing); they are the next ten! That is because we reused the generator that we had already advanced in the previous cell. Try this by running the cell again.
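
If you want a fixed number of items without writing the next() calls yourself, itertools.islice from the standard library is one option. This is a small sketch under the assumption that a simple stand-in generator is enough to show the effect; even_numbers() is a hypothetical helper, not part of this notebook.

```python
from itertools import islice

def even_numbers():
    """Hypothetical helper: yields 0, 2, 4, ... without an upper limit."""
    n = 0
    while True:
        yield n
        n += 2

evens = even_numbers()
first_three = list(islice(evens, 3))            # consumes 0, 2, 4
next_three = list(islice(evens, 3))             # same generator: continues with 6, 8, 10
fresh_three = list(islice(even_numbers(), 3))   # a new generator starts over

print(first_three, next_three, fresh_three)
```

The same generator object keeps its position between calls, while calling the generator function again gives you a fresh start.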



Now let's get all the other numbers:


In [None]:
prime_list=[prime for prime in prime_list_gen]
p_say(prime_list[:5])
p_say(prime_list[-5:])


If you try to run this again now:
```python
next(prime_list_gen) 
```

it will raise a StopIteration exception, because the generator is already spent. You can only iterate over a generator once.
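
The error in question is a StopIteration exception. A short sketch of the two usual ways to deal with it: catching the exception, or passing a default value as the second argument to next(). The generator small_gen is made up for this illustration.

```python
small_gen = (n for n in range(2))

first = next(small_gen)    # first item
second = next(small_gen)   # second and last item

# A third call has nothing left to return and raises StopIteration.
try:
    next(small_gen)
    exhausted = False
except StopIteration:
    exhausted = True

# With a default value, next() returns the default instead of raising.
fallback = next(small_gen, None)

print(first, second, exhausted, fallback)
```

A for loop or a list comprehension handles StopIteration for you, which is why you rarely see it outside of manual next() calls.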

## Execution count and timing
Now let's use the list comprehension and the generator to get 10 primes each, to see how efficient both approaches are if you don't need all numbers. We do this by sneaking in a counting function, _cntr(). As it always returns True, the result of the expression _cntr() & is_prime(i) is not changed, but both functions are called in each iteration.
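
A side note on the & operator used in _cntr() & is_prime(i): on booleans it always evaluates both sides, whereas the keyword and skips the right side when the left side is already False. Since _cntr() always returns True, either would count correctly here; the difference only shows when the left operand can be False. A minimal sketch with two hypothetical functions, left() and right():

```python
calls = []

def left():
    calls.append("left")
    return False

def right():
    calls.append("right")
    return True

result_bitwise = left() & right()     # & evaluates both operands
calls_bitwise = list(calls)

calls.clear()
result_logical = left() and right()   # and stops after left() returned False
calls_logical = list(calls)

print(calls_bitwise, calls_logical)
```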

#### Execution count and timing: list comprehension

In [None]:
# Let's count the actual calls too
count=0
def _cntr():
    global count
    count += 1
    return True

start =datetime.now()

prime_list=[i for i in range(1000000) if _cntr()&is_prime(i)]
prime_list_first_10=prime_list[:10]

duration =round((datetime.now()-start).total_seconds(),2)
    
p_say(f"Done. result: \n\t {prime_list_first_10}")
p_say(f"Computation took {duration} s")
p_say(f"{count} calls to is_prime()")

#### Execution count and timing: generator expression

In [None]:
count=0
start =datetime.now()

prime_list_gen=(i for i in range(1000000) if _cntr()&is_prime(i))
prime_list_first_10=[next(prime_list_gen) for i in range(10)] 

duration =round((datetime.now()-start).total_seconds(),2)

p_say(f"Done. result: \n\t {prime_list_first_10}")
p_say(f"Computation took {duration} s")
p_say(f"{count} calls to is_prime()")

So this is quite a big difference!

And now, let's use the generator to get them all:

In [None]:
# First let's get all the primes by using the generator:
count=0
start =datetime.now()
prime_list_gen=(i for i in range(1000000) if _cntr()&is_prime(i))
prime_list=[i for i in prime_list_gen]
duration =round((datetime.now()-start).total_seconds(),2)


p_say(f"{len(prime_list)} primes found. The last five are :{prime_list[-5:]}")
p_say(f"Computation took {duration} s")
p_say(f"{count} calls to is_prime()")

# Timing this leads (on my machine) again to about 2.4 s, so we don't lose any time by doing this. Great!
# But the advantage is most prominent if you don't actually need all elements from the list.


### Generators - syntax alternatives
There are two main ways to define a generator. The first follows the syntax of list comprehensions, but uses parentheses () instead of square brackets []. That's what we used in the previous example:

**List Comprehension syntax**

*List comprehension*
```python
newlist = [expression for item in iterable if condition]
```
*Generator*
```python
generator = (expression for item in iterable if condition)
```
The second follows the syntax of function definitions, but instead of return, we use yield.

**Function definition syntax**

*function*
```python
def complicatedFunction():
    #Do some complicated stuff here
    return something
```

*Generator*
```python
def complicatedGenerator():
    #Do some complicated stuff here
    yield something
```

"yield" differs from "return" in that return ends the function (so nothing inside the function after return is evaluated), whereas yield is more like a pause (so at the next next() call, evaluation resumes right after the yield).


A special property of generators is that they can only be iterated through once. Once an element (or all of them) has been visited, it's basically spent.
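
To make the pausing behaviour visible, here is a small sketch that records in a list when each part of a generator function body actually runs. The function two_steps() and the list log are hypothetical names for this illustration only.

```python
log = []

def two_steps():
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finished")

gen = two_steps()
log.append("generator created")   # nothing inside two_steps() has run yet

value_1 = next(gen)    # runs the body up to the first yield
value_2 = next(gen)    # resumes after the first yield, pauses at the second
leftover = list(gen)   # runs the rest of the body; there are no more values
again = list(gen)      # the generator is spent, so this is empty as well

print(log)
```

Note that "generator created" is logged before "started": calling the generator function only creates the generator object, and the body starts running with the first next().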

In [None]:
#and for an easy example:

def easy_generator():
    for i in range(5):
        yield i       

p_say([i for i in easy_generator()])

## Example 1: Prime Sextuples

Now let's use generators to do something useful!

Let's look again at prime numbers. If two primes are direct neighbours (i.e. they are only 2 apart), they are called twin primes. Examples of this are (3,5) or (227,229).

But twins are boring, let's jump straight to sextuples, why not? Six primes can't be direct neighbours, because every third odd number is divisible by three and, hence, not a prime. So we have to allow some more space between them.

Our prime sextuples should follow the form:
 (p, p+4, p+6, p+10, p+12, p+16) [see here](https://en.wikipedia.org/wiki/Prime_quadruplet)
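
The pattern above can be written down as a tuple of offsets and checked directly. This is only a sketch: has_sextuplet_pattern() is a hypothetical helper that tests the spacing of six numbers, and it deliberately does not test whether they are prime.

```python
SEXTUPLET_OFFSETS = (0, 4, 6, 10, 12, 16)

def has_sextuplet_pattern(numbers):
    """Hypothetical helper: True if the six numbers are p + offset for each offset.

    Only the spacing is checked here, not primality.
    """
    if len(numbers) != 6:
        return False
    p = numbers[0]
    return tuple(n - p for n in numbers) == SEXTUPLET_OFFSETS

print(has_sextuplet_pattern((7, 11, 13, 17, 19, 23)))         # True
print(has_sextuplet_pattern((97, 101, 103, 107, 109, 113)))   # True
print(has_sextuplet_pattern((11, 13, 17, 19, 23, 29)))        # False
```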


As we don't know how many primes we have to look through to find a sextuple (or several), we can make use of another great property of generators: they don't **have to** have an upper limit!

But be careful! Whenever you write something like **"while True:"** make extra sure that you are doing it safely.
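
Two common ways to consume an unbounded generator safely are sketched below, assuming a stand-in generator odd_numbers() that is not part of this notebook: itertools.takewhile stops when a condition on the values fails, and an explicit loop can enforce a hard cap on the number of iterations.

```python
from itertools import count, takewhile

def odd_numbers():
    """Hypothetical helper: yields 1, 3, 5, ... forever."""
    for n in count(start=1, step=2):
        yield n

# Option 1: stop as soon as a condition on the values fails.
below_ten = list(takewhile(lambda n: n < 10, odd_numbers()))

# Option 2: an explicit loop with a hard cap on the number of iterations.
collected = []
for steps, n in enumerate(odd_numbers()):
    if steps >= 3:
        break
    collected.append(n)

print(below_ten, collected)
```

What you should avoid is anything that tries to exhaust the generator, such as list(odd_numbers()), because that never finishes.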


In [None]:
# First let's turn the "is_prime()" function from above into a generator

def prime_generator():
    yield 2 #manually yield 2 first, so that we can start with 3 and use an increment by 2
    number=3
    while True:
        is_Prime=True    
        for i in range(2,int(number**0.5)+1):
            if number%i==0:
                is_Prime=False
                break
        if is_Prime: yield number
        number+=2
        
prime_gen=prime_generator()

Actually, this is a good point to show off another nice laziness feature: all(iterable). This function iterates through the iterable and checks whether all values observed are truthy. This is done lazily, so once a falsy value is observed, all() stops iterating and returns False (there is also any(), which returns True as soon as it finds a truthy value in the iterable).
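
The early exit of all() and any() can be observed by recording which values were actually tested. A minimal sketch; traced() and the list seen are hypothetical names used only for this illustration.

```python
seen = []

def traced(values, predicate):
    """Hypothetical helper: yields predicate(v) and records every v it tested."""
    for v in values:
        seen.append(v)
        yield predicate(v)

numbers = [2, 4, 5, 6, 8]

all_even = all(traced(numbers, lambda v: v % 2 == 0))
seen_by_all = list(seen)    # all() stops at the first odd number

seen.clear()
any_odd = any(traced(numbers, lambda v: v % 2 == 1))
seen_by_any = list(seen)    # any() stops at the first odd number, too

print(all_even, seen_by_all)
print(any_odd, seen_by_any)
```

In both cases 6 and 8 are never tested, because the answer is already known after 5.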


In [None]:
def prime_generator2():
    yield 2 #manually yield 2 first, so that we can start with 3 and use an increment by 2
    number=3
    while True:
        if all((number % i != 0 for i in range(3, int(number**.5 ) + 1,2))):
            yield number
        number+=2
prime_gen=prime_generator2()

Here, we make use of a generator expression to test possible divisors in succession. If the checked number (i) is a divisor, the generator yields False. This makes the calling all() function stop and return False as well, so no computation is wasted.
I guess it's up to you to decide which generator is easier to read.

In [None]:
# Now let's see if it works
for i in range(15):
    print(next(prime_gen))

Looking good!
Next, we develop a generator for prime sextuplets by going through 6 primes at a time to see if they qualify:

In [None]:
def prime_sextuplets_generator():
    prime_gen=prime_generator() #use the prime_generator from above, as we know, there is no upper limit to the primes generated by this
    
    # Use a generator to run next(prime_gen) six times -> you can directly unpack a generator into variables!
    p1,p2,p3,p4,p5,p6 = (next(prime_gen) for i in range(6)) 
    
    while True:
        # See the formula: we have a sextuplet if the first and last prime are 16 apart
        if(p6-p1)==16:                                   
            yield (p1,p2,p3,p4,p5,p6)        
        #shifting the primes one back and adding the next one
        p1,p2,p3,p4,p5,p6 = p2,p3,p4,p5,p6,next(prime_gen)
        

Pretty neat and compact function, I think! Python is a very nice language for this sort of task. Let's see if it works as expected:

In [None]:
prime_gen=prime_sextuplets_generator()
start =datetime.now()

for i in range(10):
    p_say(next(prime_gen))
    
duration =round((datetime.now()-start).total_seconds(),2)
p_say(f"Computation took {duration} s")

Had we done the search based on a list instead of a generator, we would have needed to know how many primes to prepare to include those 10 prime sextuplets. Depending on the application this is very hard or impossible to know beforehand!

## Example 2: Back to the calendar

Another example! Remember the calendar from the List-Comprehensions notebook? The final steps in that were mostly there to showcase the problems of trying to do everything within a list comprehension.

So let's make use of the generator syntax to actually turn this into a useful example!

In [None]:
def is_leap_year(yr): #Same function as earlier
    if yr%4!=0:
        return False
    elif yr%100!=0:
        return True
    elif yr%400!=0:
        return False
    else:
        return True

Instead of the nested list comprehension and trying to compute all the days at once, we use much more readable loops in our generator.

In [None]:
days=["Thu","Fri","Sat","Sun","Mon","Tue","Wed"]
months={"Jan":31,"Feb":28,"Mar":31,"Apr":30,"May":31,"Jun":30,"Jul":31,"Aug":31,"Sep":30,"Oct":31,"Nov":30,"Dec":31}
years={yr : is_leap_year(yr) for yr in range(1970,2022)}

def gen_calendar():
    i=0
    for year,leap in years.items():
        for month,max_days in months.items():
            max_days_leap=max_days+1+(leap and (month=="Feb"))
            for day in range(1,max_days_leap):
                yield f"{days[(i)%7]}, {day}th of {month} {year}"
                # Unlike return in functions, yield won't terminate the loop here! Hence, it's fine to do the increment after yielding.
                i+=1
                


In [None]:
fun_calendar=gen_calendar()
p_say(f"prepared the generator: {fun_calendar}")

fancy_cal=[s for s in fun_calendar]
p_say(fancy_cal[-5:])

Isn't that just much more readable? And again, if you don't need all the days, you only have to compute those you actually care about. Let's demonstrate this again by turning our fun calendar into a doomed calendar!



In [None]:
def gen_unlucky_days():
    i=0
    for year,leap in years.items():
        for month,max_days in months.items():
            max_days_leap=max_days+1+(leap and (month=="Feb"))
            for day in range(1,max_days_leap):
                if (day==13) & (days[(i)%7]=="Fri"): # Now we only yield doomed days, i.e. Friday the 13th!
                    yield f"{days[(i)%7]}, {day}th of {month} {year}"
                i+=1
                

bad_luck_days=gen_unlucky_days()


n_say(f"Fun fact: did you know Black Sabbath's debut album 'Black Sabbath' was released on the first doomed day since the start of Unix time: {next(bad_luck_days)}?")
n_say("Not-so-fun fact: did you know that Friday the 13th is considered unlucky because on Friday, 13th October 1307, most members of the Templar order were arrested and consequently imprisoned, tortured or murdered?")

To successfully end this notebook on a sad note, let's keep generating unlucky days for the rest of our time:

In [None]:
 # Keep repeating for more days of doom and sorrow
next(bad_luck_days)
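
Note that gen_unlucky_days() only covers the years 1970 to 2021, so repeated next() calls will eventually raise StopIteration. As a hedged sketch of an open-ended variant that also cross-checks the weekday arithmetic, the standard library's datetime.date can report weekdays directly. The function fridays_the_13th() is a hypothetical helper and not part of this notebook.

```python
from datetime import date
from itertools import islice

def fridays_the_13th(start_year=1970):
    """Hypothetical helper: yields every Friday the 13th from start_year on."""
    year = start_year
    while year <= date.max.year:   # datetime.date supports years up to 9999
        for month in range(1, 13):
            candidate = date(year, month, 13)
            if candidate.weekday() == 4:   # Monday is 0, so Friday is 4
                yield candidate
        year += 1

first_three = list(islice(fridays_the_13th(), 3))
print([d.isoformat() for d in first_three])
```

The first date produced should match the first doomed day reported by the generator above, which is a quick way to confirm that the hand-rolled weekday counter starts on the right day.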