Understanding the Key Workings of Python Generators

Reason:
- Have you ever had to work with a dataset so large that it overwhelmed your machine's memory? Or maybe you have a complex function that needs to maintain an internal state every time it is called, but the function is too small to justify creating its own class. In these cases and more, generators and the Python yield statement are here to help.

Generator functions are a special kind of function that returns a lazy iterator. Lazy iterators are objects that you can loop over like a list. However, unlike lists, lazy iterators do not store their contents in memory.

Example 1 - Reading Large Files
A common use case for generators is working with data streams or large files, like CSV files. These text files separate data into columns by using commas. This format is a common way to share data.

In [None]:
def csv_reader(file_name):
    # Reads the whole file into memory at once, then splits it into rows
    with open(file_name) as file:
        result = file.read().split('\n')
    return result

csv_gen = csv_reader('some_csv.txt')
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row Count is: {row_count}")

In the above version, file.read() loads the entire file into memory at once. With a very large file, the computer will probably slow down, and Python may raise a MemoryError.

In [1]:
def csv_reader(file_name):
    # Yield one row at a time; the file is closed once the generator finishes
    with open(file_name, "r") as file:
        for row in file:
            yield row

csv_gen = csv_reader('techcrunch.csv')
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row Count is: {row_count}")

Row Count is: 1461


In the version above, csv_reader() has been turned into a generator function. This version opens a file, loops through each line, and yields each row instead of returning it.

You can also define a generator expression (also called a generator comprehension), which has a very similar syntax to a list comprehension. In this way, you can use a generator without defining and calling a function.
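As a minimal sketch of the idea, the row counter can be written with a generator expression instead of a generator function. The in-memory file below (fake_file) and its contents are made up for illustration so that the snippet runs without a real CSV file; with a real file you would iterate over open(file_name) instead.

```python
import io

# Hypothetical stand-in for open("some_file.csv"); the rows are invented.
fake_file = io.StringIO("name,round\nacme,a\nglobex,b\n")

# Parentheses instead of square brackets make this a generator expression.
csv_gen = (row for row in fake_file)

# Nothing is read until the generator is consumed, one row at a time.
row_count = sum(1 for _ in csv_gen)
print(f"Row Count is: {row_count}")
```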

Example 2: Generating an infinite sequence

In [2]:
a = range(5)
list(a)


[0, 1, 2, 3, 4]

In [5]:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
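
Because infinite_sequence() never finishes, you cannot pass it to list(). A small sketch of two safe ways to consume it, using next() for single values and itertools.islice() for a bounded slice (the variable names are only illustrative):

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


gen = infinite_sequence()

first = next(gen)    # the generator runs until its first yield
second = next(gen)   # it resumes after the yield and runs to the next one

# islice() stops after five items, so this terminates on an infinite generator.
next_five = list(islice(gen, 5))
print(first, second, next_five)
```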
    

Example 3: Detecting Palindromes
- You can use infinite sequences in many ways, but one practical use for them is in building palindrome detectors. A palindrome detector locates all sequences of letters or numbers that are palindromes. These are words or numbers that read the same forward and backward, like 121.

In [9]:
def is_palindrome(num):
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    
    if num == reversed_num:
        return num
    else:
        return False

nums_squared_lc = [num**2 for num in range(5)]
nums_squared_lc
nums_squared_gc = (num**2 for num in range(5))
nums_squared_gc

<generator object <genexpr> at 0x104f70d60>
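
The is_palindrome() function above can be combined with an infinite sequence to form a simple detector. The following is a sketch under a few assumptions: is_palindrome() is simplified to return a plain boolean, infinite_sequence() is repeated so the snippet is self-contained, and islice() is used to stop after ten results.

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


def is_palindrome(num):
    # Skip single-digit values, as in the notes above
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0
    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    return num == reversed_num


# Lazily filter the infinite sequence; no numbers are tested until requested.
palindromes = (n for n in infinite_sequence() if is_palindrome(n))
first_ten = list(islice(palindromes, 10))
print(first_ten)
```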

Profiling Generator Performance
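- Generators are a great way to optimize memory. An infinite sequence generator is an extreme example of this optimization, but you can also see the effect by comparing the sizes of a list comprehension and a generator expression with sys.getsizeof().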

In [11]:
import sys
nums_squared_lc = [i ** 2 for i in range(10000)]
print(sys.getsizeof(nums_squared_lc))
nums_squared_gc = (i ** 2 for i in range(10000))
print(sys.getsizeof(nums_squared_gc))

87624
104


From the above, the list comprehension is 87,624 bytes, while the generator object is only 104 bytes. This means that the list is more than 800 times larger than the generator object.
There is one thing to keep in mind, though: if the list is smaller than the running machine's available memory, then a list comprehension can be faster to evaluate than the equivalent generator expression. To explore this, sum the results of the two comprehensions below and profile each one with cProfile.

In [13]:
import cProfile
cProfile.run('sum([i*2 for i in range(10000)])')


         5 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




In [14]:
cProfile.run('sum((i*2 for i in range(10000)))')

         10005 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.001    0.000    0.001    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.001    0.001 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
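
The profiles above report 5 function calls for the list comprehension and 10,005 for the generator expression, because sum() has to request each value from the generator individually. To compare wall-clock time more directly, timeit from the standard library can be used. This is only a sketch: the repeat count of 200 is an arbitrary choice, and the timings depend on the machine and Python version, so no output is shown.

```python
import timeit

# Time 200 runs of each statement; timeit returns total elapsed seconds.
list_time = timeit.timeit("sum([i * 2 for i in range(10000)])", number=200)
gen_time = timeit.timeit("sum((i * 2 for i in range(10000)))", number=200)

print(f"list comprehension:   {list_time:.4f} s")
print(f"generator expression: {gen_time:.4f} s")
```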




Understanding the Python Yield Statement

On the whole, the primary job of yield is to control the flow of a generator function in a way that's similar to return statements.
When you call a generator function or use a generator expression, you get back a special iterator called a generator. When you call special methods on the generator, such as next(), the code within the function is executed up to yield.

When the Python yield statement is hit, the program suspends function execution and returns the yielded value to the caller. (In contrast, return stops function execution completely.) When a function is suspended, the state of the function is saved. This includes any variable bindings local to the generator, the instruction pointer, the internal stack, and any exception handling.

This allows you to resume function execution whenever you call one of the generator's methods. In this way, all function evaluation picks back up right after the yield. This is fairly visible when using multiple yield statements.

In [None]:
def multi_yield():
    yield_str = "This will print the first string"
    yield yield_str
    yield_str = "This will print the second string"
    yield yield_str

multi_obj = multi_yield()
print(next(multi_obj))


This will print the first string


In [17]:
print(next(multi_obj))

This will print the second string


In [18]:
print(next(multi_obj))

StopIteration: 

In the last next() call above, execution ends with a traceback. This is because generators, like all iterators, can be exhausted. Unless the generator is infinite, you can iterate through it one time only. Once all values have been evaluated, iteration stops and the for loop exits. If you use next() instead, you get an explicit StopIteration exception.
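
A small sketch of the suspend, resume, and exhaust cycle described above. It uses inspect.getgeneratorstate() from the standard library to report the generator's state, and shows two ways to avoid the traceback: catching StopIteration, or passing a default value to next(). The shortened strings and variable names are only for illustration.

```python
import inspect


def multi_yield():
    yield "first"
    yield "second"


gen = multi_yield()
states = [inspect.getgeneratorstate(gen)]      # not started yet

first = next(gen)
states.append(inspect.getgeneratorstate(gen))  # paused at a yield

second = next(gen)

try:
    next(gen)              # no yields left, so this raises StopIteration
    exhausted = False
except StopIteration:
    exhausted = True
states.append(inspect.getgeneratorstate(gen))  # finished

# A default value makes next() return instead of raising.
fallback = next(gen, "no more values")

# Iterating again produces nothing; the generator does not restart.
leftover = list(gen)
print(states, fallback, leftover)
```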

Using Advanced Generator Methods
.send()
.throw()
.close()

Using .send()

The idea is to build a program that makes use of all three methods. This program will print numeric palindromes like before, but with a few tweaks. Upon encountering a palindrome, the program will add a digit and start a search for the next one from there. Exceptions will be handled with .throw(), and the generator will be stopped after a given number of digits with .close().

In [22]:
def is_palindrome(num):
    # skip single digit values
    if num // 10 == 0:
        return False
    temp = num
    reversed_num = 0

    while temp != 0:
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    
    if num == reversed_num:
        return True
    else:
        return False

def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            # yield is an expression here: it evaluates to the value passed to .send()
            i = (yield num)
            if i is not None:
                num = i
        num += 1

pal_gen = infinite_palindromes()
for i in pal_gen:
    print(i)
    digits = len(str(i))
    if digits == 5:
        # pal_gen.throw(ValueError("We do not like large palindromes"))
        pal_gen.close()
    # Jump ahead to the next power of ten; after .close() this call raises StopIteration
    pal_gen.send(10 ** (digits))


11
111
1111
10101


StopIteration: 
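
The StopIteration above comes from calling .send() on a generator that has already been closed. The sketch below shows one way to avoid it, by leaving the loop right after .close(), and then shows what .throw() does when the generator does not handle the exception. Two assumptions to note: is_palindrome() is shortened here to a string reversal, and the found list is a name introduced only for this example. Also note that .send() itself returns the next yielded value, which is discarded here; that is why 101, 1001, and 10001 never appear in the output.

```python
def is_palindrome(num):
    # Skip single-digit values; string reversal replaces the arithmetic version
    if num // 10 == 0:
        return False
    return str(num) == str(num)[::-1]


def infinite_palindromes():
    num = 0
    while True:
        if is_palindrome(num):
            i = (yield num)
            if i is not None:
                num = i
        num += 1


# .close(): stop cleanly once a five-digit palindrome is found
pal_gen = infinite_palindromes()
found = []
for i in pal_gen:
    found.append(i)
    digits = len(str(i))
    if digits == 5:
        pal_gen.close()
        break  # leave the loop so .send() is not called on a closed generator
    pal_gen.send(10 ** digits)
print(found)

# .throw(): the exception is raised at the paused yield inside the generator.
# The generator does not catch it, so it propagates back to the caller.
pal_gen = infinite_palindromes()
next(pal_gen)
try:
    pal_gen.throw(ValueError("We do not like large palindromes"))
except ValueError as exc:
    message = str(exc)
print(message)
```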

Creating Data Pipelines with Generators
Data pipelines allow you to string together code to process large datasets or streams of data without maxing out your machine's memory.

Strategy:
1. Read every line of the file
2. Split each line into a list of values
3. Extract the column names
4. Use the column names and list to create a dictionary
5. Filter out the rounds you are not interested in
6. Calculate the total and average values for the rounds you are interested in.

In [25]:
file_name = "techcrunch.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)

"""
    Above, you first create a generator expression, lines, to yield each line in the file. Next, you iterate through that generator within the definition of another generator expression
    called list_line, which turns each line into a list of values. Then you advance the iteration of list_line just once with next() to get a list of column names from your CSV file.

    .rstrip() makes sure that there are no trailing newline characters, which can be present in CSV files.
"""
company_dicts = (dict(zip(cols, data)) for data in list_line)

"""
    This generator expression iterates through the lists produced by list_line. Then it uses zip() and dict() to create the dictionary as specified above. Below is another generator
    expression that filters for the funding round you want and pulls raisedAmt as well.
"""
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict['round'] == "a"
)

"""
    In this code snippet, the generator expression iterates through the results of company_dicts and takes the raisedAmt for any company_dict where the round key is "a".
    Remember that you are not iterating through all of these at once in the generator expression. In fact, you are not iterating through anything until you actually use a for loop or a function that
    works on iterables, like sum(). Call sum() now to iterate through the generators.
"""
total_series_a = sum(funding)
print(f"Total series a fund raising: ${total_series_a}")

Total series a fund raising: $4380015000
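
The strategy above also calls for an average, which the cell does not compute. Since a generator can only be consumed once, calling sum() and then counting the same generator would not work. The sketch below accumulates the total and the count in a single pass. The CSV text is made-up sample data held in memory (not the real techcrunch.csv), and only the round and raisedAmt column names are taken from the notes.

```python
import io

# Hypothetical sample data standing in for techcrunch.csv
csv_text = (
    "company,round,raisedAmt\n"
    "acme,a,1000000\n"
    "globex,b,5000000\n"
    "initech,a,3000000\n"
)

lines = (line for line in io.StringIO(csv_text))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)  # consume the header row
company_dicts = (dict(zip(cols, data)) for data in list_line)
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == "a"
)

# One pass over the pipeline: track both the running total and the count.
total = 0
count = 0
for amount in funding:
    total += amount
    count += 1
average = total / count if count else 0.0

print(f"Total series a fund raising: ${total}")
print(f"Average series a fund raising: ${average:.2f}")
```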


Review Questions:
1. What is a generator function?
    - A function that returns a lazy iterator.
    - A generator function in Python is a special kind of function that returns a lazy iterator. These iterators allow you to loop over the items they generate without storing all items in memory at the same time.
    - Introduced with PEP 255, generator functions are a powerful feature for creating iterators that you can work with in a memory-efficient way.


2. What’s a key difference between a list and a generator?
    - Generators do not store their contents in memory
    - A key difference between a list and a generator in Python is that generators don’t store their contents in memory, while lists do. This makes generators lazy iterators, meaning they generate their elements on the fly, as you request them, instead of storing them all upfront.
    - This can be a big advantage when you’re working with large datasets or when you’re generating a sequence of results that you only need to process one at a time.

3. What is the difference between yield and return in a Python function?
    - yield sends a value back to the caller and remembers the state of the function for the next call, while return sends a value back and exits the function.
    - You can use the yield keyword in a function like a return statement.
    - However, yield returns a value and pauses the function’s execution while keeping the function’s state. You can resume the function with the next call right where it left off, which allows the function to generate a series of values over time instead of computing all values at once.
    - This is why functions that contain a yield statement are called generators.

4. What does the yield statement do in a generator function?
    - The yield statement suspends the function execution and returns the yielded value to the caller.
    - You can then resume the function from the same point the next time you call one of the generator's methods. This is different from return, which finishes function execution completely.
    - Once you've fully iterated over a generator, it raises a StopIteration exception to signal that it's exhausted.

5. What is the main difference between a generator expression and a list comprehension in Python?
    - A generator expression does not build and hold the entire data in memory before iteration. During iteration, the generator only generates one item at a time.
    - This means that you won’t have a memory penalty when you use generator expressions.

6. What does .send() do in the context of a Python generator?
    - In the context of a Python generator, you can use .send() to send a value back to the generator.
    - When you use yield as an expression, the value passed to .send() becomes the result of that yield expression inside the generator. This allows you to update the state of the generator from outside the generator function.

7. What does .throw() do in a Python generator?
    - The .throw() method allows you to throw exceptions within a Python generator.
    - For example, you can use .throw() to control when you stop iterating through the generator.

8. How can you stop a generator in Python?
    - You can stop a generator by calling its .close() method. This raises a GeneratorExit exception inside the generator at the point where it is paused, and any later call to next() raises StopIteration.
9. What happens when you iterate over a generator fully?
    - When you fully iterate over a generator, then it exhausts itself.
    - This means that you can’t iterate over the same generator again. If you try to do so, the generator won’t yield any more values.