# Generators in Python

Generators are a simple way to create iterators. They are written as ordinary functions that use the yield statement. Unlike a regular function, which returns a single value and terminates, a generator function returns a generator object that produces values one at a time: each yield hands back a value and pauses the function until the next value is requested.

## Advantages of Generators

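Memory Efficiency: Generators produce items one at a time, only when they are requested, so the full sequence never has to be held in memory at once.

Representing Infinite Sequences: Because values are computed on demand, a generator can represent a sequence with no end (for example, a counter or a continuous data stream), as long as the consumer eventually stops requesting values.

Pipelining Operations: Generators can be chained so that each item flows through a series of operations before the next item is read, which keeps memory use low for large data sets.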

## Best Practices

Use Generators for Large Datasets: When dealing with large datasets, use generators to avoid memory overload.

Chain Generators: You can chain generators together to form a pipeline of operations.

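Handle Exceptions: A for loop stops cleanly when a generator is exhausted, but when you call next() directly, either catch StopIteration or pass a default value, as in next(gen, None). Since Python 3.7, a StopIteration raised inside a generator's body is converted to a RuntimeError (PEP 479), so end a generator with return rather than by raising StopIteration.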
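A minimal sketch of the "Chain Generators" practice: each stage below consumes the previous generator lazily, so no intermediate list is built. The stage names (read_numbers, only_even, squared) are illustrative, not from any library.

```python
def read_numbers(values):
    """Source stage: yield raw values one at a time."""
    for value in values:
        yield value

def only_even(numbers):
    """Filter stage: pass through even numbers only."""
    for n in numbers:
        if n % 2 == 0:
            yield n

def squared(numbers):
    """Transform stage: square each incoming number."""
    for n in numbers:
        yield n * n

# Building the pipeline computes nothing yet; sum() pulls values through it one by one.
pipeline = squared(only_even(read_numbers(range(10))))
total = sum(pipeline)
print(total)  # 0 + 4 + 16 + 36 + 64 = 120
```

Each stage stays small and testable on its own, and swapping or adding a stage only changes how the generators are nested.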

In [22]:
# Generate a large CSV file with the specified number of rows and columns.

import csv
import random

def generate_large_csv(file_path, num_rows=100000, num_columns=10):
    """Generate a large CSV file with the specified number of rows and columns."""
    header = [f"Column{i+1}" for i in range(num_columns)]
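    # Generator expression: each row is created only when writerows() asks for it,
    # so the whole dataset is never held in memory at once.
    data = ([random.randint(1, 1000) for _ in range(num_columns)] for _ in range(num_rows))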

    with open(file_path, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(header)  # Write the header row
        csv_writer.writerows(data)   # Write the data rows

# Generate a large CSV file with 100,000 rows and 10 columns
generate_large_csv('large_dataset.csv', num_rows=100000, num_columns=10)

In [23]:
def generate_large_file(file_path, num_lines=100000):
    """
    Generate a large text file with the specified number of lines.

    Parameters:
    - file_path: The path to the file to be created.
    - num_lines: The number of lines to be written to the file.
    """
    with open(file_path, 'w') as file:
        for i in range(num_lines):
            file.write(f"This is line {i+1}\n")

# Example usage
generate_large_file('large_file.txt', num_lines=100000)


## Creating Generators

In [2]:
def my_generator():
    yield 1
    yield 2
    yield 3


## Using the Generator


In [3]:
gen = my_generator()

print(next(gen))  # Output: 1
print(next(gen))  # Output: 2
print(next(gen))  # Output: 3

# Calling my_generator() does not run its body; it returns a generator object.
# Each call to next() resumes the body until the next yield and returns the yielded value.
# Once the body finishes, further next() calls raise StopIteration.

# A generator object can be iterated only once. To go through the values again,
# create a new generator object by calling my_generator() again.

1
2
3
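A short sketch of what happens after the third value: the same generator object is exhausted, next() raises StopIteration unless a default is supplied, and a fresh object starts over.

```python
def my_generator():
    yield 1
    yield 2
    yield 3

gen = my_generator()
print(list(gen))  # [1, 2, 3]  (consumes every value)
print(list(gen))  # []  (the same object is now exhausted)

# Calling next() on an exhausted generator raises StopIteration...
try:
    next(gen)
except StopIteration:
    print("generator exhausted")

# ...unless a default is passed, which is returned instead.
print(next(gen, None))  # None

# To iterate again, create a new generator object.
fresh = my_generator()
print(next(fresh))  # 1
```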


## Generator Expressions

Generator expressions are a concise way to create generators. They look like list comprehensions but use parentheses instead of square brackets.

In [6]:
gen_exp = (x * x for x in range(10)) # Generator expression 
print(gen_exp)  # Output: <generator object <genexpr> at 0x...> (the address varies)

for num in gen_exp:
    print(num)

<generator object <genexpr> at 0x10799add0>
0
1
4
9
16
25
36
49
64
81
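To make the memory claim concrete, the sketch below compares a list comprehension with the equivalent generator expression using sys.getsizeof. The exact byte counts depend on the Python version and platform, so no specific numbers are claimed; only the relative difference matters.

```python
import sys

squares_list = [x * x for x in range(100_000)]  # all values stored at once
squares_gen = (x * x for x in range(100_000))   # values computed on demand

# The list's size grows with its length; the generator object stays small.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# A generator expression can be passed straight to a consuming function;
# when it is the only argument, the extra parentheses can be dropped.
print(sum(x * x for x in range(10)))  # 285

# Like any generator, a generator expression can be consumed only once.
print(sum(squares_gen) == sum(squares_list))  # True
print(list(squares_gen))  # []
```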


## Use Cases for Memory-Efficient Data Processing

In [25]:
# Processing Large Files

def read_large_file(file_path):
    """Generator to read a large file line by line."""
    with open(file_path) as file:
        for line in file:
            yield line.strip()  # Strip leading/trailing whitespace

def process(item):
    """Processing function to count the number of characters in the line."""
    return len(item)

# Process each line in the large file
for line in read_large_file('large_file.txt'):
    result = process(line)  # Apply processing function
    print(result)  # Output the result

14
14
14
14
14
14
14
14
14
15
15
...
16
16
...
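Printing one value per line is mainly for demonstration; usually the per-line results are reduced to a summary as they stream past. A minimal sketch, assuming a small 12-line temporary file as a hypothetical stand-in for large_file.txt:

```python
import os
import tempfile

def read_large_file(file_path):
    """Generator to read a file line by line (same as above)."""
    with open(file_path) as file:
        for line in file:
            yield line.strip()

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file in the same format as large_file.txt.
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as f:
        for i in range(12):
            f.write(f"This is line {i+1}\n")

    # Aggregate on the fly: only one line is in memory at a time.
    # Each generator re-reads the file, since a generator can be consumed only once.
    total_chars = sum(len(line) for line in read_large_file(path))
    longest = max(len(line) for line in read_large_file(path))

print(total_chars, longest)  # 171 15
```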

In [12]:
# Generating Infinite Sequences

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

gen = infinite_sequence()
for i in range(10):
    print(next(gen))

0
1
2
3
4
5
6
7
8
9
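The standard library already covers this pattern: itertools.islice takes a bounded slice of any iterator, and itertools.count is a ready-made infinite counter. A small sketch:

```python
from itertools import count, islice

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# islice() replaces the manual next() loop with a bounded slice.
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# count(start, step) is a built-in infinite counter.
evens = list(islice(count(0, 2), 5))
print(evens)  # [0, 2, 4, 6, 8]

# Never call list() on the unsliced generator: list(infinite_sequence()) would not return.
```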


In [15]:
import time

# Step 1: Creating a Data Stream Generator
def get_limited_data_stream(limit=10):
    """Simulate a data stream with a limited number of items."""
    num = 1
    count = 0
    while count < limit:
        yield num
        num += 1
        count += 1
        time.sleep(0.1)  # Simulate delay in data stream

# Step 2: Define a Condition Function
def some_condition(item):
    """Condition to filter even numbers."""
    return item % 2 == 0

# Step 3: Define a Processing Function
def process(item):
    """Processing function to square the item."""
    return item * item

# Step 4: Stream Processing Function
def stream_processing(source):
    """Process items from the source stream based on a condition."""
    for item in source:
        if some_condition(item):
            yield process(item)

# Create the limited data stream generator
source = get_limited_data_stream()

# Process the stream and print the results
for processed_item in stream_processing(source):
    print(processed_item)


4
16
36
64
100
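For simple steps, the same filter-then-square pipeline can be written with chained generator expressions. In this sketch, iter(range(1, 11)) is a stand-in for get_limited_data_stream() without the artificial delay.

```python
source = iter(range(1, 11))                          # stand-in for the data stream
evens = (item for item in source if item % 2 == 0)   # filter stage
squares = (item * item for item in evens)            # processing stage

results = list(squares)
print(results)  # [4, 16, 36, 64, 100]
```

Named functions such as some_condition and process are easier to test and reuse; generator expressions are shorter when each step is a one-liner. Both are evaluated lazily in the same way.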


## Examples of Generators for Specific Scenarios

### Fibonacci Sequence

In [16]:
def fibonacci(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

for num in fibonacci(100):
    print(num)

0
1
1
2
3
5
8
13
21
34
55
89
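A variant worth considering: leave the Fibonacci generator unbounded and let the caller decide when to stop, either after a fixed number of terms (islice) or on a condition (takewhile). The name fibonacci_unbounded is hypothetical.

```python
from itertools import islice, takewhile

def fibonacci_unbounded():
    """Yield Fibonacci numbers forever; the caller decides when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Stop after a fixed number of terms...
print(list(islice(fibonacci_unbounded(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# ...or on a condition, which reproduces fibonacci(100).
print(list(takewhile(lambda n: n < 100, fibonacci_unbounded())))
```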


### Prime Numbers Generator


In [17]:
def primes(limit):
    def is_prime(num):
        if num < 2:
            return False
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                return False
        return True

    num = 2
    while num < limit:
        if is_prime(num):
            yield num
        num += 1

for prime in primes(50):
    print(prime)

2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
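The trial-division version above re-tests every candidate from scratch and needs an upper limit. As an alternative technique (not the one used above), an incremental Sieve of Eratosthenes can yield primes with no limit by remembering the next multiple of each prime found so far. A sketch; the name unbounded_primes is hypothetical.

```python
from itertools import count, islice, takewhile

def unbounded_primes():
    """Incremental Sieve of Eratosthenes: yield primes with no upper limit."""
    # Maps each upcoming composite number to the primes that divide it.
    composites = {}
    for n in count(2):
        if n not in composites:
            # No smaller prime marked n, so n is prime. Its first multiple
            # not already covered by a smaller prime is n * n.
            yield n
            composites[n * n] = [n]
        else:
            # n is composite: move each of its primes on to that prime's next multiple.
            for p in composites.pop(n):
                composites.setdefault(n + p, []).append(p)

print(list(islice(unbounded_primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(list(takewhile(lambda p: p < 50, unbounded_primes())))  # same primes as primes(50)
```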


Generators are particularly useful for processing large datasets because they let you handle the data one record (or one batch) at a time instead of loading the entire dataset into memory. Memory use then depends on the size of a single record rather than the size of the whole file, and processing can start before the file has been read completely. Here is a step-by-step guide to using generators for large batch datasets:

## Example: Processing a Large CSV File

### Step 1: Create a Generator to Read the CSV File

Instead of loading the entire CSV file into memory, you can use a generator to read it row by row.

In [26]:
import csv

def read_large_csv(file_path):
    """Generator to read a large CSV file line by line."""
    with open(file_path, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        next(csv_reader, None)  # Skip the header row (the default avoids an error on an empty file)
        for row in csv_reader:
            yield row
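If downstream code is easier to write with column names, csv.DictReader can consume the header row itself and yield each row as a dict. A sketch of that variant; read_large_csv_dicts and the two-column sample file are hypothetical.

```python
import csv
import os
import tempfile

def read_large_csv_dicts(file_path):
    """Hypothetical variant: yield each row as a dict keyed by the header."""
    with open(file_path, newline='') as csvfile:
        # DictReader reads the header row and uses it as the keys.
        for row in csv.DictReader(csvfile):
            yield row

with tempfile.TemporaryDirectory() as tmp_dir:
    # Small hypothetical file in the same layout as large_dataset.csv.
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Column1", "Column2"])
        writer.writerows([[1, 2], [3, 4]])
    rows = list(read_large_csv_dicts(path))

print(rows)  # [{'Column1': '1', 'Column2': '2'}, {'Column1': '3', 'Column2': '4'}]
```

Note that the values are still strings, so a conversion step like process_row is needed either way.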

### Step 2: Define a Processing Function
Define a function to process each row. This could be any operation, such as data cleaning,
transformation, or aggregation.

In [27]:
def process_row(row):
    """Processing function to clean or transform the data."""
    # Example: Convert all fields to integers
    return [int(field) for field in row]

### Step 3: Use a Generator to Process the Data

Use another generator to process the data iteratively.

In [28]:
def process_data(source):
    """Generator to process each row from the data source."""
    for row in source:
        yield process_row(row)

### Step 4: Handle the Processed Data

You can handle the processed data in batches or one at a time, depending on your use case.

For demonstration purposes, we will print each processed row.

In [29]:
file_path = 'large_dataset.csv'  # Path to the large CSV file
source = read_large_csv(file_path)
processed_data = process_data(source)

for processed_row in processed_data:
    print(processed_row)

[178, 893, 534, 542, 899, 559, 681, 416, 246, 909]
[532, 815, 588, 263, 880, 770, 571, 809, 27, 606]
[954, 140, 189, 498, 716, 335, 937, 110, 316, 24]
[373, 144, 795, 162, 794, 805, 567, 577, 29, 709]
[947, 594, 353, 347, 58, 16, 913, 84, 457, 266]
[261, 448, 706, 938, 750, 912, 320, 112, 79, 436]
[69, 841, 844, 653, 581, 55, 596, 24, 165, 100]
[251, 685, 770, 568, 148, 485, 386, 315, 625, 458]
[393, 836, 867, 217, 270, 651, 885, 101, 876, 234]
[340, 254, 432, 147, 726, 14, 611, 932, 807, 44]
[897, 620, 318, 570, 409, 305, 227, 340, 698, 662]
[913, 168, 650, 908, 530, 681, 880, 703, 303, 927]
[906, 565, 685, 3, 126, 625, 160, 345, 225, 494]
[61, 588, 552, 266, 686, 986, 339, 178, 3, 523]
[515, 796, 12, 301, 798, 390, 496, 747, 887, 168]
[798, 714, 484, 823, 703, 152, 271, 963, 866, 630]
[832, 739, 675, 713, 900, 805, 88, 852, 429, 775]
[209, 205, 124, 386, 968, 414, 626, 285, 71, 12]
[251, 880, 952, 101, 489, 917, 386, 229, 103, 459]
[976, 604, 774, 566, 271, 884, 1, 495, 931, 743]
...
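Processed rows can also be handled in batches, for example to write them to a database a few thousand at a time. A minimal sketch using itertools.islice; batched_rows is a hypothetical helper, and the small generator of rows stands in for process_data(source). On Python 3.12 and later, itertools.batched does the same job but yields tuples.

```python
from itertools import islice

def batched_rows(rows, batch_size):
    """Group an iterable of rows into lists of at most batch_size items."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch

# Hypothetical processed rows standing in for process_data(source).
processed = ([i, i * 10] for i in range(5))

for batch in batched_rows(processed, batch_size=2):
    print(len(batch), batch)
# 2 [[0, 0], [1, 10]]
# 2 [[2, 20], [3, 30]]
# 1 [[4, 40]]
```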