# 🔁 What is a Generator in Python?
* A generator is a special type of iterator that lets you iterate through a sequence of data without storing the entire sequence in memory.
* It is created by a function that uses **yield** instead of **return**.
* It is efficient for working with **large datasets**.
* It supports streaming and lazy evaluation: data is generated one item at a time, on demand.
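
To make the memory point concrete, here is a minimal sketch comparing a list with a generator expression (a compact alternative to writing a `yield` function) over the same range. The size `n` and the variable names are illustrative assumptions, and exact byte counts depend on the Python build, so none are shown.

```python
import sys

n = 100_000  # illustrative size, chosen only for the comparison

squares_list = [i * i for i in range(n)]   # builds all n values up front
squares_gen = (i * i for i in range(n))    # builds nothing until iterated

# getsizeof reports the container only, not the integers it references,
# but that is enough to show the list grows with n while the generator does not.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes > gen_bytes)

# Both produce the same total; the generator computes values one at a time.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```

Note that the generator is exhausted after the `sum` call: iterating it a second time yields nothing, whereas the list can be reused.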

In [1]:
# 🧪 Python Generators - Data Science Examples

# 🟢 What is a Generator?

# A generator function is a function that contains `yield`; calling it returns a generator object, which is an iterator.
# Instead of returning all the data at once, it yields one item at a time and pauses between items.
# This is especially useful when dealing with large datasets.

def simple_generator():
    for i in range(3):
        yield i

gen = simple_generator()
print(next(gen))  # Output: 0
print(next(gen))  # Output: 1
print(next(gen))  # Output: 2
# print(next(gen))  # Uncommenting this will raise StopIteration


0
1
2
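
The commented-out line in the cell above hints at what happens when a generator runs out. As a small sketch that redefines the same `simple_generator` so the block stands alone, a `for` loop (or comprehension) handles `StopIteration` for you, and `next()` accepts a default value to return instead of raising.

```python
def simple_generator():
    for i in range(3):
        yield i

# Iteration calls next() internally and stops cleanly on StopIteration.
collected = [value for value in simple_generator()]
print(collected)

gen = simple_generator()
for _ in range(3):
    next(gen)  # consume all three items

# The generator is now exhausted: next() with a default avoids the exception.
fallback = next(gen, None)
print(fallback)

# Without a default, the exhausted generator raises StopIteration.
try:
    next(gen)
    raised = False
except StopIteration:
    raised = True
print(raised)
```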


In [2]:
# 📊 Example 1: Reading a Large CSV File Using a Generator

# Let's assume you have a very large CSV file and you want to process it row by row without loading everything into memory.

import csv

def read_large_csv(file_path):
    with open(file_path, mode='r', newline='') as file:  # newline='' is recommended by the csv module docs
        reader = csv.DictReader(file)
        for row in reader:
            yield row  # Yield one row at a time

# Example usage (assuming 'large_data.csv' exists with a column 'value')
# for row in read_large_csv('large_data.csv'):
#     if float(row['value']) > 100:
#         print(row)
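
Because the usage above depends on a file that may not exist, here is a self-contained sketch that writes a tiny CSV to a temporary file and streams it back through the same kind of generator. The file contents, the `id` column, and the threshold of 100 are made up for illustration.

```python
import csv
import os
import tempfile

def read_large_csv(file_path):
    # Stream rows one at a time; the file stays open only while iterating.
    with open(file_path, mode='r', newline='') as file:
        for row in csv.DictReader(file):
            yield row

# Write a small, made-up dataset to a temporary file.
with tempfile.NamedTemporaryFile(mode='w', newline='', suffix='.csv', delete=False) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(['id', 'value'])
    writer.writerows([[1, 50], [2, 150], [3, 99.5], [4, 300]])
    tmp_path = tmp.name

try:
    # DictReader yields every field as a string, so convert before comparing.
    large_rows = [row for row in read_large_csv(tmp_path) if float(row['value']) > 100]
finally:
    os.remove(tmp_path)  # clean up the temporary file

print([row['id'] for row in large_rows])
```

Only one row is held in memory at a time while filtering; the list comprehension keeps just the rows that pass the condition.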


In [2]:
# ⚙️ Example 2: Batching Data with a Generator

# In machine learning, we often feed data into models in batches.
# Here's a generator that yields data in batches.

def batch_generator(data, batch_size):
    # `data` must support len() and slicing (e.g. a list); the last batch may be shorter.
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# Sample data
data = list(range(20))

# Print data in batches of 5
for batch in batch_generator(data, 5):
    print(batch)


[0, 1, 2, 3, 4]
[5, 6, 7, 8, 9]
[10, 11, 12, 13, 14]
[15, 16, 17, 18, 19]
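
`batch_generator` relies on `len()` and slicing, so it needs a list-like input. When the data source is itself a generator or another one-pass iterable, a variant built on `itertools.islice` can batch it without materializing everything first. The name `batch_iterable` is a hypothetical helper introduced here, not part of the original notebook.

```python
from itertools import islice

def batch_iterable(iterable, batch_size):
    # Hypothetical helper: batches any iterable, including other generators.
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))  # take up to batch_size items
        if not batch:
            return  # source exhausted
        yield batch

# A generator as the data source: len() and slicing would not work here.
stream = (x * 2 for x in range(7))
batches = list(batch_iterable(stream, 3))
print(batches)  # [[0, 2, 4], [6, 8, 10], [12]]
```

The final batch is shorter when the number of items is not a multiple of `batch_size`, matching the behavior of the slicing version.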


In [3]:
# 📚 Example 3: Using a Generator with a Pandas DataFrame

import pandas as pd

# Create a mock dataframe
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'score': [88, 92, 85, 90, 95]
})

# Generator to yield rows one by one
# (df.iterrows() already yields (index, Series) pairs lazily; this wrapper drops the index)
def dataframe_row_generator(df):
    for _, row in df.iterrows():
        yield row

# Use the generator
for row in dataframe_row_generator(df):
    print(f"{row['name']} scored {row['score']}")


Alice scored 88
Bob scored 92
Charlie scored 85
David scored 90
Eva scored 95
