### **Introduction to `yield` in Python**

The `yield` keyword in Python is used to create **generators**: iterators that produce values **lazily**, one at a time, instead of building and returning them all at once as a list does.

---

### **Key Features of `yield`:**

1. **State Retention:**
   - Unlike `return`, which exits a function completely, `yield` pauses the function and retains its state. The function can be resumed from where it left off.

2. **Efficient Memory Usage:**
   - Because generators produce items one at a time, they are more memory-efficient than creating and storing all items in memory at once.

3. **Simplifies Iterator Creation:**
   - Generators eliminate the need for implementing `__iter__()` and `__next__()` methods manually.

4. **Use Cases:**
   - Generators are ideal for handling large data streams, infinite sequences, or any scenario where you don't need all the data at once.
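
To make the "Simplifies Iterator Creation" point concrete, here is a minimal sketch comparing a hand-written iterator class with an equivalent generator function. The names `CountUpTo` and `count_up_to_gen` are hypothetical and chosen only for this illustration.

```python
class CountUpTo:
    """Hand-written iterator: state is stored in instance attributes."""

    def __init__(self, n):
        self.n = n
        self.count = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.count > self.n:
            raise StopIteration  # signals that there are no more values
        value = self.count
        self.count += 1
        return value


def count_up_to_gen(n):
    """Generator version: state is kept in the paused function's local variables."""
    count = 1
    while count <= n:
        yield count
        count += 1


manual_result = list(CountUpTo(3))
generator_result = list(count_up_to_gen(3))
print(manual_result)
print(generator_result)
```

Both versions produce the same values; the generator simply lets Python supply `__iter__()` and `__next__()` for you.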

---

### **How `yield` Works:**

#### **1. Creating a Generator Function:**
   - Any function that contains a `yield` statement automatically becomes a generator function.
   - Instead of returning a single value, the function generates a series of values, pausing after each `yield`.

#### Example:
```python
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

# Using the generator
for num in count_up_to(5):
    print(num)
```

**Output:**
```
1
2
3
4
5
```

**Explanation:**
- The function `count_up_to` pauses at each `yield` and resumes when the next value is requested.
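
The pause-and-resume behaviour is easier to see when you drive the generator by hand with the built-in `next()` instead of a `for` loop. This is a small sketch that redefines `count_up_to` so it runs on its own; the variable names `first`, `second`, and `exhausted` are illustrative.

```python
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

gen = count_up_to(2)   # nothing runs yet; this only creates the generator object
first = next(gen)      # runs the body until the first yield
second = next(gen)     # resumes right after the yield and runs to the next one

try:
    next(gen)          # the while loop ends, so the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```

A `for` loop does the same thing internally: it calls `next()` repeatedly and stops quietly when `StopIteration` is raised.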

---

#### **2. Comparing `yield` vs `return`:**
- **`return`**: Ends the function and sends a single value back to the caller.
- **`yield`**: Pauses the function and can produce multiple values over time, one per request.

```python
def using_return():
    return [1, 2, 3]  # Returns all values at once

def using_yield():
    yield 1
    yield 2
    yield 3  # Yields values one at a time
```
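
Calling the two functions shows the practical difference. The sketch below repeats both definitions so it is self-contained; `returned`, `generated`, `first_pass`, and `second_pass` are illustrative names.

```python
def using_return():
    return [1, 2, 3]

def using_yield():
    yield 1
    yield 2
    yield 3

returned = using_return()    # a fully built list
generated = using_yield()    # a generator object; no values produced yet

print(type(returned).__name__)
print(type(generated).__name__)

first_pass = list(generated)   # consumes every value from the generator
second_pass = list(generated)  # the generator is now exhausted
print(first_pass)
print(second_pass)
```

Note that a generator can be consumed only once. To iterate again, call the generator function again to get a fresh generator.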

---

### **When to Use `yield`?**

1. **Large Datasets:**
   - When processing a dataset that is too large to fit in memory, like reading a massive file line by line.
   
   Example:
   ```python
   def read_file(file_name):
       with open(file_name) as file:
           for line in file:
               yield line.strip()
   ```

2. **Infinite Sequences:**
   - When you need to generate a potentially infinite series, such as Fibonacci numbers or prime numbers.
   
   Example:
   ```python
   def infinite_fibonacci():
       a, b = 0, 1
       while True:
           yield a
           a, b = b, a + b
   ```

3. **Pipelines:**
   - When chaining multiple processing steps together, using generators avoids creating intermediate lists.
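
As a rough sketch of a generator pipeline fed by an infinite source, the example below chains three small generators and uses `itertools.islice` to take only the first few results. The function names `naturals`, `only_even`, and `squared` are hypothetical and exist only for this illustration.

```python
from itertools import islice

def naturals():
    """Infinite source: 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

def only_even(numbers):
    """Filter step: pass through even numbers only."""
    for n in numbers:
        if n % 2 == 0:
            yield n

def squared(numbers):
    """Transform step: square each incoming number."""
    for n in numbers:
        yield n * n

# Each stage pulls one value at a time from the stage before it,
# so no intermediate list is ever built.
pipeline = squared(only_even(naturals()))
first_five = list(islice(pipeline, 5))
print(first_five)
```

Without `islice` (or some other stopping condition), `list(pipeline)` would never finish, because the source never ends.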


In [None]:
# Example: with return, a KeyError discards the results collected so far

def start_scraping(response_api):
    results = []

    for i in response_api:
        color = i["color"]  # Raises KeyError on the third item, which has no "color" key
        results.append(color)
    return results
    # print("End of function")

response_api = [
    {"ID": 1, "item": "Laptop", "color": "black"},
    {"ID": 2, "item": "Smart Watch", "color": "green"},
    {"ID": 3, "item": "Camera"},
]

print(start_scraping(response_api))

In [None]:
# Example: with yield, values produced before the KeyError are still delivered

def start_scraping(response_api):
    for i in response_api:
        yield i["color"]  # Raises KeyError on the third item, which has no "color" key
        # print("End of function")

# Dummy data
response_api = [
    {"ID": 1, "item": "Laptop", "color": "black"},
    {"ID": 2, "item": "Smart Watch", "color": "green"},
    {"ID": 3, "item": "Camera"},
]

# Create a generator object
results = start_scraping(response_api)

for i in results:
    print(i)
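
The two cells above differ in when a failure costs you data. In the `return` version the `KeyError` is raised before `results` is returned, so nothing is printed. In the `yield` version the first two colors reach the caller before the third item fails. One possible way to keep those partial results is sketched below; the `collected` list and the `try`/`except` around the loop are assumptions added for illustration, not part of the original exercise.

```python
def start_scraping(response_api):
    for item in response_api:
        yield item["color"]  # raises KeyError for items without a "color" key

response_api = [
    {"ID": 1, "item": "Laptop", "color": "black"},
    {"ID": 2, "item": "Smart Watch", "color": "green"},
    {"ID": 3, "item": "Camera"},
]

collected = []
try:
    for color in start_scraping(response_api):
        collected.append(color)  # each value arrives as soon as it is yielded
except KeyError as missing_key:
    print(f"Stopped early, missing key: {missing_key}")

print(collected)
```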

In [None]:
# Compare the size of a list and a generator
import sys

example_list = [i for i in range(1000)]
example_generator = (i for i in range(1000))

print(sys.getsizeof(example_list))
print(sys.getsizeof(example_generator))
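
To see how the two sizes scale, the sketch below repeats the measurement for two different lengths. The helper name `sizes` is hypothetical. Exact byte counts depend on the Python version and platform, so none are shown here; also keep in mind that `sys.getsizeof` reports only the container object itself, not the integers a list refers to.

```python
import sys

def sizes(n):
    """Return (list size, generator size) in bytes for n items."""
    as_list = [i for i in range(n)]
    as_generator = (i for i in range(n))
    return sys.getsizeof(as_list), sys.getsizeof(as_generator)

small_list, small_gen = sizes(10)
large_list, large_gen = sizes(100_000)

print(small_list, large_list)  # the list grows with the number of items
print(small_gen, large_gen)    # the generator stays the same size
```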


In [None]:
""" 
Objective: Understand the difference between a function and a generator
"""
list_data = [i for i in range(10)]

# TODO:
# 1. Create a function that reverses a list manually, without using the reverse method
# 2. Call your function with list_data as the input parameter
# 3. Check your function by printing its return value
# 4. Print every item using a loop

In [None]:
""" 
Objective: Understand the difference between a function and a generator
"""
# TODO:
# 1. Re-create the previous function using yield
# 2. Call your function with list_data as the input parameter
# 3. Check your function by printing its return value
# 4. Print every item using a loop
# 5. Analyze the difference between the two versions

In [None]:
# TODO: Execute this cell and take a look at the CSV file before continuing
import csv

def create_csv(file_name, base_url, num_entries):
    with open(file_name, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        
        # Write header
        writer.writerow(["ID", "URL"])
        
        # Write rows with dynamically generated URLs
        for i in range(1, num_entries + 1):
            # Build the page URL from the current ID, e.g. ".../catalogue/page-20.html"
            dynamic_url = base_url + f"/catalogue/page-{i}.html"
            writer.writerow([i, dynamic_url])
    
    print(f"CSV file '{file_name}' with {num_entries} dynamic URLs has been created.")

create_csv(
    file_name="books_urls.csv",
    base_url="https://books.toscrape.com",
    num_entries=1000000
)


In [None]:
""" 
Objective: Compare scraping execution speed when reading from a huge CSV file
"""

import requests
import csv

def read_urls_from_csv(file_path):
    """
    Reads a CSV file and returns a list of URLs found in the 'URL' column.
    """
    urls = []  # Initialize an empty list to store URLs
    with open(file_path, mode='r', newline='', encoding='utf-8') as file:
        # Create a CSV reader object to parse the CSV file
        csv_reader = csv.DictReader(file)
        
        # Iterate through each row in the CSV file
        for row in csv_reader:
            # Append the value in the 'URL' column to the urls list
            urls.append(row["URL"])
    
    return urls  # Return the list of URLs

# Read the URLs from the CSV file into the data_csv list
data_csv = read_urls_from_csv('books_urls.csv')

# Iterate through each URL in the list
for url in data_csv:
    print(f"Getting {url}")  # Print a message indicating the URL being fetched
    status_code = requests.get(url).status_code  # Send a GET request and keep the status code
    
    # Intentionally halt after the first request (for testing purposes)
    raise RuntimeError("Intentional stop after the first request")

# TODO: Note how long it takes before the error is raised

In [None]:
""" 
Objective: Compare scraping execution speed when reading from a huge CSV file
"""
# TODO:
# 1. Re-create the previous function using yield
# 2. Compare the execution times and share your insight

In [None]:
""" 
Objective: Using yield for scraping
"""

import requests
from bs4 import BeautifulSoup


# Scrape product data from a list of URLs
def scrape_product_urls(urls):
    """
    Scrape product URLs from a list of pages.
    """
    all_product_urls = []
    for url in urls:
        print(f"Scraping: {url}")

        # TODO:
        # 1. Get the HTML response for the page URL
        # 2. Extract each item's URL and add it to all_product_urls

    return all_product_urls

# Main execution
if __name__ == "__main__":
    page_urls = [
        "https://books.toscrape.com/catalogue/page-1.html",
        "https://books.toscrape.com/catalogue/page-10.html",
        "https://books.toscrape.com/catalogue/page-200.html",
        "https://books.toscrape.com/catalogue/page-20.html"
    ]
    product_items = scrape_product_urls(page_urls)

    # Print the extracted product URLs
    for item in product_items:
        print(item)

In [None]:
""" 
Objective: Using yield for scraping
"""
# TODO:
# 1. Update the previous code to use yield

In [None]:
""" 
Objective: Using yield for scraping
"""
# TODO:
# 1. Update the code from your last assignment to use yield
# 2. Create a new branch and push it to GitHub
# 3. Put the URL here

### **Reflection**
If you have plenty of memory, do you think you still need a generator? Explain your reasoning!

(answer here)

### **Exploration**
In Python, generators and iterators are both essential tools for working with sequences of data. However, this notebook only covers generators. Explore iterators on your own!