https://github.com/patrickloeber/python-engineer-notebooks/blob/master/advanced-python/14-Generators.ipynb
Generators
Generators are functions that can be paused and resumed, returning an object that can be iterated over. Unlike lists, they are lazy: they produce items one at a time and only when asked, which makes them much more memory efficient when dealing with large datasets.
A generator is defined like a normal function, but it uses the yield keyword instead of return.

def my_generator():
    yield 1
    yield 2
    yield 3
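
To make the memory claim concrete, here is a minimal sketch that compares a fully built list with a generator producing the same values. The helper name `first_n` and the size of 100,000 are illustrative choices, and `sys.getsizeof` only measures the container object itself (not the integers it refers to), so treat the numbers as a rough indication rather than a precise measurement.

```python
import sys

def first_n(n):
    """Yield the integers 0..n-1 one at a time."""
    num = 0
    while num < n:
        yield num
        num += 1

as_list = list(range(100_000))  # every element is stored up front
as_gen = first_n(100_000)       # only the paused function state is stored

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")

# The generator still yields every value when it is consumed
total = sum(as_gen)
```

The generator object stays small no matter how large `n` is, because values are computed only when requested.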
Execution of a generator function
Calling a generator function does not execute its body. Instead, the call returns a generator object, which is used to control execution. The body runs when next() is called on that object. On the first call to next(), execution begins at the start of the function and continues until the first yield statement, where the value to the right of yield is returned to the caller and the function is paused. Each subsequent call to next() resumes right after that yield (looping around if the yield is inside a loop) and runs until the next yield is reached. If no further yield is reached, either because a condition prevents it or because the function ends, a StopIteration exception is raised:

In [1]:
def countdown(num):
    print('Starting')
    while num > 0:
        yield num
        num -= 1

# this will not print 'Starting'
cd = countdown(3)

# this will print 'Starting' and the first value
print(next(cd))

# will print the next values
print(next(cd))
print(next(cd))

# this will raise a StopIteration
print(next(cd))
     

Starting
3
2
1


StopIteration: 
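
If the exception is not wanted, there are two common ways to deal with exhaustion, sketched below: catching StopIteration explicitly, or passing a default value as the second argument to the built-in next(). The `countdown` function is repeated here (without the print call) so the snippet runs on its own.

```python
def countdown(num):
    while num > 0:
        yield num
        num -= 1

cd = countdown(2)
values = []

# Option 1: catch StopIteration once the generator is exhausted
try:
    while True:
        values.append(next(cd))
except StopIteration:
    pass

# Option 2: give next() a default, returned instead of raising
sentinel = next(cd, None)

print(values, sentinel)
```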

In [2]:
# you can iterate over a generator object with a for in loop
cd = countdown(3)
for x in cd:
    print(x)

Starting
3
2
1
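
A for loop handles StopIteration internally, which is why no exception appears above. One consequence worth noting is that a generator object can only be consumed once. The sketch below, again with a self-contained copy of `countdown`, shows this and also passes fresh generator objects to built-ins that accept any iterable.

```python
def countdown(num):
    while num > 0:
        yield num
        num -= 1

cd = countdown(3)
first_pass = list(cd)    # consumes all values
second_pass = list(cd)   # the generator is already exhausted

# Built-ins that take an iterable work directly on a new generator object
total = sum(countdown(3))
ordered = sorted(countdown(3))

print(first_pass, second_pass, total, ordered)
```

To iterate again, call the generator function again to get a new generator object.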


In [2]:
import pandas as pd

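def collatz_steps(n):
    """Return the number of Collatz steps needed to reach 1 from a positive integer n."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2      # even: halve
        else:
            n = 3 * n + 1   # odd: triple and add one
        steps += 1
    return steps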

# Create a list to store data
data = []

# Input a range of positive integers
start_num = int(input("Enter the starting positive integer: "))
end_num = int(input("Enter the ending positive integer: "))

if start_num <= 0 or end_num <= 0 or start_num > end_num:
    print("Please enter valid positive integers.")
else:
    for num in range(start_num, end_num + 1):
        steps = collatz_steps(num)
        data.append({'Number': num, 'Steps to 1': steps})

# Create a pandas DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

         Number  Steps to 1
0             1           0
1             2           1
2             3           7
3             4           2
4             5           5
...         ...         ...
999995   999996         113
999996   999997         113
999997   999998         258
999998   999999         258
999999  1000000         152

[1000000 rows x 2 columns]
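
The loop above collects one million dictionaries in a list before building the DataFrame. As a sketch of how the generator idea from this notebook could apply here, the rows can instead be produced lazily. The name `collatz_rows` is a hypothetical helper, not part of the original cell, and `collatz_steps` is repeated so the snippet is self-contained.

```python
def collatz_steps(n):
    """Return the number of Collatz steps needed to reach 1 from n."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def collatz_rows(start, end):
    """Lazily yield (number, steps) pairs for start..end inclusive."""
    for num in range(start, end + 1):
        yield num, collatz_steps(num)

# Rows are computed only as the consumer asks for them
rows = list(collatz_rows(1, 5))
print(rows)

# Aggregates can be computed without ever storing all rows
max_steps = max(steps for _, steps in collatz_rows(1, 1000))
```

This keeps memory use flat when only an aggregate is needed; if the full table is required anyway, the rows end up in memory regardless.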


In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Function to extract the first digit
def extract_first_digit(number):
    while number >= 10:
        number //= 10
    return number

# Apply the function to the 'Steps to 1' column
df['First Digit'] = df['Steps to 1'].apply(extract_first_digit)




In [None]:
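# Count how often each first digit 1-9 occurs
# (a step count of 0 has no leading digit in 1-9 and is excluded)
first_digit_counts = df['First Digit'].value_counts().reindex(range(1, 10), fill_value=0)

# Calculate the expected Benford's Law distribution
benford_distribution = np.log10(1 + 1 / np.arange(1, 10))

# Normalize the counts to match the expected distribution
normalized_counts = first_digit_counts / first_digit_counts.sum()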

# Plot the actual and expected distributions
plt.bar(range(1, 10), normalized_counts, label='Actual Distribution')
plt.plot(range(1, 10), benford_distribution, marker='o', linestyle='--', color='r', label="Benford's Law")
plt.xlabel('First Digit')
plt.ylabel('Normalized Frequency')
plt.title("Benford's Law Test on Collatz Step Counts")
plt.xticks(range(1, 10))
plt.legend()
plt.show()
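
For reference, the expected curve in the plot comes from the Benford formula P(d) = log10(1 + 1/d) for leading digits d = 1..9. The short sketch below recomputes those probabilities with the standard library only and checks that they sum to 1; the variable names are illustrative.

```python
import math

# Expected probability of each leading digit under Benford's Law
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")

# The probabilities telescope to log10(10) = 1
total = sum(benford.values())
```

Digit 1 is expected roughly 30% of the time, and the probability falls steadily toward digit 9.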