# DataFrame from generators

This is a simple example of how to use Python generator functions to create [pandas][pandas] DataFrames.

[pandas]: https://pandas.pydata.org/

## Preliminaries

In [1]:
# Import pandas
import pandas as pd

In [2]:
# Some dummy data
birds = ['Owl', 'Crow', 'Dove', 'Bat']
fishes = ['Shark', 'Cod', 'Sushi', 'Beaver']

If you are not familiar with `zip()`, check out [the documentation][zip]. Here is a quick intro:

[zip]: https://docs.python.org/3/library/functions.html#zip

In [3]:
# zip() pairs up the elements of both lists, e.g.:
for bird, fish in zip(birds, fishes):
    print(f'Bird: {bird} \t Fish: {fish}')

Bird: Owl 	 Fish: Shark
Bird: Crow 	 Fish: Cod
Bird: Dove 	 Fish: Sushi
Bird: Bat 	 Fish: Beaver
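
One detail worth knowing is that `zip()` stops as soon as the shortest input runs out. A minimal sketch, using made-up lists (`letters` and `numbers` are illustrative names, not part of the example above):

```python
# Hypothetical inputs of unequal length, for illustration only
letters = ['a', 'b', 'c']
numbers = [1, 2]

# zip() stops at the shortest input, so 'c' is silently dropped
pairs = list(zip(letters, numbers))
print(pairs)
```

This is why `birds` and `fishes` above have the same length: any surplus items in the longer list would simply be ignored.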


Below is an example of a [generator function][gf]:

[gf]: https://wiki.python.org/moin/Generators

In [4]:
# generator function that yields pairs of birds and fishes
def my_generator():
    for bird, fish in zip(birds, fishes):
        yield bird, fish

In [5]:
# A generator object is "used up" once it has been iterated over
my_gen = my_generator()

# Iterate over the generator with next()
print(next(my_gen))
print(next(my_gen))
print(next(my_gen))
print(next(my_gen))

('Owl', 'Shark')
('Crow', 'Cod')
('Dove', 'Sushi')
('Bat', 'Beaver')


Now there are no pairs left, so the next call to `next()` will raise `StopIteration`:

In [6]:
try:
    next(my_gen)
except StopIteration:
    print('Generator has reached its end.')

Generator has reached its end.
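
If catching the exception feels heavy-handed, `next()` also accepts a default value as its second argument, which is returned instead of raising `StopIteration`. A small sketch, with a hypothetical `count_to_two()` generator standing in for `my_generator()`:

```python
# Hypothetical two-item generator, for illustration only
def count_to_two():
    yield 1
    yield 2

gen = count_to_two()

# The second argument to next() is returned once the generator is exhausted
first = next(gen, None)
second = next(gen, None)
third = next(gen, None)  # nothing left, so the default (None) is returned
print(first, second, third)
```

Pick a default that cannot be confused with a real value; `None` only works here because the generator never yields `None` itself.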


As long as the iterator is not infinite, it can be turned into a list (like any iterable):

In [7]:
list(my_generator())

[('Owl', 'Shark'), ('Crow', 'Cod'), ('Dove', 'Sushi'), ('Bat', 'Beaver')]
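
For a generator that never ends, calling `list()` directly would not return. One way around this, sketched here with a hypothetical `naturals()` generator, is to cut off the stream with `itertools.islice` first:

```python
from itertools import islice

# Hypothetical infinite generator, for illustration only
def naturals():
    n = 0
    while True:
        yield n
        n += 1

# islice() lazily takes the first 5 items, so list() terminates
first_five = list(islice(naturals(), 5))
print(first_five)
```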

In [8]:
# for-loops step through iterators
for pair in my_generator():
    print(pair)

('Owl', 'Shark')
('Crow', 'Cod')
('Dove', 'Sushi')
('Bat', 'Beaver')
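
Note that the cells above call `my_generator()` each time, which creates a fresh generator object. A sketch of what happens when the same object is reused instead (the `pairs()` generator is a hypothetical stand-in):

```python
# Hypothetical generator, for illustration only
def pairs():
    yield ('a', 1)
    yield ('b', 2)

gen = pairs()

first_pass = list(gen)   # consumes every item
second_pass = list(gen)  # the same object is now exhausted

print(first_pass)
print(second_pass)
```

The second pass comes back empty without any error, which is easy to miss when a generator object is stored in a variable and handed to more than one consumer.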


## Use with pandas
Passing a generator (the object returned by calling the generator function) as the `data` parameter of [`pd.DataFrame`][df] creates a DataFrame in which each yielded tuple becomes a row:

[df]: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

In [9]:
# Create DataFrame
columns = ['Bird', 'Fish']
pd.DataFrame(my_generator(), columns=columns)

|   | Bird | Fish   |
|---|------|--------|
| 0 | Owl  | Shark  |
| 1 | Crow | Cod    |
| 2 | Dove | Sushi  |
| 3 | Bat  | Beaver |
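
Yielding tuples requires a separate `columns` list. One alternative, sketched here under the assumption that dictionaries suit your data, is to yield one dictionary per row and let pandas take the column names from the keys. `dict_generator` is a hypothetical name, not part of the example above:

```python
import pandas as pd

# Same dummy data as in the Preliminaries
birds = ['Owl', 'Crow', 'Dove', 'Bat']
fishes = ['Shark', 'Cod', 'Sushi', 'Beaver']

# Hypothetical variant of my_generator() that yields one dict per row
def dict_generator():
    for bird, fish in zip(birds, fishes):
        yield {'Bird': bird, 'Fish': fish}

# No columns argument needed: the dict keys become the column names
df = pd.DataFrame(dict_generator())
print(df)
```

Either way, pandas has to consume the whole generator to build the DataFrame, so this approach only works for finite generators.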
