## Before continuing, please select the menu option **Cell => All Output => Clear**

## Generators are great for processing files & pipelines

*This example uses generator comprehensions but a more complete solution would likely use generator functions.*

Imagine a large dataset:

> permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round<br>
> digg,Digg,60,web,San Francisco,CA,1-Dec-06,8500000,USD,b<br>
> digg,Digg,60,web,San Francisco,CA,1-Oct-05,2800000,USD,a<br>
> facebook,Facebook,450,web,Palo Alto,CA,1-Sep-04,500000,USD,angel<br>
> facebook,Facebook,450,web,Palo Alto,CA,1-May-05,12700000,USD,a<br>
> photobucket,Photobucket,60,web,Palo Alto,CA,1-Mar-05,3000000,USD,a<br>
> ...

Strategy:

1. Read every line of the file.
2. Split each line into a list of values.
3. Extract the column names.
4. Use the column names and lists to create a dictionary.
5. Filter out the rounds you aren’t interested in.
6. Calculate the total and average values for the rounds you are interested in.
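
The strategy above can also be sketched with generator functions instead of generator comprehensions. This is a minimal illustration, not the notebook's own solution: the helper names (`split_lines`, `to_dicts`, `amounts_for_round`, `series_a_total`) are hypothetical, and the input is an in-memory copy of the sample rows shown above rather than the real `techcrunch.csv`.

```python
import io

# Sample rows copied from the dataset excerpt above.
# Assumption: a real run would pass open("techcrunch.csv") instead.
SAMPLE = """\
permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
digg,Digg,60,web,San Francisco,CA,1-Dec-06,8500000,USD,b
digg,Digg,60,web,San Francisco,CA,1-Oct-05,2800000,USD,a
facebook,Facebook,450,web,Palo Alto,CA,1-Sep-04,500000,USD,angel
facebook,Facebook,450,web,Palo Alto,CA,1-May-05,12700000,USD,a
photobucket,Photobucket,60,web,Palo Alto,CA,1-Mar-05,3000000,USD,a
"""


def split_lines(lines):
    """Steps 1-2: yield each line as a list of values."""
    for line in lines:
        yield line.rstrip().split(",")


def to_dicts(rows):
    """Steps 3-4: take the header row, then yield one dict per data row."""
    rows = iter(rows)
    cols = next(rows, None)
    if cols is None:  # empty input: nothing to yield
        return
    for row in rows:
        yield dict(zip(cols, row))


def amounts_for_round(records, wanted_round):
    """Step 5: yield the amount raised for records in the wanted round."""
    for record in records:
        if record["round"].upper() == wanted_round.upper():
            yield int(record["raisedAmt"])


def series_a_total(lines):
    """Step 6 (total only): chain the stages and sum the result."""
    return sum(amounts_for_round(to_dicts(split_lines(lines)), "a"))


total = series_a_total(io.StringIO(SAMPLE))
print(f"Total series A fundraising: ${total}")
```

Each stage pulls one item at a time from the stage before it, so the whole file never has to be held in memory.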

In [None]:
# Check that the sample file is available (`dir` is a Windows command; use `!ls` on macOS/Linux):
!dir techcrunch.csv

In [None]:
# Show the file contents (`type` is a Windows command; use `!cat` on macOS/Linux):
!type techcrunch.csv

In [None]:
# Read in the file:
file_name = "techcrunch.csv"
lines = (line for line in open(file_name))
lines

In [None]:
# Split each line into values:
list_line = (s.rstrip().split(",") for s in lines)
list_line

In [None]:
# Get just the header row:
cols = next(list_line)
cols

In [None]:
# Convert data into a dictionary:
company_dicts = (dict(zip(cols, data)) for data in list_line)
company_dicts

In [None]:
# Keep only the series A rounds and extract the amount raised:
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"].upper() == "A"
)
funding

In [None]:
# Calculate the total:
total_series_a = sum(funding)
print(f"Total series A fundraising: ${total_series_a}")
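
One property worth keeping in mind after the `sum(funding)` call above: a generator can only be consumed once. The sketch below illustrates this with a small stand-in generator; the name `amounts` is hypothetical and the values are the series A amounts from the sample rows shown earlier.

```python
# Stand-in for the `funding` generator, using the series A amounts from the sample rows.
amounts = (n for n in [2800000, 12700000, 3000000])

first_pass = sum(amounts)   # consumes every item
second_pass = sum(amounts)  # nothing left to consume, so the sum is 0

print(first_pass, second_pass)
```

Any calculation that needs more than one value from the same generator therefore has to collect them in a single pass, or rebuild the pipeline first.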

## Exercise:
 1. In the code above, when are the data lines actually read from the file?
 2. Modify the code above to calculate the average of the filtered rounds.