### Pipelines - Pulling Data

We are going to use the `cars.csv` data file that is included with this notebook.

Let's start by writing a generator that will produce data from that file:

In [1]:
import csv

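def parse_data(f_name):
    # the with block closes the file once the generator is exhausted or closed
    with open(f_name) as f:
        # sniff the dialect from a sample, then rewind to the start of the file
        dialect = csv.Sniffer().sniff(f.read(2000))
        f.seek(0)
        next(f)  # skip header row
        yield from csv.reader(f, dialect=dialect)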

Notice how we are already using delegation: `yield from` delegates iteration to the csv reader iterator. We are therefore pulling data from the csv reader and yielding it out of the `parse_data` generator.
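
To make the delegation concrete, here is a minimal sketch comparing an explicit loop with `yield from`. It assumes a small in-memory sample (`SAMPLE`, a hypothetical stand-in, not the real `cars.csv`) with a `;` delimiter, and the names `parse_explicit` and `parse_delegating` are made up for illustration:

```python
import csv
import io

# Hypothetical in-memory stand-in for a CSV file (not the real cars.csv).
SAMPLE = "Car;MPG\nFord Torino;17.0\nAMC Rebel SST;16.0\n"

def parse_explicit(f):
    next(f)  # skip header row
    # pull each row from the csv reader and yield it ourselves
    for row in csv.reader(f, delimiter=';'):
        yield row

def parse_delegating(f):
    next(f)  # skip header row
    # hand iteration over to the csv reader
    yield from csv.reader(f, delimiter=';')

rows_explicit = list(parse_explicit(io.StringIO(SAMPLE)))
rows_delegating = list(parse_delegating(io.StringIO(SAMPLE)))
print(rows_explicit == rows_delegating)  # True
```

For plain iteration like this the two forms behave the same; `yield from` is simply the shorter way to write it.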

Let's look at the data:

In [2]:
import itertools

for row in itertools.islice(parse_data('cars.csv'), 5):
    print(row)

['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
['Buick Skylark 320', '15.0', '8', '350.0', '165.0', '3693.', '11.5', '70', 'US']
['Plymouth Satellite', '18.0', '8', '318.0', '150.0', '3436.', '11.0', '70', 'US']
['AMC Rebel SST', '16.0', '8', '304.0', '150.0', '3433.', '12.0', '70', 'US']
['Ford Torino', '17.0', '8', '302.0', '140.0', '3449.', '10.5', '70', 'US']


Now let's filter rows based on the car make, keeping only those whose first field contains a given string:

In [3]:
def filter_data(rows, contains):
    for row in rows:
        if contains in row[0]:
            yield row

We can now start building a (pull) pipeline by pulling data from the data source, through the filter:
```
caller <-- filter <-- data
```

In [4]:
data = parse_data('cars.csv')
filtered_data = filter_data(data, 'Chevrolet')

# pipeline: caller <-- filtered_data <-- data

for row in itertools.islice(filtered_data, 5):
    print(row)

['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
['Chevrolet Impala', '14.0', '8', '454.0', '220.0', '4354.', '9.0', '70', 'US']
['Chevrolet Chevelle Concours (sw)', '0', '8', '350.0', '165.0', '4142.', '11.5', '70', 'US']
['Chevrolet Monte Carlo', '15.0', '8', '400.0', '150.0', '3761.', '9.5', '70', 'US']
['Chevrolet Vega 2300', '28.0', '4', '140.0', '90.00', '2264.', '15.5', '71', 'US']


As you can see, using iteration we are pulling data all the way from the file, through the csv reader, through the filter and back to us (the caller).
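
One way to see the pull in action is to record each step as it happens. The sketch below uses a hypothetical in-memory `source` generator in place of the file and a `log` list introduced only for illustration; nothing runs until the caller asks for a row:

```python
log = []  # illustration only: records which stage handled which row

def source():
    # hypothetical stand-in for parse_data, yielding one-field rows
    names = ['Ford Torino', 'Chevrolet Impala', 'Buick Skylark 320', 'Chevrolet Vega 2300']
    for name in names:
        log.append(f'source: {name}')
        yield [name]

def keep_containing(rows, contains):
    # same logic as filter_data, plus logging
    for row in rows:
        if contains in row[0]:
            log.append(f'filter: {row[0]}')
            yield row

pipeline = keep_containing(source(), 'Chevrolet')
log_before = list(log)   # still empty: building the pipeline reads nothing

first = next(pipeline)   # the caller pulls a single row
print(first)             # ['Chevrolet Impala']
print(log)
# ['source: Ford Torino', 'source: Chevrolet Impala', 'filter: Chevrolet Impala']
```

Only as many source rows are read as are needed to produce the one row requested, which is why `itertools.islice` can stop the pipeline early without processing the whole file.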

But why stop there?
Let's filter further, keeping only the rows that also contain the word 'Carlo':

In [5]:
data = parse_data('cars.csv')
filter_1 = filter_data(data, 'Chevrolet')
filter_2 = filter_data(filter_1, 'Carlo')

# pipeline: caller <-- filter_2 <-- filter_1 <-- data

for row in itertools.islice(filter_2, 5):
    print(row)

['Chevrolet Monte Carlo', '15.0', '8', '400.0', '150.0', '3761.', '9.5', '70', 'US']
['Chevrolet Monte Carlo S', '15.0', '8', '350.0', '145.0', '4082.', '13.0', '73', 'US']
['Chevrolet Monte Carlo Landau', '15.5', '8', '350.0', '170.0', '4165.', '11.4', '77', 'US']
['Chevrolet Monte Carlo Landau', '19.2', '8', '305.0', '145.0', '3425.', '13.2', '78', 'US']


We can package all this up into a single delegator generator:

In [6]:
def output(f_name):
    data = parse_data(f_name)
    filter_1 = filter_data(data, 'Chevrolet')
    filter_2 = filter_data(filter_1, 'Carlo')
    yield from filter_2

And we can use our delegator generator this way:

In [7]:
results = output('cars.csv')
for row in results:
    print(row)

['Chevrolet Monte Carlo', '15.0', '8', '400.0', '150.0', '3761.', '9.5', '70', 'US']
['Chevrolet Monte Carlo S', '15.0', '8', '350.0', '145.0', '4082.', '13.0', '73', 'US']
['Chevrolet Monte Carlo Landau', '15.5', '8', '350.0', '170.0', '4165.', '11.4', '77', 'US']
['Chevrolet Monte Carlo Landau', '19.2', '8', '305.0', '145.0', '3425.', '13.2', '78', 'US']


We can actually make this a little more generic while we're at it:

In [8]:
def output(f_name, *filter_words):
    data = parse_data(f_name)
    for filter_word in filter_words:
        data = filter_data(data, filter_word)
    yield from data

In [9]:
results = output('cars.csv', 'Chevrolet')
for row in itertools.islice(results, 5):
    print(row)

['Chevrolet Chevelle Malibu', '18.0', '8', '307.0', '130.0', '3504.', '12.0', '70', 'US']
['Chevrolet Impala', '14.0', '8', '454.0', '220.0', '4354.', '9.0', '70', 'US']
['Chevrolet Chevelle Concours (sw)', '0', '8', '350.0', '165.0', '4142.', '11.5', '70', 'US']
['Chevrolet Monte Carlo', '15.0', '8', '400.0', '150.0', '3761.', '9.5', '70', 'US']
['Chevrolet Vega 2300', '28.0', '4', '140.0', '90.00', '2264.', '15.5', '71', 'US']


In [10]:
results = output('cars.csv', 'Chevrolet', 'Carlo', 'Landau')
for row in itertools.islice(results, 5):
    print(row)

['Chevrolet Monte Carlo Landau', '15.5', '8', '350.0', '170.0', '4165.', '11.4', '77', 'US']
['Chevrolet Monte Carlo Landau', '19.2', '8', '305.0', '145.0', '3425.', '13.2', '78', 'US']
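
The same chaining idea can be tried without the file. The following is a sketch only: `chain_filters` is a hypothetical variant of `output` that accepts any iterable of rows, and `sample` is made-up data rather than the contents of `cars.csv`:

```python
def filter_data(rows, contains):
    for row in rows:
        if contains in row[0]:
            yield row

def chain_filters(rows, *filter_words):
    # each pass wraps the previous iterable in one more filter generator
    for filter_word in filter_words:
        rows = filter_data(rows, filter_word)
    yield from rows

# made-up rows for illustration
sample = [
    ['Chevrolet Monte Carlo Landau'],
    ['Chevrolet Impala'],
    ['Ford Torino'],
    ['Chevrolet Monte Carlo'],
]

unfiltered = list(chain_filters(sample))
carlo = list(chain_filters(sample, 'Chevrolet', 'Carlo'))
landau = list(chain_filters(sample, 'Chevrolet', 'Carlo', 'Landau'))

print(carlo)   # [['Chevrolet Monte Carlo Landau'], ['Chevrolet Monte Carlo']]
print(landau)  # [['Chevrolet Monte Carlo Landau']]
```

With no filter words the rows pass through unchanged. Because each word is passed as an argument to `filter_data`, it is bound when that filter is created; writing the filter inline as a generator expression inside the loop would instead look up the loop variable lazily, so every stage would end up using the last word.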
