# Useful Hints

##### Reading and rewinding data from a File

A file object supports reading a specified amount of data at a time, and the "read head" can be repositioned using the `seek` method.

Let's take a look:

In [1]:
with open('cars.csv') as f:
    print('---', f.read(100))  # read head now at character 100
    print('---', f.read(100))  # read head now at character 200
    f.seek(0)                  # rewind: read head back at 0
    print('---', f.read(100))  # read head at character 100 again

--- Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu
--- ;18.0;8;307.0;130.0;3504.;12.0;70;US
Buick Skylark 320;15.0;8;350.0;165.0;3693.;11.5;70;US
Plymouth 
--- Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu
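
The same pattern can be tried without any file on disk. This is a minimal sketch, assuming an in-memory `io.StringIO` buffer stands in for the open file; the sample text is made up, and `tell` is used only to report where the read head currently is.

```python
import io

# In-memory stand-in for an open text file (made-up sample data)
f = io.StringIO("Car;MPG\nChevrolet;18.0\nBuick;15.0\n")

first = f.read(7)           # reads 'Car;MPG', read head now at 7
pos_after_read = f.tell()   # position of the read head
second = f.read(1)          # reads the newline, read head now at 8
f.seek(0)                   # rewind to the start
again = f.read(7)           # the same seven characters as the first read

print(repr(first), pos_after_read, repr(second), repr(again))
```

For real text files, the number returned by `tell` is an opaque value that is only meaningful when passed back to `seek`, so `seek(0)` is the safest way to rewind.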


##### Sniffing the CSV dialect

The dialect of a CSV file refers to the specifics of how data is laid out in the file. The separator can differ: some files use a comma, some a semicolon, some a tab, and so on.

Also, as we have seen before, fields are sometimes delimited using quotes, double quotes, or some entirely different character.

When we have to deal with files that may be encoded using different dialects, it can require quite a bit of work to determine what those specifics are. This is where the `Sniffer` class from the `csv` module can be useful. Given a sample of the CSV file, it analyzes the sample and makes a best guess at the dialect that was used. We can then pass that dialect to the `csv.reader` function.

Let's see how to use it with one of our files: `personal_info.csv`:

In [2]:
import csv

with open('personal_info.csv') as f:
    sample = f.read(2000)
    dialect = csv.Sniffer().sniff(sample)
print(vars(dialect))

{'__module__': 'csv', '_name': 'sniffed', 'lineterminator': '\r\n', 'quoting': 0, '__doc__': None, 'doublequote': False, 'delimiter': ',', 'quotechar': '"', 'skipinitialspace': False}
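
To get a feel for what the sniffer picks up, it can also be run on small in-memory samples. The following is a sketch only: both samples are made up, and the `delimiters` argument of `sniff` is used here to restrict the candidate separators, which may help when a sample is short or ambiguous.

```python
import csv

# Two made-up samples holding similar data in different dialects
comma_sample = 'name,language\r\n"Smith, Jane",Lao\r\nDoe,Yiddish\r\n'
semicolon_sample = 'name;language\r\nSmith;Lao\r\nDoe;Yiddish\r\n'

sniffer = csv.Sniffer()

# The quoted field containing a comma helps the sniffer pin down both
# the quote character and the delimiter
comma_dialect = sniffer.sniff(comma_sample)

# Only these three characters are considered as possible delimiters
semicolon_dialect = sniffer.sniff(semicolon_sample, delimiters=';,\t')

print(repr(comma_dialect.delimiter), repr(semicolon_dialect.delimiter))
```

The result is still a guess, so it is worth inspecting the sniffed attributes when a file parses in an unexpected way.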


We can now use this dialect to open the csv reader:

In [4]:
from itertools import islice

with open('personal_info.csv') as f:
    reader = csv.reader(f, dialect)
    for row in islice(reader, 5):
        print(row)

['ssn', 'first_name', 'last_name', 'gender', 'language']
['100-53-9824', 'Sebastiano', 'Tester', 'Male', 'Icelandic']
['101-71-4702', 'Cayla', 'MacDonagh', 'Female', 'Lao']
['101-84-0356', 'Nomi', 'Lipprose', 'Female', 'Yiddish']
['104-22-0928', 'Justinian', 'Kunzelmann', 'Male', 'Dhivehi']
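
The two hints can be combined so that a file is opened only once: read a sample, rewind with `seek(0)`, then hand the same file object to the reader. The sketch below assumes a hypothetical helper named `open_reader` with a made-up `sample_size` parameter, and it falls back to the default `csv.excel` dialect when sniffing fails; that fallback is an assumption, not something the exercise requires.

```python
import csv
import io


def open_reader(f, sample_size=2000):
    """Hypothetical helper: sniff the dialect of `f`, rewind, return a reader."""
    sample = f.read(sample_size)
    f.seek(0)  # rewind so the reader starts at the header row
    try:
        dialect = csv.Sniffer().sniff(sample)
    except csv.Error:
        # Assumption: use the default 'excel' dialect if no guess can be made
        dialect = csv.excel
    return csv.reader(f, dialect)


# In-memory stand-in for a semicolon-separated file (made-up rows)
f = io.StringIO('ssn;first_name\n100-53-9824;Sebastiano\n101-71-4702;Cayla\n')
rows = list(open_reader(f))
print(rows)
```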


# Goal 1

For this goal, you are given a number of CSV files, each of which has the field names in its first row.

Your goal is to create a context manager that you can use to produce the data from each file as named tuples, with field names corresponding to the header row's field names.

You should use the `csv` module's `reader` function to help with parsing the data.

Your context manager should be generic in the sense that it should just need the file name, no other configuration or hardcoded functionality is required. You do not need to worry about data types for this goal - just return every field as a string.

In addition, your context manager should produce lazy iterators.

Implement this using a class that implements the context manager protocol.

#######################

We don't want to hardcode in the csv dialect for each file. Instead, we want Python to try and figure it out, and go off that prediction. We can do this with `csv.Sniffer()`.

In [5]:
import csv

def get_dialect(f_name):
    with open(f_name) as f:
        return csv.Sniffer().sniff(f.read(1000))

example_dialect = get_dialect('cars.csv')
vars(example_dialect)

mappingproxy({'__module__': 'csv',
              '_name': 'sniffed',
              'lineterminator': '\r\n',
              'quoting': 0,
              '__doc__': None,
              'doublequote': False,
              'delimiter': ';',
              'quotechar': '"',
              'skipinitialspace': False})

In [23]:
from collections import namedtuple
import csv


class FileParser:
    def __init__(self, f_name):
        self.f_name = f_name

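    def __enter__(self):
        # Sniff first, so a failed sniff cannot leave a file handle open
        dialect = get_dialect(self.f_name)
        self._f = open(self.f_name, 'r')
        self._reader = csv.reader(self._f, dialect)
        # The first row holds the field names; lower-case them for the named tuple
        headers = map(lambda s: s.lower(), next(self._reader))
        self._nt = namedtuple('Data', headers)
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self._f.close()
        return False  # do not suppress exceptions raised inside the with block

    def __iter__(self):
        return self

    def __next__(self):
        # Once the file is closed there is nothing left to produce
        if self._f.closed:
            raise StopIteration
        else:
            # Rows are read one at a time, so iteration stays lazy
            return self._nt(*next(self._reader))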


In [24]:
from itertools import islice

with FileParser('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [25]:
with FileParser('personal_info.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')
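
The header rows of both sample files happen to be valid Python identifiers once lower-cased. That is not guaranteed for arbitrary files, and `namedtuple` rejects names that contain spaces, start with a digit, or clash with a keyword. One possible way to stay generic, shown here as a sketch with a made-up header row, is the `rename=True` option, which swaps each invalid name for an underscore followed by its position.

```python
from collections import namedtuple

# Made-up header row: a space, a leading digit, and a Python keyword
headers = ['ssn', 'first name', '2nd_language', 'class']

try:
    namedtuple('Data', headers)
    rejected = False
except ValueError as ex:
    rejected = True
    print('rejected:', ex)

# rename=True replaces each invalid name with '_' plus its index
Data = namedtuple('Data', headers, rename=True)
print(Data._fields)
```

The trade-off is that positional names such as `_1` carry no meaning, so cleaning the header text before building the named tuple may be preferable in practice.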


# Goal 2

The goal is to reproduce the work you did in Goal 1, but using a generator function and the `contextlib` `contextmanager` decorator.

We have to split the iterator and the context manager protocol into two separate parts for this approach to work. `parsed_data` is the generator function that the decorator converts into a context manager, and the value it yields is an iterator.
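
To see how the pieces map onto the protocol, here is a minimal sketch with no file involved. The names `managed` and `events` are hypothetical and exist only for illustration: the code before `yield` plays the role of `__enter__`, the yielded value is what `as` binds, and the `finally` clause plays the role of `__exit__`.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def managed(name):
    events.append(f'enter {name}')      # runs when the with block starts
    try:
        yield iter(range(3))            # the iterator bound by `as`
    finally:
        events.append(f'exit {name}')   # runs when the with block ends


with managed('demo') as numbers:
    events.append(f'first item: {next(numbers)}')

print(events)
```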

In [51]:
from collections import namedtuple
import csv
from contextlib import contextmanager

@contextmanager
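def parsed_data(f_name):
    f = open(f_name)
    try:
        # Sniff the dialect from a sample, then rewind to the start of the file
        dialect = csv.Sniffer().sniff(f.read(1000))
        f.seek(0)

        reader = csv.reader(f, dialect)
        headers = map(lambda s: s.lower(), next(reader))
        nt = namedtuple('Data', headers)
        # Yield a lazy generator of named tuples; it is only usable inside the with block
        yield (nt(*row) for row in reader)
    finally:
        # Runs when the with block exits, even if an exception was raised
        f.close()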

In [49]:
from itertools import islice

with parsed_data('cars.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


In [50]:
with parsed_data('personal_info.csv') as data:
    for row in islice(data, 5):
        print(row)

Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')
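
Because the yielded generator is lazy, it only works while the file is open. The sketch below illustrates this with a condensed, hypothetical variant named `parsed_rows`; it assumes a fixed `;` delimiter instead of sniffing and writes a throwaway temporary file with made-up rows, purely to keep the example short.

```python
import csv
import os
import tempfile
from collections import namedtuple
from contextlib import contextmanager


@contextmanager
def parsed_rows(f_name):
    # Condensed variant of the generator approach; the delimiter is fixed
    # here (an assumption) rather than sniffed
    f = open(f_name, newline='')
    try:
        reader = csv.reader(f, delimiter=';')
        nt = namedtuple('Data', [h.lower() for h in next(reader)])
        yield (nt(*row) for row in reader)
    finally:
        f.close()


# Throwaway file with made-up rows
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='') as tmp:
    tmp.write('Car;MPG\nA;18.0\nB;15.0\n')

try:
    with parsed_rows(tmp.name) as data:
        first = next(data)      # rows are produced one at a time
    try:
        next(data)              # the file has been closed by now
        leftover_error = None
    except ValueError as ex:
        leftover_error = ex     # reading from a closed file fails
finally:
    os.remove(tmp.name)

print(first, '|', type(leftover_error).__name__)
```

Unlike the class from Goal 1, which raises `StopIteration` once the file is closed, this generator surfaces the closed-file error, so any remaining rows have to be consumed inside the `with` block.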
