### Project 5 - Goal 1

First, let's take a look at each file and make sure the first row contains the field names:

In [1]:
f_names = 'cars.csv', 'personal_info.csv'

for f_name in f_names:
    with open(f_name) as f:
        print(next(f), end='')
        print(next(f), end='')
    print('\n-----------------')

Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin
Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US

-----------------
ssn,first_name,last_name,gender,language
100-53-9824,Sebastiano,Tester,Male,Icelandic

-----------------


One thing I notice here is that the field names in `cars.csv` contain uppercase letters, while those in `personal_info.csv` do not. I'm going to make them consistent by lowercasing the field names when I create the named tuple.
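
As a minimal sketch of that lowercasing step, here is the header and first data row of `cars.csv` turned into a named tuple. The `Car` type name and the manual `split(';')` are assumptions made only for illustration; they are not how the parser itself reads the file.

```python
from collections import namedtuple

# header and first data row, copied from the cars.csv output above
header = 'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin'
first_row = 'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US'

# lowercase every field name before building the named tuple
field_names = [name.lower() for name in header.split(';')]
Car = namedtuple('Car', field_names)  # 'Car' is an illustrative type name

car = Car(*first_row.split(';'))
print(car.mpg, car.origin)
```

Note that every value is still a string at this point; no type conversion is attempted.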

The second thing I notice is that the delimiter is not the same for both files - one uses a `;`, the other uses a `,`. 

Fortunately, the `csv` module has a `Sniffer` class that we can use to try to deduce the delimiter by sampling some of the data. Alternatively, we could specify the delimiter as an argument to our context manager's `__init__` method - but it would be nicer if we did not have to do that.
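
For comparison, the explicit alternative might look roughly like this: the caller passes the delimiter straight to `csv.reader`. The sample is held in an in-memory `io.StringIO` (an assumption made so the sketch runs without the actual files).

```python
import csv
import io

# two lines copied from cars.csv, held in memory so the sketch is self-contained
sample = (
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
)

# the caller has to know, and pass in, the right delimiter for each file
reader = csv.reader(io.StringIO(sample), delimiter=';')
rows = list(reader)
print(rows[0][:3])
```

This works, but it pushes knowledge of each file's format onto the caller, which is exactly what sniffing avoids.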

Let's first see how that `Sniffer` class works:

In [2]:
import csv
from itertools import islice

with open('cars.csv') as f:
    dialect = csv.Sniffer().sniff(f.read(1000))
print(dialect.delimiter)

;


And we can do the same with our other file:

In [3]:
with open('personal_info.csv') as f:
    dialect = csv.Sniffer().sniff(f.read(1000))
print(dialect.delimiter)

,


So, we'll use this to set the dialect for our csv parser.

Let's create a small utility function to handle this for us:

In [4]:
def get_dialect(f_name):
    with open(f_name) as f:
        return csv.Sniffer().sniff(f.read(1000))
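
The sniffed dialect can then be handed to `csv.reader` as its second argument. Here is a small in-memory sketch of that hand-off, using the two `personal_info.csv` lines shown earlier. Restricting the candidates with `delimiters=',;'` is an assumption of this sketch, made because the sample is so short; `get_dialect` above does not do this.

```python
import csv
import io

# two lines copied from personal_info.csv, held in memory for the sketch
sample = (
    'ssn,first_name,last_name,gender,language\n'
    '100-53-9824,Sebastiano,Tester,Male,Icelandic\n'
)

# assumption: only ',' and ';' are considered as possible delimiters
dialect = csv.Sniffer().sniff(sample, delimiters=',;')

# pass the sniffed dialect to the reader instead of a hard-coded delimiter
reader = csv.reader(io.StringIO(sample), dialect)
header = next(reader)
first_row = next(reader)
print(dialect.delimiter, header)
```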

We want to create a context manager that, given just the file name:
1. reads the header row to get the field names
2. creates the appropriate named tuple
3. uses the `csv.reader` to create an iterator over the data rows of the file
4. returns that iterator from the `__enter__` method
5. closes the file upon `__exit__`

We're actually going to create a class that will be **both** the context manager and the iterator - so we'll implement both of these protocols.

In [5]:
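from collections import namedtuple

class FileParser:
    def __init__(self, f_name):
        self.f_name = f_name

    def __enter__(self):
        # sniff the dialect first, so a failure here does not leave a file open
        dialect = get_dialect(self.f_name)
        # newline='' is what the csv module recommends for files given to csv.reader
        self._f = open(self.f_name, 'r', newline='')
        self._reader = csv.reader(self._f, dialect)
        # the first row holds the field names - lowercase them for consistency
        headers = [header.lower() for header in next(self._reader)]
        self._nt = namedtuple('Data', headers)
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self._f.close()
        # do not suppress any exception raised inside the with block
        return False

    def __iter__(self):
        return self

    def __next__(self):
        if self._f.closed:
            # file has been closed - so we can't iterate anymore!
            raise StopIteration
        else:
            return self._nt(*next(self._reader))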
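
As an aside, the same behavior can be approximated with a generator function and `contextlib.contextmanager` instead of a class. This is an alternative technique, not the approach taken in this notebook: the `parsed_file` name is hypothetical, and sniffing through the same file handle (followed by `seek(0)`) is a choice made for this sketch. A throwaway temporary file stands in for `cars.csv`.

```python
import csv
import os
import tempfile
from collections import namedtuple
from contextlib import contextmanager

@contextmanager
def parsed_file(f_name):
    # hypothetical helper: a generator-based take on the same idea
    with open(f_name, newline='') as f:
        # sniff and parse through the same file handle, rewinding in between
        dialect = csv.Sniffer().sniff(f.read(1000))
        f.seek(0)
        reader = csv.reader(f, dialect)
        headers = [header.lower() for header in next(reader)]
        nt = namedtuple('Data', headers)
        # lazily wrap each remaining row in the named tuple
        yield (nt(*row) for row in reader)
    # leaving the inner with block has closed the file at this point

# throwaway file holding the two cars.csv lines shown earlier
sample = (
    'Car;MPG;Cylinders;Displacement;Horsepower;Weight;Acceleration;Model;Origin\n'
    'Chevrolet Chevelle Malibu;18.0;8;307.0;130.0;3504.;12.0;70;US\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)

try:
    with parsed_file(tmp.name) as data:
        rows = list(data)
finally:
    os.remove(tmp.name)

print(rows[0].car, rows[0].mpg)
```

The class-based version keeps the file handle and the iterator state together on one object, so it can check `closed` and stop cleanly once the `with` block has exited. The generator version is shorter, but iterating over `data` after the block ends would fail on the closed file.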

In [6]:
from itertools import islice

with FileParser('cars.csv') as data:
    for row in islice(data, 10):
        print(row)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
...

And of course it should work equally well with the other file:

In [7]:
with FileParser('personal_info.csv') as data:
    for row in islice(data, 10):
        print(row)

Data(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
Data(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
Data(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
Data(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
Data(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')
Data(ssn='105-27-5541', first_name='Federico', last_name='Aggett', gender='Male', language='Chinese')
Data(ssn='105-85-7486', first_name='Angelina', last_name='McAvey', gender='Female', language='Punjabi')
Data(ssn='105-91-5022', first_name='Moselle', last_name='Apfel', gender='Female', language='Latvian')
Data(ssn='105-91-7777', first_name='Audi', last_name='Roach', gender='Female', language='Estonian')
Data(ssn='106-35-1938', first_name='Mackenzie', last_name='Nussey', gen