### Project Solution - Goal 1

First, let's look at what's in the file itself. Just a few records should be enough. (You could also "cheat" and open the file in Excel, but that only works because the file is relatively small.)

In [1]:
file_name = 'nyc_parking_tickets_extract.csv'

In [2]:
with open(file_name) as f:
    for _ in range(10):
        print(next(f))

Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description

4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION

4006462396,22834JK,NY,COM,9/30/2016,5,VAN,CHEVR,BUS LANE VIOLATION

4007117810,21791MG,NY,COM,4/10/2017,5,VAN,DODGE,BUS LANE VIOLATION

4006265037,FZX9232,NY,PAS,8/23/2016,5,SUBN,FORD,BUS LANE VIOLATION

4006535600,N203399C,NY,OMT,10/19/2016,5,SUBN,FORD,BUS LANE VIOLATION

4007156700,92163MG,NY,COM,4/13/2017,5,VAN,FRUEH,BUS LANE VIOLATION

4006687989,MIQ600,SC,PAS,11/21/2016,5,VN,HONDA,BUS LANE VIOLATION

4006943052,2AE3984,MD,PAS,2/1/2017,5,SW,LINCO,BUS LANE VIOLATION

4007306795,HLG4926,NY,PAS,5/30/2017,5,SUBN,TOYOT,BUS LANE VIOLATION



Notice that each line ends with a `\n` line terminator. That's why `print`, which adds its own newline, shows a blank line between records. We'll need to strip those terminators out.

Secondly, the first row of the file contains the column headers, so we'll need to skip that line when we want to look at just the data.

We also shouldn't assume that the data is entirely clean. There are probably missing values, and we'll need to deal with them accordingly.

We'll also need to determine an appropriate data type for every column in the data set.
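One possible way to peek at the first few data rows while handling both observations (header and `\n`) is the sketch below. It works on any iterable of lines, including an open file. `itertools.islice` stops quietly if there are fewer lines than requested, whereas the `next(f)` loop above would raise `StopIteration` on a very short file. The helper name `peek_rows` is hypothetical and is not part of the original solution.

```python
from io import StringIO
from itertools import islice


def peek_rows(lines, n=5, *, skip_header=True):
    """Return up to n lines with their trailing newline removed.

    `lines` can be an open file or any other iterable of strings
    (hypothetical helper, for illustration only).
    """
    it = iter(lines)
    if skip_header:
        next(it, None)  # consume the header row, if there is one
    # islice simply stops early if fewer than n lines remain
    return [line.rstrip('\n') for line in islice(it, n)]


# In-memory stand-in for the CSV file
sample = StringIO(
    "Summons Number,Plate ID\n"
    "4006478550,VAD7274\n"
    "4006462396,22834JK\n"
)
print(peek_rows(sample, n=10))
# ['4006478550,VAD7274', '4006462396,22834JK']
```

With the real file, this would be called inside `with open(file_name) as f:` as `peek_rows(f)`.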

#### Column Definitions and Named Tuple

Let's start with the column definitions, data types and named tuple.

In [3]:
with open(file_name) as f:
    column_headers = next(f).strip('\n').split(',')
    sample_data = next(f).strip('\n').split(',')

In [4]:
column_headers

['Summons Number',
 'Plate ID',
 'Registration State',
 'Plate Type',
 'Issue Date',
 'Violation Code',
 'Vehicle Body Type',
 'Vehicle Make',
 'Violation Description']

In [5]:
sample_data

['4006478550',
 'VAD7274',
 'VA',
 'PAS',
 '10/5/2016',
 '5',
 '4D',
 'BMW',
 'BUS LANE VIOLATION']

In [6]:
list(zip(column_headers, sample_data))

[('Summons Number', '4006478550'),
 ('Plate ID', 'VAD7274'),
 ('Registration State', 'VA'),
 ('Plate Type', 'PAS'),
 ('Issue Date', '10/5/2016'),
 ('Violation Code', '5'),
 ('Vehicle Body Type', '4D'),
 ('Vehicle Make', 'BMW'),
 ('Violation Description', 'BUS LANE VIOLATION')]

Let's start by creating a list that contains the names of the columns, lowercased and with underscores instead of spaces:

In [7]:
column_names = [header.replace(' ', '_').lower() 
                for header in column_headers]

In [8]:
column_names

['summons_number',
 'plate_id',
 'registration_state',
 'plate_type',
 'issue_date',
 'violation_code',
 'vehicle_body_type',
 'vehicle_make',
 'violation_description']
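The conversion matters because `namedtuple` requires every field name to be a valid Python identifier, so the raw headers (which contain spaces) are rejected. The `rename=True` option would instead substitute positional names such as `_0`, which are much less readable. A quick check, using two of the headers as an illustration:

```python
from collections import namedtuple

raw_headers = ['Summons Number', 'Plate ID']

# Raw headers contain spaces, so namedtuple refuses them
try:
    namedtuple('RawTicket', raw_headers)
except ValueError as exc:
    print('rejected:', exc)

# rename=True replaces each invalid name with _<index>
Renamed = namedtuple('Renamed', raw_headers, rename=True)
print(Renamed._fields)  # ('_0', '_1')

# The snake_case names used in this notebook are valid identifiers
Clean = namedtuple('Clean', [h.replace(' ', '_').lower() for h in raw_headers])
print(Clean._fields)  # ('summons_number', 'plate_id')
```

Note that `replace(' ', '_')` only handles spaces. A header containing other punctuation (a hyphen, for example) would still need extra cleanup.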

Next we need to determine the data types for each of these fields:

    0. summons_number: looks like integers
    1. plate_id: string
    2. registration_state: string
    3. plate_type: string
    4. issue_date: looks like valid dates
    5. violation_code: looks like integers
    6. vehicle_body_type: string
    7. vehicle_make: string
    8. violation_description: string


We'll create utility functions to cast the data (which will always be strings) into the appropriate data type for each field.

We have to be careful, though. Our assumptions about the data types may be wrong, and the data may have integrity issues.

As a first pass, we'll keep track of the rows where a value was missing, or was not an integer or date where we expected one.

Let's create our named tuple data structure:

In [9]:
from collections import namedtuple

Ticket = namedtuple('Ticket', column_names)
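Here is a short illustration of the `namedtuple` features this project relies on, using the raw (still string) values from the sample row. `_make`, `_asdict`, `_replace` and `_fields` are part of the documented `namedtuple` API. The field list is repeated so the block runs on its own, and the names `raw`, `t` and `t2` are only for illustration.

```python
from collections import namedtuple

# Same field names as column_names above, repeated so the block is self-contained
fields = ['summons_number', 'plate_id', 'registration_state', 'plate_type',
          'issue_date', 'violation_code', 'vehicle_body_type',
          'vehicle_make', 'violation_description']
Ticket = namedtuple('Ticket', fields)

# Unparsed string values from the first data row
raw = ['4006478550', 'VAD7274', 'VA', 'PAS', '10/5/2016', '5', '4D', 'BMW',
       'BUS LANE VIOLATION']

t = Ticket._make(raw)               # build from any iterable (same as Ticket(*raw))
print(t.plate_id)                   # VAD7274
print(t[2])                         # VA (positional access still works)
print(t._asdict()['vehicle_make'])  # BMW

# _replace returns a new Ticket; t itself is unchanged
t2 = t._replace(violation_code=5)
print(repr(t.violation_code), repr(t2.violation_code))  # '5' 5
```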

#### Reading and Cleaning a Data Row

In [10]:
with open(file_name) as f:
    next(f)
    raw_data_row = next(f)

In [11]:
raw_data_row

'4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION\n'

You'll notice that to read the data in the file, we have to skip the first row in the file. Also, I have to use a `with` statement and the file name every time. To make life easier, I'm going to write a small utility function that will yield just the data rows from the file:

In [12]:
def read_data():
    with open(file_name) as f:
        next(f)
        yield from f

We can test it out easily:

In [13]:
raw_data = read_data()
for _ in range(5):
    print(next(raw_data))

4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION

4006462396,22834JK,NY,COM,9/30/2016,5,VAN,CHEVR,BUS LANE VIOLATION

4007117810,21791MG,NY,COM,4/10/2017,5,VAN,DODGE,BUS LANE VIOLATION

4006265037,FZX9232,NY,PAS,8/23/2016,5,SUBN,FORD,BUS LANE VIOLATION

4006535600,N203399C,NY,OMT,10/19/2016,5,SUBN,FORD,BUS LANE VIOLATION
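Two properties of `read_data` are worth noting. It depends on the global `file_name`, and because it is a generator, the `with` block (and therefore the file) stays open until the generator is exhausted or explicitly closed. The following sketch shows a variant that takes the path as a parameter. The name `read_data_from` is hypothetical, and the example writes a tiny temporary file so it can run on its own.

```python
import os
import tempfile
from itertools import islice


def read_data_from(path):
    """Yield the data lines of a CSV file, skipping the header row."""
    with open(path) as f:
        next(f)       # skip the header
        yield from f  # the file stays open until this generator finishes


# Write a small stand-in file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('Summons Number,Plate ID\n4006478550,VAD7274\n4006462396,22834JK\n')
    path = tmp.name

rows = read_data_from(path)
first = list(islice(rows, 1))
# close() raises GeneratorExit inside the generator, so the with block closes the file
rows.close()
print(first)  # ['4006478550,VAD7274\n']

os.remove(path)
```

Calling `close()` (or simply exhausting the generator) is what releases the file handle when you only read part of the data.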



Let's write a function that will try to convert a value to an integer, or return some default if the value is missing or not an integer:

In [14]:
def parse_int(value, *, default=None):
    try:
        return int(value)
    except ValueError:
        return default
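Note that `int` already tolerates surrounding whitespace, so `' 42 '` parses fine. An empty string raises `ValueError`, while `None` raises `TypeError`. Since every value read from the file is a string, catching `ValueError` is enough here. The sketch below shows a slightly more defensive variant that also accepts `None`. The name `parse_int_safe` is hypothetical.

```python
def parse_int_safe(value, *, default=None):
    """Like parse_int, but also returns default for None (a TypeError)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


print(parse_int_safe(' 42 '))            # 42 (int() strips whitespace itself)
print(parse_int_safe(''))                # None
print(parse_int_safe(None, default=-1))  # -1
print(parse_int_safe('4.5'))             # None (int() does not parse decimal strings)
```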

We need to do the same thing with dates.
It looks like the dates are provided in M/D/YYYY format, so we'll use that to parse the date. 

We'll use the `strptime` method of the `datetime` class, from the standard-library `datetime` module.

In [15]:
from datetime import datetime
def parse_date(value, *, default=None):
    date_format='%m/%d/%Y'
    try:
        return datetime.strptime(value, date_format).date()
    except ValueError:
        return default

Let's make sure those functions work as expected:

In [16]:
parse_int('123')

123

In [17]:
parse_int('hello', default='N/A')

'N/A'

In [18]:
parse_date('3/28/2018')

datetime.date(2018, 3, 28)

In [19]:
parse_date('31/31/2000', default='N/A')

'N/A'

OK, so these seem to work as expected.
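Two details of `strptime` are useful here. According to the Python documentation, the leading zero is optional for `%m` and `%d` when parsing, which is why `10/5/2016` works. `strptime` also validates the calendar, so an impossible day such as February 30 is rejected. The following check repeats the function so the block is self-contained.

```python
from datetime import datetime


def parse_date(value, *, default=None):
    # Same logic as the notebook's parse_date
    try:
        return datetime.strptime(value, '%m/%d/%Y').date()
    except ValueError:
        return default


print(parse_date('10/5/2016'))   # 2016-10-05 (no zero padding needed)
print(parse_date('02/29/2016'))  # 2016-02-29 (2016 is a leap year)
print(parse_date('2/30/2017'))   # None (February 30 does not exist)
print(parse_date('2016-10-05'))  # None (wrong format)
```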

We also need to write a string parser. It should remove any leading and trailing whitespace and treat an empty value as missing.

In [20]:
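def parse_string(value, *, default=None):
    # str() and strip() never raise ValueError, so no try/except is needed;
    # None is handled explicitly so it doesn't become the string 'None'
    if value is None:
        return default
    cleaned = str(value).strip()
    if not cleaned:
        # empty (or all-whitespace) string
        return default
    return cleaned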

Let's test this one as well:

In [21]:
parse_string('   hello   ')

'hello'

In [22]:
parse_string('  ', default='N/A')

'N/A'

Now that we have our utility functions, we can write our row parser.

To make life easier, I'm going to create a tuple that contains the functions that should be called to clean up each field. The tuple positions will correspond to the fields in the data row.

I'm also going to specify what the default value should be when there is a problem parsing a field. To do this, I will use `functools.partial`, because I still need a single-argument callable for each element of the column parser tuple. (I could just as easily use a lambda instead; the last entry in the tuple does exactly that.)
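For this purpose, `partial` and `lambda` produce interchangeable callables. One practical difference is that a `partial` object records what it wraps, which can help when debugging. The sketch below repeats a simplified `parse_string` so it runs on its own.

```python
from functools import partial


def parse_string(value, *, default=None):
    # Simplified copy of the notebook's string parser
    cleaned = value.strip()
    return cleaned if cleaned else default


parsers = (partial(parse_string, default=''),      # like the 'state' entry
           lambda x: parse_string(x, default=''))  # like the 'description' entry
state_parser, desc_parser = parsers

# Both callables behave the same on blank and non-blank input
print(repr(state_parser('  ')), repr(desc_parser('  ')))  # '' ''
print(state_parser(' NY '), desc_parser(' NY '))          # NY NY

# Only the partial exposes the function and keywords it wraps
print(state_parser.func.__name__, state_parser.keywords)  # parse_string {'default': ''}

# Keyword arguments given at call time override the stored ones
print(state_parser('  ', default='N/A'))                  # N/A
```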

In [23]:
from functools import partial

In [24]:
column_names

['summons_number',
 'plate_id',
 'registration_state',
 'plate_type',
 'issue_date',
 'violation_code',
 'vehicle_body_type',
 'vehicle_make',
 'violation_description']

In [25]:
column_parsers = (parse_int,  # summons_number, default is None
                  parse_string,  # plate_id, default is None
                  partial(parse_string, default=''),  # state
                  partial(parse_string, default=''),  # plate_type
                  parse_date,  # issue_date, default is None
                  parse_int,  # violation_code
                  partial(parse_string, default=''),  # body type
                  parse_string,  # make, default is None
                  lambda x: parse_string(x, default='')  # description
                 )

To parse each field in a row, I'll first separate the data fields into a list of values, then I'll apply the functions in `column_parsers` to the data in that list. 

To do that, I'm going to zip up the parser functions and the data, and use a comprehension to apply each function to its corresponding data field:

In [26]:
def parse_row(row):
    fields = row.strip('\n').split(',')
    parsed_data = (func(field) 
                   for func, field in zip(column_parsers, fields))
    return parsed_data

This is not quite what we want yet, but let's test it out and make sure it does what we expect:

In [27]:
rows = read_data()
for _ in range(5):
    row = next(rows)
    parsed_data = parse_row(row)
    print(list(parsed_data))

[4006478550, 'VAD7274', 'VA', 'PAS', datetime.date(2016, 10, 5), 5, '4D', 'BMW', 'BUS LANE VIOLATION']
[4006462396, '22834JK', 'NY', 'COM', datetime.date(2016, 9, 30), 5, 'VAN', 'CHEVR', 'BUS LANE VIOLATION']
[4007117810, '21791MG', 'NY', 'COM', datetime.date(2017, 4, 10), 5, 'VAN', 'DODGE', 'BUS LANE VIOLATION']
[4006265037, 'FZX9232', 'NY', 'PAS', datetime.date(2016, 8, 23), 5, 'SUBN', 'FORD', 'BUS LANE VIOLATION']
[4006535600, 'N203399C', 'NY', 'OMT', datetime.date(2016, 10, 19), 5, 'SUBN', 'FORD', 'BUS LANE VIOLATION']
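Splitting on `','` works for the rows shown, but it would break any field that contains a comma inside quotes, which is legal in CSV. If that were a concern, the standard-library `csv` module handles the quoting. The following sketch uses a made-up quoted row that is not taken from this file.

```python
import csv
from io import StringIO

# Hypothetical row whose description contains a quoted comma
line = '4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,"BUS LANE, VIOLATION"\n'

naive = line.strip('\n').split(',')
print(len(naive))   # 10 (the quoted field was split in two)

parsed = next(csv.reader(StringIO(line)))
print(len(parsed))  # 9
print(parsed[-1])   # BUS LANE, VIOLATION
```

With a real file, `csv.reader` can take the open file object directly. The `csv` documentation recommends opening such files with `newline=''`.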


Let's finish up the row parser.

First I want it to return a named tuple instead of a plain iterator.

Also, given the way I have set up the parsers, I only want to keep rows where none of the fields is `None`. That's why some fields default to an empty string instead of `None`: those are the fields I still want to retain, even when they are empty.

To do this efficiently, I'm going to use `all`.

Let's just quickly recall how `all` works:

In [28]:
all([10, 'hello'])

True

In [29]:
all([None, 'hello'])

False

But we have to watch out. Since we allow empty strings in our valid data, we cannot simply use `all`:

In [30]:
all([10, ''])

False

That's because empty strings are falsy (and so is `0`, which could be a legitimate integer value). So we need to tweak this slightly.

I'll use a generator expression for this:

In [31]:
l = [10, '', 0]
all(item is not None for item in l)

True

In [32]:
l = [10, '', 0, None]
all(item is not None for item in l)

False
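Because `all` consumes the generator lazily, it stops at the first `None` it finds, and the remaining items are never checked. The following snippet shows this. The `checked` list exists only to record which items `all` actually examined.

```python
values = [10, '', None, 0, 'never reached']
checked = []


def not_none(item):
    checked.append(item)  # record every item that all() actually looks at
    return item is not None


print(all(not_none(item) for item in values))  # False
print(checked)  # [10, '', None]
```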

So, now let's finish up our row parser. We'll return a Ticket named tuple if none of the parsed fields are `None`, and we'll allow the user to specify a default otherwise.

In [33]:
def parse_row(row, *, default=None):
    fields = row.strip('\n').split(',')
    # note that I'm using a list comprehension here, 
    # since we'll need to iterate through the entire parsed fields
    # twice - one time to check if nothing is None
    # and another time to create the named tuple
    parsed_data = [func(field) 
                   for func, field in zip(column_parsers, fields)]
    if all(item is not None for item in parsed_data):
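        # debug output: echoes the parsed values on one line; this is the
        # source of the extra, non-Ticket lines in the outputs below
        print(*parsed_data)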
        return Ticket(*parsed_data)
    else:
        return default

Now let's test it out again:

In [34]:
rows = read_data()
for _ in range(5):
    row = next(rows)
    parsed_data = parse_row(row)
    print(parsed_data)

4006478550 VAD7274 VA PAS 2016-10-05 5 4D BMW BUS LANE VIOLATION
Ticket(summons_number=4006478550, plate_id='VAD7274', registration_state='VA', plate_type='PAS', issue_date=datetime.date(2016, 10, 5), violation_code=5, vehicle_body_type='4D', vehicle_make='BMW', violation_description='BUS LANE VIOLATION')
4006462396 22834JK NY COM 2016-09-30 5 VAN CHEVR BUS LANE VIOLATION
Ticket(summons_number=4006462396, plate_id='22834JK', registration_state='NY', plate_type='COM', issue_date=datetime.date(2016, 9, 30), violation_code=5, vehicle_body_type='VAN', vehicle_make='CHEVR', violation_description='BUS LANE VIOLATION')
4007117810 21791MG NY COM 2017-04-10 5 VAN DODGE BUS LANE VIOLATION
Ticket(summons_number=4007117810, plate_id='21791MG', registration_state='NY', plate_type='COM', issue_date=datetime.date(2017, 4, 10), violation_code=5, vehicle_body_type='VAN', vehicle_make='DODGE', violation_description='BUS LANE VIOLATION')
4006265037 FZX9232 NY PAS 2016-08-23 5 SUBN FORD BUS LANE VIOLATION
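One gap in `parse_row` is that `zip` stops at the shorter of its inputs. A truncated line produces fewer parsed values, and `Ticket(*parsed_data)` then raises `TypeError`. A line with extra fields, on the other hand, is silently truncated. The following sketch adds an explicit length check. The helper `split_fields` is hypothetical. On Python 3.10+, `zip(..., strict=True)` is another option, but it raises `ValueError` rather than returning a default.

```python
EXPECTED_FIELDS = 9  # number of columns in the header row


def split_fields(row, *, expected=EXPECTED_FIELDS):
    """Split a raw CSV line, returning None if the field count is wrong."""
    fields = row.strip('\n').split(',')
    if len(fields) != expected:
        return None  # caller can treat this like any other unparseable row
    return fields


good = '4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION\n'
short = '4006478550,VAD7274,VA\n'
print(split_fields(good)[:2])  # ['4006478550', 'VAD7274']
print(split_fields(short))     # None
```

Inside `parse_row`, you could call `split_fields(row)` first and return `default` whenever it gives back `None`.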

#### Checking What Rows are Missing Required Values

Let's quickly run through the file and see what data issues we might have. Some of our assumptions about the various data types may have been wrong.

In [35]:
for row in read_data():
    parsed_row = parse_row(row)
    if parsed_row is None:
        print(list(zip(column_names, row.strip('\n').split(','))), end='\n\n')

4006478550 VAD7274 VA PAS 2016-10-05 5 4D BMW BUS LANE VIOLATION
4006462396 22834JK NY COM 2016-09-30 5 VAN CHEVR BUS LANE VIOLATION
4007117810 21791MG NY COM 2017-04-10 5 VAN DODGE BUS LANE VIOLATION
4006265037 FZX9232 NY PAS 2016-08-23 5 SUBN FORD BUS LANE VIOLATION
4006535600 N203399C NY OMT 2016-10-19 5 SUBN FORD BUS LANE VIOLATION
4007156700 92163MG NY COM 2017-04-13 5 VAN FRUEH BUS LANE VIOLATION
4006687989 MIQ600 SC PAS 2016-11-21 5 VN HONDA BUS LANE VIOLATION
4006943052 2AE3984 MD PAS 2017-02-01 5 SW LINCO BUS LANE VIOLATION
4007306795 HLG4926 NY PAS 2017-05-30 5 SUBN TOYOT BUS LANE VIOLATION
4007124590 T715907C NY OMT 2017-04-03 5 SUBN TOYOT BUS LANE VIOLATION
5096061966 HRC9475 NY PAS 2017-04-18 7 SUBN CADIL FAILURE TO STOP AT RED LIGHT
5094070400 DYP8042 NY PAS 2016-10-26 7 SUBN CHEVR FAILURE TO STOP AT RED LIGHT
5094906770 G30ESY NJ PAS 2017-01-01 7 WAGO CHRYS FAILURE TO STOP AT RED LIGHT
5093319363 GGT8868 NY PAS 2016-09-06 7 SUBN CHRYS FAILURE TO STOP AT RED LIGHT
...

8464532088 HJR5750 NY PAS 2017-02-03 14 SUBN KIA 14-No Standing
8488825948 HAU1278 NY PAS 2017-03-31 14 SUBN LEXUS 14-No Standing
8559362496 HLW7798 NY PAS 2017-06-13 14 4DSD LEXUS 14-No Standing
8478353860 VUE95C NJ PAS 2016-12-06 14 4DSD LEXUS 14-No Standing
8518911631 ZD53LE NJ PAS 2017-05-17 14 SUBN LEXUS 14-No Standing
1416091415 75213MH NY COM 2016-11-28 14 VAN ME/BE 
8482374059 F96FBF NJ PAS 2017-04-23 14 4DSD ME/BE 14-No Standing
8544960947 HMX8950 NY PAS 2017-05-27 14 4DSD ME/BE 14-No Standing
1418609274 8P82H NY OMT 2016-12-21 14 TAXI NISSA 
1417565895 HEB8184 NY PAS 2017-02-11 14 SDN NISSA 
8155550278 W56GSE NJ PAS 2016-10-22 14 4DSD NISSA 14-No Standing
7433187960 2330830 ME PAS 2017-03-06 14 TRLR NS/OT 14-No Standing
7645052715 FRZ3573 NY PAS 2017-01-07 14 SUBN NS/OT 14-No Standing
7922559173 XCYU94 NJ PAS 2016-10-24 14 VAN NS/OT 14-No Standing
7767415582 XW915N NJ PAS 2017-03-29 14 DELV NS/OT 14-No Standing
8524350581 HGR5953 NY PAS 2017-02-27 14 4DSD OLDSM 14-No Standing

8236569895 EPY9505 NY PAS 2017-03-22 21 4DSD BUICK 21-No Parking (street clean)
8502707383 JGL6885 PA PAS 2017-01-24 21 SUBN BUICK 21-No Parking (street clean)
8511711946 42283JZ NY COM 2017-04-27 21 VAN CHEVR 21-No Parking (street clean)
1418142980 483TFM TN PAS 2016-12-19 21 VAN CHEVR 
1420029915 53468JZ 99 COM 2017-04-27 21 VAN CHEVR 
8539666996 85121ME NY COM 2017-05-22 21 VAN CHEVR 21-No Parking (street clean)
1413709760 DGN6881 NY PAS 2016-09-02 21 SDN CHEVR 
8473311693 DTG6286 NY PAS 2016-11-23 21 SUBN CHEVR 21-No Parking (street clean)
7658191050 GEL4496 NY PAS 2016-07-22 21 SUBN CHEVR 21-No Parking (street clean)
8478626049 GRU5176 NY PAS 2017-05-15 21 4DSD CHEVR 21-No Parking (street clean)
7369979570 GTC5499 NY PAS 2016-07-20 21 4DSD CHEVR 21-No Parking (street clean)
1400876217 GVB9839 NY PAS 2016-09-19 21 P-U CHEVR 
1422284335 HHJ5747 NY PAS 2017-06-03 21 SUBN CHEVR 
8556051480 M322307 NJ PAS 2017-04-26 21 PICK CHEVR 21-No Parking (street clean)
...

8566704320 L62FLC NJ PAS 2017-05-17 24 SUBN CHEVR 24-No Parking (exc auth veh)
8536961156 X92FSM NJ PAS 2017-03-28 24 SUBN HYUND 24-No Parking (exc auth veh)
1408122029 DSN6323 NY PAS 2016-07-04 24 SUBN JEEP 
8266568753 F12GRE NJ PAS 2016-11-11 24 SUBN JEEP 24-No Parking (exc auth veh)
8446524806 GYG8911 NY PAS 2016-12-18 24 SUBN JEEP 24-No Parking (exc auth veh)
8316519355 HDJ7785 PA PAS 2016-07-14 24 2DSD NISSA 24-No Parking (exc auth veh)
8533035640 T687600C NY OMT 2017-05-31 24 4DSD NISSA 24-No Parking (exc auth veh)
8526732663 ASV2478 NY PAS 2017-06-22 24 4DSD TOYOT 24-No Parking (exc auth veh)
1410027223 ENS9253 NY PAS 2016-10-31 24 SDN VOLKS 
8479824712 58910MG NY COM 2017-04-21 26 DELV ISUZU 26-No Stnd (for-hire veh only)
1416527722 AHG9422 NY PAS 2017-01-05 27 SUBN TOYOT 
8512270482 Y26CRJ NJ PAS 2017-06-03 31 SUBN BSA 31-No Stand (Com. Mtr. Zone)
8498706890 THANEDAR NY OMT 2017-03-22 31 SUBN CADIL 31-No Stand (Com. Mtr. Zone)
8328546632 HHP7446 NY PAS 2016-12-02 31 2DSD CHEVR

4622727298 GWF1975 NY PAS 2016-08-04 36 4DSD LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4622272659 HEA8485 NY OMS 2016-07-14 36 4DSD LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4631639052 HEK6758 NY PAS 2017-03-09 36 SUBN LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4629635506 HHA6371 NY PAS 2017-01-18 36 SUBN LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4633257055 HJN6853 NY PAS 2017-05-02 36 SUBN LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4629197515 N78FLU NJ PAS 2017-01-09 36 WAGO LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4633226356 PTJ969 NY PAS 2017-05-01 36 SUBN LEXUS PHTO SCHOOL ZN SPEED VIOLATION
4626735174 HGR5863 NY PAS 2016-11-14 36 4DSD LINCO PHTO SCHOOL ZN SPEED VIOLATION
4622247501 T635722C NY OMT 2016-07-13 36 4DSD LINCO PHTO SCHOOL ZN SPEED VIOLATION
4626383063 T701050C NY OMT 2016-11-04 36 4DSD LINCO PHTO SCHOOL ZN SPEED VIOLATION
4631263763 92814 NY MED 2017-03-03 36 4DSD ME/BE PHTO SCHOOL ZN SPEED VIOLATION
4626546109 39999MB NY COM 2016-11-07 36 VAN ME/BE PHTO SCHOOL ZN SPEED VIOLATION
...

8478953528 JNB5044 PA PAS 2016-12-17 38 SUBN HYUND 38-Failure to Display Muni Rec
8471611235 T13AZX NJ PAS 2016-10-10 38 4DSD HYUND 38-Failure to Display Muni Rec
7015856128 GMU5413 NY PAS 2016-09-06 38 4DSD INFIN 38-Failure to Display Muni Rec
8532505170 HAZ7926 NY PAS 2017-04-15 38 4DSD INFIN 38-Failure to Display Muni Rec
8156045178 HLF3741 NY PAS 2016-12-21 38 4DSD INFIN 38-Failure to Display Muni Rec
8477154788 AP656S NJ PAS 2017-02-07 38 REFG INTER 38-Failure to Display Muni Rec
7992738184 63162MD NY COM 2016-10-24 38 VAN ISUZU 38-Failure to Display Muni Rec
8488913308 GYV3087 NY PAS 2017-03-13 38 SUBN JEEP 38-Failure to Display Muni Rec
8303532224 HAB4875 NY PAS 2017-02-01 38 SUBN JEEP 38-Failure to Display Muni Rec
7172330286 MHM933 SC PAS 2017-01-18 38 SUBN KIA 38-Failure to Display Muni Rec
8279018967 V41GCR NJ PAS 2016-08-13 38 SUBN KIA 38-Failure to Display Muni Rec
8530104183 EMK4874 NY PAS 2017-03-21 38 4DSD LEXUS 38-Failure to Display Muni Rec
...

1387522700 71017JM NY COM 2016-08-25 46 VAN CHEVR 
8008621527 HFP7192 NY PAS 2016-10-25 46 VAN CHEVR 46A-Double Parking (Non-COM)
1413687738 25570MC NY COM 2016-09-06 46 DELV DODGE 
8513917291 33147MK NY COM 2017-06-27 46 VAN DODGE 46A-Double Parking (Non-COM)
8505564479 99828MC NY COM 2017-03-07 46 VAN DODGE 46B-Double Parking (Com-100Ft)
8513554716 XWD5801 VA PAS 2017-03-03 46 SUBN DODGE 46A-Double Parking (Non-COM)
1420122800 AB80443 CT PAS 2017-03-13 46 SDN FIAT 
8472012591 48768JZ NY COM 2016-11-09 46 VAN FORD 46B-Double Parking (Com-100Ft)
1401381406 EUM7025 NY PAS 2016-11-14 46 SUBN FORD 
1417599716 12203MG NY COM 2016-12-01 46 VAN FRUEH 
8019072664 12211MG NY COM 2016-11-11 46 VAN FRUEH 46B-Double Parking (Com-100Ft)
8429513220 12246MG NY COM 2016-10-04 46 VAN FRUEH 46B-Double Parking (Com-100Ft)
8470521172 28672MH NY COM 2016-11-21 46 VAN FRUEH 46B-Double Parking (Com-100Ft)
1414783784 30182JF NY COM 2016-11-25 46 VAN FRUEH 
...

8397514636 FYW3850 NY PAS 2016-10-22 74 SUBN FORD 74A-Improperly Displayed Plate
8528912840 HNY5206 NY PAS 2017-04-21 74 SUBN GMC 74A-Improperly Displayed Plate
8511501113 FBP6836 NY PAS 2017-02-16 74 SUBN HONDA 74-Missing Display Plate
8276551890 HJR4081 NY PAS 2017-01-07 74 SUBN HONDA 74A-Improperly Displayed Plate
8214550531 HJY2207 NY PAS 2017-03-09 74 4DSD HYUND 74-Missing Display Plate
8487469012 27527ME NY COM 2017-01-27 74 DELV INTER 74-Missing Display Plate
8034780236 HCD3158 NY PAS 2016-09-02 74 SUBN KIA 74A-Improperly Displayed Plate
8545052455 HMK1149 NY PAS 2017-06-07 74 4DSD ME/BE 74-Missing Display Plate
1411263467 BLANKPLATE 99 999 2017-02-13 74 SDN NISSA 
8540001550 HLA4803 NY PAS 2017-03-23 74 4DSD SAAB 74-Missing Display Plate
8556155431 HFB9919 NY PAS 2017-05-26 75 4DSD DODGE 75-No Match-Plate/Reg. Sticker
8394016790 31690BB NY OMR 2016-09-06 77 BUS AM/T 77-Parked Bus (exc desig area)
1419472306 56090BA NY OMR 2017-02-20 77 BUS INTER 
...

OK, so the data is mostly clean. It looks like a few rows have no description.
Technically, there's a lot more validation and cleaning we should do. For example, the states are not always proper state abbreviations (some records use `99`). But this is good enough for now.
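To put numbers on "mostly clean", one option is to count, per column, how often the parsed value came back missing. That means `None`, or `''` for the columns that default to an empty string. The sketch below runs over a few made-up parsed rows. In the notebook you would feed it lists such as `[func(field) for func, field in zip(column_parsers, fields)]`. The helper name `count_missing` is hypothetical.

```python
from collections import Counter


def count_missing(parsed_rows, column_names):
    """Count None or empty-string values per column across parsed rows."""
    counts = Counter()
    for values in parsed_rows:
        for name, value in zip(column_names, values):
            if value is None or value == '':
                counts[name] += 1
    return counts


# Illustrative rows only, using a subset of the columns
names = ['summons_number', 'registration_state', 'violation_description']
rows = [
    [4006478550, 'VA', 'BUS LANE VIOLATION'],
    [1416091415, 'NY', ''],  # empty description
    [None, '', ''],          # unparseable summons number, no state or description
]
print(count_missing(rows, names).most_common())
# [('violation_description', 2), ('summons_number', 1), ('registration_state', 1)]
```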

#### Creating an Iterator for the Data

Finally, let's create an iterator that makes it easy to loop over the cleaned-up, structured data in the file, skipping the `None` rows:

In [36]:
def parsed_data():
    for row in read_data():
        parsed = parse_row(row)
        if parsed:
            yield parsed

Let's test it out by iterating a few times:

In [37]:
parsed_rows = parsed_data()
for _ in range(5):
    print(next(parsed_rows))

4006478550 VAD7274 VA PAS 2016-10-05 5 4D BMW BUS LANE VIOLATION
Ticket(summons_number=4006478550, plate_id='VAD7274', registration_state='VA', plate_type='PAS', issue_date=datetime.date(2016, 10, 5), violation_code=5, vehicle_body_type='4D', vehicle_make='BMW', violation_description='BUS LANE VIOLATION')
4006462396 22834JK NY COM 2016-09-30 5 VAN CHEVR BUS LANE VIOLATION
Ticket(summons_number=4006462396, plate_id='22834JK', registration_state='NY', plate_type='COM', issue_date=datetime.date(2016, 9, 30), violation_code=5, vehicle_body_type='VAN', vehicle_make='CHEVR', violation_description='BUS LANE VIOLATION')
4007117810 21791MG NY COM 2017-04-10 5 VAN DODGE BUS LANE VIOLATION
Ticket(summons_number=4007117810, plate_id='21791MG', registration_state='NY', plate_type='COM', issue_date=datetime.date(2017, 4, 10), violation_code=5, vehicle_body_type='VAN', vehicle_make='DODGE', violation_description='BUS LANE VIOLATION')
4006265037 FZX9232 NY PAS 2016-08-23 5 SUBN FORD BUS LANE VIOLATION
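Because `parsed_data()` is a generator, it composes with the rest of Python's iterator tools. `itertools.islice` takes the first few tickets without a manual `next` loop, and a comprehension can filter the rows one at a time. The sketch below uses a simplified stand-in. The three-field `Ticket` and the generator name `fake_parsed_data` are assumptions made so the block runs without the CSV file.

```python
from collections import namedtuple
from itertools import islice

# Simplified stand-ins for the notebook's Ticket and parsed_data()
Ticket = namedtuple('Ticket', ['summons_number', 'registration_state', 'vehicle_make'])


def fake_parsed_data():
    yield Ticket(4006478550, 'VA', 'BMW')
    yield Ticket(4006462396, 'NY', 'CHEVR')
    yield Ticket(4007117810, 'NY', 'DODGE')


# First two tickets, with no manual next() loop
for ticket in islice(fake_parsed_data(), 2):
    print(ticket.summons_number)

# Filtering while iterating one row at a time; only the result list is kept in memory
ny_makes = [t.vehicle_make for t in fake_parsed_data() if t.registration_state == 'NY']
print(ny_makes)  # ['CHEVR', 'DODGE']
```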