#### Challenge 1   
  
- Open up a new IPython notebook
- Download a few MTA turnstile data files
- Open a file, read it with the csv reader, and build a Python dict
  with one key for each (C/A, UNIT, SCP, STATION) tuple. These are the
  first four columns. The value for each key should be a list of
  lists, where each inner list holds the remaining columns of one
  row. For example, one key-value pair should look like


{    ('A002','R051','02-00-00','LEXINGTON AVE'):
[
['NQR456', 'BMT', '01/03/2015', '03:00:00', 'REGULAR', '0004945474', '0001675324'],
['NQR456', 'BMT', '01/03/2015', '07:00:00', 'REGULAR', '0004945478', '0001675333'],
['NQR456', 'BMT', '01/03/2015', '11:00:00', 'REGULAR', '0004945515', '0001675364'],
...
]
}
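
The prompt asks for the csv reader, while the solution cell that follows uses pandas. As a minimal sketch of the csv-based route, the code below groups rows by their first four columns. The helper name `read_turnstile_rows` and the two inline sample rows are illustrative assumptions (the EXITS value in the second row is a placeholder); a real run would pass an opened turnstile file instead of the `io.StringIO` stand-in.

```python
import csv
import io
from collections import defaultdict

# Illustrative stand-in for a downloaded file. With real data, use something
# like: open('turnstile_180922.txt', newline='')
sample = io.StringIO(
    "C/A,UNIT,SCP,STATION,LINENAME,DIVISION,DATE,TIME,DESC,ENTRIES,EXITS\n"
    "A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,00:00:00,REGULAR,6759219,2291425\n"
    "A002,R051,02-00-00,59 ST,NQR456W,BMT,09/15/2018,04:00:00,REGULAR,6759234,2291430\n"
)


def read_turnstile_rows(handle):
    """Group rows by (C/A, UNIT, SCP, STATION); hypothetical helper name."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    rows_by_turnstile = defaultdict(list)
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        row = [field.strip() for field in row]  # drop stray padding
        key = tuple(row[:4])                    # first four columns
        rows_by_turnstile[key].append(row[4:])  # rest of the columns
    return rows_by_turnstile


turnstile_rows = read_turnstile_rows(sample)
print(turnstile_rows[('A002', 'R051', '02-00-00', '59 ST')][0])
```

Note that `csv.reader` yields every field as a string, so the counts still need an `int()` conversion before any arithmetic.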

In [125]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import collections
%matplotlib inline

In [126]:
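df = pd.read_csv('turnstile_180922.txt')
# Give the last column a clean name (the raw header may carry trailing whitespace).
df.rename(index=str, columns={df.columns[-1]: "EXITS"}, inplace=True)
print(df.columns)

# Combine DATE ('09/15/2018') and TIME ('00:00:00') into a single timestamp.
dt = df.DATE + ' ' + df.TIME
df['DATETIME'] = pd.to_datetime(dt, format='%m/%d/%Y %H:%M:%S')
# One readable identifier per turnstile.
df['KEY'] = df.STATION + ' ' + df['C/A'] + ' ' + df.UNIT + ' ' + df.SCP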

Index(['C/A', 'UNIT', 'SCP', 'STATION', 'LINENAME', 'DIVISION', 'DATE', 'TIME',
       'DESC', 'ENTRIES', 'EXITS'],
      dtype='object')


In [127]:
# Map each turnstile key (C/A, UNIT, SCP, STATION) to a list of its rows.
mydict = collections.defaultdict(list)

# itertuples() puts the index at position 0, so the columns start at 1:
# 1 C/A, 2 UNIT, 3 SCP, 4 STATION, 5 LINENAME, 6 DIVISION, 7 DATE, 8 TIME,
# 9 DESC, 10 ENTRIES, 11 EXITS, 12 DATETIME, 13 KEY
for row in df.itertuples():
    key = (row[1], row[2], row[3], row[4])
    # LINENAME, DIVISION, DESC, DATETIME, ENTRIES, EXITS
    value = [row[5], row[6], row[9], row[12], row[10], row[11]]
    mydict[key].append(value)

mydict[('A002', 'R051', '02-00-00', '59 ST')][0]

['NQR456W',
 'BMT',
 'REGULAR',
 Timestamp('2018-09-15 00:00:00'),
 6759219,
 2291425]

#### Challenge 2

- Let's turn this into a time series.

For each key (basically the control area, unit, device address and
station of a specific turnstile), keep a list again, but let each item
in the list hold just the point in time and the count of entries.

This basically means keeping only the date, time, and entries fields
in each list. You can convert the date and time into datetime objects;
`datetime` is a Python class that represents a point in time. You can
combine the date and time fields into a string and use the
[dateutil](https://labix.org/python-dateutil) module to convert it
into a datetime object. For an example check
[this StackOverflow question](http://stackoverflow.com/questions/23385003/attributeerror-when-using-import-dateutil-and-dateutil-parser-parse-but-no).

Your new dict should look something like

{    ('A002','R051','02-00-00','LEXINGTON AVE'):
[
[datetime.datetime(2013, 3, 2, 3, 0), 3788],
[datetime.datetime(2013, 3, 2, 7, 0), 2585],
[datetime.datetime(2013, 3, 2, 12, 0), 10653],
[datetime.datetime(2013, 3, 2, 17, 0), 11016],
[datetime.datetime(2013, 3, 2, 23, 0), 10666],
[datetime.datetime(2013, 3, 3, 3, 0), 10814],
[datetime.datetime(2013, 3, 3, 7, 0), 10229],
...
],
....
}
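
As a rough sketch of that conversion, the code below starts from string rows in the Challenge 1 shape and keeps only a timestamp and an integer entry count. It swaps in the standard library's `datetime.strptime` for dateutil so that it has no third-party dependency; the helper name `to_time_series` and the assumed `MM/DD/YYYY HH:MM:SS` layout are assumptions taken from the sample rows in Challenge 1.

```python
from collections import defaultdict
from datetime import datetime

# Input in the Challenge 1 shape; every field is still a string here.
challenge1 = {
    ('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [
        ['NQR456', 'BMT', '01/03/2015', '03:00:00', 'REGULAR', '0004945474', '0001675324'],
        ['NQR456', 'BMT', '01/03/2015', '07:00:00', 'REGULAR', '0004945478', '0001675333'],
    ],
}


def to_time_series(rows_by_turnstile):
    """Reduce each row to [datetime, entries]; hypothetical helper name."""
    series = defaultdict(list)
    for key, rows in rows_by_turnstile.items():
        for row in rows:
            # Row layout: LINENAME, DIVISION, DATE, TIME, DESC, ENTRIES, EXITS
            stamp = datetime.strptime(row[2] + ' ' + row[3], '%m/%d/%Y %H:%M:%S')
            series[key].append([stamp, int(row[5])])
    return series


time_series = to_time_series(challenge1)
print(time_series[('A002', 'R051', '02-00-00', 'LEXINGTON AVE')])
```

`dateutil.parser.parse` would accept the same combined string without an explicit format, at the cost of guessing the layout for each value.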

In [124]:
challenge2 = collections.defaultdict(list)

for key, entries in mydict.items():
    for entry in entries:
        # entry[3] is the DATETIME timestamp, entry[4] the ENTRIES count
        challenge2[key].append([entry[3], entry[4]])
print(challenge2[('A002', 'R051', '02-00-00', '59 ST')][0:5])

[[Timestamp('2018-09-15 00:00:00'), 6759219], [Timestamp('2018-09-15 04:00:00'), 6759234], [Timestamp('2018-09-15 08:00:00'), 6759251], [Timestamp('2018-09-15 12:00:00'), 6759330], [Timestamp('2018-09-15 16:00:00'), 6759538]]


#### Challenge 3

- These counts are for every n hours. (What is n?) We want total daily
  entries.

Now build a dict with the same keys, but with a single value per day:
the total number of passengers that entered through this turnstile on
that day.
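
One possible approach, sketched under stated assumptions: the ENTRIES values in the Challenge 2 output grow slowly from reading to reading, which suggests a cumulative counter, so a daily total is the sum of the differences between consecutive readings rather than the sum of the readings themselves. The sketch credits each difference to the day on which its interval starts and skips negative differences (treated here as counter resets); both choices are assumptions, as is the helper name `daily_entries`. The first five readings come from the Challenge 2 output; the last three are made-up placeholders so that the sample spans two days.

```python
from collections import defaultdict
from datetime import datetime

key = ('A002', 'R051', '02-00-00', '59 ST')
series = {
    key: [
        [datetime(2018, 9, 15, 0, 0), 6759219],
        [datetime(2018, 9, 15, 4, 0), 6759234],
        [datetime(2018, 9, 15, 8, 0), 6759251],
        [datetime(2018, 9, 15, 12, 0), 6759330],
        [datetime(2018, 9, 15, 16, 0), 6759538],
        [datetime(2018, 9, 15, 20, 0), 6759800],  # made-up placeholder
        [datetime(2018, 9, 16, 0, 0), 6759900],   # made-up placeholder
        [datetime(2018, 9, 16, 4, 0), 6759910],   # made-up placeholder
    ],
}

# "What is n?": the spacing between two consecutive readings.
interval = series[key][1][0] - series[key][0][0]


def daily_entries(time_series):
    """Sum counter differences per calendar day; hypothetical helper name."""
    daily = {}
    for turnstile, readings in time_series.items():
        readings = sorted(readings, key=lambda reading: reading[0])
        totals = defaultdict(int)
        # Pair each reading with the next one and credit the difference
        # to the date on which the interval starts.
        for (start, before), (_, after) in zip(readings, readings[1:]):
            delta = after - before
            if delta >= 0:  # assumption: a negative delta is a counter reset
                totals[start.date()] += delta
        daily[turnstile] = sorted(totals.items())
    return daily


daily = daily_entries(series)
print(interval)
print(daily[key])
```

With real data, the final day of a file is only partially covered, so its total is best treated as incomplete.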
