## US Birth Data Set

[Original article by FiveThirtyEight about Friday the 13th](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/)

The data set contains daily U.S. birth counts for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.

### Assignment

- Open the CSV and split it into lines
- Preview the first 10 entries

```python
# read() returns the whole file; splitlines() handles \r, \n and \r\n line endings.
with open('us_births.csv') as file:
    data = file.read().splitlines()
```

```python
data[:10]
```

```
['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']
```
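Splitting on `'\r'` only works when a file uses classic Mac OS line endings. In Python 3, `open()` in text mode also translates `\r` and `\r\n` to `\n` by default (universal newlines), so `readline()` returns only the header row. The short comparison below uses made-up two-row strings, not the real file, to show why `str.splitlines()` is the safer choice. `split_rows` and `SAMPLES` are hypothetical names used only for illustration.

```python
def split_rows(text):
    """Hypothetical helper: return the lines of text, treating CR, LF and CRLF alike."""
    return text.splitlines()


# The same two-row CSV written with three different line-ending conventions.
SAMPLES = {
    'mac': 'year,births\r1994,8096\r',
    'unix': 'year,births\n1994,8096\n',
    'windows': 'year,births\r\n1994,8096\r\n',
}

for name, text in SAMPLES.items():
    # split('\r') only separates rows in the 'mac' sample (and leaves a trailing '');
    # splitlines() yields the same two rows for all three samples.
    print(name, text.split('\r'), split_rows(text))
```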

### Assignment

- Create a function that takes in a CSV and converts the data into a list of lists
    - Each row will be a list
    - Make sure to convert the values to int
    - Return the final list of lists
- Preview the first 10 entries of the output

```python
def read_csv(csvfile):
    """Reads a CSV file and returns its header and the data rows as a list of lists of ints."""
    data = []
    with open(csvfile) as file:
        # splitlines() copes with \r, \n or \r\n line endings.
        lines = file.read().splitlines()
    header = lines[0].split(',')
    for line in lines[1:]:
        if not line:
            continue  # skip blank lines, which would make int('') fail
        data.append([int(item) for item in line.split(',')])
    return header, data
```

```python
header, data = read_csv('us_births.csv')
```

```python
data[:10]
```

```
[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
```

```python
header
```

```
['year', 'month', 'date_of_month', 'day_of_week', 'births']
```
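The `read_csv` function parses the file by hand. Python's standard `csv` module can handle the row splitting instead. The sketch below is an alternative, not the method the original notebook used. `read_csv_with_module` is a hypothetical name, and the sketch assumes every data field is an integer, as in `us_births.csv`.

```python
import csv
import os
import tempfile


def read_csv_with_module(path):
    """Hypothetical alternative to read_csv: return (header, rows) with int values."""
    # newline='' is what the csv docs recommend; the reader then treats
    # \r, \n and \r\n as row terminators on its own.
    with open(path, newline='') as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # Empty rows (blank lines) come back as [] and are skipped.
        rows = [[int(value) for value in row] for row in reader if row]
    return header, rows


if __name__ == '__main__':
    # Write a tiny file with classic-Mac '\r' line endings to exercise the reader.
    sample = 'year,month,date_of_month,day_of_week,births\r1994,1,1,6,8096\r1994,1,2,7,7772\r'
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
        tmp.write(sample)
    try:
        header, rows = read_csv_with_module(tmp.name)
        print(header)
        print(rows)
    finally:
        os.remove(tmp.name)
```

Letting `csv.reader` handle the line endings also means quoted fields would be parsed correctly if a future file contained them. The hand-written `split(',')` approach cannot do that.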

### Assignment

- Create a function that calculates the number of births each month
    - The function input should be the previous list of lists you created
    - Use a dictionary and increment the values associated with each month key
    - Return the final dictionary
- Preview the output

```python
def births_per_month(births_data):
    """Calculates the number of births for each month."""
    month_births = {}
    for row in births_data:
        month = row[1]
        # Start each month at 0, then add this row's births (index 4).
        month_births[month] = month_births.get(month, 0) + row[4]
    return month_births
```

```python
births_per_month(data)
```

```
{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
```
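To read the monthly dictionary more easily, the sketch below finds the months with the most and fewest births. It copies the totals from the output above and labels them with the standard `calendar.month_name` sequence. `month_totals`, `peak` and `low` are illustrative names. Month names assume the default (English) locale.

```python
import calendar

# Monthly totals as printed by births_per_month(data) above (1994-2003 combined).
month_totals = {
    1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314,
    5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858,
    9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860,
}

# max()/min() with key=dict.get return the key whose value is largest/smallest.
peak = max(month_totals, key=month_totals.get)
low = min(month_totals, key=month_totals.get)

# calendar.month_name is indexed 1..12 (index 0 is the empty string).
print(calendar.month_name[peak], month_totals[peak])  # August 3525858
print(calendar.month_name[low], month_totals[low])    # February 3018140
```

These are raw totals, so February's low figure partly reflects its having only 28 or 29 days.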

### Assignment

- Create a function that calculates the number of births each day of the week

```python
def births_per_day(births_data):
    """Calculates the number of births for each day of the week."""
    day_births = {}
    for row in births_data:
        day = row[3]
        # Start each weekday at 0, then add this row's births (index 4).
        day_births[day] = day_births.get(day, 0) + row[4]
    return day_births
```

```python
births_per_day(data)
```

```
{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
```
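The `day_of_week` codes are bare integers. The first row (1994-01-01, code 6) fell on a Saturday, and 1994-01-03 (code 1) was a Monday. That is consistent with a Monday = 1 ... Sunday = 7 convention, which matches ISO weekday numbering. The sketch below checks that assumption with `datetime` and then labels the totals printed above using `calendar.day_name`. `day_totals` and `labelled` are illustrative names.

```python
import calendar
import datetime

# Day-of-week totals as printed by births_per_day(data) above.
day_totals = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
              5: 6233657, 6: 4562111, 7: 4079723}

# Assumption check: the first data row (1994-01-01) has day_of_week 6, and
# isoweekday() (Monday=1 ... Sunday=7) gives the same value for that date.
assert datetime.date(1994, 1, 1).isoweekday() == 6

# calendar.day_name is indexed 0..6 starting at Monday, hence code - 1.
labelled = {calendar.day_name[code - 1]: total for code, total in day_totals.items()}

# Print weekdays from most to fewest births.
for name, total in sorted(labelled.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<9} {total:>9,}")
```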

### Assignment

- Create a general function that takes the data list of lists and a column index and returns a dictionary mapping each unique value in that column to the total number of births

```python
header
```

```
['year', 'month', 'date_of_month', 'day_of_week', 'births']
```

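```python
def calc_counts(births_data, column, births_column=4):
    """General function: sums births for each unique value in the given column index."""
    totals = {}
    for row in births_data:
        key = row[column]
        totals[key] = totals.get(key, 0) + row[births_column]
    return totals


def summary(births_data, columns):
    """Applies calc_counts to every column except 'births', keyed by column name."""
    births_column = columns.index('births')
    return {name: calc_counts(births_data, index, births_column)
            for index, name in enumerate(columns) if name != 'births'}
```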

```python
summary(data, header)
```

```
{'date_of_month': {1: 1276557,
  2: 1288739,
  3: 1304499,
  4: 1288154,
  5: 1299953,
  6: 1304474,
  7: 1310459,
  8: 1312297,
  9: 1303292,
  10: 1320764,
  11: 1314361,
  12: 1318437,
  13: 1277684,
  14: 1320153,
  15: 1319171,
  16: 1315192,
  17: 1324953,
  18: 1326855,
  19: 1318727,
  20: 1324821,
  21: 1322897,
  22: 1317381,
  23: 1293290,
  24: 1288083,
  25: 1272116,
  26: 1284796,
  27: 1294395,
  28: 1307685,
  29: 1223161,
  30: 1202095,
  31: 746696},
 'day_of_week': {1: 5789166,
  2: 6446196,
  3: 6322855,
  4: 6288429,
  5: 6233657,
  6: 4562111,
  7: 4079723},
 'month': {1: 3232517,
  2: 3018140,
  3: 3322069,
  4: 3185314,
  5: 3350907,
  6: 3296530,
  7: 3498783,
  8: 3525858,
  9: 3439698,
  10: 3378814,
  11: 3171647,
  12: 3301860},
 'year': {1994: 3952767,
  1995: 3899589,
  1996: 3891494,
  1997: 3880894,
  1998: 3941553,
  1999: 3959417,
  2000: 4058814,
  2001: 4025933,
  2002: 4021726,
  2003: 4089950}}
```
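Moving toward the Friday the 13th question in the FiveThirtyEight article requires filtering rows before summing them. The sketch below extends the same group-and-sum idea with `collections.Counter` and an optional row predicate. `births_by_column` and its `keep` parameter are hypothetical, not part of the original assignment. The demo runs on the ten rows previewed earlier, so its numbers cover only 1994-01-01 to 1994-01-10.

```python
from collections import Counter


def births_by_column(rows, column, births_column=4, keep=None):
    """Hypothetical helper: sum births per unique value of rows[column].

    `keep` is an optional predicate; rows for which it returns False are skipped.
    """
    totals = Counter()
    for row in rows:
        if keep is None or keep(row):
            # A Counter starts missing keys at 0, so no membership check is needed.
            totals[row[column]] += row[births_column]
    return totals


# The first ten rows of the data set, as previewed above.
ROWS = [
    [1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772], [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248], [1994, 1, 5, 3, 11053], [1994, 1, 6, 4, 11406],
    [1994, 1, 7, 5, 11251], [1994, 1, 8, 6, 8653], [1994, 1, 9, 7, 7910],
    [1994, 1, 10, 1, 10498],
]

# Births by day_of_week (index 3), restricted to weekend codes 6 and 7.
print(dict(births_by_column(ROWS, 3, keep=lambda row: row[3] in (6, 7))))
# {6: 16749, 7: 15682}

# With the full `data` list, the same call could isolate the 13th of each month:
# births_by_column(data, 3, keep=lambda row: row[2] == 13)
```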