## Birth Dates In The United States

The raw data behind the story **Some People Are Too Superstitious To Have A Baby On Friday The 13th**, which you can read [here](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).

We'll be working with the data set from the Centers for Disease Control and Prevention's National Center for Health Statistics. The data set has the following structure:
- `year` - Year
- `month` - Month
- `date_of_month` - Day number of the month
- `day_of_week` - Day of week, where 1 is Monday and 7 is Sunday
- `births` - Number of births
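
To make the column layout concrete, here is a minimal sketch that turns one CSV line into a dictionary keyed by the column names listed above. The sample line is the first data row of the file; the names `COLUMNS` and `parse_row` are illustrative and not part of the original notebook.

```python
# Column names in the order they appear in the CSV header.
COLUMNS = ["year", "month", "date_of_month", "day_of_week", "births"]

def parse_row(line):
    """Split one comma-separated line and convert every field to int."""
    values = [int(field) for field in line.split(",")]
    return dict(zip(COLUMNS, values))

# First data row of the file: January 1, 1994, a Saturday (day_of_week 6).
record = parse_row("1994,1,1,6,8096")
print(record)
```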

## Converting Data Into A List Of Lists

In [11]:
#!/usr/bin/python3
# open the file, read its contents, and split them into a list of lines
with open("US_births_1994-2003_CDC_NCHS.csv", "r") as f:
    data = f.read()
rows = data.split("\n")
rows[0:10]

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']

In [12]:
# read a csv file and return its rows as lists of ints, with the header removed
def read_csv(file_name):
    with open(file_name, "r") as f:
        data = f.read()
    rows = data.split("\n")
    string_list = rows[1:]

    final_list = []
    for item in string_list:
        # skip blank lines, such as the one a trailing newline leaves behind
        if not item:
            continue
        int_fields = []
        string_fields = item.split(",")
        for string in string_fields:
            int_fields.append(int(string))
        final_list.append(int_fields)

    return final_list

cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
cdc_list[0:10]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
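
As an alternative to splitting strings by hand, the same conversion could be sketched with the standard library's `csv` module. This is only an illustration: `read_csv_rows` is a hypothetical name, and the example reads from an in-memory `io.StringIO` holding the header and the first three rows of the file rather than from the file itself.

```python
import csv
import io

def read_csv_rows(file_obj):
    """Return the data rows of a CSV file object as lists of ints, skipping the header."""
    reader = csv.reader(file_obj)
    next(reader)  # discard the header row
    # csv.reader yields an empty list for a blank line, so `if row` drops those
    return [[int(field) for field in row] for row in reader if row]

# In-memory stand-in for the real file: the header plus its first three data rows.
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
    "1994,1,3,1,10142\n"
)
sample_rows = read_csv_rows(sample)
print(sample_rows)
```

With the real file, the same function would presumably be called inside `with open("US_births_1994-2003_CDC_NCHS.csv", newline="") as f:`.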

## Calculating Number Of Births Each Month

In [13]:
# returns a dict of total births for each month
def month_births(input_list):
    births_per_month = dict()
    
    for item in input_list:
        
        month = item[1]
        births = item[4]
        
        if month in births_per_month:
            births_per_month[month] = births_per_month[month] + births
        else:
            births_per_month[month] = births
        
    return births_per_month


cdc_month_births = month_births(cdc_list)
cdc_month_births
        

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
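
The `if`/`else` accumulation in `month_births` can also be written with `collections.defaultdict`, which supplies a starting value of 0 for any key it has not seen yet. The sketch below is one possible variant, not the notebook's own method; `month_births_defaultdict` is a hypothetical name, and the sample rows are the first three data rows of the file.

```python
from collections import defaultdict

def month_births_defaultdict(input_list):
    """Sum births (column 4) per month (column 1) without an explicit key check."""
    totals = defaultdict(int)  # missing keys start at 0
    for item in input_list:
        totals[item[1]] += item[4]
    return dict(totals)  # convert back to a plain dict for display

# First three data rows of the file, all in January.
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
]
sample_totals = month_births_defaultdict(sample_rows)
print(sample_totals)  # {1: 26010}
```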

## Calculating Number Of Births Each Day Of Week

In [14]:
# returns a dict of total births for each day of the week
def dow_births(input_list):
    births_per_day = dict()
    
    for item in input_list:
        
        day = item[3]
        births = item[4]
        
        if day in births_per_day:
            births_per_day[day] = births_per_day[day] + births
        else:
            births_per_day[day] = births
        
    return births_per_day


cdc_day_births = dow_births(cdc_list)
cdc_day_births

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
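
Numeric keys are easy to misread, so it can help to relabel them. Here is a small sketch that maps the day numbers to weekday names using the data set's convention (1 is Monday, 7 is Sunday). `DAY_NAMES` and `label_days` are illustrative names, and the totals are copied from the output above.

```python
# Weekday names ordered so that day 1 is Monday and day 7 is Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def label_days(births_per_day):
    """Return a new dict keyed by weekday name instead of day number."""
    return {DAY_NAMES[day - 1]: births for day, births in births_per_day.items()}

# Day-of-week totals copied from the output above.
day_totals = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
              5: 6233657, 6: 4562111, 7: 4079723}
labeled = label_days(day_totals)
print(labeled["Tuesday"], labeled["Sunday"])  # 6446196 4079723
```

Read this way, the weekend dip is clear: Saturday and Sunday have the two lowest totals.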

## Creating A More General Function

In [15]:
# returns a dict of total births keyed by the values in the column index passed in
def calc_counts(input_list, column):
    births_dict = dict()

    for item in input_list:

        key = item[column]
        births = item[4]

        if key in births_dict:
            births_dict[key] = births_dict[key] + births
        else:
            births_dict[key] = births

    return births_dict

cdc_year_births = calc_counts(cdc_list, 0)
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)

In [16]:
cdc_year_births

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4089950}

In [17]:
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

In [18]:
cdc_dom_births

{1: 1276557,
 2: 1288739,
 3: 1304499,
 4: 1288154,
 5: 1299953,
 6: 1304474,
 7: 1310459,
 8: 1312297,
 9: 1303292,
 10: 1320764,
 11: 1314361,
 12: 1318437,
 13: 1277684,
 14: 1320153,
 15: 1319171,
 16: 1315192,
 17: 1324953,
 18: 1326855,
 19: 1318727,
 20: 1324821,
 21: 1322897,
 22: 1317381,
 23: 1293290,
 24: 1288083,
 25: 1272116,
 26: 1284796,
 27: 1294395,
 28: 1307685,
 29: 1223161,
 30: 1202095,
 31: 746696}

In [19]:
cdc_dow_births

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}

In [20]:
# returns the min value in a dict by default, or the max value when minValue is False
def max_births_at_column(input_dict, minValue = True):
    if minValue:
        return min(input_dict.values())
    else:
        return max(input_dict.values())

min_value_cdc_year_births = max_births_at_column(cdc_year_births)
min_value_cdc_year_births

3880894
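
A bare minimum or maximum does not say which key it belongs to. As a hedged sketch, the helper below returns the key together with its value. `extreme_entry` is a hypothetical name, and the totals are the day-of-week figures shown earlier in this notebook.

```python
def extreme_entry(counts, find_min=True):
    """Return the (key, value) pair with the smallest or largest value."""
    chooser = min if find_min else max
    # Iterating a dict yields its keys; key=counts.get ranks them by value.
    key = chooser(counts, key=counts.get)
    return key, counts[key]

# Day-of-week totals, where 1 is Monday and 7 is Sunday.
dow_totals = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
              5: 6233657, 6: 4562111, 7: 4079723}
print(extreme_entry(dow_totals))                   # (7, 4079723)
print(extreme_entry(dow_totals, find_min=False))   # (2, 6446196)
```

So across the whole period, Sunday has the fewest births and Tuesday the most.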

In [21]:
# Exploratory: how did the number of births on Saturday change each year between 1994 and 2003?
# Saturday is the 6th day of the week
def year_day_births(input_list, day):

    year_dict = dict()

    for item in input_list:

        year = item[0]
        births = item[4]

        if item[3] == day:

            if year in year_dict:
                year_dict[year] = year_dict[year] + births
            else:
                year_dict[year] = births

    return year_dict

saturday_births_by_year = year_day_births(cdc_list, 6)
# the number of births on Saturday generally decreases over the years
saturday_births_by_year

{1994: 474732,
 1995: 459580,
 1996: 456261,
 1997: 450840,
 1998: 453776,
 1999: 449985,
 2000: 469794,
 2001: 453928,
 2002: 445770,
 2003: 447445}