## Birth Dates in the US
The raw data behind the story **Some People Are Too 
Superstitious To Have A Baby On Friday The 13th**,
which you can read [here](https://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).

The data set has the following structure:
- `year` - Year
- `month` - Month
- `date_of_month` - Day number of the month
- `day_of_week` - Day of week, where 1 is Monday & 7 is Sunday
- `births` - Number of births
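The `day_of_week` encoding (1 = Monday through 7 = Sunday) follows the same convention as Python's `datetime.date.isoweekday()`, so the column can be cross-checked against the calendar. The sketch below assumes rows in the `[year, month, date_of_month, day_of_week, births]` layout used later in this notebook. `check_day_of_week` is a hypothetical helper, and the sample rows are the first rows of the file shown below.

```python
import datetime

def check_day_of_week(rows):
    """Return the rows whose day_of_week disagrees with the calendar.

    Each row is assumed to be [year, month, date_of_month, day_of_week, births].
    date.isoweekday() uses the same 1 = Monday ... 7 = Sunday convention.
    """
    mismatches = []
    for year, month, dom, dow, births in rows:
        if datetime.date(year, month, dom).isoweekday() != dow:
            mismatches.append([year, month, dom, dow, births])
    return mismatches

# First rows of US_births_1994-2003_CDC_NCHS.csv, as shown in this notebook
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
]
print(check_day_of_week(sample_rows))  # → []
```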

In [1]:
with open("US_births_1994-2003_CDC_NCHS.csv", "r") as file:
    text = file.read()

In [2]:
formattedText = text.split("\n")

In [3]:
formattedText[0:10]

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']
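Note that `str.split("\n")` keeps an empty string after a trailing newline, while `str.splitlines()` does not. The output above does not show whether the CSV ends with a newline. If it does, that empty last element would make `int("")` raise `ValueError` later. A small in-memory string stands in for the file here:

```python
# Stand-in for file contents that end with a newline
text = "year,month\n1994,1\n"

print(text.split("\n"))   # → ['year,month', '1994,1', '']
print(text.splitlines())  # → ['year,month', '1994,1']
```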

In [4]:
# read the given CSV file, skip the header row and any blank lines,
# cast every field to int and return the rows as a list of lists
def read_csv(filename):
    with open(filename, "r") as file:
        lines = file.read().split("\n")
    final_list = []
    for line in lines[1:]:
        if not line.strip():
            # a trailing newline leaves an empty string at the end of the split
            continue
        int_fields = [int(field) for field in line.split(",")]
        final_list.append(int_fields)
    return final_list

cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
cdc_list[0:10]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
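The standard-library `csv` module can do the splitting instead of manual `split(",")`. The sketch below is one possible alternative, not the notebook's method. `read_births` is a hypothetical name, and an in-memory `io.StringIO` stands in for the real file so the block runs on its own.

```python
import csv
import io

def read_births(file_obj):
    """Parse births CSV rows from an open text file into lists of ints.

    The header row is skipped, and the empty lists csv.reader yields
    for blank lines are filtered out.
    """
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    return [[int(field) for field in row] for row in reader if row]

# In-memory stand-in for US_births_1994-2003_CDC_NCHS.csv (first rows only)
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
print(read_births(sample))  # → [[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772]]
```

With the real file, the `csv` documentation recommends opening it with `newline=""`, e.g. `with open("US_births_1994-2003_CDC_NCHS.csv", newline="") as f: rows = read_births(f)`.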

In [5]:
# return a dictionary that stores the total number of births per month
def month_births(rows):
    births_per_month = {}
    for item in rows:
        month = item[1]
        births = item[4]
        if month in births_per_month:
            births_per_month[month] += births
        else:
            births_per_month[month] = births
    return births_per_month

cdc_month_births = month_births(cdc_list)
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
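The `if key in dict ... else` pattern above can be shortened with `collections.defaultdict(int)`, which starts every missing key at 0. Here is a sketch of the same monthly total. `month_births_dd` is a hypothetical name, and the rows are the first ten rows of `cdc_list` shown earlier.

```python
from collections import defaultdict

def month_births_dd(rows):
    """Total births per month; rows are [year, month, dom, dow, births]."""
    totals = defaultdict(int)  # missing months start at 0
    for row in rows:
        totals[row[1]] += row[4]
    return dict(totals)

# First ten rows of cdc_list, as shown above (all in January 1994)
first_rows = [
    [1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772], [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248], [1994, 1, 5, 3, 11053], [1994, 1, 6, 4, 11406],
    [1994, 1, 7, 5, 11251], [1994, 1, 8, 6, 8653], [1994, 1, 9, 7, 7910],
    [1994, 1, 10, 1, 10498],
]
print(month_births_dd(first_rows))  # → {1: 98029}
```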

In [6]:
# return a dictionary that stores the total number of births per day of the week
def dow_births(rows):
    births_per_day = {}
    for item in rows:
        day = item[3]
        births = item[4]
        if day in births_per_day:
            births_per_day[day] += births
        else:
            births_per_day[day] = births
    return births_per_day

cdc_day_births = dow_births(cdc_list)
cdc_day_births    

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
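The numeric keys are easier to read with day names attached. The sketch below ranks the totals from most to fewest births. It uses the 1 = Monday ... 7 = Sunday encoding from the column description. `DAY_NAMES` and `rank_days` are hypothetical names, and the totals are copied from the output above.

```python
DAY_NAMES = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
             5: "Friday", 6: "Saturday", 7: "Sunday"}

def rank_days(day_totals):
    """Return (day name, births) pairs sorted from most to fewest births."""
    ranked = sorted(day_totals.items(), key=lambda item: item[1], reverse=True)
    return [(DAY_NAMES[day], births) for day, births in ranked]

# Totals copied from the dow_births output above
dow_totals = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
              5: 6233657, 6: 4562111, 7: 4079723}
for name, births in rank_days(dow_totals):
    print(f"{name:<9} {births:>9,}")
```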

In [7]:
# return a dictionary containing the total number of births
# for each unique value in a specific column
def calc_counts(data,column):
    births_for_column = {}
    for item in data:
        key = item[column]
        births = item[4]
        if key in births_for_column:
            births_for_column[key] += births
        else:
            births_for_column[key] = births
    return births_for_column

cdc_year_births = calc_counts(cdc_list,0)
cdc_month_births = calc_counts(cdc_list,1)
cdc_dom_births = calc_counts(cdc_list,2)
cdc_dow_births = calc_counts(cdc_list,3)  
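`calc_counts` groups by a single column. The story's question needs a condition on two columns at once: `date_of_month == 13` and `day_of_week == 5`. The sketch below compares mean daily births on Friday the 13th with the mean on all other Fridays. This comparison is one simple choice, not necessarily the one the article uses. `friday_13_vs_other_fridays` is a hypothetical helper. The dates in `toy_rows` are real calendar dates, but the birth counts are invented and not taken from the data set.

```python
from statistics import mean

FRIDAY = 5  # day_of_week code: 1 = Monday ... 7 = Sunday

def friday_13_vs_other_fridays(rows):
    """Return (mean births on Friday the 13th, mean births on other Fridays).

    rows are assumed to be [year, month, date_of_month, day_of_week, births];
    statistics.StatisticsError is raised if either group is empty.
    """
    on_13th = [r[4] for r in rows if r[3] == FRIDAY and r[2] == 13]
    other = [r[4] for r in rows if r[3] == FRIDAY and r[2] != 13]
    return mean(on_13th), mean(other)

# Toy rows: real calendar dates, invented birth counts
toy_rows = [
    [2000, 1, 13, 4, 100],   # a 13th, but a Thursday: ignored
    [2000, 10, 13, 5, 90],   # Friday the 13th
    [2000, 10, 6, 5, 110],   # another Friday
    [2000, 10, 20, 5, 100],  # another Friday
]
print(friday_13_vs_other_fridays(toy_rows))  # → (90, 105)
```

In the notebook, the same call would be `friday_13_vs_other_fridays(cdc_list)`.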

In [8]:
print("Yearly births")
print(cdc_year_births)

Yearly births
{1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950}
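Year-over-year changes are easier to read than the raw totals. A minimal sketch, assuming the keys are consecutive years. `year_over_year` is a hypothetical helper, and the totals are copied from the output above.

```python
def year_over_year(totals):
    """Return {year: change in births versus the previous year}."""
    years = sorted(totals)
    return {curr: totals[curr] - totals[prev]
            for prev, curr in zip(years, years[1:])}

# Yearly totals copied from the cdc_year_births output above
yearly = {1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
          1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933,
          2002: 4021726, 2003: 4089950}
print(year_over_year(yearly)[1995])  # → -53178
```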


In [9]:

print("Monthly births")
print(cdc_month_births)

Monthly births
{1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314, 5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858, 9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860}


In [10]:
print("Births for each day of month")
print(cdc_dom_births)

Births for each day of month
{1: 1276557, 2: 1288739, 3: 1304499, 4: 1288154, 5: 1299953, 6: 1304474, 7: 1310459, 8: 1312297, 9: 1303292, 10: 1320764, 11: 1314361, 12: 1318437, 13: 1277684, 14: 1320153, 15: 1319171, 16: 1315192, 17: 1324953, 18: 1326855, 19: 1318727, 20: 1324821, 21: 1322897, 22: 1317381, 23: 1293290, 24: 1288083, 25: 1272116, 26: 1284796, 27: 1294395, 28: 1307685, 29: 1223161, 30: 1202095, 31: 746696}
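The raw totals for days 29, 30 and 31 are lower partly because those dates occur less often. Day 31 exists in only seven months, and February 29 exists only in leap years. Dividing each total by the number of times that date occurs gives a fairer per-day average. The sketch below counts occurrences from the calendar itself, assuming the file has exactly one row per calendar day from 1994 to 2003. `dom_occurrences` is a hypothetical helper.

```python
import calendar
from collections import Counter

def dom_occurrences(first_year, last_year):
    """Count how often each day-of-month number occurs in the calendar."""
    counts = Counter()
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            days_in_month = calendar.monthrange(year, month)[1]
            counts.update(range(1, days_in_month + 1))
    return counts

counts = dom_occurrences(1994, 2003)
print(counts[1], counts[29], counts[30], counts[31])  # → 120 112 110 70

# A few raw totals copied from the output above
dom_totals = {1: 1276557, 29: 1223161, 30: 1202095, 31: 746696}
for day, total in dom_totals.items():
    print(day, round(total / counts[day]))  # mean births per occurrence
```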


In [11]:
print("Births for each day of week")
print(cdc_dow_births)

Births for each day of week
{6: 4562111, 7: 4079723, 1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429, 5: 6233657}


In [12]:
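# calculate the min & max values of the given dictionary and the keys
# they belong to; return these in another dictionary
def calc_min_max(input_dict):
    # seed both extremes with the first entry so the result is complete
    # even when the first key itself holds the min or the max
    first_key = next(iter(input_dict))
    result = {
        "max": input_dict[first_key],
        "max_key": first_key,
        "min": input_dict[first_key],
        "min_key": first_key,
    }
    for key, value in input_dict.items():
        if value < result["min"]:
            result["min"] = value
            result["min_key"] = key
        if value > result["max"]:
            result["max"] = value
            result["max_key"] = key
    return result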
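The built-in `min` and `max` accept a `key` function, so the same lookup can be written without a manual loop. Passing `input_dict.get` compares keys by their values. On ties, the first key encountered wins. `calc_min_max_builtin` is a hypothetical name, and the yearly totals are copied from the earlier output.

```python
def calc_min_max_builtin(input_dict):
    """Same result shape as calc_min_max, using built-in min/max."""
    max_key = max(input_dict, key=input_dict.get)
    min_key = min(input_dict, key=input_dict.get)
    return {"max": input_dict[max_key], "max_key": max_key,
            "min": input_dict[min_key], "min_key": min_key}

# Yearly totals copied from the cdc_year_births output above
yearly = {1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
          1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933,
          2002: 4021726, 2003: 4089950}
print(calc_min_max_builtin(yearly))
# → {'max': 4089950, 'max_key': 2003, 'min': 3880894, 'min_key': 1997}
```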

In [13]:
# min & max values for yearly births
min_max_yearly = calc_min_max(cdc_year_births)
min_max_yearly   

{'max': 4089950, 'max_key': 2003, 'min': 3880894, 'min_key': 1997}

In [14]:
# min & max values for monthly births
min_max_monthly = calc_min_max(cdc_month_births)
min_max_monthly

{'max': 3525858, 'max_key': 8, 'min': 3018140, 'min_key': 2}
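The standard `calendar.month_name` sequence turns month numbers into names (English in the default locale). Note that February is also the shortest month, so its total covers fewer days than any other month's. The result below is copied from the output above.

```python
import calendar

# Result copied from the min_max_monthly output above
min_max_monthly = {'max': 3525858, 'max_key': 8, 'min': 3018140, 'min_key': 2}

print("Most births:", calendar.month_name[min_max_monthly["max_key"]])    # → Most births: August
print("Fewest births:", calendar.month_name[min_max_monthly["min_key"]])  # → Fewest births: February
```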

In [15]:
# min & max values for births on each day of the month
min_max_dom = calc_min_max(cdc_dom_births)
min_max_dom

{'max': 1326855, 'max_key': 18, 'min': 746696, 'min_key': 31}

In [16]:
# min & max values for births on each day of the week
min_max_dow = calc_min_max(cdc_dow_births)
min_max_dow

{'max': 6446196, 'max_key': 2, 'min': 4079723, 'min_key': 7}