# Exploration of US Births
This project explores births in the United States from 1994 to 2003. We start by reading the CSV file and converting it into a list of lists of integers, a format that is easier to manipulate in Python.

In [3]:
def read_csv(file_name):
    # Use a context manager so the file is closed after reading.
    with open(file_name, 'r') as f:
        file_split = f.read().split('\n')
    string_list = file_split[1:]  # skip the header row
    final_list = []
    for item in string_list:
        if not item.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        # Convert every comma-separated field to an integer.
        int_fields = [int(string) for string in item.split(',')]
        final_list.append(int_fields)
    return final_list

In [6]:
cdc_list = read_csv('US_births_1994-2003_CDC_NCHS.csv')
cdc_list[:10]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]

Each row in the dataset is a list containing, in order:
- 'year' - Year
- 'month' - Month
- 'day_of_month' - Day of the Month
- 'day_of_week' - Day of the Week, where 1 is Monday and 7 is Sunday
- 'births' - The number of births for this time period
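Because the rows are plain lists, each field is reached by position. A minimal sketch of naming those positions, so that later indexing reads more clearly, is shown here. The constant names are hypothetical and are not used elsewhere in this notebook.

```python
# Hypothetical names for the column positions (not part of the original notebook).
YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, BIRTHS = range(5)

# First row of the dataset, copied from the output above.
first_row = [1994, 1, 1, 6, 8096]

# Index by name instead of by a bare number.
print(first_row[DAY_OF_WEEK], first_row[BIRTHS])
```

With these names, a call such as `calc_counts(cdc_list, MONTH)` would be equivalent to passing the index `1` directly.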

As a first analysis, we will determine how many children were born in each month during this time period. 

In [13]:
def calc_counts(input_list, column):
    births_per = {}
    for row in input_list:
        data = row[column]
        births = row[4]
        if data in births_per:
            births_per[data] += births
        else:
            births_per[data] = births
    return births_per

In [14]:
cdc_month_births = calc_counts(cdc_list,1)
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

As you can see, each month has a comparable number of births over the ten years, somewhere in the range of 3.0-3.5 million. Predictably, the fewest births occurred in February, the month with the fewest days.
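Since months differ in length, one way to compare them more fairly is to divide each total by the number of calendar days that month contributes over the period. The sketch below assumes the dataset covers every day from 1994 through 2003. The helper `days_per_month` is a hypothetical name introduced for illustration, and the totals are copied from the output above.

```python
import calendar

# Monthly totals copied from the output above.
cdc_month_births = {
    1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314,
    5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858,
    9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860,
}

def days_per_month(first_year, last_year):
    """Count the calendar days each month contributes over the year range."""
    days = {}
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of day 1, number of days in month).
            days[month] = days.get(month, 0) + calendar.monthrange(year, month)[1]
    return days

# Assumption: every day from 1994 through 2003 is present in the data.
month_days = days_per_month(1994, 2003)
births_per_day = {m: cdc_month_births[m] / month_days[m] for m in cdc_month_births}

for month in sorted(births_per_day):
    print(month, round(births_per_day[month]))
```

On this per-day basis, the lowest average falls in January rather than February and the highest in September, which suggests that February's low total is mostly an effect of its length.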

Next, we will analyse the number of births for each day of the week. 

In [15]:
cdc_day_births = calc_counts(cdc_list,3)
cdc_day_births

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}

Here, the numbers vary considerably more than they did by month. Notably, far fewer births occur on weekends (days 6 and 7) than on weekdays.
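To put a rough number on the weekend gap, the totals above can be averaged separately for weekdays and weekend days. This is a simple sketch that treats every day of the week as occurring equally often over the ten years, which is only approximately true. The variable names are introduced here for illustration.

```python
# Day-of-week totals copied from the output above (1 = Monday, 7 = Sunday).
cdc_day_births = {
    1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
    5: 6233657, 6: 4562111, 7: 4079723,
}

# Mean total for a weekday (Monday to Friday) and for a weekend day.
weekday_mean = sum(cdc_day_births[d] for d in range(1, 6)) / 5
weekend_mean = sum(cdc_day_births[d] for d in (6, 7)) / 2

# Ratio below 1 means weekend days see fewer births than weekdays.
ratio = weekend_mean / weekday_mean
print(f"weekend/weekday ratio: {ratio:.3f}")
```

The ratio comes out just under 0.7, so a typical weekend day has roughly 30% fewer births than a typical weekday.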

Next, the births per year:

In [16]:
cdc_year_births = calc_counts(cdc_list,0)
cdc_year_births

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4089950}

Here the trend is not strictly increasing: births decline from 1994 to 1997, then generally rise, reaching their highest value in 2003. The overall increase is likely due to population growth.
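The direction of the trend is easier to judge from year-over-year differences than from the raw totals. The following is a minimal sketch; `yearly_change` is a hypothetical helper, and the totals are copied from the output above.

```python
# Yearly totals copied from the output above.
cdc_year_births = {
    1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
    1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933,
    2002: 4021726, 2003: 4089950,
}

def yearly_change(totals):
    """Return the change in births from each year to the next."""
    years = sorted(totals)
    # Pair each year with its predecessor; the first year has no entry.
    return {year: totals[year] - totals[prev]
            for prev, year in zip(years, years[1:])}

changes = yearly_change(cdc_year_births)
for year in sorted(changes):
    print(year, changes[year])
```

Negative values mark the years in which births fell relative to the previous year.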

Finally, the births per day of the month:

In [19]:
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dom_births

{1: 1276557,
 2: 1288739,
 3: 1304499,
 4: 1288154,
 5: 1299953,
 6: 1304474,
 7: 1310459,
 8: 1312297,
 9: 1303292,
 10: 1320764,
 11: 1314361,
 12: 1318437,
 13: 1277684,
 14: 1320153,
 15: 1319171,
 16: 1315192,
 17: 1324953,
 18: 1326855,
 19: 1318727,
 20: 1324821,
 21: 1322897,
 22: 1317381,
 23: 1293290,
 24: 1288083,
 25: 1272116,
 26: 1284796,
 27: 1294395,
 28: 1307685,
 29: 1223161,
 30: 1202095,
 31: 746696}

Again, the only clear trend is that there are fewer births on the last few days of the month, which makes sense because fewer months contain those days.
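The explanation above can be checked by counting how often each day of the month actually occurs in the period and dividing the totals by those counts. This sketch assumes the data covers every day from 1994 through 2003. `day_of_month_occurrences` is a hypothetical helper, and only a few totals from the output above are included for brevity.

```python
import calendar

def day_of_month_occurrences(first_year, last_year):
    """Count how many times each day of the month occurs in the year range."""
    counts = {}
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            for day in range(1, last_day + 1):
                counts[day] = counts.get(day, 0) + 1
    return counts

# Assumption: every day from 1994 through 2003 is present in the data.
occurrences = day_of_month_occurrences(1994, 2003)

# A few day-of-month totals copied from the output above.
sample_totals = {1: 1276557, 29: 1223161, 30: 1202095, 31: 746696}

# Average births per occurrence of each sampled day.
for day in sorted(sample_totals):
    print(day, occurrences[day], round(sample_totals[day] / occurrences[day]))
```

Once each total is divided by its number of occurrences, days 29 to 31 are no longer outliers, which supports the reading that their low totals reflect how rarely they occur.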