# Explore U.S. Birth Data

This project works with a [dataset](https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv) on births in the U.S., compiled by FiveThirtyEight.

The dataset contains the following columns:
- `year` -- Year (1994 to 2003).
- `month` -- Month (1 to 12).
- `date_of_month` -- Day number of the month (1 to 31).
- `day_of_week` -- Day of week (1 to 7, where 1 is Monday and 7 is Sunday).
- `births` -- Number of births that day.

### First, open the file and convert it to a list of lines.

In [1]:
# The with-block closes the file automatically once it has been read.
with open("US_births_1994-2003_CDC_NCHS.csv", "r") as f:
    file = f.read().split("\n")
file[0:10]

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']
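One detail is worth checking before parsing: if the text ends with a newline, `split("\n")` leaves an empty string as the last element, which is why the parsing function that follows slices off the final item. The sketch below illustrates this on an in-memory string. `sample_text` is an assumption built from the header and first two rows shown above, with a trailing newline assumed; it is not the full file. `str.splitlines()` is shown as one alternative that does not produce the trailing empty string.

```python
# Hypothetical sample: header plus the first two rows shown above,
# assuming the text ends with a trailing newline.
sample_text = (
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)

# split("\n") keeps an empty string after the final newline.
split_lines = sample_text.split("\n")
print(split_lines[-1] == "")  # True

# splitlines() does not produce that trailing empty element.
clean_lines = sample_text.splitlines()
print(len(split_lines), len(clean_lines))  # 4 3
```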

### Or create a function that can convert a dataset into a list of lists where each nested list contains integer values.

In [5]:
def read_csv(csv):
    # Read the whole file; the with-block closes it automatically.
    with open(csv, "r") as f:
        file = f.read().split("\n")
    # Drop the header row and the trailing empty string left by the final newline.
    string_list = file[1:len(file)-1]
    final_list = []
    for sl in string_list:
        int_fields = []
        string_fields = sl.split(',')
        for sf in string_fields:
            int_fields.append(int(sf))
        # Append each row once, after all of its fields have been converted.
        final_list.append(int_fields)
    return final_list

cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
cdc_list[0:9]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910]]
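As a cross-check on a hand-written parser like this, the same conversion can be sketched with the standard library's `csv` module. This is an alternative under stated assumptions, not the notebook's method: `read_csv_rows` is a hypothetical helper name, and the sample data is just the header and first two rows shown earlier. A useful sanity check is that the number of parsed rows equals the number of data lines.

```python
import csv
import io

def read_csv_rows(lines):
    """Parse an iterable of CSV lines into a list of lists of ints, skipping the header."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    # 'if row' skips any blank lines, such as one after a trailing newline.
    return [[int(field) for field in row] for row in reader if row]

# Hypothetical in-memory sample standing in for the real file.
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
rows = read_csv_rows(sample)
print(rows)       # [[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772]]
print(len(rows))  # 2, one parsed row per data line
```

With the real file, the same function could be called on an open file object, for example inside `with open(path, newline="") as f:`.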

### Then create functions to calculate the total number of births that occurred in each month and on each day of the week, across all of the years in the dataset.

In [6]:
def month_births(input_list):
    births_per_month = {}
    for ls in input_list:
        month = ls[1]
        births = ls[4]
        if month in births_per_month:
            births_per_month[month] = births_per_month[month] + births
        else:
            births_per_month[month] = births
    return births_per_month

cdc_month_births = month_births(cdc_list)
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3289486}

In [7]:
def dow_births(input_list):
    births_per_dow = {}
    for ls in input_list:
        dow = ls[3]
        births = ls[4]
        if dow in births_per_dow:
            births_per_dow[dow] = births_per_dow[dow] + births
        else:
            births_per_dow[dow] = births
    return births_per_dow

cdc_day_births = dow_births(cdc_list)
cdc_day_births

{1: 5789166,
 2: 6446196,
 3: 6310481,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
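The two functions above share the same accumulate-by-key pattern: check whether the key exists, then add or initialize. As a hedged alternative, the `if`/`else` can be folded away with `collections.defaultdict`. The name `dow_births_defaultdict` and the `sample_rows` list are illustrative assumptions; the rows are taken from the first lines of the file shown earlier.

```python
from collections import defaultdict

# A few rows copied from the file preview above:
# [year, month, date_of_month, day_of_week, births]
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 8, 6, 8653],
    [1994, 1, 9, 7, 7910],
]

def dow_births_defaultdict(input_list):
    """Total births per day of week; missing keys start at 0."""
    totals = defaultdict(int)
    for row in input_list:
        totals[row[3]] += row[4]
    return dict(totals)  # convert back to a plain dict for display

print(dow_births_defaultdict(sample_rows))
# {6: 16749, 7: 15682, 1: 10142}
```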

### To make this process easier, create a general function that totals births by whichever column we need.

The `column` argument is the index of the field to group by: 0 for year, 1 for month, 2 for day of month, or 3 for day of week.

In [9]:
def calc_counts(input_list, column):
    calc_result = {}
    for ls in input_list:
        calc_index = ls[column]
        result = ls[4]
        if calc_index in calc_result:
            calc_result[calc_index] = calc_result[calc_index] + result
        else:
            calc_result[calc_index] = result
    return calc_result

cdc_year_births = calc_counts(cdc_list, 0)
cdc_year_births

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4077576}
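Bare positional indices such as `0` or `3` are easy to mix up when calling a function like this. One possible refinement, sketched here as an assumption rather than as part of the original notebook, is to look the index up from the column name. `COLUMN_INDEX` and `calc_counts_by_name` are hypothetical names, and the sample rows come from the file preview shown earlier.

```python
# Hypothetical mapping from column name to position, following the
# column order listed at the top of this notebook.
COLUMN_INDEX = {
    "year": 0,
    "month": 1,
    "date_of_month": 2,
    "day_of_week": 3,
    "births": 4,
}

def calc_counts_by_name(input_list, column_name):
    """Total births grouped by the named column."""
    column = COLUMN_INDEX[column_name]
    births_column = COLUMN_INDEX["births"]
    totals = {}
    for row in input_list:
        key = row[column]
        totals[key] = totals.get(key, 0) + row[births_column]
    return totals

sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
]
print(calc_counts_by_name(sample_rows, "month"))  # {1: 26010}
print(calc_counts_by_name(sample_rows, "year"))   # {1994: 26010}
```

An unknown column name raises `KeyError` immediately, which is arguably easier to debug than a silently wrong index.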

### We can then use the same function to calculate the sum of births for the other columns.

In [13]:
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)

cdc_dom_births

{1: 1276557,
 2: 1288739,
 3: 1304499,
 4: 1288154,
 5: 1299953,
 6: 1304474,
 7: 1310459,
 8: 1312297,
 9: 1303292,
 10: 1320764,
 11: 1314361,
 12: 1318437,
 13: 1277684,
 14: 1320153,
 15: 1319171,
 16: 1315192,
 17: 1324953,
 18: 1326855,
 19: 1318727,
 20: 1324821,
 21: 1322897,
 22: 1317381,
 23: 1293290,
 24: 1288083,
 25: 1272116,
 26: 1284796,
 27: 1294395,
 28: 1307685,
 29: 1223161,
 30: 1202095,
 31: 734322}
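The total for day 31 is far lower than for the other days, which is expected: only seven months have a 31st day, so fewer rows contribute to that key. Totals therefore mix "how many births per day" with "how many such days exist". One way to separate the two, sketched here with a hypothetical `calc_means` helper that is not part of the original notebook, is to divide each total by the number of rows behind it. The sample rows are taken from the file preview shown earlier.

```python
def calc_means(input_list, column):
    """Mean births per row, grouped by the given column index."""
    totals = {}
    counts = {}
    for row in input_list:
        key = row[column]
        totals[key] = totals.get(key, 0) + row[4]
        counts[key] = counts.get(key, 0) + 1
    # Divide each total by the number of rows that contributed to it.
    return {key: totals[key] / counts[key] for key in totals}

sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 8, 6, 8653],
    [1994, 1, 2, 7, 7772],
]
print(calc_means(sample_rows, 3))  # {6: 8374.5, 7: 7772.0}
```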

### Calculate the minimum and maximum births in a given dictionary.

In [20]:
def find_most(input_dict):
    most_result = {}
    most_result["min_birth"] = min(input_dict.values())
    most_result["max_birth"] = max(input_dict.values())
    return most_result

min_max_month_birth = find_most(cdc_month_births)  
min_max_month_birth

{'max_birth': 3525858, 'min_birth': 3018140}
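The values alone do not say which month (or year, or weekday) they belong to. A small sketch of how the matching keys could be recovered, using the `key` argument of the built-in `min` and `max`: `find_extreme_keys` is a hypothetical helper name, and `sample_totals` is a tiny illustrative dictionary, not output from the full dataset.

```python
def find_extreme_keys(input_dict):
    """Return the keys whose values are smallest and largest."""
    # Iterating a dict yields its keys; key=input_dict.get ranks them by value.
    min_key = min(input_dict, key=input_dict.get)
    max_key = max(input_dict, key=input_dict.get)
    return {"min_key": min_key, "max_key": max_key}

# Illustrative totals keyed by day of week (made-up subset, not full results).
sample_totals = {1: 10142, 6: 16749, 7: 15682}
print(find_extreme_keys(sample_totals))  # {'min_key': 1, 'max_key': 6}
```

If several keys tie for the extreme value, `min` and `max` return the first one encountered in the dictionary's iteration order.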