# Exploring U.S. Births

In this project, we want to explore data on births in the United States. The dataset was compiled by FiveThirtyEight and can be found [here](https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv).

The dataset contains the following columns:
* `year` - Year (1994 to 2003)
* `month` - Month (1 to 12)
* `date_of_month` - Day number of the month (1 to 31)
* `day_of_week` - Day of the week (1 to 7, where 1 is Monday and 7 is Sunday)
* `births` - Number of births on a specific date

**Objective**: The objective of this project is to learn about the Jupyter Notebook.

## Introduction to the dataset

In [1]:
# Open the file in a context manager so it is closed after reading
with open("US_births_1994-2003_CDC_NCHS.csv") as f:
    string_csv = f.read().split("\n")
print(string_csv[0:10])

['year,month,date_of_month,day_of_week,births', '1994,1,1,6,8096', '1994,1,2,7,7772', '1994,1,3,1,10142', '1994,1,4,2,11248', '1994,1,5,3,11053', '1994,1,6,4,11406', '1994,1,7,5,11251', '1994,1,8,6,8653', '1994,1,9,7,7910']


## Converting to a list of lists

In [2]:
def read_csv(filename):
    # Read the whole file and split it into lines
    with open(filename) as f:
        data = f.read().split('\n')
    string_list = data[1:]  # drop the header row
    final_list = []
    for each in string_list:
        if not each:
            # Skip blank lines, e.g. one left by a trailing newline
            continue
        # Convert every comma-separated field to an integer
        int_fields = [int(val) for val in each.split(',')]
        final_list.append(int_fields)

    return final_list

cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")

print(cdc_list[0:10])

[[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772], [1994, 1, 3, 1, 10142], [1994, 1, 4, 2, 11248], [1994, 1, 5, 3, 11053], [1994, 1, 6, 4, 11406], [1994, 1, 7, 5, 11251], [1994, 1, 8, 6, 8653], [1994, 1, 9, 7, 7910], [1994, 1, 10, 1, 10498]]
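
As an alternative to splitting strings by hand, the same conversion can be sketched with the standard library's `csv` module. The function name `read_csv_rows` and the inline `sample` string are hypothetical; the sample stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical inline sample: the header plus the first two data rows shown above
sample = (
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)

def read_csv_rows(lines):
    """Parse an iterable of CSV lines into a list of integer lists."""
    reader = csv.reader(lines)
    next(reader)  # discard the header row
    # csv.reader yields an empty list for a blank line, so `if row` skips those
    return [[int(value) for value in row] for row in reader if row]

rows = read_csv_rows(io.StringIO(sample))
print(rows)
```

For the real dataset, the same function could be given an open file object instead, for example one opened with `open("US_births_1994-2003_CDC_NCHS.csv", newline="")`.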


## Calculating the number of births for each month

In [3]:
def month_births(data):
    births_per_month = {}
    
    for each in data:
        month = each[1]
        births = each[4]
        if month in births_per_month:
            births_per_month[month] += births
        else:
            births_per_month[month] = births
            
    return births_per_month

cdc_month_births = month_births(cdc_list)

In [4]:
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

## Calculating the number of births for each day of the week

In [5]:
def dow_births(data):
    births_per_dow = {}
    
    for each in data:
        dow = each[3]
        births = each[4]
        
        if dow in births_per_dow:
            births_per_dow[dow] += births
        else:
            births_per_dow[dow] = births
            
    return births_per_dow

cdc_day_births = dow_births(cdc_list)

In [6]:
cdc_day_births

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
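
The numeric keys are easier to read once they are mapped to day names. The sketch below assumes the coding 1 = Monday through 7 = Sunday, which is consistent with the first data row (1 January 1994, a Saturday, is coded 6). The totals are copied from the output above, and the names `DAY_NAMES` and `named_day_births` are hypothetical.

```python
# Assumption: day_of_week uses 1 = Monday ... 7 = Sunday
DAY_NAMES = {
    1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
    5: "Friday", 6: "Saturday", 7: "Sunday",
}

# Totals copied from the notebook output above
cdc_day_births = {
    1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
    5: 6233657, 6: 4562111, 7: 4079723,
}

# Replace each numeric key with its day name
named_day_births = {DAY_NAMES[day]: total for day, total in cdc_day_births.items()}

for name, total in named_day_births.items():
    print(f"{name}: {total}")
```

Under that assumption, the two lowest totals fall on Saturday and Sunday.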

## Creating a more general function

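This function takes the dataset and a column index, and returns a dictionary that maps each unique value in that column to its total number of births. With it we can compute totals per year, month, day of the month, and day of the week without writing a separate function for each column.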

In [7]:
def calc_counts(data, column):
    # Sum births (index 4) for each unique value found at index `column`
    
    sum_dict = {}
    
    for each in data:
        col_value = each[column]
        births = each[4]
        
        if col_value in sum_dict:
            sum_dict[col_value] += births
        else:
            sum_dict[col_value] = births
            
    return sum_dict

cdc_year_births = calc_counts(cdc_list, 0)
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)

In [8]:
cdc_year_births

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4089950}

In [9]:
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

In [10]:
cdc_dom_births

{1: 1276557,
 2: 1288739,
 3: 1304499,
 4: 1288154,
 5: 1299953,
 6: 1304474,
 7: 1310459,
 8: 1312297,
 9: 1303292,
 10: 1320764,
 11: 1314361,
 12: 1318437,
 13: 1277684,
 14: 1320153,
 15: 1319171,
 16: 1315192,
 17: 1324953,
 18: 1326855,
 19: 1318727,
 20: 1324821,
 21: 1322897,
 22: 1317381,
 23: 1293290,
 24: 1288083,
 25: 1272116,
 26: 1284796,
 27: 1294395,
 28: 1307685,
 29: 1223161,
 30: 1202095,
 31: 746696}

In [11]:
cdc_dow_births

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
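
The if/else membership check in `calc_counts` can also be written with `collections.defaultdict`. This is only a sketch of an equivalent approach, not a replacement for the function above; the name `calc_counts_dd`, the `value_column` parameter, and the `sample_rows` list are hypothetical, with the sample rows taken from the first lines of the dataset.

```python
from collections import defaultdict

def calc_counts_dd(data, column, value_column=4):
    """Sum the values at `value_column` for each unique value at `column`."""
    totals = defaultdict(int)  # missing keys start at 0
    for row in data:
        totals[row[column]] += row[value_column]
    return dict(totals)  # return a plain dict, like calc_counts does

# Hypothetical sample: four rows from the start of the dataset
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 8, 6, 8653],
]

sample_dow_births = calc_counts_dd(sample_rows, 3)
print(sample_dow_births)
```

Because `defaultdict(int)` supplies 0 for a key it has not seen yet, the loop body shrinks to a single line.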

## Creating a function to find the minimum and maximum value in a dictionary

The function takes a dictionary and returns its maximum and minimum values. This is useful for finding, for example, the highest and lowest total number of births across the days of the week. Note that it returns only the values, not the keys (days) at which they occur.

In [14]:
def findMinMax(input_dict):
    found_max = max(input_dict.values())
    found_min = min(input_dict.values())
    min_max = {
        "Maximum": found_max,
        "Minimum": found_min
    }
    return min_max

In [15]:
findMinMax(cdc_dow_births)

{'Maximum': 6446196, 'Minimum': 4079723}
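
To also learn which key holds each extreme, one possible variant passes the dictionary's `get` method as the `key` argument to `max` and `min`. The function name `find_min_max_keys` is hypothetical, and the totals are copied from the output of `cdc_dow_births` above.

```python
def find_min_max_keys(input_dict):
    """Return the (key, value) pairs holding the largest and smallest values."""
    # Iterating a dict yields its keys; key=input_dict.get compares them by value
    max_key = max(input_dict, key=input_dict.get)
    min_key = min(input_dict, key=input_dict.get)
    return {
        "Maximum": (max_key, input_dict[max_key]),
        "Minimum": (min_key, input_dict[min_key]),
    }

# Totals copied from the notebook output above
cdc_dow_births = {
    1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
    5: 6233657, 6: 4562111, 7: 4079723,
}

result = find_min_max_keys(cdc_dow_births)
print(result)
# {'Maximum': (2, 6446196), 'Minimum': (7, 4079723)}
```

Here day 2 has the most births and day 7 the fewest.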