# Exploring Births in the US

## Introduction

This notebook analyses births in the US between 1994 and 2003.
The data is available here: https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv

The learning objectives of this project are reading a CSV file, converting it into a list of lists without the header row, and defining a general function that totals the births by year, month, day of the month, or day of the week.

## Reading a CSV file into a list without the header row

In [3]:
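def read_csv(file_name):
    # Read the whole file; the with-block closes it automatically
    with open(file_name, 'r') as f:
        data = f.read()
    # Split into lines and drop the header row (the first line)
    string_list = data.split("\n")[1:]
    final_list = []
    for line in string_list:
        # Skip empty lines, e.g. one left by a trailing newline at the end of the file
        if not line.strip():
            continue
        # Convert each comma-separated field to an integer
        int_fields = [int(string) for string in line.split(",")]
        final_list.append(int_fields)
    return final_list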

In [4]:
cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
print(cdc_list[0:10])

[[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772], [1994, 1, 3, 1, 10142], [1994, 1, 4, 2, 11248], [1994, 1, 5, 3, 11053], [1994, 1, 6, 4, 11406], [1994, 1, 7, 5, 11251], [1994, 1, 8, 6, 8653], [1994, 1, 9, 7, 7910], [1994, 1, 10, 1, 10498]]
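The function above relies on column positions, so later code has to remember that births sit at index 4. A minimal sketch of an alternative uses the standard library's `csv.DictReader`, which keys each row by the header names. The helper name `read_births_dicts` is hypothetical, and the header names `year,month,date_of_month,day_of_week,births` are an assumption about the file's first line that should be checked before relying on them.

```python
import csv
import io

def read_births_dicts(lines):
    # Hypothetical helper: parse CSV text (any iterable of lines, e.g. an open
    # file) into dicts keyed by the header row, converting every value to int.
    # Assumes every field is an integer, as in the rows printed above.
    reader = csv.DictReader(lines)
    return [{key: int(value) for key, value in row.items()} for row in reader]

# Usage on an in-memory sample of the first two rows printed above.
# The header names are an assumption about the real file's first line.
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
rows = read_births_dicts(sample)
print(rows[0]["births"])  # 8096
```

On the real file this would be called as `with open("US_births_1994-2003_CDC_NCHS.csv", newline="") as f: rows = read_births_dicts(f)`. `DictReader` also skips blank lines, so a trailing newline needs no special handling.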


## Defining a function to count the number of births

In [5]:
def calc_counts(data, column):
    # Sum the births (column 4) for each distinct value found in the chosen column
    number_of_births = {}
    for row in data:
        aggregator = row[column]
        births = row[4]
        if aggregator in number_of_births:
            number_of_births[aggregator] += births
        else:
            number_of_births[aggregator] = births
    return number_of_births
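The if/else in `calc_counts` exists only to initialise a key the first time it is seen. A sketch of the same aggregation with `collections.defaultdict(int)`, where missing keys start at 0, is shown below. The name `calc_counts_dd` and its `births_column` parameter are hypothetical additions, the latter replacing the hard-coded index 4.

```python
from collections import defaultdict

def calc_counts_dd(data, column, births_column=4):
    # Same aggregation as calc_counts, sketched with defaultdict(int):
    # a key seen for the first time starts at 0, so no if/else is needed.
    totals = defaultdict(int)
    for row in data:
        totals[row[column]] += row[births_column]
    # Convert back to a plain dict so it prints like calc_counts' output
    return dict(totals)

# Usage on the first three rows printed above
sample = [[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772], [1994, 1, 3, 1, 10142]]
print(calc_counts_dd(sample, 0))  # {1994: 26010}
print(calc_counts_dd(sample, 3))  # {6: 8096, 7: 7772, 1: 10142}
```

Because dictionaries preserve insertion order, the keys come out in the order they first appear in the data, matching the behaviour of `calc_counts`.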


In [6]:
# Columns: 0 = year, 1 = month, 2 = day of month, 3 = day of week (births are in column 4)
cdc_year_births = calc_counts(cdc_list, 0)
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)

print('Births by year: ', '\n',  cdc_year_births, '\n')
print('Births by month: ', '\n',  cdc_month_births, '\n')
print('Births by days of month: ', '\n',  cdc_dom_births, '\n')
print('Births by days of week: ', '\n',  cdc_dow_births, '\n')


Births by year:  
 {1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950} 

Births by month:  
 {1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314, 5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858, 9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860} 

Births by days of month:  
 {1: 1276557, 2: 1288739, 3: 1304499, 4: 1288154, 5: 1299953, 6: 1304474, 7: 1310459, 8: 1312297, 9: 1303292, 10: 1320764, 11: 1314361, 12: 1318437, 13: 1277684, 14: 1320153, 15: 1319171, 16: 1315192, 17: 1324953, 18: 1326855, 19: 1318727, 20: 1324821, 21: 1322897, 22: 1317381, 23: 1293290, 24: 1288083, 25: 1272116, 26: 1284796, 27: 1294395, 28: 1307685, 29: 1223161, 30: 1202095, 31: 746696} 

Births by days of week:  
 {6: 4562111, 7: 4079723, 1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429, 5: 6233657} 
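The day-of-week totals are keyed by raw codes. The first data row is 1 January 1994, which was a Saturday and is coded 6, so the codes appear to run from 1 = Monday to 7 = Sunday, the same convention as Python's `date.isoweekday()`. Under that assumption, a short sketch can attach names and rank the days using the totals printed above; `DAY_NAMES` and `ranked` are illustrative names, not part of the original notebook.

```python
from datetime import date

# Day-of-week totals copied from the output above
dow_births = {6: 4562111, 7: 4079723, 1: 5789166, 2: 6446196,
              3: 6322855, 4: 6288429, 5: 6233657}

# Assumed coding 1 = Monday ... 7 = Sunday; consistent with ISO numbering,
# under which 1994-01-01 (coded 6 in the first data row) is a Saturday
assert date(1994, 1, 1).isoweekday() == 6
DAY_NAMES = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
             5: "Friday", 6: "Saturday", 7: "Sunday"}

# Sort codes by total births, largest first, and print with names
ranked = sorted(dow_births.items(), key=lambda item: item[1], reverse=True)
for code, total in ranked:
    print(f"{DAY_NAMES[code]:<10} {total}")
```

The same `sorted(..., key=..., reverse=True)` pattern works on any of the other count dictionaries, for example to rank months by total births.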

