# Explore U.S. Births
My first guided Python project on Dataquest.io

## Project Rundown
Here, I take a CSV file of U.S. births data from the CDC's National Center for Health Statistics (NCHS) covering 1994 to 2003, clean it up, and then do some basic analysis. I build dictionaries that total the number of births by year, month, day of the month, and day of the week.

First, I read the file, remove the header row (year, month, day of month, day of week, and births), and convert every value to an integer.

```python
def read_csv(filename):
    final_list = []

    # read the whole file; "with" closes it automatically
    with open(filename, "r") as f:
        string_list = f.read().split("\n")

    # drop the header row
    string_list = string_list[1:]

    # split each row into a list and convert every field to an integer
    for string in string_list:
        if not string.strip():
            continue  # skip blank lines, such as one left by a trailing newline
        string_fields = [int(field) for field in string.split(",")]
        final_list.append(string_fields)

    return final_list

cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")

cdc_list[0:10]
```
```
[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
```
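As an aside, Python's standard-library csv module can do the splitting for you. The sketch below is an alternative to read_csv, not the approach used in this project. The function name read_csv_rows is hypothetical, and the header line in the sample is an assumption about the file's layout; the two data rows are copied from the output above.

```python
import csv
import os
import tempfile

def read_csv_rows(filename):
    """Hypothetical alternative to read_csv built on the csv module."""
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        # convert the fields of every non-empty row to integers
        return [[int(value) for value in row] for row in reader if row]

# Usage: write a tiny sample file (header assumed), then read it back
sample = (
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
rows = read_csv_rows(tmp.name)
os.remove(tmp.name)
print(rows)  # [[1994, 1, 1, 6, 8096], [1994, 1, 2, 7, 7772]]
```

The csv reader also copes with quoted fields, which a plain split(",") does not. That difference does not matter for this all-numeric file.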

Next, I create a dictionary that maps each month (1 to 12) to its total number of births across all years.

```python
def month_births(data):
    births_per_month = {}

    # add each row's births (column 4) to the total for its month (column 1)
    for item in data:
        if item[1] not in births_per_month:
            births_per_month[item[1]] = item[4]
        else:
            births_per_month[item[1]] += item[4]
    return births_per_month

cdc_month_births = month_births(cdc_list)

cdc_month_births
```

```
{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
```
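The if/else in month_births handles two cases: the first time a month appears, and every later time. The sketch below collapses both into one line with dict.get, which returns a default of 0 for a month not seen yet. The name month_births_get is hypothetical. The sample rows are the first three rows printed earlier, all from January 1994.

```python
def month_births_get(rows):
    """Sum births (column 4) per month (column 1) using dict.get."""
    births_per_month = {}
    for row in rows:
        # get() returns 0 for a month not seen yet, replacing the if/else branch
        births_per_month[row[1]] = births_per_month.get(row[1], 0) + row[4]
    return births_per_month

# Usage with the first three rows of cdc_list (all January 1994)
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
]
print(month_births_get(sample_rows))  # {1: 26010}
```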

I do the same for the total births per day of the week, where 1 is Monday and 7 is Sunday.

```python
def dow_births(data):
    births_per_dow = {}

    # add each row's births (column 4) to the total for its weekday (column 3)
    for item in data:
        if item[3] not in births_per_dow:
            births_per_dow[item[3]] = item[4]
        else:
            births_per_dow[item[3]] += item[4]
    return births_per_dow

cdc_day_births = dow_births(cdc_list)

cdc_day_births
```

```
{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
```
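Each row is counted exactly once in both dictionaries, so the month totals and the day-of-week totals should add up to the same number of births. The sketch below runs that sanity check on the two outputs above, copied verbatim. It also attaches weekday names, assuming that 1 is Monday and 7 is Sunday. That assumption fits the first row, since January 1, 1994 was a Saturday and it is coded 6. DAY_NAMES is a hypothetical helper.

```python
# Totals copied from the two outputs above
month_totals = {1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314,
                5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858,
                9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860}
dow_totals = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
              5: 6233657, 6: 4562111, 7: 4079723}

# Both groupings cover every row once, so the grand totals must match
total_by_month = sum(month_totals.values())
total_by_dow = sum(dow_totals.values())
print(total_by_month, total_by_dow)  # 39722137 39722137

# Hypothetical helper, assuming 1 = Monday ... 7 = Sunday
DAY_NAMES = {1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
             5: "Friday", 6: "Saturday", 7: "Sunday"}
labeled = {DAY_NAMES[code]: total for code, total in dow_totals.items()}
print(labeled["Saturday"])  # 4562111
```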

Writing a separate function for every column repeats the same logic, so I create a general function, calc_counts, that builds the dictionary for whichever column index is passed in. Now one function can total births by year, month, day of the month, or day of the week.

```python
def calc_counts(data, column):
    births_per_column = {}

    # sum births (column 4) for each distinct value in the chosen column:
    # 0 = year, 1 = month, 2 = day of month, 3 = day of week
    for item in data:
        if item[column] not in births_per_column:
            births_per_column[item[column]] = item[4]
        else:
            births_per_column[item[column]] += item[4]
    return births_per_column


cdc_year_births = calc_counts(cdc_list, 0)
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)

print("Births per month:")
cdc_month_births
```

```
Births per month:

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
```
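The standard library's collections.Counter is built for this kind of tallying: a missing key starts at 0, so calc_counts needs no if/else at all. The sketch below is a variant of calc_counts, not the version used above. The name calc_counts_counter and the births_column parameter (defaulting to column 4) are hypothetical. The sample rows are taken from the cdc_list output earlier.

```python
from collections import Counter

def calc_counts_counter(data, column, births_column=4):
    """Sum births per distinct value of `column` using collections.Counter."""
    counts = Counter()
    for row in data:
        # a missing key in a Counter reads as 0, so += works on first sight
        counts[row[column]] += row[births_column]
    return dict(counts)

# Usage with four rows of cdc_list
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 8, 6, 8653],
]
print(calc_counts_counter(sample_rows, 3))  # {6: 16749, 7: 7772, 1: 10142}
print(calc_counts_counter(sample_rows, 0))  # {1994: 34663}
```

Returning a plain dict keeps the result the same shape as calc_counts. The Counter could be returned directly if methods like most_common() are useful later.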

That's all for my first Python project. Stay tuned for more!