# California Births by Age of Mother (1960 to 2016)

## Imports

For this project, we need three modules:
 - pandas for DataFrames and reading/processing data
 - plotly for graphing the resultant data
 - colorlover for palettes to use with plotly

In [160]:
import pandas as pd
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
import colorlover as cl

import warnings
warnings.filterwarnings("always")

init_notebook_mode(connected=True)

## Data Collection and Processing
To retrieve our data, we use pandas to load a CSV from California's public data website:

In [51]:
data = pd.read_csv('https://data.chhs.ca.gov/dataset/422ce2cc-9402-46c0-8d3c-bb8d6385597d/resource/255c79b2-461c-4c49-82a8-ba7c3b8adf77/download/births-by-mothers-age-group-1960current.csv')

data.head(8)

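| Unnamed: 0 | Year | AGE | Count |
|---|---|---|---|
| 0 | 1960 | UNDER 15 | 443 |
| 1 | 1960 | 15-19 | 54010 |
| 2 | 1960 | 20-24 | 128313 |
| 3 | 1960 | 25-29 | 94704 |
| 4 | 1960 | 30-34 | 57569 |
| 5 | 1960 | 35-39 | 29540 |
| 6 | 1960 | 40-44 | 6865 |
| 7 | 1960 | 45 AND OVER | 338 |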
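
A side note on the Unnamed: 0 column: that is the name pandas gives a column whose header cell is empty, which usually means a row index was saved into the CSV. As a minimal sketch, assuming that is what happened here, passing index_col=0 to read_csv uses that column as the index instead. The inline CSV text below is a stand-in built from the first rows of the output above, not the real download.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: the first rows shown above, with the
# empty leading header cell that pandas would otherwise name "Unnamed: 0".
sample_csv = io.StringIO(
    ",Year,AGE,Count\n"
    "0,1960,UNDER 15,443\n"
    "1,1960,15-19,54010\n"
    "2,1960,20-24,128313\n"
)

# index_col=0 tells pandas to treat the first column as the row index.
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())
```

The rest of the notebook only uses the Year, AGE, and Count columns, so it should work the same with or without this change.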


The data is already in a tidy format, so very little processing is needed before graphing. In fact, we can start preparing for the plot right away.

First, we generate a list of years to use as our x-axis, from 1960 to 2016 (range() excludes its stop value, so we pass 2017). Our colors come from colorlover. These can be tweaked to a user's liking, but the sequential 'RdPu' scheme divided the data nicely and was easy on the eyes. We also store a list of the age ranges that we are interested in. The dataset includes two groups with very small counts, births to mothers under 15 and to mothers 45 and over. In testing, these only cluttered the eventual plot, so we leave them out for now.

In [161]:
years = list(range(1960, 2017))  # 1960 through 2016 inclusive
colors = cl.scales['6']['seq']['RdPu']
age_ranges = ['15-19', '20-24', '25-29', '30-34', '35-39', '40-44']
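
To put a number on how small the excluded groups are, here is a minimal sketch that uses only the eight 1960 rows shown in the data.head(8) output. The variable names (rows_1960, is_kept, excluded_share) are illustrative and not part of the original notebook, and other years may differ.

```python
import pandas as pd

# The eight 1960 rows shown in the data.head(8) output above.
rows_1960 = pd.DataFrame({
    'Year': [1960] * 8,
    'AGE': ['UNDER 15', '15-19', '20-24', '25-29',
            '30-34', '35-39', '40-44', '45 AND OVER'],
    'Count': [443, 54010, 128313, 94704, 57569, 29540, 6865, 338],
})

age_ranges = ['15-19', '20-24', '25-29', '30-34', '35-39', '40-44']

# Split the rows into the groups we plot and the groups we leave out.
is_kept = rows_1960['AGE'].isin(age_ranges)
kept = rows_1960[is_kept]
excluded = rows_1960[~is_kept]

# Fraction of all 1960 births that falls in the excluded groups.
excluded_share = excluded['Count'].sum() / rows_1960['Count'].sum()
print(excluded['AGE'].tolist(), round(excluded_share, 4))
```

For 1960, the two excluded groups together account for about 0.2% of births, which is why they barely register in a stacked bar.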

## Visualization

Most of the project comes down to the visuals, which are all defined in the next cell. Plotly has its own conventions, so I followed its usage guide to prepare the data for graphing.

We want a stacked bar graph, because the dataset is split into groups that change in size over time. To build one, we prepare one Bar per age range. Each Bar has its own x and y values, each given as a list. Since we generated a list of years earlier, we use that as our x value. For y, we take the birth counts for that age range, one per year. Since the rows are already in ascending order by year, we can convert the column directly to a list without sorting. From here, we can change some characteristics of our eventual graph, such as the outline color.

The list comprehension in the next cell generates one Bar for each age range. Next, we define the layout, which sets attributes of the graph such as the title, legend, and x/y axis labels. We set barmode to 'stack' to get a stacked bar graph specifically.

In [134]:
graph_data = [go.Bar(
                x = years,
                y = data[data['AGE'] == age]['Count'].tolist(),
                name = age,
                marker = dict(
                    color = colors[age_ranges.index(age)],
                    line = dict(
                        color = 'rgba(58, 71, 80, 0.2)',
                        width = 0.5)
                )
            )
            for age in age_ranges] # create list of Bar graphs for each age range

layout = go.Layout(
    title = 'Number of California Births by Age of Mother',
    xaxis = dict(
        title = 'Year',
        tickfont = dict(
            size = 14,
            color = 'rgb(107, 107, 107)'
        )
    ),
    yaxis = dict(
        title = 'Number of Births',
        titlefont = dict(
            size = 16,
            color = 'rgb(107, 107, 107)'
        ),
        tickfont = dict(
            size = 14,
            color = 'rgb(107, 107, 107)'
        )
    ),
    legend = dict(
        x = 0,
        y = 1.2,
        bgcolor = 'rgba(255, 255, 255, 0)',
        bordercolor = 'rgba(255, 255, 255, 0)'
    ),
    barmode = 'stack',
    bargap = 0,
    bargroupgap = 0
)

fig = go.Figure(data = graph_data, layout = layout)
iplot(fig)  # offline iplot imported above; matches init_notebook_mode

Tada!
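
The bar-building cell relies on the rows for each age range already being in year order, so that each count lines up with the right entry in years. If that ordering cannot be taken for granted, one option is to pivot the table first. The following is a hedged sketch on a toy frame with made-up counts, not the real dataset; the names toy, wide, and counts_by_age are hypothetical.

```python
import pandas as pd

# Toy long-format data in the same Year/AGE/Count layout, with the rows
# deliberately out of order. The counts are made-up placeholders.
toy = pd.DataFrame({
    'Year': [1961, 1960, 1961, 1960],
    'AGE': ['20-24', '15-19', '15-19', '20-24'],
    'Count': [40, 10, 20, 30],
})

# One row per year, one column per age range, sorted by year.
wide = toy.pivot(index='Year', columns='AGE', values='Count').sort_index()

years = wide.index.tolist()
counts_by_age = {age: wide[age].tolist() for age in wide.columns}
print(years, counts_by_age)
```

With the data in this shape, the x values and every y list come from the same index, so they stay aligned even if the source rows are shuffled or a year is missing for one group.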

Another possible visualization breaks down the births by percentage instead of raw count. This can reveal how the proportions of births by age actually changed over the years, and it helps mitigate the effects of outside variables such as a rise in population.

To do this, we use an almost identical graph; the main change is the y values. To calculate each age range's share for a year, we divide its number of births by the total births in that year. The total covers every age group in the dataset, including the two we do not plot, so the stacked bars can fall slightly short of 1. Again, we can use a list comprehension for this to keep our code short.
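
The list comprehension in the next cell recomputes the yearly total for every row. As an alternative sketch, the same shares can be computed in one pass with groupby and transform. The toy frame and its counts below are made up for illustration, and the Share column name is an assumption, not something in the original data.

```python
import pandas as pd

# Toy data in the same Year/AGE/Count layout; counts are placeholders.
toy = pd.DataFrame({
    'Year': [1960, 1960, 1961, 1961],
    'AGE': ['15-19', '20-24', '15-19', '20-24'],
    'Count': [10, 30, 20, 20],
})

# transform('sum') returns the yearly total aligned to each original row.
yearly_total = toy.groupby('Year')['Count'].transform('sum')
toy['Share'] = toy['Count'] / yearly_total

# Shares for one age range, in row order (one value per year).
share_20_24 = toy.loc[toy['AGE'] == '20-24', 'Share'].tolist()
print(share_20_24)
```

Applied to the real data, the list for each age range could be passed as y in place of the nested comprehension.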

In [164]:
graph_data = [go.Bar(
                x = years,
                y = [row['Count'] / data[data['Year'] == row['Year']]['Count'].sum() for index, row in data[data['AGE'] == age].iterrows()], 
                name = age,
                marker = dict(
                    color = colors[age_ranges.index(age)],
                    line = dict(
                        color = 'rgba(58, 71, 80, 0.2)',
                        width = 0.5)
                )
            )
            for age in age_ranges] # create list of Bar graphs for each age range

layout = go.Layout(
    title = 'Percentage of Births by Age of Mother',
    xaxis = dict(
        title = 'Year',
        tickfont = dict(
            size = 14,
            color = 'rgb(107, 107, 107)'
        )
    ),
    yaxis = dict(
        title = 'Percentage of Births',
        titlefont = dict(
            size = 16,
            color = 'rgb(107, 107, 107)'
        ),
        tickfont = dict(
            size = 14,
            color = 'rgb(107, 107, 107)'
        ),
        range = [0, 1]
    ),
    barmode = 'relative',
    bargap = 0,
    bargroupgap = 0
)

fig = go.Figure(data = graph_data, layout = layout)
iplot(fig)  # offline iplot imported above; matches init_notebook_mode

This graph gives us a better sense of how the proportions of births by mother's age have changed over the years: the dominant age group shifts over the period from 1960 to 2016.

## References and Sources
- [California Open Data: Births by Age of Mother](https://data.ca.gov/dataset/births-age-mother-1960-current)
- [Plotly](https://plot.ly/#/)