![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

To run this notebook, press the >> button and confirm "Restart and Run All".

![](./images/RunB.png)

In [None]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')

<h1 align='center'>Births, Deaths and Marriages in Canada</h1>

## About this notebook

In this notebook we download a full dataset and plot multiple one-dimensional subsets of the data.

The dataset is Statistics Canada table 17-10-0059-01, available at https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710005901.

The notebook uses our quick dataset exploration application and plots pie charts specific to this dataset.

## What are the questions we are interested in answering? 

1. How has the number of deaths and births changed in Canada since 1946? 

2. Are there provinces where this change is more drastic?

3. Are rates of birth and death constant? 

4. How has the number of marriages in Canada changed since 1946?
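
One way to approach questions 1, 3 and 4 is to add up the four quarters of each year and look at how the annual totals change. The sketch below uses a small made-up table in the same long layout as the StatsCan data; the values are invented, and it assumes `REF_DATE` is a string that starts with the four-digit year.

```python
import pandas as pd

# Toy quarterly data (made-up values, for illustration only).
toy = pd.DataFrame({
    "REF_DATE": ["1946-01", "1946-04", "1946-07", "1946-10",
                 "1947-01", "1947-04", "1947-07", "1947-10"] * 2,
    "Estimates": ["Births"] * 8 + ["Deaths"] * 8,
    "VALUE": [10, 12, 11, 9, 13, 14, 12, 11,
              5, 4, 4, 6, 5, 5, 4, 6],
})

# Take the year from the start of REF_DATE, then sum the four quarters.
toy["Year"] = toy["REF_DATE"].str[:4].astype(int)
annual = toy.pivot_table(index="Year", columns="Estimates",
                         values="VALUE", aggfunc="sum")

# Natural increase is births minus deaths.
annual["Natural increase"] = annual["Births"] - annual["Deaths"]

# Year-over-year percentage change; a constant rate would give equal values.
annual_change = annual[["Births", "Deaths"]].pct_change() * 100

print(annual)
print(annual_change)
```

The same pattern could be applied to the real data once it has been downloaded and filtered to a single region.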



In [None]:
!pip install tqdm

In [None]:
%run -i ./StatsCan/helpers.py
%run -i ./StatsCan/scwds.py
%run -i ./StatsCan/sc.py

In [None]:
from ipywidgets import widgets, VBox, HBox, Button, Layout
from IPython.display import display, Javascript, Markdown, HTML
import datetime as dt
import pandas as pd
import json
import datetime
from tqdm import tnrange, tqdm_notebook
from time import sleep


def rerun_cell( b ):
    # Execute the two cells directly below the currently selected cell
    display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1,IPython.notebook.get_selected_index()+3)'))    

    
def run_4cell( b ):
    # Execute the four cells directly below the currently selected cell
    display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1,IPython.notebook.get_selected_index()+5)'))    

style = {'description_width': 'initial'}



## Downloading Stats Can Data

Press the button below to download the dataset https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710005901 directly through StatsCan's API. 
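
For readers curious about what a direct request might look like, here is a minimal sketch that only builds a download URL. It assumes the StatsCan Web Data Service exposes a `getFullTableDownloadCSV` endpoint that takes the 8-digit table id and a language code; the function names are hypothetical and this is not necessarily how the `download_tables` helper used in this notebook works.

```python
def normalize_product_id(product_id: str) -> str:
    """Reduce an id such as '17-10-0059-01' to its first 8 digits."""
    digits = "".join(ch for ch in product_id if ch.isdigit())
    if len(digits) < 8:
        raise ValueError(f"Expected at least 8 digits, got {product_id!r}")
    return digits[:8]


def full_table_csv_endpoint(product_id: str, lang: str = "en") -> str:
    """Build the (assumed) WDS URL that returns the zipped CSV location."""
    table_id = normalize_product_id(product_id)
    return ("https://www150.statcan.gc.ca/t1/wds/rest/"
            f"getFullTableDownloadCSV/{table_id}/{lang}")


print(full_table_csv_endpoint("17-10-0059-01"))
```

No network request is made here; the sketch only shows how the product ID used in this notebook maps onto a table id.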

In [None]:
# Fancy user interface to run cells below
DS_button = widgets.Button(
    button_style='success',
    description="Download Dataset", 
    layout=Layout(width='15%', height='30px'),
    style=style
)    
DS_button.on_click( run_4cell )

display(DS_button)

In [None]:
# Download data
# Dataset product ID (for internal use only)
productId = '17-10-0059-01'

        
download_tables(str(productId))


df_fullDATA = zip_table_to_dataframe(productId)


# Clean up full dataset - remove internal use columns
cols = list(df_fullDATA.loc[:,'REF_DATE':'UOM'])+ ['SCALAR_FACTOR'] +  ['VALUE']
df_less = df_fullDATA[cols]
df_less2 = df_less.drop(["DGUID"], axis=1)

# Display only first five entries
df_less2.head()
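
The column clean-up above can be tried on a tiny stand-in table. The sketch below is illustrative only: the column names mimic a typical StatsCan CSV and are assumptions, as are the values.

```python
import pandas as pd

# Made-up stand-in for the full table, including some internal-use columns.
toy_full = pd.DataFrame({
    "REF_DATE": ["1946-01", "1946-04"],
    "GEO": ["Canada", "Canada"],
    "DGUID": ["x", "x"],
    "Estimates": ["Births", "Births"],
    "UOM": ["Number", "Number"],
    "UOM_ID": [249, 249],
    "SCALAR_FACTOR": ["units", "units"],
    "VECTOR": ["v1", "v1"],
    "VALUE": [10, 12],
})

# A label slice keeps every column from REF_DATE through UOM (inclusive);
# SCALAR_FACTOR and VALUE are then added back by name.
keep = list(toy_full.loc[:, "REF_DATE":"UOM"].columns) + ["SCALAR_FACTOR", "VALUE"]

# DGUID sits inside the slice, so it is dropped afterwards.
toy_clean = toy_full[keep].drop(columns="DGUID")

print(list(toy_clean.columns))
```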

In [None]:
# Display subsets found within data

# Number of columns in the trimmed dataframe
iteration_nr = df_less2.shape[1]
categories = []

# Get the unique values in every column except the last one (VALUE)
for i in range(iteration_nr-1):
    categories.append(df_less2.iloc[:,i].unique())
    
    
# Build a dropdown menu for each column, using the unique values found in the loop above
all_the_widgets = []
for i in range(len(categories)):
    if i==0:
        # First category contains start date
        a_category = widgets.Dropdown(
                value = categories[i][0],
                options = categories[i], 
                description ='Start Date:', 
                style = style, 
                disabled=False
            )
        b_category = widgets.Dropdown(
                value = categories[i][-1],
                options = categories[i], 
                description ='End Date:', 
                style = style, 
                disabled=False
            )
        all_the_widgets.append(a_category)
        all_the_widgets.append(b_category)
    elif i==1:
        # Base category: Store locations
        a_category = widgets.Dropdown(
                value = categories[i][0],
                options = categories[i], 
                description ='Location:', 
                style = style, 
                disabled=False
            )
        all_the_widgets.append(a_category)
    elif i==len(categories)-1:
        # Base category: Scalar factor
        a_category = widgets.Dropdown(
                value = categories[i][0],
                options = categories[i], 
                description ='Scalar factor:', 
                style = style, 
                disabled=False
            )
        all_the_widgets.append(a_category)
        
    elif i==len(categories)-2:
        # Base category: Units of measure
        a_category = widgets.Dropdown(
                value = categories[i][0],
                options = categories[i], 
                description ='Units of Measure :', 
                style = style, 
                disabled=False
            )
        all_the_widgets.append(a_category)
    else:
        # Non-base categories which may be added to datasets
        a_category = widgets.Dropdown(
                value = categories[i][0],
                options = categories[i], 
                description ='Subcategory ' + str(i), 
                style = style, 
                disabled=False
            )
        all_the_widgets.append(a_category)
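
The dropdown options come from the unique values in each column. A compact way to inspect them, sketched here on made-up data and without any widgets, is a dictionary keyed by column name.

```python
import pandas as pd

# Made-up rows in the same layout as the trimmed dataframe.
toy = pd.DataFrame({
    "REF_DATE": ["1946-01", "1946-01", "1946-04", "1946-04"],
    "GEO": ["Canada", "Alberta", "Canada", "Alberta"],
    "Estimates": ["Births", "Births", "Deaths", "Deaths"],
    "VALUE": [10, 3, 4, 1],
})

# Unique values for every column except the last one (VALUE),
# in order of first appearance.
options = {col: toy[col].unique().tolist() for col in toy.columns[:-1]}

for col, values in options.items():
    print(col, values)
```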


## Select Data Subsets: One-Dimensional Plotting


Use the menu below to select the category within the full dataset that you are interested in exploring.

Choose a start and end date for the plot.

If data is available, it will appear under the column headers.

Be careful to select subsets that actually contain data!

Use the Preview Dataset button to preview the data.
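
Behind the menu, a selection like this amounts to combining several conditions into one boolean mask. The sketch below shows the idea on made-up data; the variable names stand in for the dropdown values and are assumptions.

```python
import pandas as pd

# Made-up rows in the same layout as the trimmed dataframe.
toy = pd.DataFrame({
    "REF_DATE": ["1946-01", "1946-04", "1946-07", "1946-01", "1946-04"],
    "GEO": ["Canada", "Canada", "Canada", "Alberta", "Canada"],
    "Estimates": ["Births", "Births", "Births", "Births", "Deaths"],
    "VALUE": [10, 12, 11, 3, 4],
})

# Stand-ins for the values chosen in the dropdown menus.
start, end, location, estimate = "1946-01", "1946-04", "Canada", "Births"

# Each condition yields a True/False column; & keeps rows where all hold.
mask = (
    toy["REF_DATE"].between(start, end)
    & (toy["GEO"] == location)
    & (toy["Estimates"] == estimate)
)
subset = toy[mask]

# An empty result means the chosen combination has no data.
if subset.empty:
    print("No data for this selection")
else:
    print(subset)
```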

In [None]:
# Fancy user interface to explore datasets
# Button widget
CD_button = widgets.Button(
    button_style='success',
    description="Preview Dataset", 
    layout=Layout(width='15%', height='30px'),
    style=style
)    

# Connect widget to function - run subsequent cells
CD_button.on_click( run_4cell )

# user menu using categories found above
tab3 = VBox(children=[HBox(children=all_the_widgets[0:3]),
                      HBox(children=all_the_widgets[3:5]),
                      HBox(children=all_the_widgets[5:len(all_the_widgets)]),
                      CD_button])
tab = widgets.Tab(children=[tab3])
tab.set_title(0, 'Load Data Subset')
display(tab)

In [None]:
df_sub = df_less2[(df_less2["REF_DATE"]>=all_the_widgets[0].value) & 
                  (df_less2["REF_DATE"]<=all_the_widgets[1].value) &
                  (df_less2["GEO"]==all_the_widgets[2].value) &
                  (df_less2["UOM"]==all_the_widgets[-2].value) & 
                  (df_less2["SCALAR_FACTOR"]==all_the_widgets[-1].value) ]



df_sub.head()

In [None]:
# The third column holds the subcategory; to filter on another column, substitute its index and widget
col_name = df_sub.columns[2]

# Slice the dataframe to keep only the rows matching the selected subcategory
df_sub_final = df_sub[(df_sub[col_name]==all_the_widgets[3].value)]

In [None]:

# Time to plot!
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
%matplotlib inline

# Create "invisible" plot on the left - this is not necessary, but I like to center my plots in the notebook
# feels more aesthetically pleasing
fig1 = plt.figure(facecolor='w',figsize=(18,18))
plt.subplot(3, 3, 1)
plt.axis('off');

# Actual plot of time series
plt.subplot(3, 3, 2)
# Plot the "VALUE" column against REF_DATE for the selected date range
plt.plot(df_sub_final["REF_DATE"],df_sub_final["VALUE"],'b--',label='Value')
plt.xlabel('Year-Month', fontsize=20)
plt.ylabel('Value',fontsize=20)
# Title changes depending on the subcategory explored
plt.title(str(all_the_widgets[3].value) + ", "+  str(all_the_widgets[2].value),fontsize=20)
plt.xticks(rotation=90)
plt.grid(True)

# Create "invisible" plot to the right - this is not necessary, but I like to center my plots in the notebook
# feels more aesthetically pleasing
plt.subplot(3, 3, 3);
plt.axis('off');


#### Your comments here

Double click this cell to answer the questions:


1. How has the number of deaths and births changed in Canada since 1946?

2. Are there provinces where this change is more drastic?

3. Are rates of birth and death constant?

4. How has the number of marriages in Canada changed since 1946?

## Comparing the proportion of births and deaths between two given years

Use the menu below to pick a region and two years, then compare the proportion of births and deaths in each year. Year B is set to 2019 by default.

### What questions are we interested in answering

1. How does the ratio of births to deaths change as you explore different years?

2. Is this pattern similar across the different provinces?

3. Is there missing data?
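
The proportions shown in a pie chart are simple shares of a total. As a rough sketch of the arithmetic, using a hypothetical helper and made-up quarterly counts, and treating missing quarters as `None`:

```python
def birth_death_shares(births_by_quarter, deaths_by_quarter):
    """Return the shares of births and deaths, and the birth-to-death ratio.

    Quarters with missing data (None) are skipped. Returns None when no
    data is available at all.
    """
    births = sum(v for v in births_by_quarter if v is not None)
    deaths = sum(v for v in deaths_by_quarter if v is not None)
    total = births + deaths
    if total == 0:
        return None
    return {
        "births": births / total,
        "deaths": deaths / total,
        # The ratio is undefined when there are no recorded deaths.
        "ratio": births / deaths if deaths else None,
    }


# Made-up counts for the four quarters of one year.
shares = birth_death_shares([10, 12, 11, 9], [5, 4, 4, 6])
print(shares)
```

A ratio above 1 means more births than deaths in that year; comparing the ratio for two years gives a quick answer to the first question.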

In [None]:
# Pie charts!

# Dropdown button to get location
geo_dp2 = widgets.Dropdown(
    options = categories[1], 
    description ='Select location:', 
    style = style, 
    disabled=False
)

year_q = widgets.Dropdown(
    options = [1946 + i for i in range(73)], 
    description ='Year A:', 
    style = style, 
    disabled=False
)

year_e = widgets.Dropdown(
    options = [1946 + i for i in range(74)], 
    description ='Year B:', 
    style = style, 
    disabled=False,
    value=2019
)

# Button widget to run subsequent cells
CD_button2 = widgets.Button(
    button_style='success',
    description="Preview Chart", 
    layout=Layout(width='15%', height='30px'),
    style=style
)    
# Run the four cells below, which prepare the data and plot the pie charts
CD_button2.on_click( run_4cell )

# User interface menu
tab3 = VBox(children=[HBox(children=[geo_dp2,year_q,year_e]),
                      CD_button2])
tab = widgets.Tab(children=[tab3])
tab.set_title(0, 'Preview Chart')
display(tab)

In [None]:


# This function takes a dataframe and a quarter start date, and returns a dataframe with data from that quarter only
def get_year_pd(dataframe,year):
    # Keep only the rows whose REF_DATE matches the requested quarter
    df_year = dataframe.loc[(dataframe["REF_DATE"] == str(year))]
    # Rename column for easier readability
    df4 = df_year.rename(index=str, columns={"VALUE": year,"GEO":"Geography"})
    # Drop REF_DATE column - not needed anymore as we have a single year now
    df4 = df4.drop(columns="REF_DATE")

    return df4

# Use the function to get the four quarters of Year A and of Year B (variable names are historical)
df_2010_f = get_year_pd(df_less2,str(year_q.value).split("-")[0] + "-01-01")
df_2010_s = get_year_pd(df_less2,str(year_q.value).split("-")[0] + "-04-01")
df_2010_t = get_year_pd(df_less2,str(year_q.value).split("-")[0] + "-07-01")
df_2010_fo = get_year_pd(df_less2,str(year_q.value).split("-")[0] + "-10-01")

df_2018_f = get_year_pd(df_less2,str(year_e.value).split("-")[0] +"-01-01")
df_2018_s = get_year_pd(df_less2,str(year_e.value).split("-")[0] +"-04-01")
df_2018_t = get_year_pd(df_less2,str(year_e.value).split("-")[0] +"-07-01")
df_2018_fo = get_year_pd(df_less2,str(year_e.value).split("-")[0] +"-10-01")

# Merge all quarters for Year A into a new dataframe
new_df10 = pd.merge(pd.merge(pd.merge(df_2010_f,df_2010_s),df_2010_t),df_2010_fo)
# Merge all quarters for Year B into a new dataframe
new_df18 = pd.merge(pd.merge(pd.merge(df_2018_f,df_2018_s),df_2018_t),df_2018_fo)
new_df = pd.merge(new_df10,new_df18)

# We are interested in the proportion of birth and death for two given years, on a quarterly basis
df3 = new_df.iloc[2:,:]
df4 = df3[df3["Estimates"]=="Births"]
df5 = df3[df3["Estimates"]=="Deaths"]

In [None]:
# Get region of interest from dropdown menu
whichprovince = geo_dp2.value

# Get dataframe containing data for that region
whichPo = new_df.loc[new_df['Geography']== whichprovince]

whichPo.set_index('Estimates', inplace=True)
# Display dataframe
whichPo

In [None]:
#load "cufflinks" library under short name "cf"
import cufflinks as cf

#command to display graphics correctly in Jupyter notebook
cf.go_offline()

In [None]:
# Plotting piecharts

# Font size format
plt.rcParams.update({'font.size': 15})

# Handle case where data is not found
if whichPo.size==0:
    fig = plt.figure(figsize=(5,5))

    plt.text(0.5,0.5,"NO DATA FOUND",fontsize=25)
    
    plt.axis("off")
    
else:
    # Get the sum of deaths and births for Year B (the last four quarter columns)
    final_sum_birth = 0
    final_sum_death = 0

    for item in whichPo[whichPo.index=="Births"].iloc[:,7:].columns:

        final_sum_birth+= whichPo[whichPo.index=="Births"][item].sum()

    for item in whichPo[whichPo.index=="Deaths"].iloc[:,7:].columns:

        final_sum_death+= whichPo[whichPo.index=="Deaths"][item].sum()

    # Get the sum of deaths and births for Year A (the first four quarter columns)
    final_sum_birth_o = 0
    final_sum_death_o = 0

    for item in whichPo[whichPo.index=="Births"].iloc[:,3:7].columns:

        final_sum_birth_o+= whichPo[whichPo.index=="Births"][item].sum()

    for item in whichPo[whichPo.index=="Deaths"].iloc[:,3:7].columns:

        final_sum_death_o+= whichPo[whichPo.index=="Deaths"][item].sum()

    # Build new dataframes
    deaths_births_o = pd.DataFrame({"Year": [str(year_q.value).split("-")[0] + ", births",\
                                           str(year_q.value).split("-")[0] + ", deaths"],
                  "Count":[final_sum_birth_o,final_sum_death_o]})

    deaths_births_n = pd.DataFrame({"Year": [str(year_e.value).split("-")[0] + ', births', \
                                             str(year_e.value).split("-")[0] + ", deaths"],
                  "Count":[final_sum_birth,final_sum_death]})

    # Plot
    deaths_births_o.iplot(kind='pie',
                      labels="Year",
                      values="Count",
                      title="Deaths and births in " + str(year_q.value).split("-")[0] )
    deaths_births_n.iplot(kind='pie',
                      labels="Year",
                      values="Count",
                     title="Deaths and births in " +  str(year_e.value).split("-")[0] )

#### Your comments here

Double click this cell to answer the questions. 

### What questions are we interested in answering

1. How does the ratio of births to deaths change as you explore different years?

2. Is this pattern similar across the different provinces?

3. Is there missing data?

<h2 align='center'>References</h2>

Statistics Canada. Table 17-10-0059-01, Estimates of the components of natural increase, quarterly.


[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)