![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Births, Deaths and Marriages in Canada

## About this notebook

In this notebook I download a full dataset from Statistics Canada (StatsCan) and explore how the number of births, deaths and marriages in Canada has changed over the years.

## Data Source

The dataset is obtained from https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710005901 (Product ID 17-10-0059-01).


## What are the questions I am interested in answering? 

1. How has the number of deaths and births changed in Canada since 1946? 

2. What is the average number of births and deaths for each province between 1946 and 2019?

3. How does the ratio of births to deaths compare between any two given years?

I am interested in learning how the number of births and deaths has changed over the years. In this notebook, I assume the learner is comfortable with Jupyter notebooks and the Python programming language, and can read line charts as well as pie charts.

In [None]:
# Installing required module
!pip install tqdm

In [None]:
# Import necessary Stats Can functions
%run -i ./StatsCan/helpers.py
%run -i ./StatsCan/scwds.py
%run -i ./StatsCan/sc.py
# Import necessary libraries
import datetime as dt
import pandas as pd
import json
import datetime
from tqdm import tnrange, tqdm_notebook
from time import sleep
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Downloading Stats Can Data

Run the code below to get the dataset found here: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710005901 

In [None]:
# Download data
# Product ID of the StatsCan table
productId = '17-10-0059-01'
download_tables(str(productId))
df_fullDATA = zip_table_to_dataframe(productId)

# Clean up full dataset - remove internal use columns
cols = list(df_fullDATA.loc[:,'REF_DATE':'UOM'])+ ['SCALAR_FACTOR'] +  ['VALUE']
df_less = df_fullDATA[cols]
df_less2 = df_less.drop(["DGUID"], axis=1)

# Display only first five entries
df_less2.head(5)
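The column clean-up above relies on a label slice with `.loc`, which includes both endpoints. Here is a minimal sketch of the same idea on a toy dataframe. The values are made up, and the column layout (including `UOM_ID`) is an assumption that only mimics the StatsCan table.

```python
import pandas as pd

# Toy frame with made-up values; the column layout is an assumption
# that mimics the StatsCan table, not the real data.
toy = pd.DataFrame({
    "REF_DATE": ["1946-01", "1946-04"],
    "GEO": ["Canada", "Canada"],
    "DGUID": ["x", "x"],
    "Estimates": ["Births", "Births"],
    "UOM": ["Persons", "Persons"],
    "UOM_ID": [1, 1],
    "SCALAR_FACTOR": ["units", "units"],
    "VALUE": [10.0, 12.0],
})

# A .loc label slice keeps both endpoints, so this selects REF_DATE through UOM.
keep = list(toy.loc[:, "REF_DATE":"UOM"].columns) + ["SCALAR_FACTOR", "VALUE"]

# Drop the identifier column we do not need for plotting.
toy_clean = toy[keep].drop(columns=["DGUID"])
print(list(toy_clean.columns))
```

Columns that sit after `UOM` in the table, such as `UOM_ID` here, are left out unless they are added back by name.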

## Select Data Subsets


We see that the data contains a GEO column with names of provinces, as well as Canada, along with the number of births, deaths and marriages. 

Let's take a look at the number of deaths, births and marriages in Canada between 1946 and 2020.

In [None]:
# Subsetting the data
initial_date = "1946"
final_date = "2020"
location="Canada"

df_sub = df_less2[(df_less2["REF_DATE"]>=initial_date) & 
                  (df_less2["REF_DATE"]<=final_date) &
                  (df_less2["GEO"]==location)]

display(df_sub.head(3))
display(df_sub.tail(3))
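The subsetting step combines three conditions into one boolean mask. Below is a small sketch of that pattern on made-up values, assuming `REF_DATE` holds timestamps (I have not checked how the helper functions parse that column). The explicit bounds `start` and `end` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Toy data with made-up values; REF_DATE is assumed to hold timestamps.
toy = pd.DataFrame({
    "REF_DATE": pd.to_datetime(
        ["1945-10-01", "1946-01-01", "2020-01-01", "2020-10-01"]),
    "GEO": ["Canada", "Canada", "Ontario", "Canada"],
    "VALUE": [1.0, 2.0, 3.0, 4.0],
})

# Explicit bounds: first day of the first year, last day of the final year.
start = pd.Timestamp("1946-01-01")
end = pd.Timestamp("2020-12-31")

# Each comparison yields a boolean Series; & keeps rows where all are True.
mask = (
    (toy["REF_DATE"] >= start)
    & (toy["REF_DATE"] <= end)
    & (toy["GEO"] == "Canada")
)
subset = toy[mask]
print(subset)
```

If `REF_DATE` holds timestamps, an upper bound written as `"2020"` would likely be read as 1 January 2020, so later quarters of that year would drop out. An explicit end-of-year bound avoids that.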


Due to the size of the table, it won't make sense to stare at it, so let's create a line plot with the number of births, deaths and marriages over time.

Note: you can change the value of `location` for any province you are interested in and create the corresponding plot.

In [None]:
# Plotting
fig = make_subplots(rows=1, cols=1, specs=[[{'type':'scatter'}]])

df_sub_de = df_sub[(df_sub["Estimates"]=="Deaths")]
df_sub_ma = df_sub[(df_sub["Estimates"]=="Marriages")]
df_sub_bi = df_sub[(df_sub["Estimates"]=="Births")]

fig.add_trace(go.Scatter(x=df_sub_de["REF_DATE"],
                        y=df_sub_de["VALUE"], name="Deaths",line=dict(color="#66c2a5")))

fig.add_trace(go.Scatter(x=df_sub_bi["REF_DATE"],
                        y=df_sub_bi["VALUE"],name="Births",line=dict(color="#fc8d62")))

fig.add_trace(go.Scatter(x=df_sub_ma["REF_DATE"],
                        y=df_sub_ma["VALUE"],name="Marriages",line=dict(color="#8da0cb")))

fig.update_layout(
    title_text="<b>Number of births, deaths and marriages in " + str(location) + " between " + initial_date + " and " + final_date + "</b>")

# Set x-axis title
fig.update_xaxes(title_text="<b>Year</b>")

# Set y-axes titles
fig.update_yaxes(title_text="<b>Size of estimate (units) </b>")

fig.show()

### Observations

We see that Canada-wide, the number of births increased in the 1960s and then decreased in the 1970s. It has remained relatively stable from the 1990s onward.

We also see that, for any given year, the number of births is higher during January and April, and lower during October. 

The number of deaths has increased steadily over time. For any given year, the number of deaths reaches its highest point during January, and its lowest during July and April.

There is a large gap in the number of marriages between 1970 and 1980, followed by a discontinuity in the data after 2004, shortly after same-sex marriage was recognized as legal. See [here](https://en.wikipedia.org/wiki/Same-sex_marriage_in_Canada#Recognition_of_foreign_legal_unions). The highest number of marriages tends to be recorded during July each year, while the lowest tends to be recorded during January.
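One way to check seasonal patterns like the ones described above is to average the values by month of `REF_DATE`. This is only a sketch on made-up numbers, assuming `REF_DATE` holds timestamps and that each quarter is labelled by a single month.

```python
import pandas as pd

# Made-up quarterly values for two years; not the real StatsCan numbers.
toy = pd.DataFrame({
    "REF_DATE": pd.to_datetime([
        "2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01",
        "2001-01-01", "2001-04-01", "2001-07-01", "2001-10-01",
    ]),
    "VALUE": [100.0, 110.0, 105.0, 90.0, 102.0, 112.0, 107.0, 92.0],
})

# Group by the month number and average across years.
by_month = toy.groupby(toy["REF_DATE"].dt.month)["VALUE"].mean()

# Months with the highest and lowest average value.
print(by_month)
print("highest:", by_month.idxmax(), "lowest:", by_month.idxmin())
```

To apply this to the notebook data, you would swap `toy` for one of the per-estimate subsets, such as `df_sub_bi`.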

___ 
Let's now take a look at the average number of deaths and births for each of the provinces.

In [None]:
# Subsetting the data
initial_date = "1946"
final_date = "2020"

# Births
all_locations_b = df_less2[(df_less2["REF_DATE"]>=initial_date) & 
                  (df_less2["REF_DATE"]<=final_date) &
                        (df_less2["Estimates"]=="Births")]

# Average only the numeric VALUE column (other columns are not numeric)
average_b = all_locations_b.groupby("GEO")["VALUE"].mean().reset_index()

# Deaths
all_locations_d = df_less2[(df_less2["REF_DATE"]>=initial_date) & 
                  (df_less2["REF_DATE"]<=final_date) &
                        (df_less2["Estimates"]=="Deaths")]

# Average only the numeric VALUE column (other columns are not numeric)
average_d = all_locations_d.groupby("GEO")["VALUE"].mean().reset_index()

####### Plotting
fig = make_subplots(rows=1, cols=1, specs=[[{'type':'bar'}]])

fig.add_trace(go.Bar(x=average_b["GEO"],
                        y=average_b["VALUE"], name="Births")).update_xaxes(categoryorder='total ascending')

fig.add_trace(go.Bar(x=average_d["GEO"],
                        y=average_d["VALUE"], name="Deaths")).update_xaxes(categoryorder='total ascending')

fig.update_layout(
    title_text="<b>Average number of births and deaths by location between " + initial_date + " and " + final_date + "</b>")

# Set x-axis title
fig.update_xaxes(title_text="<b>Location</b>")

# Set y-axes titles
fig.update_yaxes(title_text="<b>Size of estimate (units) </b>")

colors = ['#af8dc3','#f7f7f7','#7fbf7b']

fig.show()

### Observations

On average, Ontario registered the highest number of both births and deaths between 1946 and 2019, followed by Quebec. Yukon recorded the lowest number of both. Note that the GEO column also contains the Canada-wide total, which should be set aside when comparing provinces and territories.
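To rank locations without the national total getting in the way, the `Canada` rows can be filtered out before averaging. The following is a minimal sketch on made-up values, not the real StatsCan numbers.

```python
import pandas as pd

# Made-up values; only the GEO/VALUE layout mirrors the notebook data.
toy = pd.DataFrame({
    "GEO": ["Canada", "Canada", "Ontario", "Ontario", "Yukon", "Yukon"],
    "VALUE": [300.0, 320.0, 120.0, 130.0, 1.0, 3.0],
})

# Drop the national total so it does not dominate the ranking.
provinces_only = toy[toy["GEO"] != "Canada"]

# Average VALUE per location, largest first.
avg = (
    provinces_only.groupby("GEO")["VALUE"]
    .mean()
    .sort_values(ascending=False)
)
print(avg)
```

The first and last entries of `avg` then give the locations with the highest and lowest averages.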


## Comparing proportion of Births and Deaths between a given year and 2019

Let's now take a look at the ratio of births to deaths for two years: 1946 and 2019 in all of Canada. 

Note that you can change the value of the `new_location` variable below to any of the provinces listed in the dataframe.

In [None]:
# Selecting new location
new_location = "Canada"
# Subsetting dataframe
df_sub_n = df_less2[(df_less2["GEO"]==new_location)]

# Getting sum of estimates for 1946 (births and deaths only)
sum_of_estimates_1946 = df_sub_n[(df_sub_n["REF_DATE"]>='1946') & 
                               (df_sub_n['REF_DATE']<'1947') & 
                              (df_sub_n["Estimates"] !="Marriages")].groupby("Estimates")["VALUE"].sum()
sum_of_estimates_1946 = sum_of_estimates_1946.reset_index()

# Getting sum of estimates for 2019 (same location and filters as for 1946)
sum_of_estimates_2019 = df_sub_n[(df_sub_n["REF_DATE"]>='2019') & 
                               (df_sub_n['REF_DATE']<'2020') & 
                              (df_sub_n["Estimates"] !="Marriages")].groupby("Estimates")["VALUE"].sum()
sum_of_estimates_2019 = sum_of_estimates_2019.reset_index()


# Display
print("Estimate sum: 1946")
display(sum_of_estimates_1946)
print("Estimate sum: 2019")
display(sum_of_estimates_2019)

In [None]:
# Ratio of births to deaths in 1946 (values copied from the table above)
331471.0/115358.0
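Rather than copying numbers by hand, the ratio can be computed from the summed table. The sketch below rebuilds a small frame shaped like `sum_of_estimates_1946`, using the two 1946 values from the cell above. The names `sums_1946`, `values`, `ratio` and `death_share` are hypothetical and only used here.

```python
import pandas as pd

# Frame shaped like sum_of_estimates_1946, with the two 1946 values
# taken from the cell above.
sums_1946 = pd.DataFrame({
    "Estimates": ["Births", "Deaths"],
    "VALUE": [331471.0, 115358.0],
})

# Index by estimate name so each value can be looked up by label.
values = sums_1946.set_index("Estimates")["VALUE"]

# Births per death, and deaths as a share of births plus deaths.
ratio = values["Births"] / values["Deaths"]
death_share = values["Deaths"] / values.sum()

print(round(ratio, 2), round(death_share, 3))
```

The same lines work for 2019 if `sums_1946` is replaced with `sum_of_estimates_2019`, which makes the two years easy to compare.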

In [None]:
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=sum_of_estimates_1946["Estimates"], values=sum_of_estimates_1946['VALUE'], 
                     name="Estimates",scalegroup='one'),1, 1)
fig.add_trace(go.Pie(labels=sum_of_estimates_2019["Estimates"], values=sum_of_estimates_2019['VALUE'], 
                     name="Estimates",scalegroup='one'),1,2)

# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent+name")

fig.update_layout(
    title_text=str(new_location) + ": Estimate number of births and deaths in 1946 (left), and 2019 (right)",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='1946', x=0.20, y=0.5, font_size=15, showarrow=False),
                 dict(text='2019', x=0.80, y=0.5, font_size=15, showarrow=False)])


colors = ['#f1a340', '#998ec3']

fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                   marker=dict(colors=colors, line=dict(color='#000000', width=2)))

fig.show()


### Observations

We see that in 1946, out of all births and deaths in Canada, deaths account for less than one third of the total and births for over two thirds.

In 2019, deaths account for less than half of the total and births for over half.

This indicates that the ratio of deaths to births increased from 1946 to 2019: deaths made up a larger share of the total in 2019 than in 1946, although births still outnumbered deaths.

## Conclusions

We learned in this notebook how the number of deaths, births and marriages changed between 1946 and 2019. We learned that the marriage data in this table stops after 2004, shortly after same-sex marriage was legalized.

We also learned that Ontario registers, on average, both the highest number of births and the highest number of deaths, relative to all other provinces. 

We learned that the ratio of births to deaths has changed over time. For instance, in 1946, the ratio of births to deaths was about 2.9. In 2019, this ratio was about 1.3. If it keeps decreasing and falls below 1, deaths would outnumber births, and the Canadian population could decline unless migration makes up the difference.

<h2 align='center'>References</h2>

Statistics Canada. Table 17-10-0059-01: Estimates of the components of natural increase, quarterly.


[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)