# Canada's Unemployment Rate since 1976

Dataset Columns:
REF_DATE: Year and month (YYYY-MM)
GEO: Province or country
Sex: Male, Female, or Both
Age group: Range of ages in years
Employment: Number of people employed
Full-time employment: Number of people employed in a full-time job
Labour force: Number of civilian, non-institutionalized people 15 years of age and over who, during the reference week, were employed or unemployed.
Part-time employment: Number of people employed in a part-time job
Population: Number of people of working age, 15 years and over.
Unemployment: Number of people who, during the reference week, were without work, had looked for work in the past four weeks, and were available for work. People on layoff or with a new job starting in four weeks or less are also counted as unemployed.
Employment rate: Number of employed people expressed as a percentage of the population 15 years of age and over.
Participation rate: Number of labour force participants expressed as a percentage of the population 15 years of age and over.
Unemployment rate: Number of unemployed people expressed as a percentage of the labour force.
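
The three rate columns can be recomputed from the count columns: the employment and participation rates divide by Population, while the unemployment rate divides by Labour force. Below is a minimal sketch of that arithmetic using the Alberta, 1976-01, 15 to 24 years row from the dataset. The helper name `labour_rates` is my own and is not part of the dataset or of pandas, and rounding to one decimal is an assumption based on how the rates appear in the data.

```python
def labour_rates(employment, unemployment, labour_force, population):
    """Return (employment rate, participation rate, unemployment rate) in percent.

    Rounding to one decimal place is an assumption that matches how the
    rate columns are displayed in the dataset.
    """
    employment_rate = round(100 * employment / population, 1)
    participation_rate = round(100 * labour_force / population, 1)
    unemployment_rate = round(100 * unemployment / labour_force, 1)
    return employment_rate, participation_rate, unemployment_rate


# Counts taken from the Alberta, 1976-01, Both sexes, 15 to 24 years row.
rates = labour_rates(
    employment=231800,
    unemployment=20500,
    labour_force=252300,
    population=362300,
)
print(rates)  # (64.0, 69.6, 8.1), matching the rate columns of that row
```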



In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt



url = "https://raw.githubusercontent.com/rehan-nasir/unemployment_rate_canada/refs/heads/main/dataset/Unemployment_Canada_1976_present.csv"

labor_data = pd.read_csv(url, sep=",")

In [16]:
labor_data

Unnamed: 0,REF_DATE,GEO,Sex,Age group,Employment,Full-time employment,Labour force,Part-time employment,Population,Unemployment,Employment rate,Participation rate,Unemployment rate
0,1976-01,Alberta,Both sexes,15 to 24 years,231800.0,174900.0,252300.0,56900.0,362300.0,20500.0,64.0,69.6,8.1
1,1976-01,Alberta,Both sexes,15 to 64 years,802400.0,682100.0,837500.0,120300.0,1154800.0,35000.0,69.5,72.5,4.2
2,1976-01,Alberta,Both sexes,15 years and over,819500.0,693700.0,856500.0,125800.0,1276700.0,37000.0,64.2,67.1,4.3
3,1976-01,Alberta,Both sexes,25 to 54 years,491400.0,439800.0,505800.0,51600.0,661700.0,14400.0,74.3,76.4,2.8
4,1976-01,Alberta,Both sexes,25 years and over,587700.0,518800.0,604200.0,68900.0,914400.0,16500.0,64.3,66.1,2.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...
38980,2023-01,Saskatchewan,Both sexes,15 to 64 years,553900.0,458100.0,579800.0,95800.0,719100.0,25900.0,77.0,80.6,4.5
38981,2023-01,Saskatchewan,Both sexes,15 years and over,589000.0,479000.0,615500.0,110000.0,912000.0,26500.0,64.6,67.5,4.3
38982,2023-01,Saskatchewan,Both sexes,25 to 54 years,379900.0,337300.0,394500.0,42600.0,442400.0,14600.0,85.9,89.2,3.7
38983,2023-01,Saskatchewan,Both sexes,25 years and over,507400.0,436000.0,526200.0,71500.0,774200.0,18700.0,65.5,68.0,3.6


In [10]:
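labor_data.info  # no parentheses, so this shows the bound method rather than the summary; labor_data.info() prints dtypes and non-null counts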

<bound method DataFrame.info of       REF_DATE           GEO         Sex          Age group  Employment  \
0      1976-01       Alberta  Both sexes     15 to 24 years    231800.0   
1      1976-01       Alberta  Both sexes     15 to 64 years    802400.0   
2      1976-01       Alberta  Both sexes  15 years and over    819500.0   
3      1976-01       Alberta  Both sexes     25 to 54 years    491400.0   
4      1976-01       Alberta  Both sexes  25 years and over    587700.0   
...        ...           ...         ...                ...         ...   
38980  2023-01  Saskatchewan  Both sexes     15 to 64 years    553900.0   
38981  2023-01  Saskatchewan  Both sexes  15 years and over    589000.0   
38982  2023-01  Saskatchewan  Both sexes     25 to 54 years    379900.0   
38983  2023-01  Saskatchewan  Both sexes  25 years and over    507400.0   
38984  2023-01  Saskatchewan  Both sexes  55 years and over    127500.0   

       Full-time employment  Labour force  Part-time employment   P

In [12]:
labor_data.shape

(38985, 13)

In [14]:
labor_data.isna().sum()

REF_DATE                    0
GEO                         0
Sex                         0
Age group                   0
Employment                  0
Full-time employment     1695
Labour force                0
Part-time employment     1696
Population                  0
Unemployment                6
Employment rate             0
Participation rate          0
Unemployment rate           6
dtype: int64

Full-time employment has 1,695 null values and Part-time employment has 1,696. Unemployment and Unemployment rate have 6 each.

In [None]:
labor_data[labor_data.isna().any(axis=1)]



Unnamed: 0,REF_DATE,GEO,Sex,Age group,Employment,Full-time employment,Labour force,Part-time employment,Population,Unemployment,Employment rate,Participation rate,Unemployment rate
12,1976-01,Canada,Both sexes,15 to 19 years,999200.0,,1178900.0,,2330000.0,179700.0,42.9,50.6,15.2
16,1976-01,Canada,Both sexes,20 to 24 years,1500400.0,,1670500.0,,2179800.0,170100.0,68.8,76.6,10.2
19,1976-01,Canada,Both sexes,55 to 64 years,955600.0,,999700.0,,1886900.0,44100.0,50.6,53.0,4.4
81,1976-02,Canada,Both sexes,15 to 19 years,1000500.0,,1177800.0,,2333000.0,177300.0,42.9,50.5,15.1
85,1976-02,Canada,Both sexes,20 to 24 years,1508600.0,,1679300.0,,2185200.0,170700.0,69.0,76.8,10.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
38863,2022-12,Canada,Both sexes,20 to 24 years,1708900.0,,1864200.0,,2404600.0,155300.0,71.1,77.5,8.3
38866,2022-12,Canada,Both sexes,55 to 64 years,3300500.0,,3451000.0,,5144500.0,150600.0,64.2,67.1,4.4
38928,2023-01,Canada,Both sexes,15 to 19 years,959600.0,,1090200.0,,2112500.0,130600.0,45.4,51.6,12.0
38932,2023-01,Canada,Both sexes,20 to 24 years,1726700.0,,1879500.0,,2412000.0,152700.0,71.6,77.9,8.1


Looking at the rows that contain null values, the ones displayed all have "Canada" under GEO.
On further inspection, these nulls come from age groups that exist only for "Canada". I can verify this by comparing the age group values.

In [78]:
country_age_groups = labor_data[labor_data["GEO"] == "Canada"]["Age group"]
country_age_groups.value_counts()


Age group
15 to 19 years       565
15 to 24 years       565
15 to 64 years       565
15 years and over    565
20 to 24 years       565
25 to 54 years       565
25 years and over    565
55 to 64 years       565
55 years and over    565
Name: count, dtype: int64

In [79]:
provincial_age_groups = labor_data[labor_data["GEO"] != "Canada"]["Age group"]
provincial_age_groups.value_counts()

Age group
15 to 24 years       5650
15 to 64 years       5650
15 years and over    5650
25 to 54 years       5650
25 years and over    5650
55 years and over    5650
Name: count, dtype: int64

In [86]:
# ^ is the symmetric difference: age groups present in one set but not the other
unshared_age_groups = list(set(country_age_groups.values) ^ set(provincial_age_groups.values))
unshared_age_groups

['55 to 64 years', '15 to 19 years', '20 to 24 years']
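
A more direct way to see where the nulls sit is to count the null-containing rows per GEO and Age group. The sketch below uses a tiny stand-in frame built from rows shown earlier, and the helper `null_rows_by_group` is a name I made up for illustration. On the full `labor_data`, any group in the result other than the three Canada-only age groups would point to nulls elsewhere, which the counts of 1,696 and 6 in the null summary suggest may exist.

```python
import numpy as np
import pandas as pd

# Small stand-in for labor_data, using values from rows displayed above.
sample = pd.DataFrame({
    "GEO": ["Alberta", "Canada", "Canada", "Canada"],
    "Age group": ["15 to 24 years", "15 to 19 years",
                  "20 to 24 years", "55 to 64 years"],
    "Employment": [231800.0, 999200.0, 1500400.0, 955600.0],
    "Full-time employment": [174900.0, np.nan, np.nan, np.nan],
    "Part-time employment": [56900.0, np.nan, np.nan, np.nan],
})


def null_rows_by_group(df):
    """Count rows with at least one null, broken down by GEO and Age group."""
    has_null = df.isna().any(axis=1)
    return df[has_null].groupby(["GEO", "Age group"]).size()


null_counts = null_rows_by_group(sample)
print(null_counts)
```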

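Age groups 55 to 64, 15 to 19, and 20 to 24 years appear only at the country level, not in the provincial data. To keep things consistent, I will drop the rows for these age groups. Since these rows have null Full-time employment and Part-time employment values, dropna() removes them, together with every other row that contains a null (for example, the 6 rows missing Unemployment).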

In [87]:
clean_labor_data = labor_data.dropna()
clean_labor_data

Unnamed: 0,REF_DATE,GEO,Sex,Age group,Employment,Full-time employment,Labour force,Part-time employment,Population,Unemployment,Employment rate,Participation rate,Unemployment rate
0,1976-01,Alberta,Both sexes,15 to 24 years,231800.0,174900.0,252300.0,56900.0,362300.0,20500.0,64.0,69.6,8.1
1,1976-01,Alberta,Both sexes,15 to 64 years,802400.0,682100.0,837500.0,120300.0,1154800.0,35000.0,69.5,72.5,4.2
2,1976-01,Alberta,Both sexes,15 years and over,819500.0,693700.0,856500.0,125800.0,1276700.0,37000.0,64.2,67.1,4.3
3,1976-01,Alberta,Both sexes,25 to 54 years,491400.0,439800.0,505800.0,51600.0,661700.0,14400.0,74.3,76.4,2.8
4,1976-01,Alberta,Both sexes,25 years and over,587700.0,518800.0,604200.0,68900.0,914400.0,16500.0,64.3,66.1,2.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...
38980,2023-01,Saskatchewan,Both sexes,15 to 64 years,553900.0,458100.0,579800.0,95800.0,719100.0,25900.0,77.0,80.6,4.5
38981,2023-01,Saskatchewan,Both sexes,15 years and over,589000.0,479000.0,615500.0,110000.0,912000.0,26500.0,64.6,67.5,4.3
38982,2023-01,Saskatchewan,Both sexes,25 to 54 years,379900.0,337300.0,394500.0,42600.0,442400.0,14600.0,85.9,89.2,3.7
38983,2023-01,Saskatchewan,Both sexes,25 years and over,507400.0,436000.0,526200.0,71500.0,774200.0,18700.0,65.5,68.0,3.6


In [89]:
clean_labor_data.isna().sum()

REF_DATE                 0
GEO                      0
Sex                      0
Age group                0
Employment               0
Full-time employment     0
Labour force             0
Part-time employment     0
Population               0
Unemployment             0
Employment rate          0
Participation rate       0
Unemployment rate        0
dtype: int64
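
Note that `dropna()` removes every row with any null, not only the Canada-only age groups. If the goal is to drop exactly those age groups and nothing else, filtering on Age group is one alternative. The sketch below compares the two on a small stand-in frame. The "Example province" row and its values are hypothetical and exist only to show the difference. The name `canada_only_age_groups` is also mine.

```python
import numpy as np
import pandas as pd

# Age groups reported only for GEO == "Canada".
canada_only_age_groups = ["15 to 19 years", "20 to 24 years", "55 to 64 years"]

# Stand-in for labor_data. The last row is hypothetical: a shared age group
# with a missing Unemployment value.
sample = pd.DataFrame({
    "GEO": ["Alberta", "Canada", "Canada", "Example province"],
    "Age group": ["15 to 24 years", "15 to 19 years",
                  "20 to 24 years", "15 to 24 years"],
    "Full-time employment": [174900.0, np.nan, np.nan, 1000.0],
    "Unemployment": [20500.0, 179700.0, 170100.0, np.nan],
})

# Option 1: drop only the Canada-only age groups; other nulls are kept.
filtered = sample[~sample["Age group"].isin(canada_only_age_groups)]

# Option 2: drop every row that has any null (what dropna() does).
dropped_any_null = sample.dropna()

print(len(filtered), len(dropped_any_null))
```

Which option fits depends on whether the remaining rows with nulls should be kept for later handling or removed up front.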

I'll check for any duplicate rows now.

In [92]:
clean_labor_data.duplicated()

0        False
1        False
2        False
3        False
4        False
         ...  
38980    False
38981    False
38982    False
38983    False
38984    False
Length: 37283, dtype: bool

Fortunately, there are no duplicates in our dataset. 
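
Because the printed Series is truncated to the first and last rows, a single count is easier to read than the full boolean output. The sketch below sums the duplicate mask, and also checks for repeated key combinations, which can catch rows that repeat the same month, region, sex, and age group with different values. The sample rows are partly hypothetical (the last two are deliberate duplicates), and the choice of key columns is my assumption about what identifies a row.

```python
import pandas as pd

# Stand-in frame: row 2 is an exact copy of row 0, and row 3 repeats the
# keys of row 1 with a different (hypothetical) Employment value.
sample = pd.DataFrame({
    "REF_DATE": ["1976-01", "1976-01", "1976-01", "1976-01"],
    "GEO": ["Alberta", "Alberta", "Alberta", "Alberta"],
    "Sex": ["Both sexes", "Both sexes", "Both sexes", "Both sexes"],
    "Age group": ["15 to 24 years", "15 to 64 years",
                  "15 to 24 years", "15 to 64 years"],
    "Employment": [231800.0, 802400.0, 231800.0, 999999.0],
})

# Number of rows that exactly repeat an earlier row.
n_full_duplicates = int(sample.duplicated().sum())

# Number of rows that repeat an earlier row's key columns (assumed keys).
key_columns = ["REF_DATE", "GEO", "Sex", "Age group"]
n_key_duplicates = int(sample.duplicated(subset=key_columns).sum())

print(n_full_duplicates, n_key_duplicates)
```

On the real data, `clean_labor_data.duplicated().sum()` returning 0 would confirm there are no exact duplicates.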