# Application Test for Economist Position

**This notebook was submitted as part of a recruitment test for an economist position. Please feel free to comment, like, and share feedback! The instructions were as follows:**

Data Analysis and Writing Test (total estimated time: 2 hours)

You have 24 hours to complete this assignment. The 24-hour countdown begins after this
document and associated data file are sent to you.
Send the three outputs (data script, plot of data and short write-up) directly to xxx.

There are two elements to this test (A and B) which are explained below:

**A) Data Analysis** <br>
Objective: Using the data provided, generate a data visualization in a scripting language of
your choice (e.g., R, Python, SAS, Stata, etc.)

Deliverable:
> a) script file containing all loading, processing and export steps (e.g., *.do, *.r, *.py, etc.)

Time Allotment: A maximum of one (1) hour <br>
Detailed Instructions:
- Generate a **line plot** of the **annual employment growth rate** (i.e., year-over-year percent
changes) from **January 2007** to the most recent data point available (November 2019)
for the following **two provinces**: Alberta and Ontario
- All data manipulations and calculation of variables must be done in the scripting
language of choice. No changes should be made to the data in the CSV file prior to
loading it into your data environment
- The final script should show all steps in the process:
loading the data, manipulating the data and calculating required variables, generating the plot, and exporting the plot image. Including explanatory comments within the script is recommended, but not required.

**B) Writing Assignment** <br>
Objective: Write a blog post for a non-technical external audience explaining the most salient patterns shown in the data visualization produced in part A of this test.

Deliverable:
> a) MS Word file with a draft blog post

Time Allotment: A maximum of one (1) hour. <br>
Detailed Instructions:
- Describe the patterns observed in the plot generated, and provide some explanation as
to why the employment growth rates in the 2 provinces move together or diverge during
the past 13 years
- The blog post can be written in the language of your choice (English or French). If written in English, the post should be between 200 and 500 words in length; if written in French, between 300 and 800 words.
- Formatting of the text in the Word document does not matter.

**A) DATA ANALYSIS**

## Loading the data

In [None]:
#Matplotlib for data visualization
import matplotlib.pyplot as plt
#Numpy for aggregate statistics
import numpy as np
#Datetime for organizing dates
import datetime
#Pandas for data manipulation and analysis
import pandas as pd
df = pd.read_csv('../input/example_data.csv')

In [None]:
#First read the data: view all columns, all rows, and check if there is any missing or bad data

df.head(5)
#df.dtypes
#df.columns
#df.iloc[0:-1]
#df['month'].unique()

## Manipulating the data and calculating required variables

In [None]:
#Then, keep Alberta and Ontario only, from Jan 2007 to Nov 2019

df = df[['month','variable','sex','Alberta','Ontario']]

#df.loc[df['month'] == '2007-01']
#df.iloc[3347]

#Slice by position from row 3348 onward; note that the -1 stop excludes the final row of the file
df = df.iloc[3348:-1]
df

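Slicing by hard-coded row positions depends on the exact row order of the CSV. A minimal sketch of a label-based alternative follows, assuming `month` holds zero-padded `YYYY-MM` strings (as the commented-out lookup suggests). The `sample` frame and all of its values are invented for illustration and are not the test data.

```python
import pandas as pd

# Hypothetical miniature of the CSV layout; the numbers are invented.
sample = pd.DataFrame({
    'month': ['2006-12', '2007-01', '2019-11'],
    'variable': ['Employment'] * 3,
    'sex': ['Both sexes'] * 3,
    'Alberta': [1.0, 2.0, 3.0],
    'Ontario': [4.0, 5.0, 6.0],
})

# Zero-padded YYYY-MM strings sort chronologically, so a string comparison
# selects the same window as a date comparison would (both ends inclusive).
in_window = sample['month'].between('2007-01', '2019-11')
subset = sample.loc[in_window].reset_index(drop=True)
print(subset['month'].tolist())
```

Filtering on the month label keeps working if rows are added to or reordered in the file, which a positional slice does not guarantee.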
In [None]:
#For the purposes of this task, we are interested in the total employment for both sexes
df = df.loc[(df['variable'] == 'Employment') & (df['sex'] == 'Both sexes')]
df

In [None]:
#Drop redundant columns 'variable' and 'sex' because all values equal 'Employment' and 'Both sexes' respectively

df = df.drop(columns=['variable', 'sex'])

#Reset index for simplicity

df.reset_index(drop=True, inplace=True)
df

#Dataframe now represents total employment in Alberta and Ontario, by month, from January 2007 to November 2019

In [None]:
#Group data by year in order to calculate the annual growth rate
#Column month is an object and must be converted to datetime
df.dtypes

In [None]:
#Parse 'YYYY-MM' strings (e.g. '2007-01') into datetimes; the day defaults to the 1st
df['month'] = pd.to_datetime(df['month'].astype(str), format='%Y-%m')
df

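The `format` argument of `pd.to_datetime` takes strftime-style directives. As a quick sanity check, assuming the month strings look like `2007-01`, the standard library parses the same pattern with `%Y-%m`:

```python
from datetime import datetime

# Assumed input format: 'YYYY-MM', e.g. '2007-01'.
parsed = datetime.strptime('2007-01', '%Y-%m')

# With no day in the string, the day defaults to the first of the month.
print(parsed.year, parsed.month, parsed.day)
```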
In [None]:
#Check to see if month column is now a datetime object
df.dtypes

In [None]:
#Rename column month to Year, and for clarity rename Alberta to Alberta Employment and Ontario to Ontario Employment

df.columns = ['Year', 'Alberta Employment', 'Ontario Employment']

In [None]:
#Group data by year
grouped = df.groupby(df['Year'].map(lambda x: x.year), as_index=False)

#View months separated by year of employment in Alberta and Ontario
for year, group in grouped:
    print(year)
    print(group)
    
#Check to see if data is still accurate
#grouped.size()
#grouped.describe()

In [None]:
#Next step is to create dataframe by year from Jan 2007 to Nov 2019

#Include Jan 2007 in the new df so that the annual employment growth rate starts from Jan 2007
start = df.iloc[0:1]
#start

#Determine last value of each year (Dec 2007, Dec 2008, etc.) to use for calculating annual growth rate
df = grouped.last()
#df

#Combine start and df dataframes so our scope is now organized by year from Jan 2007 to Nov 2019
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
df = pd.concat([start, df])

#Reset the index
df.reset_index(drop=True, inplace=True)
df

In [None]:
#Determine annual growth rate year-over-year
df[['AC%','OC%']]=df.sort_values(['Year'])[['Alberta Employment','Ontario Employment']].pct_change()*100
df

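The cell above measures growth between year-end values. Another common reading of "year-over-year" compares each month with the same month one year earlier. A minimal sketch of that variant follows, using an invented monthly series rather than the test data; the names `levels` and `yoy` are illustrative only.

```python
import pandas as pd

# Invented monthly employment levels (thousands) covering 24 months.
months = pd.date_range('2007-01-01', periods=24, freq='MS')
levels = pd.Series([100.0 + i for i in range(24)], index=months)

# Compare each month with the same month 12 rows (one year) earlier.
yoy = levels.pct_change(periods=12) * 100

# The first 12 months have no prior-year value, so they are NaN.
print(yoy.dropna().round(2).head(3))
```

This variant yields one growth figure per month rather than one per year, so the resulting line plot would be monthly.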
In [None]:
#Double check 
((2052.9-2027.6)/2027.6)*100

In [None]:
#Compare the percentage change data
#df['AC%'].describe()
#df['OC%'].describe()

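Among other statistics, `describe()` reports the sample standard deviation, which summarizes how widely the yearly growth rates swing around their average. A sketch with made-up growth rates (not the test data) follows; it uses the standard library's `statistics.stdev`, which divides by n - 1 as the pandas default does.

```python
from statistics import mean, stdev

# Made-up annual growth rates in percent for two hypothetical regions.
volatile = [4.0, -2.0, 3.0, -1.0]
steady = [1.5, 0.5, 1.0, 1.0]

# Both series share the same average, but the first swings much more widely.
for name, rates in (('volatile', volatile), ('steady', steady)):
    print(name, round(mean(rates), 2), round(stdev(rates), 2))
```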
## Generating the plot and exporting the plot image

In [None]:
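#The style is named 'seaborn-v0_8-whitegrid' in matplotlib >= 3.6 and 'seaborn-whitegrid' in older versions
style = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'
plt.style.use(style)
plt.figure(figsize=(15,4))
#Use a distinct line color per province so the two series can be told apart
plt.plot('Year', 'AC%', data=df, marker='o', markerfacecolor='blue', markersize=5, color='skyblue', linewidth=4, label="Alberta")
plt.plot('Year', 'OC%', data=df, marker='o', markerfacecolor='darkolivegreen', markersize=5, color='olive', linewidth=4, label="Ontario")
plt.legend()
plt.xlabel('Year')
plt.ylabel('Percent')
plt.title('Annual Employment Growth Rate, Alberta and Ontario, Jan 2007 - Nov 2019')
#Export the plot image
plt.savefig('Annual_Employment_Growth_Rate_Alberta_Ontario_2007_2019.png')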

**B) WRITING ASSIGNMENT**

**Ontario sees employment growth rise in 2019 while Alberta declines**
<br> 9 November 2019

A recent study by the Labour Market Information Council (LMIC) comparing annual employment growth rates over the last 13 years shows that Ontario and Alberta experience similar fluctuations in employment growth, but Alberta's swings are more severe. Since the start of 2019, however, the two provinces have diverged: employment in Ontario grew by 2.89%, while employment in Alberta declined by 0.14%.

In previous years, fluctuations in employment growth were broadly similar in the two provinces. For example, both provinces experienced a drop and then a rise in employment growth during the recession and recovery that followed the economic crisis which began in 2007. An interesting finding is that employment growth fluctuates much more sharply in Alberta than in Ontario. The standard deviation of annual employment growth is 1.87% for Alberta compared to 1.21% for Ontario, which means that Alberta's employment growth typically strays further from its average than Ontario's does.

In 2019, Statistics Canada estimated that employment in Alberta's natural resources sector declined by 9.9%. So, although trends in employment growth for Ontario and Alberta tend to move together, why do they fluctuate to different degrees? How does each province act and react to shifting national and international environments?

The study conducted by LMIC covered all employment, both full-time and part-time, for both men and women. Further studies could extract insights from these different groups and include sector-specific data to look more closely at the types and conditions of employment growth in Ontario and Alberta.