In [1]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')


![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)  


<h1 align='center'>Exploring the Python API for the Statistics Canada New Data Model (NDM)</h1>

<h4 align='center'>Laura Gutierrez Funderburk $\mid$ November 1 2018</h4>

<h2 align='center'>Abstract</h2>

In this notebook we explore the functionality of stats_can, a Python library for the Statistics Canada API developed by Ian Preston:
https://anaconda.org/ian.e.preston/stats_can

<h2 align='center'>About Stats Canada</h2>

Statistics Canada is Canada's national statistical office. The agency ensures Canadians have the key information on Canada's economy, society and environment that they require to function effectively as citizens and decision makers.


![Stats Canada Main Page](./StatsCanada.png)

URL: https://www.statcan.gc.ca/eng/start

<h2 align='center'>About the Stats Can Python Library</h2>

The stats_can library retrieves up-to-date data from the Statistics Canada API, so tables and series can be explored directly from Python without browsing the Statistics Canada website.

<h2 align='center'>Exploring the Stats Can Python Library using Jupyter Notebooks</h2>

We can explore the database from two different angles:

<h4>Retrieve the latest list of series:</h4> This option returns a list of recently changed series, each pointing to the table (dataset) it belongs to.

<h4>Retrieve a particular study:</h4> This option gives access to the metadata and data pertaining to a particular study.

<h2 align='center'>Retrieve the latest list of series</h2>

In this section we explore the most recently changed series and keep the first 25 for inspection. We use the stats_can Python library to get that data, without navigating the entire Statistics Canada database.

In [2]:
import datetime as dt
import json
from pprint import pprint

import matplotlib.pyplot as plt  # pyplot, not the top-level package, is the module conventionally aliased as plt
import pandas as pd
import stats_can

In [3]:
%run -i ./stats_can/scwds.py

In [4]:
%run -i ./stats_can/sc.py

In [5]:
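# Fetch the list of changed series and load it into a DataFrame
changed_series = stats_can.get_changed_series_list()
changed_series_df = pd.DataFrame.from_dict(changed_series)
# Keep only the first 25 rows for display
short_series_list = changed_series_df.head(n=25)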

In [6]:
short_series_list

Unnamed: 0,coordinate,productId,releaseTime,responseStatusCode,vectorId
0,1.14.0.0.0.0.0.0.0.0,33100036,2018-11-01T08:30,0,111666237
1,1.24.0.0.0.0.0.0.0.0,33100036,2018-11-01T08:30,0,111666247
2,1.22.0.0.0.0.0.0.0.0,10100139,2018-11-01T08:30,0,39061
3,1.24.0.0.0.0.0.0.0.0,10100139,2018-11-01T08:30,0,39063
4,1.13.0.0.0.0.0.0.0.0,10100139,2018-11-01T08:30,0,39051
5,1.53.0.0.0.0.0.0.0.0,10100144,2018-11-01T08:30,0,80691306
6,1.55.0.0.0.0.0.0.0.0,10100144,2018-11-01T08:30,0,80691308
7,1.24.0.0.0.0.0.0.0.0,10100107,2018-11-01T08:30,0,36646
8,1.15.0.0.0.0.0.0.0.0,10100107,2018-11-01T08:30,0,36637
9,1.14.0.0.0.0.0.0.0.0,10100125,2018-11-01T08:30,0,19457778
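
Several of the series above belong to the same table. As a minimal sketch, assuming the rows shown above (copied by hand here, so the cell runs without calling the API), we can count the changed series per `productId` and collect the distinct table identifiers. The names `sample`, `series_per_table` and `product_ids` are illustrative and not part of stats_can.

```python
import pandas as pd

# Rows copied by hand from the output above; a live call will return different values.
sample = pd.DataFrame(
    {
        "coordinate": [
            "1.14.0.0.0.0.0.0.0.0", "1.24.0.0.0.0.0.0.0.0", "1.22.0.0.0.0.0.0.0.0",
            "1.24.0.0.0.0.0.0.0.0", "1.13.0.0.0.0.0.0.0.0", "1.53.0.0.0.0.0.0.0.0",
            "1.55.0.0.0.0.0.0.0.0", "1.24.0.0.0.0.0.0.0.0", "1.15.0.0.0.0.0.0.0.0",
            "1.14.0.0.0.0.0.0.0.0",
        ],
        "productId": [
            33100036, 33100036, 10100139, 10100139, 10100139,
            10100144, 10100144, 10100107, 10100107, 10100125,
        ],
        "releaseTime": ["2018-11-01T08:30"] * 10,
        "vectorId": [
            111666237, 111666247, 39061, 39063, 39051,
            80691306, 80691308, 36646, 36637, 19457778,
        ],
    }
)

# Parse the timestamps so they can be sorted or filtered as dates.
sample["releaseTime"] = pd.to_datetime(sample["releaseTime"])

# Count how many changed series belong to each table (productId).
series_per_table = sample.groupby("productId")["vectorId"].count()

# Distinct table identifiers, in order of first appearance, as strings.
product_ids = sample["productId"].drop_duplicates().astype(str).tolist()

print(series_per_table)
print(product_ids)
```

The same two steps should apply unchanged to `changed_series_df`, provided the live response has the columns shown above.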


<h2 align='center'>Retrieve a particular dataset</h2>


We can use the values in the productId column to retrieve what is known as "cube" or "table" metadata. Metadata describe a table (its title, date range and size) and provide enough information to understand the type of study without downloading the data itself. This is useful because some Statistics Canada tables are large.

Metadata can be requested from the Statistics Canada database. The response is a JSON file that can be parsed for further processing.

Let us pick the first entry, with productId 33100036, and obtain its metadata.

In [7]:
# Download the first entry. When this notebook was written, that table had productId 33100036.
download_tables("33100036")
# Load the resulting metadata file
with open('33100036.json') as f:
    data = json.load(f)

In [8]:
# Print every metadata field except the last three keys
keys_names = list(data.keys())
for key in keys_names[:-3]:
    print(f"{key}:\t{data[key]}")

responseStatusCode:	0
productId:	33100036
cansimId:	176-0080
cubeTitleEn:	Daily average foreign exchange rates in Canadian dollars, Bank of Canada
cubeTitleFr:	Taux de change moyens quotidiens en dollars canadiens, Banque du Canada
cubeStartDate:	1981-05-04
cubeEndDate:	2018-10-30
nbSeriesCube:	27
nbDatapointsCube:	30941
archiveStatusCode:	2
archiveStatusEn:	CURRENT - a cube available to the public and that is current
archiveStatusFr:	ACTIF - un cube qui est disponible au public et qui est toujours mise a jour
subjectCode:	['1004', '330304']
surveyCode:	['7502']
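
The `cubeStartDate` and `cubeEndDate` fields give the period the table covers. A minimal sketch, assuming a hand-copied subset of the fields printed above instead of the downloaded file (the variable names are illustrative):

```python
import json
from datetime import date

# Subset of the metadata printed above, copied by hand for illustration.
metadata_json = """
{
  "productId": "33100036",
  "cubeTitleEn": "Daily average foreign exchange rates in Canadian dollars, Bank of Canada",
  "cubeStartDate": "1981-05-04",
  "cubeEndDate": "2018-10-30",
  "nbSeriesCube": 27,
  "nbDatapointsCube": 30941
}
"""

metadata = json.loads(metadata_json)

# The dates are ISO formatted (YYYY-MM-DD), so date.fromisoformat can parse them.
start = date.fromisoformat(metadata["cubeStartDate"])
end = date.fromisoformat(metadata["cubeEndDate"])

# Approximate length of the covered period in years.
span_years = (end - start).days / 365.25

print(f"{metadata['cubeTitleEn']}")
print(f"Covers {start} to {end} (about {span_years:.1f} years)")
```

With the `data` dictionary loaded earlier, the same lookups would be `data["cubeStartDate"]` and `data["cubeEndDate"]`.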


We see that this table contains data on "Daily average foreign exchange rates in Canadian dollars, Bank of Canada" ("Taux de change moyens quotidiens en dollars canadiens, Banque du Canada"). According to cubeStartDate and cubeEndDate, it covers the period from May 4, 1981 to October 30, 2018.

We can download the data and explore it with the Python libraries Matplotlib and pandas.

In [9]:
pd.read_csv("./exchange.csv")

Unnamed: 0.1,Unnamed: 0,Canada,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,Type of currency,"October 26, 2018","October 27, 2018","October 28, 2018","October 29, 2018","October 30, 2018"
1,"Australian dollar, daily average",0.9274,0,0,0.9279,0.9331
2,"Brazilian real, daily average",0.357,0,0,0.3582,0.3541
3,"Chinese renminbi, daily average",0.1887,0,0,0.1885,0.1885
4,"European euro, daily average",1.4921,0,0,1.4935,1.4917
5,"Hong Kong dollar, daily average",0.1672,0,0,0.1673,0.1674
6,"Indian rupee, daily average",0.01784,0,0,0.01787,0.01784
7,"Indonesian rupiah, daily average",0.000086,0,0,0.000086,0.000086
8,"Japanese yen, daily average",0.01172,0,0,0.01167,0.01164
9,"Malaysian ringgit, daily average",0.314,0,0,0.3138,0.314
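
In the table above, the real column names (the dates) ended up in the first data row, and the weekend columns contain zeros. A minimal sketch of one way to tidy this up follows. It assumes the file has a title row followed by the header row, which is how the output above reads, and it uses a short hypothetical excerpt in `csv_text` rather than `exchange.csv` itself. Treating a rate of 0 as "no rate published" is also an assumption.

```python
import io

import pandas as pd

# Hypothetical excerpt that mimics the layout displayed above:
# a title row ("Canada"), then the header row, then one row per currency.
csv_text = ''',Canada,,,,
Type of currency,"October 26, 2018","October 27, 2018","October 28, 2018","October 29, 2018","October 30, 2018"
"Australian dollar, daily average",0.9274,0,0,0.9279,0.9331
"European euro, daily average",1.4921,0,0,1.4935,1.4917
"Japanese yen, daily average",0.01172,0,0,0.01167,0.01164
'''

# header=1 takes the column names from the second line;
# index_col=0 labels each row by its currency name.
rates = pd.read_csv(io.StringIO(csv_text), header=1, index_col=0)

# Convert the date labels into timestamps.
rates.columns = pd.to_datetime(rates.columns, format="%B %d, %Y")

# Assumption: a rate of exactly 0 means no rate was published that day
# (October 27 and 28, 2018 fell on a weekend), so mark those cells as missing.
rates = rates.mask(rates == 0)

# Average rate per currency over the days that have data.
mean_rates = rates.mean(axis=1)
print(mean_rates)
```

To try this on the real file, replace `io.StringIO(csv_text)` with `"./exchange.csv"` and check that the header row is where this sketch assumes it is. Once the columns are dates, `rates.T.plot()` draws one line per currency.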


![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)