# 🧪 Lab 1: Accessing Statistics Canada Data via API

**Objective**: Use Python to search for and retrieve datasets from Statistics Canada using the `stats_can` library.

**Theory**:
- Statistics Canada provides a Web Data Service (WDS) API for programmatic access to public datasets.
- The `stats_can` Python package wraps this API and allows easy access to tables and vectors.
- Each dataset is identified by an 8-digit table number, also called the product ID (e.g., 14100287, shown on the website as 14-10-0287-01), and contains multiple vectors (individual data series).
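Table numbers appear both with and without hyphens, so it helps to normalize them before passing them to any API call. The helper below is a minimal sketch: `to_product_id` is a hypothetical name, and it assumes the website format is `NN-NN-NNNN-NN`, where the first eight digits are the product ID and the last two identify a view of the table.

```python
import re


def to_product_id(table_number):
    """Return the 8-digit product ID for a table number.

    Accepts '14-10-0287-01', '14100287', or the integer 14100287.
    """
    # Drop hyphens and any other non-digit characters.
    digits = re.sub(r"\D", "", str(table_number))
    # 8 digits is a bare product ID; 10 digits includes the 2-digit view suffix.
    if len(digits) not in (8, 10):
        raise ValueError(f"Expected 8 or 10 digits, got {table_number!r}")
    return digits[:8]


print(to_product_id("14-10-0287-01"))  # 14100287
print(to_product_id(14100287))         # 14100287
```

Raising on unexpected lengths is a deliberate choice here: a silently truncated ID would point at the wrong table.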

In [14]:
# 📦 Step 1: Install the stats_can package
# Run this cell once to install the package.
# Version 3.0.0 does not export the `StatsCan` class used in this lab, so an
# earlier release is pinned. Assumption: the 2.x series installs in your environment.
!pip install "stats-can<3"
!pip freeze | grep stats-can


## 🔍 Step 2: Search for available tables
We can search for tables by keyword (e.g., 'population', 'GDP', 'employment').

In [19]:

from stats_can import StatsCan

# Initialize the StatsCan object. By default it creates a 'stats_can.h5' file
# in the current directory to store downloaded data.
sc = StatsCan()

# To store the data elsewhere, pass a folder instead:
# sc = StatsCan(data_folder="~/stats_can_data")

# Individual series can also be loaded by vector ID (e.g., 'v65201210'),
# optionally from a start date; check the stats_can documentation for the
# method name and arguments in your installed version.

# This cell only sets up `sc` for Step 3. The keyword search itself is done in
# the next cell by querying the WDS REST API directly.

In [20]:
import requests

# WDS endpoint that lists every available table (cube) with basic metadata.
# It takes no search parameters, so we download the list and filter it locally.
search_url = "https://www150.statcan.gc.ca/t1/wds/rest/getAllCubesListLite"

response = requests.get(search_url, timeout=60)

# Check if the request was successful
if response.status_code == 200:
    # The response is a JSON list with one dictionary per table;
    # the English title is stored under 'cubeTitleEn'.
    data = response.json()

    # Print the first few items to see the structure
    print("API Response (first 5 items):")
    for item in data[:5]:
        print(item)

    # Keep the tables whose English title contains the keyword
    population_tables = [item for item in data if 'population' in item.get('cubeTitleEn', '').lower()]

    print(f"\nFound {len(population_tables)} tables related to 'population' (showing first 10):")
    for table in population_tables[:10]:
        print(f"Product ID: {table.get('productId')}, Title: {table.get('cubeTitleEn')}")

else:
    # Print only the start of the body; error pages can be long HTML documents
    print(f"Error: API request failed with status code {response.status_code}")
    print(response.text[:500])
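The keyword filter in the cell above can be pulled into a small reusable function so that the same logic works for 'GDP', 'employment', or multi-word searches. This is a minimal sketch that runs offline on made-up records: `filter_by_keyword` is a hypothetical helper, the sample titles are illustrative, and the `cubeTitleEn` default is an assumption about the field name. Pass `title_key` to match whatever the printed records actually contain.

```python
def filter_by_keyword(products, keyword, title_key="cubeTitleEn"):
    """Return the products whose title contains every word of `keyword`.

    Matching is case-insensitive; records with a missing or null title are skipped.
    """
    words = keyword.lower().split()
    matches = []
    for item in products:
        title = (item.get(title_key) or "").lower()
        if words and all(word in title for word in words):
            matches.append(item)
    return matches


# Illustrative records only, shaped like a list of table metadata dictionaries.
sample = [
    {"productId": 17100005, "cubeTitleEn": "Population estimates on July 1st, by age and sex"},
    {"productId": 14100287, "cubeTitleEn": "Labour force characteristics, monthly"},
    {"productId": 99999999, "cubeTitleEn": None},
]

for table in filter_by_keyword(sample, "population age"):
    print(table["productId"], table["cubeTitleEn"])
```

Requiring every word to match keeps multi-word searches narrow; switching `all` to `any` would broaden them.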

## 📥 Step 3: Load a specific table
We'll load table `17100005` (population estimates on July 1st, by age and sex).

In [10]:
# Load the table by product ID (uses the `sc` object created in Step 2,
# so that cell must run successfully first)
df = sc.table_to_df("17100005")
df.head()
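For context, the WDS API also exposes a method that returns the download location of a table's complete CSV, which is useful when you want the raw file without the library. The sketch below only builds the request URL and makes no network call. `full_table_csv_request` is a hypothetical helper, and the `getFullTableDownloadCSV/{productId}/{language}` path is taken from the WDS user guide as I understand it, so verify it against the current guide before relying on it.

```python
WDS_BASE = "https://www150.statcan.gc.ca/t1/wds/rest"


def full_table_csv_request(product_id, language="en"):
    """Build the WDS URL that asks where a table's zipped CSV can be downloaded."""
    pid = str(product_id)
    # Product IDs are 8 digits with no hyphens.
    if not (pid.isdigit() and len(pid) == 8):
        raise ValueError(f"Expected an 8-digit product ID, got {product_id!r}")
    if language not in ("en", "fr"):
        raise ValueError("language must be 'en' or 'fr'")
    return f"{WDS_BASE}/getFullTableDownloadCSV/{pid}/{language}"


print(full_table_csv_request("17100005"))
```

Validating the ID before building the URL gives a clear local error instead of an HTML error page from the server.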

## 🧹 Step 4: Filter and explore the data
We'll look at population estimates for Ontario by age group and year.

In [None]:
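# Filter for Ontario
ontario = df[df['GEO'] == 'Ontario']

# Keep only the combined sex/gender total so the sum does not double count.
# Assumption: the column is 'Sex' or 'Gender' and its total starts with
# 'Both' or 'Total'; check df.columns and the unique values first.
sex_col = 'Sex' if 'Sex' in ontario.columns else 'Gender'
ontario = ontario[ontario[sex_col].astype(str).str.startswith(('Both', 'Total'))]

# Group by year and age group
summary = ontario.groupby(['REF_DATE', 'Age group'])['VALUE'].sum().unstack()
summary.tail()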
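To see what the filter, group, and unstack steps compute without downloading anything, the same reshaping can be sketched in plain Python on a few made-up rows. The values below are not real data, and `pivot_by_age` is a hypothetical helper that mirrors the pandas logic: one entry per year, one sub-entry per age group.

```python
from collections import defaultdict

# Made-up rows in the long layout used by the table (values are not real data).
rows = [
    {"REF_DATE": "2020", "GEO": "Ontario", "Age group": "0 to 14 years", "VALUE": 10},
    {"REF_DATE": "2020", "GEO": "Ontario", "Age group": "15 to 64 years", "VALUE": 40},
    {"REF_DATE": "2021", "GEO": "Ontario", "Age group": "0 to 14 years", "VALUE": 11},
    {"REF_DATE": "2021", "GEO": "Ontario", "Age group": "15 to 64 years", "VALUE": 42},
    {"REF_DATE": "2021", "GEO": "Quebec", "Age group": "0 to 14 years", "VALUE": 7},
]


def pivot_by_age(rows, geo):
    """Sum VALUE per (year, age group) for one region.

    Returns {year: {age group: total}}, with years in sorted order.
    """
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row["GEO"] == geo:
            table[row["REF_DATE"]][row["Age group"]] += row["VALUE"]
    return {year: dict(groups) for year, groups in sorted(table.items())}


summary_sketch = pivot_by_age(rows, "Ontario")
for year, groups in summary_sketch.items():
    print(year, groups)
```

Because values are summed within each (year, age group) pair, any extra dimension left in the rows is added together, which is why totals and their components should not be mixed before grouping.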

## 📊 Step 5: Visualize population trends
We'll plot population by age group over time.

In [None]:
import matplotlib.pyplot as plt
summary.plot(figsize=(12, 6), title="Ontario Population by Age Group")
plt.ylabel("Population")
plt.xlabel("Year")
plt.grid(True)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
plt.show()

## 📝 Reflection Questions

1. What are the advantages of using an API over downloading CSVs manually?
2. How could this data be used in public health or education planning?
3. What other topics could you explore using the StatsCan API?
4. How would you automate this pipeline to update monthly or annually?