# Data Structures
`Cross-sectional data` is collected at a single point in time across many subjects (for example, provinces), so patterns and relationships among variables can be compared at that moment.

`Time-series data` is collected for the same subject over successive time periods, so changes and trends can be tracked over time.

`Panel (or longitudinal) data` combines cross-sectional and time-series elements by collecting data on the same subjects across multiple time periods, enabling analysis of both time-based changes and cross-sectional differences.
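Here is a minimal sketch of the three layouts in pandas. The region names and values are hypothetical, made up for illustration:

* A cross-section can be a DataFrame indexed by subject.
* A time series can be a Series indexed by period.
* A panel can be a DataFrame with a two-level (subject, period) row index.

```python
import pandas as pd

# Cross-sectional: several subjects observed at one point in time (toy values)
cross_section = pd.DataFrame(
    {"GEO": ["Region A", "Region B", "Region C"], "Unemployment rate": [7.1, 7.6, 5.4]}
).set_index("GEO")

# Time series: one subject observed over several monthly periods (toy values)
time_series = pd.Series(
    [6.8, 6.9, 7.0],
    index=pd.period_range("2024-01", periods=3, freq="M", name="REF_DATE"),
    name="Region A",
)

# Panel: several subjects, each observed over the same periods (toy values)
panel_index = pd.MultiIndex.from_product(
    [["Region A", "Region B"], pd.period_range("2024-01", periods=3, freq="M")],
    names=["GEO", "REF_DATE"],
)
panel = pd.DataFrame(
    {"Unemployment rate": [6.8, 6.9, 7.0, 7.3, 7.4, 7.5]}, index=panel_index
)

print(cross_section.shape)  # (3, 1)
print(time_series.shape)    # (3,)
print(panel.shape)          # (6, 1)
```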

## Memory usage
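Define a helper function that prints a DataFrame's memory usage in MB. With `deep=True`, pandas also counts the memory used by the contents of object (string) columns, not just the references to them.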

In [10]:
# print memory usage of a dataframe in MB
def memory_usage(df):
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.3f} MB")
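To see why `deep=True` matters, the sketch below compares shallow and deep memory usage on a hypothetical frame with an object-dtype string column. Without `deep=True`, pandas counts only the pointers to the strings (typically 8 bytes each on 64-bit builds), not the string objects themselves.

```python
import pandas as pd

# Hypothetical frame: 1,000 rows of a repeated province name plus a float column.
# dtype=object is set explicitly so the strings are stored as Python objects.
toy = pd.DataFrame({
    "GEO": pd.Series(["British Columbia"] * 1000, dtype=object),
    "VALUE": [1.0] * 1000,
})

shallow = toy.memory_usage().sum()        # object column counted as pointers only
deep = toy.memory_usage(deep=True).sum()  # also counts each Python string object

print(f"shallow: {shallow / 1024**2:.3f} MB")
print(f"deep:    {deep / 1024**2:.3f} MB")
```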

## Panel Data

### Download data

In [None]:
import pandas as pd
import urllib.parse

# Download a CSV extract of Statistics Canada table 14-10-0287-01 (PID 1410028701)
# using the downloadDbLoadingData-nonTraduit.action endpoint.
base_url = "https://www150.statcan.gc.ca/t1/tbl1/en/dtl!downloadDbLoadingData-nonTraduit.action"
pid = "1410028701"
latestN = ""
startDate = "20190101"
endDate = ""
csvLocale = "en"
# one list of member IDs per table dimension; the list sizes match, in order,
# GEO, Labour force characteristics, Gender, Age group, Statistics and Data type
selected_members = "[[1,2,3,4,5,6,7,8,9,10,11],[1,2,3,4,5,6,7,8,9],[1,2,3],[1,2,6,7],[1],[1,2]]"

# percent-encode the selected members for the query string
encoded_selected_members = urllib.parse.quote(selected_members, safe='')

# construct the full URL
url = (f"{base_url}?pid={pid}&latestN={latestN}&startDate={startDate}"
       f"&endDate={endDate}&csvLocale={csvLocale}"
       f"&selectedMembers={encoded_selected_members}")

# print the URL
print(url)

# read the CSV data into a DataFrame; low_memory=False infers each column's
# type from the whole file, which avoids a mixed-type DtypeWarning
df_1 = pd.read_csv(url, low_memory=False)

# display the DataFrame summary (info() prints directly) and its memory usage
df_1.info()
print(df_1.shape)
memory_usage(df_1)
df_1.tail()

https://www150.statcan.gc.ca/t1/tbl1/en/dtl!downloadDbLoadingData-nonTraduit.action?pid=1410028701&latestN=&startDate=20190101&endDate=&csvLocale=en&selectedMembers=%5B%5B1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%5D%2C%5B1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%5D%2C%5B1%2C2%2C3%5D%2C%5B1%2C2%2C6%2C7%5D%2C%5B1%5D%2C%5B1%2C2%5D%5D




<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170236 entries, 0 to 170235
Data columns (total 19 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   REF_DATE                      170236 non-null  object 
 1   GEO                           170236 non-null  object 
 2   DGUID                         170236 non-null  object 
 3   Labour force characteristics  170236 non-null  object 
 4   Gender                        170236 non-null  object 
 5   Age group                     170236 non-null  object 
 6   Statistics                    170236 non-null  object 
 7   Data type                     170236 non-null  object 
 8   UOM                           170236 non-null  object 
 9   UOM_ID                        170236 non-null  int64  
 10  SCALAR_FACTOR                 170236 non-null  object 
 11  SCALAR_ID                     170236 non-null  int64  
 12  VECTOR                        170236 non-nul

,REF_DATE,GEO,DGUID,Labour force characteristics,Gender,Age group,Statistics,Data type,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
170231,2024-09,British Columbia,2021A000259,Employment rate,Women+,55 years and over,Estimate,Unadjusted,Percent,239,units,0,v2066966,11.9.3.7.1.2,31.3,,,,1
170232,2024-10,British Columbia,2021A000259,Employment rate,Women+,55 years and over,Estimate,Unadjusted,Percent,239,units,0,v2066966,11.9.3.7.1.2,30.7,,,,1
170233,2024-11,British Columbia,2021A000259,Employment rate,Women+,55 years and over,Estimate,Unadjusted,Percent,239,units,0,v2066966,11.9.3.7.1.2,28.9,,,,1
170234,2024-12,British Columbia,2021A000259,Employment rate,Women+,55 years and over,Estimate,Unadjusted,Percent,239,units,0,v2066966,11.9.3.7.1.2,29.4,,,,1
170235,2025-01,British Columbia,2021A000259,Employment rate,Women+,55 years and over,Estimate,Unadjusted,Percent,239,units,0,v2066966,11.9.3.7.1.2,28.7,,,,1
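The query string can also be assembled with `urllib.parse.urlencode`, which percent-encodes each value and joins the pairs with `&` in insertion order. This is a minimal sketch that reuses the parameter values from the download cell. It only builds the URL and does not download anything.

```python
from urllib.parse import urlencode

base_url = "https://www150.statcan.gc.ca/t1/tbl1/en/dtl!downloadDbLoadingData-nonTraduit.action"
params = {
    "pid": "1410028701",
    "latestN": "",
    "startDate": "20190101",
    "endDate": "",
    "csvLocale": "en",
    "selectedMembers": "[[1,2,3,4,5,6,7,8,9,10,11],[1,2,3,4,5,6,7,8,9],[1,2,3],[1,2,6,7],[1],[1,2]]",
}

# urlencode encodes '[', ']' and ',' as %5B, %5D and %2C and leaves empty values empty
url = f"{base_url}?{urlencode(params)}"
print(url)
```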


### Remove extra columns

In [58]:
# drop columns not needed for the analysis; errors='ignore' skips any
# name that is not present in the DataFrame
columns_to_drop = ['DGUID', 'COORDINATE', 'Statistics', 'UOM', 'SCALAR_FACTOR',
                   'UOM_ID', 'SCALAR_ID', 'VECTOR', 'STATUS', 'SYMBOL',
                   'TERMINATED', 'DECIMALS']

df_2 = df_1.drop(columns=columns_to_drop, errors='ignore')

# Display the DataFrame and its memory usage
memory_usage(df_2)
df_2.tail()

Memory usage: 60.661 MB


,REF_DATE,GEO,Labour force characteristics,Gender,Age group,Data type,VALUE
170231,2024-09,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,31.3
170232,2024-10,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,30.7
170233,2024-11,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,28.9
170234,2024-12,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,29.4
170235,2025-01,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,28.7


## Time Series

### Add date index
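The first step in creating a time series is to put the dates in the row index. Here `REF_DATE` is parsed into a DatetimeIndex and then converted to a monthly PeriodIndex, since each observation covers a whole month.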

In [59]:
# convert the 'REF_DATE' strings (YYYY-MM) to datetime values
df_2['REF_DATE'] = pd.to_datetime(df_2['REF_DATE'], format='%Y-%m')

# set the 'REF_DATE' column as the index
df_2.set_index('REF_DATE', inplace=True)

# convert the DatetimeIndex to a monthly PeriodIndex
df_2.index = df_2.index.to_period('M')

# Display the DataFrame and its memory usage
memory_usage(df_2)
df_2.tail()


Memory usage: 52.868 MB


,GEO,Labour force characteristics,Gender,Age group,Data type,VALUE
REF_DATE
2024-09,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,31.3
2024-10,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,30.7
2024-11,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,28.9
2024-12,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,29.4
2025-01,British Columbia,Employment rate,Women+,55 years and over,Unadjusted,28.7
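The same three steps on a tiny hypothetical frame show what each one does. The values are copied from the last three rows above.

* `to_datetime` parses the `YYYY-MM` strings into month-start timestamps.
* `set_index` moves them into the row index.
* `to_period('M')` turns the timestamps into monthly periods.

```python
import pandas as pd

# Small long-format frame with 'YYYY-MM' date strings, as in REF_DATE
raw = pd.DataFrame({"REF_DATE": ["2024-11", "2024-12", "2025-01"],
                    "VALUE": [28.9, 29.4, 28.7]})

raw["REF_DATE"] = pd.to_datetime(raw["REF_DATE"], format="%Y-%m")  # month-start timestamps
ts = raw.set_index("REF_DATE")                                      # DatetimeIndex
ts.index = ts.index.to_period("M")                                  # monthly PeriodIndex

print(type(ts.index).__name__)  # PeriodIndex
print(ts.index[0])              # 2024-11
```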


### Pivot Table
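* Each column in the resulting table is a single time series.
* `pivot_table` reshapes the data from long to wide format: `REF_DATE` becomes the row index and the listed fields become the levels of a column MultiIndex. Each date and series combination occurs only once (73 periods × 2332 series = 170236 rows). As a result, the default aggregation (`aggfunc='mean'`) leaves the values unchanged.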

In [63]:
# pivot from long to wide format: one column per time series
df_3 = df_2.pivot_table(index=['REF_DATE'],
                        columns=['Data type', 'GEO', 'Gender', 'Age group',
                                 'Labour force characteristics'],
                        values='VALUE')

# display the DataFrame summary (info() prints directly) and its memory usage
memory_usage(df_3)
df_3.info()
print(df_3.shape)
df_3.tail()

Memory usage: 1.299 MB
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 73 entries, 2019-01 to 2025-01
Freq: M
Columns: 2332 entries, ('Seasonally adjusted', 'Alberta', 'Men+', '15 to 24 years', 'Employment') to ('Unadjusted', 'Saskatchewan', 'Women+', '55 years and over', 'Unemployment rate')
dtypes: float64(2332)
memory usage: 1.3 MB
(73, 2332)


Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
GEO,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,...,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan
Gender,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,...,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+
Age group,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over
Labour force characteristics,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate,Employment,...,Unemployment rate,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate
REF_DATE
2024-09,165.9,53.8,106.1,196.6,59.9,63.8,308.3,30.7,15.6,1343.3,...,4.7,58.1,33.1,42.1,59.3,16.1,33.8,175.3,1.2,2.0
2024-10,168.1,54.3,106.4,203.9,61.7,65.8,309.7,35.7,17.5,1349.5,...,4.1,56.7,32.3,39.6,58.1,17.1,33.1,175.5,1.4,2.4
2024-11,174.0,55.9,111.2,202.3,62.7,65.0,311.0,28.3,14.0,1364.1,...,3.3,56.4,32.1,39.9,58.1,16.5,33.1,175.6,1.6,2.8
2024-12,174.4,55.9,109.8,206.7,64.6,66.2,312.2,32.3,15.6,1377.3,...,4.2,56.4,32.1,37.9,57.4,18.5,32.7,175.7,0.9,1.6
2025-01,175.7,56.1,106.4,207.5,69.3,66.2,313.3,31.8,15.3,1378.4,...,3.8,54.7,31.1,35.2,56.2,19.5,31.9,175.9,1.5,2.7
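Here is a small sketch of the long-to-wide step on hypothetical data; `Region A`, `Region B` and all values are made up. `pivot_table` averages duplicate (row, column) combinations by default. With exactly one value per combination, as here, the values pass through unchanged, whereas `DataFrame.pivot` would raise an error if duplicates were present.

```python
import pandas as pd

# Hypothetical long-format extract: one row per (date, region, characteristic)
long = pd.DataFrame({
    "REF_DATE": ["2024-11", "2024-11", "2024-12", "2024-12"] * 2,
    "GEO": ["Region A"] * 4 + ["Region B"] * 4,
    "Labour force characteristics": ["Employment", "Unemployment"] * 4,
    "VALUE": [100.0, 5.0, 101.0, 4.0, 200.0, 9.0, 202.0, 8.0],
})

# Dates become rows; (GEO, characteristic) pairs become a two-level column index
wide = long.pivot_table(
    index="REF_DATE",
    columns=["GEO", "Labour force characteristics"],
    values="VALUE",
)

print(wide.shape)                                        # (2, 4)
print(wide.loc["2024-12", ("Region B", "Unemployment")])  # 8.0
```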


In [64]:
# shorter alias for the wide table
df = df_3

### MultiIndex columns

In [65]:
# the columns form a MultiIndex; print the values in each of its levels
print(type(df.columns), '\n')
for i, level in enumerate(df.columns.levels):
    print(f'Level {i}: {level}\n')

<class 'pandas.core.indexes.multi.MultiIndex'> 

Level 0: Index(['Seasonally adjusted', 'Unadjusted'], dtype='object', name='Data type')

Level 1: Index(['Alberta', 'British Columbia', 'Canada', 'Manitoba', 'New Brunswick',
       'Newfoundland and Labrador', 'Nova Scotia', 'Ontario',
       'Prince Edward Island', 'Quebec', 'Saskatchewan'],
      dtype='object', name='GEO')

Level 2: Index(['Men+', 'Total - Gender', 'Women+'], dtype='object', name='Gender')

Level 3: Index(['15 to 24 years', '15 years and over', '25 to 54 years',
       '55 years and over'],
      dtype='object', name='Age group')

Level 4: Index(['Employment', 'Employment rate', 'Full-time employment', 'Labour force',
       'Part-time employment', 'Participation rate', 'Population',
       'Unemployment', 'Unemployment rate'],
      dtype='object', name='Labour force characteristics')
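The difference between `.levels` and `get_level_values` matters when building selections later:

* `.levels` lists the unique (sorted) values of each level.
* `get_level_values` returns one label per column.
* After slicing, `.levels` can still list values that no longer occur until `remove_unused_levels()` is called.

Here is a small sketch on a hypothetical three-column index:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [
        ("Seasonally adjusted", "Canada", "Employment"),
        ("Seasonally adjusted", "Canada", "Unemployment"),
        ("Unadjusted", "Canada", "Employment"),
    ],
    names=["Data type", "GEO", "Labour force characteristics"],
)

# .levels: unique values per level
print(list(cols.levels[0]))  # ['Seasonally adjusted', 'Unadjusted']
# get_level_values: one entry per column, aligned with the columns
print(list(cols.get_level_values("Data type")))

# slicing keeps the old levels until unused values are removed
subset = cols[:2]
print(list(subset.levels[0]))                         # ['Seasonally adjusted', 'Unadjusted']
print(list(subset.remove_unused_levels().levels[0]))  # ['Seasonally adjusted']
```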



## Reshaping

### Stacking

In [66]:
# .stack() moves the innermost (last) level of the column MultiIndex into the row index
df.stack().tail()




,Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
,GEO,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,Alberta,...,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan
,Gender,Men+,Men+,Men+,Men+,Total - Gender,Total - Gender,Total - Gender,Total - Gender,Women+,Women+,...,Men+,Men+,Total - Gender,Total - Gender,Total - Gender,Total - Gender,Women+,Women+,Women+,Women+
,Age group,15 to 24 years,15 years and over,25 to 54 years,55 years and over,15 to 24 years,15 years and over,25 to 54 years,55 years and over,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,15 to 24 years,15 years and over,25 to 54 years,55 years and over,15 to 24 years,15 years and over,25 to 54 years,55 years and over
REF_DATE,Labour force characteristics
2025-01,Part-time employment,69.3,151.6,48.0,,166.9,469.4,196.0,106.6,97.6,317.8,...,11.0,13.4,31.6,111.3,46.8,32.9,20.5,75.9,35.8,19.5
2025-01,Participation rate,66.2,73.9,93.1,44.8,65.0,68.8,87.7,39.4,63.6,63.6,...,92.6,44.3,57.7,66.1,89.2,37.9,58.2,61.8,85.6,31.9
2025-01,Population,313.3,2003.5,1069.1,621.1,608.1,4002.0,2112.5,1281.4,294.8,1998.5,...,238.1,164.9,148.7,958.0,468.5,340.8,71.1,477.3,230.4,175.9
2025-01,Unemployment,31.8,102.0,55.8,14.4,52.7,183.7,108.0,23.0,20.9,81.8,...,11.0,4.5,11.0,35.4,18.4,6.0,4.0,12.9,7.4,1.5
2025-01,Unemployment rate,15.3,6.9,5.6,5.2,13.3,6.7,5.8,4.6,11.1,6.4,...,5.0,6.2,12.8,5.6,4.4,4.6,9.7,4.4,3.8,2.7


In [41]:
# stacking moves the innermost column level into the row index,
# producing a two-level row MultiIndex (REF_DATE, Labour force characteristics)
df.stack().index



MultiIndex([('2019-01',           'Employment'),
            ('2019-01',      'Employment rate'),
            ('2019-01', 'Full-time employment'),
            ('2019-01',         'Labour force'),
            ('2019-01', 'Part-time employment'),
            ('2019-01',   'Participation rate'),
            ('2019-01',           'Population'),
            ('2019-01',         'Unemployment'),
            ('2019-01',    'Unemployment rate'),
            ('2019-02',           'Employment'),
            ...
            ('2024-12',    'Unemployment rate'),
            ('2025-01',           'Employment'),
            ('2025-01',      'Employment rate'),
            ('2025-01', 'Full-time employment'),
            ('2025-01',         'Labour force'),
            ('2025-01', 'Part-time employment'),
            ('2025-01',   'Participation rate'),
            ('2025-01',           'Population'),
            ('2025-01',         'Unemployment'),
            ('2025-01',    'Unemployment rate')],
   

In [10]:
# pass level (by position or name) to choose which column level to stack; level=2 is Gender
df.stack(level=2).tail()



,Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
,GEO,Canada,Canada,Canada,Canada,Canada,Canada,Canada,Canada,Canada,Canada,...,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan
,Age group,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over
,Labour force characteristics,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate,Employment,...,Unemployment rate,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate
REF_DATE,Gender
2024-12,Total - Gender,2693.3,53.9,1364.0,3137.6,1329.2,62.8,4993.3,444.4,14.2,20917.4,...,4.5,127.9,37.6,95.8,132.1,32.1,38.8,340.6,4.2,3.2
2024-12,Women+,1318.2,55.0,563.4,1502.7,754.8,62.7,2397.9,184.6,12.3,9858.0,...,4.2,56.4,32.1,37.9,57.4,18.5,32.7,175.7,0.9,1.6
2025-01,Men+,1399.7,53.8,831.6,1637.7,568.2,63.0,2599.4,237.9,14.5,11091.2,...,5.0,68.5,41.5,55.1,73.0,13.4,44.3,164.9,4.5,6.2
2025-01,Total - Gender,2723.8,54.5,1395.3,3153.6,1328.5,63.1,4999.6,429.8,13.6,20993.4,...,4.4,123.2,36.2,90.3,129.2,32.9,37.9,340.8,6.0,4.6
2025-01,Women+,1324.0,55.2,563.7,1515.9,760.3,63.2,2400.3,191.9,12.7,9902.2,...,3.8,54.7,31.1,35.2,56.2,19.5,31.9,175.9,1.5,2.7


In [11]:
# several levels can be stacked at once; level=[1, 2] moves GEO and Gender into the row index
df.stack(level=[1,2]).tail()



,,Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
,,Age group,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over
,,Labour force characteristics,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate,Employment,...,Unemployment rate,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate
REF_DATE,GEO,Gender
2025-01,Canada,Total - Gender,2723.8,54.5,1395.3,3153.6,1328.5,63.1,4999.6,429.8,13.6,20993.4,...,6.1,4309.9,33.9,3304.8,4564.5,1005.2,35.9,12709.0,254.5,5.6
2025-01,Canada,Women+,1324.0,55.2,563.7,1515.9,760.3,63.2,2400.3,191.9,12.7,9902.2,...,5.5,1960.5,29.5,1365.8,2057.9,594.8,31.0,6644.2,97.4,4.7
2025-01,Saskatchewan,Men+,42.0,54.1,30.4,48.6,11.6,62.6,77.6,6.7,13.8,324.0,...,5.0,68.5,41.5,55.1,73.0,13.4,44.3,164.9,4.5,6.2
2025-01,Saskatchewan,Total - Gender,81.4,54.7,49.9,92.2,31.5,62.0,148.7,10.8,11.7,606.4,...,4.4,123.2,36.2,90.3,129.2,32.9,37.9,340.8,6.0,4.6
2025-01,Saskatchewan,Women+,39.4,55.4,19.5,43.5,19.9,61.2,71.1,4.1,9.4,282.4,...,3.8,54.7,31.1,35.2,56.2,19.5,31.9,175.9,1.5,2.7
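`stack` and `unstack` are inverses. The sketch below uses a hypothetical frame with two column levels; the region names and values are made up. It stacks the innermost column level into the rows and then unstacks it back to recover the original wide frame. Recent pandas 2.x releases may print a FutureWarning about the `stack` implementation. That warning does not change the result here, because the toy data has no missing values.

```python
import pandas as pd

# Hypothetical wide frame: two regions x two measures, three monthly rows
columns = pd.MultiIndex.from_product(
    [["East", "West"], ["Employment", "Unemployment"]],
    names=["GEO", "Labour force characteristics"],
)
index = pd.period_range("2024-01", periods=3, freq="M", name="REF_DATE")
wide = pd.DataFrame(
    [[10.0, 1.0, 20.0, 2.0],
     [11.0, 1.5, 21.0, 2.5],
     [12.0, 1.2, 22.0, 2.2]],
    index=index,
    columns=columns,
)

# stack(): innermost column level -> innermost row level
long = wide.stack()
print(long.shape)               # (6, 2)
print(list(long.index.names))   # ['REF_DATE', 'Labour force characteristics']

# unstack(): innermost row level -> innermost column level
restored = long.unstack()
print(restored.equals(wide))
```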


### Transpose
Transposing gives one time series per row, with the dates running across the columns. This layout is similar to the Statistics Canada Beyond 20/20 format.

In [None]:
# transpose the data to make it easier to view
df.transpose()

,,,,REF_DATE,2019-01,2019-02,2019-03,2019-04,2019-05,2019-06,2019-07,2019-08,2019-09,2019-10,...,2024-04,2024-05,2024-06,2024-07,2024-08,2024-09,2024-10,2024-11,2024-12,2025-01
Data type,GEO,Gender,Age group,Labour force characteristics
Seasonally adjusted,Canada,Men+,15 to 24 years,Employment,1332.8,1332.1,1323.1,1363.0,1318.5,1313.1,1305.9,1300.7,1283.5,1302.7,...,1395.1,1354.9,1352.4,1346.5,1355.4,1368.0,1379.2,1379.7,1375.1,1399.7
Seasonally adjusted,Canada,Men+,15 to 24 years,Employment rate,58.3,58.3,57.9,59.6,57.7,57.4,57.1,56.8,56.0,56.8,...,55.7,53.7,53.3,52.7,52.8,53.1,53.4,53.3,53.0,53.8
Seasonally adjusted,Canada,Men+,15 to 24 years,Full-time employment,782.6,783.3,771.9,796.4,772.7,770.0,753.1,762.3,737.1,752.9,...,827.9,760.4,793.9,781.2,769.3,780.0,795.5,797.6,800.7,831.6
Seasonally adjusted,Canada,Men+,15 to 24 years,Labour force,1502.6,1513.3,1506.0,1525.5,1486.2,1486.9,1495.3,1495.4,1488.4,1496.2,...,1607.6,1572.7,1588.8,1600.1,1613.5,1615.2,1615.8,1635.3,1634.9,1637.7
Seasonally adjusted,Canada,Men+,15 to 24 years,Part-time employment,550.2,548.9,551.1,566.6,545.8,543.1,552.7,538.4,546.4,549.7,...,567.2,594.5,558.4,565.3,586.2,588.0,583.7,582.0,574.5,568.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Unadjusted,Saskatchewan,Women+,55 years and over,Part-time employment,19.0,18.9,19.3,20.2,17.2,17.0,16.8,17.2,18.9,18.3,...,17.9,17.7,17.2,15.3,15.7,16.1,17.1,16.5,18.5,19.5
Unadjusted,Saskatchewan,Women+,55 years and over,Participation rate,35.9,35.4,36.8,37.3,36.1,36.7,36.7,37.8,37.7,37.3,...,32.6,33.0,33.5,32.3,32.3,33.8,33.1,33.1,32.7,31.9
Unadjusted,Saskatchewan,Women+,55 years and over,Population,162.5,162.7,163.0,163.3,163.6,163.9,164.3,164.5,164.8,165.1,...,174.3,174.5,174.7,174.9,175.1,175.3,175.5,175.6,175.7,175.9
Unadjusted,Saskatchewan,Women+,55 years and over,Unemployment,2.2,2.0,2.3,2.3,2.4,2.3,3.6,4.8,2.0,1.5,...,1.4,1.2,1.6,2.0,1.9,1.2,1.4,1.6,0.9,1.5


## Selecting Data

### Selecting by loc

In [77]:
# select rows by a label range; partial date strings such as '2023' select whole years, and both ends are included
df.loc['2023':'2024']

Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
GEO,Canada,Canada,Canada,Canada,Canada,Canada,Canada,Canada,Canada,Canada,...,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan,Saskatchewan
Gender,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,...,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+
Age group,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over
Labour force characteristics,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate,Employment,...,Unemployment rate,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate
REF_DATE
2023-01-01,1339.3,57.9,763.5,1500.3,575.8,64.8,2314.4,160.9,10.7,10557.1,...,3.1,62.1,36.1,42.2,62.9,19.9,36.6,171.8,0.8,1.3
2023-02-01,1349.2,58.1,777.5,1514.8,571.7,65.2,2323.3,165.6,10.9,10571.5,...,3.4,60.0,34.9,40.2,61.8,19.7,36.0,171.9,1.8,2.9
2023-03-01,1366.5,58.6,786.3,1517.4,580.2,65.1,2332.5,150.9,9.9,10612.8,...,3.6,59.7,34.7,40.6,61.4,19.2,35.7,172.1,1.7,2.8
2023-04-01,1370.0,58.5,785.0,1533.9,585.0,65.5,2341.0,163.9,10.7,10640.8,...,3.9,58.8,34.1,39.9,60.8,18.9,35.3,172.3,2.0,3.3
2023-05-01,1342.1,57.1,777.7,1518.6,564.4,64.6,2350.3,176.5,11.6,10647.9,...,2.8,59.7,34.6,38.1,61.0,21.7,35.4,172.4,1.3,2.1
2023-06-01,1364.8,57.8,779.3,1549.6,585.5,65.6,2360.9,184.8,11.9,10720.3,...,3.0,59.0,34.2,40.8,60.8,18.2,35.2,172.6,1.9,3.1
2023-07-01,1383.1,58.3,797.4,1544.9,585.6,65.1,2373.0,161.8,10.5,10714.8,...,4.3,53.1,30.7,36.8,55.0,16.3,31.8,172.9,1.9,3.5
2023-08-01,1364.7,57.2,804.7,1554.0,560.0,65.1,2386.9,189.2,12.2,10745.8,...,6.7,52.3,30.2,35.7,55.7,16.6,32.2,173.1,3.3,5.9
2023-09-01,1364.4,56.8,783.9,1540.1,580.5,64.2,2400.3,175.7,11.4,10771.8,...,3.7,57.6,33.2,38.8,59.3,18.8,34.2,173.3,1.7,2.9
2023-10-01,1368.0,56.7,774.0,1552.7,594.0,64.3,2414.7,184.7,11.9,10803.8,...,3.2,58.4,33.7,38.6,59.6,19.7,34.4,173.4,1.2,2.0
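Partial-string slicing with `loc` works on a monthly PeriodIndex as well as on a DatetimeIndex. Unlike positional slicing, it includes both endpoints. Here is a quick check on a hypothetical series with the same 73 monthly periods as `df`:

```python
import pandas as pd

# Hypothetical monthly series covering 2019-01 to 2025-01 (73 periods)
idx = pd.period_range("2019-01", "2025-01", freq="M", name="REF_DATE")
s = pd.Series(range(len(idx)), index=idx)

window = s.loc["2023":"2024"]              # whole years; the end label is included
print(len(window))                         # 24
print(window.index[0], window.index[-1])   # 2023-01 2024-12
```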


In [79]:
# subset the data on rows and a single column; a single tuple key returns a Series
df.loc['2024', ('Seasonally adjusted', 'Canada','Total - Gender', 
                '15 years and over','Employment')]

REF_DATE
2024-01-01    20577.1
2024-02-01    20607.7
2024-03-01    20614.5
2024-04-01    20700.5
2024-05-01    20698.3
2024-06-01    20715.9
2024-07-01    20712.9
2024-08-01    20742.6
2024-09-01    20779.3
2024-10-01    20782.6
2024-11-01    20826.4
2024-12-01    20917.4
Name: (Seasonally adjusted, Canada, Total - Gender, 15 years and over, Employment), dtype: float64
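In pandas, a single tuple key selects one column and returns a Series, while a list of tuples returns a DataFrame that keeps the column header. In both cases the tuple must follow the order of the column levels; otherwise the lookup fails with a KeyError. Here is a minimal sketch on a hypothetical two-level frame with made-up values:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Canada", "Men+"), ("Canada", "Women+")], names=["GEO", "Gender"]
)
idx = pd.period_range("2024-01", periods=2, freq="M", name="REF_DATE")
df_toy = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=idx, columns=cols)

one = df_toy.loc[:, ("Canada", "Men+")]     # single tuple -> Series
many = df_toy.loc[:, [("Canada", "Men+")]]  # list of tuples -> DataFrame with header
print(type(one).__name__, type(many).__name__)  # Series DataFrame

try:
    df_toy.loc[:, ("Men+", "Canada")]       # levels given in the wrong order
except KeyError:
    print("KeyError: tuple order must match the column levels")
```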

In [None]:
# passing a list of tuples returns a DataFrame, so the header shows which series was selected;
# the tuple follows the column level order (Data type, GEO, Gender, Age group, Labour force characteristics)
df.loc['2024', [('Seasonally adjusted', 'Canada', 'Men+',
                 '15 years and over', 'Employment')]]

Data type,Seasonally adjusted
GEO,Canada
Gender,Men+
Age group,15 years and over
Labour force characteristics,Employment
REF_DATE
2024-01-01,10857.7
2024-02-01,10876.6
2024-03-01,10867.5
2024-04-01,10944.0
2024-05-01,10921.6
2024-06-01,10917.3
2024-07-01,10923.9
2024-08-01,10932.8
2024-09-01,10950.4
2024-10-01,10964.1


In [57]:
# selecting two columns
df.loc['2024', [('Seasonally adjusted', 'Canada', 'Men+',
                 '15 years and over', 'Employment'),
                ('Seasonally adjusted', 'Canada', 'Women+',
                 '15 years and over', 'Employment')]]

Data type,Seasonally adjusted,Seasonally adjusted
GEO,Canada,Canada
Gender,Men+,Women+
Age group,15 years and over,15 years and over
Labour force characteristics,Employment,Employment
REF_DATE
2024-01-01,10857.7,9719.4
2024-02-01,10876.6,9731.1
2024-03-01,10867.5,9747.0
2024-04-01,10944.0,9756.5
2024-05-01,10921.6,9776.7
2024-06-01,10917.3,9798.6
2024-07-01,10923.9,9789.0
2024-08-01,10932.8,9809.8
2024-09-01,10950.4,9828.9
2024-10-01,10964.1,9818.5


In [None]:
# A practical way to select columns is to build one boolean mask per condition
# from the column levels and combine the masks with the bitwise operators & (and) and | (or)
cond_1 = df.columns.get_level_values('GEO') == 'Canada'
cond_2 = ((df.columns.get_level_values('Labour force characteristics') == 'Employment') |
          (df.columns.get_level_values('Labour force characteristics') == 'Unemployment'))
cond_3 = df.columns.get_level_values('Data type') == 'Unadjusted'
cond_4 = df.columns.get_level_values('Age group') == '15 years and over'
cond_5 = df.columns.get_level_values('Gender') == 'Total - Gender'

# use loc to select the rows for 2024 and the matching columns
df.loc['2024', cond_1 & cond_2 & cond_3 & cond_4 & cond_5]


Data type,Unadjusted,Unadjusted
GEO,Canada,Canada
Gender,Total - Gender,Total - Gender
Age group,15 years and over,15 years and over
Labour force characteristics,Employment,Unemployment
REF_DATE
2024-01-01,20246.6,1314.5
2024-02-01,20369.9,1309.7
2024-03-01,20411.5,1407.6
2024-04-01,20583.5,1370.4
2024-05-01,20925.4,1399.6
2024-06-01,21048.5,1383.4
2024-07-01,20915.6,1505.9
2024-08-01,20887.4,1682.5
2024-09-01,20844.0,1334.2
2024-10-01,20861.8,1327.7
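`Index.isin` is a compact alternative to chaining `|` for an "either value" condition. Here is a sketch on a hypothetical frame; the level names come from the table and the values are made up:

```python
import pandas as pd

cols = pd.MultiIndex.from_product(
    [["Canada", "Alberta"], ["Employment", "Unemployment", "Population"]],
    names=["GEO", "Labour force characteristics"],
)
df_toy = pd.DataFrame([list(range(6))], columns=cols)

# one boolean array per condition, aligned with the columns
is_canada = df_toy.columns.get_level_values("GEO") == "Canada"
is_lfs = df_toy.columns.get_level_values("Labour force characteristics").isin(
    ["Employment", "Unemployment"]
)

selected = df_toy.loc[:, is_canada & is_lfs]
print(list(selected.columns))
# [('Canada', 'Employment'), ('Canada', 'Unemployment')]
```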


### Selecting by xs

In [81]:
# xs takes a cross-section: one key per selected level (here GEO='Canada');
# by default the matched level is dropped from the result
df.xs(level='GEO', key='Canada', axis=1).tail()

Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
Gender,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,Men+,...,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+,Women+
Age group,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over,55 years and over
Labour force characteristics,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate,Employment,...,Unemployment rate,Employment,Employment rate,Full-time employment,Labour force,Part-time employment,Participation rate,Population,Unemployment,Unemployment rate
REF_DATE
2024-09-01,1368.0,53.1,780.0,1615.2,588.0,62.7,2577.4,247.2,15.3,10950.4,...,5.1,1973.5,29.9,1411.3,2043.5,562.2,30.9,6606.2,69.9,3.4
2024-10-01,1379.2,53.4,795.5,1615.8,583.7,62.5,2584.8,236.6,14.6,10964.1,...,5.3,1971.3,29.8,1408.5,2051.0,562.9,31.0,6616.3,79.7,3.9
2024-11-01,1379.7,53.3,797.6,1635.3,582.0,63.1,2590.2,255.6,15.6,10998.7,...,5.4,1959.6,29.6,1397.8,2056.4,561.8,31.0,6625.6,96.7,4.7
2024-12-01,1375.1,53.0,800.7,1634.9,574.5,63.0,2595.4,259.8,15.9,11059.3,...,4.9,1979.3,29.8,1389.9,2078.8,589.4,31.3,6634.6,99.4,4.8
2025-01-01,1399.7,53.8,831.6,1637.7,568.2,63.0,2599.4,237.9,14.5,11091.2,...,5.5,1960.5,29.5,1365.8,2057.9,594.8,31.0,6644.2,97.4,4.7


In [82]:
# filter on several levels at once with a tuple of levels and a matching tuple of keys
df.xs(level=('GEO', 'Labour force characteristics'),
      key=('Canada', 'Employment'), axis=1)

Data type,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,Seasonally adjusted,...,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted,Unadjusted
Gender,Men+,Men+,Men+,Men+,Total - Gender,Total - Gender,Total - Gender,Total - Gender,Women+,Women+,...,Men+,Men+,Total - Gender,Total - Gender,Total - Gender,Total - Gender,Women+,Women+,Women+,Women+
Age group,15 to 24 years,15 years and over,25 to 54 years,55 years and over,15 to 24 years,15 years and over,25 to 54 years,55 years and over,15 to 24 years,15 years and over,...,25 to 54 years,55 years and over,15 to 24 years,15 years and over,25 to 54 years,55 years and over,15 to 24 years,15 years and over,25 to 54 years,55 years and over
REF_DATE
2019-01-01,1332.8,9956.3,6369.9,2253.7,2604.5,18920.5,12246.5,4069.5,1271.8,8964.2,...,6268.3,2208.6,2427.6,18579.8,12134.3,4017.9,1199.0,8874.3,5866.0,1809.3
2019-02-01,1332.1,9963.0,6369.8,2261.0,2618.5,18965.7,12269.0,4078.2,1286.3,9002.7,...,6278.7,2224.9,2457.0,18673.8,12176.7,4040.1,1222.8,8936.0,5898.1,1815.1
2019-03-01,1323.1,9969.4,6376.1,2270.2,2598.6,18932.9,12234.6,4099.7,1275.6,8963.5,...,6293.9,2246.5,2459.1,18674.2,12144.5,4070.6,1224.3,8899.0,5850.6,1824.1
2019-04-01,1363.0,10034.8,6377.5,2294.3,2660.9,19053.7,12263.3,4129.6,1297.9,9019.0,...,6328.0,2275.2,2521.9,18875.6,12238.0,4115.7,1244.8,8995.4,5910.0,1840.6
2019-05-01,1318.5,10026.4,6412.8,2295.2,2617.6,19067.3,12311.6,4138.0,1299.2,9040.9,...,6445.1,2307.7,2712.0,19253.1,12385.3,4155.8,1333.6,9121.9,5940.2,1848.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-09-01,1368.0,10950.4,7217.9,2364.5,2686.2,20779.3,13765.5,4327.6,1318.2,9828.9,...,7281.5,2401.4,2597.2,20844.0,13872.0,4374.9,1268.1,9832.2,6590.5,1973.5
2024-10-01,1379.2,10964.1,7214.6,2370.4,2704.4,20782.6,13759.8,4318.4,1325.3,9818.5,...,7260.6,2413.4,2630.2,20861.8,13847.0,4384.7,1277.2,9834.9,6586.3,1971.3
2024-11-01,1379.7,10998.7,7247.0,2372.0,2700.8,20826.4,13801.4,4324.2,1321.1,9827.7,...,7275.4,2388.1,2598.2,20828.6,13882.7,4347.8,1272.3,9839.2,6607.3,1959.6
2024-12-01,1375.1,11059.3,7277.4,2406.8,2693.3,20917.4,13842.4,4381.7,1318.2,9858.0,...,7255.5,2390.5,2615.8,20843.6,13858.0,4369.8,1283.6,9865.5,6602.6,1979.3
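By default, `xs` drops the level it matched on; `drop_level=False` keeps it. Keeping the level can help when the result is later combined with other selections. Here is a minimal sketch on a hypothetical frame with made-up values:

```python
import pandas as pd

cols = pd.MultiIndex.from_product(
    [["Canada", "Alberta"], ["Employment", "Unemployment"]],
    names=["GEO", "Labour force characteristics"],
)
df_toy = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=cols)

canada = df_toy.xs("Canada", level="GEO", axis=1)                   # GEO level dropped
kept = df_toy.xs("Canada", level="GEO", axis=1, drop_level=False)   # GEO level kept

print(list(canada.columns))  # ['Employment', 'Unemployment']
print(kept.columns.nlevels)  # 2
```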
