## Investigative focus: How AI will reshape Canadian labour markets

Team member: kevin801@my.yorku.ca

Dataset: Canadian wages [datasets](https://open.canada.ca/data/en/dataset/adad580f-76b0-4502-bd05-20c125de9116) (2012–2024)

Objective:

You are working with Canadian wages datasets available in this workspace. Based on the data, the investigative focus is to explore and present:

- How AI is expected to reshape the job market, by industry and region
- Demographic impacts of AI adoption (e.g., how race, gender, income level, and geography influence access to emerging opportunities)
- Current trends in workforce diversity, wage equity, and skill gaps
- Which groups are most likely to benefit from AI, and which face barriers to participation in the evolving economy

Guidelines and approach suggestions:

- Use the wage datasets from 2012 to 2024 (not all years required) and any supplementary datasets you deem useful (labour force surveys, Census, job posting APIs, occupational task data).
- Explore changes over time by industry and region (province/territory, urban/rural). Identify industries with high automation/AI exposure and those creating new opportunities.
- Analyze demographic differentials (gender, race/ethnicity if available, age, education, income brackets). Note where data gaps exist and propose methods to address them.
- Examine workforce diversity and wage equity trends: compute median wages by group, Gini or Theil indices by subgroup, representation in high-growth occupations.
- Identify skill gaps by comparing required skills (tech, soft skills) vs. current workforce credentials. Suggest reskilling pathways and policy interventions.
- Provide clear visualizations and concise, actionable recommendations for policymakers, educators, businesses, and community groups.
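
The Gini and Theil indices named in the guidelines can be sketched in plain Python as below. This is a minimal illustration rather than the project's final method: the group labels and wage values are made up, and because the wage files report wages per occupation and region rather than per worker, the indices here describe dispersion across occupation-level medians, not inequality between individuals.

```python
import math
from statistics import median


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def theil_t(values):
    """Theil T index of strictly positive values (0 means perfect equality)."""
    xs = [x for x in values if x > 0]
    mean = sum(xs) / len(xs)
    return sum((x / mean) * math.log(x / mean) for x in xs) / len(xs)


# Hypothetical rows: (group label, median hourly wage of one occupation).
rows = [("A", 20.0), ("A", 30.0), ("B", 25.0), ("B", 25.0)]

by_group = {}
for group, wage in rows:
    by_group.setdefault(group, []).append(wage)

summary = {
    group: {"median": median(ws), "gini": gini(ws), "theil_t": theil_t(ws)}
    for group, ws in by_group.items()
}
```

Group B has identical wages, so both of its indices are zero, while group A shows a small positive dispersion. The same functions can be applied per province, industry, or year once the real data is loaded.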

Deliverables:

1. A descriptive summary file `INVESTIGATIVE_FOCUS.md` (also in the repo).
2. Updated `notebook.ipynb` with exploratory data analysis, visualizations, and initial findings.
3. Optional: additional data download scripts or helper modules to fetch supplementary datasets.

Notes:
- If race/ethnicity is not present in datasets, document the gap and propose proxies or complementary sources (e.g., Census data at geographic granularity).
- Be explicit about assumptions (e.g., mapping NAICS to automation exposure) and document all data transformations.
- Where appropriate, provide reproducible code cells in `notebook.ipynb` and brief explanations next to visuals.

End of brief.

### Midterm report - Instructions

The first step of the project involves understanding the investigative focus and determining how you will develop a solution. There are multiple datasets with multiple features, but you are only required to use those relevant to your solution. Be sure to explain your desired approach, along with other important preliminary details, in your midterm project report.

Before you begin constructing the proposed solution, you need a clear, precise understanding of the problem you have chosen.

A few very important factors to consider when writing this midterm report (project proposal) are:

- Feasibility
  - Can your team accomplish this given your current skills and the skills you will be learning throughout the program?
  - Can your team complete this in the time frame of the program?
- Impact
  - Does the proposal directly address the need(s) identified in the previous step?
  - Will the final product provide a material improvement over existing solutions/operations in terms of efficiency, accuracy, etc.?
- Usability
  - Will the final product be integrable into existing systems and/or workflows?
  - Is the final product easy to learn and use?

Add this text to your midterm report and expand where relevant to your proposed approach.

## Problem (5 marks)

- What problem are we addressing?
  - Answer: Analyze how AI adoption is reshaping Canadian labour markets by industry and region, with emphasis on wage dynamics, equity, and access to emerging opportunities.
- What research questions will we answer with the data?
  - Answer:
    - How have wages (low/median/high/average) evolved from 2012–2024 by province/territory and by industry/occupation?
    - Which industries and regions show the strongest AI exposure and corresponding wage/participation changes?
    - Which demographic groups appear to benefit most, and which face barriers (supplement with Census where needed)?
    - Where are the largest skill gaps relative to AI-related demand?

## Impact (5 marks)

- If we answer the proposed questions, how will the institution be impacted?
  - Answer:
    - Identify sectors/regions needing targeted reskilling and inclusive hiring programs.
    - Highlight wage equity gaps and underrepresented groups in AI-augmented roles.
    - Guide funding and curriculum updates for in-demand digital and complementary skills.
    - Support proactive labour market planning under accelerated automation/augmentation.

## Data (15 marks)

- Approximately how large is the dataset we'll analyze?
  - Answer: Multiple annual CSVs (2012–2024). We'll start with 2020–2024 for timeliness and expand as needed.
- How was the data collected, and does collection affect analysis? (5)
  - Answer: Government of Canada labour market/wage datasets (official administrative and survey sources). Methodology/classification updates (e.g., NAICS/NOC) may affect comparability year-over-year.
- What is the data cleanliness (missing values) and what cleaning/transformations will we do? (2+3)
  - Answer: Many rows are missing low, median, high, and average wage values and carry the note ‘Due to data limitations… refer to provincial level’. We will (i) flag these rows, (ii) backfill from provincial aggregates where appropriate, and (iii) document the imputation rules.
- Which variables are not present but would be nice to have? (5)
  - Answer:
    - First and third quartiles (Q1/Q3) across all years.
    - Type of technology used in workplaces (to distinguish pre-AI tech vs. AI).
    - Average salary per industry consistently for 2012–2024 (currently robust only for 2022–2024).
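
One way the flag-then-backfill step for missing wages could look in pandas is sketched below. The column names (`province`, `region`, `occupation`, `year`, `median_wage`), the join keys, and all values are assumptions made for illustration; the real files may be laid out differently, and the rule for when a provincial value is an acceptable substitute still needs to be decided and documented.

```python
import numpy as np
import pandas as pd

KEYS = ["province", "occupation", "year"]  # hypothetical join keys


def backfill_from_province(regional, provincial, wage_col="median_wage"):
    """Flag rows with a missing wage, then fill them from the provincial value."""
    out = regional.copy()
    # Record what was missing in the source so imputed rows stay identifiable.
    out[f"{wage_col}_was_missing"] = out[wage_col].isna()
    prov = provincial[KEYS + [wage_col]].rename(columns={wage_col: "_prov_wage"})
    # validate raises if the provincial table has duplicate keys.
    out = out.merge(prov, on=KEYS, how="left", validate="many_to_one")
    out[wage_col] = out[wage_col].fillna(out["_prov_wage"])
    return out.drop(columns="_prov_wage")


# Made-up example rows, not taken from the wage files.
regional = pd.DataFrame({
    "province": ["ON", "ON", "BC"],
    "region": ["R1", "R2", "R3"],
    "occupation": ["NOC_A", "NOC_A", "NOC_A"],
    "year": [2024, 2024, 2024],
    "median_wage": [45.0, np.nan, 44.0],
})
provincial = pd.DataFrame({
    "province": ["ON", "BC"],
    "occupation": ["NOC_A", "NOC_A"],
    "year": [2024, 2024],
    "median_wage": [43.0, 44.5],
})

filled = backfill_from_province(regional, provincial)
```

Keeping the `median_wage_was_missing` flag alongside the filled value lets later steps rerun any summary on observed rows only.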

## Methods — Variables and Visualizations (25 marks)

- Which existing variables are most relevant? (5)
  - Answer:
    - Year; Province/Territory; Region; Industry/Sector (NAICS) and/or Occupation (NOC); Wages (low, median, high, average); optional employment counts/hours and demographics if available or joined.
- Will we need to create new variables? Which ones? (5)
  - Answer:
    - Real (CPI-adjusted) wages; wage dispersion measures (Q1/Q3 estimates, IQR, top/bottom ratios); AI exposure index by occupation/industry (joined from task-based exposure studies); aggregation levels (metro/provincial; industry groupings).
- What visualizations and summary statistics will we use to analyze each variable? (5)
  - Answer:
    - Time series of median wages by industry/region; distributions (hist/box/violin); missingness heatmap; YoY deltas, CAGR, dispersion summaries.
- What visualizations and statistics will we use to analyze relationships? (5)
  - Answer:
    - Heatmaps of wage change vs. AI exposure by industry/region; small multiples across provinces; regression or partial correlations (exploratory only, not causal estimates).
- Pick one visualization and explain how it helps answer the question. (5)
  - Answer:
    - Heatmap of 2020–2024 real median wage change by industry (rows) and province (columns), annotated with industry AI-exposure terciles. It shows where exposure and wage outcomes align or diverge and surfaces regional disparities for targeted policy.
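
The derived variables and the matrix behind the chosen heatmap might be computed along the lines below. Treat it as a sketch under stated assumptions: the column names, the CPI values, the wage figures, and the exposure scores are all invented for illustration, and the tercile labels come from `pd.qcut` rather than from any published exposure study.

```python
import pandas as pd


def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def real_change_matrix(wages, cpi, start, end):
    """Industry x province matrix of real median wage change from start to end."""
    df = wages[wages["year"].isin([start, end])].copy()
    # Express every wage in `end`-year dollars using the CPI ratio.
    df["real_wage"] = df["median_wage"] * cpi[end] / df["year"].map(cpi)
    wide = df.pivot_table(index=["industry", "province"],
                          columns="year", values="real_wage")
    change = wide[end] / wide[start] - 1
    return change.unstack("province")


# Made-up numbers, NOT taken from the wage files or from Statistics Canada.
records = [  # industry, province, 2020 wage, 2024 wage
    ("Tech", "ON", 40.0, 48.4), ("Tech", "BC", 38.0, 44.0),
    ("Retail", "ON", 20.0, 21.0), ("Retail", "BC", 19.0, 22.0),
    ("Health", "ON", 30.0, 33.0), ("Health", "BC", 31.0, 34.1),
]
wages = pd.DataFrame(
    [(ind, prov, yr, w)
     for ind, prov, w20, w24 in records
     for yr, w in ((2020, w20), (2024, w24))],
    columns=["industry", "province", "year", "median_wage"],
)
cpi = pd.Series({2020: 100.0, 2024: 110.0})  # hypothetical CPI index values

matrix = real_change_matrix(wages, cpi, 2020, 2024)

# Hypothetical exposure scores, binned into terciles for the row annotations.
exposure = pd.Series({"Tech": 0.8, "Retail": 0.5, "Health": 0.2})
terciles = pd.qcut(exposure, 3, labels=["low", "mid", "high"])
matrix.index = [f"{ind} ({terciles[ind]})" for ind in matrix.index]
```

The resulting `matrix` has one row per industry (labelled with its tercile) and one column per province, so it can be passed directly to a heatmap function such as `seaborn.heatmap`. With these invented numbers, a nominal gain that only matches inflation shows up as a real change of zero.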

## Concerns (5 marks)

- What concerns do we have about the integrity of the dataset?
  - Answer: Missing wages and changing classifications may bias trends; we'll track provenance and note breaks.
- What are potential drawbacks of the analysis and visualizations proposed?
  - Answer: Imputation/backfilling introduces uncertainty; we'll sensitivity-test with and without imputations.
- Any additional concerns about the overall analysis?
  - Answer: Equity insights may require supplemental demographic data (e.g., Census); we'll separate descriptive vs. inferential claims and avoid over-attribution to AI without corroborating evidence.
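
The sensitivity test with and without imputations could be as simple as comparing a summary statistic on all rows against the same statistic on observed rows only. The sketch below assumes a boolean `was_imputed` column and a `median_wage` column, both hypothetical names, and uses made-up values.

```python
import pandas as pd


def imputation_sensitivity(df, value_col="median_wage",
                           flag_col="was_imputed", by="year"):
    """Compare grouped medians with and without the imputed rows."""
    with_imputed = df.groupby(by)[value_col].median()
    observed_only = df.loc[~df[flag_col]].groupby(by)[value_col].median()
    out = pd.DataFrame({"with_imputed": with_imputed,
                        "observed_only": observed_only})
    out["abs_diff"] = (out["with_imputed"] - out["observed_only"]).abs()
    return out


# Made-up example: one imputed row in 2023, none in 2024.
df = pd.DataFrame({
    "year": [2023, 2023, 2023, 2024, 2024],
    "median_wage": [20.0, 22.0, 40.0, 25.0, 27.0],
    "was_imputed": [False, False, True, False, False],
})

report = imputation_sensitivity(df)
```

Years where `abs_diff` is large are the ones where conclusions depend on the backfilled values and deserve an explicit caveat in the report.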