# Housing Crisis in Canada

## Overview
This project aims to analyze the housing crisis in Canada, exploring various factors contributing to the crisis and potential solutions.

## Installation
To run this project, ensure you have Python installed on your system. You'll also need to install the following Python libraries:
- NumPy
- pandas

You can install these libraries via pip:

```bash
pip install numpy pandas
```

## Usage
To use this project, follow these steps:
1. Clone this repository to your local machine.
2. Install the required dependencies as mentioned in the Installation section.
3. Open the Jupyter Notebook (or Python script) containing the analysis.
4. Execute the code cells to perform the analysis and visualize the results.

## Data
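This analysis uses two datasets from the Government of Canada's open data portal, both related to the housing crisis in Canada. Download the following datasets and save them where the notebook reads them: `data/raw/housing-index/18100205.csv` for the housing price index and `data/raw/immigrant.csv` for the immigration table.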
1. [immigrant-status-and-period-of-immigration-e](https://open.canada.ca/data/en/dataset/9adddd8a-e15b-497c-86af-641457a78bea/resource/255012de-7f8a-4f5e-b62a-bc438dc89543)
2. [18100205](https://open.canada.ca/data/en/dataset/324befd1-893b-42e6-bece-6d30af3dd9f1)
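
Before running the notebook, it can help to confirm that both files are in place. The snippet below is a minimal sketch: the relative paths are the ones the notebook cells read from, while the helper name `missing_datasets` is hypothetical and not part of the project.

```python
from pathlib import Path

# Paths the notebook reads from, relative to the notebook's working directory.
EXPECTED_FILES = [
    Path("../data/raw/housing-index/18100205.csv"),
    Path("../data/raw/immigrant.csv"),
]


def missing_datasets(paths=EXPECTED_FILES):
    """Return the expected dataset paths that do not exist on disk."""
    return [p for p in paths if not p.is_file()]


# Report anything that still needs to be downloaded.
for path in missing_datasets():
    print(f"Missing dataset: {path}")
```

An empty result means both CSV files were found.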

## Analysis
The analysis includes the following steps:
- Data preprocessing
- Exploratory data analysis
- Statistical analysis

## Results
The key findings from the analysis will be summarized here.

## Future Work
Potential future enhancements or additional analyses for this project may include:
- Incorporating more datasets for a comprehensive analysis
- Building predictive models to forecast housing trends
- Exploring policy recommendations to address the housing crisis

## Contributors
- Muhammad Hunain Muneer

## License
MIT License

## Contact
For questions, feedback, or contributions, please contact hunain.muneer1995@gmail.com


### Loading the two datasets into pandas DataFrames

In [215]:
# Loading the housing data

import pandas as pd
import os

# Resolve the file path relative to the current working directory (the notebook's directory when run in Jupyter)
file_path = os.path.abspath("../data/raw/housing-index/18100205.csv")

try:
    housing_data = pd.read_csv(file_path)
except FileNotFoundError:
    print("Error: File not found. Please ensure the file path is correct.")
    raise  # stop here; otherwise housing_data is undefined and the next line fails with a NameError

# Checking the first few rows of the data
housing_data.head()

Unnamed: 0,REF_DATE,GEO,DGUID,New housing price indexes,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,1981-01,Canada,2016A000011124,Total (house and land),"Index, 201612=100",347,units,0,v111955442,1.1,38.2,,,,1
1,1981-01,Canada,2016A000011124,House only,"Index, 201612=100",347,units,0,v111955443,1.2,36.1,,,,1
2,1981-01,Canada,2016A000011124,Land only,"Index, 201612=100",347,units,0,v111955444,1.3,40.6,E,,,1
3,1981-01,Atlantic Region,2016A00011,Total (house and land),"Index, 201612=100",347,units,0,v111955445,2.1,,..,,,1
4,1981-01,Atlantic Region,2016A00011,House only,"Index, 201612=100",347,units,0,v111955446,2.2,,..,,,1
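
In the rows shown above, `VALUE` is empty wherever `STATUS` is `..`, which suggests those entries are unavailable for that month (an assumption, not confirmed here). The sketch below rebuilds a few of those rows in memory, with the columns trimmed for brevity. It shows one way to parse `REF_DATE` and keep only the combined index rows that have a value. The names `sample` and `total_index` are illustrative.

```python
import io

import pandas as pd

# A few rows copied from the output above (columns trimmed for brevity).
sample_csv = io.StringIO(
    "REF_DATE,GEO,New housing price indexes,VALUE,STATUS\n"
    "1981-01,Canada,Total (house and land),38.2,\n"
    "1981-01,Canada,House only,36.1,\n"
    "1981-01,Canada,Land only,40.6,E\n"
    "1981-01,Atlantic Region,Total (house and land),,..\n"
    "1981-01,Atlantic Region,House only,,..\n"
)
sample = pd.read_csv(sample_csv)

# Parse the "YYYY-MM" strings into timestamps (first day of each month).
sample["REF_DATE"] = pd.to_datetime(sample["REF_DATE"], format="%Y-%m")

# Keep the combined house-and-land index and drop rows where VALUE is missing.
total_index = sample[
    sample["New housing price indexes"] == "Total (house and land)"
].dropna(subset=["VALUE"])

print(total_index[["REF_DATE", "GEO", "VALUE"]])
```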


In [217]:
# Loading the immigration data

file_path_immigration = os.path.abspath("../data/raw/immigrant.csv")

try:
    # The file is read with the latin1 encoding
    immigration_data = pd.read_csv(file_path_immigration, encoding='latin1')
except FileNotFoundError:
    print("Error: File not found. Please ensure the file path is correct.")
    raise  # stop here; otherwise immigration_data is undefined below
except UnicodeDecodeError:
    print("Error: Unable to decode the file. Please check the file's encoding.")
    raise

# Checking the first few rows of the data
immigration_data.head()

Unnamed: 0,"Table Title: Immigrant Status and Period of Immigration (6) for the Population Aged 15 Years and Over, in Private Households of Canada, Provinces, Territories and 11 selected Census Metropolitan Areas (CMA) (25), 2011 National Household Survey, Statistics Canada",Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25
0,Immigrant status and period of immigration,Canada,Newfoundland and Labrador,Prince Edward Island,Nova Scotia / Nouvelle-Écosse,New Brunswick / Nouveau-Brunswick,Quebec,Ontario,Manitoba,Saskatchewan,...,Halifax CMA,Québec City CMA,Montréal CMA,Ottawa - Gatineau CMA,Toronto CMA,Winnipeg CMA,Regina CMA,Calgary CMA,Edmonton CMA,Vancouver CMA
1,Total - Immigrant status and period of immigra...,27259525,431045,114195,768060,622440,6474590,10473665,946940,812505,...,325050,634200,3120060,1005005,4546140,590295,170070,976570,935285,1926230
2,Non-immigrants,20543700,421170,107085,717140,593840,5511745,6912395,774365,744685,...,292010,602990,2278110,773525,2047970,451125,149025,665410,698240,1009070
3,Immigrants,6398855,8315,6415,44660,25890,902990,3442895,165005,60500,...,28685,28605,789440,220640,2416425,133380,19055,290760,216460,870035
4,Before 1996,3837770,5415,3455,28690,16530,496790,2148600,90000,29120,...,16775,11985,439575,130700,1408665,74005,9415,145920,123735,486590


**-> `housing_data` loads in a usable format, but `immigration_data` must be reshaped from a horizontal (wide) layout, with one column per area, to a vertical layout with one row per area.**

**-> We want all areas (provinces, territories, and metropolitan areas) in a single column so that each column holds one data type. Currently, strings and numbers share the same column, as shown above.**

In [222]:
# Converting the data from a horizontal view to a vertical view so all the areas are in the same column
immigration_data = immigration_data.transpose()
immigration_data.head()

Unnamed: 0,0,1,2,3,4,5,6
"Table Title: Immigrant Status and Period of Immigration (6) for the Population Aged 15 Years and Over, in Private Households of Canada, Provinces, Territories and 11 selected Census Metropolitan Areas (CMA) (25), 2011 National Household Survey, Statistics Canada",Immigrant status and period of immigration,Total - Immigrant status and period of immigra...,Non-immigrants,Immigrants,Before 1996,1996 to 2005,2006 to 2011
Unnamed: 1,Canada,27259525,20543700,6398855,3837770,1620885,940195
Unnamed: 2,Newfoundland and Labrador,431045,421170,8315,5415,1215,1690
Unnamed: 3,Prince Edward Island,114195,107085,6415,3455,920,2035
Unnamed: 4,Nova Scotia / Nouvelle-Écosse,768060,717140,44660,28690,7485,8485


In [223]:
# After transposing, the index holds the old column labels ("Unnamed: 1", ...), which we do not need
# reset_index(drop=True) discards them, and inplace=True modifies the DataFrame itself

immigration_data.reset_index(drop=True, inplace=True)
immigration_data.head()

Unnamed: 0,0,1,2,3,4,5,6
0,Immigrant status and period of immigration,Total - Immigrant status and period of immigra...,Non-immigrants,Immigrants,Before 1996,1996 to 2005,2006 to 2011
1,Canada,27259525,20543700,6398855,3837770,1620885,940195
2,Newfoundland and Labrador,431045,421170,8315,5415,1215,1690
3,Prince Edward Island,114195,107085,6415,3455,920,2035
4,Nova Scotia / Nouvelle-Écosse,768060,717140,44660,28690,7485,8485


In [224]:
# The column labels are wrong: the first row should be the header
# Current columns

immigration_data.columns

RangeIndex(start=0, stop=7, step=1)

In [226]:
# Setting the first row as the columns of the dataframe

immigration_data.columns = immigration_data.iloc[0]
immigration_data.head()

Unnamed: 0,Immigrant status and period of immigration,Total - Immigrant status and period of immigration,Non-immigrants,Immigrants,Before 1996,1996 to 2005,2006 to 2011
0,Immigrant status and period of immigration,Total - Immigrant status and period of immigra...,Non-immigrants,Immigrants,Before 1996,1996 to 2005,2006 to 2011
1,Canada,27259525,20543700,6398855,3837770,1620885,940195
2,Newfoundland and Labrador,431045,421170,8315,5415,1215,1690
3,Prince Edward Island,114195,107085,6415,3455,920,2035
4,Nova Scotia / Nouvelle-Écosse,768060,717140,44660,28690,7485,8485


In [227]:
# Removing the first row from the dataset since it is now the header

immigration_data = immigration_data.iloc[1:]
immigration_data.head()

Unnamed: 0,Immigrant status and period of immigration,Total - Immigrant status and period of immigration,Non-immigrants,Immigrants,Before 1996,1996 to 2005,2006 to 2011
1,Canada,27259525,20543700,6398855,3837770,1620885,940195
2,Newfoundland and Labrador,431045,421170,8315,5415,1215,1690
3,Prince Edward Island,114195,107085,6415,3455,920,2035
4,Nova Scotia / Nouvelle-Écosse,768060,717140,44660,28690,7485,8485
5,New Brunswick / Nouveau-Brunswick,622440,593840,25890,16530,3970,5395


In [228]:
# checking the columns of the immigration dataset

immigration_data.columns

Index(['Immigrant status and period of immigration',
       'Total - Immigrant status and period of immigration',
       '  Non-immigrants', '  Immigrants', '    Before 1996',
       '    1996 to 2005', '    2006 to 2011'],
      dtype='object', name=0)
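
The steps in the preceding cells (transpose, drop the old index, promote the first row to the header, drop that row) can be collapsed into a single helper. The following is a minimal sketch on a toy frame that mimics the layout of the raw file. The function name `wide_to_long_areas` and the toy frame `raw` are hypothetical, and the numbers are copied from the outputs above.

```python
import pandas as pd


def wide_to_long_areas(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn a frame with one column per area into a frame with one row per area."""
    # After transposing, the old column labels become the index; drop them.
    tidy = raw.transpose().reset_index(drop=True)
    # Promote the first row (the category labels) to the header.
    tidy.columns = tidy.iloc[0]
    # Remove that row from the data and renumber the remaining rows.
    tidy = tidy.iloc[1:].reset_index(drop=True)
    # Clear the leftover name (0) on the column index.
    tidy.columns.name = None
    return tidy


# Toy input: the first column holds category labels, the first row holds area names.
raw = pd.DataFrame(
    {
        "Table Title": [
            "Immigrant status and period of immigration",
            "Non-immigrants",
            "Immigrants",
        ],
        "Unnamed: 1": ["Canada", "20543700", "6398855"],
        "Unnamed: 2": ["Ontario", "6912395", "3442895"],
    }
)

tidy = wide_to_long_areas(raw)
print(tidy)
```

Unlike the notebook cells, this version also renumbers the rows from zero and clears the `name=0` label on the column index. Both are cosmetic choices.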

### Both datasets (housing_data & immigration_data) are now in the correct format

**Let's look at the columns of the datasets**

In [234]:
# columns of the housing_data

housing_data.columns

Index(['REF_DATE', 'GEO', 'DGUID', 'New housing price indexes', 'UOM',
       'UOM_ID', 'SCALAR_FACTOR', 'SCALAR_ID', 'VECTOR', 'COORDINATE', 'VALUE',
       'STATUS', 'SYMBOL', 'TERMINATED', 'DECIMALS'],
      dtype='object')

In [230]:
# columns of the immigration_data

immigration_data.columns

Index(['Immigrant status and period of immigration',
       'Total - Immigrant status and period of immigration',
       '  Non-immigrants', '  Immigrants', '    Before 1996',
       '    1996 to 2005', '    2006 to 2011'],
      dtype='object', name=0)

In [232]:
# immigration_data has some issues
# Some column names have leading white space, which makes the columns awkward to access
# Removing the leading and trailing white space

immigration_data.columns = immigration_data.columns.str.strip()
immigration_data.columns

Index(['Immigrant status and period of immigration',
       'Total - Immigrant status and period of immigration', 'Non-immigrants',
       'Immigrants', 'Before 1996', '1996 to 2005', '2006 to 2011'],
      dtype='object', name=0)
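
With the headers cleaned, the count columns can be given a numeric type, in line with the earlier goal of one data type per column. This is a minimal sketch on a toy frame. It assumes the counts are still stored as strings after the transpose, which is likely but not verified here. The frame `demo` is illustrative, with values copied from the outputs above.

```python
import pandas as pd

# Toy frame standing in for immigration_data; counts are stored as strings.
demo = pd.DataFrame(
    {
        "Immigrant status and period of immigration": ["Canada", "Ontario"],
        "Immigrants": ["6398855", "3442895"],
        "Before 1996": ["3837770", "2148600"],
    }
)

# Convert every count column; values that cannot be parsed become NaN.
count_columns = ["Immigrants", "Before 1996"]
demo[count_columns] = demo[count_columns].apply(pd.to_numeric, errors="coerce")

print(demo.dtypes)
```

Using `errors="coerce"` keeps the conversion from failing on a stray non-numeric cell, at the cost of silently producing `NaN`, so a follow-up check with `isna()` is worthwhile.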

### Joining both datasets

**To join the two datasets, we need a column that has the same values in both**

In [236]:
# Adding a new column (area_code) to both datasets that will hold an abbreviation of the area
# e.g. Ontario -> ON, Manitoba -> MB

# Let's check the different areas in our housing_data dataset

housing_data['GEO'].unique()

# TODO: add code

array(['Canada', 'Atlantic Region', 'Newfoundland and Labrador',
       "St. John's, Newfoundland and Labrador", 'Prince Edward Island',
       'Charlottetown, Prince Edward Island', 'Nova Scotia',
       'Halifax, Nova Scotia', 'New Brunswick',
       'Saint John, Fredericton, and Moncton, New Brunswick', 'Quebec',
       'Québec, Quebec', 'Sherbrooke, Quebec', 'Trois-Rivières, Quebec',
       'Montréal, Quebec', 'Ottawa-Gatineau, Quebec part, Ontario/Quebec',
       'Ontario', 'Ottawa-Gatineau, Ontario part, Ontario/Quebec',
       'Oshawa, Ontario', 'Toronto, Ontario', 'Hamilton, Ontario',
       'St. Catharines-Niagara, Ontario',
       'Kitchener-Cambridge-Waterloo, Ontario', 'Guelph, Ontario',
       'London, Ontario', 'Windsor, Ontario', 'Greater Sudbury, Ontario',
       'Prairie Region', 'Manitoba', 'Winnipeg, Manitoba', 'Saskatchewan',
       'Regina, Saskatchewan', 'Saskatoon, Saskatchewan', 'Alberta',
       'Calgary, Alberta', 'Edmonton, Alberta', 'British Columbia',
     

In [243]:
# Getting rid of the white spaces in the column and checking unique values

immigration_data['Immigrant status and period of immigration'] = immigration_data['Immigrant status and period of immigration'].str.strip()
immigration_data['Immigrant status and period of immigration'].unique()

array(['Canada', 'Newfoundland and Labrador', 'Prince Edward Island',
       'Nova Scotia / Nouvelle-Écosse',
       'New Brunswick / Nouveau-Brunswick', 'Quebec', 'Ontario',
       'Manitoba', 'Saskatchewan', 'Alberta', 'British Columbia', 'Yukon',
       'Northwest Territories', 'Nunavut', "St. John's CMA",
       'Halifax CMA', 'Québec City CMA', 'Montréal CMA',
       'Ottawa - Gatineau CMA', 'Toronto CMA', 'Winnipeg CMA',
       'Regina CMA', 'Calgary CMA', 'Edmonton CMA', 'Vancouver CMA'],
      dtype=object)
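
The two outputs above show that the area names do not line up exactly. The immigration table uses bilingual labels such as `Nova Scotia / Nouvelle-Écosse` and CMA labels such as `Toronto CMA`, while the housing table uses `Nova Scotia` and `Toronto, Ontario`. One possible way to derive a shared code from either style is sketched below. The helper names (`english_name`, `province_of`, `area_code_for`) are hypothetical, and assigning a city row its province's code is an assumption about what the join should do.

```python
# English province and territory names mapped to their abbreviations.
PROVINCE_CODES = {
    "Canada": "CAN",
    "Newfoundland and Labrador": "NL",
    "Prince Edward Island": "PE",
    "Nova Scotia": "NS",
    "New Brunswick": "NB",
    "Quebec": "QC",
    "Ontario": "ON",
    "Manitoba": "MB",
    "Saskatchewan": "SK",
    "Alberta": "AB",
    "British Columbia": "BC",
    "Yukon": "YT",
    "Northwest Territories": "NT",
    "Nunavut": "NU",
}


def english_name(label):
    """Drop a bilingual suffix such as ' / Nouvelle-Écosse'."""
    return label.split(" / ")[0].strip()


def province_of(geo):
    """Return the part after the last comma in 'City, Province'; other labels pass through."""
    return geo.rsplit(", ", 1)[-1].strip()


def area_code_for(label):
    """Return the code for a label from either dataset, or None when no province matches."""
    return PROVINCE_CODES.get(province_of(english_name(label)))


for label in ["Nova Scotia / Nouvelle-Écosse", "Halifax, Nova Scotia", "Toronto CMA"]:
    print(label, "->", area_code_for(label))
```

City rows in the housing data receive their province's code under this scheme, so filter on `GEO` first if only province-level rows should take part in the join. Labels with no single province, such as `Atlantic Region` or `Toronto CMA`, return `None`.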

In [244]:
area_code = {
    'Canada': 'CAN', 'Newfoundland and Labrador': 'NL', 'Prince Edward Island': 'PE',
    'Nova Scotia / Nouvelle-Écosse': 'NS', 'New Brunswick / Nouveau-Brunswick': 'NB',
    'Quebec': 'QC', 'Ontario': 'ON', 'Manitoba': 'MB', 'Saskatchewan': 'SK', 'Alberta': 'AB',
    'British Columbia': 'BC', 'Yukon': 'YT', 'Northwest Territories': 'NT', 'Nunavut': 'NU',
}

immigration_data['area_code'] = immigration_data['Immigrant status and period of immigration'].map(area_code)
immigration_data


Unnamed: 0,Immigrant status and period of immigration,Total - Immigrant status and period of immigration,Non-immigrants,Immigrants,Before 1996,1996 to 2005,2006 to 2011,area_code
1,Canada,27259525,20543700,6398855,3837770,1620885,940195,CAN
2,Newfoundland and Labrador,431045,421170,8315,5415,1215,1690,NL
3,Prince Edward Island,114195,107085,6415,3455,920,2035,PE
4,Nova Scotia / Nouvelle-Écosse,768060,717140,44660,28690,7485,8485,NS
5,New Brunswick / Nouveau-Brunswick,622440,593840,25890,16530,3970,5395,NB
6,Quebec,6474590,5511745,902990,496790,227160,179040,QC
7,Ontario,10473665,6912395,3442895,2148600,884025,410270,ON
8,Manitoba,946940,774365,165005,90000,31800,43205,MB
9,Saskatchewan,812505,744685,60500,29120,11165,20215,SK
10,Alberta,2888740,2239430,596100,322145,159920,114030,AB


In [254]:
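# GEO uses 'Nova Scotia' and 'New Brunswick' without the bilingual suffix, so those rows
# (and all city and region rows) have no key in area_code and map to NaN
housing_data['area_code'] = housing_data['GEO'].map(area_code)
housing_data[housing_data['area_code'] == 'ON']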

Unnamed: 0,REF_DATE,GEO,DGUID,New housing price indexes,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS,area_code
48,1981-01,Ontario,2016A000235,Total (house and land),"Index, 201612=100",347,units,0,v111955490,17.1,,..,,,1,ON
49,1981-01,Ontario,2016A000235,House only,"Index, 201612=100",347,units,0,v111955491,17.2,,..,,,1,ON
50,1981-01,Ontario,2016A000235,Land only,"Index, 201612=100",347,units,0,v111955492,17.3,,..,,,1,ON
168,1981-02,Ontario,2016A000235,Total (house and land),"Index, 201612=100",347,units,0,v111955490,17.1,,..,,,1,ON
169,1981-02,Ontario,2016A000235,House only,"Index, 201612=100",347,units,0,v111955491,17.2,,..,,,1,ON
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61729,2023-11,Ontario,2016A000235,House only,"Index, 201612=100",347,units,0,v111955491,17.2,127.6,,,,1,ON
61730,2023-11,Ontario,2016A000235,Land only,"Index, 201612=100",347,units,0,v111955492,17.3,120.2,E,,,1,ON
61848,2023-12,Ontario,2016A000235,Total (house and land),"Index, 201612=100",347,units,0,v111955490,17.1,125.1,,,,1,ON
61849,2023-12,Ontario,2016A000235,House only,"Index, 201612=100",347,units,0,v111955491,17.2,127.4,,,,1,ON
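
With `area_code` present in both frames, the join itself can be a left merge from the housing rows to the immigration rows. According to its table title, the immigration table is a single snapshot (2011 National Household Survey), so many housing rows share one immigration row. The sketch below uses toy frames with values copied from the outputs above. The frame names are illustrative, and the choice of a left join is an assumption about the intended analysis.

```python
import pandas as pd

# Toy housing rows: monthly index values, one area without a code.
housing_demo = pd.DataFrame(
    {
        "REF_DATE": ["1981-01", "2023-12", "1981-01"],
        "GEO": ["Canada", "Ontario", "Atlantic Region"],
        "VALUE": [38.2, 125.1, None],
        "area_code": ["CAN", "ON", None],
    }
)

# Toy immigration rows: one row per area.
immigration_demo = pd.DataFrame(
    {
        "area_code": ["CAN", "ON", "MB"],
        "Immigrants": [6398855, 3442895, 165005],
    }
)

# Left join keeps every housing row; validate raises an error if an
# area_code appears more than once on the immigration side.
merged = housing_demo.merge(
    immigration_demo, on="area_code", how="left", validate="many_to_one"
)

print(merged)
```

Housing rows without a matching code keep `NaN` in the immigration columns, which makes unmatched areas easy to spot after the merge.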
