# Project Overview

This project analyzes Canada's total imports for its top 25 industries. The goal is to identify trends and patterns in the 2019-2023 data and use them to predict how these industries will behave in the current year, 2024.

The data was collected from [Trade Data Online](https://www.ic.gc.ca/app/scr/tdst/tdo/crtr.html?reportType=TI&grouped=GROUPED&searchType=All&timePeriod=5%7cComplete+Years&currency=CDN&naArea=9999&countryList=ALL&productType=NAICS&toFromCountry=CDN&changeCriteria=true), the Government of Canada's official trade data website.

## Data Collection Criteria

- **Trade Type:** Total imports
- **Trader:** Canada
- **Trading Partner:** All countries (Total)
- **Time Period (Specific Years):** 2019, 2020, 2021, 2022, 2023
- **Value:** $ Canadian (current dollars)
- **Industry:** Top 25 industries (5-digit NAICS codes)

## Libraries Used for Analysis

- **Pandas:** Used for data preparation and cleaning. It allows for easy loading of data from CSV files, handling missing values, and merging multiple datasets into a single DataFrame for comprehensive analysis.
- **NumPy:** Used for efficient data manipulation and numerical computations. It provides support for large, multi-dimensional arrays and matrices, which are essential for handling the dataset and performing various mathematical operations.
- **scikit-learn:** Used for applying machine learning algorithms to the data. Specifically, it will be used for linear regression to identify trends and make predictions about the behavior of the top 25 industries in Canada.
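
As a rough illustration of the linear regression step described above, the sketch below fits a straight line to five yearly values for a single industry and extrapolates it to 2024. The numbers are made up for the example and are not taken from the dataset, and the variable names are assumptions rather than names from `clean_data.py`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical yearly import values (billions of CAD) for one industry.
# These are illustrative numbers, not real Trade Data Online figures.
years = np.array([2019, 2020, 2021, 2022, 2023]).reshape(-1, 1)
values = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Fit value = slope * year + intercept on the five observed years.
model = LinearRegression()
model.fit(years, values)

# Extrapolate the fitted line one year ahead.
prediction_2024 = model.predict(np.array([[2024]]))[0]
print(f"Predicted 2024 value: {prediction_2024:.2f}")
```

With only five points per industry, a fit like this captures the overall direction of the trend and little else, so the prediction is best read as a simple extrapolation.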

## Project Structure

- **clean_data.py:** Python script for cleaning the data and making predictions.
- **data/:** Directory containing the raw and cleaned data files.
- **README.md:** Project documentation.
- **requirements.txt:** List of dependencies required for the project.

## How to Run the Code

Clone the repository and move into it:

    git clone https://github.com/yourusername/canadian-imports-data-analysis.git
    cd canadian-imports-data-analysis

## Install Dependencies

Make sure pandas, NumPy, and scikit-learn are installed. If they are not, install them with pip:

    pip install -r requirements.txt

## Run the Script

Update the path to your data file in `clean_data.py`, then run the script:

    python clean_data.py

## Output

The cleaned data is saved as `cleaned_data.csv` in the `data/` directory. The script also prints the predicted import value for the current year.
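
To give a sense of what the cleaning step might involve, here is a minimal sketch that keeps only the industry rows of a raw export and converts the values to numbers. The column names `Title` and `Canadian imports` match the raw files, but the sample rows, the comma-formatted values, and the output column names `naics_code` and `imports_cad` are assumptions made for this example, not the actual logic of `clean_data.py`.

```python
import io
import pandas as pd

# Hypothetical raw export: metadata rows first, then industry rows.
# The values and industry names are invented for illustration.
raw_csv = io.StringIO(
    "Title,Canadian imports\n"
    "Industries,Top 25\n"
    "Units,Value in Canadian dollars\n"
    ",\n"
    '33611 - Example industry A,"1,000"\n'
    '21111 - Example industry B,"2,500"\n'
)
raw = pd.read_csv(raw_csv)

# Keep only rows whose title starts with a 5-digit NAICS code.
is_industry = raw["Title"].str.match(r"^\d{5} - ", na=False)
cleaned = raw.loc[is_industry].copy()

# Split off the NAICS code and convert the values to numbers,
# assuming they are strings with thousands separators.
cleaned["naics_code"] = cleaned["Title"].str[:5]
cleaned["imports_cad"] = pd.to_numeric(
    cleaned["Canadian imports"].str.replace(",", "", regex=False)
)
cleaned = cleaned[["naics_code", "Title", "imports_cad"]].reset_index(drop=True)

# In the project this text would be written to data/cleaned_data.csv.
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Filtering on the NAICS prefix removes header rows, empty rows, and any footer text in one pass, without depending on their exact position in the file.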





In [12]:
# Import the libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Paths to the raw import files, one CSV per year
csv_files = [
    '../data/raw/Imports_2019.csv',
    '../data/raw/Imports_2020.csv',
    '../data/raw/Imports_2021.csv',
    '../data/raw/Imports_2022.csv',
    '../data/raw/Imports_2023.csv'
]


In [13]:
# Prepare the data: read each CSV file into its own DataFrame
df_list = []
for file in csv_files:
    df = pd.read_csv(file)
    df_list.append(df)

# Concatenate all DataFrames into one, renumbering the index from 0
combined_df = pd.concat(df_list, ignore_index=True)
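
Once the files are concatenated, nothing in a row records which year it came from. One possible way to keep that information is to tag each DataFrame with a year taken from its file name before concatenating. The helpers `year_from_path` and `load_with_year` and the `Year` column below are hypothetical names introduced for this sketch, and the approach assumes every file name ends in a four-digit year followed by `.csv`.

```python
import re
import pandas as pd

def year_from_path(path: str) -> int:
    """Extract the four-digit year from a name like 'Imports_2019.csv'."""
    match = re.search(r"(\d{4})\.csv$", path)
    if match is None:
        raise ValueError(f"No year found in file name: {path}")
    return int(match.group(1))

def load_with_year(paths):
    """Read each CSV, add a 'Year' column, and concatenate the results."""
    frames = [pd.read_csv(p).assign(Year=year_from_path(p)) for p in paths]
    return pd.concat(frames, ignore_index=True)

# Example: the year is parsed from the file name alone.
print(year_from_path('../data/raw/Imports_2019.csv'))
```

Calling `load_with_year(csv_files)` would then return the same rows as the loop above, with one extra column identifying the source year.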


In [14]:
# Display a summary of the combined DataFrame and its first 38 rows
print(combined_df.info())
print(combined_df.head(38))


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 175 entries, 0 to 174
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Title             170 non-null    object
 1   Canadian imports  170 non-null    object
dtypes: object(2)
memory usage: 2.9+ KB
None
                                                Title  \
0                                          Industries   
1                                              Origin   
2                                         Destination   
3                                              Period   
4                                               Units   
5                                                 NaN   
6   33611 - Automobile and light-duty motor vehicl...   
7   21111 - Oil and gas extraction (except oil sands)   
8   32541 - Pharmaceutical and medicine manufacturing   
9   33641 - Aerospace product and parts manufacturing   
10                       32411 - Petrole