# Weather Classification using AutoGluon

## Project Overview

This notebook uses weather data from 17 Canadian cities to perform a **classification** task.
The continuous wind speed column (`wind_speed`) is binned into three wind categories:
- **Calm**: up to 3 km/hr
- **Breezy**: above 3 km/hr, up to 15 km/hr
- **Windy**: above 15 km/hr

## **📂 Data Dictionary**

| **Column Name**       | **Data Type** | **Units**               | **Description**                                                                 |
|------------------------|---------------|--------------------------|---------------------------------------------------------------------------------|
| `temperature_celsius`  | float         | Degrees Celsius          | Actual temperature measured in degrees Celsius.                                |
| `feels_like_celsius`   | float         | Degrees Celsius          | Temperature in degrees Celsius adjusted for human perception due to wind/humidity. |
| `humidity`             | integer       | Percentage (%)           | Percentage of moisture present in the air.                                     |
| `pressure`             | integer       | Millibars (mbar)         | Atmospheric pressure in millibars (mbar).                                      |
| `wind_speed`           | float         | Kilometers per hour (km/hr) | Speed of wind in kilometers per hour.                                           |

---

## Step 1: Load Libraries and Dataset

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from autogluon.tabular import TabularPredictor
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# 📁 Load Data
from pathlib import Path

# Directory that holds the four CSV exports
DATA_DIR = Path(r'C:\Users\biauser\PycharmProjects\PythonProject')
real_time_df = pd.read_csv(DATA_DIR / 'weather_data.csv', encoding='latin1')
historical_df = pd.read_csv(DATA_DIR / 'historical_hourly_data.csv', encoding='latin1')
forecast_24h_df = pd.read_csv(DATA_DIR / '24_hour_forecast.csv', encoding='latin1')
forecast_14d_df = pd.read_csv(DATA_DIR / '14_day_forecast.csv', encoding='latin1')

In [3]:
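# 🔄 Combine and Clean Data
# Stack the four sources row-wise, keep only the shared columns, and drop rows with missing values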
common_columns = ['temperature_celsius', 'feels_like_celsius', 'humidity', 'pressure', 'wind_speed']
weather_df = pd.concat([real_time_df, historical_df, forecast_24h_df, forecast_14d_df], ignore_index=True)
weather_df = weather_df[common_columns].dropna()

In [4]:
# 👀 View Sample Data
weather_df.head()

   temperature_celsius  feels_like_celsius  humidity  pressure  wind_speed
0                 7.20                5.09        73      1030        3.09
1                 7.03                4.34        81      1030        4.02
2                 2.00               -2.21        58      1025        4.63
3                 4.21                0.39        49      1023        4.92
4                 5.67                2.62        83      1013        4.12


In [5]:
# 🏷️ Create Target Variable for Wind Speed Classification
weather_df['wind_category'] = pd.cut(
    weather_df['wind_speed'],
    bins=[-1, 3, 15, float('inf')],  # Right-closed bins in km/hr: (-1, 3], (3, 15], (15, inf)
    labels=['Calm', 'Breezy', 'Windy']  # One label per bin, in order
)
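
To check how `pd.cut` treats values that sit exactly on a bin edge, the sketch below applies the same bins and labels to a few hypothetical wind speeds (`sample_speeds` is made up for illustration and is not part of the dataset). With the default right-closed intervals, 3.0 should fall in Calm and 15.0 in Breezy.

```python
import pandas as pd

# Hypothetical wind speeds (km/hr) chosen to sit on and around the bin edges
sample_speeds = pd.Series([0.0, 3.0, 3.01, 15.0, 15.01])

# Same bins and labels as the cell above; pd.cut defaults to right=True,
# so each interval includes its upper edge and excludes its lower edge
sample_categories = pd.cut(
    sample_speeds,
    bins=[-1, 3, 15, float('inf')],
    labels=['Calm', 'Breezy', 'Windy'],
)

print(sample_categories.tolist())
# ['Calm', 'Calm', 'Breezy', 'Breezy', 'Windy']
```

Any value at or below -1, or any missing value, would come out as `NaN` rather than a category.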

In [6]:
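# ✨ Data Preparation
target = 'wind_category'
# Note: X still contains wind_speed, the column wind_category was derived from,
# so the label can be recovered from that single feature.
X = weather_df.drop(columns=[target])
y = weather_df[target]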
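
Because `wind_category` is computed directly from `wind_speed`, leaving `wind_speed` in `X` lets a model read the label off that one feature. One possible variant, sketched here on a tiny hypothetical frame (`demo_df`, whose third row is invented), drops the source column together with the label so the remaining features have to carry the signal. This is an alternative setup, not what the notebook's recorded results were produced with.

```python
import pandas as pd

# Hypothetical miniature frame with the same columns as weather_df
demo_df = pd.DataFrame({
    'temperature_celsius': [7.2, 2.0, -4.5],
    'feels_like_celsius': [5.09, -2.21, -11.0],
    'humidity': [73, 58, 64],
    'pressure': [1030, 1025, 1008],
    'wind_speed': [3.09, 4.63, 21.4],
})
demo_df['wind_category'] = pd.cut(
    demo_df['wind_speed'],
    bins=[-1, 3, 15, float('inf')],
    labels=['Calm', 'Breezy', 'Windy'],
)

# Drop the label and the column it was derived from
leak_free_X = demo_df.drop(columns=['wind_category', 'wind_speed'])
demo_y = demo_df['wind_category']

print(leak_free_X.columns.tolist())
```

Scores obtained this way would likely be much lower than those reported further down, since the model would have to infer wind conditions from temperature, humidity, and pressure alone.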

In [7]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
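
The split above is purely random, so a rare category can end up with very few rows in the test set. A stratified split is one way to keep class proportions similar on both sides. The sketch below uses hypothetical, deliberately imbalanced labels (`demo_y`) and a placeholder feature frame (`demo_X`); only the `stratify` argument differs from the call above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 Breezy, 5 Calm, 5 Windy
demo_y = pd.Series(['Breezy'] * 90 + ['Calm'] * 5 + ['Windy'] * 5, name='wind_category')
# Placeholder feature column so there is something to split alongside the labels
demo_X = pd.DataFrame({'humidity': range(len(demo_y))})

# stratify=demo_y asks for the same class proportions in train and test
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=42, stratify=demo_y
)

print(ys_test.value_counts())
```

Stratification requires at least two rows per class, so it would fail on a category that appears only once in the full dataset.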

In [8]:
# Recombine features and labels into single DataFrames, since AutoGluon expects the label column inside the data
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)

In [9]:
# 🚀 Train AutoGluon Classifier
predictor = TabularPredictor(label=target, problem_type='multiclass').fit(train_data)

No path specified. Models will be saved in: "AutogluonModels\ag-20250415_005518"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.19045
CPU Count:          4
Memory Avail:       9.06 GB / 16.00 GB (56.6%)
Disk Space Avail:   37.45 GB / 126.51 GB (29.6%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets

In [10]:
# 📊 Evaluate the Classification Model
performance = predictor.evaluate(test_data)
print("Classification Performance:")
print(performance)

Classification Performance:
{'accuracy': 0.9951219512195122, 'balanced_accuracy': 0.666666666666667, 'mcc': 0.9860468532787814}
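
The gap between accuracy and balanced accuracy is worth a closer look. Balanced accuracy is the mean of the per-class recalls, so a value near 2/3 with three classes is what you would see if one class were missed entirely. The sketch below shows this with hypothetical labels: the class counts are invented, since the notebook does not show the real test distribution, and were chosen only so that a single error on a one-row class yields figures like those printed above.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical test labels: 205 rows, with a single 'Windy' example
y_true = ['Breezy'] * 150 + ['Calm'] * 54 + ['Windy'] * 1
# Hypothetical predictions: everything correct except the lone 'Windy' row
y_pred = ['Breezy'] * 150 + ['Calm'] * 54 + ['Breezy'] * 1

acc = accuracy_score(y_true, y_pred)               # 204 correct out of 205
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of per-class recall
per_class_recall = recall_score(
    y_true, y_pred, average=None, labels=['Breezy', 'Calm', 'Windy']
)

print(acc, bal_acc, per_class_recall)
```

On the real data, calling `sklearn.metrics.confusion_matrix` on `y_test` and `predictor.predict(test_data)` would show which category is actually being missed.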


In [12]:
# 🔍 View Leaderboard
print(predictor.leaderboard(test_data))

                  model  score_test  score_val eval_metric  pred_time_test  \
0         LightGBMLarge    0.995122   1.000000    accuracy        0.000000   
1              LightGBM    0.995122   1.000000    accuracy        0.003997   
2   WeightedEnsemble_L2    0.995122   1.000000    accuracy        0.003997   
3      RandomForestEntr    0.995122   1.000000    accuracy        0.114561   
4      RandomForestGini    0.995122   1.000000    accuracy        0.159568   
5            LightGBMXT    0.990244   0.993902    accuracy        0.008204   
6               XGBoost    0.990244   0.993902    accuracy        0.014514   
7        NeuralNetTorch    0.990244   0.987805    accuracy        0.015645   
8       NeuralNetFastAI    0.990244   1.000000    accuracy        0.015731   
9              CatBoost    0.985366   1.000000    accuracy        0.006001   
10       ExtraTreesEntr    0.985366   0.987805    accuracy        0.115941   
11       ExtraTreesGini    0.985366   0.987805    accuracy      



In [13]:
# 🔍 View Feature Importance
print(predictor.feature_importance(test_data))

Computing feature importance via permutation shuffling for 5 features using 204 rows with 5 shuffle sets...
	0.29s	= Expected runtime (0.06s per shuffle set)
	0.31s	= Actual runtime (Completed 5 of 5 shuffle sets)


                     importance    stddev   p_value  n  p99_high   p99_low
wind_speed             0.322549  0.025329  0.000005  5  0.374703  0.270395
temperature_celsius    0.012745  0.002685  0.000223  5  0.018273  0.007217
feels_like_celsius     0.010784  0.004101  0.002091  5  0.019229  0.002340
humidity               0.000000  0.000000  0.500000  5  0.000000  0.000000
pressure               0.000000  0.000000  0.500000  5  0.000000  0.000000
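
The table shows `wind_speed` dominating, which is expected when the label is derived from that column. The sketch below reproduces the pattern on synthetic data, using scikit-learn's `permutation_importance` with a decision tree as a stand-in; it is not AutoGluon's implementation, and `wind`, `humidity`, and the labelling rule are all hypothetical.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic features: wind speed in km/hr and an unrelated humidity column
wind = rng.uniform(0, 30, size=300)
humidity = rng.uniform(20, 100, size=300)
X_demo = np.column_stack([wind, humidity])

# Label derived from wind alone, using thresholds at 3 and 15 (0=Calm, 1=Breezy, 2=Windy)
y_demo = np.digitize(wind, bins=[3, 15])

model = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Shuffle each column in turn and measure how much accuracy drops
result = permutation_importance(model, X_demo, y_demo, n_repeats=5, random_state=0)
wind_importance, humidity_importance = result.importances_mean

print(f"wind: {wind_importance:.3f}, humidity: {humidity_importance:.3f}")
```

Shuffling the column that defines the label destroys most of the accuracy, while shuffling an unrelated column changes little or nothing, which matches the near-zero rows for `humidity` and `pressure` in the table above.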
