# Weather Classification using AutoGluon

## Project Overview

# **🌍 Wind Speed Classification Using Real-Time Weather Data**

This notebook classifies weather observations for 17 Canadian cities into three wind categories (`Calm`, `Breezy`, and `Windy`) based on wind speed, using AutoGluon. It covers data cleaning, preparation, model training, and evaluation.

---



## **📂 Data Dictionary**

| **Column Name**       | **Data Type** | **Units**               | **Description**                                                                 |
|------------------------|---------------|--------------------------|---------------------------------------------------------------------------------|
| `temperature_celsius`  | float         | Degrees Celsius          | Actual temperature measured in degrees Celsius.                                |
| `feels_like_celsius`   | float         | Degrees Celsius          | Temperature in degrees Celsius adjusted for human perception due to wind/humidity. |
| `humidity`             | integer       | Percentage (%)           | Percentage of moisture present in the air.                                     |
| `pressure`             | integer       | Millibars (mbar)         | Atmospheric pressure in millibars (mbar).                                      |
| `wind_speed`           | float         | Kilometers per hour (km/hr) | Speed of wind in kilometers per hour.                                           |

---

## Step 1: Load Libraries and Dataset

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from autogluon.tabular import TabularPredictor
import matplotlib.pyplot as plt
import seaborn as sns

In [11]:
# 📁 Load Data
real_time_df = pd.read_csv('C:\\Users\\biauser\\PycharmProjects\\PythonProject\\weather_data.csv', encoding='latin1')
historical_df = pd.read_csv('C:\\Users\\biauser\\PycharmProjects\\PythonProject\\historical_hourly_data.csv', encoding='latin1')
forecast_24h_df = pd.read_csv('C:\\Users\\biauser\\PycharmProjects\\PythonProject\\24_hour_forecast.csv', encoding='latin1')
forecast_14d_df = pd.read_csv('C:\\Users\\biauser\\PycharmProjects\\PythonProject\\14_day_forecast.csv', encoding='latin1')

In [25]:
# 🔄 Merge and Clean Data
common_columns = ['city', 'temperature_celsius', 'feels_like_celsius', 'humidity', 'pressure', 'wind_speed']
weather_df = pd.concat([real_time_df, historical_df, forecast_24h_df, forecast_14d_df], ignore_index=True)
weather_df = weather_df[common_columns].dropna()

In [26]:
# 👀 View Sample Data
weather_df.head()

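|   | city      | temperature_celsius | feels_like_celsius | humidity | pressure | wind_speed |
|---|-----------|---------------------|--------------------|----------|----------|------------|
| 0 | Vancouver | 7.2                 | 5.09               | 73       | 1030     | 3.09       |
| 1 | Victoria  | 7.03                | 4.34               | 81       | 1030     | 4.02       |
| 2 | Calgary   | 2.0                 | -2.21              | 58       | 1025     | 4.63       |
| 3 | Edmonton  | 4.21                | 0.39               | 49       | 1023     | 4.92       |
| 4 | Regina    | 5.67                | 2.62               | 83       | 1013     | 4.12       |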


You can view the full **EDA Report** here: [Exploratory Data Analysis Report - Weather Dataset](https://sandeepmondkar14.github.io/pages/combined_weather_report.html).

In [None]:
# 🏷️ Create Target Variable for Wind Speed Classification
weather_df['wind_category'] = pd.cut(
    weather_df['wind_speed'],
    bins=[-1, 3, 15, float('inf')],  # Define wind speed ranges
    labels=['Calm', 'Breezy', 'Windy']  # Corresponding categories
)
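
`pd.cut` closes each interval on the right by default, so the bins above are `(-1, 3]`, `(3, 15]`, and `(15, inf)`. A wind speed of exactly 3 km/hr is therefore labeled `Calm`, and exactly 15 km/hr is labeled `Breezy`. The short check below uses made-up wind speeds placed on and around the edges; the values are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Illustrative wind speeds (km/hr) chosen to sit on and around the bin edges.
speeds = pd.Series([0.0, 3.0, 3.09, 15.0, 15.1])

categories = pd.cut(
    speeds,
    bins=[-1, 3, 15, float("inf")],  # intervals: (-1, 3], (3, 15], (15, inf)
    labels=["Calm", "Breezy", "Windy"],
)

print(categories.tolist())
# ['Calm', 'Calm', 'Breezy', 'Breezy', 'Windy']
```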

In [28]:
# Check the distribution of the categories
print("Wind Speed Category Distribution:")
print(weather_df['wind_category'].value_counts())

Wind Speed Category Distribution:
wind_category
Breezy    738
Calm      284
Windy       1
Name: count, dtype: int64


In [29]:
# ✨ Data Preparation
target = 'wind_category'
X = weather_df.drop(columns=[target])
y = weather_df[target]
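
Note that `X` still contains `wind_speed`, the column from which `wind_category` was binned, so a model can recover the label from that single feature. If the goal were to predict the wind category from the other weather attributes, one option would be to drop the source column as well. The sketch below shows this on a toy frame; the variable names `toy_df`, `source_of_target`, and `X_no_leak` are hypothetical, and the rows are illustrative.

```python
import pandas as pd

# Toy rows with the same column names as weather_df (values are illustrative).
toy_df = pd.DataFrame({
    "city": ["Vancouver", "Calgary", "Regina"],
    "temperature_celsius": [7.2, 2.0, 5.67],
    "humidity": [73, 58, 83],
    "wind_speed": [3.09, 4.63, 2.5],
    "wind_category": ["Breezy", "Breezy", "Calm"],
})

target = "wind_category"
source_of_target = "wind_speed"  # the label is binned directly from this column

# Drop both the label and the column it was derived from.
X_no_leak = toy_df.drop(columns=[target, source_of_target])
y_toy = toy_df[target]

print(list(X_no_leak.columns))
# ['city', 'temperature_celsius', 'humidity']
```

Whether this is appropriate depends on the purpose of the notebook: keeping `wind_speed` makes the task a near-trivial threshold lookup, while removing it tests how well the remaining attributes explain wind conditions.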

In [30]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [31]:
# Combine X_train and y_train into a single DataFrame for AutoGluon
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)

In [32]:
# 🚀 Train AutoGluon Classifier
predictor = TabularPredictor(label=target, problem_type='multiclass').fit(train_data)

No path specified. Models will be saved in: "AutogluonModels\ag-20250415_012133"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.19045
CPU Count:          4
Memory Avail:       8.64 GB / 16.00 GB (54.0%)
Disk Space Avail:   37.43 GB / 126.51 GB (29.6%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets

In [33]:
# 📊 Evaluate the Classification Model
performance = predictor.evaluate(test_data)
print("Classification Performance:")
print(performance)

Classification Performance:
{'accuracy': 0.9951219512195122, 'balanced_accuracy': 0.666666666666667, 'mcc': 0.9860468532787814}
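
The gap between accuracy (0.995) and balanced accuracy (0.667) is worth unpacking. Balanced accuracy is the unweighted mean of per-class recall, so a single missed class pulls it down sharply. The reported figures are consistent with a 205-row test set (20% of 1,023 rows, rounded up) in which the only error is the lone `Windy` row; this is an interpretation of the numbers, not something the output states directly. The Breezy/Calm split used below is assumed purely for illustration.

```python
# Hypothetical per-class test counts: only the total (205) and the single
# error are implied by the reported accuracy; the Breezy/Calm split is assumed.
support = {"Breezy": 148, "Calm": 56, "Windy": 1}
correct = {"Breezy": 148, "Calm": 56, "Windy": 0}

# Accuracy: correct predictions over all test rows.
accuracy = sum(correct.values()) / sum(support.values())

# Balanced accuracy: unweighted mean of per-class recall.
recalls = {label: correct[label] / support[label] for label in support}
balanced_accuracy = sum(recalls.values()) / len(recalls)

print(round(accuracy, 6))           # 0.995122
print(round(balanced_accuracy, 6))  # 0.666667
```

With only one `Windy` row in the whole dataset, that class cannot appear in both the training and test sets, so its recall is not a meaningful estimate either way.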


In [34]:
# 🔍 View Leaderboard
print(predictor.leaderboard(test_data))

  warn("load_learner` uses Python's insecure pickle module, which can execute malicious arbitrary code when loading. Only load files you trust.\nIf you only need to load model weights and optimizer state, use the safe `Learner.load` instead.")


                  model  score_test  score_val eval_metric  pred_time_test  \
0              LightGBM    0.995122   1.000000    accuracy        0.004997   
1   WeightedEnsemble_L2    0.995122   1.000000    accuracy        0.007990   
2              CatBoost    0.995122   1.000000    accuracy        0.020338   
3         LightGBMLarge    0.995122   1.000000    accuracy        0.037002   
4      RandomForestEntr    0.995122   1.000000    accuracy        0.129043   
5      RandomForestGini    0.995122   1.000000    accuracy        0.153998   
6               XGBoost    0.990244   0.993902    accuracy        0.069005   
7       NeuralNetFastAI    0.990244   0.993902    accuracy        0.084123   
8            LightGBMXT    0.985366   0.993902    accuracy        0.051002   
9        ExtraTreesEntr    0.975610   0.981707    accuracy        0.140203   
10       ExtraTreesGini    0.975610   0.987805    accuracy        0.143379   
11       NeuralNetTorch    0.970732   0.981707    accuracy      

In [35]:
# 🔍 View Feature Importance
print(predictor.feature_importance(test_data))

Computing feature importance via permutation shuffling for 6 features using 204 rows with 5 shuffle sets...
	0.84s	= Expected runtime (0.17s per shuffle set)
	0.25s	= Actual runtime (Completed 5 of 5 shuffle sets)


                     importance    stddev   p_value  n  p99_high   p99_low
wind_speed             0.322549  0.025329  0.000005  5  0.374703  0.270395
temperature_celsius    0.012745  0.002685  0.000223  5  0.018273  0.007217
feels_like_celsius     0.010784  0.004101  0.002091  5  0.019229  0.002340
city                   0.000000  0.000000  0.500000  5  0.000000  0.000000
humidity               0.000000  0.000000  0.500000  5  0.000000  0.000000
pressure               0.000000  0.000000  0.500000  5  0.000000  0.000000
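
Permutation importance measures how much a score drops when one feature column is shuffled, which breaks its link to the label. The sketch below illustrates the idea in plain Python with a stand-in rule-based classifier and synthetic rows. It is a generic illustration, not AutoGluon's implementation; `rule_model`, `accuracy`, and `permutation_importance` are hypothetical helpers.

```python
import random

def rule_model(row):
    """Stand-in classifier that only looks at wind_speed (same cut points as the bins)."""
    speed = row["wind_speed"]
    if speed <= 3:
        return "Calm"
    return "Breezy" if speed <= 15 else "Windy"

def accuracy(rows, labels):
    """Fraction of rows where the stand-in model matches the label."""
    return sum(rule_model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature, n_shuffles=5, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(n_shuffles):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(permuted, labels))
    return sum(drops) / n_shuffles

# Synthetic rows: alternating calm and breezy wind speeds, arbitrary humidity.
rows = [{"wind_speed": 1.0 + (i % 2) * 5.0, "humidity": 50 + i} for i in range(40)]
labels = [rule_model(r) for r in rows]

for feature in ("wind_speed", "humidity"):
    print(feature, permutation_importance(rows, labels, feature))
```

In this toy setup, shuffling `humidity` leaves the score unchanged because the stand-in model never reads it, while shuffling `wind_speed` lowers accuracy. The table above shows a similar pattern, with `wind_speed` dominating and `city`, `humidity`, and `pressure` at zero.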


In [40]:
# ✅ Display the classified cities
classified_cities = weather_df[['city', 'wind_speed', 'wind_category']]
print(classified_cities)

            city  wind_speed wind_category
0      Vancouver        3.09        Breezy
1       Victoria        4.02        Breezy
2        Calgary        4.63        Breezy
3       Edmonton        4.92        Breezy
4         Regina        4.12        Breezy
...          ...         ...           ...
1018  St. John's        8.56        Breezy
1019  St. John's        6.68        Breezy
1020  St. John's        3.93        Breezy
1021  St. John's        4.14        Breezy
1022  St. John's        6.21        Breezy

[1023 rows x 3 columns]


# **🌬 Wind Speed Classification for 17 Canadian Cities**

## **Conclusion**

The classification of wind speed for 17 Canadian cities reveals the following key insights:
- **Calm**: 284 of the 1,023 observations have wind speeds of 3 km/hr or less. At the city level, only two cities, Toronto and Ottawa, fall under this category.
- **Breezy**: 738 observations have wind speeds above 3 km/hr and up to 15 km/hr, and the majority of cities (15 out of 17) are categorized as `Breezy`.
- **Windy**: Only 1 observation exceeds the 15 km/hr threshold, and no city is categorized as `Windy`.

The dominance of `Breezy` classifications across most cities highlights the moderate wind conditions prevalent in the dataset.
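
The `wind_category` labels are assigned per observation, whereas the findings here are stated per city. One way to obtain a single label per city is to aggregate first and then bin. The sketch below assumes the mean wind speed is the aggregate of interest, which is an assumption since the notebook does not show how the per-city result was derived. The city names appear in the notebook, but the wind speeds are made up.

```python
import pandas as pd

# Toy observations; city names are from the notebook, wind speeds are made up.
toy_df = pd.DataFrame({
    "city": ["Toronto", "Toronto", "Calgary", "Calgary", "Regina", "Regina"],
    "wind_speed": [1.5, 2.5, 4.0, 6.0, 2.0, 10.0],
})

# Average wind speed per city, then apply the same bins used for the row-level labels.
city_mean = toy_df.groupby("city")["wind_speed"].mean()
city_category = pd.cut(
    city_mean,
    bins=[-1, 3, 15, float("inf")],
    labels=["Calm", "Breezy", "Windy"],
)

print(city_category.astype(str).to_dict())
# {'Calgary': 'Breezy', 'Regina': 'Breezy', 'Toronto': 'Calm'}
```

Other aggregates, such as the median or the most frequent row-level category, may give different per-city labels, so the choice should be stated alongside the result.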

---

## **Summary**

This workflow for classifying wind speed included the following steps:
1. **Data Cleaning**:
   - Combined the four source files and kept the six shared columns (`city`, `temperature_celsius`, `feels_like_celsius`, `humidity`, `pressure`, `wind_speed`).
   - Dropped rows with missing values, leaving 1,023 observations.
   
2. **Target Variable Creation**:
   - Defined three categories for wind speed:
     - **Calm**: Wind speed ≤ 3 km/hr.
     - **Breezy**: Wind speed above 3 km/hr and up to 15 km/hr.
     - **Windy**: Wind speed > 15 km/hr.
   - Labeled each observation accordingly using the `pd.cut()` function.

3. **Classification Results**:
   - **Calm**: 284 observations; cities with low wind speeds include Toronto and Ottawa.
   - **Breezy**: 738 observations, accounting for the majority of the dataset.
   - **Windy**: 1 observation; wind speeds in the dataset exceeded 15 km/hr only once.

This analysis provides a clear understanding of wind conditions across cities, and it lays the groundwork for further exploration, such as integrating additional weather attributes for classification or analyzing seasonal wind variations.