# Project 3 - FYS-STK4155

This notebook contains the code that produces the results for the Project 3 report in FYS-STK4155, which uses PyTorch to build a neural network for weather type classification.

The dataset was retrieved from https://www.kaggle.com/datasets/nikhil7280/weather-type-classification/data on 17.11.25.

*Fall 2025*

**Authors:** Martine Jenssen Pedersen, Sverre Manu Johansen & Kjersti Stangeland

In [31]:
import tqdm as notebook_tqdm

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import seaborn as sns

from torch.utils.data import Dataset, DataLoader, random_split
from sklearn.preprocessing import LabelEncoder

import matplotlib.style as mplstyle

mplstyle.use(["ggplot", "fast"])

sns.set_context("notebook", font_scale=1.3)
sns.set_style("whitegrid")

In [19]:
# import kagglehub
# path = kagglehub.dataset_download("nikhil7280/weather-type-classification")

In [20]:
path = '/Users/kjesta/Desktop/Masteremner/FYS-STK4155/Project_3_FYSSTK/kagglehub/datasets/nikhil7280/weather-type-classification/versions/1/weather_classification_data.csv'

To make the dataset work with PyTorch, we have to convert the string-valued features to numerical values.
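
As a minimal sketch of this string-to-number conversion, the example below applies scikit-learn's `LabelEncoder` to two small made-up columns. The values and the names `columns`, `encoders` and `encoded` are illustrative assumptions, not rows or variables from the Kaggle file. It keeps one encoder per column so that each mapping can be inverted later.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up string columns (illustrative only, not taken from the dataset)
columns = {
    "Season": ["Winter", "Spring", "Spring", "Winter"],
    "Weather Type": ["Rainy", "Cloudy", "Sunny", "Sunny"],
}

# One encoder per column, so every mapping can be inverted afterwards
encoders = {name: LabelEncoder() for name in columns}
encoded = {
    name: encoders[name].fit_transform(values)
    for name, values in columns.items()
}

# Integer codes follow the sorted order of the class names
season_codes = encoded["Season"].tolist()
weather_classes = list(encoders["Weather Type"].classes_)

# Map the integer labels back to the original strings
decoded = list(encoders["Weather Type"].inverse_transform(encoded["Weather Type"]))
```

Because the codes follow alphabetical order, they do not express any physical ordering of the categories. For input features, one-hot encoding may be worth considering as an alternative.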

In [21]:
ds = pd.read_csv(path)
ds.head()

| | Temperature | Humidity | Wind Speed | Precipitation (%) | Cloud Cover | Atmospheric Pressure | UV Index | Season | Visibility (km) | Location | Weather Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.0 | 73 | 9.5 | 82.0 | partly cloudy | 1010.82 | 2 | Winter | 3.5 | inland | Rainy |
| 1 | 39.0 | 96 | 8.5 | 71.0 | partly cloudy | 1011.43 | 7 | Spring | 10.0 | inland | Cloudy |
| 2 | 30.0 | 64 | 7.0 | 16.0 | clear | 1018.72 | 5 | Spring | 5.5 | mountain | Sunny |
| 3 | 38.0 | 83 | 1.5 | 82.0 | clear | 1026.25 | 7 | Spring | 1.0 | coastal | Sunny |
| 4 | 27.0 | 74 | 17.0 | 66.0 | overcast | 990.67 | 1 | Winter | 2.5 | mountain | Rainy |


In [22]:
class WeatherDataset(Dataset):
    """
    Weather Type Classification dataset.
    Owner of dataset: NIKHIL NARAYAN, KAGGLE USERNAME: nikhil7280
    Link to dataset: https://www.kaggle.com/datasets/nikhil7280/weather-type-classification
    """
    def __init__(self, csv_file, transform=None):
        self.data = pd.read_csv(csv_file)  # Load data from CSV file

        # Identify categorical columns
        cat_cols = self.data.select_dtypes(include=["object"]).columns

        # Create label encoders for each categorical column
        self.encoders = {col: LabelEncoder() for col in cat_cols}

        # Apply encoding
        # String to numerical conversion
        for col in cat_cols:
            self.data[col] = self.encoders[col].fit_transform(self.data[col])

        # Store features and labels  
        self.X = self.data.drop(columns=["Weather Type"]).values.astype("float32")
        self.y = self.data["Weather Type"].values.astype("int64")

        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        X = torch.tensor(self.X[idx], dtype=torch.float32)
        y = torch.tensor(self.y[idx], dtype=torch.long)

        if self.transform:
            X = self.transform(X)

        return X, y

In [24]:
class WeatherNet(nn.Module):
    """Fully connected classifier with two hidden ReLU layers."""

    def __init__(self, input_dim, hidden, num_classes):
        super().__init__()
        # Layer sizes are passed in explicitly instead of being read from
        # notebook globals that are only defined in a later cell.
        self.model = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes)  # raw logits, one per class
        )

    def forward(self, x):
        return self.model(x)

In [30]:
dataset = WeatherDataset(csv_file=path)

# 80/20 train/test split
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size

train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

batch_size = 64

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

input_dim = dataset.X.shape[1]  # number of features
hidden = 64
num_classes = len(dataset.encoders["Weather Type"].classes_)

model = WeatherNet(input_dim, hidden, num_classes)

cost_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

epochs = 50

for epoch in range(epochs):
    model.train()
    total_cost = 0

    for X, y in train_loader:
        optimizer.zero_grad()
        output = model(X)
        cost = cost_function(output, y)
        cost.backward()
        optimizer.step()
        total_cost += cost.item()

    # Evaluate on the held-out test set
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for X, y in test_loader:
            output = model(X)
            preds = torch.argmax(output, dim=1)
            correct += (preds == y).sum().item()
            total += y.size(0)

    accuracy = correct / total

    print(f"Epoch {epoch+1}/{epochs} | Loss: {total_cost:.3f} | Test Acc: {accuracy:.3f}")

Epoch 1/50 | Loss: 316.435 | Test Acc: 0.770
Epoch 2/50 | Loss: 111.906 | Test Acc: 0.856
Epoch 3/50 | Loss: 104.720 | Test Acc: 0.845
Epoch 4/50 | Loss: 92.867 | Test Acc: 0.814
Epoch 5/50 | Loss: 92.235 | Test Acc: 0.869
Epoch 6/50 | Loss: 93.120 | Test Acc: 0.842
Epoch 7/50 | Loss: 87.245 | Test Acc: 0.789
Epoch 8/50 | Loss: 87.832 | Test Acc: 0.848
Epoch 9/50 | Loss: 85.937 | Test Acc: 0.767
Epoch 10/50 | Loss: 80.656 | Test Acc: 0.877
Epoch 11/50 | Loss: 77.225 | Test Acc: 0.864
Epoch 12/50 | Loss: 78.351 | Test Acc: 0.820
Epoch 13/50 | Loss: 79.810 | Test Acc: 0.838
Epoch 14/50 | Loss: 76.638 | Test Acc: 0.811
Epoch 15/50 | Loss: 75.069 | Test Acc: 0.857
Epoch 16/50 | Loss: 72.973 | Test Acc: 0.880
Epoch 17/50 | Loss: 71.242 | Test Acc: 0.880
Epoch 18/50 | Loss: 68.520 | Test Acc: 0.842
Epoch 19/50 | Loss: 73.989 | Test Acc: 0.872
Epoch 20/50 | Loss: 67.478 | Test Acc: 0.885
Epoch 21/50 | Loss: 63.651 | Test Acc: 0.867
Epoch 22/50 | Loss: 67.853 | Test Acc: 0.890
Epoch 23/50 | Lo

# Plan

We will:
* Solve a classification problem on weather types. Test logistic regression and a neural network (perhaps more?).
* Evaluate with: confusion matrix, accuracy score, and more.
* Test different activation functions for the problem.
* Find articles on this topic?
* Write a report along the lines of the previous ones, discussing model, method, setup, and so on.
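
One possible way to compute the two metrics named in the plan is sketched below with scikit-learn's `accuracy_score` and `confusion_matrix`. The label lists `y_true` and `y_pred` are made-up placeholders. In the notebook they would presumably be gathered by collecting `y` and `preds` over `test_loader`, which is an assumption about how the evaluation will be wired up.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up integer labels for three classes (illustrative only)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Fraction of samples whose predicted class matches the true class
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {acc:.3f}")
print(cm)
```

The diagonal of the matrix counts correct predictions per class, and the off-diagonal entries show which classes are confused with each other, which a single accuracy number hides.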