# Project 3 - FYS-STK4155

This notebook contains the code that produces the results for the Project 3 report in FYS-STK4155. The project uses PyTorch to build a neural network that classifies weather types.

The dataset was retrieved on 17 November 2025 from: https://www.kaggle.com/datasets/nikhil7280/weather-type-classification/data

*Fall 2025*

**Authors:** Martine Jenssen Pedersen, Sverre Manu Johansen & Kjersti Stangeland

### Plans for the project

**Test different**
* Model architectures (# layers, # nodes)
* Activation functions
    * tanh
    * relu
    * lrelu
    * sigmoid
    * gelu
* Learning rates
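
As a reference for the activation functions listed above, here is a minimal scalar sketch in plain Python. The function names and the `ACTIVATIONS` lookup table are illustrative assumptions, not the implementation in `functions.nn_pytorch`. The leaky ReLU slope of 0.01 and the exact (erf-based) form of GELU are also assumptions, chosen to match the defaults PyTorch uses.

```python
import math


def tanh(x: float) -> float:
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: zero for negative inputs."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: small linear slope for negative inputs (slope is an assumption)."""
    return x if x >= 0 else negative_slope * x


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical lookup table keyed by the short names used in the plan above.
ACTIVATIONS = {
    "tanh": tanh,
    "relu": relu,
    "lrelu": leaky_relu,
    "sigmoid": sigmoid,
    "gelu": gelu,
}
```

A lookup table like this makes it easy to loop over activation names when comparing models, for example `for name, fn in ACTIVATIONS.items(): ...`.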

**Use**
* Adam optimizer, because it performed best in Project 2
* Our own neural network implementation
* A neural network built with PyTorch
* Our own logistic regression implementation
* Logistic regression from scikit-learn
* Accuracy score and cross-entropy as metrics
* Heatmaps, ROC curves, cumulative gain, confusion matrices
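
The metrics named in the list above can be sketched in plain Python as follows. This is a framework-independent illustration under stated assumptions, not the code used for the report: the helper names are hypothetical, the cross-entropy takes raw logits and averages over the batch (the same convention as the default `nn.CrossEntropyLoss`), and the confusion matrix uses rows for true classes and columns for predicted classes.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits_batch, targets):
    """Mean negative log-probability of the true class over the batch."""
    total = 0.0
    for logits, target in zip(logits_batch, targets):
        total += -math.log(softmax(logits)[target])
    return total / len(targets)


def accuracy(preds, targets):
    """Fraction of predictions that equal the true label."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def confusion_matrix(preds, targets, num_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, targets):
        matrix[t][p] += 1
    return matrix


# Toy example with four classes (illustrative values, not project results).
targets = [0, 1, 2, 3, 1]
preds = [0, 1, 2, 2, 1]
toy_accuracy = accuracy(preds, targets)
toy_confusion = confusion_matrix(preds, targets, num_classes=4)
```

In the toy example one sample of class 3 is predicted as class 2, which shows up as an off-diagonal count in row 3, column 2 of the confusion matrix.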

**Motivation**
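* Classification is a broadly useful task.
* See how a neural network captures relationships and nonlinearities in the data.
* For meteorology, ML is interesting because it can lower the computational cost of forecasting (not something we do here, but it motivates the work).
* Possible accessibility application: could weather classification help blind people?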

In [8]:
from pathlib import Path
import os
import sys
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

from functions.make_dataset import *
from functions.nn_pytorch import *

In [9]:
import tqdm as notebook_tqdm

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
import torch.optim as optim

from torch.utils.data import Dataset, DataLoader, random_split
from sklearn.preprocessing import LabelEncoder

import matplotlib.style as mplstyle

mplstyle.use(["ggplot", "fast"])

sns.set_context("notebook", font_scale=1.3)
sns.set_style("whitegrid")

In [10]:
path = '/Users/kjesta/Desktop/Masteremner/FYS-STK4155/Project_3_FYSSTK/kagglehub/datasets/nikhil7280/weather-type-classification/versions/1/weather_classification_data.csv'

In [11]:
ds = pd.read_csv(path)
ds.head()

| | Temperature | Humidity | Wind Speed | Precipitation (%) | Cloud Cover | Atmospheric Pressure | UV Index | Season | Visibility (km) | Location | Weather Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.0 | 73 | 9.5 | 82.0 | partly cloudy | 1010.82 | 2 | Winter | 3.5 | inland | Rainy |
| 1 | 39.0 | 96 | 8.5 | 71.0 | partly cloudy | 1011.43 | 7 | Spring | 10.0 | inland | Cloudy |
| 2 | 30.0 | 64 | 7.0 | 16.0 | clear | 1018.72 | 5 | Spring | 5.5 | mountain | Sunny |
| 3 | 38.0 | 83 | 1.5 | 82.0 | clear | 1026.25 | 7 | Spring | 1.0 | coastal | Sunny |
| 4 | 27.0 | 74 | 17.0 | 66.0 | overcast | 990.67 | 1 | Winter | 2.5 | mountain | Rainy |


To use the dataset with PyTorch, we convert the categorical (string) columns to numerical values.

In [12]:
ds['Weather Type'].unique()

array(['Rainy', 'Cloudy', 'Sunny', 'Snowy'], dtype=object)
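
To illustrate the conversion from string labels to integers, here is a minimal plain-Python sketch. The helper `fit_label_encoder` is hypothetical and is not the code inside `WeatherDataset`. It assigns integer codes in sorted label order, which is the same convention scikit-learn's `LabelEncoder` follows.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(labels))
    return {label: idx for idx, label in enumerate(classes)}


# The four weather types shown in the output above.
weather_types = ["Rainy", "Cloudy", "Sunny", "Snowy"]
mapping = fit_label_encoder(weather_types)

# Encode the labels of the first five rows of the dataset preview.
first_rows = ["Rainy", "Cloudy", "Sunny", "Sunny", "Rainy"]
encoded = [mapping[label] for label in first_rows]

# Inverse mapping, useful for turning predictions back into readable labels.
inverse = {idx: label for label, idx in mapping.items()}
```

The integer codes carry no ordering information for the target, which is fine for a classification loss that treats them as class indices.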

In [13]:
dataset = WeatherDataset(csv_file=path)

# 80/20 train/test split
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size

train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

In [20]:
batch_size = 64

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

input_dim = dataset.X.shape[1]  # number of features
hidden = 64
num_classes = len(dataset.encoders["Weather Type"].classes_)

model = WeatherNN(input_dim=input_dim, hidden_dim=hidden, num_hidden_layers=10, output_dim=num_classes, activation="relu")
cost_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

epochs = 50

for epoch in range(epochs):
    model.train()
    total_cost = 0

    for X, y in train_loader:
        optimizer.zero_grad()
        output = model(X)
        cost = cost_function(output, y)
        cost.backward()
        optimizer.step()
        total_cost += cost.item()

    # Evaluate on the held-out test set (no separate validation split is used)
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for X, y in test_loader:
            output = model(X)
            preds = torch.argmax(output, dim=1)
            correct += (preds == y).sum().item()
            total += y.size(0)

    accuracy = correct / total

    print(f"Epoch {epoch+1}/{epochs} | Loss: {total_cost:.3f} | Test Acc: {accuracy:.3f}")

Epoch 1/50 | Loss: 172.510 | Test Acc: 0.777
Epoch 2/50 | Loss: 110.977 | Test Acc: 0.824
Epoch 3/50 | Loss: 94.622 | Test Acc: 0.804
Epoch 4/50 | Loss: 82.990 | Test Acc: 0.849
Epoch 5/50 | Loss: 81.018 | Test Acc: 0.826
Epoch 6/50 | Loss: 72.244 | Test Acc: 0.856
Epoch 7/50 | Loss: 69.439 | Test Acc: 0.836
Epoch 8/50 | Loss: 67.181 | Test Acc: 0.862
Epoch 9/50 | Loss: 64.644 | Test Acc: 0.865
Epoch 10/50 | Loss: 60.851 | Test Acc: 0.854
Epoch 11/50 | Loss: 62.674 | Test Acc: 0.849
Epoch 12/50 | Loss: 57.038 | Test Acc: 0.869
Epoch 13/50 | Loss: 57.621 | Test Acc: 0.860
Epoch 14/50 | Loss: 57.131 | Test Acc: 0.851
Epoch 15/50 | Loss: 57.478 | Test Acc: 0.853
Epoch 16/50 | Loss: 57.005 | Test Acc: 0.871
Epoch 17/50 | Loss: 55.320 | Test Acc: 0.849
Epoch 18/50 | Loss: 56.401 | Test Acc: 0.843
Epoch 19/50 | Loss: 53.061 | Test Acc: 0.875
Epoch 20/50 | Loss: 51.869 | Test Acc: 0.874
Epoch 21/50 | Loss: 56.474 | Test Acc: 0.864
Epoch 22/50 | Loss: 52.844 | Test Acc: 0.862
Epoch 23/50 | Los
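
The `Loss` value in the log above is `total_cost`, the sum of the per-batch mean losses over one epoch, so its magnitude depends on the number of batches. A per-sample average is easier to compare across batch sizes and dataset sizes. The sketch below shows one way to accumulate it; the `AverageMeter` class is a hypothetical helper, not part of the notebook or of PyTorch.

```python
class AverageMeter:
    """Accumulate a per-sample average from per-batch mean losses."""

    def __init__(self):
        self.total = 0.0  # sum of losses over all samples seen so far
        self.count = 0    # number of samples seen so far

    def update(self, batch_mean_loss, batch_size):
        """Add one batch, weighting its mean loss by the batch size."""
        self.total += batch_mean_loss * batch_size
        self.count += batch_size

    @property
    def average(self):
        """Per-sample average loss, or 0.0 if nothing has been added."""
        return self.total / self.count if self.count else 0.0


# Toy example: a full batch of 64 and a smaller final batch of 32.
meter = AverageMeter()
meter.update(batch_mean_loss=1.0, batch_size=64)
meter.update(batch_mean_loss=2.0, batch_size=32)
epoch_average = meter.average
```

In the training loop this would correspond to calling `meter.update(cost.item(), y.size(0))` after each batch and printing `meter.average` at the end of the epoch.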

# Plan

We will:
* Solve a classification problem on weather types. Test logistic regression and a neural network (maybe more?).
* Look at: confusion matrix, accuracy score, and more
* Test different activation functions for the problem
* Find articles on this topic?
* Write a report along the lines of the previous ones, discussing the model, method, setup, and so on.