# Homework 5: 12707 and 12607

Homework created by Ryan Albelda 
Inspired by: https://curiousily.com/posts/build-your-first-neural-network-with-pytorch/

## Build Your First Neural Network with PyTorch

This homework is out of 50 points.

### About the data
This archive contains 2,075,259 measurements gathered in a house located in Sceaux (7 km from Paris, France) between December 2006 and November 2010 (47 months). The measurements record the electric power consumption of one household at a one-minute sampling rate over almost 4 years. Different electrical quantities and some sub-metering values are available.

Learn more about the data here: https://archive.ics.uci.edu/dataset/235/individual+household+electric+power+consumption

Citation: 
Hebrail, G. & Berard, A. (2006). Individual Household Electric Power Consumption [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C58K54.

# Part 1: Name, Homework Objectives

#### 1) Name, Andrew ID, Time to finish the homework 
*12607: 3 points, 12707: 3 points*

'###' 

##### Objectives of Homework
- Clean data to be numeric and remove missing rows 
- Separate features and target variable 
- Use `StandardScaler`, learn what tensors are, and understand why we use them
- Use torch.nn.Module and understand how to read it: understand how layers of nodes correspond to lines of code
- Find the activation function part of the NN 
- Understand the tools for error metrics in a NN 
- Understand that a NN needs an optimizer, and research other optimizer methods
- Learn what a learning rate is 
- Understand how epoch ties into loss errors 
- Convert data to be binary classification and apply various model metrics 

# Part 2: Loading Data and cleaning the data

#### Import Packages 

In [12]:
import os
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from torch import nn, optim
from sklearn.preprocessing import StandardScaler
import matplotlib.image as mpimg

#### Import Data and look at the data 

In [13]:
df = pd.read_csv('household_power_consumption.csv', sep=',', low_memory=False)

In [14]:
df.head()

|   | Unnamed: 0 | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 16/12/2006 | 17:24:00 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 |
| 1 | 1 | 16/12/2006 | 17:25:00 | 5.36 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 |
| 2 | 2 | 16/12/2006 | 17:26:00 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 |
| 3 | 3 | 16/12/2006 | 17:27:00 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 |
| 4 | 4 | 16/12/2006 | 17:28:00 | 3.666 | 0.528 | 235.68 | 15.8 | 0.0 | 1.0 | 17.0 |


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075259 entries, 0 to 2075258
Data columns (total 10 columns):
 #   Column                 Dtype  
---  ------                 -----  
 0   Unnamed: 0             int64  
 1   Date                   object 
 2   Time                   object 
 3   Global_active_power    object 
 4   Global_reactive_power  object 
 5   Voltage                object 
 6   Global_intensity       object 
 7   Sub_metering_1         object 
 8   Sub_metering_2         object 
 9   Sub_metering_3         float64
dtypes: float64(1), int64(1), object(8)
memory usage: 158.3+ MB


#### 2) Clean the data 
* Convert all columns (except Date, Time, and Unnamed: 0) to numeric
* Complete the line `df.###` so that it drops the rows with missing values.
    * Per the data documentation, there are missing values: https://archive.ics.uci.edu/dataset/235/individual+household+electric+power+consumption
        * If this were a 100-point assignment, you would first explore where and why values are missing
* Drop  'Date', 'Time', and  'Unnamed: 0'  columns
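
To see why the conversion step matters, here is a minimal plain-Python sketch of the idea behind `errors='coerce'`: text that cannot be parsed as a number becomes NaN instead of raising an error. The helper `to_number` and the sample readings are hypothetical, and the `"?"` placeholder is an assumption about how a missing reading might appear in the raw file.

```python
import math

def to_number(value):
    """Return float(value), or NaN when the text is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

# Hypothetical raw readings; "?" stands in for a missing measurement.
raw_voltage = ["234.84", "233.63", "?", "235.68"]

voltage = [to_number(v) for v in raw_voltage]
n_missing = sum(math.isnan(v) for v in voltage)

print(voltage)
print("missing readings:", n_missing)
```

Counting the NaN values before removing anything is a quick way to check how much data a cleaning step would discard.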

*12607: 8 points, 12707: 6 points*

'###'

In [16]:
# Convert all columns (except Date, Time, and Unnamed: 0) to numeric
for col in df.columns:
    if col not in [###]:
        df[col] = pd.to_numeric(df[###], errors='coerce')
        
# Drop rows with missing values
df.###(inplace=True)

# Drop 'Date', 'Time',and  'Unnamed: 0'  columns
df.drop(columns=[###], inplace=True)



### More Preprocessing 

In [None]:
# No categorical features left, so this will not change the dataframe, but safe to keep:
df = pd.get_dummies(df, drop_first=True)

#### Test and Train data Set

#### 3) Separate the Features and Targets: 
- Our target variable is `Global_active_power`

*12607: 5 points, 12707: 4 points*

'###'

In [None]:
# Separate features and target
X = df.drop(columns='###')
y = ###
# Train-test split
# DO NOT CHANGE THIS, we want everyone to get the same results for this homework 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=14)


The code scales training and test features using StandardScaler, then converts both the features (X_train, X_test) and labels (y_train, y_test) into PyTorch tensors for use in a neural network. 

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.preprocessing import StandardScaler

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to torch tensors
X_train = torch.from_numpy(X_train).float()
y_train = torch.squeeze(torch.from_numpy(y_train.to_numpy()).float())

X_test = torch.from_numpy(X_test).float()
y_test = torch.squeeze(torch.from_numpy(y_test.to_numpy()).float())

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

We use StandardScaler to normalize the features so that each one has a mean of 0 and a standard deviation of 1.

This helps neural networks (and many other ML models) train faster and more reliably by:

1) Preventing features with large scales from dominating others
2) Helping gradients flow better, especially in deep networks
3) Improving model convergence during training
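
The scaling itself is simple arithmetic. The sketch below standardizes one feature by hand, using the population standard deviation, and reuses the training statistics for the test values in the same way the `fit_transform` / `transform` pair does. The function names and the sample voltages are hypothetical.

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return statistics.fmean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(x - mean) / std for x in column]

# Hypothetical voltage readings
train_voltage = [230.0, 234.0, 238.0, 242.0]
test_voltage = [236.0, 250.0]

# Statistics come from the training data only
mean, std = fit_scaler(train_voltage)
train_scaled = transform(train_voltage, mean, std)
test_scaled = transform(test_voltage, mean, std)

print(train_scaled)
print(test_scaled)
```

The test values are scaled with the training mean and standard deviation, so no information from the test set leaks into preprocessing.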

# Part 3: Building a Neural Network



We’ll build a simple Neural Network (NN) that predicts the global active power consumption of a household.

Our input will use data from several sensor-based features (e.g., voltage, current, sub-metering values). We’ll create an appropriate input layer that matches the number of these features.

The output will be a single number representing the predicted Global_active_power usage at a given time.

We’ll include two hidden layers between the input and output layers. These hidden layers learn internal representations that help the model make accurate predictions. All layers will be fully connected (dense).
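
To make "fully connected" concrete, here is a plain-Python sketch of what a single dense layer computes: every output node is a weighted sum of all inputs plus a bias. The layer size (4 inputs, 2 outputs), the weights, and the helper names are hypothetical and chosen only for illustration; they are not the sizes used in this homework.

```python
def linear(x, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def n_parameters(in_features, out_features):
    """One weight per input-output connection, plus one bias per output."""
    return in_features * out_features + out_features

# Hypothetical layer with 4 inputs and 2 outputs
weights = [[0.5, -1.0, 0.0, 2.0],   # weights feeding output node 0
           [1.0, 1.0, 1.0, 1.0]]    # weights feeding output node 1
bias = [0.1, -0.5]

x = [1.0, 2.0, 3.0, 4.0]
out = linear(x, weights, bias)

print(out)
print("parameters:", n_parameters(4, 2))
```

The weight table has one row per output node and one column per input, which is the same layout `nn.Linear` uses for its `weight` tensor.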

To implement this, we’ll define a PyTorch neural network class that inherits from torch.nn.Module.

#### 4) Research and write 1-3 sentences exploring what the torch.nn.Module is and its function.  12707: Why does it have different layer types? 

*12607: 4 points (only explore and write function), 12707: 4 points, answer layer question too*


'###'


In [None]:
# Define neural network
class Net(nn.Module):
    def __init__(self, n_features):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(n_features, 5)
        self.fc2 = nn.Linear(5, 3)
        self.fc3 = nn.Linear(3, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

net = Net(X_train.shape[1])
print(net)


In [None]:
# Run code to see image
img = mpimg.imread("NN_HW5.png")
# Display the image
plt.imshow(img)
plt.axis('off')  
plt.show()

#### 5) We are going to focus on this part of the code:

    def __init__(self, n_features):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(n_features, 5)
        self.fc2 = nn.Linear(5, 3)
        self.fc3 = nn.Linear(3, 1)

Using the image for help, explain in 1-2 sentences how `self.fc2 = nn.Linear(5, 3)` is different from `self.fc3 = nn.Linear(3, 1)`. 

12707 only: Also, looking at the image: one of the layers in the image does not correctly represent the network in the code. Which layer is it, and in what way is it incorrect?

*12607: 6 points (skip how image is incorrect), 12707: 6 points*



'###'


*answer here*

We start by creating the layers of our model in the constructor. The forward() method is where the magic happens. It accepts the input x and lets it flow through each layer. There is a corresponding backward pass (defined for you by PyTorch) that allows the model to learn from the errors it is currently making.

# Part 4: Activation Functions

You might notice the calls to F.relu and torch.sigmoid. Why do we need those?

One of the cool features of Neural Networks is that they can approximate non-linear functions. In fact, the universal approximation theorem shows that, given enough hidden units, they can approximate any continuous function on a bounded input range to arbitrary accuracy.

Linear layers alone cannot do this: a stack of purely linear layers collapses into a single linear transformation. Activation functions allow you to break from the linear world and learn (hopefully) more. You’ll usually find them applied to the output of some layer.

Those functions must be hard to define, right?



### ReLU

Not at all. Let's start with the ReLU definition (one of the most widely used activation functions):

`ReLU(x)=max(0,x)`
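
The definition translates almost word for word into Python. This is a plain-Python sketch for single numbers; `F.relu` applies the same rule to every element of a tensor.

```python
def relu(x):
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return max(0.0, x)

inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(x) for x in inputs]

print(outputs)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```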


Easy peasy, the result is the maximum value of zero and the input:

In [None]:
# Run code to see image
img = mpimg.imread("ReLu.png")
# Display the image
plt.imshow(img) 
plt.axis('off')   
plt.show()

### Sigmoid

The sigmoid is useful when you need to make a binary decision/classification (answering with a yes or a no).

`Sigmoid(x)= 1 / (1 + e^-x)`
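
As with ReLU, the formula can be written directly in Python. This is a plain-Python sketch for single numbers; `torch.sigmoid` applies the same formula element-wise to a tensor.

```python
import math

def sigmoid(x):
    """Sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

inputs = [-10.0, -2.0, 0.0, 2.0, 10.0]
outputs = [sigmoid(x) for x in inputs]

for x, y in zip(inputs, outputs):
    print(f"sigmoid({x:6.1f}) = {y:.5f}")
```

An input of 0 maps to exactly 0.5, and the outputs increase with the input without ever reaching 0 or 1.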


The sigmoid squishes the input values between 0 and 1, in a smooth, S-shaped way:



In [None]:
# Run code to see image
img = mpimg.imread("Sigmoid.png")
# Display the image
plt.imshow(img) 
plt.axis('off')   
plt.show()


Back to the code. 

#### 6) Use of activation functions: 

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

In 1-3 sentences, explain the purpose of these activation functions. Note how `F.relu(self.fc2(x))` is different from `torch.sigmoid(self.fc3(x))` (no math required in the answer).

12707: How might things change in the function of the network if we added another layer?   


*12607: 5 points (skip how network would change), 12707: 4 points*



'###'

*answer here*

# Part 5: Training

### Loss Function

With the model in place, we need to find parameters that predict the power consumption well. First, we need something to tell us how well we’re currently doing:

In [None]:
criterion = nn.MSELoss()


The Mean Squared Error (MSE) loss is a function that measures the average squared difference between the predicted values of a model and the actual values. In our case, it compares the model's predictions with the true values. It does not require the predictions to be passed through a sigmoid function; instead, it works directly with the raw output of the model. The closer this value gets to 0, the more accurate your model's predictions are.
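
The calculation is short enough to do by hand. Below is a plain-Python sketch of the same average that `nn.MSELoss()` computes with its default settings; the function name `mse` and the sample values are hypothetical.

```python
def mse(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical true and predicted power values
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]

loss = mse(y_pred, y_true)   # (0.25 + 0.0 + 1.0) / 3
print(round(loss, 4))
```

Because the errors are squared, the prediction that is off by 1.0 contributes four times as much to the loss as the one that is off by 0.5.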

#### 7) Look up another loss function in PyTorch that can be used in an NN. 12707: Explain how it is different than `MSELoss`. Write 1-3 sentences about this method.

*12607: 2 points (skip how it is different), 12707: 4 points*

'###'

*answer here*

### Optimization



Imagine that each parameter of our NN is a knob. The optimizer’s job is to find the perfect positions for each knob so that the loss gets close to 0.

Real-world models can contain millions or even billions of parameters. With so many knobs to turn, it would be nice to have an efficient optimizer that quickly finds solutions.

Contrary to what you might believe, optimization in Deep Learning is rarely about finding the perfect solution. In practice, you’re content with good-enough parameter values that give you an acceptable accuracy.

While there are tons of optimizers you can choose from, Adam is a safe first choice. PyTorch has a well-debugged implementation you can use:

In [None]:
optimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=1e-5)



Naturally, the optimizer requires the parameters it should update. The second argument, `lr`, is the learning rate. It is a tradeoff between how good the parameters you find will be and how fast you’ll get there. The third argument, `weight_decay`, adds a small L2 penalty on the weights. Finding good values for these can be black magic and takes a lot of brute-force “experimentation”.
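
To show where the learning rate enters, here is a sketch of a single update step on one "knob". It uses plain gradient descent, which is simpler than Adam and stands in for it here only as an illustration. The loss function, starting value, and `lr` value are hypothetical.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

lr = 0.1   # learning rate (hypothetical value for this toy problem)
w = 0.0    # starting position of the knob

# One plain gradient-descent step: move against the gradient, scaled by lr
w_new = w - lr * grad(w)

print("before:", w, "loss:", loss(w))
print("after: ", round(w_new, 3), "loss:", round(loss(w_new), 3))
```

Adam builds on this basic step by keeping running averages of past gradients and adapting the step for each parameter.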

#### 8) Research another optimizer algorithm in PyTorch that one can use for an NN. 12707: How is this one different than `optim.Adam`? Write 1-3 sentences about this algorithm.

*12607: 2 points (skip how it is different), 12707: 4 points*

Provide a citation if it applies to your answer.

'###'

*answer here*

#### 9) What does lr = 0.001 mean? What would it mean if it was 0.1? What would it mean if it was 0.0000001?

*12607: 3 points,  12707: 3 points*


'###'

*answer here*

# Part 6: Energy Consumption Forecasting

With all the pieces of the puzzle in place, we can start training our model:

*Note this will take a few minutes to run*

In [None]:
def round_tensor(t, decimal_places=3):
    return round(t.item(), decimal_places)

# Training loop (modified for regression)
for epoch in range(1500):
    net.train()
    y_pred = net(X_train).squeeze()
    train_loss = criterion(y_pred, y_train)

    if epoch % 100 == 0:
        net.eval()
        with torch.no_grad():
            y_test_pred = net(X_test).squeeze()
            test_loss = criterion(y_test_pred, y_test)
            print(
f'''epoch {epoch}
Train set - loss: {round_tensor(train_loss)}
Test  set - loss: {round_tensor(test_loss)}
''')

    optimizer.zero_grad()
    train_loss.backward()
    optimizer.step()


#### 10) How long did this take your computer to run? Note: the time should be listed at the bottom of the cell. If your machine does not report a time, an estimate is fine.
#### Why do you think it took so long? What does epoch 1500 mean here? Do you notice anything happening to the loss value? Does the loss value always get significantly better (over 5% change) with every epoch?


*If this takes more than 8 minutes to run, change `for epoch in range(1500)` to `range(1000)`.*

*12607: 5 points , 12707: 5 points*

'###'

*answer here*

During training, we present the data to the model repeatedly over multiple epochs. For each epoch, we measure the loss, backpropagate the error, and update the model parameters using the optimizer.
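
The same measure, backpropagate, update cycle can be sketched without PyTorch on a model with a single parameter. Everything here is hypothetical (toy data following y = 2x, a hand-derived gradient, plain gradient descent instead of Adam); it only mirrors the structure of the loop above.

```python
# Toy data following y = 2 * x (hypothetical, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0        # the single parameter of the model y_pred = w * x
lr = 0.01      # learning rate
history = []   # loss recorded every 50 epochs

for epoch in range(200):
    # Forward pass: predictions and mean squared error
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    if epoch % 50 == 0:
        history.append(loss)

    # Backward pass: derivative of the loss with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)

    # Update step (the role optimizer.step() plays in PyTorch)
    w -= lr * grad

print(history)
print("learned w:", round(w, 4))
```

In PyTorch, `train_loss.backward()` computes the gradients automatically for every parameter, so the derivative never has to be written by hand.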


# Part 7: Evaluation, using binary classification methods 

To turn the regression into a binary classification, we set a threshold on global active power: a reading counts as "high power usage" when it is above 0.4 kW. We then check how well the model's predictions fall on the correct side of that threshold.

In [None]:
from sklearn.metrics import roc_curve, roc_auc_score, precision_score, recall_score, f1_score, accuracy_score
import matplotlib.pyplot as plt

# Choose threshold to define "high power usage"
threshold = 0.4


# Binarize true labels
y_test_class = (y_test > threshold).int()

# Get predictions from model
net.eval()
with torch.no_grad():
    y_test_pred = net(X_test).squeeze()

# Binarize predictions
y_pred_class = (y_test_pred > threshold).int()


In [None]:
plt.hist(y_test_pred.numpy(), bins=40)
plt.axvline(threshold, color='red', linestyle='--', label=f'Threshold = {threshold}')
plt.title('Predicted Values Distribution')
plt.xlabel('Predicted Global Active Power')
plt.ylabel('Count')
plt.legend()
plt.grid(True)
plt.show()

#### 11) Print out the accuracy, precision, recall, and f1 score

*12607: 6 points , 12707: 6 points*
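
As background, all four metrics come from the same four counts: true positives, false positives, false negatives, and true negatives. The sketch below works them out by hand on a small set of hypothetical labels, in plain Python rather than with scikit-learn.

```python
# Hypothetical binary labels: 1 = high power usage, 0 = not high
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged high
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged high, but was not
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed a high reading
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly left unflagged

accuracy = (tp + tn) / len(pairs)    # share of all predictions that are right
precision = tp / (tp + fp)           # share of flagged readings that were high
recall = tp / (tp + fn)              # share of high readings that were flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(tp, fp, fn, tn)
print(accuracy, precision, recall, round(f1, 3))
```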

'###'

In [None]:
from sklearn.metrics import '###'

# Need to keep these lines to be able to use numpy and the evaluation metrics 
# Convert tensors to numpy
y_test_class_np = y_test_class.detach().numpy()
y_pred_class_np = y_pred_class.detach().numpy()

# Classification metrics
print("Accuracy:", ###)
print("Precision:",###)
print("Recall:", ###)
print("F1 Score:", ###)


In [None]:
from sklearn.metrics import roc_auc_score, roc_curve

fpr, tpr, _ = roc_curve(y_test_class_np, y_test_pred.detach().numpy())
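# Compute the AUC from the same continuous scores used for the ROC curve,
# not from the 0/1 predictions, so the label matches the plotted curve
auc_score = roc_auc_score(y_test_class_np, y_test_pred.detach().numpy())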
plt.plot(fpr, tpr, label=f'Area under ROC curve = {auc_score:.3f}')
plt.plot([0, 1], [0, 1], 'k--')
plt.title('ROC Curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.grid(True)
plt.show()
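
One way to read the area under the ROC curve: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below computes it that way in plain Python; the function name `pairwise_auc` and the sample labels and scores are hypothetical.

```python
def pairwise_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical true classes and continuous model scores
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(pairwise_auc(labels, scores))  # 3 of the 4 pairs are ranked correctly
```

This nested loop is fine for a handful of points but far too slow for millions of rows; library implementations use sorting instead.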


#### 12) Is this AUC better than random? Is it perfect?

*12607: 1 point , 12707: 1 point* 


'###'

*answer here*