## Addressing Our Architecture Issues
1. **Our sigmoid function squishes predictions between 0 and 1, but our labels are between -2 and 2. How can we fix this?**

Normalize the alarm labels to match the output range of the activation function in the prediction layer. For example, scale the labels to [-1, 1] if using tanh, or to [0, 1] if using sigmoid.
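A minimal sketch of that rescaling, assuming the raw alarm states are the integers -2 through 2. The helper name `scale_labels` and its parameters are ours, for illustration only.

```python
import numpy as np

def scale_labels(labels, lo=-2.0, hi=2.0, target=(0.0, 1.0)):
    """Linearly map labels from [lo, hi] onto the target interval."""
    labels = np.asarray(labels, dtype=np.float64)
    t_lo, t_hi = target
    unit = (labels - lo) / (hi - lo)      # first map onto [0, 1]
    return unit * (t_hi - t_lo) + t_lo    # then stretch/shift to the target range

alarm_states = np.array([-2, -1, 0, 1, 2])
sigmoid_targets = scale_labels(alarm_states, target=(0.0, 1.0))   # for a sigmoid output
tanh_targets = scale_labels(alarm_states, target=(-1.0, 1.0))     # for a tanh output
print(sigmoid_targets)
print(tanh_targets)
```

The same mapping has to be inverted at prediction time to recover the original alarm states.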

2. **If we decide to not use multilabel classification and instead train individual networks, is there a library that makes this easier for us?**

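Yes. The `MultiOutputClassifier` class in scikit-learn fits one classifier per output column, and it can be used with Keras classifiers that are wrapped to expose the scikit-learn estimator interface.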
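A rough sketch of how `MultiOutputClassifier` is used. To keep it self-contained, `LogisticRegression` stands in for the Keras model, and the data is random with three hypothetical alarms; a Keras network would first need a scikit-learn-compatible wrapper, which is not shown here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # 200 samples, 20 features (made up)
Y = rng.integers(-2, 3, size=(200, 3))    # 3 hypothetical alarms, states -2..2

# One independent classifier is fitted per column of Y.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

preds = clf.predict(X[:5])                # shape (5, 3): one state per alarm
print(preds)
```

Because each alarm gets its own estimator, the labels can stay in their original categorical form and no shared loss function is needed.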

3. **What is a better accuracy metric to measure model performance, considering the data is sparse?**

We can write custom Keras metric functions that compute accuracy only over the alarm states that are on, so that the many inactive alarms do not inflate the score.
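The idea can be sketched in plain NumPy before porting it to a Keras metric. This assumes state 0 means "off" and that predictions have already been decoded back to integer states; the function name `active_alarm_accuracy` is hypothetical.

```python
import numpy as np

def active_alarm_accuracy(y_true, y_pred, off_state=0):
    """Accuracy restricted to positions where the true alarm is not off."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    active = y_true != off_state
    if not active.any():
        return float("nan")               # no active alarms to score
    return float((y_pred[active] == y_true[active]).mean())

y_true = np.array([[0, 0, 1, 0, -2], [0, 2, 0, 0, 0]])
y_pred = np.array([[0, 0, 1, 0, 0], [0, 2, 0, 0, 0]])

plain = float((y_true == y_pred).mean())              # dominated by the zeros
active = active_alarm_accuracy(y_true, y_pred)        # only the 3 active alarms
print(plain, active)
```

On this toy batch the plain accuracy is 0.9 while the active-alarm accuracy is 2/3, which shows how sparsity hides the missed alarm.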

4. **What loss function should we use now that our labels are not 0 and 1?**

See the notebook cells below for my ideas.

In [199]:
import torch
import torch.nn as nn

In [303]:
# Option 1
# 81 alarms each with 5 possible states
# multi-hot encoding of length 405
# Train model using binary cross entropy loss on those labels

# Toy stand-in: 5 outputs instead of the full 405, one random input sample
model = nn.Linear(20, 5)
x = torch.randn(1, 20)
y = torch.tensor([[0, 1, 0, 1, 0]]).float()  # multi-hot target

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

for epoch in range(20):
    optimizer.zero_grad()          # clear gradients from the previous step
    output = model(x)              # raw logits; the loss applies the sigmoid
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    print('Loss: {:.3f}'.format(loss.item()))

Loss: 0.406
Loss: 0.358
Loss: 0.319
Loss: 0.287
Loss: 0.260
Loss: 0.237
Loss: 0.218
Loss: 0.202
Loss: 0.187
Loss: 0.175
Loss: 0.164
Loss: 0.154
Loss: 0.145
Loss: 0.137
Loss: 0.130
Loss: 0.124
Loss: 0.118
Loss: 0.113
Loss: 0.108
Loss: 0.103
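The toy cell above uses only 5 outputs. A sketch of how the full 405-length target from the Option 1 comment could be built, assuming each alarm state s in -2..2 maps to index s + 2 inside that alarm's block of 5 (the helper `encode_states` is hypothetical):

```python
import torch
import torch.nn.functional as F

NUM_ALARMS, NUM_STATES = 81, 5

def encode_states(states):
    """Map integer states in -2..2, shape (batch, 81), to a (batch, 405) float target."""
    idx = states + 2                                   # shift to 0..4 for one_hot
    one_hot = F.one_hot(idx, num_classes=NUM_STATES)   # (batch, 81, 5)
    return one_hot.reshape(states.shape[0], -1).float()

states = torch.randint(-2, 3, (4, NUM_ALARMS))         # random batch of 4 samples
targets = encode_states(states)

# Invert the encoding: pick the hot index per alarm and shift back.
decoded = targets.reshape(4, NUM_ALARMS, NUM_STATES).argmax(dim=-1) - 2
print(targets.shape)
```

These targets can be passed straight to `nn.BCEWithLogitsLoss` with a model whose final layer has 405 outputs.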


In [330]:
# Option 2
# Normalize alarms to be between 0 and 1
# (-2, -1, 0, 1, 2) --> (0, 1, 2, 3, 4) --> (0., 0.25, 0.50, 0.75, 1.0)
# Train model using binary cross entropy loss on those labels

import numpy as np
from sklearn.preprocessing import MinMaxScaler
# Stand-in integer labels; any 5 evenly spaced states (e.g. -2..2) scale to the same values
labels = np.array([1,2,3,4,5])
scaler = MinMaxScaler()
normed_labels = scaler.fit_transform(labels.reshape(-1, 1))

print("original labels: ", labels)
print("normalized labels: ", normed_labels.reshape(-1))

model = nn.Linear(20, 5) 
x = torch.randn(1, 20)
y = torch.tensor([[0., 0.25, 0.50, 0.75, 1.0]]).float()

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

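for epoch in range(20):
    optimizer.zero_grad()          # clear gradients from the previous step
    output = model(x)              # raw logits; the loss applies the sigmoid
    loss = criterion(output, y)    # BCE against soft targets in [0, 1]
    loss.backward()
    optimizer.step()
    print('Loss: {:.3f}'.format(loss.item()))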

original labels:  [1 2 3 4 5]
normalized labels:  [0.   0.25 0.5  0.75 1.  ]
Loss: 0.921
Loss: 0.841
Loss: 0.775
Loss: 0.719
Loss: 0.672
Loss: 0.633
Loss: 0.601
Loss: 0.574
Loss: 0.551
Loss: 0.532
Loss: 0.515
Loss: 0.501
Loss: 0.489
Loss: 0.479
Loss: 0.470
Loss: 0.462
Loss: 0.455
Loss: 0.449
Loss: 0.444
Loss: 0.439
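With Option 2 the network outputs a value in [0, 1] per alarm, so predictions have to be mapped back to alarm states. One possible way, sketched here as an assumption rather than a settled design, is to snap each sigmoid output to the nearest of the five normalized levels (the helper `decode` is hypothetical):

```python
import torch

LEVELS = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])   # normalized label values
STATES = torch.tensor([-2, -1, 0, 1, 2])             # original alarm states

def decode(logits):
    """Convert raw logits to the nearest original alarm state."""
    probs = torch.sigmoid(logits)
    # Distance from every prediction to every level, then pick the closest level.
    nearest = (probs.unsqueeze(-1) - LEVELS).abs().argmin(dim=-1)
    return STATES[nearest]

logits = torch.tensor([[-4.0, -1.0, 0.0, 1.0, 4.0]])
decoded = decode(logits)
print(decoded)
```

Note that with soft targets the binary cross entropy loss does not go to zero even for a perfect prediction, which is consistent with the losses above levelling off at a higher value than in Option 1.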


In [331]:
# Option 3
# Create a custom loss function that allows for our original categorical labels
# We could use weighted BCE as a starting point to weight incorrect positive predictions
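As a possible starting point for Option 3, PyTorch's `nn.BCEWithLogitsLoss` accepts a `pos_weight` tensor. Note that `pos_weight` scales the loss on positions whose target is 1, so it penalizes missed positives rather than false positives; the weight of 3.0 below is an arbitrary illustrative value, not a tuned one.

```python
import torch
import torch.nn as nn

y = torch.tensor([[0, 1, 0, 1, 0]]).float()   # same toy target as Option 1
logits = torch.zeros(1, 5)                    # an uninformed prediction (p = 0.5)

# Weight the loss on positive targets 3x (illustrative value only).
pos_weight = torch.full((5,), 3.0)

plain = nn.BCEWithLogitsLoss()(logits, y)
weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, y)

print('plain: {:.3f}  weighted: {:.3f}'.format(plain.item(), weighted.item()))
```

A fully custom loss on the original categorical labels would replace `criterion` in the training loops above; this sketch only shows the weighting mechanism.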