# ACT CW2 Q2

__Q2 Objective:__

Process dataset using a Neural Network.

__Plan__

We have a binary classification problem, so we are sorting data into one of two classes based on the input values.


__Workflow__ (from Source 1)

1. Get data ready (transform into tensors)
2. Build or pick a pretrained model to suit your problem
3. Pick a loss function and optimiser
4. Build a training loop
(Loop through steps 2-4)
5. Fit the model to the data and make a prediction
6. Evaluate the model
7. Improve through experimentation
8. Save and reload the trained model


__Links__ 

(move to the bottom in a bit)

* https://www.learnpytorch.io/00_pytorch_fundamentals/
* https://www.learnpytorch.io/01_pytorch_workflow/
* https://www.learnpytorch.io/02_pytorch_classification/
* https://docs.pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
* 


More general links:

* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html
* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html
* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html
* https://www.geeksforgeeks.org/deep-learning/converting-a-pandas-dataframe-to-a-pytorch-tensor/
* https://saturncloud.io/blog/how-do-i-convert-a-pandas-dataframe-to-a-pytorch-tensor/
* https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html
* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
* https://www.geeksforgeeks.org/pandas/adding-new-column-to-existing-dataframe-in-pandas/
* https://scikit-learn.org/stable/modules/preprocessing.html
* https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
* https://codesignal.com/learn/courses/introduction-to-pytorch-tensors/lessons/defining-a-dataset-with-pytorch-tensors
* https://docs.pytorch.org/tutorials/beginner/data_loading_tutorial.html
* https://discuss.pytorch.org/t/what-do-tensordataset-and-dataloader-do/107017
* https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
* https://docs.pytorch.org/docs/stable/nn.html
* https://realpython.com/ref/glossary/subclass/
* https://www.w3schools.com/python/python_classes.asp
* https://stackoverflow.com/questions/8609153/why-do-we-use-init-in-python-classes
* https://stackoverflow.com/questions/2709821/what-is-the-purpose-of-the-self-parameter-why-is-it-needed
* https://docs.pytorch.org/docs/stable/generated/torch.nn.Linear.html
* https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html
* https://www.geeksforgeeks.org/deep-learning/understanding-the-forward-function-output-in-pytorch/
* https://docs.pytorch.org/docs/stable/nn.html
* https://www.geeksforgeeks.org/python/activation-functions-in-pytorch/
* https://machinelearningmastery.com/activation-functions-in-pytorch/
* https://docs.pytorch.org/docs/stable/generated/torch.nn.BCELoss.html
* https://docs.pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
* https://stackoverflow.com/questions/75979632/pytorchs-nn-bcewithlogitsloss-behaves-totaly-differently-than-nn-bceloss
* https://docs.pytorch.org/docs/stable/generated/torch.optim.Adam.html
* https://docs.pytorch.org/docs/stable/generated/torch.nn.Dropout.html~
* 

### Import Libraries

In [83]:
# import necessary libraries

import numpy as np # numpy
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting
# import seaborn as sns # for data visualisation

# machine learning libraries

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import torch # pytorch library
from torch.utils.data import TensorDataset, DataLoader # for batching data
from torch import nn, optim # neural networks and optimiser

### Set up GPU

If there is a GPU available for PyTorch to use, this will be much faster for running the neural network than using the CPU, as GPUs are much faster for executing matrix multiplication operations.

__Note:__ 

My computer's processor has an integrated graphics card and no dedicated GPU. The code cell below will return False and any code run on my personal computer will be run on the CPU, which may be slower or less efficient. 

However, I will still include the code cell below for cases when this code is run on a PC or Google Colab. If using Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU.


In [84]:
# is there a GPU available?
gpu_TF = torch.cuda.is_available()

# print the result
print("GPU available:")
print(gpu_TF)

# if there is a GPU available

if gpu_TF == True:
    # there is a GPU available
    # set the device to the graphics processor
    device = torch.device("cuda")
else:
    # there is no GPU available
    # use the CPU instead
    device = torch.device("cpu")

# print the device we are using
print("\nDevice:")
print(device)

# we can now store tensors on the selected device using .to(device)
# and this will work for both cpu and gpu devices

GPU available:
False

Device:
cpu


## 
> ## Preparing the Dataset

### Load in the data

Load in the data file (currently in csv file), add the data to a pandas dataframe, and inspect the dataframe to check that it has loaded in correctly.

In [90]:
# import the data file
# data is in the file "psion_upsilon.csv"

# read in the file and store it in a pandas dataframe
rawdata_df = pd.read_csv('psion_upsilon.csv')

In [91]:
# check the dataframe has loaded in correctly

# print the shape of the dataframe
print("Size of dataframe:")
print(rawdata_df.shape) # 40,000 rows x 22 columns
print(f"\n")

# print the column headings
print("Column headings:")
print(rawdata_df.columns)
print(f"\n")

# check top few rows of data
print("Top few rows of dataframe:")
print(rawdata_df.head)

Size of dataframe:
(40000, 22)


Column headings:
Index(['Unnamed: 0', 'Run', 'Event', 'type1', 'E1', 'px1', 'py1', 'pz1', 'pt1',
       'eta1', 'phi1', 'Q1', 'type2', 'E2', 'px2', 'py2', 'pz2', 'pt2', 'eta2',
       'phi2', 'Q2', 'class'],
      dtype='object')


Top few rows of dataframe:
<bound method NDFrame.head of        Unnamed: 0     Run       Event type1       E1      px1      py1  \
0               0  167807  1101779335     G   5.8830   3.6101   2.3476   
1               1  167102   286049970     G  13.7492  -1.9921  11.8723   
2               2  160957   190693726     G   8.5523   1.4623   4.5666   
3               3  166033   518823971     G   7.5224   0.1682  -3.5854   
4               4  163589    49913789     G  12.4683   8.1310  -1.6633   
...           ...     ...         ...   ...      ...      ...      ...   
39995       39995  166033   460063858     G  21.1411  -9.3928  10.8857   
39996       39996  173692   573648364     G  29.4819  16.1461  21.9823   
39997       

### Look at proportion of Binary Classes

The nature of this problem is determining the type of the outgoing meson based on the properties of the 2 incoming muons. There are 2 possible types - J/psi and Upsilon.

We can see how many instances of each meson there are in our dataset of 40,000 samples.

In [93]:
# how many occurences of J/psi and upsilon in 'class'

# how many samples in each class
print(rawdata_df['class'].value_counts())

print("\n") # space between outputs

# what proportion of the samples are in each class
print(rawdata_df['class'].value_counts(normalize=True))

class
upsilon    20000
J/psi      20000
Name: count, dtype: int64


class
upsilon    0.5
J/psi      0.5
Name: proportion, dtype: float64


This dataset has an exact 50/50 split between the classes. As this is a segment of a larger (>5TB) dataset, this is probably by design. This should make it easier to train the neural network correctly, as the data shows no bias towards either of the classes.

### Remove Unnecessary Data Columns

The first 3 columns contain index, run number, and event number. These are parameters used when recording and storing the data points, but they are not physical properties and do not have any effect on the type of particle created. Therefore, they are irrelevant to determining output class.

In [94]:
# remove the first 3 columns
# by defining a new dataframe
# that only contains the relevant variables

# drop columns 0, 1, and 2
# so keep all rows, and columns 3-21
# (df.iloc indices are start inclusive and end exclusive)
reduced_df = rawdata_df.iloc[:, 3:]

# check the properites of the new dataframe are what we want
# print out the new shape and column headings

print("Size of reduced dataframe:")
print(reduced_df.shape)
# 40,000 rows x 19 columns

print(f"\n")
print("Column headings:")
print(reduced_df.columns)

# this is what we expect
# we have removed 'Unnamed (index)', 'Run', and 'Event'
# and kept all 40,000 samples

Size of reduced dataframe:
(40000, 19)


Column headings:
Index(['type1', 'E1', 'px1', 'py1', 'pz1', 'pt1', 'eta1', 'phi1', 'Q1',
       'type2', 'E2', 'px2', 'py2', 'pz2', 'pt2', 'eta2', 'phi2', 'Q2',
       'class'],
      dtype='object')


Now that we have removed the first 3 columns, we can look at the other variables.

To build a neural network in PyTorch, the data should be stored in a tensor, which can only contain numerical values. We can check the type of all of the columns in the reduced dataframe.

In [95]:
# get the datatypes of each column in the dataframe
get_types = reduced_df.dtypes

# print the data types of each column
print(get_types)
# and how many of each type there are
print("\nSummary of column types:")
print(get_types.value_counts())

type1     object
E1       float64
px1      float64
py1      float64
pz1      float64
pt1      float64
eta1     float64
phi1     float64
Q1         int64
type2     object
E2       float64
px2      float64
py2      float64
pz2      float64
pt2      float64
eta2     float64
phi2     float64
Q2         int64
class     object
dtype: object

Summary of column types:
float64    14
object      3
int64       2
Name: count, dtype: int64


Of the 19 columns in the reduced dataframe, 16 of them are numeric (14 float and 2 int). These values can all be converted to a single numeric type (e.g. float64) upon transformation into a tensor. However, there are also 3 'object'-type variables that are unable to be processed by the neural network.

The first 2 non-numeric columns are 'type1' and 'type2', which tell us the types of the first and second muon respectively, whether they are a global muon (G), or a tracker muon (T). The other non-numeric variable is 'class', which tells us the type of the meson created by the collision, either J/psi or Upsilon.

There are a few options for dealing with these variables:

1) Label encoding or assigning category codes. These methods assign an integer value to every possible string value. This is a very efficient method of converting non-numeric variables but can easily be misinterpreted by the neural network. Some machine learning models, including neural networks, will treat integer-encoded data as numeric, and make assumptions that are not true for categorical data, leading to incorrect assumptions and correlations. This can be avoid by using embedding layers in the neural network.
2) One-hot encoding. This avoids the pitfalls of integer-encoding, but increases the dimensionality of the problem. Certain methods of one-hot encoding also produce Boolean values instead of numeric ones, which also cannot be interpreted by the neural network.
3) Deleting the non-numeric values. This avoids the problem of encoding the data, but can lead to important variables being neglected and the neural network not learning the correct connections. If we decide not to include a certain variable, we must have a good reason for doing so.

In deciding which of these method to use, we should consider the following points:

* 'type1' contains only 1 unique value. As we found in Q1, 'type1' contained only 'G' values, so 100% of these particles were global muons. This variable can be immediately ignored as there can be no correlation between 'type1' and 'class'.
* Proportion of each value in 'type2'. This column contains both 'G' and 'T' values, although around 90% of the samples have a 'G' value. This is a relatively small portion of the samples, especially when the division of class among the samples is a 50/50 split.
* Feature importance. In Q1, after training a decision tree, the relative importance of each feature was plotted. The importance of 'type2' was one of the lowest, at only 0.2%. We should weigh up the possibility of introducing extra parameters with how useful the result may or may not be to the resulting neural network.
* The type of each ingoing muon ('G' or 'T') is not a physical property of the muon, or a reflection of the physics governing it. It is a parameter denoting how the muon was detected, either locally or globally. As this only refers to the equipment used to detect the particle, and not information about the particle itself, it should not have any effect.
* 'class' is our target variable, and cannot be ignored or deleted, so it must be encoded using one of the above methods.
* The issues with integer-encoding don't apply to target data. The neural network treats labels differently to input data, because it only uses them to compare prediction with reality, not to learn connections. This means the 'class' variable can be label encoded with no loss of accuracy.

Based on the factors above, the most efficient choice for this dataset is to remove the 'type1' and 'type2' columns entirely, and to encode the 'class' data using integer labels.

In [96]:
# remove 'type1' and 'type2' from the dataset
features_df = reduced_df.drop(['type1', 'type2'], axis=1)

# the resultant dataframe contains only the relevant physical features

# check that these 2 columns have been removed
print("New shape:")
print(features_df.shape) # 40000 x 17
print("\nNew column headings:")
print(features_df.columns) # type columns no longer present

New shape:
(40000, 17)

New column headings:
Index(['E1', 'px1', 'py1', 'pz1', 'pt1', 'eta1', 'phi1', 'Q1', 'E2', 'px2',
       'py2', 'pz2', 'pt2', 'eta2', 'phi2', 'Q2', 'class'],
      dtype='object')


### Label encoding the target data

The target data needs to be converted to numeric data before being transformed into a tensor. Because this is just the label array, we can take the simple approach of mapping each category to an integer without the risk of training the neural network incorrectly.

In [97]:
# assign an integer value to each class type

# add these values to a new column, 'class_int'
# so we don't lose the original data

# convert strings to numerical values
# use category codes to assign integers
features_df['class_int'] = features_df['class'].astype('category').cat.codes

# print out the first few rows of both columns
# first 8 rows, last 2 columns
print(features_df.iloc[:8, -2:])

# save the mapping so we can access it later
class_mapping = dict(enumerate(features_df['class'].astype('category').cat.categories))

# print the mapping key
print("\nInteger mapping:")
print(class_mapping)


     class  class_int
0  upsilon          1
1    J/psi          0
2  upsilon          1
3    J/psi          0
4  upsilon          1
5  upsilon          1
6    J/psi          0
7  upsilon          1

Integer mapping:
{0: 'J/psi', 1: 'upsilon'}


### Create Features matrix and Target array

We now have a dataframe with 18 columns, including 16 features and 1 target variable (across 2 columns). All of these variables are numeric in type and relate to physical properties of the particles.

Before transforming into tensors, we should group the dataset into 2 separate objects - a feature matrix (X) and a target array (y).

In [99]:
# split into feature matrix and target array
# X and y

# the feature matrix contains the input data
# drop the 'class' and 'class_int' columns from the dataframe
X_total = features_df.drop(['class', 'class_int'], axis=1)

# the target array is the information with which we want the data to be classified (the "label")
# this data is the end column of the features dataframe
y_total = features_df['class_int']

# check type and size of both X and y

print("X_total:")
print(type(X_total)) # pd dataframe
print(X_total.shape) # (40000, 16)

print("\ny_total")
print(type(y_total)) # pd series
print(y_total.shape) # (40000,)

# the features matrix has been split into the correct X and y arrays

X_total:
<class 'pandas.core.frame.DataFrame'>
(40000, 16)

y_total
<class 'pandas.core.series.Series'>
(40000,)


## 
> ## Preprocessing the Data

### Convert Pandas objects into NumPy arrays

We will eventually need the dataset in the form of a PyTorch tensor for use in the neural network. For a pandas dataframe, this means converting to a NumPy array first and then a tensor (even if this is not explicitly coded, it is still processed this way by the program). 

However, we first need to scale and split the data, which cannot be done on tensors directly, so should perform these operations on the intermediate NumPy arrays.

In [100]:
# dataframe.values
# gets just the data and no column names
# may also increase the storage of each datatype
# by setting all columns to same datatype (probably float64)

# convert X_total and y_total to numpy arrays
X = X_total.values
y = y_total.values

# check the properties of these arrays

print("\nX:")
print(type(X)) # numpy array
print(X.shape) # (40000, 16)

print("\ny")
print(type(y)) # numpy array
print(y.shape) # (40000,)


X:
<class 'numpy.ndarray'>
(40000, 16)

y
<class 'numpy.ndarray'>
(40000,)


### Split into Test and Training data

We don't want to use all of our data points to train the model. We can split the full dataset into data used to train the model (training data) and data that we can use to analyse the effectiveness of the model once it has been trained (test data).

As there are 40,000 samples, we have enough data to do a 75/25 split. I will therefore use 30,000 samples to train the model and the other 10,000 to test the model.

Even though we have a very evenly split dataset (50/50 split between the 2 output classes), we can still use stratify to make sure that we have a good balance of classes in each subset of the data.

In [101]:
# use train_test_split to split the data
# allocate some of the data for training the neural network
# and the rest for checking its accuracy

# use 30,000 samples for training
# so train_size = 0.75, test_size = 0.25

# use a random state integer for reproducible random shuffling

# inputs are the X and y numpy arrays
# split the data into 4 separate objects
# use stratify to preserve class balance
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=71, stratify=y)

In [102]:
# check the types and sizes of the outputs

# training data
# should be 30,000 randomly selected samples
print("X_train:")
print(X_train.shape) # 30000 x 16
print(type(X_train)) # np array
print("y_train:")
print(y_train.shape) # 30000 (1D)
print(type(y_train)) # np array

print(f"\n") # space between outputs

# test data
# should be the 10,000 remaining samples
print("X_test:")
print(X_test.shape) # 10000 x 16
print(type(X_test)) # np array
print("y_test:")
print(y_test.shape) # 10000
print(type(y_test)) # np array

# the arrays have been split randomly as specified in the function

X_train:
(30000, 16)
<class 'numpy.ndarray'>
y_train:
(30000,)
<class 'numpy.ndarray'>


X_test:
(10000, 16)
<class 'numpy.ndarray'>
y_test:
(10000,)
<class 'numpy.ndarray'>


### Split into Validation data as well?

### Feature Scaling

Neural networks perform better when dealing with inputs on the same scale, and most of the functions used in a neural network work best when dealing with data that is roughly Gaussian (mean = 1, variance = 1). As the features of this dataset are all on different scales, they should be standardised before using them to train a neural network.

However, we only want to use the training data to fit the scaler, so that we don't indirectly use any information from the testing set to train the network.

In [104]:
# initialise an instance of the standard scaler
scaler = StandardScaler()

# fit to the training data only
scaler.fit(X_train)

# transform both sets of X data
# using the parameters from X_train
scaler.transform(X_train)
scaler.transform(X_test)

# all of the X data has now been normalised

# to view the mean and std of each feature
print("Mean (by feature):")
print(scaler.mean_)
print("\nStandard Deviation (by feature):")
print(scaler.scale_)

Mean (by feature):
[ 1.37120108e+01 -1.03732320e-01 -1.71637590e-01 -4.15723233e-02
  7.57993686e+00 -8.73109000e-03 -5.99268167e-02  6.58666667e-02
  1.14661400e+01  3.87268600e-02  1.71115413e-01  9.12610767e-02
  7.33469605e+00  4.11902000e-03  7.42526667e-02 -6.58666667e-02]

Standard Deviation (by feature):
[10.26548918  6.56887743  6.47730307 14.43055082  5.46271391  1.25031988
  1.82965108  0.99782843  9.36487508  6.38112183  6.35297916 11.74920959
  5.30913823  1.05181092  1.80715698  0.99782843]


### Transform dataframes into PyTorch tensors

We now have 4 NumPy arrays, split and standardised. In order to use them in the neural network, they need to be PyTorch tensors.

(After building the neural network, add code pushing tensors .to(device))

In [105]:
# transform X_train, X_test, y_train, and y_test
# all into torch tensors

# first check that they only contain numeric dtypes
print("X:")
print(X_total.dtypes.value_counts())
print("y:")
print(y_total.dtypes)

# all types int64 and float64, which are fine for tensor transformations
# convert them all to floats upon transformation

X:
float64    14
int64       2
Name: count, dtype: int64
y:
int8


In [106]:
# transform X arrays to tensors
X_train_ten = torch.from_numpy(X_train).float()
X_test_ten = torch.from_numpy(X_test).float()

# transform y arrays to tensors
# need to use .unsqueeze
# to change shape (n_samples,) to (n_samples, 1)
y_train_ten = torch.from_numpy(y_train).float().unsqueeze(1)
y_test_ten = torch.from_numpy(y_test).float().unsqueeze(1)

# use .float()
# to convert all numerical types to floats

# check properties of these tensors
# type and shape
# just check the training data

# features
print("X tensor:")
print(type(X_train_ten)) # tensor
print(X_train_ten.shape) # 30000 x 16
# target
print("\ny tensor:")
print(type(y_train_ten)) # tensor
print(y_train_ten.shape) # 30000 x 1

# these are both torch.tensor objects
# and have the correct shape

X tensor:
<class 'torch.Tensor'>
torch.Size([30000, 16])

y tensor:
<class 'torch.Tensor'>
torch.Size([30000, 1])


### Store data together using TensorDataset

In [108]:
# store all training data in one tensor
train_set = TensorDataset(X_train_ten, y_train_ten)

# do the same for test data
test_set = TensorDataset(X_test_ten, y_test_ten)

# we cannot check the sizes of these objects directly
# as they not tensors
print(type(train_set))
# they are an object within the class TensorDataset

<class 'torch.utils.data.dataset.TensorDataset'>


### Split training data into batches

In [109]:
# split each dataset into batches
# for feeding into the neural network
# use a standard batch size of 64

# split the training data
# shuffle before batching
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# do the same for the test data
# this does not need to be shuffled
test_loader = DataLoader(test_set, batch_size=64)

### Plot the Training Data  (maybe)

As we have the pairplot from the Q1 notebook, which uses an identical dataset, leave this bit for now.

## 
> ## Building the Neural Network

### Overview of Model Structure

__Useful PyTorch Functions__

* nn.Parameter
* model.state_dict()
* torch.randn()
* torch.inference_mode()

__Model Architecture and Parameters:__

- Input features = 16
- Output features = 1
- Number of hidden layers = 2
- Number of neurons per hidden layer = 32-64
- Fully connected layers
- Activation function = ReLU
- Loss function = BCEWithLogitsLoss
- Optimiser = Adam

<br>

- Number of training iterations = ??
- Confusion matrix = sklearn function (find)
- Classification report = sklearn function (find)

<br>

__Still to figure out:__
- What is a feedforward network?
- How many training iterations should we do?
- What is a training epoch?
- How do we shuffle the data between epochs?

### Build the Neural Network

Start by defining a model class and constructing the NN architecture.

__Why have we chosen this network architecture?__

- The number of input and output features are not a deliberate choice, but a result of the shape of the features matrix and target array. Each sample in the dataset has 16 physical parameters, and 1 label.

- Number of hidden layers = 2. For a binary classification, we will not benefit from a deep network like other problems would (e.g. image analysis, text classification, numerical prediction). Adding too many layers to this problem would likely not increase the accuracy of the solution, but would make it more prone to overfitting.

- 64 neurons in layer 1 and 32 in layer 2.

- Fully connected layers

- ReLU activation function between layers.

- No sigmoid function after the final layer. As this is a classification problem, the output should be a probability, with a value between 0 and 1.

In [110]:
# construct a new model class called ParticleClassifier
# that inherits from the nn.Module class
# and so gives it access to PyTorch functionalities

class ParticleClassifier(nn.Module):

    # initialise an instance of this class
    def __init__(self):

        # initialise the parent class
        super().__init__()
        
        # create a stack of (4 for now) hidden layers
        # each with (8 for now / 12 / 16) neurons

        self.layer_stack = nn.Sequential(
            # list of linear layers in order
            # separated by commas
            # these will be stacked by nn.Sequential

            # in_features for first layer = number of features in X_train
            # out_features(layer_n) = in_features(layer_n+1)
            # out_features for final layer = number of features in y_train

            nn.Linear(16, 64), # layer 1
            nn.ReLU(), # activation function

            nn.Linear(64, 64), # layer 2
            nn.ReLU(),
            
            nn.Linear(64, 1) # output layer
        ) # end of sequence

    # define a new instance of the model class
    # for the forward pass
    def forward(self, x):

        # pass x through all the NN layers
        out = self.layer_stack(x)
        # return the output to the training loop
        return out
    
# end of class construction block

### Initialise the model

In [114]:
# do i need to create a random seed for the initial parameters?
#torch.manual_seed(int)

# create an instance of the model
# and pass it to device (gpu / cpu)
#model_v0 = ParticleClassifier().to(device)

In [111]:
# create an instance of the model
pc_model = ParticleClassifier()

# print the model
print(pc_model)

# just prints out the structure we defined above

# print the model parameters
print(pc_model.parameters())

ParticleClassifier(
  (layer_stack): Sequential(
    (0): Linear(in_features=16, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=1, bias=True)
  )
)
<generator object Module.parameters at 0x000001D1A4DD7840>


### Create a Loss Function

Loss function:

torch.nn.BCEWithLogitsLoss()

This function is an improvment upon torch.nn.BCELoss(), where BCE = Binary Cross Entropy. BCE is used to calculate the loss of accuracy in binary classification problems.

<br>

- Loss function = BCEWithLogitsLoss. As we have a classification problem, the output is a probability, so must be between 0 and 1. The outputs from layer 2 may not produce the desired ouputs, so it is typical to add a sigmoid function after the final layer. However, this loss function, BCEWithLogitsLoss, is an improved version of its predecessor, BCELoss, that adds a sigmoid function automatically before optimising. This function is recommended over using 2 separate functions as it compiles both mathematical operations into 1 and reduces error.

In [112]:
# define the loss function
loss_fn = nn.BCEWithLogitsLoss()

### Create an Optimiser

- Optimiser function:

Adam optimiser - torch.optim.Adam()

- Learning rate:

Default setting is 0.001. Through trial and error this has proved to be the most effective learning rate for this network.

In [113]:
# define a learning rate
# default is 1e-3 (0.001)
l_rate = 0.0

# create an optimiser
# using the parameters from our model
# and the learning rate we have chosen
optimiser = optim.Adam(params=pc_model.parameters(), lr=l_rate)

### Train the model using a training loop

In [28]:
# now we have a class
# a model instance
# a loss function
# and an optimiser

# we can now build a training loop
# by following the steps in 01. PyTorch Workflow Fundamentals
# section: PyTorch training loop

In [None]:
# set model to 'training mode'
pc_model.train()

### Evaluate the model

In [None]:
# set model to 'evaluation mode'
model_v0.eval()