Preparing the Data
---------

The first step is to prepare numerical input data.

We will use pandas to load the data and tools from scikit-learn to perform encoding.

PyTorch provides a `Dataset` class, which can be subclassed to load and customize our dataset.

In [None]:
# Skeleton of a custom Dataset class
from torch.utils.data import Dataset

# dataset definition
class CSVDataset(Dataset):
    # load the dataset
    def __init__(self, path):
        # store the inputs and outputs
        self.X = ...
        self.y = ...
 
    # number of rows in the dataset
    def __len__(self):
        return len(self.X)
 
    # get a row at an index
    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]
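
As a hedged illustration, the skeleton above could be filled in roughly as follows. This is a minimal sketch that assumes a header-less CSV whose last column holds a string class label; the column layout, the use of `LabelEncoder`, and the in-memory sample data are assumptions made for the example, not details of any particular dataset.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset


class CSVDataset(Dataset):
    # load the dataset (assumption: no header row, label in the last column)
    def __init__(self, path):
        df = pd.read_csv(path, header=None)
        # all columns except the last are treated as numerical inputs
        self.X = df.values[:, :-1].astype("float32")
        # encode string labels as integers, then store them as a float column
        y = LabelEncoder().fit_transform(df.values[:, -1])
        self.y = y.astype("float32").reshape(-1, 1)

    # number of rows in the dataset
    def __len__(self):
        return len(self.X)

    # get a row at an index
    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]


# hypothetical sample data; pandas accepts a file-like object in place of a path
sample_csv = io.StringIO("0.1,0.2,yes\n0.3,0.4,no\n0.5,0.6,yes\n")
dataset = CSVDataset(sample_csv)
```

Storing the arrays as `float32` matches the default dtype of PyTorch layers, which avoids dtype mismatches later on.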

Once the dataset is loaded, a `DataLoader` instance is used to feed it to the model during training and evaluation.
A separate `DataLoader` instance can be created for the training, validation, and test datasets.

To split the dataset, PyTorch provides the built-in function `random_split()`.
`DataLoader` also accepts a batch size and a shuffle flag; when shuffling is enabled, the data is reshuffled at every epoch.


In [None]:
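# sample
from torch.utils.data import DataLoader, random_split

# create the dataset
dataset = CSVDataset(...)
# select rows from the dataset; the second argument is a flat list of split sizes
train, test = random_split(dataset, [..., ...])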
# create a data loader for train and test sets
train_dl = DataLoader(train, batch_size=32, shuffle=True)
test_dl = DataLoader(test, batch_size=1024, shuffle=False)

Once defined, a `DataLoader` can be enumerated, yielding one batch of samples per iteration.

In [None]:
...
# train the model
for i, (inputs, targets) in enumerate(train_dl):
    ...

Defining the model
-------

- Defining a model means defining a class that extends the `Module` class (https://pytorch.org/docs/stable/nn.html#module)

- The constructor of your class defines the layers of the model, and the `forward()` function is the override that defines how input is propagated forward through those layers.

- Many layers are available, such as 
    - `Linear` for fully connected layers,
    - `Conv2d` for convolutional layers,
    - `MaxPool2d` for pooling layers.


- Activation functions can also be defined as layers, such as ReLU, Softmax, and Sigmoid.


In [None]:
# a simple MLP model with one layer
from torch.nn import Linear, Module, Sigmoid

# model definition
class MLP(Module):
    # define model elements
    def __init__(self, n_inputs):
        super().__init__()
        self.layer = Linear(n_inputs, 1)
        self.activation = Sigmoid()
 
    # forward propagate input
    def forward(self, X):
        X = self.layer(X)
        X = self.activation(X)
        return X

# The weights of a given layer can also be initialized after the layer is defined in the constructor.
# Common examples include the Xavier and He weight initialization schemes. For example:
# (requires: from torch.nn.init import xavier_uniform_; the call belongs inside __init__)

xavier_uniform_(self.layer.weight)
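
Put together, the model definition and the weight initialization might look like the sketch below. The input size of 4 and the random batch are assumptions used only to show that the model runs end to end.

```python
import torch
from torch.nn import Linear, Module, Sigmoid
from torch.nn.init import xavier_uniform_


class MLP(Module):
    # define model elements
    def __init__(self, n_inputs):
        super().__init__()
        self.layer = Linear(n_inputs, 1)
        # Xavier (Glorot) uniform initialization of the layer weights, in place
        xavier_uniform_(self.layer.weight)
        self.activation = Sigmoid()

    # forward propagate input
    def forward(self, X):
        X = self.layer(X)
        X = self.activation(X)
        return X


model = MLP(n_inputs=4)
# hypothetical batch of 5 rows with 4 features each
out = model(torch.randn(5, 4))
```

Because the final layer is a `Sigmoid`, every output lies strictly between 0 and 1 and can be read as a probability for binary classification.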

Train the model
------

The training process requires us to define a loss function and an optimization algorithm.
Some common loss functions are:
- BCELoss: Binary cross-entropy loss for binary classification.
- CrossEntropyLoss: Categorical cross-entropy loss for multi-class classification.
- MSELoss: Mean squared error loss for regression.

More loss functions are described here: [Loss and Loss Functions for Training Deep Learning Neural Networks](https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/)
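
The three loss functions listed above expect differently shaped inputs, which is a common source of errors. The following sketch calls each one on hand-written tensors; the values are made up for illustration.

```python
import torch
from torch.nn import BCELoss, CrossEntropyLoss, MSELoss

# BCELoss expects probabilities in [0, 1] (e.g. after a Sigmoid) and float targets
bce = BCELoss()(torch.tensor([0.9, 0.2]), torch.tensor([1.0, 0.0]))

# CrossEntropyLoss expects raw scores (logits) and integer class indices
ce = CrossEntropyLoss()(torch.tensor([[2.0, 0.5, 0.1]]), torch.tensor([0]))

# MSELoss expects predictions and targets of the same shape
mse = MSELoss()(torch.tensor([2.0, 4.0]), torch.tensor([1.0, 2.0]))
```

Each call returns a scalar Tensor; `.item()` converts it to a plain Python float for logging.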

Stochastic gradient descent is the standard optimization algorithm and is provided by the [SGD class](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD). Other optimizers, such as [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam), can be used as well.

In [None]:
# define the loss function and the optimizer
from torch.nn import MSELoss
from torch.optim import SGD

criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

Training the model involves enumerating the `DataLoader` for the training dataset.

First, an outer loop is required for the number of training epochs. Then an inner loop is required for the mini-batches used by stochastic gradient descent.

In [None]:
...
# enumerate epochs
for epoch in range(100):
    # enumerate mini batches
    for i, (inputs, targets) in enumerate(train_dl):
        ...

Each update to the model follows the same general pattern:

- Clearing the last error gradient.
- A forward pass of the input through the model.
- Calculating the loss for the model output.
- Backpropagating the error through the model.
- Updating the model in an effort to reduce the loss.

In [None]:
# clear the gradients
optimizer.zero_grad()
# compute the model output
yhat = model(inputs)
# calculate loss
loss = criterion(yhat, targets)
# credit assignment
loss.backward()
# update model weights
optimizer.step()
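
Combining the epoch loop with the update steps gives a complete training loop. The sketch below is one possible arrangement, assuming a toy regression problem (y = 2x) and a single `Linear` layer as the model; the data, learning rate, and epoch count are illustrative choices only.

```python
import torch
from torch.nn import Linear, MSELoss
from torch.optim import SGD
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# hypothetical toy regression data: y = 2 * x
X = torch.linspace(-1, 1, 64).reshape(-1, 1)
y = 2 * X
train_dl = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = Linear(1, 1)
criterion = MSELoss()
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)

losses = []
# enumerate epochs
for epoch in range(20):
    epoch_loss = 0.0
    # enumerate mini batches
    for i, (inputs, targets) in enumerate(train_dl):
        optimizer.zero_grad()            # clear the gradients
        yhat = model(inputs)             # compute the model output
        loss = criterion(yhat, targets)  # calculate loss
        loss.backward()                  # backpropagate the error
        optimizer.step()                 # update model weights
        epoch_loss += loss.item()
    # record the mean loss over the mini-batches of this epoch
    losses.append(epoch_loss / len(train_dl))
```

Tracking the mean loss per epoch in `losses` is an addition for monitoring; it is not required by the training pattern itself.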

Evaluate the model
----

Once the model is fit, we can evaluate its performance on the test dataset. We can again use a `DataLoader` for the test dataset to collect predictions for the test set, then compare the predictions to the expected values and calculate a performance metric.

In [None]:
for i, (inputs, targets) in enumerate(test_dl):
    # evaluate the model on the test set
    yhat = model(inputs)
    ...
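
One way to collect the predictions and turn them into a metric is sketched below for binary classification accuracy. The untrained `Sequential` model and the random test data are hypothetical stand-ins, and the use of `model.eval()` and `torch.no_grad()` is a common convention rather than something the loop above requires.

```python
import torch
from torch.nn import Linear, Sequential, Sigmoid
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-ins for a fit model and a test set
model = Sequential(Linear(2, 1), Sigmoid())
test_dl = DataLoader(
    TensorDataset(torch.randn(20, 2), torch.randint(0, 2, (20, 1)).float()),
    batch_size=8,
    shuffle=False,
)

predictions, actuals = [], []
model.eval()  # switch layers such as dropout to evaluation behaviour
with torch.no_grad():  # gradients are not needed for evaluation
    for i, (inputs, targets) in enumerate(test_dl):
        yhat = model(inputs)
        # round probabilities to class labels 0 or 1
        predictions.append(yhat.round())
        actuals.append(targets)

predictions = torch.cat(predictions)
actuals = torch.cat(actuals)
# accuracy: fraction of predictions that match the expected values
accuracy = (predictions == actuals).float().mean().item()
```

For regression, the same loop applies with the rounding step removed and a metric such as mean squared error computed on the concatenated tensors.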

Make predictions
----

A ***fit model*** can be used to make predictions on new data, for example on a single image or a single row of data. However, this requires you to wrap the data in a PyTorch Tensor data structure.

A **Tensor** is the PyTorch counterpart of a NumPy array for holding data. It also supports automatic differentiation in the model graph, such as calling `backward()` when training the model. The prediction will be a Tensor too, although you can retrieve a NumPy array by [detaching the Tensor](https://pytorch.org/docs/stable/autograd.html#torch.Tensor.detach) from the automatic differentiation graph and calling its `numpy()` function.

In [None]:
...
# convert row to a float Tensor (the outer list adds a batch dimension)
from torch import Tensor
row = Tensor([row])
# make prediction
yhat = model(row)
# retrieve numpy array
yhat = yhat.detach().numpy()
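
A self-contained version of the prediction step might look like this. The untrained `Sequential` model and the row values are hypothetical placeholders for a fit model and real input data.

```python
import torch
from torch.nn import Linear, Sequential, Sigmoid

# hypothetical stand-in for a fit model with 4 input features
model = Sequential(Linear(4, 1), Sigmoid())

# a single new row of data (values are made up for the example)
row = [5.1, 3.5, 1.4, 0.2]

# wrap the row in a float Tensor; the outer list adds the batch dimension
row_tensor = torch.tensor([row], dtype=torch.float32)

# make prediction without tracking gradients
with torch.no_grad():
    yhat = model(row_tensor)

# retrieve numpy array
yhat = yhat.detach().numpy()
```

The result has shape `(1, 1)`: one row in the batch and one output value, so `yhat[0, 0]` is the predicted probability.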