### Creating an MLP regression model with PyTorch
The first step is to import the required dependencies. The model is a Multilayer Perceptron (MLP): a stack of fully connected layers whose weights are learned during training.
The dataset is wrapped in a DataLoader, which batches the data and, for the training set, shuffles it every epoch.
Then we pick a loss function and initialize it, together with the model and the optimizer (Adam).
Finally, we write the training loop, which captures the high-level training process in code.

In [24]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import pandas as pd
import MyData
import matplotlib.pyplot as plt
import numpy as np

# Loading and preparing the data
Personally, I don't think normalization is needed for this project, but here is one way to do it:

In [25]:
df = pd.read_csv('dataset/newdataset.csv')
# d = preprocessing.normalize(df, axis=0)
# scaled_df = pd.DataFrame(d, columns=df.columns)  # 'names' was undefined; reuse df's column labels
# scaled_df.head()
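If scaling does turn out to help, one common alternative to `preprocessing.normalize` is to standardize each feature column using statistics taken from the training split only. The sketch below uses scikit-learn's `StandardScaler` on a small synthetic frame; the column names and values are made up and are not part of this project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (hypothetical column names).
rng = np.random.default_rng(0)
toy_df = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(100, 3)),
                      columns=["f0", "f1", "f2"])

toy_train, toy_test = train_test_split(toy_df, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it for the test split,
# so no information from the test rows leaks into the preprocessing.
scaler = StandardScaler().fit(toy_train)
toy_train_scaled = pd.DataFrame(scaler.transform(toy_train), columns=toy_train.columns)
toy_test_scaled = pd.DataFrame(scaler.transform(toy_test), columns=toy_test.columns)

print(toy_train_scaled.mean().round(3))       # close to 0 for every column
print(toy_train_scaled.std(ddof=0).round(3))  # close to 1 for every column
```

The test split is transformed with the training statistics, so its columns will be near, but not exactly at, zero mean and unit variance.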

Splitting the dataset into train and test sets with an 80%/20% ratio using the scikit-learn API. Each split is saved to its own CSV file in the dataset directory.

In [26]:
train, test = train_test_split(df, test_size=0.2, random_state=42)
test.to_csv('dataset/testdata.csv', index=False)
np.shape(test)

(220436, 9)

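Splitting a validation set off the training set (20% of the remaining rows, so the final proportions are 64% train, 16% validation, 20% test):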

In [27]:
train, validation = train_test_split(train, test_size=0.2, random_state=42)
train.to_csv('dataset/traindata.csv', index=False)
validation.to_csv('dataset/validationdata.csv', index=False)
np.shape(validation)


         8.590299999999999325e-02  3.558799999999999741e-01  \
595786                   0.404973                  1.202640   
822041                   1.730303                  1.313080   
359892                   2.116903                  0.601320   
675018                   2.184403                  0.030680   
206002                   0.619733                  0.454055   
...                           ...                       ...   
36465                    0.092039                  0.674950   
1043488                  1.853003                  1.257860   
559273                   0.085903                  0.355880   
724376                   0.110447                  0.319064   
139357                   2.184403                  0.625860   

         2.454399999999999984e-02  0.000000000000000000e+00  \
595786                   0.079767                       0.0   
822041                   1.116744                       0.0   
359892                   0.190214                     

In [29]:
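# validation[139357, :] raises InvalidIndexError because DataFrame[...] looks up columns;
# selecting a row by its index label goes through .loc instead
print(validation.loc[139357, :])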

Defining hyperparameters

In [12]:
learning_rate = 1e-6
batch_size = 25

In [13]:
class GainDataset(torch.utils.data.Dataset):

    def __init__(self, file_name):
        gain_df = pd.read_csv(file_name)

        x = gain_df.iloc[:, 0:6].values
        y = gain_df.iloc[:, 6:9].values

        self.x_train = torch.tensor(x, dtype=torch.float32)
        self.y_train = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.y_train)

    def __getitem__(self, idx):
        return self.x_train[idx], self.y_train[idx]
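To see what `__len__` and `__getitem__` give the DataLoader, here is a minimal sketch of the same pattern on an in-memory frame rather than a CSV file. `ToyDataset` and the random data are hypothetical stand-ins; only the 6-feature/3-target column layout mirrors `GainDataset`.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical in-memory variant of GainDataset: columns 0-5 are features, 6-8 targets."""

    def __init__(self, frame):
        self.x = torch.tensor(frame.iloc[:, 0:6].values, dtype=torch.float32)
        self.y = torch.tensor(frame.iloc[:, 6:9].values, dtype=torch.float32)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

toy_frame = pd.DataFrame(np.random.default_rng(0).random((10, 9)))
toy_data = ToyDataset(toy_frame)

features, targets = toy_data[0]  # a single sample: 6 features, 3 targets
print(len(toy_data), features.shape, targets.shape)

# The DataLoader stacks samples into batches along a new first dimension.
batch_x, batch_y = next(iter(DataLoader(toy_data, batch_size=4)))
print(batch_x.shape, batch_y.shape)
```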

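Creating loaders for the training, test, and validation sets. Each loader reads its dataset in batches of `batch_size` samples.
Note that shuffle is set to False for the test and validation loaders; only the training data is reshuffled each epoch.
The inputs and outputs (features and labels) come from the column split defined in `GainDataset` above. The printed figures are `len(loader) * batch_size`, so they round the true row count up to a multiple of the batch size (for example, 220436 test rows print as 220450).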

In [14]:
path = 'dataset/traindata.csv'
train_data = GainDataset(path)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
print("The tensor shape in a training set is: ", len(train_loader) * batch_size)

path = 'dataset/testdata.csv'
test_data = GainDataset(path)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
print("The tensor shape in a test set is: ", len(test_loader) * batch_size)

path = 'dataset/validationdata.csv'
valid_data = GainDataset(path)
valid_loader = DataLoader(valid_data, batch_size=batch_size, shuffle=False)
print("The tensor shape in a valid set is: ", len(valid_loader) * batch_size)

The tensor shape in a training set is:  705400
The tensor shape in a test set is:  220450
The tensor shape in a valid set is:  176350
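A small sketch with a hypothetical 103-sample dataset shows how `len(loader)` relates to the sample count when the final batch is only partly filled. The data here is all zeros and exists only to give the loader something to count.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

toy_batch_size = 25
toy_data = TensorDataset(torch.zeros(103, 6), torch.zeros(103, 3))
toy_loader = DataLoader(toy_data, batch_size=toy_batch_size, shuffle=False)

n_samples = len(toy_data)    # true number of samples
n_batches = len(toy_loader)  # number of batches, counting the partial last one
last_batch_size = [len(x) for x, _ in toy_loader][-1]

print(n_samples, n_batches, n_batches * toy_batch_size, last_batch_size)
```

Printing `len(train_data)` instead of `len(train_loader) * batch_size` would report the exact number of samples.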


# Defining the neural network
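The network maps 6 inputs to 3 outputs through three hidden layers of 64, 32, and 16 units, each followed by a ReLU activation. The output layer has no activation, as is usual for regression.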

In [15]:
input_size = 6
output_size = 3

class Network(nn.Module):
    def __init__(self, input_size, output_size):
        super(Network, self).__init__()

        self.layer1 = nn.Linear(input_size, 64)
        self.layer2 = nn.Linear(64, 32)
        self.layer3 = nn.Linear(32, 16)
        self.layer4 = nn.Linear(16, output_size)

    def forward(self, x):
        x = nn.functional.relu(self.layer1(x))
        x = nn.functional.relu(self.layer2(x))
        x = nn.functional.relu(self.layer3(x))
        x = self.layer4(x)
        return x

# Instantiate the model
model = Network(input_size, output_size)
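A quick sanity check before training is to push a dummy batch through the layers and count the trainable parameters. The sketch below rebuilds the same layer sizes as a compact `nn.Sequential` so that it runs on its own; `sketch_model` and `dummy_batch` are illustrative names, not part of the notebook.

```python
import torch
from torch import nn

# Compact nn.Sequential equivalent of the layer sizes used in Network (6 -> 64 -> 32 -> 16 -> 3).
sketch_model = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 3),
)

dummy_batch = torch.randn(25, 6)  # one batch of 25 samples with 6 features
dummy_output = sketch_model(dummy_batch)
n_params = sum(p.numel() for p in sketch_model.parameters())

print(dummy_output.shape)  # one row of 3 predictions per sample
print(n_params)            # 3107 = (6*64+64) + (64*32+32) + (32*16+16) + (16*3+3)
```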

Define your execution device

In [16]:
device = torch.device("cpu")
print("The model will be running on", device, "device\n")
model.to(device)  # Convert model parameters and buffers to CPU or Cuda

The model will be running on cpu device



Network(
  (layer1): Linear(in_features=6, out_features=64, bias=True)
  (layer2): Linear(in_features=64, out_features=32, bias=True)
  (layer3): Linear(in_features=32, out_features=16, bias=True)
  (layer4): Linear(in_features=16, out_features=3, bias=True)
)
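The cell above pins the model to the CPU. If a GPU may be available, a common pattern is to select the device at run time and move each batch to it as well as the model. The sketch below uses only standard PyTorch calls and a hypothetical single-layer model.

```python
import torch
from torch import nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
sketch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

sketch_model = nn.Linear(6, 3).to(sketch_device)
batch = torch.randn(4, 6).to(sketch_device)  # inputs must live on the same device as the model

prediction = sketch_model(batch)
print(prediction.device.type == sketch_device.type)
```

With this pattern the training and test loops would also need to call `.to(device)` on every batch of inputs and targets.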

Function to save the model's weights (the trainedmodels directory must already exist)

In [17]:
def saveModel():
    path = "trainedmodels/MLP2.pth"
    torch.save(model.state_dict(), path)
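Saving a `state_dict` stores only the weights, so loading them later means constructing the same architecture first. A minimal round-trip sketch, using a temporary directory and a hypothetical file name so that nothing is written next to the notebook:

```python
import os
import tempfile
import torch
from torch import nn

source_model = nn.Linear(6, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "sketch.pth")   # hypothetical file name
    torch.save(source_model.state_dict(), checkpoint_path)  # weights only, not the class

    # Loading requires building the same architecture first, then filling in the weights.
    restored_model = nn.Linear(6, 3)
    restored_model.load_state_dict(torch.load(checkpoint_path))

sample = torch.randn(2, 6)
same_output = torch.equal(source_model(sample), restored_model(sample))
print(same_output)
```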

Define the loss function as mean squared error (MSE), which suits regression, and the optimizer as Adam with a small weight decay

In [18]:
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=0.0001)
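With its default settings, `nn.MSELoss` averages the squared differences over every element of the batch, across all three outputs. A tiny worked example with made-up numbers:

```python
import torch
from torch import nn

predicted = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.0, 1.0, 1.0]])

loss_by_module = nn.MSELoss()(predicted, target)
loss_by_hand = ((predicted - target) ** 2).mean()  # (0 + 1 + 4) / 3

print(loss_by_module.item(), loss_by_hand.item())
```

Because the loss is a squared quantity, its square root (the RMSE) is in the same units as the targets and is often easier to interpret.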

In [19]:
def set_globvar():
    global best_val_error    # Needed to modify global copy of best_val_error
    best_val_error = 1e10

## Training Function

In [20]:
def train(num_epochs):
    best_val_error = float('inf')
    print("Begin training...")
    for epoch in range(1, num_epochs + 1):
        running_train_loss = 0.0
        running_val_loss = 0.0

        # Training loop
        model.train()  # switch back to training mode after the previous epoch's model.eval()
        for inputs, targets in train_loader:
            optimizer.zero_grad()  # zero the parameter gradients
            predicted_outputs = model(inputs)  # predict output from the model
            train_loss = loss_fn(predicted_outputs, targets)  # calculate loss for the predicted output
            train_loss.backward()  # backpropagation
            optimizer.step()  # adjust parameters based on the calculated gradients
            running_train_loss += train_loss.item()  # track the loss value

        # Average training loss per batch for this epoch
        train_loss_value = running_train_loss / len(train_loader)

        # Validation loop
        model.eval()
        with torch.no_grad():
            for inputs, targets in valid_loader:
                predicted_outputs = model(inputs)
                val_loss = loss_fn(predicted_outputs, targets)
                running_val_loss += val_loss.item()

        # Average validation loss per batch for this epoch
        val_loss_value = running_val_loss / len(valid_loader)

        # Save the model whenever the validation loss reaches a new minimum
        if val_loss_value < best_val_error:
            best_val_error = val_loss_value
            saveModel()

        # Print the statistics of the epoch
        print('Completed epoch', epoch, 'Training Loss is: %.4f' % train_loss_value,
              'Validation Loss is: %.4f' % val_loss_value)
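Before running the loop on the full dataset, the same train/validate/keep-the-best pattern can be tried end to end on synthetic data. Everything in this sketch (the random linear problem, the single-layer model, the learning rate, and the `toy_` names) is an assumption made for illustration, not the notebook's configuration.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic linear regression problem: 6 features, 3 targets (made-up data).
true_weights = torch.randn(6, 3)
x_all = torch.randn(400, 6)
y_all = x_all @ true_weights
toy_train = DataLoader(TensorDataset(x_all[:320], y_all[:320]), batch_size=32, shuffle=True)
toy_valid = DataLoader(TensorDataset(x_all[320:], y_all[320:]), batch_size=32, shuffle=False)

toy_model = nn.Linear(6, 3)
toy_loss_fn = nn.MSELoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)

def evaluate(loader):
    """Average per-batch loss, computed without tracking gradients."""
    toy_model.eval()
    with torch.no_grad():
        return sum(toy_loss_fn(toy_model(x), y).item() for x, y in loader) / len(loader)

initial_val_loss = evaluate(toy_valid)
best_val_loss = float("inf")
best_state = None

for epoch in range(1, 31):
    toy_model.train()
    for x, y in toy_train:
        toy_optimizer.zero_grad()
        toy_loss_fn(toy_model(x), y).backward()
        toy_optimizer.step()

    val_loss = evaluate(toy_valid)
    if val_loss < best_val_loss:  # keep a copy of the best weights seen so far
        best_val_loss = val_loss
        best_state = copy.deepcopy(toy_model.state_dict())

print(initial_val_loss, best_val_loss)
```

Keeping the best weights in memory with `copy.deepcopy` plays the same role here as calling `saveModel()` in the notebook's loop.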

## Testing

In [21]:
def test():
    # Load the best model saved during training
    model = Network(input_size, output_size)
    path = "trainedmodels/MLP2.pth"
    model.load_state_dict(torch.load(path))
    model.eval()
    squared_error_sum = 0.0
    element_count = 0

    with torch.no_grad():
        for inputs, targets in test_loader:
            predicted_outputs = model(inputs)
            # Sum the squared errors over all batches so that the smaller last batch
            # is weighted correctly (the original kept only the last batch's MSE)
            squared_error_sum += nn.functional.mse_loss(predicted_outputs, targets, reduction='sum').item()
            element_count += targets.numel()

    # Regression has no accuracy in percent; report the error in MSE and RMSE instead
    test_mse = squared_error_sum / element_count
    print('MSE of the model on the test set of', len(test_data), 'samples is: %.4f' % test_mse)
    print('RMSE (same units as the targets) is: %.4f' % (test_mse ** 0.5))
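For a regression model, error metrics in the units of the targets are usually easier to read than a single percentage. A sketch with made-up predictions, computing RMSE and MAE separately for each of the three outputs:

```python
import torch

predicted = torch.tensor([[1.0, 2.0, 3.0],
                          [3.0, 2.0, 1.0]])
target = torch.ones(2, 3)

error = predicted - target
rmse_per_output = error.pow(2).mean(dim=0).sqrt()  # root mean squared error per column
mae_per_output = error.abs().mean(dim=0)           # mean absolute error per column

print(rmse_per_output)  # approximately [1.414, 1.0, 1.414]
print(mae_per_output)   # [1.0, 1.0, 1.0]
```

Reporting the columns separately makes it visible when one output is predicted much worse than the others, which a single averaged MSE would hide.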

Now, we run the main program:

In [23]:
if __name__ == "__main__":
    torch.manual_seed(42)
    num_epochs = 60
    train(num_epochs)
    print('Finished Training\n')
    test()

Begin training...
epoch 1, train loss 20511.9609375
epoch 1, train loss 18055.55078125
epoch 1, train loss 17637.916015625
epoch 1, train loss 19925.26953125
epoch 1, train loss 22732.939453125
epoch 1, train loss 21213.25
epoch 1, train loss 19207.015625
epoch 1, train loss 22522.279296875
epoch 1, train loss 18670.869140625
epoch 1, train loss 15138.236328125
epoch 1, train loss 22238.359375
epoch 1, train loss 17270.08984375
epoch 1, train loss 18715.41796875
epoch 1, train loss 20124.38671875
epoch 1, train loss 17840.677734375
epoch 1, train loss 20928.421875
epoch 1, train loss 20155.58203125
epoch 1, train loss 22658.283203125
epoch 1, train loss 20032.80078125
epoch 1, train loss 18703.59765625
epoch 1, train loss 17032.33984375
epoch 1, train loss 22303.076171875
epoch 1, train loss 19071.095703125
epoch 1, train loss 20618.91796875
epoch 1, train loss 18373.373046875
epoch 1, train loss 22388.029296875
epoch 1, train loss 18764.265625
epoch 1, train loss 25344.447265625
epoch

KeyboardInterrupt: 

# Loading and using the model
**Please refer to gettingoutputs.py**