## References:
- Machine Learning with Python Cookbook, 2nd Edition, Kyle Gallatin, Chris Albon, O'Reilly Media, Inc.
- http://alexlenail.me/NN-SVG/index.html
- https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/
- https://learn.udacity.com/courses/ud187

### Traditional programming vs machine learning:

<img src="./traditional_vs_machine_learning1.png" align="left" width="500" />

### Let's look inside an individual artificial neural network node:

The goal is to learn the rule shown above for converting a temperature from Celsius to Fahrenheit, using a neural network with a single node.

<img src="./neural_network_one_node.png" align="left" width="500" />
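As a minimal sketch of what a single node computes, the example below builds a one-input, one-output `nn.Linear` layer and sets its weight and bias by hand to the known conversion rule F = 1.8 × C + 32. Setting the parameters manually is an assumption made for illustration; a trained node would instead learn them from example pairs.

```python
import torch
import torch.nn as nn

# A single node: one input (Celsius), one output (Fahrenheit)
node = nn.Linear(in_features=1, out_features=1)

# Illustrative assumption: set the parameters to the known rule
# F = 1.8 * C + 32 instead of learning them from data
with torch.no_grad():
    node.weight.fill_(1.8)
    node.bias.fill_(32.0)

celsius = torch.tensor([[0.0], [37.0], [100.0]])
with torch.no_grad():
    fahrenheit = node(celsius)

print(fahrenheit.flatten().tolist())
```

The node is just a weighted sum plus a bias, so learning the conversion rule amounts to finding the weight 1.8 and the bias 32.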

### Building a full-blown artificial neural network from the concept above:

<img src="./neural_network_nodes.png" align="left" width="800" />

### Training an artificial neural network:

<img src="./neural_network_training.png" align="left" width="800" />

### It's time for coding :-)

In [1]:
# We import all the necessary libraries at the beginning
import torch
import torch.nn as nn 
import numpy as np
import pandas as pd
from torch.utils.data import DataLoader, TensorDataset
from torch.optim import RMSprop 
from sklearn.datasets import make_regression 
from sklearn.model_selection import train_test_split

In [2]:
print(torch.__version__)

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

# Additional Info when using CUDA
if device.type == 'cuda':
    print(torch.cuda.is_available())
    print(torch.cuda.device_count())
    print(torch.cuda.device(0))
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

2.0.1+cu118
Using device: cuda

True
1
<torch.cuda.device object at 0x0000027D21F06760>
NVIDIA GeForce RTX 3060 Laptop GPU
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB


### What is CUDA?
CUDA is a parallel computing platform, programming model, and toolkit developed by NVIDIA. It speeds up compute-intensive operations by parallelizing them across the many cores of a GPU. CUDA is the dominant API for deep learning, although alternatives such as OpenCL exist. PyTorch exposes its CUDA support through the `torch.cuda` package.
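The following sketch illustrates how PyTorch uses a device in practice: tensors are created on the CPU by default, moved explicitly with `.to(device)`, and any operation then runs on the device that holds its operands. The tensor names `a`, `b`, and `product` are placeholders for illustration, and the code falls back to the CPU when no GPU is available.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors are created on the CPU by default and must be moved explicitly
a = torch.ones(2, 3)
b = torch.ones(3, 2)
a, b = a.to(device), b.to(device)

# The operation runs on whichever device holds its operands
product = a @ b
print(product.device.type, product.shape)
```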

In [3]:
# We generate a sample regression dataset
features, target = make_regression(n_features = 5, n_samples = 1000)
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size = 0.1, random_state = 1)
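Note that the cell above fixes `random_state` only for the split, so the generated data itself changes on every run. A hedged variant that also seeds `make_regression` is sketched below; the seed value `1` is an arbitrary choice, and the numbers it produces will differ from the outputs shown in this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# random_state on make_regression is an addition for reproducibility;
# the value 1 is arbitrary
features, target = make_regression(n_features=5, n_samples=1000, random_state=1)
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.1, random_state=1
)

# test_size=0.1 keeps 10% of the 1000 samples for testing
print(features_train.shape, features_test.shape)
print(target_train.shape, target_test.shape)
```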

In [4]:
# Let's look at the first few records of the training dataset
pd.DataFrame(np.hstack((features_train, target_train.reshape(-1,1)))).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1.790331 | -2.10158 | -0.110377 | -0.422331 | 0.016574 | -110.669147 |
| 1 | 1.305396 | -0.690108 | -1.007596 | 0.157143 | 1.297853 | 24.863116 |
| 2 | 0.751363 | -0.671943 | 0.117528 | -0.175634 | -0.064575 | -27.985575 |
| 3 | 1.160472 | 1.359596 | 1.955973 | 1.232684 | -0.160331 | 271.165883 |
| 4 | 0.148997 | -0.997192 | -0.796308 | 0.080886 | 0.720172 | -72.511648 |


In [5]:
# Let's look at the first few records of the test dataset
pd.DataFrame(np.hstack((features_test, target_test.reshape(-1,1)))).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | -1.403591 | 0.978448 | 0.155021 | 0.38615 | 0.367363 | 33.490278 |
| 1 | 0.037344 | -1.267029 | -1.186725 | -2.213819 | 2.129834 | -258.604581 |
| 2 | 1.192807 | 0.087156 | -0.112835 | -1.984649 | -2.141426 | -66.565532 |
| 3 | 0.313051 | -0.062769 | 2.076876 | 1.006667 | 0.159313 | 85.773754 |
| 4 | -0.673847 | 1.755563 | -0.673929 | 0.856422 | 1.371782 | 174.391616 |


In [6]:
# Print the type and shape of the dataset
print('Training feature set -> type - {}, shape - {}'.format(type(features_train), features_train.shape))
print(features_train[0])

Training feature set -> type - <class 'numpy.ndarray'>, shape - (900, 5)
[ 1.7903311  -2.10157994 -0.11037699 -0.42233097  0.01657354]


In [7]:
# Print the type and shape of the dataset
print('Test feature set -> type - {}, shape - {}'.format(type(features_test), features_test.shape))
print(features_test[0])

Test feature set -> type - <class 'numpy.ndarray'>, shape - (100, 5)
[-1.40359058  0.97844822  0.15502097  0.38614953  0.36736304]


In [8]:
# Print the type and shape of the dataset
print('Training target set -> type - {}, shape - {}'.format(type(target_train), target_train.shape))
print(target_train[0])

Training target set -> type - <class 'numpy.ndarray'>, shape - (900,)
-110.66914726665077


In [9]:
# Print the type and shape of the dataset
print('Test target set -> type - {}, shape - {}'.format(type(target_test), target_test.shape))
print(target_test[0])

Test target set -> type - <class 'numpy.ndarray'>, shape - (100,)
33.490278361320044


In [10]:
# To work with PyTorch, we need to convert the ndarrays to tensors
x_train = torch.from_numpy(features_train).float() 
y_train = torch.from_numpy(target_train).float().view(-1,1) 
x_test = torch.from_numpy(features_test).float()
y_test = torch.from_numpy(target_test).float().view(-1,1)
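The `.float()` and `.view(-1,1)` calls in the conversion above each serve a purpose, which the sketch below tries to make visible. The arrays `features` and `target` here are small random stand-ins, not the notebook's data. `torch.from_numpy` keeps NumPy's `float64` dtype, while `nn.Linear` parameters default to `float32`, hence the cast. Reshaping the target to a column avoids an accidental broadcast when it is compared with a prediction of shape `(N, 1)`.

```python
import numpy as np
import torch

# Hypothetical stand-ins for the feature and target arrays
features = np.random.rand(4, 5)   # float64 by default
target = np.random.rand(4)        # one-dimensional, shape (4,)

x = torch.from_numpy(features)
print(x.dtype)                    # from_numpy keeps the NumPy dtype
x = x.float()                     # cast to float32, the default dtype of nn.Linear parameters
y = torch.from_numpy(target).float().view(-1, 1)
print(x.dtype, y.shape)

# Without view(-1, 1), subtracting a (4,) target from a (4, 1) prediction
# broadcasts to a (4, 4) matrix instead of an element-wise difference
prediction = torch.zeros(4, 1)
print((prediction - y.flatten()).shape)
print((prediction - y).shape)
```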

In [11]:
# Print the type and shape of the converted dataset
print('Training feature set -> type - {}, shape - {}, is CUDA - {}'.format(type(x_train), x_train.shape, x_train.is_cuda))
print(x_train[0])

Training feature set -> type - <class 'torch.Tensor'>, shape - torch.Size([900, 5]), is CUDA - False
tensor([ 1.7903, -2.1016, -0.1104, -0.4223,  0.0166])


In [12]:
# Print the type and shape of the converted dataset
print('Test feature set -> type - {}, shape - {}, is CUDA - {}'.format(type(x_test), x_test.shape, x_test.is_cuda))
print(x_test[0])

Test feature set -> type - <class 'torch.Tensor'>, shape - torch.Size([100, 5]), is CUDA - False
tensor([-1.4036,  0.9784,  0.1550,  0.3861,  0.3674])


In [13]:
# Print the type and shape of the converted dataset
print('Train target set -> type - {}, shape - {}, is CUDA - {}'.format(type(y_train), y_train.shape, y_train.is_cuda))
print(y_train[0])

Train target set -> type - <class 'torch.Tensor'>, shape - torch.Size([900, 1]), is CUDA - False
tensor([-110.6691])


In [14]:
# Print the type and shape of the converted dataset
print('Test target set -> type - {}, shape - {}, is CUDA - {}'.format(type(y_test), y_test.shape, y_test.is_cuda))
print(y_test[0])

Test target set -> type - <class 'torch.Tensor'>, shape - torch.Size([100, 1]), is CUDA - False
tensor([33.4903])


In [15]:
# Now, send all the datasets to GPU, if available
x_train = x_train.to(device)
x_test = x_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

# Print the type, shape, and device of one of the moved tensors
print('Test target set -> type - {}, shape - {}, is CUDA - {}'.format(type(y_test), y_test.shape, y_test.is_cuda))
print(y_test[0])

Test target set -> type - <class 'torch.Tensor'>, shape - torch.Size([100, 1]), is CUDA - True
tensor([33.4903], device='cuda:0')


In [16]:
# Define the neural network to solve this regression problem
class RegressorNeuralNet(nn.Module):
    def __init__(self):
        super(RegressorNeuralNet, self).__init__()
        self.fc1 = nn.Linear(5, 10)  # in_features=5, out_features=10
        self.fc2 = nn.Linear(10, 10) # in_features=10, out_features=10
        self.fc3 = nn.Linear(10, 1)  # in_features=10, out_features=1

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))    # ReLU activation after the first hidden layer
        x = nn.functional.relu(self.fc2(x))    # ReLU activation after the second hidden layer
        x = self.fc3(x)                        # No activation function at the output layer as this is a regression problem
        return x
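To get a feel for the size of this network, the sketch below rebuilds the same layer sizes with `nn.Sequential` (an equivalent formulation, used here only to keep the example self-contained) and counts the trainable parameters. The name `model` and the random `batch` are placeholders for illustration.

```python
import torch
import torch.nn as nn

# Same layer sizes as the class above, written with nn.Sequential
model = nn.Sequential(
    nn.Linear(5, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 1),
)

# Each Linear layer holds in_features * out_features weights plus out_features biases
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # (5*10 + 10) + (10*10 + 10) + (10*1 + 1) = 181

# A batch of 4 samples with 5 features each yields 4 predictions
batch = torch.randn(4, 5)
print(model(batch).shape)
```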

### Activation function:
- The purpose of an activation function is to introduce **non-linearity** into the output of a neuron.
- The following are a few popular activation functions for hidden layers:
    - Rectified linear unit (ReLU)
    - Sigmoid
    - Tanh
- The following are a few popular activation functions for output layers:
    - Linear
    - Sigmoid
    - Softmax

<table>
    <tr>
      <td>
      <img src='./Linear.png' width=300>
      </td>
      <td>
      <img src='./RELU.png' width=300>
      </td>
     </tr>
    <tr>
      <td>
      Linear
      </td>
      <td>
      ReLU
      </td>
     </tr>  
    <tr>
      <td>
      <img src='./Sigmoid.png' width=300>
      </td>
      <td>
      <img src='./TanH.png' width=300>
      </td>
     </tr>
    <tr>
      <td>
      Sigmoid
      </td>
      <td>
      Tanh
      </td>
     </tr>     
</table>
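The activation functions listed above can be tried directly on a small tensor. The sketch below applies each one to the same three illustrative values, so their effect on negative, zero, and positive inputs can be compared.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

print(torch.relu(x))       # negatives clipped to 0, positives unchanged
print(torch.sigmoid(x))    # squashed into (0, 1)
print(torch.tanh(x))       # squashed into (-1, 1)
probs = torch.softmax(x, dim=0)
print(probs, probs.sum())  # non-negative values that sum to 1
```

The linear (identity) activation simply returns its input, which is why the regression network above applies no function to its output layer.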

In [17]:
# Initialize the network now
network = RegressorNeuralNet()

# Define loss function and the optimizer
criterion = nn.MSELoss()
optimizer = RMSprop(network.parameters())

## Calculating loss for regression problems:
### We need to know how close the network's predictions are to the actual targets
- Mean Absolute Error (MAE): expressed in the same unit as the target value
- Mean Squared Error (MSE): expressed in the square of the target unit, so not very intuitive
- Root Mean Squared Error (RMSE): expressed in the same unit as the target value
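The three metrics can be compared on a toy example. The sketch below uses hypothetical predictions and targets, chosen so the arithmetic is easy to follow, and computes each metric with PyTorch's built-in `nn.L1Loss` and `nn.MSELoss`.

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets; the errors are 1, 0 and -3
predictions = torch.tensor([[2.0], [4.0], [6.0]])
targets = torch.tensor([[1.0], [4.0], [9.0]])

mae = nn.L1Loss()(predictions, targets)    # mean of |error|: (1 + 0 + 3) / 3
mse = nn.MSELoss()(predictions, targets)   # mean of error^2: (1 + 0 + 9) / 3
rmse = torch.sqrt(mse)                     # back in the unit of the target

print(mae.item(), mse.item(), rmse.item())
```

Because the errors are squared before averaging, MSE and RMSE penalize the single large error more heavily than MAE does.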

In [18]:
# Print the network
network

RegressorNeuralNet(
  (fc1): Linear(in_features=5, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=1, bias=True)
)

<img src="./regressor_neural_network.png" align="left" width="500" />

In [19]:
# Send the network to GPU as well, if available
network.to(device)

RegressorNeuralNet(
  (fc1): Linear(in_features=5, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=1, bias=True)
)

In [20]:
# Next, we define the dataloader
train_data = TensorDataset(x_train, y_train) 
train_loader = DataLoader(train_data, batch_size = 100, shuffle = True)
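To see what the dataloader yields, the sketch below builds one over random stand-in tensors with the same shapes as the training set (900 samples, 5 features) and inspects a single batch. The names `x`, `y`, and `loader` are placeholders for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins with the same shapes as the training tensors
x = torch.randn(900, 5)
y = torch.randn(900, 1)

loader = DataLoader(TensorDataset(x, y), batch_size=100, shuffle=True)
print(len(loader))  # 900 samples / 100 per batch = 9 batches per epoch

# Each iteration returns one (features, target) pair of batch tensors
data, target = next(iter(loader))
print(data.shape, target.shape)
```

With `shuffle = True`, the samples are reordered at the start of every epoch, so each epoch sees different batches.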

In [21]:
# Define the number of epochs to train for
EPOCHS = 20

In [22]:
# Start the training
for epoch in range(EPOCHS):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()  # Zero out the gradients before every iteration; otherwise they accumulate across batches
        output = network(data) # This is where we perform the forward propagation
        loss = criterion(output, target) # This is where we calculate the loss
        loss.backward() # This is where we perform the backward propagation
        optimizer.step() # This is where we update the network parameters
    print("Epoch:", epoch+1, "\tLoss:", loss.item()) # Loss of the last batch in this epoch

Epoch: 1 	Loss: 8944.6884765625
Epoch: 2 	Loss: 7146.892578125
Epoch: 3 	Loss: 2484.45703125
Epoch: 4 	Loss: 1083.3734130859375
Epoch: 5 	Loss: 519.3450927734375
Epoch: 6 	Loss: 259.06658935546875
Epoch: 7 	Loss: 243.70608520507812
Epoch: 8 	Loss: 235.18472290039062
Epoch: 9 	Loss: 212.51470947265625
Epoch: 10 	Loss: 164.8671417236328
Epoch: 11 	Loss: 161.14413452148438
Epoch: 12 	Loss: 119.09620666503906
Epoch: 13 	Loss: 153.30059814453125
Epoch: 14 	Loss: 104.56725311279297
Epoch: 15 	Loss: 88.17508697509766
Epoch: 16 	Loss: 83.92818450927734
Epoch: 17 	Loss: 65.28795623779297
Epoch: 18 	Loss: 68.6308364868164
Epoch: 19 	Loss: 56.77515411376953
Epoch: 20 	Loss: 54.32670593261719
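The loss printed above is that of the last batch in each epoch, which is why it occasionally rises from one epoch to the next (for example between epochs 12 and 13). A smoother signal is the mean loss over all batches of an epoch. The sketch below shows one way to compute it, assuming synthetic stand-in data and a smaller network so that it runs on its own; `epoch_losses` and `running_loss` are illustrative names.

```python
import torch
import torch.nn as nn
from torch.optim import RMSprop
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 200 samples, 5 features, linear target
x = torch.randn(200, 5)
y = x @ torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
loader = DataLoader(TensorDataset(x, y), batch_size=50, shuffle=True)

model = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 1))
criterion = nn.MSELoss()
optimizer = RMSprop(model.parameters())

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for data, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the epoch mean is exact
        running_loss += loss.item() * data.size(0)
    epoch_losses.append(running_loss / len(loader.dataset))
    print("Epoch:", epoch + 1, "\tMean loss:", epoch_losses[-1])
```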


In [23]:
# Evaluate the neural network
with torch.no_grad(): # When we evaluate the network, we don't track the gradients
    output = network(x_test) # We derive the output
    test_loss = float(criterion(output, y_test)) # We calculate the loss
    print("Test MSE:", test_loss)

Test MSE: 39.95069885253906
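Since MSE is expressed in squared target units, taking its square root gives an error in the same unit as the target. A small sketch using the test MSE printed above:

```python
import math

test_mse = 39.95069885253906  # value printed by the evaluation cell above
test_rmse = math.sqrt(test_mse)
print(round(test_rmse, 2))
```

This gives a test RMSE of roughly 6.3 in target units.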
