## References:
- Machine Learning with Python Cookbook, 2nd Edition, Kyle Gallatin, Chris Albon, O'Reilly Media, Inc.
- http://alexlenail.me/NN-SVG/index.html
- https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/
- https://learn.udacity.com/courses/ud187
- https://en.wikipedia.org/wiki/Artificial_neural_network
- https://aws.amazon.com/what-is/neural-network/
- https://thedatascientist.com/what-deep-learning-is-and-isnt/
- https://www.ibm.com/topics/gradient-descent

### Traditional programming vs machine learning:

<img src="./traditional_vs_machine_learning1.png" align="center" width="800" />

### What is an artificial neural network?

An artificial neural network is a method in artificial intelligence that teaches computers to process data in a way that is **inspired by the human brain**. It is a type of machine learning process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain.

Neural networks can help computers make intelligent decisions with limited human assistance. This is because they can **learn and model the relationships between input and output data that are nonlinear and complex**.

Reference : https://aws.amazon.com/what-is/neural-network/ <br>

<img src="./Neuron3.png" align="center" width="500" />

Image credit : https://en.wikipedia.org/wiki/Artificial_neural_network

### From artificial neural network to 'deep' artificial neural network

<img src="./simple_vs_deep_neural_network.png" align="center" width="900" />

- **Input layer** : Information from the outside world enters the artificial neural network through the input layer.
- **Hidden layer** : Hidden layers take their input from the input layer or other hidden layers. Artificial neural networks can have a large number of hidden layers. Each hidden layer analyzes the output from the previous layer, processes it further, and passes it on to the next layer.
- **Output layer** : The output layer gives the final result of all the data processing done by the artificial neural network. It can have a single node or multiple nodes. For instance, in a binary (yes/no) classification problem, the output layer has one output node, which gives the result as 1 or 0. In a multi-class classification problem, the output layer might consist of more than one output node.

Image credit : https://thedatascientist.com/what-deep-learning-is-and-isnt/
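The flow of data through the three kinds of layers can be sketched in plain Python. This is a minimal illustration, not the network built later in this notebook: the layer sizes (2 inputs, 3 hidden nodes, 1 output node), the weights, the biases and the helper name `dense` are all made-up assumptions.

```python
# Hypothetical toy network: 2 inputs -> 3 hidden nodes -> 1 output node.
# All weights and biases below are made-up numbers for illustration only.

def relu(value):
    """Pass positive values through and clip negative values to zero."""
    return max(0.0, value)

def dense(inputs, weights, biases, activation=None):
    """Compute one layer: each node takes a weighted sum of all inputs plus a bias."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

# Input layer: the raw values coming from the outside world
input_layer = [1.0, 2.0]

# Hidden layer: one row of weights and one bias per hidden node
hidden_weights = [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]]
hidden_biases = [0.0, -1.0, 0.5]
hidden_layer = dense(input_layer, hidden_weights, hidden_biases, activation=relu)

# Output layer: a single node that combines the hidden layer's outputs
output_weights = [[1.0, 0.5, 2.0]]
output_biases = [0.1]
output_layer = dense(hidden_layer, output_weights, output_biases)

print("Hidden layer output:", hidden_layer)
print("Output layer output:", output_layer)
```

Each layer only sees the output of the layer before it, which is why more hidden layers can be stacked by calling `dense` again on the previous result.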

### Traditional machine learning vs deep learning

<img src="./machine_learning_vs_deep_learning.png" align="center" width="800" />

- In traditional machine learning, a data scientist needs to determine the set of features which the model would learn from.
- In deep learning, the model derives the features itself and learns independently. Deep learning shines on unstructured data such as images, text, and audio.

Image credit : https://thedatascientist.com/what-deep-learning-is-and-isnt/

### Let's look inside an individual artificial neural network node:

Our goal is to learn the rule, mentioned above, that converts a temperature from Celsius to Fahrenheit, using a neural network with a single node.

<img src="./neural_network_one_node1.png" align="center" width="1000" />

Image credit : https://www.ibm.com/topics/gradient-descent
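As a rough sketch of this idea, the snippet below trains a single node (`output = weight * input + bias`) with plain gradient descent on the mean squared error, using only the Python standard library. The sample temperatures, the learning rate and the number of steps are illustrative assumptions, and the targets are generated from the known rule F = 1.8 × C + 32.

```python
# A single node computes: output = weight * input + bias.
# We learn weight and bias with plain gradient descent on the mean squared error.
# The sample temperatures, learning rate and step count are illustrative assumptions.

celsius = [-40.0, -10.0, 0.0, 8.0, 15.0, 22.0, 38.0]
fahrenheit = [c * 1.8 + 32.0 for c in celsius]  # targets from the known rule

weight, bias = 0.0, 0.0   # start from an uninformed guess
learning_rate = 0.001
n = len(celsius)

for step in range(20000):
    # Forward pass: predictions of the single node
    predictions = [weight * c + bias for c in celsius]
    errors = [p - f for p, f in zip(predictions, fahrenheit)]

    # Gradients of the mean squared error with respect to weight and bias
    grad_weight = 2.0 / n * sum(e * c for e, c in zip(errors, celsius))
    grad_bias = 2.0 / n * sum(errors)

    # Gradient descent update: move against the gradient
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

print("Learned weight:", round(weight, 3))
print("Learned bias:", round(bias, 3))
```

With these settings the node should recover a weight close to 1.8 and a bias close to 32, which is the conversion rule itself.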

### Building a full-blown artificial neural network from the concept mentioned above:

<img src="./neural_network_nodes1.png" align="center" width="800" />

### Training an artificial neural network:

<img src="./neural_network_training.png" align="center" width="800" />

### It's time for coding :-)

In [1]:
# We import all the necessary libraries at the beginning
import torch
import torch.nn as nn 
import numpy as np
import pandas as pd
from torch.utils.data import DataLoader, TensorDataset
from torch.optim import RMSprop 
from sklearn.datasets import make_regression 
from sklearn.model_selection import train_test_split

In [2]:
print(torch.__version__)

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

# Additional Info when using CUDA
if device.type == 'cuda':
    print(torch.cuda.is_available())
    print(torch.cuda.device_count())
    print(torch.cuda.device(0))
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

2.0.1+cu118
Using device: cuda

True
1
<torch.cuda.device object at 0x0000020B0A98D4F0>
NVIDIA GeForce RTX 3060 Laptop GPU
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB


### What is CUDA?
CUDA is a programming model and computing toolkit developed by NVIDIA. It enables you to perform compute-intensive operations faster by parallelizing tasks across GPUs. CUDA is the dominant API used for deep learning, although other options such as OpenCL are available. PyTorch provides support for CUDA through the `torch.cuda` package.

In [3]:
# We generate a sample regression dataset (make_regression gets no random_state here, so the values differ on every run)
features, target = make_regression(n_features = 5, n_samples = 1000)
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size = 0.1, random_state = 1)

In [4]:
# Let's look at the first few records of the training dataset (the last column is the target)
pd.DataFrame(np.hstack((features_train, target_train.reshape(-1,1)))).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | -0.433935 | 1.031306 | -2.965524 | 0.134074 | 0.697376 | -68.741145 |
| 1 | 0.109711 | 0.867638 | 0.069104 | -1.912093 | -0.061461 | 48.917565 |
| 2 | -0.046456 | 0.541363 | -1.058704 | -0.907559 | -0.526042 | -76.572045 |
| 3 | -0.191237 | 0.705107 | -1.787081 | 0.806705 | -0.404556 | -79.933531 |
| 4 | 0.435432 | -1.125913 | 0.215511 | -1.070098 | -0.74066 | -122.37619 |


In [5]:
# Let's look at the first few records of the test dataset (the last column is the target)
pd.DataFrame(np.hstack((features_test, target_test.reshape(-1,1)))).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 0.73228 | -1.648323 | 2.318695 | 0.917547 | 0.100946 | 79.127472 |
| 1 | 0.009549 | -0.154502 | -1.509998 | -0.327571 | -0.179232 | -121.631682 |
| 2 | -0.178757 | -0.719495 | -0.209334 | -1.550079 | -1.714379 | -242.437942 |
| 3 | -1.967328 | -0.557492 | -0.278034 | -0.447881 | -1.378088 | -324.937476 |
| 4 | -0.151556 | 0.85911 | -0.59021 | -0.222043 | -1.841062 | -118.739455 |


In [6]:
# Print the type and shape of the dataset
print('Training feature set -> type - {}, shape - {}'.format(type(features_train), features_train.shape))
print(features_train[0])

Training feature set -> type - <class 'numpy.ndarray'>, shape - (900, 5)
[-0.43393538  1.03130648 -2.96552365  0.13407387  0.697376  ]


In [7]:
# Print the type and shape of the dataset
print('Test feature set -> type - {}, shape - {}'.format(type(features_test), features_test.shape))
print(features_test[0])

Test feature set -> type - <class 'numpy.ndarray'>, shape - (100, 5)
[ 0.73228021 -1.64832261  2.31869477  0.91754668  0.10094596]


In [8]:
# Print the type and shape of the dataset
print('Training target set -> type - {}, shape - {}'.format(type(target_train), target_train.shape))
print(target_train[0])

Training target set -> type - <class 'numpy.ndarray'>, shape - (900,)
-68.74114460661815


In [9]:
# Print the type and shape of the dataset
print('Test target set -> type - {}, shape - {}'.format(type(target_test), target_test.shape))
print(target_test[0])

Test target set -> type - <class 'numpy.ndarray'>, shape - (100,)
79.1274718490457


In [10]:
# To work with PyTorch, we need to convert the ndarrays to float tensors; view(-1,1) turns each target vector into a single column
x_train = torch.from_numpy(features_train).float() 
y_train = torch.from_numpy(target_train).float().view(-1,1) 
x_test = torch.from_numpy(features_test).float()
y_test = torch.from_numpy(target_test).float().view(-1,1)

In [11]:
# Print the type and shape of the converted dataset
print('Training feature set -> type - {}, shape - {}, is CUDA - {}'.format(type(x_train), x_train.shape, x_train.is_cuda))
print(x_train[0])

Training feature set -> type - <class 'torch.Tensor'>, shape - torch.Size([900, 5]), is CUDA - False
tensor([-0.4339,  1.0313, -2.9655,  0.1341,  0.6974])


In [12]:
# Print the type and shape of the converted dataset
print('Test feature set -> type - {}, shape - {}, is CUDA - {}'.format(type(x_test), x_test.shape, x_test.is_cuda))
print(x_test[0])

Test feature set -> type - <class 'torch.Tensor'>, shape - torch.Size([100, 5]), is CUDA - False
tensor([ 0.7323, -1.6483,  2.3187,  0.9175,  0.1009])


In [13]:
# Print the type and shape of the converted dataset
print('Train target set -> type - {}, shape - {}, is CUDA - {}'.format(type(y_train), y_train.shape, y_train.is_cuda))
print(y_train[0])

Train target set -> type - <class 'torch.Tensor'>, shape - torch.Size([900, 1]), is CUDA - False
tensor([-68.7411])


In [14]:
# Print the type and shape of the converted dataset
print('Test target set -> type - {}, shape - {}, is CUDA - {}'.format(type(y_test), y_test.shape, y_test.is_cuda))
print(y_test[0])

Test target set -> type - <class 'torch.Tensor'>, shape - torch.Size([100, 1]), is CUDA - False
tensor([79.1275])


In [15]:
# Now, send all the datasets to GPU, if available
x_train = x_train.to(device)
x_test = x_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

# Print the type and shape of the converted dataset
print('Test target set -> type - {}, shape - {}, is CUDA - {}'.format(type(y_test), y_test.shape, y_test.is_cuda))
print(y_test[0])

Test target set -> type - <class 'torch.Tensor'>, shape - torch.Size([100, 1]), is CUDA - True
tensor([79.1275], device='cuda:0')


In [16]:
# Define the neural network to solve this regression problem
class RegressorNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 10)   # in_features=5, out_features=10
        self.fc2 = nn.Linear(10, 10)  # in_features=10, out_features=10
        self.fc3 = nn.Linear(10, 1)   # in_features=10, out_features=1

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))    # ReLU activation after the first hidden layer
        x = nn.functional.relu(self.fc2(x))    # ReLU activation after the second hidden layer
        x = self.fc3(x)                        # No activation function at the output layer as this is a regression problem
        return x

### Activation function:
- The purpose of an activation function is to introduce **non-linearity** into the output of a neuron.
- The following are a few popular activation functions for hidden layers:
    - Rectified linear activation function, or ReLU
    - Sigmoid
    - Tanh
- The following are a few popular activation functions for output layers:
    - Linear
    - Sigmoid
    - Softmax

<table>
    <tr>
      <td>
      <img src='./Linear.png' width=300>
      </td>
      <td>
      <img src='./RELU.png' width=300>
      </td>
     </tr>
    <tr>
      <td>
      Linear
      </td>
      <td>
      ReLU
      </td>
     </tr>  
    <tr>
      <td>
      <img src='./Sigmoid.png' width=300>
      </td>
      <td>
      <img src='./TanH.png' width=300>
      </td>
     </tr>
    <tr>
      <td>
      Sigmoid
      </td>
      <td>
      Tanh
      </td>
     </tr>     
</table>
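The activation functions listed above have short standard definitions. The sketch below writes them out with Python's `math` module so the shapes in the plots can be checked by hand; the function names and the sample inputs are chosen here for illustration, and in practice PyTorch's own implementations would be used instead.

```python
import math

def linear(x):
    """Identity: the output equals the input."""
    return x

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any real number into the range (-1, 1)."""
    return math.tanh(x)

def softmax(values):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), relu(x), round(sigmoid(x), 4), round(tanh(x), 4))

print(softmax([1.0, 2.0, 3.0]))
```

Softmax differs from the others in that it acts on a whole vector of scores at once, which is why it is typically used at the output layer of a multi-class classifier.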

In [17]:
# Initialize the network now
network = RegressorNeuralNet()

# Define loss function and the optimizer
criterion = nn.MSELoss()
optimizer = RMSprop(network.parameters())

### Calculating loss for regression problems:
We need to know how close the network's predictions are to the actual targets. Common metrics are:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE): expressed in squared units of the target, so not very intuitive
- Root Mean Squared Error (RMSE): expressed in the same unit as the target
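To make the three metrics concrete, here is a small sketch that computes them by hand in plain Python. The target and prediction values are made-up numbers, and the helper function names are chosen here for illustration; the notebook itself relies on `nn.MSELoss`.

```python
import math

def mean_absolute_error(targets, predictions):
    """Average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

def mean_squared_error(targets, predictions):
    """Average of the squared differences (squared units of the target)."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def root_mean_squared_error(targets, predictions):
    """Square root of the MSE (same unit as the target)."""
    return math.sqrt(mean_squared_error(targets, predictions))

# Made-up example values
targets = [10.0, 20.0, 30.0, 40.0]
predictions = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(targets, predictions)
mse = mean_squared_error(targets, predictions)
rmse = root_mean_squared_error(targets, predictions)

print("MAE: ", mae)
print("MSE: ", mse)
print("RMSE:", round(rmse, 4))
```

Because the errors are squared before averaging, MSE and RMSE penalize large errors more heavily than MAE does.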

In [18]:
# Print the network
network

RegressorNeuralNet(
  (fc1): Linear(in_features=5, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=1, bias=True)
)

<img src="./regressor_neural_network.png" align="left" width="500" />
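The layer sizes printed above also tell us how many values the network has to learn. Each `Linear` layer holds one weight per input-output pair plus one bias per output node, so the count can be worked out with simple arithmetic, as sketched below (the variable names are chosen here for illustration).

```python
# (in_features, out_features) for fc1, fc2 and fc3, as printed above
layer_shapes = [(5, 10), (10, 10), (10, 1)]

total = 0
for in_features, out_features in layer_shapes:
    weights = in_features * out_features  # one weight per input-output connection
    biases = out_features                 # one bias per output node (bias=True)
    total += weights + biases
    print(in_features, "->", out_features, ":", weights + biases, "parameters")

print("Total trainable parameters:", total)
```

This gives 60 + 110 + 11 = 181 trainable parameters for this network.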

In [19]:
# Send the network to GPU as well, if available
network.to(device)

RegressorNeuralNet(
  (fc1): Linear(in_features=5, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=1, bias=True)
)

In [20]:
# Next, we define the dataloader
train_data = TensorDataset(x_train, y_train) 
train_loader = DataLoader(train_data, batch_size = 100, shuffle = True)
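Conceptually, a data loader with `batch_size = 100` and `shuffle = True` splits the 900 training samples into shuffled mini-batches. The sketch below imitates that splitting on plain index lists using only the standard library; it illustrates the idea and is not how `DataLoader` is implemented internally.

```python
import random

n_samples = 900   # size of the training set above
batch_size = 100  # same value passed to DataLoader

# Shuffle the sample indices, as shuffle=True does at the start of every epoch
indices = list(range(n_samples))
random.shuffle(indices)

# Cut the shuffled indices into consecutive chunks of batch_size
batches = [indices[start:start + batch_size]
           for start in range(0, n_samples, batch_size)]

print("Batches per epoch:", len(batches))
print("Samples in first batch:", len(batches[0]))
```

With these numbers every epoch consists of 9 mini-batches, so the network parameters are updated 9 times per epoch.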

In [21]:
# Define the number of EPOCHS we want to train
EPOCHS = 20

In [22]:
# Start the training
for epoch in range(EPOCHS):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()  # Reset the gradients before every iteration; otherwise they accumulate
        output = network(data) # Forward propagation: compute the predictions
        loss = criterion(output, target) # Calculate the loss for this mini-batch
        loss.backward() # Backward propagation: compute the gradients
        optimizer.step() # Update the network parameters
    print("Epoch:", epoch+1, "\tLoss:", loss.item()) # Loss of the last mini-batch in this epoch

Epoch: 1 	Loss: 13851.1435546875
Epoch: 2 	Loss: 6004.14990234375
Epoch: 3 	Loss: 2299.4560546875
Epoch: 4 	Loss: 662.6420288085938
Epoch: 5 	Loss: 363.2129211425781
Epoch: 6 	Loss: 257.5631408691406
Epoch: 7 	Loss: 333.099609375
Epoch: 8 	Loss: 232.1615447998047
Epoch: 9 	Loss: 196.04981994628906
Epoch: 10 	Loss: 201.78790283203125
Epoch: 11 	Loss: 179.89122009277344
Epoch: 12 	Loss: 185.03121948242188
Epoch: 13 	Loss: 170.4311065673828
Epoch: 14 	Loss: 153.2260284423828
Epoch: 15 	Loss: 143.64537048339844
Epoch: 16 	Loss: 117.62078857421875
Epoch: 17 	Loss: 97.60050201416016
Epoch: 18 	Loss: 115.50859832763672
Epoch: 19 	Loss: 113.16426849365234
Epoch: 20 	Loss: 82.55403137207031


In [23]:
# Evaluate the neural network on the test set
with torch.no_grad(): # When we evaluate the network, we don't track the gradients
    output = network(x_test) # Compute the predictions for the test features
    test_loss = float(criterion(output, y_test)) # Calculate the loss (MSE) against the test targets
    print("Test MSE:", test_loss)

Test MSE: 87.18633270263672
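Since MSE is expressed in squared units of the target, it can be easier to read as RMSE. A minimal sketch, using the test MSE value printed above as a hard-coded number:

```python
import math

test_mse = 87.18633270263672  # value printed by the evaluation cell above
test_rmse = math.sqrt(test_mse)

print("Test RMSE:", round(test_rmse, 2))
```

The result is an RMSE of about 9.34, in the same unit as the target. The sample target values shown earlier range into the hundreds in magnitude.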
