	
This notebook contains code and concepts from the "Introduction to Deep Learning with PyTorch" course on DataCamp, originally developed by Maham Khan.

I have added extra comments, explanations, and modifications to the code to aid understanding and provide additional context.

## First Neural Network


In [2]:
import torch
import torch.nn as nn
# Create an input tensor with three features (shape: 1 x 3)
input_tensor = torch.tensor(
[[0.3471, 0.4547, -0.2356]])

# Define our first linear layer
linear_layer = nn.Linear(in_features=3, out_features=2)

# Pass input through linear layer
output = linear_layer(input_tensor)
print(output)

tensor([[-0.0108, -0.0725]], grad_fn=<AddmmBackward0>)


In [3]:
# Print the weight and bias of the linear layer
print("linear_layer.weight: ", linear_layer.weight)
print("linear_layer.bias: ", linear_layer.bias)

linear_layer.weight:  Parameter containing:
tensor([[ 0.4233,  0.1631,  0.3090],
        [-0.4805, -0.2379,  0.3753]], requires_grad=True)
linear_layer.bias:  Parameter containing:
tensor([-0.1591,  0.2909], requires_grad=True)


### Linear Layer operation 

For input $X$, with weights $W_0$ and bias $b_0$, the linear layer operation is:

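$y_0 = W_0 \cdot X + b_0$

Here $W_0 \cdot X$ denotes a matrix product, not an element-wise one. Because `nn.Linear` stores its weight with shape `(out_features, in_features)`, PyTorch evaluates this for a row-vector input as $y_0 = X W_0^T + b_0$.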

- weights $W_0$ and bias $b_0$ are randomly initialized when the layer is created, so the printed values differ from run to run
- $y_0$ is the output of the linear layer
- tuning the weights and biases is the process of training the model
- weights and biases are adjusted to minimize the loss function
- the loss function measures how well the model's predictions match the actual data
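
As a quick check of the formula, the sketch below recomputes the layer output by hand. It assumes the input, weight, and bias values printed earlier in this notebook (rounded to four decimals); the names `X`, `W0`, `b0`, and `y0` are chosen here only to mirror the symbols in the equation.

```python
import torch

# Values copied from the printed input, weight, and bias (rounded to 4 decimals)
X = torch.tensor([[0.3471, 0.4547, -0.2356]])        # shape (1, 3)
W0 = torch.tensor([[0.4233, 0.1631, 0.3090],
                   [-0.4805, -0.2379, 0.3753]])      # shape (2, 3)
b0 = torch.tensor([-0.1591, 0.2909])                 # shape (2,)

# nn.Linear stores the weight as (out_features, in_features),
# so a row-vector input is multiplied by the transpose of W0
y0 = X @ W0.T + b0
print(y0)  # approximately tensor([[-0.0108, -0.0725]])
```

The result agrees with the output of `linear_layer(input_tensor)` shown earlier, up to rounding of the copied values.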


In [4]:
# Pass the same input through the linear layer again:
# the output is identical because the weights and bias have not changed
output = linear_layer(input_tensor)
print(output)

tensor([[-0.0108, -0.0725]], grad_fn=<AddmmBackward0>)


In [5]:
# Random values from normal distribution (mean=0, std=1)
input_tensor = torch.randn(10) 

# Create network with three linear layers
model = nn.Sequential(
    nn.Linear(10, 18),
    nn.Linear(18, 20),
    nn.Linear(20, 5)
)

# Pass input through model
output = model(input_tensor)
print(output)   

tensor([ 0.1432, -0.1754, -0.0240,  0.7588,  0.3835], grad_fn=<ViewBackward0>)


In [12]:
# Batch of 32 samples, each with 10 features
batch_tensor = torch.randn(32, 10)  # Shape: (batch_size, features)

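# Shape of batch tensor: torch.Size([32, 10])
# (not displayed here, because a notebook cell only echoes its last expression)
batch_tensor.shape

# First 5 samples in the batch
batch_tensor[:5]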

tensor([[ 3.4837e-01,  5.4804e-02, -7.4907e-01, -3.8952e-01,  5.0334e-01,
          1.0518e+00,  4.2172e-01, -2.3033e+00, -9.0051e-02, -1.0382e-01],
        [-1.3934e+00, -1.4990e+00, -6.9818e-01, -7.0174e-01, -9.5970e-01,
         -5.4904e-02, -5.1799e-01, -2.0767e-01, -1.3028e+00,  5.2704e-01],
        [-9.9970e-01, -1.3610e+00, -1.6454e+00,  3.7918e-01, -1.9644e+00,
          5.0976e-01,  1.4878e+00,  8.3231e-01,  3.4670e-01, -1.6091e+00],
        [ 9.5728e-01,  8.4726e-01,  3.5618e-01, -5.4742e-01,  7.2314e-01,
         -1.1938e+00,  1.2719e-01, -3.6542e-01,  1.2577e-01,  1.0746e+00],
        [ 1.0326e-01,  9.1493e-01, -5.1294e-01,  1.2244e+00, -6.2411e-01,
          3.5623e-01,  1.8922e-03, -6.4831e-01, -3.8082e-01,  3.3649e-01]])
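
A batch like the one above can be passed through a network in a single call. The following is a minimal sketch, assuming the same layer sizes as the three-layer model defined earlier; the name `batch_output` is introduced here for illustration. Each linear layer acts on the last dimension only, so the batch dimension is carried through unchanged.

```python
import torch
import torch.nn as nn

# Same layer sizes as the three-layer network defined earlier
model = nn.Sequential(
    nn.Linear(10, 18),
    nn.Linear(18, 20),
    nn.Linear(20, 5),
)

batch_tensor = torch.randn(32, 10)  # 32 samples, 10 features each
batch_output = model(batch_tensor)  # one forward pass for the whole batch

print(batch_tensor.shape)  # torch.Size([32, 10])
print(batch_output.shape)  # torch.Size([32, 5])
```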

### Why do we need activation functions?

- activation functions introduce nonlinearity into the model
- without activation functions, any stack of linear layers collapses into a single linear transformation, so the model can only learn linear relationships
- activation functions allow the model to learn more complex relationships
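
The collapse of stacked linear layers can be checked numerically. The sketch below is illustrative only: the layer sizes, the seed, and the names `layer1`, `layer2`, `W`, and `b` are assumptions made for this example. It folds two linear layers into one weight matrix and one bias vector and compares the outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

layer1 = nn.Linear(6, 4)
layer2 = nn.Linear(4, 1)
stacked = nn.Sequential(layer1, layer2)  # no activation in between

with torch.no_grad():
    # Fold both layers into one: W = W2 W1, b = W2 b1 + b2
    W = layer2.weight @ layer1.weight              # shape (1, 6)
    b = layer2.weight @ layer1.bias + layer2.bias  # shape (1,)

    x = torch.randn(5, 6)                          # 5 samples, 6 features
    out_stacked = stacked(x)
    out_single = x @ W.T + b

# Both computations agree up to floating-point rounding
print(torch.allclose(out_stacked, out_single, atol=1e-5))
```

Inserting a nonlinear activation between the two layers breaks this equivalence, which is what lets deeper networks represent more than a single linear map.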

#### Sigmoid function
- sigmoid function is a nonlinear function that maps any input value to a value between 0 and 1
- it is defined as $f(x) = 1 / (1 + e^{-x})$
- it is commonly used as the activation function for the last layer of a binary classification model    
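
To make the definition concrete, this small sketch evaluates the formula by hand for an input of 6.0 and compares it with PyTorch's built-in `torch.sigmoid`. The variable names are chosen here for illustration.

```python
import math
import torch

x = 6.0

# Direct evaluation of f(x) = 1 / (1 + e^(-x))
manual = 1 / (1 + math.exp(-x))
print(round(manual, 4))  # 0.9975

# The built-in function gives the same value
builtin = torch.sigmoid(torch.tensor([[x]]))
print(builtin)  # tensor([[0.9975]])
```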


#### Softmax function
- softmax function is a nonlinear function that maps a vector of $n$ input values to $n$ values between 0 and 1
- for the $j$-th element it is defined as $f(x)_j = e^{x_j} / \sum_{i=1}^{n} e^{x_i}$
- it is commonly used as the activation function for the last layer of a multiclass classification model
- the output of the softmax function is a probability distribution over the classes and the sum of the probabilities is 1
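
The definition can be sketched by hand as follows, using the same three scores that appear in this notebook's softmax cell. The names `exp_x` and `manual` are introduced here for illustration, and the comparison with `torch.softmax` is only a sanity check.

```python
import torch

x = torch.tensor([4.3, 6.1, 2.3])  # scores for three classes

# Exponentiate every element, then divide by the sum of all exponentials
exp_x = torch.exp(x)
manual = exp_x / exp_x.sum()

print(manual)        # approximately tensor([0.1392, 0.8420, 0.0188])
print(manual.sum())  # the probabilities add up to 1 (up to rounding)

# Same result from the built-in function
builtin = torch.softmax(x, dim=-1)
print(torch.allclose(manual, builtin))
```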

In [13]:
# Sigmoid function
input_tensor = torch.tensor([[6.0]])
sigmoid = nn.Sigmoid()
output = sigmoid(input_tensor)
print(output)

tensor([[0.9975]])


In [14]:
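# A sigmoid as the last layer squashes the output into a probability between 0 and 1.
# Because the layers before it are purely linear (no activation in between),
# this network is equivalent to logistic regression.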

input_tensor = torch.randn(6) 

# Create network with two linear layers and a sigmoid activation function
model = nn.Sequential(
    nn.Linear(6, 4), # First linear layer
    nn.Linear(4, 1), # Second linear layer
    nn.Sigmoid() # Sigmoid activation function
)

# Pass input through model
output = model(input_tensor)
print(output)

tensor([0.5467], grad_fn=<SigmoidBackward0>)


In [15]:
input_tensor = torch.tensor([[6.0]])
print(input_tensor)

tensor([[6.]])


In [16]:
# Create an input tensor
# 3 features
# double brackets to create a 2D tensor with shape (1, 3)
# think of it as a batch of 1 sample with 3 features
input_tensor = torch.tensor([[4.3, 6.1, 2.3]])
# Apply softmax along the last dimension (the features)
# dim=-1 means the softmax function is applied to each row of the input tensor
softmax = nn.Softmax(dim=-1)
output_tensor = softmax(input_tensor)
print(output_tensor)
print(torch.sum(output_tensor))


tensor([[0.1392, 0.8420, 0.0188]])
tensor(1.)


In [17]:
# more on torch dimensions

# Create a 2D tensor
x = torch.tensor([[1, 2, 3],
                 [4, 5, 6]])
# Shape is (2, 3): 2 rows, 3 columns

print(torch.sum(x, dim=0))  # Sums down columns: [5, 7, 9]
print(torch.sum(x, dim=1))  # Sums across rows: [6, 15]
print(torch.sum(x, dim=-1)) # Same as dim=1: [6, 15]
print(torch.sum(x)) # Sum all elements: 21

tensor([5, 7, 9])
tensor([ 6, 15])
tensor([ 6, 15])
tensor(21)
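
The same `dim` convention applies to softmax. The sketch below uses a hypothetical batch of two samples with three class scores each (floating-point values, since softmax is not defined for integer tensors) and shows which slices end up summing to 1 for each choice of `dim`.

```python
import torch

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])  # shape (2, 3): 2 samples, 3 classes

per_sample = torch.softmax(scores, dim=-1)  # normalizes within each row
per_column = torch.softmax(scores, dim=0)   # normalizes within each column

print(per_sample.sum(dim=-1))  # each row sums to 1
print(per_column.sum(dim=0))   # each column sums to 1
```

For classification, `dim=-1` is usually the one wanted, because each row holds the class scores of one sample.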
