# Introduction to Deep Learning

## Tensors

- A tensor's order, which is one of its properties, determines its dimensionality.
    - An order 1 tensor is a single-dimensional array (similar to a vector).
    - An order 2 tensor is a two-dimensional array (similar to a matrix).
    - An order 3 tensor is a three-dimensional array, and this pattern continues for higher dimensions.
- The `.shape` function is used to return the dimensions of a tensor.

## PyTorch acceleration using CUDA

- PyTorch can efficiently leverage the computational power of GPUs.
- This ability allows computational tasks to be effectively parallelized, enhancing performance.
- Compute Unified Device Architecture (CUDA) is a technology developed by Nvidia. It enables hardware acceleration on PyTorch when used with Nvidia's GPUs.

## Comparing PyTorch to Other Frameworks

- There are several deep learning frameworks like TensorFlow, Theano, etc., but they differ from PyTorch in various ways:
    - Differences in how models are computed.
    - Variations in the way computational graphs are compiled.
    - Some frameworks support dynamic computational graphs with variable layers.
    - Syntax differences also exist among these frameworks.

- PyTorch utilizes a method called automatic differentiation (autograd) for computing gradients. This allows for dynamic definition and execution of computational graphs compared to static methods used in TensorFlow, which might limit flexibility during production.
- Unlike other frameworks, PyTorch does not require precompiled computational graphs before model training. 
- PyTorch models can adjust to variations in input size.
- The syntax of PyTorch is similar to Python, making it more familiar and easier to learn for Python users.

## Data Types in PyTorch

- PyTorch supports different types of data including:
    - Float Tensor: This tensor type is made up of 32-bit floating-point numbers.
    - Long Tensor: This tensor type consists of 64-bit integers.


# Code Segment

In [13]:

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from torch import nn, optim
import torch.nn.functional as F
import torch.utils.data as data
import torch
from torch.autograd import Variable

In [2]:
import torch
x = torch.tensor([1.,2.])
print(x)

#initialising a tensor

tensor([1., 2.])


In [3]:
x = torch.tensor([1., 2.])
y = torch.tensor([3., 4.])

print(x*y)

#multiplying tensors

tensor([3., 8.])


In [3]:
x = torch.tensor([[1., 2.],[5., 3.],[0., 4.]])

print(x[0][1])

#indexing a tensor, gotta do it like a double list way
#however returns a tensor object instead

tensor(2.)


In [4]:
#therefore gotta use .item() function

print(x[0][1].item())

2.0


In [5]:
x.shape

#shows (3x2) which is order 2.

torch.Size([3, 2])

In [4]:
cuda = torch.device('cuda')

In [5]:
x = torch.tensor([5., 3.], device=cuda)


In [6]:
y = torch.tensor([4., 2.]).cuda()



In [7]:
x*y

tensor([20.,  6.], device='cuda:0')

## Building a SimpleNN

In [9]:
#Load in training data

train = pd.read_csv("train.csv")
train_labels = train['label'].values

In [10]:
train = train.drop("label", axis =1).values.reshape(len(train), 1,28,28) 

#reshaped our input to (1, 1, 28, 28),
# which is a tensor of 1,000 images, each consisting of 28x28 pixels. (array type)

In [14]:
# convert to pytorch tensors to then be fed into a NN

X = torch.Tensor(train.astype(float))
y = torch.Tensor(train_labels).long() #what is long?

# a float tensor comprises 32 bit FP number
# X needs to be float inorder to compute gradients
# a long tensor comprises of 64 bit integer
# y is a discrete label for classification model (predicting 1 2 3 etc)

In [16]:
# Building actual NN
# NOTE:
# output of one layer must be the same size as the input of the next layer

class MNISTClassifier(nn.Module):
    # init used to set up layers and drop out
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784,392)
        self.fc2 = nn.Linear(392,196)
        self.fc3 = nn.Linear(196,98)
        self.fc4 = nn.Linear(98,10)
        
        #drop makes model more robust and prevents overfitting
        self.dropout = nn.Dropout(p=0.2)
    
    #forward method used to aply activation functions and define where dropout is placed at
    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        x = F.log_softmax(self.fc4(x), dim =1)
        
        return x

In [17]:
model = MNISTClassifier()
loss_function = nn.NLLLoss()
opt = optim.Adam(model.parameters(), lr = 0.001)

In [18]:
for epoch in range(50):
    images = Variable(X)
    labels = Variable(y)
    
    opt.zero_grad()
    outputs = model(images)
    
    loss = loss_function(outputs, labels)
    loss.backward()
    opt.step()
    
    print('Epoch [%d/%d] Loss: %.4f' %(epoch+1, 50, loss.data.item()))

Epoch [1/50] Loss: 6.7822
Epoch [2/50] Loss: 3.5774
Epoch [3/50] Loss: 2.2566
Epoch [4/50] Loss: 1.7789
Epoch [5/50] Loss: 1.4516
Epoch [6/50] Loss: 1.2382
Epoch [7/50] Loss: 1.1078
Epoch [8/50] Loss: 1.0144
Epoch [9/50] Loss: 0.8970
Epoch [10/50] Loss: 0.7774
Epoch [11/50] Loss: 0.6566
Epoch [12/50] Loss: 0.6934
Epoch [13/50] Loss: 0.5916
Epoch [14/50] Loss: 0.5786
Epoch [15/50] Loss: 0.4943
Epoch [16/50] Loss: 0.4699
Epoch [17/50] Loss: 0.4096
Epoch [18/50] Loss: 0.4108
Epoch [19/50] Loss: 0.3665
Epoch [20/50] Loss: 0.3310
Epoch [21/50] Loss: 0.3175
Epoch [22/50] Loss: 0.2983
Epoch [23/50] Loss: 0.2360
Epoch [24/50] Loss: 0.2587
Epoch [25/50] Loss: 0.2116
Epoch [26/50] Loss: 0.2185
Epoch [27/50] Loss: 0.1822
Epoch [28/50] Loss: 0.1858
Epoch [29/50] Loss: 0.1485
Epoch [30/50] Loss: 0.1343
Epoch [31/50] Loss: 0.1566
Epoch [32/50] Loss: 0.1252
Epoch [33/50] Loss: 0.1271
Epoch [34/50] Loss: 0.0923
Epoch [35/50] Loss: 0.1016
Epoch [36/50] Loss: 0.0971
Epoch [37/50] Loss: 0.0907
Epoch [38/

In [20]:
# Making predictions

test = pd.read_csv("test.csv")
test_labels = test['label'].values
test = test.drop("label", axis =1).values.reshape(len(test), 1, 28,28)


In [21]:
X_test = torch.Tensor(test.astype(float))
y_test = torch.Tensor(test_labels).long()

In [22]:
preds = model(X_test)

In [23]:
print(preds[0])

tensor([-1.9699e+01, -1.4701e+01, -1.5058e+01, -1.3170e+01, -1.0350e+01,
        -1.6658e+01, -1.7644e+01, -8.0809e+00, -7.4689e+00, -9.1499e-04],
       grad_fn=<SelectBackward0>)


In [25]:
_, predictionlabel = torch.max(preds.data,1)
predictionlabel = predictionlabel.tolist()
predictionlabel = pd.Series(predictionlabel)
test_labels = pd.Series(test_labels)
pred_table = pd.concat([predictionlabel, test_labels], axis = 1)
pred_table.columns = ['Predicted Value', 'True Value']
display(pred_table.head(15))

Unnamed: 0,Predicted Value,True Value
0,9,9
1,8,5
2,2,2
3,4,4
4,1,1
5,4,4
6,4,4
7,5,5
8,2,2
9,7,7


In [27]:
preds = len(predictionlabel)
correct = len([1 for x,y in zip(predictionlabel, test_labels) if x==y])
print((correct/preds)*100)

89.0


## NLP for PyTorch

In [28]:
import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt

import torch
import torch.nn.functional as F
from torch import nn, optim

In [30]:
training_data = [
        ("Veinte paginas".lower().split(), "Spanish"),
        ("I will visit the library".lower().split(), "English"),
        ("I am reading a book".lower().split(), "English"),
        ("This is my favourite chapter".lower().split(), "English"),
        ("Estoy en la biblioteca".lower().split(), "Spanish"),
        ("Tengo un libro".lower().split(), "Spanish")
        ]

test_data = [
        ("Estoy leyendo".lower().split(), "Spanish"),
        ("This is not my favourite book".lower().split(), "English")
        ]

# NOTE:
# lower() is used to prevent duplication in data

In [46]:
# building word index, dictionary of all words in the corpus
# create unique index value for each word

word_dict = {}
i = 0
for words, language in training_data +test_data:
    for word in words:
        if word not in word_dict:
            word_dict[word] = i
            i += 1

print(word_dict)

{'veinte': 0, 'paginas': 1, 'i': 2, 'will': 3, 'visit': 4, 'the': 5, 'library': 6, 'am': 7, 'reading': 8, 'a': 9, 'book': 10, 'this': 11, 'is': 12, 'my': 13, 'favourite': 14, 'chapter': 15, 'estoy': 16, 'en': 17, 'la': 18, 'biblioteca': 19, 'tengo': 20, 'un': 21, 'libro': 22, 'leyendo': 23, 'not': 24}


In [49]:
corpus_size = len(word_dict)
languages = 2
label_index = {"Spanish":0, "English":1}

In [52]:
class BagofWordsClassifier(nn.Module):
    def __init__(self, languages, corpus_size):
        super(BagofWordsClassifier,self).__init__()
        self.linear = nn.Linear(corpus_size,languages)
    
    def forward(self,bow_vec):
        return F.log_softmax(self.linear(bow_vec), dim =1)
    

In [53]:
def make_bow_vector(sentence, word_index):
    word_vec = torch.zeros(corpus_size)
    for word in sentence:
        word_vec[word_dict[word]] += 1
    return word_vec.view(1,-1)

def make_target(label, label_index):
    return torch.LongTensor([label_index[label]])

In [54]:
model = BagofWordsClassifier(languages, corpus_size)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr =0.1)

In [55]:
# we first zero our gradients (as otherwise, PyTorch calculates gradients cumulatively)

for epoch in range(100):
    for sentence, label in training_data:
        model.zero_grad()
        bow_vec = make_bow_vector(sentence, word_dict)
        target = make_target(label, label_index)
        log_probs = model(bow_vec)
        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()
        
    if epoch % 10 == 0:
        print('Epoch: ', str(epoch +1), ', Loss: ' + str(loss.item()))

Epoch:  1 , Loss: 0.831039547920227
Epoch:  11 , Loss: 0.14231812953948975
Epoch:  21 , Loss: 0.07090627402067184
Epoch:  31 , Loss: 0.04670780524611473
Epoch:  41 , Loss: 0.03471290320158005
Epoch:  51 , Loss: 0.027583042159676552
Epoch:  61 , Loss: 0.02286754548549652
Epoch:  71 , Loss: 0.019521258771419525
Epoch:  81 , Loss: 0.01702515222132206
Epoch:  91 , Loss: 0.015092584304511547


In [56]:
def make_predictions(data):

    with torch.no_grad():

        sentence = data[0]

        label = data[1]

        bow_vec = make_bow_vector(sentence, word_dict)

        log_probs = model(bow_vec)

        print(sentence)

        print(label + ':')

        print(np.exp(log_probs))

        

make_predictions(test_data[0])

make_predictions(test_data[1])

['estoy', 'leyendo']
Spanish:
tensor([[0.8607, 0.1393]])
['this', 'is', 'not', 'my', 'favourite', 'book']
English:
tensor([[0.0137, 0.9863]])
