<a href="https://colab.research.google.com/github/jkchandalia/nlp/blob/main/NLP_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Intro to Practical Hands-on Deep Learning**

**Import Libraries**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Structuring Unstructured Data

It's easy to compare numbers and vectors.

In [15]:
import math

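def array_isclose(arr1, arr2, rel_tol=0.1):
  """Return True if every pair of elements is within rel_tol of each other."""
  # Arrays of different lengths cannot be element-wise close.
  if len(arr1) != len(arr2):
    return False
  return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(arr1, arr2))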

arr1 = [2.3, 1.01, 4.5]
arr2 = [2.2, 1.00, 4.3]
arr3 = [1.8, 1.5, 4.5]

print("Is arr1 close to arr2: ", array_isclose(arr1, arr2))
print("Is arr2 close to arr3: ", array_isclose(arr2, arr3))

Is arr1 close to arr2:  True
Is arr2 close to arr3:  False


How about images? Or sentences? We will start by looking at two sentences that are similar:



### 1.   **The cat took a nap on the rug.**
<figure>
<center>
<img src='https://drive.google.com/uc?export=view&id=1KCjrHAU1V7P73OFjzNiabIR0nfKfpen7' alt="Cat taking a nap on a rug" width="200" height="200"/>
<figcaption>By DALL-E: Cat taking nap on rug</figcaption></center>
</figure>

### 2.   **The feline slept in the sunshine.**
<figure>
<center>
<img src='https://drive.google.com/uc?export=view&id=1gtHWnVm-fPSruzxjMS1GHK0bgLqYBYyo' alt="Feline sleeping in the sunshine" width="200" height="200"/>
<figcaption>By DALL-E: Feline sleeping in sunshine</figcaption></center>
</figure>


Do we feel like there’s a similar idea being presented in these two sentences?
What about the sentence below?


### 3.   **The cat took a bite of the rug.**

<figure>
<center>
<img src='https://drive.google.com/uc?export=view&id=1DtnL-sFkhmFKnOAXPJ4q5n9z-eoZMIP0' alt="Cat biting a rug" width="200" height="200"/>
<figcaption>By DALL-E: Cat biting rug</figcaption></center>
</figure>





Is there a way to quantify our feelings about the differences/similarities between these three sentences? Let's try something simple and represent each sentence as an array:

In [None]:
s1 = ['The', 'cat', 'took', 'a', 'nap', 'on', 'the', 'rug']
s2 = ['The', 'feline', 'slept', 'in', 'the', 'sunshine']
s3 = ['The', 'cat', 'took', 'a', 'bite', 'of', 'the', 'rug']


And let’s try to quantify the differences between the arrays by comparing them position by position, representing a word match as a 1 and a word mismatch as a 0. For instance, for the first two sentences we have:

[1, 0, 0, 0, 0, 0, 0, 0]

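Because only the first word, ‘The’, matches at the same position in both sentences. The second sentence is shorter, so the positions it is missing count as mismatches.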

For the first and third sentences, we have:

[1, 1, 1, 1, 0, 0, 1, 1]


Because ‘The’, ‘cat’, ‘took’, ‘a’, ‘the’, and ‘rug’ appear at the same positions in both sentences. If we simply add up all the numbers in each vector to get an idea of how similar the sentences are, we have:

1 for similarity between sentences 1 and 2
6 for similarity between sentences 1 and 3
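The comparison above can be sketched in a few lines of Python. This is only a minimal sketch: the helper names `match_vector` and `similarity` are made up for illustration, and padding the shorter sentence so that its missing positions count as mismatches is an assumption that matches the vectors shown above.

```python
from itertools import zip_longest

s1 = ['The', 'cat', 'took', 'a', 'nap', 'on', 'the', 'rug']
s2 = ['The', 'feline', 'slept', 'in', 'the', 'sunshine']
s3 = ['The', 'cat', 'took', 'a', 'bite', 'of', 'the', 'rug']

def match_vector(a, b):
    """Compare two word lists position by position: 1 for a match, 0 otherwise."""
    # zip_longest pads the shorter list with None, which never equals a word,
    # so positions missing from the shorter sentence count as mismatches.
    return [int(x == y) for x, y in zip_longest(a, b)]

def similarity(a, b):
    """Sum the match vector to get a single similarity score."""
    return sum(match_vector(a, b))

print(match_vector(s1, s2))  # [1, 0, 0, 0, 0, 0, 0, 0]
print(match_vector(s1, s3))  # [1, 1, 1, 1, 0, 0, 1, 1]
print(similarity(s1, s2))    # 1
print(similarity(s1, s3))    # 6
```

Note that the comparison is case-sensitive and purely positional, so a shared word that appears at a different position (such as the second ‘the’ in sentences 1 and 2) does not count.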

Using our approach, which sentences are more similar? Does this match our intuition for which sentences are most similar/dissimilar out of our examples? How are we handling sentences of different length?


To quantify differences between sentences, we need to represent them in a numerical way. However, we also need this numerical representation to reflect what the sentences actually mean, i.e., capture the semantic content of these sentences and words. How can we do that?


# Large Language Models (LLMs)


<figure>
<center>
<img src='https://drive.google.com/uc?export=view&id=1x0w2nrDUcuAUOqwbjpM8NKrH2T3m5gLH' alt="History of LLMs" width="900" height="600"/>
<figcaption>History of LLMs</figcaption></center>
</figure>

## Transformer Encoder

### Self-supervised learning

*Masked Language Modelling (MLM)*

*Next Sentence Prediction (NSP)*

### Attention Mechanism

Modeling sequences has traditionally been challenging. The general idea is to compress the information contained in a sequence into a compact form and then use that compact form to generate an output. The output could be a translation of the original sequence into another language, or a classification. For the example above, this means that sentences that are semantically similar will be mapped to similar compressed forms, while dissimilar sentences will end up further apart. What transformed (pun intended) this compression step is the concept of attention.

I’m going to gloss over the technical details, but basically, for each item in our sequence, we query every item in the sequence to see how relevant it is to that item. In practice this is done through matrix multiplications, which we will not get into here. For this example sentence:

“The cat purred in happiness”

one could imagine that “cat” and “purred” attend to each other, likely because the English language reflects that a cat purring is much more likely than, say, an elephant purring. After going through an attention layer, each item in the transformed sequence is a mixture of itself and all the other items that contribute to its meaning. By passing inputs through many such layers, we generate a representation that captures the semantic meaning of our original input.
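For the curious, here is a rough NumPy sketch of a single attention step (scaled dot-product self-attention). Everything in it is an assumption made for illustration: the embeddings and the weight matrices `W_q`, `W_k`, and `W_v` are random rather than learned, and the embedding size `d_model = 4` is far smaller than in any real model, so the printed weights carry no linguistic meaning. It only shows the mechanics of querying every item and mixing the results.

```python
import numpy as np

def softmax(x, axis=-1):
    """Turn scores into positive weights that sum to 1 along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # relevance of every item to every item
    weights = softmax(scores, axis=-1)        # one row of attention weights per item
    return weights @ V, weights               # each output row is a weighted mix of values

rng = np.random.default_rng(0)
tokens = ["The", "cat", "purred", "in", "happiness"]
d_model = 4  # toy embedding size (assumption)

# Random stand-ins for learned token embeddings and projection matrices.
X = rng.normal(size=(len(tokens), d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))  # row i: how much token i attends to each token
```

Each row of `weights` sums to 1, and each row of `output` is the corresponding mixture of the value vectors, which is the "mixture of itself and all other items" described above. In a trained model the matrices are learned so that related words such as “cat” and “purred” receive high weights for each other.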


### Transfer Learning

## Data Generation:
Use ChatGPT to create a poem that we will classify in the next section!

https://chat.openai.com/