# Objective:

This notebook explores the outputs of the DistilBERT model in a visual manner. It is inspired by the article <a href="http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/">A Visual Guide to Using BERT for the First Time</a> by Jay Alammar.
    
<img src="http://jalammar.github.io/images/distilBERT/bert-distilbert-tutorial-sentence-embedding.png"/>

### Load the DistilBertModel & Tokenizer

In [18]:
import torch
from transformers import DistilBertTokenizer
from transformers import DistilBertModel
import numpy as np

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')

### Sample Input data
- Tokenize each input string, add the special tokens (`[CLS]` and `[SEP]`), and pad to the maximum length
- Create a single tensor from the tokenized strings
    
<img src="http://jalammar.github.io/images/distilBERT/bert-distilbert-tokenization-2-token-ids.png"/>

In [11]:
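s1 = 'i love machine learning'
s2 = 'i hate sleeping'
sentences = [s1, s2]
tokens = []

for s in sentences:
  # Add [CLS]/[SEP] and pad every sentence to the model's maximum length (512 tokens).
  # Note: newer transformers releases deprecate pad_to_max_length in favor of padding='max_length'.
  tokenized = tokenizer.encode(s, add_special_tokens=True, pad_to_max_length=True)
  tokens.append(tokenized)

# One row of token ids per sentence: shape (2, 512)
s_tensor = torch.tensor(tokens)

#attention_mask_tensor = torch.tensor([attention_mask])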
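
Padding is usually paired with an attention mask that marks which positions hold real tokens (1) and which are padding (0). The sketch below shows the idea in plain Python. `pad_and_mask` is a hypothetical helper written for this illustration, not part of `transformers`, and the word ids are made-up placeholders; only 101 (`[CLS]`), 102 (`[SEP]`) and 0 (`[PAD]`) match the BERT uncased vocabulary.

```python
def pad_and_mask(sequences, max_length, pad_id=0):
    """Pad each list of token ids to max_length and build a 0/1 attention mask."""
    padded, masks = [], []
    for ids in sequences:
        if len(ids) > max_length:
            raise ValueError("sequence is longer than max_length")
        n_pad = max_length - len(ids)
        padded.append(ids + [pad_id] * n_pad)       # real ids, then padding
        masks.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return padded, masks


# 101 = [CLS], 102 = [SEP]; the ids in between are illustrative placeholders.
batch = [
    [101, 11, 12, 13, 14, 102],
    [101, 11, 21, 22, 102],
]
input_ids, attention_mask = pad_and_mask(batch, max_length=8)
```

With real token ids, tensors built from both lists could be passed to the model as `model(input_ids, attention_mask=attention_mask)` so that padding positions are ignored by attention.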

### Run model and extract last hidden states

<img src="http://jalammar.github.io/images/distilBERT/bert-distilbert-input-tokenization.png"/>

In [12]:
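# Inference only, so gradient tracking is switched off.
# The model returns a tuple-like output; element 0 holds the last hidden states.
with torch.no_grad():
  last_hidden_states = model(s_tensor)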

For sentence classification, BERT adds a special token called `[CLS]` (for classification) at the beginning of every sentence. The output vector at that position can be thought of as an embedding for the entire sentence.

<img src="https://jalammar.github.io/images/distilBERT/bert-output-tensor-selection.png" />

In [13]:
last_hidden_states[0].shape

torch.Size([2, 512, 768])

In [17]:
# Keep all sentences, only position 0 (the [CLS] token), and all 768 hidden units.
sentence_embedding = np.array(last_hidden_states[0][:, 0, :])
sentence_embedding.shape

(2, 768)
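
To make the `[:, 0, :]` slice concrete, here is a small sketch on a toy array. The sizes (2 sentences, 4 positions, 3 hidden units) are assumptions chosen for readability and stand in for the real `(2, 512, 768)` output.

```python
import numpy as np

# Toy sizes; the notebook's real output is (2, 512, 768).
batch_size, seq_len, hidden_size = 2, 4, 3

# Stand-in for last_hidden_states[0]: one vector per token, per sentence.
hidden = np.arange(batch_size * seq_len * hidden_size, dtype=np.float32)
hidden = hidden.reshape(batch_size, seq_len, hidden_size)

# Every sentence (:), only position 0 (the [CLS] slot), every hidden unit (:).
cls_embeddings = hidden[:, 0, :]

print(hidden.shape)          # 3-D: (batch, sequence, hidden)
print(cls_embeddings.shape)  # 2-D: (batch, hidden)
```

The sequence dimension disappears because a single position is selected, leaving one vector per sentence.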

### What Next?

Now that we have one embedding per sentence, we can use them as input features for any classification algorithm.
    
<img src="http://jalammar.github.io/images/distilBERT/distilbert-bert-sentiment-classifier.png"/>
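
As a rough illustration of that last step, the sketch below trains a tiny logistic regression classifier with plain NumPy. Everything in it is an assumption made for the example: the "embeddings" are synthetic 8-dimensional clusters rather than real DistilBERT outputs, and `train_logistic_regression` and `predict` are hypothetical helpers. In practice a library classifier would be the more usual choice.

```python
import numpy as np


def train_logistic_regression(X, y, lr=0.1, n_steps=500):
    """Fit binary logistic regression weights and bias by gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        logits = X @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
        error = probs - y                      # gradient of log loss w.r.t. logits
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict(X, w, b):
    """Return class 1 where the decision score is positive, else class 0."""
    return (X @ w + b > 0).astype(int)


# Synthetic stand-ins for sentence embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 0.5, size=(20, 8)),  # class 0
    rng.normal(2.0, 0.5, size=(20, 8)),   # class 1
])
y = np.array([0] * 20 + [1] * 20)

w, b = train_logistic_regression(X, y)
accuracy = (predict(X, w, b) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

With real data, `X` would be the `(n_sentences, 768)` array of `[CLS]` embeddings and `y` the sentence labels, evaluated on a held-out split rather than on the training set.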

### WIP - Ignore code below for now

In [21]:
import torch
from transformers import DistilBertTokenizer
from transformers import DistilBertForSequenceClassification
import numpy as np

tokenizer_1 = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model_1 = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')

In [22]:
s1 = 'i love machine learning'
s2 = 'i hate sleeping'
sentences = [s1, s2]
tokens = []

for s in sentences:
  tokenized = tokenizer_1.encode(s, add_special_tokens=True, pad_to_max_length=True)
  tokens.append(tokenized)
  
s_tensor = torch.tensor(tokens)

#attention_mask_tensor = torch.tensor([attention_mask])

In [23]:
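# Despite the variable name, element 0 of this model's output holds the
# classification logits (one score per class), not hidden states.
with torch.no_grad():
  last_hidden_states = model_1(s_tensor)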

In [24]:
last_hidden_states[0].shape

torch.Size([2, 2])

In [25]:
last_hidden_states[0]

tensor([[ 0.1441, -0.0378],
        [ 0.1396, -0.0364]])
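
The values above are raw logits, one score per class. A softmax turns each row into probabilities, as in the minimal sketch below (plain Python, with the logits copied from the output above). Keep in mind that the classification head loaded from `distilbert-base-uncased` is freshly initialized and has not been fine-tuned, so these numbers carry no real meaning yet.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the cell output, one row per sentence.
logits = [[0.1441, -0.0378], [0.1396, -0.0364]]

probabilities = [softmax(row) for row in logits]
predicted_classes = [row.index(max(row)) for row in probabilities]

for row, label in zip(probabilities, predicted_classes):
    # The head is untrained, so the "prediction" is effectively arbitrary.
    print([round(p, 4) for p in row], "-> class", label)
```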