
# Handling multiple sequences

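- Transformers models expect a batch of sequences by default, so a single sequence is normally passed with an extra batch dimension, as in `tf.constant([ids])`.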

In [2]:
!pip install transformers


In [12]:
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a hugging face course my whole life."

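# Tokenize manually: split into tokens, then map tokens to vocabulary IDs.
# Note: this path does not add the special [CLS]/[SEP] tokens.
tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
# A 1-D tensor with no batch dimension.
input_ids = tf.constant(ids)

model(input_ids)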

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[-3.5495782,  3.765616 ]], dtype=float32)>, hidden_states=None, attentions=None)

In [4]:
tokenized_inputs = tokenizer(sequence, return_tensors="tf")
print(tokenized_inputs["input_ids"])

tf.Tensor(
[[  101  1045  1005  2310  2042  3403  2005  1037 17662  2227  2607  2026
   2878  2166  1012   102]], shape=(1, 16), dtype=int32)


In [5]:
input_ids = tf.constant([ids])
print("Input IDs:", input_ids)

output = model(input_ids)
print("Logits:", output.logits)

Input IDs: tf.Tensor(
[[ 1045  1005  2310  2042  3403  2005  1037 17662  2227  2607  2026  2878
   2166  1012]], shape=(1, 14), dtype=int32)
Logits: tf.Tensor([[-3.5495782  3.765616 ]], shape=(1, 2), dtype=float32)


In [6]:
batched_ids = tf.constant([ids,ids])
print("Batched IDs:", batched_ids)

output = model(batched_ids)
print("Logits:", output.logits)


Batched IDs: tf.Tensor(
[[ 1045  1005  2310  2042  3403  2005  1037 17662  2227  2607  2026  2878
   2166  1012]
 [ 1045  1005  2310  2042  3403  2005  1037 17662  2227  2607  2026  2878
   2166  1012]], shape=(2, 14), dtype=int32)
Logits: tf.Tensor(
[[-3.549578   3.7656155]
 [-3.5495775  3.7656147]], shape=(2, 2), dtype=float32)


- Batching lets the model process several sentences in a single forward pass.
- One issue that arises when batching is that the sentences may differ in length.
- This conflicts with tensors having to be rectangular, so shorter sequences are usually padded.
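
As a rough sketch of what padding involves, the hypothetical helper below pads each list of token IDs to the length of the longest one and records which positions hold real tokens. `pad_batch` is an illustrative name, not part of `transformers`, and `pad_id=0` is just a placeholder value. It uses plain Python lists so the rectangular shape is easy to inspect.

```python
def pad_batch(sequences, pad_id):
    """Pad ragged lists of token IDs to a common length.

    Hypothetical helper for illustration; not a transformers API.
    Returns (padded_ids, attention_mask).
    """
    max_len = max(len(seq) for seq in sequences)
    padded_ids = []
    attention_mask = []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens first, then padding on the right.
        padded_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return padded_ids, attention_mask


padded, mask = pad_batch([[200, 200, 200], [200, 200]], pad_id=0)
print(padded)  # [[200, 200, 200], [200, 200, 0]]
print(mask)    # [[1, 1, 1], [1, 1, 0]]
```

Every row now has the same length, so the result can be passed to `tf.constant` without an error.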

In [7]:
# Padding the inputs

# A ragged list of lists: the rows differ in length, so it cannot be converted to a tensor
batched_ids = [[200, 200, 200], [200, 200]]

In [8]:
padding_id = 100

# Pad the shorter row so that both rows have the same length
batched_ids = [[200, 200, 200], [200, 200, padding_id]]

In [13]:
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence1_ids = [[200,200,200]]
sequence2_ids = [[200,200]]
batched_ids = [[200,200,200],[200,200,tokenizer.pad_token_id],]

print(model(tf.constant(sequence1_ids)).logits)
print(model(tf.constant(sequence2_ids)).logits)
print(model(tf.constant(batched_ids)).logits)


tf.Tensor([[ 1.569367  -1.3894577]], shape=(1, 2), dtype=float32)
tf.Tensor([[ 0.5803027  -0.41252664]], shape=(1, 2), dtype=float32)
tf.Tensor(
[[ 1.5693668 -1.3894575]
 [ 1.3373479 -1.2163184]], shape=(2, 2), dtype=float32)


The logits in the second row of the batch differ from those obtained for the second sequence on its own, which should not be the case. The padding token is being attended to like any other token; an attention mask tells the model to ignore it.
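
To give a feel for why a mask helps, here is a simplified, pure-Python sketch of how an attention mask can be applied inside an attention layer: positions with mask 0 get a very large negative score before the softmax, so they end up with zero weight. This illustrates the general idea only. The function name `masked_softmax` and the scores are made up, and the actual DistilBERT implementation differs in detail.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over scores, ignoring positions where mask == 0.

    Illustrative only; not the library's implementation.
    """
    # Push masked positions towards -infinity so exp() of them underflows to 0.
    masked = [s if m == 1 else -1e9 for s, m in zip(scores, mask)]
    # Subtract the maximum for numerical stability.
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 3.0]  # made-up attention scores; the last position is padding

print(masked_softmax(scores, [1, 1, 1]))  # the padding position receives weight
print(masked_softmax(scores, [1, 1, 0]))  # the padding position receives no weight
```

Without the mask, the padding position takes a share of the attention and changes the representation of the real tokens, which is why the batched logits drift.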

In [14]:
# Attention masks

batched_ids = [[200,200,200],[200,200,tokenizer.pad_token_id],]

attention_mask = [[1,1,1],[1,1,0],]

outputs = model(tf.constant(batched_ids), attention_mask=tf.constant(attention_mask))
print(outputs.logits)

tf.Tensor(
[[ 1.5693668 -1.3894575]
 [ 0.5803016 -0.4125255]], shape=(2, 2), dtype=float32)


Now the second row of the batch gives the same logits as the second sequence on its own, up to floating-point rounding.
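
The match is not bit-for-bit: the printed values differ in the last few digits, which is expected with float32 arithmetic. A small check with a tolerance, using the logits printed in the outputs above, could look like this (the tolerance of `1e-5` and the helper name `all_close` are arbitrary choices for illustration).

```python
import math

# Logits copied from the cell outputs above.
single = [0.5803027, -0.41252664]           # second sequence on its own
batched_masked = [0.5803016, -0.4125255]    # second row, with attention mask
batched_unmasked = [1.3373479, -1.2163184]  # second row, without attention mask


def all_close(a, b, tol=1e-5):
    """Return True if every pair of values differs by at most tol."""
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))


print(all_close(single, batched_masked))    # True
print(all_close(single, batched_unmasked))  # False
```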
