## Using the transformer pipeline for text classification

In [2]:
import transformers

In [3]:
from transformers import pipeline

In [4]:
text_classifier = pipeline(task="text-classification",
                           model="nlptown/bert-base-multilingual-uncased-sentiment")



In [9]:
text = "Dear seller, I got very impressed with the fast delivery and careful packaging. But the product is broken!"
sentiment = text_classifier(text)

In [10]:
print(sentiment)

[{'label': '2 stars', 'score': 0.4391906261444092}]


In [12]:
print(sentiment[0]["label"])

2 stars
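This model reports its prediction as a star rating string such as `'2 stars'`. If a numeric rating is needed downstream, a small post-processing step can parse the label. The helper below, `stars_from_result`, is hypothetical (not part of the `transformers` API). It assumes the label always starts with an integer followed by "star" or "stars".

```python
def stars_from_result(result):
    """Convert pipeline output like [{'label': '2 stars', 'score': 0.44}] to (2, 0.44).

    Hypothetical helper: assumes the label starts with an integer, e.g. '1 star' or '5 stars'.
    """
    top = result[0]                       # one dict per input text; here a single text was classified
    stars = int(top["label"].split()[0])  # '2 stars' -> 2
    return stars, top["score"]

# Using the output printed above
sentiment = [{'label': '2 stars', 'score': 0.4391906261444092}]
print(stars_from_result(sentiment))  # (2, 0.4391906261444092)
```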


## Using the transformer pipeline for text summarization

In [13]:
from transformers import pipeline

model_name = "cnicu/t5-small-booksum"

summarizer = pipeline(task = "summarization", model = model_name)

In [14]:
long_text = """Design is generally an information dense process involving the creation of a wide variety of formalized documents at different levels of technical complexity. This ranges from those tracking the changes in the requirement analysis stage, conceptual design phase, virtual prototyping process, to the management of the manufacturing processes. Due to this, the idea of AI-aided knowledge distillation and representation support for design has become entrenched in recent years."""

In [15]:
output = summarizer(long_text, max_length=30)
print(output)
print(output[0]['summary_text'])

[{'summary_text': 'Design is generally an information dense process involving the creation of a wide variety of formalized documents at different levels of technical complexity. This range'}]
Design is generally an information dense process involving the creation of a wide variety of formalized documents at different levels of technical complexity. This range
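`max_length` caps the summary length in tokens, so generation stopped in the middle of a sentence ("This range"). The simplest fix is a larger `max_length`. Another option is to drop the unfinished trailing sentence afterwards. The helper below, `trim_to_last_sentence`, is a hypothetical sketch that treats `.`, `!` and `?` as sentence ends. It is a rough heuristic and will mis-cut text containing abbreviations such as "e.g.".

```python
def trim_to_last_sentence(text):
    """Drop a trailing, unfinished sentence from generated text.

    Hypothetical helper: a simple heuristic that treats '.', '!' and '?' as sentence ends.
    """
    cut = max(text.rfind(p) for p in ".!?")
    # If no sentence terminator is found, return the text unchanged
    return text if cut == -1 else text[: cut + 1]

summary = ("Design is generally an information dense process involving the creation of a wide "
           "variety of formalized documents at different levels of technical complexity. This range")
print(trim_to_last_sentence(summary))
```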


## Using the transformer pipeline for question answering
- When called with a single question and context, the question-answering pipeline returns a dictionary rather than the list of dictionaries returned by the other pipelines

In [17]:
# Load the model pipeline for question-answering
qa_model = pipeline(task = "question-answering")
context = """The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). 
Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."""

question = "For how long was the Eiffel Tower the tallest man-made structure in the world?"

# Pass the inputs to the pipeline
outputs = qa_model(question = question, context = context)

# Access and print the answer
print(outputs)
print(outputs['answer'])

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9754096269607544, 'start': 340, 'end': 348, 'answer': '41 years'}
41 years
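The pipeline also returns `start` and `end`, which are the character offsets of the answer span inside `context` (340 and 348 above). You can use these offsets to highlight the answer in place. The helper `highlight_answer` below is a hypothetical sketch. The short context and result dict in the usage example are made up, and their offsets come from `str.find` rather than from a model.

```python
def highlight_answer(context, result, left="[", right="]"):
    """Wrap the answer span of a question-answering result in markers.

    Hypothetical helper: assumes result["start"] and result["end"] are character offsets into context.
    """
    start, end = result["start"], result["end"]
    return context[:start] + left + context[start:end] + right + context[end:]

# Made-up example: offsets are computed with str.find instead of coming from a model
ctx = "The tower held the title for 41 years until 1930."
start = ctx.find("41 years")
fake_result = {"answer": "41 years", "start": start, "end": start + len("41 years")}
print(highlight_answer(ctx, fake_result))
# The tower held the title for [41 years] until 1930.
```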


## Using the transformer pipeline for text generation

In [24]:
from transformers import pipeline

model = pipeline(task = "text-generation", model = "openai-community/gpt2")

prompt = "Finite element analysis is known for"

output = model(prompt, max_length = 200, truncation=True)
print(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Finite element analysis is known for computational analysis and for statistical analyses (Eckstrom and S. W. Jones 2003). The use of finite elements can be used to demonstrate that mathematical modeling methods may capture all types of data and to test whether assumptions about their structure can be adjusted from natural selection or from the hypothesis theory view, e.g., for each element that has the capacity to be considered one of its constituent parts. Some of the most important finite-element algorithms in recent years have been numerical ones, such as R and Bayesian, R and Bayesian, as well as computational ones, such as a large-scale-scale linear algebra theory based on the FFT. The present paper focuses on the FFT in mathematical modeling of computational models of the distribution functions. It has discussed a wide variety of finite-element algorithms, which have been widely adopted and often used in computational modeling of the natural world, such as th
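Text generation with a model like GPT-2 is autoregressive. The model predicts one token, appends it to the sequence, and repeats until it emits an end-of-sequence token or reaches the length limit. In Hugging Face generation, `max_length` counts the prompt tokens as well as the generated ones, while `max_new_tokens` counts only new tokens. The sketch below uses a hypothetical hand-written bigram table (`BIGRAMS`) in place of GPT-2, and it uses greedy decoding, which always picks the most likely next token. The pipeline's actual decoding strategy depends on the model's generation settings and may involve sampling.

```python
# Hypothetical toy "language model": maps a word to candidate next words with probabilities.
BIGRAMS = {
    "finite": {"element": 0.9, "state": 0.1},
    "element": {"analysis": 0.8, "<eos>": 0.2},
    "analysis": {"is": 0.6, "<eos>": 0.4},
    "is": {"useful": 0.7, "<eos>": 0.3},
    "useful": {"<eos>": 1.0},
}

def greedy_generate(prompt_tokens, max_length):
    """Greedy autoregressive decoding; max_length counts the prompt tokens too."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        candidates = BIGRAMS.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(candidates, key=candidates.get)  # most likely next token
        if next_token == "<eos>":
            break  # stop at end-of-sequence
        tokens.append(next_token)
    return tokens

print(greedy_generate(["finite"], max_length=10))  # ['finite', 'element', 'analysis', 'is', 'useful']
print(greedy_generate(["finite"], max_length=3))   # ['finite', 'element', 'analysis']
```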

## Using the transformer pipeline for translation from Spanish to English

In [1]:
from transformers import pipeline
import sentencepiece  # not used directly; the Marian tokenizer behind Helsinki-NLP/opus-mt models requires it

input_text = "Este curso sobre LLMs se está poniendo muy interesante"

model_name = "Helsinki-NLP/opus-mt-es-en"
# Define pipeline for Spanish-to-English translation
translator = pipeline(task = "translation_es_to_en", model=model_name)

# Translate the input text
translations = translator(input_text)

# Access the output to print the translated text in English
print(translations[0]["translation_text"])




This course on LLMs is getting very interesting.
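The task name encodes the language pair (`translation_es_to_en`). Many Helsinki-NLP models follow the naming pattern `Helsinki-NLP/opus-mt-{src}-{tgt}`. The helper `translation_config` below is a hypothetical sketch that builds both strings from two-letter language codes. Not every pair has a published model, so check the Hugging Face Hub before relying on a generated name.

```python
def translation_config(src, tgt):
    """Return (task, model_name) for a language pair, e.g. ('es', 'en').

    Hypothetical helper: assumes the Helsinki-NLP 'opus-mt-{src}-{tgt}' naming pattern;
    a model with the resulting name may not exist for every pair.
    """
    task = f"translation_{src}_to_{tgt}"
    model_name = f"Helsinki-NLP/opus-mt-{src}-{tgt}"
    return task, model_name

print(translation_config("es", "en"))
# ('translation_es_to_en', 'Helsinki-NLP/opus-mt-es-en')
```

The two strings can then be passed to `pipeline(task=task, model=model_name)` in place of the hard-coded values.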


## Using the transformer pipeline for text generation: replying to a customer review

In [3]:

# Create a pipeline for text generation using the gpt2 model
generator = pipeline(task="text-generation", model="gpt2")

response = "Dear valued customer, I am glad to hear you had a good stay with us."

text = """I had a wonderful stay at the Riverview Hotel! The staff were incredibly attentive and the amenities were top-notch. 
The only hiccup was a slight delay in room service, but that didn't overshadow the fantastic experience I had."""

# Complete the prompt
prompt = f"Customer review:\n{text}\n\nHotel reponse to the customer:\n{response}"

# Pass the prompt to the model pipeline
outputs = generator(prompt, max_length = 150, pad_token_id=generator.tokenizer.eos_token_id)

# Print the generated text
print(outputs[0]["generated_text"])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Customer review:
I had a wonderful stay at the Riverview Hotel! The staff were incredibly attentive and the amenities were top-notch. 
The only hiccup was a slight delay in room service, but that didn't overshadow the fantastic experience I had.

Hotel reponse to the customer:
Dear valued customer, I am glad to hear you had a good stay with us. The Riverview Experience has exceeded my expectations and I'm glad I was able to enjoy it at my home-based venue. Love that we have other people waiting for us before the reservation is made. This is an extraordinary setting. We'll make the best of it!

Hotel was really great! Service was outstanding!! It
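As the output shows, `generated_text` contains the whole prompt followed by the continuation. The model also kept writing past the reply: "Hotel was really great!" reads like the start of a new review. The text-generation pipeline accepts `return_full_text=False` to return only the new text. Alternatively, you can post-process the output. The helper `extract_reply` below is a hypothetical sketch. It assumes the generated text starts with the prompt and that the reply ends at the first blank line. The prompt and generated strings in the example are made up.

```python
def extract_reply(generated_text, prompt):
    """Keep only the model's continuation, up to the first blank line.

    Hypothetical helper: assumes generated_text starts with the prompt and that
    the reply is a single paragraph.
    """
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.split("\n\n")[0].strip()

prompt = "Hotel response:\nDear guest, thank you."
generated = prompt + " We hope to see you again.\n\nGreat hotel! 5 stars"
print(extract_reply(generated, prompt))  # We hope to see you again.
```

Prepend the fixed `response` opening to the extracted text to get the full reply.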


## Examining the base transformer architecture using the PyTorch library

In [26]:
import torch
import torch.nn as nn

# Set transformer model hyperparameters
d_model = 512
n_heads = 8
num_encoder_layers = 6
num_decoder_layers = 6

# Create the transformer model and assign hyperparameters
model = nn.Transformer(
    d_model = d_model,
    nhead = n_heads,
    num_encoder_layers = num_encoder_layers,
    num_decoder_layers = num_decoder_layers
)

print(model)




Transformer(
  (encoder): TransformerEncoder(
    (layers): ModuleList(
      (0-5): 6 x TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (linear1): Linear(in_features=512, out_features=2048, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=2048, out_features=512, bias=True)
        (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (layers): ModuleList(
      (0-5): 6 x TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, o
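
The printed structure is enough to count the parameters by hand. Each `MultiheadAttention` holds a packed query/key/value input projection (3·d_model² weights and 3·d_model biases), which the printout does not show because it is a plain parameter rather than a submodule. It also holds the `out_proj` layer shown above. Each feed-forward block has `linear1` and `linear2` with the default width of 2048. Each `LayerNorm` has d_model weights and d_model biases. The plain-Python sketch below does this arithmetic. You can cross-check the result with `sum(p.numel() for p in model.parameters())`.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias

def attention_params(d_model):
    # Packed q/k/v input projection plus the output projection
    return linear_params(d_model, 3 * d_model) + linear_params(d_model, d_model)

def layer_norm_params(d_model):
    return 2 * d_model  # elementwise weight and bias

def transformer_params(d_model=512, d_ff=2048, n_enc=6, n_dec=6):
    ff = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    enc_layer = attention_params(d_model) + ff + 2 * layer_norm_params(d_model)
    # Decoder layers add a cross-attention block and a third LayerNorm
    dec_layer = 2 * attention_params(d_model) + ff + 3 * layer_norm_params(d_model)
    # The encoder and decoder stacks each end with a final LayerNorm
    return (n_enc * enc_layer + layer_norm_params(d_model)
            + n_dec * dec_layer + layer_norm_params(d_model))

print(transformer_params())  # 44140544
```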

## Attention Mechanism - Positional Encoding

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [8]:
import math

# Subclass nn.Module, the base class for all neural network modules in PyTorch.
# The positional encoding matrix is precomputed for every position up to max_length
# and stored with .register_buffer: it moves with the model (e.g. .to(device)) and is
# saved in the state_dict, but it is not a trainable parameter.

class PositionalEncoder(nn.Module):
    def __init__(self, d_model, max_length):
        super(PositionalEncoder, self).__init__()
        self.d_model = d_model
        self.max_length = max_length

        # Initialize the positional encoding matrix
        pe = torch.zeros(max_length, d_model)

        # Column vector of positions 0..max_length-1, shape (max_length, 1)
        position = torch.arange(0, max_length, dtype=torch.float).unsqueeze(1)
        # 1 / 10000^(2i / d_model) for the even dimension indices 2i, computed in log space
        div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * -(math.log(10000.0) / d_model))

        # Sine on even dimensions, cosine on odd dimensions
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # Add a batch dimension: shape (1, max_length, d_model)
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)

    # Add the positional encodings to the embeddings tensor x of shape (batch, seq_len, d_model)
    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x

In [7]:
print(torch.arange(0, 10, 2))

tensor([0, 2, 4, 6, 8])
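`PositionalEncoder` implements the sinusoidal encoding from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). `torch.arange(0, d_model, 2)` supplies the even indices 2i, as the output above shows. `div_term` computes 1/10000^(2i/d_model) through `exp` and `log`. The plain-Python sketch below computes the same matrix for small sizes, which is useful for checking values by hand. It assumes an even `d_model`, as the tensor version also does.

```python
import math

def sinusoidal_encoding(max_length, d_model):
    """Plain-Python sinusoidal positional encoding (d_model assumed even)."""
    pe = [[0.0] * d_model for _ in range(max_length)]
    for pos in range(max_length):
        for i in range(0, d_model, 2):  # even indices 0, 2, 4, ... like torch.arange(0, d_model, 2)
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)      # even dimensions use sine
            pe[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return pe

pe = sinusoidal_encoding(max_length=4, d_model=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```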


## Attention mechanism - Multi-head attention
- In multi-head attention, the outputs of the individual heads are concatenated and passed through a final linear layer that projects them back to `d_model`, so the output has the same size as the input for further processing.

In [5]:
import math

# Multi-head attention: linear projections of query, key and value, split into heads,
# scaled dot-product attention per head, then concatenation and an output projection.

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        # d_model must be divisible by num_heads so that each head gets head_dim features
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_model = d_model
        self.head_dim = d_model // num_heads
        # Linear transformations for query, key, value and the final output
        self.query_linear = nn.Linear(d_model, d_model)
        self.key_linear = nn.Linear(d_model, d_model)
        self.value_linear = nn.Linear(d_model, d_model)
        self.output_linear = nn.Linear(d_model, d_model)

    # Split the sequence embeddings x across the attention heads:
    # (batch, seq_len, d_model) -> (batch * num_heads, seq_len, head_dim)
    def split_heads(self, x, batch_size):
        x = x.view(batch_size, -1, self.num_heads, self.head_dim)
        return x.permute(0, 2, 1, 3).contiguous().view(batch_size * self.num_heads, -1, self.head_dim)

    # Compute dot-product attention scores between the projected query and key,
    # scaled by sqrt(head_dim) as in the original Transformer, and normalize them with softmax.
    def compute_attention(self, query, key, mask=None):
        # (B*H, L_q, head_dim) @ (B*H, head_dim, L_k) -> (B*H, L_q, L_k)
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.head_dim)
        if mask is not None:
            # Positions where mask == 0 receive (almost) zero weight after softmax
            scores = scores.masked_fill(mask == 0, float("-1e20"))
        attention_weights = F.softmax(scores, dim=-1)
        return attention_weights

    # Multiply the attention weights by the values, concatenate the heads and project the output.
    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)

        query = self.split_heads(self.query_linear(query), batch_size)
        key = self.split_heads(self.key_linear(key), batch_size)
        value = self.split_heads(self.value_linear(value), batch_size)

        attention_weights = self.compute_attention(query, key, mask)

        # (B*H, L_q, head_dim) -> (batch, L_q, d_model)
        output = torch.matmul(attention_weights, value)
        output = output.view(batch_size, self.num_heads, -1, self.head_dim).permute(0, 2, 1, 3).contiguous().view(batch_size, -1, self.d_model)
        return self.output_linear(output)
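For a single head, the standard scaled dot-product attention from the original Transformer paper is softmax(QKᵀ/√d_k)·V. The plain-Python sketch below computes it on nested lists. It uses the same mask convention as `compute_attention`, where a mask value of 0 means "do not attend". The names `softmax` and `attention` are illustrative, and the tiny Q, K and V matrices are made up.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention on nested lists.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v, mask: n_q x n_k with 0 meaning 'masked'.
    """
    d_k = len(Q[0])
    weights = []
    for qi, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if mask is not None:
            scores = [s if mask[qi][j] else -1e20 for j, s in enumerate(scores)]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value rows
    output = [[sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))] for row in weights]
    return output, weights

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]
V = [[2.0, 0.0], [0.0, 2.0]]
out, w = attention(Q, K, V)
print(w)    # [[0.5, 0.5]]; identical keys get equal weight
print(out)  # [[1.0, 1.0]]
```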

In [4]:

# Post-attention feed-forward layer

class FeedForwardSubLayer(nn.Module):
    # Two linear layers: d_model -> d_ff -> d_model, with a ReLU in between
    def __init__(self, d_model, d_ff):
        super(FeedForwardSubLayer, self).__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.relu = nn.ReLU()

    # Forward pass: expand with fc1, apply ReLU, project back with fc2
    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))
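The feed-forward sublayer applies the same two-layer network to every position independently: FFN(x) = W2·ReLU(W1·x + b1) + b2. It expands each vector to `d_ff` features and projects it back to `d_model`. The plain-Python sketch below follows the `nn.Linear` convention, where `W` has shape (out_features, in_features) and the layer computes x·Wᵀ + b. The weights are hypothetical values with d_model = 2 and d_ff = 3.

```python
def linear(x, W, b):
    # Same convention as nn.Linear: W has shape (out_features, in_features)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def feed_forward(x, W1, b1, W2, b2):
    return linear(relu(linear(x, W1, b1)), W2, b2)

# Hypothetical weights: d_model = 2, d_ff = 3
W1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
b2 = [0.0, 0.0]

# Applied position by position: every token vector goes through the same weights
sequence = [[1.0, 2.0], [-1.0, -2.0]]
ffn_out = [feed_forward(x, W1, b1, W2, b2) for x in sequence]
print(ffn_out)  # [[1.0, 2.0], [3.0, 3.0]]
```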

In [3]:

# Assemble a full encoder layer containing:
#   - a multi-headed self-attention mechanism,
#   - a position-wise feed-forward sublayer,
#   - a residual connection, dropout and layer normalization around each of the two sublayers.

# Initialize the elements of the encoder layer
class EncoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super(EncoderLayer, self).__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.feed_forward = FeedForwardSubLayer(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask):
        attn_output = self.self_attn(x, x, x, mask)
        x = self.norm1(x + self.dropout(attn_output))
        ff_output = self.feed_forward(x)
        return self.norm2(x + self.dropout(ff_output))
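Each sublayer in `EncoderLayer` is wrapped as `norm(x + dropout(sublayer(x)))`, which is the "post-norm" arrangement of the original Transformer. `LayerNorm` rescales each position's feature vector to zero mean and unit variance, then applies a learnable scale and shift. The plain-Python sketch below leaves that scale and shift at their initial values of 1 and 0. It uses `eps=1e-5`, matching the `nn.LayerNorm` default shown earlier, and omits dropout as in evaluation mode. The names `layer_norm` and `residual_block` are illustrative.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one position's feature vector; learnable scale/shift left at their initial values (1 and 0)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance, as nn.LayerNorm uses
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_block(x, sublayer):
    # Post-norm residual connection: norm(x + sublayer(x)); dropout omitted (eval mode)
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

y = residual_block([1.0, 2.0, 3.0, 4.0], lambda x: [2 * v for v in x])
print([round(v, 4) for v in y])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```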

## Encoder transformer body and head

  - Implementing the transformer body, namely a stack of multiple encoder layers.
  - Appending a task-specific transformer head to process the encoder's resulting hidden states and produce the final outputs for the language task at hand!


In [2]:
class TransformerEncoder(nn.Module):
    def __init__(self, vocab_size, d_model, num_layers, num_heads, d_ff, dropout, max_sequence_length):
        super(TransformerEncoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.positional_encoding = PositionalEncoder(d_model, max_sequence_length)
        # Define a stack of multiple encoder layers
        self.layers = nn.ModuleList([EncoderLayer(d_model, num_heads, d_ff, dropout) for _ in range(num_layers)])

    # Forward pass: embed the token IDs, add positional encodings, then apply each encoder layer in turn
    def forward(self, x, mask):
        x = self.embedding(x)
        x = self.positional_encoding(x)
        for layer in self.layers:
            x = layer(x, mask)
        return x

class ClassifierHead(nn.Module):
    def __init__(self, d_model, num_classes):
        super(ClassifierHead, self).__init__()
        # Add linear layer for multiple-class classification
        self.fc = nn.Linear(d_model, num_classes)

    def forward(self, x):
        logits = self.fc(x[:, 0, :])
        # Obtain log class probabilities upon raw outputs
        return F.log_softmax(logits, dim=-1)
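`ClassifierHead` classifies from the hidden state at the first position (`x[:, 0, :]`) and returns log-probabilities through `log_softmax`. This output pairs naturally with a negative log-likelihood loss such as PyTorch's `nn.NLLLoss`, which expects log-probabilities. The plain-Python sketch below shows what these two steps compute for a single example. The logits are made-up scores for three classes.

```python
import math

def log_softmax(logits):
    # log p_i = z_i - log(sum_j exp(z_j)), computed stably by subtracting the max
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    # Negative log-likelihood of the correct class index
    return -log_probs[target]

logits = [2.0, 0.5, -1.0]  # made-up scores for 3 classes
log_probs = log_softmax(logits)
print(sum(math.exp(lp) for lp in log_probs))  # probabilities sum to 1 (up to rounding)
print(nll_loss(log_probs, target=0))
```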