####Using Transformer LLMs for common tasks
1. Summarization
2. Translation
3. Question Answering
4. Table Question Answering
5. Fill Mask
6. Feature Extraction
7. Zero Shot Classification

In [0]:
from transformers import pipeline
import pandas as pd



####1. Summarization
Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.


Q1. Create a summary of the given article.

In [0]:
article = """Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017."""

In [0]:
summarizer = pipeline(task="summarization", max_length=50)
result = summarizer(article)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Your min_length=56 must be inferior than your max_length=50.


In [0]:
print(article)

Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.


In [0]:
print(result)

[{'summary_text': ' Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018 . City of Paris is centre and seat of government of the region and province of Île-de-'}]
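
The summary above stops mid-word ("Île-de-") and the run logged a warning that `min_length=56` exceeds `max_length=50`; the minimum is a default the notebook never set. One way to avoid the conflict is to derive both bounds together. The helper below is a minimal sketch: `summary_length_kwargs` and its `min_fraction` parameter are hypothetical names introduced here, not part of the `transformers` API.

```python
def summary_length_kwargs(max_length, min_fraction=0.4):
    """Return generation bounds in which min_length never exceeds max_length.

    Hypothetical helper for illustration; min_fraction is an arbitrary choice.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    min_length = max(1, int(max_length * min_fraction))
    return {"min_length": min_length, "max_length": max_length}


length_kwargs = summary_length_kwargs(50)
print(length_kwargs)

# The bounds could then be passed where the notebook passes max_length, e.g.
# summarizer = pipeline(task="summarization", **length_kwargs)
```

With consistent bounds the model is free to end the summary before the token limit, so the output is less likely to be cut off; the exact text will differ from the output recorded above.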


####2. Translation
Translation is the task of converting text from one language to another.


Q2. Translate the given statement from English to French.\
You can use the Helsinki-NLP/opus-mt-en-fr model.

In [0]:
statement = "My name is Prashant."
translator = pipeline(task="translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator(statement)


In [0]:
print(result)

[{'translation_text': 'Mon nom est Prashant.'}]


In [0]:
french_sen = 'Mon nom est Prashant.'
model_name = "Helsinki-NLP/opus-mt-fr-en"
eng_translator = pipeline(task="translation_fr_to_en", model=model_name)
print(eng_translator(french_sen))

[{'translation_text': 'My name is Prashant.'}]
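
Translating to French and back, as the two cells above do, is a cheap sanity check on a translation. The sketch below replays the two recorded outputs as plain Python data and compares the round trip with the original statement; `translated_text` and `round_trip_matches` are hypothetical helper names, and an exact match is a deliberately strict criterion that longer sentences will often fail.

```python
def translated_text(result):
    """Pull the translated string out of a translation pipeline result."""
    return result[0]["translation_text"]


def round_trip_matches(original, back_translation_result):
    """Compare the original text with its back-translation, ignoring case and outer spaces."""
    def normalize(text):
        return text.strip().casefold()

    return normalize(original) == normalize(translated_text(back_translation_result))


# Outputs copied from the two cells above.
forward_result = [{"translation_text": "Mon nom est Prashant."}]
backward_result = [{"translation_text": "My name is Prashant."}]

print(translated_text(forward_result))
print(round_trip_matches("My name is Prashant.", backward_result))
```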


####3. Question Answering
Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context!


Q3. Generate the answer for the given question from the context.

In [0]:
question = "Where do I live?"
context = "My name is Merve and I live in İstanbul."
qa_pipeline = pipeline(task="question-answering")
result = qa_pipeline(question=question, context=context)

In [0]:
print(result)
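
An extractive question-answering pipeline returns a dictionary with `score`, `start`, `end` and `answer`, where `start` and `end` are character offsets into the context. The notebook does not show the output of this cell, so the sketch below builds a stand-in result with `str.find` instead of a model run; `expected_answer` and `mock_result` are assumptions made for illustration, and no score is included because none was recorded.

```python
context = "My name is Merve and I live in İstanbul."

# Assumed answer span, located by string search rather than by a model.
expected_answer = "İstanbul"
start = context.find(expected_answer)
mock_result = {
    "start": start,
    "end": start + len(expected_answer),
    "answer": expected_answer,
}


def span_from_result(context, result):
    """Recover the answer text from the character offsets in a QA result."""
    return context[result["start"]:result["end"]]


print(span_from_result(context, mock_result))
```

Because the answer is always a slice of the context, the offsets can be used to highlight the answer in the source document.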

####4. Table Question Answering
Table Question Answering (Table QA) is the task of answering a question about the information in a given table.


Q4. Generate the answer for the given question from the provided table.\
You can use the google/tapas-large-finetuned-wtq model.

In [0]:
question = "how many movies does Leonardo Di Caprio have?"
data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)

tqa_pipeline = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")
result = tqa_pipeline(table=table, query=question)

In [0]:
print(result)

{'answer': 'SUM > 53', 'coordinates': [(1, 1)], 'cells': ['53'], 'aggregator': 'SUM'}
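
The `answer` field above is the string `'SUM > 53'`: the pipeline reports the predicted aggregation operator and the selected cells, but does not compute the number. A minimal sketch of that last step follows. `resolve_table_answer` is a hypothetical helper, and the operator names `NONE`, `SUM`, `AVERAGE` and `COUNT` are assumed to be the ones this checkpoint emits.

```python
def resolve_table_answer(result):
    """Apply the predicted aggregator to the selected cells.

    Hypothetical helper; assumes the aggregators NONE, SUM, AVERAGE and COUNT.
    """
    aggregator = result.get("aggregator", "NONE")
    cells = result["cells"]
    if aggregator == "NONE":
        return ", ".join(cells)
    if aggregator == "COUNT":
        return len(cells)
    values = [float(cell) for cell in cells]
    if aggregator == "SUM":
        return sum(values)
    if aggregator == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"Unknown aggregator: {aggregator}")


# Output copied from the cell above.
table_result = {"answer": "SUM > 53", "coordinates": [(1, 1)], "cells": ["53"], "aggregator": "SUM"}
print(resolve_table_answer(table_result))  # prints 53.0
```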


####5. Fill Mask
Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want a statistical understanding of the language on which the model was trained.


Q5. Generate the sentence after filling in the missing word.

In [0]:
sentence = "Paris is the <mask> of France."
fm_pipeline = pipeline(task="fill-mask")
result = fm_pipeline(sentence)

In [0]:
display(result)
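
The `<mask>` placeholder in the cell above is the mask token of RoBERTa-style tokenizers; BERT-style tokenizers use `[MASK]` instead, so a hard-coded placeholder breaks when the model changes. The sketch below keeps the sentence model-independent. `MASK_TOKENS` and `build_masked_sentence` are hypothetical names, and the small lookup table stands in for reading the token from the loaded tokenizer's `mask_token` attribute.

```python
# Illustrative lookup; in a notebook the token would come from
# fm_pipeline.tokenizer.mask_token rather than a hand-written table.
MASK_TOKENS = {
    "distilroberta-base": "<mask>",
    "bert-base-uncased": "[MASK]",
}


def build_masked_sentence(template, model_name):
    """Replace the {mask} placeholder with the mask token of the given model."""
    return template.replace("{mask}", MASK_TOKENS[model_name])


template = "Paris is the {mask} of France."
print(build_masked_sentence(template, "distilroberta-base"))
print(build_masked_sentence(template, "bert-base-uncased"))
```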

####6. Feature Extraction
Feature extraction is the task of extracting the features (hidden-state representations) learned by a model. These features can serve as embeddings in a retrieval-augmented generation (RAG) approach.



Q6. Extract the features of the given text.\
You can use the facebook/bart-base model.

In [0]:
text = "Transformers is an awesome library!"
feature_extractor = pipeline(task="feature-extraction", model="facebook/bart-base")
# return_tensors is a boolean flag; a truthy value returns a tensor instead of nested lists
result = feature_extractor(text, return_tensors=True)

In [0]:
display(result)

tensor([[[ 2.5834,  2.7571,  0.9024,  ...,  1.5036, -0.0435, -0.8603],
         [-1.2850, -1.0094, -2.0826,  ...,  1.5993, -0.9017,  0.6426],
         [ 0.9082,  0.3896, -0.6843,  ...,  0.7061,  0.6517,  1.0550],
         ...,
         [ 0.6919, -1.1946,  0.2438,  ...,  1.3646, -1.8661, -0.1642],
         [-0.1701, -2.0019, -0.4223,  ...,  0.3680, -1.9704, -0.0068],
         [ 0.2520, -0.6869, -1.0582,  ...,  0.5198, -2.2106,  0.4547]]])
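
The tensor above holds one vector per token, with shape (batch, tokens, hidden size). Retrieval needs a single vector per text, and one common choice, not something this notebook does, is to average the token vectors and compare texts by cosine similarity. The sketch below shows both steps in plain Python on made-up three-dimensional vectors; the numbers are toy values, not model output.

```python
import math


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token vectors (two tokens and one token, hidden size 3).
document_tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
query_tokens = [[2.0, 0.0, 1.0]]

document_vector = mean_pool(document_tokens)
query_vector = mean_pool(query_tokens)
print(document_vector)
print(cosine_similarity(document_vector, query_vector))
```

On the real output, the equivalent pooling step would be `result[0].mean(dim=0)`, which reduces the token axis and leaves one vector of hidden-size length.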

####7. Zero Shot Classification
Zero-shot text classification is a task in natural language processing where a model is trained on a set of labeled examples but is then able to classify new examples from previously unseen classes.




Q7. Classify the given sentence into the provided labels.\
You can use the facebook/bart-large-mnli model.

In [0]:
sentence = "I have a problem with my iphone that needs to be resolved asap!"
candidate_labels = ["urgent", "not urgent", "phone", "tablet", "computer"]
zsc_pipeline = pipeline(task="zero-shot-classification", model="facebook/bart-large-mnli")
result = zsc_pipeline(sentence, candidate_labels=candidate_labels)

In [0]:
display(result)
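
The candidate labels above mix two independent questions: how urgent the request is, and which device it concerns. By default the zero-shot pipeline treats labels as mutually exclusive and normalizes the entailment scores across all of them, so "urgent" and "phone" compete for the same probability mass. With `multi_label=True`, each label is instead scored on its own from its entailment and contradiction logits. The sketch below reproduces both normalizations on hypothetical logits; the numbers are invented for illustration and are not output from `facebook/bart-large-mnli`.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Default mode: labels compete, so the scores sum to 1."""
    return softmax(entailment_logits)


def multi_label_scores(logit_pairs):
    """multi_label mode: each (contradiction, entailment) pair is scored independently."""
    return [softmax([contradiction, entailment])[1] for contradiction, entailment in logit_pairs]


# Hypothetical logits for the labels "urgent" and "phone".
entailment_logits = [2.0, 2.0]
logit_pairs = [(-2.0, 2.0), (-2.0, 2.0)]

print(single_label_scores(entailment_logits))
print(multi_label_scores(logit_pairs))
```

In the toy case two equally well-supported labels get 0.5 each in the default mode, while in multi-label mode both score close to 1. For label sets like the one in Q7, passing `multi_label=True` to the pipeline call is worth considering.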