# Text Summarization with Transformers

![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you have been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car's text review, answering a customer question, summarizing or translating text, etc.



### New Jupyter Notebook: `text_summarization_notebook.ipynb`


#### Cell 2: Install Necessary Libraries


## Install Necessary Libraries

Before you start, make sure the libraries this notebook imports are installed: `transformers`, `torch`, `pandas`, and `evaluate`.
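As a minimal sketch, the check below reports which of the libraries imported in this notebook are missing and prints a matching `pip install` command. The package list is an assumption based on the imports used in the cells that follow, and the helper name `missing_packages` is hypothetical. The translation model's tokenizer typically also needs `sentencepiece`, so you may want to add it to the list.

```python
import importlib.util

# Assumed package list, taken from the imports used later in this notebook.
REQUIRED = ["transformers", "torch", "pandas", "evaluate"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # Inside a notebook cell, run the printed command with "%pip" or a leading "!".
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```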





#### Cell 3: Import Libraries and Set Logging Level


In [1]:
# Import the necessary libraries from the transformers package
from transformers import logging, AutoTokenizer, AutoModelForSeq2SeqLM

# Set logging level to warning to reduce verbosity
logging.set_verbosity(logging.WARNING)



#### Cell 4: Load Car Reviews Dataset


In [2]:
# Load the car reviews dataset
import pandas as pd

# Specify the file path to the dataset
file_path = "data/car_reviews.csv"

# Read the dataset into a DataFrame
df = pd.read_csv(file_path, delimiter=";")

# Display the first few rows of the DataFrame
df.head()

| Unnamed: 0 | Review | Class |
|---|---|---|
| 0 | I am very satisfied with my 2014 Nissan NV SL.... | POSITIVE |
| 1 | The car is fine. It's a bit loud and not very ... | NEGATIVE |
| 2 | My first foreign car. Love it, I would buy ano... | POSITIVE |
| 3 | I've come across numerous reviews praising the... | NEGATIVE |
| 4 | I've been dreaming of owning an SUV for quite ... | POSITIVE |




#### Cell 5: Sentiment Analysis


## Sentiment Analysis

We will use a pre-trained model to classify the sentiment of car reviews.

In [3]:
# Running the sentiment classification

# Load a sentiment analysis LLM into a pipeline
from transformers import pipeline

classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

# Perform inference on the car reviews and display prediction results
reviews = df.Review.to_list()
real_labels = df.Class.to_list()
predicted_labels = classifier(reviews)
for review, prediction, label in zip(reviews, predicted_labels, real_labels):
    print(f"Review: {review}\nActual Sentiment: {label}\nPredicted Sentiment: {prediction['label']} (Confidence: {prediction['score']:.4f})\n")

# Load accuracy and F1 score metrics
import evaluate
accuracy = evaluate.load('accuracy')
f1 = evaluate.load('f1')

# Map categorical sentiment labels into integer labels
references = [1 if label == "POSITIVE" else 0 for label in real_labels]
predictions = [1 if label['label'] == "POSITIVE" else 0 for label in predicted_labels]

# Calculate accuracy and F1 score
accuracy_result_dict = accuracy.compute(references=references, predictions=predictions)
accuracy_result = accuracy_result_dict['accuracy']
f1_result_dict = f1.compute(references=references, predictions=predictions)
f1_result = f1_result_dict['f1']
print(f"Accuracy: {accuracy_result}")
print(f"F1 result: {f1_result}")




Review: I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use. Camping, road trips, etc. We dont have any children so I store most of the seats in my warehouse. I wanted the passenger van for the rear air conditioning. We drove our van from Florida to California for a Cross Country trip in 2014. We averaged about 18 mpg. We drove thru a lot of rain and It was a very comfortable and stable vehicle. The V8 Nissan Titan engine is a 500k mile engine. It has been tested many times by delivery and trucking companies. This is why Nissan gives you a 5 year or 100k mile bumper to bumper warranty. Many people are scared about driving this van because of its size. But with front and rear sonar sensors, large mirrors and the back up camera. It is easy to drive. The front and rear sensors also monitor the front and rear sides of the bumpers making it easier to park close to objects. Our Nissan NV is a Tow Monster. It pulls our 5000 pound travel tr

Accuracy: 0.8
F1 result: 0.8571428571428571
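To make the two metrics concrete, here is a minimal pure-Python sketch of how accuracy and F1 are derived from integer labels. It is not the `evaluate` implementation, and the `toy_` lists are hypothetical values chosen so that a single negative review is misclassified as positive, which reproduces scores of 0.8 and roughly 0.857.

```python
def accuracy_and_f1(references, predictions, positive=1):
    """Compute accuracy and F1 for binary labels, treating `positive` as the target class."""
    pairs = list(zip(references, predictions))
    tp = sum(r == positive and p == positive for r, p in pairs)  # true positives
    fp = sum(r != positive and p == positive for r, p in pairs)  # false positives
    fn = sum(r == positive and p != positive for r, p in pairs)  # false negatives
    accuracy = sum(r == p for r, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical labels: 1 = POSITIVE, 0 = NEGATIVE.
toy_references = [1, 0, 1, 0, 1]
toy_predictions = [1, 1, 1, 0, 1]  # the second review is wrongly predicted POSITIVE

toy_accuracy, toy_f1 = accuracy_and_f1(toy_references, toy_predictions)
print(f"Accuracy: {toy_accuracy}, F1: {toy_f1:.4f}")
```

With four of five labels correct, accuracy is 4/5. F1 is higher here because every positive review is recovered (recall 1.0) while precision is 3/4.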






#### Cell 6: Text Translation


## Text Translation

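We will use a pre-trained model to translate the first two sentences of the first car review from English to Spanish, then score the result against reference translations with BLEU.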

In [4]:
# Take the first review as the text to translate
data_to_translate = df.Review.iloc[0]

# Keep only the first two sentences
sentences = data_to_translate.split(".")
first_two_sentences = sentences[:2]
data_to_translate = ".".join(first_two_sentences)

# Load an English-to-Spanish translation pipeline and translate the text
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translated_review = translator(data_to_translate)[0]['translation_text']

# Loading BLEU metric
bleu = evaluate.load('bleu')

# Loading the reference translations
reviews_es = pd.read_csv('data/reference_translations.txt', names=['reviews'])
reviews_es = list(reviews_es.reviews.values)

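# Calculating the BLEU score against each reference translation
# ("Original" in the printout is the human reference translation, not the English source)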
for review in reviews_es:
    print(f"Translated: {translated_review}")
    print(f"Original: {review}")
    bleu_score = bleu.compute(references=[review], predictions=[translated_review])
    print(f"Bleu Score: {bleu_score['bleu']}")
    print()

Translated: Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal
Original: Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.
Bleu Score: 0.6712403123245675

Translated: Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal
Original: Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.
Bleu Score: 0.6712403123245675
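BLEU is built from clipped n-gram precisions (usually for n = 1 to 4) combined with a brevity penalty. The sketch below illustrates only the clipped-precision part, assuming simple whitespace tokenization. It is not the `evaluate` implementation, and the function names and the lower-cased example fragments are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, each counted at most
    as often as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


# Hypothetical fragments that differ only in the first word.
toy_candidate = "uso esta camioneta para mis entregas"
toy_reference = "utilizo esta camioneta para mis entregas"

for n in (1, 2):
    print(f"{n}-gram precision: {clipped_precision(toy_candidate, toy_reference, n):.3f}")
```

A single differing word lowers every n-gram order it touches, which is why near-identical translations can still score well below 1.0.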







#### Cell 7: Question Answering


## Question Answering

We will use a pre-trained model to answer questions about car reviews.

In [5]:
# Loading the model for QA
import transformers
model_name_QA = "deepset/minilm-uncased-squad2"

model = transformers.AutoModelForQuestionAnswering.from_pretrained(model_name_QA)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name_QA)

# Defining the question
question = "What did he like about the brand?"

# Selecting the 2nd review in the dataset
context = df['Review'][1]

# Tokenizing the inputs and returning PyTorch tensors
import torch
inputs = tokenizer.encode(question, context, return_tensors='pt')

with torch.no_grad():
    output = model(inputs)

# Pick the most likely start and end token positions for the answer span
# (+ 1 on the end index because Python slices exclude the end position)
start = torch.argmax(output.start_logits)
end = torch.argmax(output.end_logits) + 1

# Slice the input token IDs down to the predicted answer span
answer_span = inputs[0][start:end]

# Decode the answer tokens back into human-readable text
answer = tokenizer.decode(answer_span)
print(f"Main terms at the answer = {answer}")

Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Main terms at the answer = ride quality, reliability
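The span selection used above can be illustrated without a model. This is a minimal sketch with hypothetical tokens and made-up logits, using plain Python lists instead of tensors: the answer starts at the position with the highest start logit and ends at the position with the highest end logit, and the `+ 1` makes the slice include that end token.

```python
def extract_span(tokens, start_logits, end_logits):
    """Return the tokens between the highest-scoring start and end positions (inclusive)."""
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(len(end_logits)), key=end_logits.__getitem__) + 1
    # If the best end position comes before the best start, the slice is empty.
    return tokens[start:end]


# Hypothetical question/context tokens and logits, for illustration only.
toy_tokens = ["[CLS]", "what", "is", "good", "?", "[SEP]",
              "ride", "quality", "is", "great", "[SEP]"]
toy_start_logits = [0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 4.5, 1.0, 0.1, 0.3, 0.0]
toy_end_logits = [0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.8, 5.2, 0.2, 0.9, 0.0]

toy_answer = extract_span(toy_tokens, toy_start_logits, toy_end_logits)
print(" ".join(toy_answer))
```

Because the start and end positions are chosen independently, nothing forces the end to follow the start, so an empty answer is a case worth handling in a real app.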






#### Cell 8: Text Summarization


## Text Summarization

We will use the `Falconsai/text_summarization` model to summarize the last review in our dataset.

In [6]:
# Load the pre-trained tokenizer and model for text summarization
tokenizer = AutoTokenizer.from_pretrained("Falconsai/text_summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/text_summarization")

# Extract the last review text from the DataFrame
text = df['Review'].iloc[-1]

# Tokenize the input text and convert it to tensor format
inputs = tokenizer.encode(text, return_tensors='pt')

# Generate the summary using the model
outputs = model.generate(inputs, max_length=50)

# Decode the generated summary to a human-readable format
summarized_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Display the summarized text
summarized_text

'. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment. The financial arrangement is quite reasonable; the financial arrangement is quite reasonable. Handling and styling are great; I'
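The summary above starts with stray punctuation and stops mid-sentence, most likely because generation hit the `max_length=50` token limit. As a minimal sketch, a small post-processing helper (hypothetical, not part of `transformers`) can trim the text to its last complete sentence before showing it to a user. It does not remove the repeated clause, which is usually addressed at generation time, for example with `generate` options such as `no_repeat_ngram_size`.

```python
def tidy_summary(text):
    """Strip leading punctuation and cut the text after its last complete sentence."""
    # Drop leading whitespace and stray punctuation left by generation.
    cleaned = text.lstrip(" .;,")
    # Index of the last sentence-ending mark, or -1 if there is none.
    last_end = max(cleaned.rfind(mark) for mark in ".!?")
    # Keep the text unchanged when no complete sentence is found.
    return cleaned if last_end == -1 else cleaned[:last_end + 1]


# The raw summary string produced by the cell above.
raw_summary = (
    ". The Nissan Rogue provides me with the desired SUV experience without "
    "burdening me with an exorbitant payment. The financial arrangement is quite "
    "reasonable; the financial arrangement is quite reasonable. Handling and "
    "styling are great; I"
)

print(tidy_summary(raw_summary))
```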





#### Cell 9: Conclusion


## Conclusion

In this notebook, we demonstrated how to use the `transformers` library to perform sentiment analysis, text translation, question answering, and text summarization on car reviews using various pre-trained models. By following the steps outlined in this notebook, you can build a chatbot app with multiple functionalities to assist customers and provide support to human agents in an auto dealership company.