# Analyzing Car Reviews with LLMs

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As the company's newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assists customers but also supports the company's human agents.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, such as classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.


## Load the dataset

```python
import pandas as pd

df = pd.read_csv('car_reviews.csv', sep=';')
```

```python
df.head()
```

| Unnamed: 0 | Review | Class |
|---|---|---|
| 0 | I am very satisfied with my 2014 Nissan NV SL.... | POSITIVE |
| 1 | The car is fine. It's a bit loud and not very ... | NEGATIVE |
| 2 | My first foreign car. Love it, I would buy ano... | POSITIVE |
| 3 | I've come across numerous reviews praising the... | NEGATIVE |
| 4 | I've been dreaming of owning an SUV for quite ... | POSITIVE |
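The `Unnamed: 0` column in the preview is what pandas produces when the first header cell of a CSV is empty, which typically happens when a DataFrame was saved together with its index. Assuming that is the case for `car_reviews.csv` (the file itself isn't shown), passing `index_col=0` turns that column back into the index. A minimal sketch on a small, hypothetical inline CSV in the same semicolon-separated layout:

```python
import io

import pandas as pd

# Hypothetical two-row sample: semicolon-separated, empty first header cell
# (the kind of column a saved index leaves behind)
raw = ";Review;Class\n0;Great van, very reliable.;POSITIVE\n1;Too loud on the highway.;NEGATIVE\n"

# Default read: the unnamed first column is kept as a regular column
df_default = pd.read_csv(io.StringIO(raw), sep=';')

# index_col=0 uses that first column as the DataFrame index instead
df_indexed = pd.read_csv(io.StringIO(raw), sep=';', index_col=0)

print(list(df_default.columns))  # ['Unnamed: 0', 'Review', 'Class']
print(list(df_indexed.columns))  # ['Review', 'Class']
```

Commas inside a review are harmless here because `sep=';'` is the only delimiter.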


## Sentiment Analysis with Transformers

```python
import warnings
from transformers import pipeline

# Suppress FutureWarnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Initialize sentiment analysis pipeline with a specific model and revision
classifier = pipeline("sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english", revision="af0f99b")

# Perform sentiment analysis on the reviews
results = classifier(df['Review'].tolist())

# Extract the labels from results and store them in predicted_labels
predicted_labels = [result['label'] for result in results]

# Map the labels to binary integer labels: POSITIVE -> 1, NEGATIVE -> 0
predictions = [1 if label == "POSITIVE" else 0 for label in predicted_labels]

# Display predictions
print(predictions)
```

```
[1, 1, 1, 0, 1]
```
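Each pipeline result is a dict with a `label` and a `score`. For a two-class sentiment model, that comes from one logit per class: a softmax turns the logits into probabilities and the most probable class becomes the label. The plain-Python sketch below reproduces that post-processing and the binary mapping used above. The logits are made-up values, and the index-to-label mapping `{0: "NEGATIVE", 1: "POSITIVE"}` is an assumption about this checkpoint (check `model.config.id2label` to confirm).

```python
import math

# Assumed class-index -> label mapping (verify against model.config.id2label)
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Mimic a pipeline-style result: top label plus its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": ID2LABEL[best], "score": probs[best]}


def to_binary(results):
    """Map pipeline-style results to 1 (POSITIVE) / 0 (NEGATIVE)."""
    return [1 if r["label"] == "POSITIVE" else 0 for r in results]


# Hypothetical logits for three reviews: [NEGATIVE logit, POSITIVE logit]
fake_logits = [[-2.1, 2.4], [1.3, -0.9], [0.2, 0.5]]
fake_results = [to_prediction(l) for l in fake_logits]
print(to_binary(fake_results))  # [1, 0, 1]
```

Note that the third hypothetical review is labeled POSITIVE with a probability only slightly above 0.5; the binary mapping discards that confidence, which is worth keeping in mind when reading the accuracy below.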


## Calculate Accuracy and F1 Score

```python
import evaluate

# Load accuracy and F1 score metrics
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

# True labels from the dataset
true_labels = [1 if label == "POSITIVE" else 0 for label in df['Class']]

# Calculate accuracy and F1 score
accuracy_result_dict = accuracy.compute(references=true_labels, predictions=predictions)
accuracy_result = accuracy_result_dict['accuracy']
f1_result_dict = f1.compute(references=true_labels, predictions=predictions)
f1_result = f1_result_dict['f1']

# Display accuracy and F1 score
accuracy_result, f1_result
```

```
(0.8, 0.8571428571428571)
```
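To see where these two numbers come from, they can be computed by hand. Reading the true labels off the `Class` column in the preview gives `[1, 0, 1, 0, 1]`, and the model predicted `[1, 1, 1, 0, 1]`, so the only error is the second review (NEGATIVE) being classified as positive. A minimal sketch in plain Python, treating POSITIVE (1) as the positive class:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


manual_true = [1, 0, 1, 0, 1]
manual_pred = [1, 1, 1, 0, 1]
print(accuracy_and_f1(manual_true, manual_pred))  # (0.8, 0.8571428571428571)
```

Here TP = 3, FP = 1, FN = 0 and TN = 1, so accuracy is 4/5, precision is 3/4, recall is 3/3, and F1 = 6/7. With only five reviews, a single misclassification moves accuracy by 0.2.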

## Translation and BLEU Score Evaluation

```python
from transformers import pipeline
import evaluate

# Load the English-to-Spanish translation pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

# Extract the first two sentences from the first review
# (a plain split on '.', which would also break on abbreviations or decimals)
first_review = df['Review'][0].split('.')[:2]
first_review_text = '. '.join(sentence.strip() for sentence in first_review) + '.'

# max_length caps the length (in tokens) of the generated translation
translated_review = translator(first_review_text, max_length=400)[0]['translation_text']
print(f"Model translation:\n{translated_review}")

# Load the reference translations from the file, one per line
with open('reference_translations.txt', 'r', encoding='utf-8') as f:
    reference_translations = f.readlines()

# Strip newline characters and skip any blank lines
references = [line.strip() for line in reference_translations if line.strip()]
print(f"Spanish translation references:\n{references}")

# Load the BLEU metric and score the single prediction against all references
bleu = evaluate.load("bleu")
bleu_score = bleu.compute(predictions=[translated_review], references=[references])
print(f"BLEU score: {bleu_score['bleu']}")
```

```
Model translation:
Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.
Spanish translation references:
['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.', 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']
BLEU score: 0.7794483794144497
```
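BLEU compares the 1- to 4-grams of the model output against the references, clips each n-gram's count to the largest number of times it appears in any single reference, takes the geometric mean of the four precisions, and multiplies by a brevity penalty for outputs shorter than the references. The sketch below is a from-scratch, single-sentence version in plain Python, not the `evaluate` implementation; its regex tokenizer, which splits punctuation into separate tokens, is an assumption meant to approximate the metric's default tokenization. On the translation and references above, the clipped precisions work out to 20/22, 18/21, 15/20 and 12/19, and the 22-token candidate is longer than both 21-token references, so no brevity penalty applies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split into words and standalone punctuation (a rough approximation)."""
    return re.findall(r"\w+|[^\w\s]", text)


def ngrams(tokens, n):
    """Count all n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_order=4):
    """BLEU for one candidate against several references, without smoothing."""
    cand = tokenize(candidate)
    refs = [tokenize(r) for r in references]

    log_precision_sum = 0.0
    for n in range(1, max_order + 1):
        cand_counts = ngrams(cand, n)
        # Clip each n-gram count by its maximum count in any single reference
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty against the reference length closest to the candidate's
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_order)


candidate = ("Estoy muy satisfecho con mi Nissan NV SL 2014. "
             "Uso esta camioneta para mis entregas de negocios y uso personal.")
reference_pair = [
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.",
    "Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.",
]
print(sentence_bleu(candidate, reference_pair))  # about 0.7794
```

Having two references helps: "Uso esta" is only matched by the second reference and "esta camioneta para" only by the first, so each reference rescues n-grams the other would count as errors.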


## Extractive Question Answering with Transformers

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

# Load the tokenizer and model
model_ckp = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_ckp)
model = AutoModelForQuestionAnswering.from_pretrained(model_ckp)

# Define the question and context (2nd review)
question = "What did he like about the brand?"
context = df['Review'][1]

# Tokenize the question and context as a single input pair
inputs = tokenizer(question, context, return_tensors="pt")

# Get the start/end logits from the model
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start and end token positions
# (chosen independently, so an end before the start is possible in principle)
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits) + 1

# Get the tokens for the answer span
answer_span = inputs["input_ids"][0][start_idx:end_idx]

# Decode the answer tokens to a string
answer = tokenizer.decode(answer_span, skip_special_tokens=True)

# Display the answer
print("Answer:", answer)
```

```
Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Answer: ride quality, reliability
```


## Text Summarization

```python
import os

# Suppress the symlink warning. huggingface_hub reads this variable when it is
# first imported, so it only takes effect if set before transformers is imported
# (e.g. at the top of the notebook or after restarting the kernel)
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

from transformers import pipeline

# Load a summarization pipeline backed by the cnicu/t5-small-booksum checkpoint
model_name = "cnicu/t5-small-booksum"
summarizer = pipeline("summarization", model=model_name)

# Get the last review (5th one) to summarize
text_to_summarize = df['Review'].iloc[-1]

# Generate a 50- to 53-token summary with deterministic (non-sampling) decoding
outputs = summarizer(text_to_summarize, max_length=53, min_length=50, do_sample=False)
summarized_text = outputs[0]['summary_text']

# Display the summarized text
print(f"Summarized text:\n{summarized_text}")
```

```
Summarized text:
the Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. I have hauled 12 bags of mulch in the back with the seats down and could have held more.
```
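`min_length` and `max_length` bound the length of the generated summary in tokens, and `do_sample=False` makes decoding deterministic (greedy or beam search, depending on the model's generation settings). A minimum length is typically enforced by forbidding the end-of-sequence token until enough tokens have been produced, which is roughly what Hugging Face's `MinLengthLogitsProcessor` does. The toy greedy decoder below illustrates that idea; the token ids, the scoring function, and the EOS id are hypothetical stand-ins for a real model.

```python
EOS = 0  # hypothetical end-of-sequence token id


def toy_next_token_scores(prefix):
    """Hypothetical 'model': wants to stop after 2 tokens, else prefers token 1."""
    if len(prefix) >= 2:
        return {EOS: 5.0, 1: 1.0, 2: 0.5, 3: 0.2}
    return {EOS: 0.1, 1: 2.0, 2: 1.5, 3: 0.3}


def greedy_decode(score_fn, min_length, max_length):
    """Greedy decoding that blocks EOS before min_length and stops at max_length."""
    tokens = []
    while len(tokens) < max_length:
        scores = dict(score_fn(tokens))
        if len(tokens) < min_length:
            scores[EOS] = float("-inf")  # not allowed to stop yet
        next_token = max(scores, key=scores.get)
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens


print(len(greedy_decode(toy_next_token_scores, min_length=0, max_length=8)))  # 2
print(len(greedy_decode(toy_next_token_scores, min_length=5, max_length=8)))  # 5
```

With a tight window such as 50 to 53 tokens, the model is pushed to keep generating well past the point where it might otherwise stop, which can encourage output that copies long stretches of the source review.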
