![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rentals, is taking its services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, such as classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.


## Before you start

To complete the project, you may need to install some Hugging Face libraries, such as `transformers` and `evaluate`.

In [9]:
!pip install transformers
!pip install evaluate==0.4.0
!pip install datasets==2.10.0
!pip install sentencepiece==0.1.97

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable


In [10]:
import pandas as pd
import evaluate
from transformers import pipeline

In [11]:
# Task 1: Classify car reviews
car_reviews = pd.read_csv("data/car_reviews.csv", delimiter=";")
#print(car_reviews)

# Load a sentiment analysis model
sentiment_pipeline = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

# Classify all reviews in a single call; the pipeline returns one {'label', 'score'} dict per review
reviews = car_reviews['Review'].tolist()
predicted_labels = [output['label'] for output in sentiment_pipeline(reviews)]

# Map the text labels to integers (1 = positive, 0 = negative) so they can be scored
predictions = [1 if label == "POSITIVE" else 0 for label in predicted_labels]
true_labels = car_reviews['Class'].map({"POSITIVE": 1, "NEGATIVE": 0}).tolist()

# Compute accuracy and F1 against the true labels
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
accuracy_result = accuracy.compute(references=true_labels, predictions=predictions)
f1_result = f1.compute(references=true_labels, predictions=predictions)

In [12]:
# Task 2: Translate part of the first review into Spanish and calculate BLEU score

# Extract the first two sentences of the first review
first_review = car_reviews.iloc[0]['Review']
first_two_sentences = ". ".join(first_review.split(". ")[:2])
#print(first_review)
#print(first_two_sentences)


# Load the English-to-Spanish translation model
translator = pipeline(task="translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")

# Translate English to Spanish
raw_translation = translator(first_two_sentences, clean_up_tokenization_spaces=True)
translated_review = raw_translation[0]["translation_text"]  # extract the translated string
print(translated_review)

# Compute BLEU score
# Read the reference file and take the first row as the reference text
reference_translations = pd.read_csv("data/reference_translations.txt", delimiter="\t", header=None)
raw_reference = reference_translations.iloc[0, 0]

# Split the reference once at the first sentence boundary and restore the period on the first half
ref1, ref2 = raw_reference.strip().split('. ', 1)
ref1 = ref1.strip() + "."
ref2 = ref2.strip()

# BLEU expects one list of references per prediction. Here the two sentence
# halves are passed as two alternative references for the single prediction.
references = [[ref1, ref2]]
translated_prediction = [translated_review]

bleu = evaluate.load("bleu")
bleu_scores = bleu.compute(references=references, predictions=translated_prediction)
bleu_score = bleu_scores['bleu']
print(bleu_scores)
print(bleu_score)

Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal
{'bleu': 0.6712403123245676, 'precisions': [0.8571428571428571, 0.75, 0.631578947368421, 0.5], 'brevity_penalty': 1.0, 'length_ratio': 2.1, 'translation_length': 21, 'reference_length': 10}
0.6712403123245676
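In the output above, `reference_length` is 10 while `translation_length` is 21, so the translation is being scored against a reference roughly half its length. If the reference file holds one complete translation per line (an assumption about `data/reference_translations.txt`, whose contents are not shown here), each line could instead be passed as a full alternative reference. The sketch below only builds the nested list that `bleu.compute` expects; `load_references` is a hypothetical helper, and the Spanish strings are placeholders rather than the contents of the real file.

```python
import io
from typing import List, TextIO


def load_references(handle: TextIO) -> List[str]:
    """Return one reference translation per non-empty line."""
    return [line.strip() for line in handle if line.strip()]


# Placeholder file contents (hypothetical, for illustration only).
fake_file = io.StringIO(
    "Estoy muy contento con mi coche. Lo uso para el trabajo.\n"
    "Me encanta mi coche. Lo utilizo para trabajar.\n"
)

full_references = load_references(fake_file)
sample_predictions = ["Estoy muy contento con mi coche. Lo uso para trabajar."]

# One inner list of alternative references per prediction.
sample_references = [full_references]

# The outer lists must line up: one entry per prediction.
assert len(sample_references) == len(sample_predictions)
print(sample_references)
```

With the real file, `open("data/reference_translations.txt", encoding="utf-8")` would take the place of `fake_file`, and the two lists would be passed as `bleu.compute(predictions=sample_predictions, references=sample_references)`.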


In [13]:
# Task 3: Extractive QA using "deepset/minilm-uncased-squad2"

# Load the extractive QA model
qa_pipeline = pipeline("question-answering", model="deepset/minilm-uncased-squad2")

# Define the question and context (second review)
question = "What did he like about the brand?"
context = car_reviews.iloc[1]['Review']

# Get the answer using the QA model
raw_answer = qa_pipeline(question=question, context=context)
answer = {"answer": raw_answer["answer"]}  # wrap as dict


# Display extracted answer
#print(context)
#print(question)
print(answer)

{'answer': 'ride quality, reliability'}
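The question-answering pipeline returns more than the answer string: the result dict also carries a confidence `score` and the `start` and `end` character offsets of the answer within the context. A minimal sketch of using those fields follows. The sample context, score, the 0.3 threshold, and the `accept_answer` helper are all made-up illustrations, not values from this notebook.

```python
from typing import Any, Dict, Optional


def accept_answer(result: Dict[str, Any], context: str,
                  min_score: float = 0.3) -> Optional[str]:
    """Return the answer if it is confident enough and matches its span, else None."""
    span = context[result["start"]:result["end"]]
    if span != result["answer"] or result["score"] < min_score:
        return None
    return span


# Hypothetical context and pipeline-style result (not taken from the dataset).
sample_context = "I have always liked the brand for its ride quality and reliability."
sample_answer = "ride quality and reliability"
start = sample_context.index(sample_answer)
sample_result = {
    "score": 0.42,
    "start": start,
    "end": start + len(sample_answer),
    "answer": sample_answer,
}

print(accept_answer(sample_result, sample_context))
print(accept_answer({**sample_result, "score": 0.05}, sample_context))
```

A threshold like this lets the chatbot fall back to a human agent when the model is unsure, instead of returning a low-confidence span.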


In [14]:
# Task 4: Summarization and Bias Analysis

# Load a summarization model (e.g., "facebook/bart-large-cnn")
summarization_pipeline = pipeline("summarization", model="facebook/bart-large-cnn")

# Extract the last review
last_review = car_reviews.iloc[-1]['Review']
#print(last_review)

# Generate summary with a length of approximately 50-55 tokens
summary_output = summarization_pipeline(last_review, max_length=55, min_length=50)
summarized_text = [summary_output[0]['summary_text']]  # List of one string
print(summarized_text)



toxicity_metric = evaluate.load("toxicity")
regard_metric = evaluate.load("regard")

# With aggregation="maximum", the toxicity metric returns a dict: {'max_toxicity': <float>}
toxicity_results = toxicity_metric.compute(predictions=summarized_text, aggregation="maximum")

# The regard metric returns a dict: {'regard': [[{'label': ..., 'score': ...}, ...]]}
regard_results = regard_metric.compute(data=summarized_text)

print(toxicity_results)
print(regard_results)

['The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. The engine delivers strong']
{'max_toxicity': 0.00013863427739124745}
{'regard': [[{'label': 'positive', 'score': 0.6263341307640076}, {'label': 'neutral', 'score': 0.20273472368717194}, {'label': 'other', 'score': 0.1229156106710434}, {'label': 'negative', 'score': 0.04801551252603531}]]}
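The regard output above is nested: a list with one entry per input text, each holding a list of label/score dicts. A small sketch of flattening one entry into a plain mapping and picking the highest-scoring label, using the values printed above (`regard_to_dict` is a hypothetical helper name):

```python
from typing import Any, Dict, List


def regard_to_dict(entry: List[Dict[str, Any]]) -> Dict[str, float]:
    """Turn [{'label': ..., 'score': ...}, ...] into {label: score}."""
    return {item["label"]: float(item["score"]) for item in entry}


# Values copied from the regard output printed above.
regard_output = {'regard': [[
    {'label': 'positive', 'score': 0.6263341307640076},
    {'label': 'neutral', 'score': 0.20273472368717194},
    {'label': 'other', 'score': 0.1229156106710434},
    {'label': 'negative', 'score': 0.04801551252603531},
]]}

# Index 0 selects the scores for the first (and only) summarized text.
regard_scores = regard_to_dict(regard_output['regard'][0])
top_label = max(regard_scores, key=regard_scores.get)
print(top_label, round(regard_scores[top_label], 3))  # positive 0.626
```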


In [15]:
print("predicted_labels:", predicted_labels)  # should be ["POSITIVE", "NEGATIVE", ...]
print("predictions:", predictions)            # should be [1, 0, 1, ...]
print("accuracy_result:", accuracy_result)    # should be dict with 'accuracy'
print("f1_result:", f1_result) 
print("summarized_text:", type(summarized_text), "→", summarized_text)
print("bleu_score:", type(bleu_score))
print("toxicity_results:", type(toxicity_results))
print("regard_results:", type(regard_results))
print("translated_review:", type(translated_review), "→", translated_review)
print("answer:", type(answer), "→", answer)
print("question:", type(question), "→", question)
print("context:", type(context), "→", context)



predicted_labels: ['POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE', 'POSITIVE']
predictions: [1, 1, 1, 0, 1]
accuracy_result: {'accuracy': 0.8}
f1_result: {'f1': 0.8571428571428571}
summarized_text: <class 'list'> → ['The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. The engine delivers strong']
bleu_score: <class 'float'>
toxicity_results: <class 'dict'>
regard_results: <class 'dict'>
translated_review: <class 'str'> → Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal
answer: <class 'dict'> → {'answer': 'ride quality, reliability'}
question: <class 'str'> → What did he like about the brand?
context: <class 'str'> → The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the interior is well-built. The transmissio
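The accuracy and F1 values above can be reproduced by hand from confusion counts. The sketch below uses the `predictions` list printed above. The true labels are not shown in this notebook, so `assumed_true_labels` is a hypothetical labeling chosen only because it is consistent with the reported scores.

```python
from typing import List, Tuple


def accuracy_and_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Compute accuracy and binary F1 (positive class = 1) from label lists."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    # F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when that denominator is zero.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


printed_predictions = [1, 1, 1, 0, 1]   # from the output above
assumed_true_labels = [1, 0, 1, 0, 1]   # hypothetical, not from the dataset

acc, f1_value = accuracy_and_f1(assumed_true_labels, printed_predictions)
print(acc, round(f1_value, 4))  # 0.8 0.8571
```

With five reviews, a single misclassification moves accuracy by 20 percentage points, so these scores are best read as a smoke test rather than a reliable estimate of model quality.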