![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car’s text review, answering a customer question, summarizing or translating text, etc.


## Before you start

In order to complete the project you may wish to install some Hugging Face libraries such as `transformers` and `evaluate`.

In [27]:
!pip install transformers
!pip install evaluate
!pip install xformers

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable


In [28]:
# Start your code here!
import pandas as pd
# Reloading the dataset using the correct semicolon delimiter
car_reviews = pd.read_csv('data/car_reviews.csv', delimiter=';')

# Display the first few rows to confirm it's loaded correctly
car_reviews.head()

| Unnamed: 0 | Review | Class |
|---|---|---|
| 0 | I am very satisfied with my 2014 Nissan NV SL.... | POSITIVE |
| 1 | The car is fine. It's a bit loud and not very ... | NEGATIVE |
| 2 | My first foreign car. Love it, I would buy ano... | POSITIVE |
| 3 | I've come across numerous reviews praising the... | NEGATIVE |
| 4 | I've been dreaming of owning an SUV for quite ... | POSITIVE |
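
The `delimiter=';'` argument matters because the review text itself contains commas. The sketch below uses Python's standard `csv` module rather than pandas, and a made-up one-row sample stands in for `data/car_reviews.csv`. The sample text and the `parse` helper are assumptions for illustration, not the real file contents.

```python
import csv
import io

# Hypothetical sample mimicking a semicolon-delimited reviews file.
sample = "Review;Class\nLove it, I would buy another;POSITIVE\n"

def parse(text, delimiter):
    """Return every row of `text`, split on the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows_semicolon = parse(sample, ";")  # intended parsing
rows_comma = parse(sample, ",")      # what a default comma delimiter would do

print(rows_semicolon[1])  # review text and class as two separate fields
print(rows_comma[1])      # the review is cut at its own comma instead
```

With the comma delimiter the header collapses into a single `Review;Class` field, which is the kind of symptom that would prompt reloading with the semicolon.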


In [29]:
from transformers import pipeline
from sklearn.metrics import accuracy_score, f1_score

# Load the pre-trained sentiment analysis model
sentiment_classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Get the reviews and their true labels
reviews = car_reviews['Review'].tolist()
true_labels = car_reviews['Class'].tolist()

# Perform sentiment classification on the reviews
predicted_sentiments = sentiment_classifier(reviews)

# Extract the predicted labels and map them to binary values (POSITIVE = 1, NEGATIVE = 0)
predicted_labels = [1 if result['label'] == 'POSITIVE' else 0 for result in predicted_sentiments]
true_binary_labels = [1 if label == "POSITIVE" else 0 for label in true_labels]

# Calculate accuracy and F1 score
accuracy_result = accuracy_score(true_binary_labels, predicted_labels)
f1_result = f1_score(true_binary_labels, predicted_labels)

accuracy_result, f1_result

(0.8, 0.8571428571428571)
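
To see where these two numbers can come from, here is a hand computation of accuracy and F1 from confusion counts. It assumes the scores were computed on only the five rows shown by `car_reviews.head()`, and the prediction vector is hypothetical: it is one combination consistent with the reported scores, since the notebook does not show which review was misclassified.

```python
# True labels of the five rows shown above (POSITIVE = 1, NEGATIVE = 0).
true_binary = [1, 0, 1, 0, 1]
# Hypothetical predictions: one negative review predicted as positive (an assumption).
predicted = [1, 0, 1, 1, 1]

pairs = list(zip(true_binary, predicted))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, round(f1, 4))  # 0.8 0.8571
```

A single false positive among five reviews is enough to produce these values, so on a sample this small the metrics say little about how the model behaves in general.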

In [32]:
from transformers import MarianMTModel, MarianTokenizer
from nltk.translate.bleu_score import sentence_bleu

# Load the English-to-Spanish translation model and tokenizer
model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Extract the first review from the dataset
first_review = car_reviews['Review'].iloc[0]

# Truncate the input to its first 30 tokens, then translate that part of the review
translated = model.generate(**tokenizer(first_review, return_tensors="pt", max_length=30, truncation=True))
translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)

# Load reference translations
with open("data/reference_translations.txt", 'r') as file:
    lines = file.readlines()
    references = [line.strip() for line in lines]

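# Calculate BLEU score using 1-gram and 2-gram precision only (each weighted 0.25).
# Note: sentence_bleu expects token lists. Raw strings are passed here, so NLTK
# iterates over characters and the score is based on character n-grams.
# For a word-level score, pass [ref.split() for ref in references] and translated_text.split().
weights = (0.25, 0.25, 0, 0)
results = sentence_bleu(references, translated_text, weights=weights)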

translated_text, results

('Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal. Camping, viajes por carretera,',
 0.8812414585178334)
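
NLTK's `sentence_bleu` builds n-grams from whatever sequence items it receives. A plain string is a sequence of characters, so the score above comes from character n-grams rather than word n-grams. The pure-Python sketch below illustrates the difference using clipped unigram precision only, which is one ingredient of BLEU and not NLTK's full implementation (there is no brevity penalty and no weighting). The two sentences and the helper names are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(reference, hypothesis, n):
    """Share of hypothesis n-grams found in the reference, with clipped counts."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

# Hypothetical sentence pair differing in one word.
reference = "uso esta camioneta"
hypothesis = "uso esta furgoneta"

char_p1 = clipped_precision(reference, hypothesis, 1)                  # strings: characters
word_p1 = clipped_precision(reference.split(), hypothesis.split(), 1)  # lists: words

print(round(char_p1, 3), round(word_p1, 3))
```

The character-level figure is higher here because two different words still share many letters. For the same reason, a character-level BLEU score tends to look more favourable than a word-level one on the same translation.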