Car-ing is sharing, an auto dealership company for car sales and rental, is taking its services to the next level thanks to Large Language Models (LLMs).

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, e.g. classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.

In [1]:
import pandas as pd
import os
from dotenv import load_dotenv
from sklearn.metrics import accuracy_score, f1_score
import evaluate
from transformers import logging, pipeline
logging.set_verbosity(logging.WARNING)

In [2]:
load_dotenv()
csv_path = os.getenv("CAR_REVIEWS_PATH")
ref_path = os.getenv("REF_TRANSLATIONS_PATH")
if not csv_path or not ref_path:
    raise ValueError("Please make sure CAR_REVIEWS_PATH and REF_TRANSLATIONS_PATH are defined in the .env file.")
df = pd.read_csv(csv_path, sep=";")
df['Class'] = df['Class'].map({'POSITIVE': 1, 'NEGATIVE': 0})
reviews = df['Review'].tolist()
true_labels = df['Class'].tolist()
# Load a pre-trained binary sentiment classifier and score every review.
model = pipeline(task="sentiment-analysis", model="siebert/sentiment-roberta-large-english")
predicted_labels = model(reviews)
# Map the pipeline's string labels to the same 1/0 encoding as the ground truth.
predictions = [1 if output['label'] == 'POSITIVE' else 0 for output in predicted_labels]
# scikit-learn metrics expect (y_true, y_pred), in that order.
accuracy_result = accuracy_score(true_labels, predictions)
f1_result = f1_score(true_labels, predictions)
print(f"Predicted Labels (Raw): {predicted_labels}")
print(f"Mapped Predictions: {predictions}")
print(f"Accuracy: {accuracy_result}")
print(f"F1 Score: {f1_result}")

Device set to use cpu


Predicted Labels (Raw): [{'label': 'POSITIVE', 'score': 0.9989148378372192}, {'label': 'NEGATIVE', 'score': 0.9995049238204956}, {'label': 'POSITIVE', 'score': 0.998863697052002}, {'label': 'NEGATIVE', 'score': 0.9994910955429077}, {'label': 'POSITIVE', 'score': 0.9989224672317505}]
Mapped Predictions: [1, 0, 1, 0, 1]
Accuracy: 1.0
F1 Score: 1.0
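
With only five reviews, the perfect scores above are easy to verify by hand. The sketch below is a minimal, dependency-free cross-check, assuming the pipeline returns one dict per review with a 'label' key, as in the raw output. The helper names (labels_to_ints, binary_accuracy, binary_f1) are hypothetical and not part of the notebook, and the true labels are inferred from the reported accuracy of 1.0.

```python
def labels_to_ints(outputs, positive_label="POSITIVE"):
    """Map pipeline-style dicts to 1 (positive) or 0 (anything else)."""
    return [1 if out["label"] == positive_label else 0 for out in outputs]


def binary_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Labels copied from the raw output above; scores omitted for brevity.
sample_outputs = [{"label": name} for name in
                  ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]]
# Assumed ground truth: an accuracy of 1.0 implies it matches the predictions.
sample_true = [1, 0, 1, 0, 1]

sample_preds = labels_to_ints(sample_outputs)
print(binary_accuracy(sample_true, sample_preds), binary_f1(sample_true, sample_preds))
```

On a sample this small, a single misclassified review would drop accuracy to 0.8, so these scores say little about how the model behaves on a larger set.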


In [3]:
translation_model = pipeline(task="translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
first_review = reviews[0]
# Keep only the first two sentences (naive split on periods).
sentences = first_review.split(".")
text_to_translate = ".".join(sentences[:2]) + "."
translation_output = translation_model(text_to_translate)
translated_review = translation_output[0]['translation_text']
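# Read the reference translations; every non-empty line is treated as an
# alternative reference for the single translated text.
with open(ref_path, "r", encoding="utf-8") as f:
    references = [line.strip() for line in f if line.strip()]
bleu = evaluate.load("bleu")
bleu_score = bleu.compute(predictions=[translated_review], references=[references])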

print(f"Original: {text_to_translate}")
print(f"Translated: {translated_review}")
print(f"BLEU Score: {bleu_score}")

Device set to use cpu


Original: I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use.
Translated: Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.
BLEU Score: {'bleu': 0.7794483794144498, 'precisions': [0.9090909090909091, 0.8571428571428571, 0.75, 0.631578947368421], 'brevity_penalty': 1.0, 'length_ratio': 1.0476190476190477, 'translation_length': 22, 'reference_length': 21}
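
The printed dictionary contains everything needed to recompute the headline score. As a sanity check, the sketch below applies the standard BLEU definition (brevity penalty times the geometric mean of the n-gram precisions) to the values printed above. bleu_from_parts is a hypothetical helper name, and equal weights for the four precisions are assumed.

```python
import math


def bleu_from_parts(precisions, translation_length, reference_length):
    """Recombine n-gram precisions and lengths into a BLEU score."""
    if translation_length == 0 or min(precisions) <= 0:
        return 0.0  # an empty candidate or a zero precision gives a zero score
    # The brevity penalty only punishes candidates shorter than the reference.
    if translation_length > reference_length:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - reference_length / translation_length)
    # Geometric mean of the precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean)


# Values copied from the BLEU output above.
precisions = [0.9090909090909091, 0.8571428571428571, 0.75, 0.631578947368421]
score = bleu_from_parts(precisions, translation_length=22, reference_length=21)
print(round(score, 4))
```

Because the translation (22 tokens) is longer than the reference (21 tokens), the brevity penalty is 1.0 and the score is just the geometric mean of the precisions, with the 4-gram precision of about 0.63 pulling it down the most.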


In [4]:
answer_model = pipeline(task="question-answering", model="deepset/minilm-uncased-squad2")
question = "What did he like about the brand?"
context = reviews[1]
# Extractive QA: the answer is a span copied from the context.
output = answer_model(question=question, context=context)
answer = output["answer"]
print(f"Question: {question}")
print(f"Context (Snippet): {context[:50]}...")
print(f"Answer: {answer}")

Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


Question: What did he like about the brand?
Context (Snippet): The car is fine. It's a bit loud and not very powe...
Answer: ride quality, reliability
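
The cell above prints only the answer text, but the question-answering pipeline also returns a confidence score with each extracted span. For a customer-facing chatbot it may be worth falling back to a neutral reply when that score is low. The sketch below is one possible approach, not the notebook's method: answer_or_fallback, the 0.3 threshold, and the fallback message are hypothetical, and the example dicts use invented values rather than real model output.

```python
def answer_or_fallback(qa_output, min_score=0.3,
                       fallback="No confident answer found."):
    """Return the extracted answer only if its score clears min_score."""
    answer = qa_output.get("answer", "").strip()
    if answer and qa_output.get("score", 0.0) >= min_score:
        return answer
    return fallback


# Illustrative dicts only: scores and offsets are invented, not notebook output.
confident = {"score": 0.82, "start": 10, "end": 35,
             "answer": "ride quality, reliability"}
uncertain = {"score": 0.05, "start": 0, "end": 3, "answer": "The"}

print(answer_or_fallback(confident))
print(answer_or_fallback(uncertain))
```

A suitable threshold depends on the model and the data, so it would need tuning on real customer questions.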


In [5]:
summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")
last_review = reviews[-1]
# min_length and max_length are measured in tokens, so the summary can stop mid-sentence.
summary_output = summarizer(last_review, min_length=50, max_length=55)
summarized_text = summary_output[0]['summary_text']
print(f"Original Length: {len(last_review)}")
print(f"Summary: {summarized_text}")

Device set to use cpu


Original Length: 1067
Summary: The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. The engine delivers strong
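
The summary above stops mid-sentence ("The engine delivers strong") because generation is capped by max_length. If the text is shown to customers, one option is to drop the trailing fragment. The sketch below is a minimal post-processing step under that assumption; trim_to_last_sentence is a hypothetical helper, not part of the notebook or of the transformers library.

```python
import re


def trim_to_last_sentence(text):
    """Drop a trailing fragment that does not end in '.', '!' or '?'."""
    text = text.strip()
    # Locate every sentence-ending mark followed by whitespace or end of text.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    if not ends:
        return text  # no complete sentence found; leave the text unchanged
    return text[:ends[-1]]


# Summary text copied from the output above.
raw_summary = (
    "The Nissan Rogue provides me with the desired SUV experience without "
    "burdening me with an exorbitant payment. Handling and styling are great; "
    "I have hauled 12 bags of mulch in the back with the seats down and could "
    "have held more. The engine delivers strong"
)
print(trim_to_last_sentence(raw_summary))
```

Trimming shortens the text below the requested minimum length, so raising max_length may be the better fix when the full thought matters.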
