![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking its services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, such as classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.


## Before you start

To complete the project, you may need to install some Hugging Face libraries such as `transformers` and `evaluate`.

In [235]:
!pip install xformers
!pip install evaluate==0.4.0
!pip install datasets==2.10.0
!pip install sentencepiece==0.1.97

from transformers import logging, pipeline
import pandas as pd
import os
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable


In [236]:
import pandas as pd

# print(os.getcwd())
# os.listdir("/work/files/workspace")
dirs = "/work/files/workspace/data"
reviews_df = pd.read_csv(dirs + "/car_reviews.csv", sep=";")
reference_df = pd.read_csv(dirs + "/reference_translations.txt")
reviews_df

Unnamed: 0,Review,Class
0,I am very satisfied with my 2014 Nissan NV SL....,POSITIVE
1,The car is fine. It's a bit loud and not very ...,NEGATIVE
2,"My first foreign car. Love it, I would buy ano...",POSITIVE
3,I've come across numerous reviews praising the...,NEGATIVE
4,I've been dreaming of owning an SUV for quite ...,POSITIVE


In [237]:
references = [1 if label == "POSITIVE" else 0 for label in reviews_df["Class"]]

In [238]:
reviews = reviews_df["Review"].tolist()
real_labels = reviews_df["Class"].tolist()

## Classification Task

In [239]:
# Load a sentiment classifier fine-tuned on SST-2
model = 'distilbert-base-uncased-finetuned-sst-2-english'
classifier = pipeline(task="text-classification", model=model)

In [240]:
predicted_labels = classifier(reviews)
print(predicted_labels)

[{'label': 'POSITIVE', 'score': 0.9293975830078125}, {'label': 'POSITIVE', 'score': 0.8654279708862305}, {'label': 'POSITIVE', 'score': 0.9994640946388245}, {'label': 'NEGATIVE', 'score': 0.9935314059257507}, {'label': 'POSITIVE', 'score': 0.9986565113067627}]


In [241]:
for pred_label, actual_label in zip(predicted_labels, real_labels):
    print(pred_label['label'], pred_label['score'], actual_label)

POSITIVE 0.9293975830078125 POSITIVE
POSITIVE 0.8654279708862305 NEGATIVE
POSITIVE 0.9994640946388245 POSITIVE
NEGATIVE 0.9935314059257507 NEGATIVE
POSITIVE 0.9986565113067627 POSITIVE


In [242]:
predictions = [1 if pred_label["label"] == "POSITIVE" else 0 for pred_label in predicted_labels]
predictions

[1, 1, 1, 0, 1]

In [243]:
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

# Calculate accuracy and F1 score
accuracy_ = accuracy.compute(references=references, predictions=predictions)
f1_ = f1.compute(references=references, predictions=predictions)

accuracy_result = accuracy_["accuracy"]
f1_result = f1_["f1"]
print("Accuracy :", accuracy_result)
print("F1-score :", f1_result)

Accuracy : 0.8
F1-score : 0.8571428571428571
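To see where these two numbers come from, the same metrics can be recomputed by hand from the confusion counts. This is a minimal pure-Python sketch, not what `evaluate` runs internally. The two lists are copied from the `references` and `predictions` values above; the names `tp`, `fp`, `fn`, `tn`, `accuracy_manual` and `f1_manual` are introduced here for illustration only.

```python
# Values copied from the notebook: 1 = POSITIVE, 0 = NEGATIVE
references = [1, 0, 1, 0, 1]
predictions = [1, 1, 1, 0, 1]

pairs = list(zip(references, predictions))

# Confusion counts, treating POSITIVE (1) as the positive class
tp = sum(1 for r, p in pairs if r == 1 and p == 1)
fp = sum(1 for r, p in pairs if r == 0 and p == 1)
fn = sum(1 for r, p in pairs if r == 1 and p == 0)
tn = sum(1 for r, p in pairs if r == 0 and p == 0)

accuracy_manual = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)

print("Accuracy :", accuracy_manual)
print("F1-score :", f1_manual)
```

The only error is the second review, a false positive, so recall stays at 1.0 while precision drops to 0.75.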


## Translation Task

In [244]:
model = "Helsinki-NLP/opus-mt-en-es"
translator = pipeline(task="translation", model=model)

In [245]:
import re
def get_first_two_sentences(text):
    sentences = re.split(r'(?<=[.!?])\s+', text)  # Split based on sentence-ending punctuation
    return ' '.join(sentences[:2])

first_two_sent = get_first_two_sentences(reviews_df.Review[0])
first_two_sent

'I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use.'

In [246]:
translated_output = translator(first_two_sent)
translated_review = translated_output[0]['translation_text']
translated_review

'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.'

In [247]:
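# Reference translation for the first review
reference = reference_df.iloc[0, 0]

# BLEU expects a list of predictions and, for each one, a list of references.
# Bare strings are scored character by character, which is the source of the
# 0.0 score and the character-level lengths recorded below.
bleu = evaluate.load("bleu")
bleu_score = bleu.compute(predictions=[translated_review],
                          references=[[reference]])
print(bleu_score)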

{'bleu': 0.0, 'precisions': [0.8586956521739131, 0.0, 0.0, 0.0], 'brevity_penalty': 0.9891892950517397, 'length_ratio': 0.989247311827957, 'translation_length': 92, 'reference_length': 93}
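BLEU combines clipped n-gram precisions (for n = 1 to 4) with a brevity penalty. The sketch below is a simplified pure-Python illustration of the clipped precision step only, not the `evaluate` implementation. The helper names `ngram_counts` and `clipped_precision` and the short reference sentence are hypothetical. It also shows that a one-token sequence has no bigrams, which appears to be what happens when a bare string is passed and each character is scored as its own sentence: the higher-order precisions collapse to 0 and so does the overall score.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(pred_tokens, ref_tokens, n):
    """Share of predicted n-grams that also occur in the reference (clipped)."""
    pred = ngram_counts(pred_tokens, n)
    ref = ngram_counts(ref_tokens, n)
    total = sum(pred.values())
    if total == 0:
        # No n-grams of this order exist in the prediction
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in pred.items())
    return matched / total


# Word-level tokens; the reference sentence is made up for this example
pred_words = "Uso esta camioneta para mis entregas".split()
ref_words = "Uso esta furgoneta para mis entregas".split()

word_p1 = clipped_precision(pred_words, ref_words, 1)
word_p2 = clipped_precision(pred_words, ref_words, 2)

# A single character treated as a whole "sentence" has no bigrams at all
char_p1 = clipped_precision(["U"], ["U"], 1)
char_p2 = clipped_precision(["U"], ["U"], 2)

print(word_p1, word_p2, char_p1, char_p2)
```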


## Extractive Question Answering

In [248]:
from transformers import AutoTokenizer
from transformers import AutoModelForQuestionAnswering

In [249]:
model_init = "deepset/minilm-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_init)

In [250]:
model=AutoModelForQuestionAnswering.from_pretrained(model_init)

In [251]:
context = reviews_df.Review[1]
question = "What did he like about the brand?"
inputs = tokenizer(question, context, return_tensors="pt")

In [252]:
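import torch

# Run the model without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Most likely start and end token positions; +1 makes the slice end-inclusive
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits) + 1
answer_span = inputs["input_ids"][0][start_idx:end_idx]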

# Decode and show answer
answer = tokenizer.decode(answer_span)
print("Answer: ", answer)

Answer:  ride quality, reliability
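The span selection used above can be illustrated without a model. This is a minimal sketch with made-up tokens and logits (none of the numbers come from `deepset/minilm-uncased-squad2`), and `argmax` and `extract_span` are hypothetical helpers. It picks the highest-scoring start and end positions and returns the tokens between them, plus a guard for the case where the predicted end falls before the start, which the notebook cell does not handle.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def extract_span(tokens, start_logits, end_logits):
    """Return the tokens between the best start and best end (inclusive)."""
    start = argmax(start_logits)
    end = argmax(end_logits)
    if end < start:
        # Inconsistent prediction: treat it as "no answer"
        return []
    return tokens[start:end + 1]


# Toy question + context tokens and invented logits, for illustration only
tokens = ["[CLS]", "what", "is", "loud", "?", "[SEP]",
          "the", "car", "is", "a", "bit", "loud", "[SEP]"]
start_logits = [0.1, 0.0, 0.2, 0.3, 0.0, 0.1, 4.2, 1.0, 0.3, 0.2, 0.1, 0.5, 0.0]
end_logits = [0.1, 0.0, 0.1, 0.2, 0.0, 0.1, 0.4, 3.9, 0.2, 0.1, 0.3, 1.1, 0.0]

toy_answer = " ".join(extract_span(tokens, start_logits, end_logits))
print("Answer:", toy_answer)
```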


## Summarization

In [253]:
last_review = reviews_df.Review.iloc[-1]
last_review

"I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. Th

In [254]:
model = "cnicu/t5-small-booksum"
summarizer = pipeline(task="summarization", model=model)
summarized_output = summarizer(last_review, max_length=50)
summarized_text = summarized_output[0]['summary_text']
summarized_text

'the Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. I have hauled 12 bags of mulch in the back with the seats down and could have'
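The summary above stops mid-sentence, most likely because generation reached the `max_length=50` token limit. One possible post-processing step, sketched here as an assumption rather than anything the notebook does, is to cut the text back to its last complete sentence. The helper name `trim_to_last_sentence` is hypothetical.

```python
import re


def trim_to_last_sentence(text):
    """Drop any trailing fragment after the last '.', '!' or '?'."""
    matches = list(re.finditer(r"[.!?]", text))
    if not matches:
        # No sentence boundary found: return the text unchanged
        return text
    return text[:matches[-1].end()]


# Summary string copied from the notebook output
summary = (
    "the Nissan Rogue provides me with the desired SUV experience without "
    "burdening me with an exorbitant payment; the financial arrangement is "
    "quite reasonable. I have hauled 12 bags of mulch in the back with the "
    "seats down and could have"
)

trimmed_summary = trim_to_last_sentence(summary)
print(trimmed_summary)
```

This keeps only text the model actually produced; raising `max_length` is the alternative when the dropped sentence matters.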