![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment in a car’s text review, answering a customer question, summarizing or translating text, etc.


## Before you start

In order to complete the project you may wish to install some Hugging Face libraries such as `transformers` and `evaluate`.

In [14]:
!pip install transformers
!pip install evaluate

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


1. Classify car reviews 

Load a sentiment analysis LLM to classify the sentiment of each review in the dataset into POSITIVE or NEGATIVE, and utilize the real labels to calculate the accuracy and F1 score of predictions.


In [15]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, AutoModelForQuestionAnswering, AutoModelForSeq2SeqLM
from datasets import load_metric
from nltk.translate.bleu_score import corpus_bleu
import csv

In [16]:
# Initialize sentiment analysis model
#sentiment_model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
#sentiment_tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
#sentiment_analyzer = pipeline('sentiment-analysis', model=sentiment_model, tokenizer=sentiment_tokenizer)

# Initialize sentiment analysis model and tokenizer
sentiment_model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
sentiment_model = AutoModelForSequenceClassification.from_pretrained(sentiment_model_name)
sentiment_tokenizer = AutoTokenizer.from_pretrained(sentiment_model_name)
sentiment_analyzer = pipeline('sentiment-analysis', model=sentiment_model, tokenizer=sentiment_tokenizer)


In [17]:
# Initialize translation model
translation_model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-en-es')
translation_tokenizer = AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-en-es')
translator = pipeline('translation_en_to_es', model=translation_model, tokenizer=translation_tokenizer)

In [5]:
# Initialize extractive QA model
qa_model = AutoModelForQuestionAnswering.from_pretrained('deepset/minilm-uncased-squad2')
qa_tokenizer = AutoTokenizer.from_pretrained('deepset/minilm-uncased-squad2')
qa_pipeline = pipeline('question-answering', model=qa_model, tokenizer=qa_tokenizer)

# Initialize summarization model
summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Error: Failed to execute this cell, please try again.

In [6]:
import pandas as pd
# Read the car reviews
#with open('./data/car_reviews.csv', 'r') as file:
#    reader = csv.reader(file)
#    reviews = list(reader)

# Extract reviews
#review_1 = reviews[0][0]
#review_2 = reviews[1][0]
#review_5 = reviews[4][0]
# Load car reviews dataset
data_path = './data/car_reviews.csv'
df = pd.read_csv(data_path, delimiter=';', header=0)

# Extract reviews
reviews = df['Review'].tolist()
classes = df['Class'].tolist()
print (reviews)
print (classes)

['I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and personal use. Camping, road trips, etc. We dont have any children so I store most of the seats in my warehouse. I wanted the passenger van for the rear air conditioning. We drove our van from Florida to California for a Cross Country trip in 2014. We averaged about 18 mpg. We drove thru a lot of rain and It was a very comfortable and stable vehicle. The V8 Nissan Titan engine is a 500k mile engine. It has been tested many times by delivery and trucking companies. This is why Nissan gives you a 5 year or 100k mile bumper to bumper warranty. Many people are scared about driving this van because of its size. But with front and rear sonar sensors, large mirrors and the back up camera. It is easy to drive. The front and rear sensors also monitor the front and rear sides of the bumpers making it easier to park close to objects. Our Nissan NV is a Tow Monster. It pulls our 5000 pound travel trailer 

In [7]:
# Sentiment Analysis
def get_sentiments(reviews):
    results = sentiment_analyzer(reviews)
    labels = [result['label'] for result in results]
    return labels

sentiments = get_sentiments([review[0] for review in reviews])
predicted_labels = sentiments
predictions = [1 if label == "POSITIVE" else 0 for label in predicted_labels]

In [8]:
# Load real labels (assuming they are in the same order as reviews)
#with open('real_labels.csv', 'r') as file:
#    reader = csv.reader(file)
#    real_labels = list(reader)
real_labels = [1 if label[0] == "POSITIVE" else 0 for label in classes]


In [9]:
# Calculate metrics
accuracy_metric = load_metric('accuracy')
f1_metric = load_metric('f1')

accuracy_result = accuracy_metric.compute(predictions=predictions, references=real_labels)['accuracy']
f1_result = f1_metric.compute(predictions=predictions, references=real_labels)['f1']

Downloading builder script:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

In [10]:
# Translation
first_review = reviews[0]
sentences_to_translate = ' '.join(first_review.split('.')[:2])

translated = translator(sentences_to_translate, max_length=30)[0]['translation_text']
translated_review = translated

In [11]:
# BLEU score calculation
from datasets import load_dataset
from nltk.translate.bleu_score import corpus_bleu

# Read reference translations for BLEU score
with open('./data/reference_translations.txt', 'r') as file:
    references = [line.strip().split('\t') for line in file]

# Ensure the references are formatted correctly
formatted_references = []
formatted_references.append([ref.strip() for ref in references[0] if ref.strip()])

# Calculate BLEU score
# Convert translated_review to list of hypotheses
hypotheses = [translated_review]

# Ensure the number of hypotheses matches the number of references
if len(hypotheses) == len(formatted_references):
    bleu_score = corpus_bleu(formatted_references, hypotheses)
else:
    bleu_score = None
    print("Mismatch in the number of references and hypotheses. BLEU score cannot be computed.")


In [12]:
# Extractive QA
question = "What did he like about the brand?"
context = reviews[1]
qa_result = qa_pipeline(question=question, context=context)
answer = qa_result['answer']

# Summarization
last_review = reviews[4]
summarized_result = summarizer(last_review, max_length=55, min_length=50, do_sample=False)
summarized_text = summarized_result[0]['summary_text']

# Print results
print("Sentiment Analysis:")
print("Predicted Labels:", predicted_labels)
print("Predictions:", predictions)
print("Accuracy:", accuracy_result)
print("F1 Score:", f1_result)

print("\nTranslation:")
print("Translated Review:", translated_review)
print("BLEU Score:", bleu_score)

print("\nExtractive QA Answer:")
print("Answer:", answer)

print("\nSummarization:")
print("Summarized Text:", summarized_text)

Sentiment Analysis:
Predicted Labels: ['POSITIVE', 'NEGATIVE', 'POSITIVE', 'POSITIVE', 'POSITIVE']
Predictions: [1, 0, 1, 1, 1]
Accuracy: 0.2
F1 Score: 0.0

Translation:
Translated Review: Estoy muy satisfecho con mi Nissan NV SL 2014 Uso esta camioneta para mis entregas de negocios y uso personal
BLEU Score: 0.8237388624214945

Extractive QA Answer:
Answer: ride quality, reliability

Summarization:
Summarized Text:  Nissan Rogue provides the desired SUV experience without burdening me with an exorbitant payment . Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more . The engine delivers strong performance, and


The 'distilbert-base-uncased-finetuned-sst-2-english' model is a good choice of pre-trained LLM for binary sentiment classification

Label mapping for metrics computation
In a text classification context, the accuracy and F1 score metrics in the evaluate library take two arguments: references, containing the ground truth labels, and predictions, containing the classification outputs produced by your model.
Before passing these two collections of labels to the metric for computation, it is necessary to map the categorical POSITIVE, NEGATIVE labels in the outputs and dataset ground-truth into numerical {0,1} labels.
Here is an example illustrating how to do it for the ground-truth labels: references = [1 if label == "POSITIVE" else 0 for label in real_labels]