![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to handle a series of tasks, e.g. classifying the sentiment of a car review, answering a customer question, and summarizing or translating text.


## Before you start

In order to complete the project you may wish to install some Hugging Face libraries such as `transformers` and `evaluate`.

In [1]:
!pip install transformers
!pip install evaluate

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Collecting evaluate
  Downloading evaluate-0.4.2-py3-none-any.whl.metadata (9.3 kB)
Downloading evaluate-0.4.2-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.2


## 1. Classify car reviews

Load a sentiment analysis LLM to classify the sentiment of each review in the dataset into POSITIVE or NEGATIVE, and utilize the real labels to calculate the accuracy and F1 score of predictions.


In [2]:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, AutoModelForQuestionAnswering, AutoModelForSeq2SeqLM
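# datasets.load_metric is deprecated (and removed in recent datasets releases); evaluate.load is its replacement
from evaluate import load as load_metric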
import csv

In [3]:
# Initialize sentiment analysis model
sentiment_model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
sentiment_tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
sentiment_analyzer = pipeline('sentiment-analysis', model=sentiment_model, tokenizer=sentiment_tokenizer)

# Initialize translation model
translation_model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-en-es')
translation_tokenizer = AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-en-es')
translator = pipeline('translation_en_to_es', model=translation_model, tokenizer=translation_tokenizer)

# Initialize extractive QA model
qa_model = AutoModelForQuestionAnswering.from_pretrained('deepset/minilm-uncased-squad2')
qa_tokenizer = AutoTokenizer.from_pretrained('deepset/minilm-uncased-squad2')
qa_pipeline = pipeline('question-answering', model=qa_model, tokenizer=qa_tokenizer)

# Initialize summarization model
summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
# Read the car reviews
with open('car_reviews.csv', 'r') as file:
    reader = csv.reader(file)
    reviews = list(reader)

# Extract reviews
review_1 = reviews[0][0]
review_2 = reviews[1][0]
review_5 = reviews[4][0]

FileNotFoundError: [Errno 2] No such file or directory: 'car_reviews.csv'

In [None]:
# Sentiment Analysis
def get_sentiments(reviews):
    results = sentiment_analyzer(reviews)
    labels = [result['label'] for result in results]
    return labels

sentiments = get_sentiments([review[0] for review in reviews])
predicted_labels = sentiments
predictions = [1 if label == "POSITIVE" else 0 for label in predicted_labels]

In [None]:
# Load real labels (assuming they are in the same order as reviews)
with open('real_labels.csv', 'r') as file:
    reader = csv.reader(file)
    real_labels = list(reader)
real_labels = [1 if label[0] == "POSITIVE" else 0 for label in real_labels]


In [None]:
# Calculate metrics
accuracy_metric = load_metric('accuracy')
f1_metric = load_metric('f1')

accuracy_result = accuracy_metric.compute(predictions=predictions, references=real_labels)['accuracy']
f1_result = f1_metric.compute(predictions=predictions, references=real_labels)['f1']

In [None]:
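# Translation
# Keep only the first two sentences, restoring the periods that split('.') removes
first_two = [s.strip() for s in review_1.split('.') if s.strip()][:2]
sentences_to_translate = '. '.join(first_two) + '.'
translated = translator(sentences_to_translate, max_length=30)[0]['translation_text']
translated_review = translated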

In [None]:
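# BLEU score calculation
from nltk.translate.bleu_score import corpus_bleu

# Assumption: every tab-separated field on every line is an alternative
# reference translation of the same translated text
with open('reference_translations.txt', 'r') as file:
    references = [ref for line in file for ref in line.strip().split('\t') if ref]

# corpus_bleu expects tokens: one list of tokenized references per tokenized hypothesis
reference_tokens = [[ref.split() for ref in references]]
hypothesis_tokens = [translated_review.split()]
bleu_score = corpus_bleu(reference_tokens, hypothesis_tokens)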
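
For intuition about what the BLEU score rewards, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on. It is a simplified illustration, not NLTK's implementation: the full score also combines the precisions for several n-gram orders with a geometric mean and applies a brevity penalty. The helper names and the example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(hypothesis, references, n=1):
    """Share of hypothesis n-grams found in the references, where each n-gram
    is credited at most as often as it occurs in any single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    # For every n-gram, the highest count observed in any one reference
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

# Made-up example sentences (not taken from the project data)
example_hypothesis = "el coche es muy muy bueno".split()
example_references = [
    "el coche es muy bueno".split(),
    "este coche es bueno".split(),
]

unigram_precision = clipped_precision(example_hypothesis, example_references, n=1)
bigram_precision = clipped_precision(example_hypothesis, example_references, n=2)
print(unigram_precision, bigram_precision)
```

The repeated word "muy" is credited only once because no single reference contains it twice, which is why repeating words cannot inflate the score.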


In [None]:
# Extractive QA
question = "What did he like about the brand?"
context = review_2
qa_result = qa_pipeline(question=question, context=context)
answer = qa_result['answer']

# Summarization
summarized_result = summarizer(review_5, max_length=55, min_length=50, do_sample=False)
summarized_text = summarized_result[0]['summary_text']

# Print results
print("Sentiment Analysis:")
print("Predicted Labels:", predicted_labels)
print("Predictions:", predictions)
print("Accuracy:", accuracy_result)
print("F1 Score:", f1_result)

print("\nTranslation:")
print("Translated Review:", translated_review)
print("BLEU Score:", bleu_score)

print("\nExtractive QA Answer:")
print("Answer:", answer)

print("\nSummarization:")
print("Summarized Text:", summarized_text)

The `distilbert-base-uncased-finetuned-sst-2-english` model is a good choice of pre-trained LLM for binary sentiment classification.

**Label mapping for metrics computation**

In a text classification context, the accuracy and F1 score metrics in the `evaluate` library take two arguments: `references`, containing the ground-truth labels, and `predictions`, containing the classification outputs produced by your model.
Before passing these two collections to a metric, map the categorical POSITIVE and NEGATIVE labels in both the model outputs and the dataset ground truth to numerical {0, 1} labels.
For example, for the ground-truth labels: `references = [1 if label == "POSITIVE" else 0 for label in real_labels]`
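
To make the mapping and the two metrics concrete, here is a minimal, dependency-free sketch that maps the labels to {0, 1} and computes accuracy and binary F1 by hand. The label lists and helper names are made up for illustration; in the notebook the values come from the pipeline output and `real_labels`, and the `evaluate` metrics do the computation.

```python
# Illustrative labels only (not taken from the project data)
example_predicted = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]
example_true = ["POSITIVE", "POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

def to_binary(labels):
    """Map POSITIVE to 1 and any other label to 0."""
    return [1 if label == "POSITIVE" else 0 for label in labels]

def accuracy(predictions, references):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

def f1_binary(predictions, references):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(predictions, references))
    tp = sum(p == 1 and r == 1 for p, r in pairs)  # true positives
    fp = sum(p == 1 and r == 0 for p, r in pairs)  # false positives
    fn = sum(p == 0 and r == 1 for p, r in pairs)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

example_predictions = to_binary(example_predicted)
example_references = to_binary(example_true)

print("Accuracy:", accuracy(example_predictions, example_references))
print("F1 Score:", f1_binary(example_predictions, example_references))
```

Here three of the five predictions are correct, and precision and recall for the positive class are both 2/3, so the F1 score is also 2/3.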

## 2. Translate a car review

## 3. Ask a question about a car review

## 4. Summarize and analyze a car review