![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment of a car review, answering a customer question, summarizing or translating text, etc.


## Before you start

To complete the project, you may need to install some Hugging Face libraries, such as `transformers` and `evaluate`.

In [1]:
!pip install transformers
!pip install evaluate

from transformers import logging
logging.set_verbosity(logging.WARNING)

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Collecting evaluate
  Downloading evaluate-0.4.2-py3-none-any.whl.metadata (9.3 kB)
Downloading evaluate-0.4.2-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.2


In [2]:
# Start your code here!
import pandas as pd

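# Read the semicolon-delimited CSV file, skipping malformed rows.
# `error_bad_lines` was removed in pandas 2.0; `on_bad_lines='skip'` (pandas >= 1.3) replaces it.
df = pd.read_csv("car_reviews.csv", delimiter=';', on_bad_lines='skip')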

# Display the DataFrame
df

Unnamed: 0,Review,Class
0,I am very satisfied with my 2014 Nissan NV SL....,POSITIVE
1,The car is fine. It's a bit loud and not very ...,NEGATIVE
2,"My first foreign car. Love it, I would buy ano...",POSITIVE
3,I've come across numerous reviews praising the...,NEGATIVE
4,I've been dreaming of owning an SUV for quite ...,POSITIVE


In [3]:
# Load a pre-trained model for sentiment analysis (manual alternative to the pipeline used below)
#from transformers import AutoModelForSequenceClassification,AutoTokenizer
#model_name = "distilbert-base-uncased-finetuned-sst-2-english"
#tokenizer = AutoTokenizer.from_pretrained(model_name)
#model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [4]:
# Convert the 'Review' column of the DataFrame into a list
review_list = df['Review'].tolist()

In [5]:
# Padding and truncation make every sequence in the batch the same length,
# which is required to stack the token IDs into a single tensor.
# Padding appends pad tokens to shorter sequences so they match the longest sequence in the batch.
# Truncation cuts sequences that exceed the model's maximum input length.
#inputs = tokenizer(review_list, return_tensors="pt", padding=True, truncation=True)
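
The padding and truncation described in the comments above can be illustrated without downloading a model. The block below is a minimal sketch on plain lists of token IDs; `PAD_ID`, `MAX_LEN`, the helper `pad_and_truncate`, and the ID values are all hypothetical and chosen only for illustration. A real tokenizer also keeps the closing special token when it truncates, which this sketch does not attempt.

```python
# Toy illustration of padding and truncation on lists of token IDs.
# PAD_ID, MAX_LEN and the IDs below are hypothetical values for this sketch.
PAD_ID = 0
MAX_LEN = 6

def pad_and_truncate(batch, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask) with every row the same length."""
    # Truncate first so that no sequence exceeds the maximum length.
    truncated = [seq[:max_len] for seq in batch]
    # Pad up to the longest sequence that remains in the batch.
    target = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (target - len(seq)) for seq in truncated]
    # The mask marks real tokens with 1 and padding with 0.
    attention_mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in truncated]
    return input_ids, attention_mask

batch = [[101, 7, 8, 102], [101, 5, 102], [101, 1, 2, 3, 4, 5, 6, 102]]
input_ids, attention_mask = pad_and_truncate(batch)
for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

Every row ends up with six entries, so the batch could be converted into one rectangular tensor, and the attention mask tells the model which positions to ignore.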

In [6]:
#predicted_labels = model(**inputs)
#predicted_labels

In [7]:
# Logits are the raw, unnormalized scores the model outputs before an activation function such as softmax is applied.
# The largest logit marks the predicted class, and the gap between logits reflects how confident the model is.
# Applying softmax to the logits turns them into probabilities that sum to 1 for each review.
#logits = predicted_labels.logits
#logits
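
To make the relationship between logits and probabilities concrete, here is a small sketch in plain Python. The logit values are hypothetical (they are not taken from the model's output), and the `[NEGATIVE, POSITIVE]` order follows the index mapping used in this notebook.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    shifted = [x - max(logits) for x in logits]
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one review, ordered [NEGATIVE, POSITIVE].
example_logits = [-1.2, 1.4]
probs = softmax(example_logits)

# Softmax preserves ordering, so argmax over logits and over probabilities agree.
predicted_index = max(range(len(example_logits)), key=example_logits.__getitem__)
print(probs, predicted_index)
```

Because softmax never changes which score is largest, taking `argmax` directly on the logits is enough when only the predicted class is needed.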

In [8]:
import torch

# Get the predicted class indices
#predictions = torch.argmax(logits, dim=1).tolist()
#print(type(predictions))
# Map the predicted indices to sentiment labels
#sentiment_labels = ["Negative" if label == 0 else "Positive" for label in predictions]
#sentiment_labels

## Using Pipeline

In [9]:
from transformers import pipeline

# Initialize the text classification pipeline
text_classifier = pipeline(task="text-classification", model='distilbert-base-uncased-finetuned-sst-2-english')

# Get predictions for all reviews in the list
predicted_labels = text_classifier(review_list)

predicted_labels

[{'label': 'POSITIVE', 'score': 0.9293975830078125},
 {'label': 'POSITIVE', 'score': 0.8654279708862305},
 {'label': 'POSITIVE', 'score': 0.9994640946388245},
 {'label': 'NEGATIVE', 'score': 0.9935314059257507},
 {'label': 'POSITIVE', 'score': 0.9986565113067627}]

In [10]:
predictions = [1 if pred['label'] == "POSITIVE" else 0 for pred in predicted_labels]

### Accuracy and F1 score

In [11]:
true_labels = df['Class'].tolist()
real_labels = [1 if true == "POSITIVE" else 0 for true in true_labels]

In [12]:
import evaluate
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
accuracy_result = accuracy.compute(references=real_labels, predictions=predictions)
f1_result = f1.compute(references=real_labels, predictions=predictions)

In [13]:
accuracy_result

{'accuracy': 0.8}

In [14]:
f1_result

{'f1': 0.8571428571428571}
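
As a sanity check, both metrics can be recomputed by hand from the five labels shown above. This is a sketch rather than the `evaluate` implementation; the two lists are transcribed from the `Class` column and the pipeline output, and it assumes the default binary F1 with class 1 (POSITIVE) as the positive label.

```python
# Labels transcribed from the notebook: 1 = POSITIVE, 0 = NEGATIVE.
real_labels = [1, 0, 1, 0, 1]   # from the 'Class' column
predictions = [1, 1, 1, 0, 1]   # from the pipeline output

pairs = list(zip(real_labels, predictions))
tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives

accuracy_manual = sum(1 for y, p in pairs if y == p) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_manual = 2 * precision * recall / (precision + recall)

print(accuracy_manual, precision, recall, f1_manual)
```

With 3 true positives, 1 false positive, and no false negatives, accuracy is 4/5 and F1 is 6/7, which matches the values reported by `evaluate`.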

## Translation 

In [15]:
llm_model_trans = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")


In [16]:
input_text = df['Review'][0].split('.')[:2]
input_text

['I am very satisfied with my 2014 Nissan NV SL',
 ' I use this van for my business deliveries and personal use']

In [17]:
output = llm_model_trans(input_text, clean_up_tokenization_spaces=True)
# Join the translated sentences with ". " so the period attaches to the preceding sentence
translated_review = ". ".join(sentence['translation_text'] for sentence in output)
translated_review

'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal'

In [18]:
# Import list of reference_translations from "reference_translations.txt"
with open("reference_translations.txt", "r", encoding="utf-8") as file:
    reference_translations = [[line.strip() for line in file.readlines()]]

reference_translations

[['Estoy muy satisfecho con mi Nissan NV SL 2014. Utilizo esta camioneta para mis entregas comerciales y uso personal.',
  'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta furgoneta para mis entregas comerciales y uso personal.']]

In [19]:
# Evaluate the translation against the reference translations with BLEU
bleu = evaluate.load("bleu")
bleu_score = bleu.compute(predictions=[translated_review], references=reference_translations)
bleu_score

{'bleu': 0.7671176261207451,
 'precisions': [0.9047619047619048,
  0.85,
  0.7368421052631579,
  0.6111111111111112],
 'brevity_penalty': 1.0,
 'length_ratio': 1.0,
 'translation_length': 21,
 'reference_length': 21}
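
The fields in this dictionary fit together: BLEU is the brevity penalty multiplied by the geometric mean of the four n-gram precisions. The sketch below recombines the reported numbers under that standard definition; it is an illustration of the formula, not the `evaluate` source code.

```python
import math

# Values copied from the BLEU output above.
precisions = [0.9047619047619048, 0.85, 0.7368421052631579, 0.6111111111111112]
translation_length = 21
reference_length = 21

# Brevity penalty: 1.0 unless the translation is shorter than the reference.
if translation_length >= reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the 1-gram to 4-gram precisions, computed in log space.
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
bleu_manual = brevity_penalty * geo_mean
print(bleu_manual)
```

The result agrees with the reported `bleu` value of roughly 0.767. Since the translation and reference have the same length, the brevity penalty has no effect here.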

## QA LLM

In [20]:
from transformers import pipeline

# Initialize the extractive question-answering pipeline
llm_model_qa = pipeline("question-answering", model="deepset/minilm-uncased-squad2")


In [21]:
question = "What did he like about the brand?"
context = df["Review"][1]


In [22]:
answer = llm_model_qa(question=question, context=context)
answer

{'score': 0.47736144065856934,
 'start': 569,
 'end': 594,
 'answer': 'ride quality, reliability'}
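
The `start` and `end` fields are character offsets into the context, so slicing the context with them returns the answer text; in the output above, 594 - 569 = 25, which is the length of `'ride quality, reliability'`. The sketch below shows the same relationship on a hypothetical context string and a hand-built result dictionary that only mimics the shape of the pipeline output.

```python
# Hypothetical context, NOT the actual review from the dataset.
context = "The sedan impressed me with its ride quality, reliability and low running costs."
answer_text = "ride quality, reliability"

# Hand-built dictionary that mimics the shape of a question-answering result.
start = context.find(answer_text)
result = {"start": start, "end": start + len(answer_text), "answer": answer_text}

# Slicing the context with the offsets recovers the answer span.
span = context[result["start"]:result["end"]]
print(span)
```

The offsets are useful when the answer needs to be highlighted in the original review or shown with some surrounding text.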

## Text summarization

In [6]:
from transformers import pipeline

llm_model_summary = pipeline("summarization", model="facebook/bart-large-cnn")

In [3]:
last_review = df['Review'].iloc[-1]
last_review

"I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already paid for during an extended period. I ultimately made the decision to transition to a brand-new car, which, of course, involved taking on new payments. However, given that I don't drive extensively, I was inclined to avoid a substantial financial commitment. The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment; the financial arrangement is quite reasonable. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. I am VERY satisfied overall. I find myself needing to exercise extra caution when making lane changes, particularly owing to the blind spots resulting from the small side windows situated towards the rear of the vehicle. To address this concern, I am actively engaged in making adjustments to my mirrors and consciously reducing the frequency of lane changes. Th

In [10]:
summarized_text = llm_model_summary(last_review, max_length=55, min_length=50, clean_up_tokenization_spaces=True)
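
The summarization pipeline returns a list with one dictionary per input, and the generated text sits under the `summary_text` key. The sketch below uses a hypothetical stand-in named `pipeline_output` with placeholder text, since no real summary is shown in this notebook, to illustrate how the string could be pulled out.

```python
# Hypothetical stand-in with the same shape as the summarization pipeline's return value.
# The placeholder string is NOT a real model summary.
pipeline_output = [{"summary_text": "<generated summary would appear here>"}]

# One input text yields one dictionary, so the summary is in the first element.
summary = pipeline_output[0]["summary_text"]
print(summary)

# Note: max_length and min_length are counted in tokens, not words,
# so the word count of a real summary will usually be somewhat lower.
print(len(summary.split()))
```

In the notebook, the same indexing applied to `summarized_text` (that is, `summarized_text[0]['summary_text']`) would give the summary as a plain string.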