![image](car.jpeg)

**Car-ing is sharing**, an auto dealership company for car sales and rental, is taking their services to the next level thanks to **Large Language Models (LLMs)**.

As their newly recruited AI and NLP developer, you've been asked to prototype a chatbot app with multiple functionalities that not only assist customers but also provide support to human agents in the company.

The solution should receive textual prompts and use a variety of pre-trained Hugging Face LLMs to respond to a series of tasks, e.g. classifying the sentiment of a car review, answering a customer question, or summarizing or translating text.

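One way to organize a multi-task prototype like this is a small registry that maps each task to the pipeline task string and checkpoint it uses. The sketch below is illustrative only: the task strings and model names are the ones loaded later in this notebook, while `TASK_MODELS`, its short keys, and `build_pipeline` are hypothetical names introduced here.

```python
# Task registry for the prototype: each entry pairs a pipeline task string
# with the checkpoint that this notebook loads for it.
# The dictionary name and its short keys are assumptions made for illustration.
TASK_MODELS = {
    "sentiment": ("sentiment-analysis", "distilbert-base-uncased-finetuned-sst-2-english"),
    "translation": ("translation_en_to_es", "Helsinki-NLP/opus-mt-en-es"),
    "qa": ("question-answering", "deepset/minilm-uncased-squad2"),
    "summarization": ("summarization", "facebook/bart-large-cnn"),
}


def build_pipeline(task_key, device=-1):
    """Create the Hugging Face pipeline registered under task_key.

    transformers is imported lazily so the registry can be inspected
    without loading the library or downloading any model weights.
    """
    if task_key not in TASK_MODELS:
        raise KeyError(f"Unknown task '{task_key}'. Choose from {sorted(TASK_MODELS)}.")
    from transformers import pipeline  # heavy import, deferred on purpose

    task, model_name = TASK_MODELS[task_key]
    return pipeline(task, model=model_name, device=device)
```

Calling `build_pipeline("qa")` would download the checkpoint on first use, so the lookup is validated before the import to fail fast on an unknown key.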

In [17]:
# Import necessary packages
import pandas as pd
import torch

from transformers import logging
logging.set_verbosity(logging.WARNING)

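The pipelines created in the next cell pass `device=-1`, which runs them on the CPU (the captured log confirms "Device set to use cpu"). If a GPU should be used when one is present, the device index can be chosen at run time. This is a minimal sketch, assuming a single-GPU setup; `pick_device` is a hypothetical helper, not part of the original notebook.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when PyTorch reports one, else -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1  # PyTorch is not installed: fall back to CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Pipelines would be created with device={device}")
```

The resulting value could then be passed as the `device` argument of each `pipeline(...)` call instead of the hard-coded `-1`.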
In [18]:
import pandas as pd
from transformers import pipeline
from sklearn.metrics import accuracy_score, f1_score
from nltk.translate.bleu_score import sentence_bleu
import nltk
import os

# --- NLTK Download Section ---
print("Checking NLTK data...")
try:
    # Ensure 'punkt' tokenizer models are available
    nltk.data.find('tokenizers/punkt')
    print("'punkt' tokenizer found.")
except LookupError:
    print("'punkt' tokenizer not found, downloading...")
    nltk.download('punkt')
    print("'punkt' tokenizer downloaded.")

try:
    # Ensure the 'wordnet' corpus is available (not used directly by the tasks below)
    nltk.data.find('corpora/wordnet')
    print("'wordnet' corpus found.")
except LookupError:
    print("'wordnet' corpus not found, downloading...")
    nltk.download('wordnet')
    print("'wordnet' corpus downloaded.")
print("NLTK data check complete.")
# --- End NLTK Download Section ---

# Verify file existence
csv_file = 'car_reviews.csv'
ref_file = 'reference_translations.txt'

if not os.path.exists(csv_file):
    raise FileNotFoundError(f"Error: The file '{csv_file}' was not found in the current working directory ({os.getcwd()}). Please ensure it's there.")
if not os.path.exists(ref_file):
    raise FileNotFoundError(f"Error: The file '{ref_file}' was not found in the current working directory ({os.getcwd()}). Please ensure it's there.")

# Load the car reviews dataset with explicit encoding and separator
try:
    # Attempt to read with 'utf-8', then 'latin1' if utf-8 fails, as CSVs can be tricky.
    try:
        df = pd.read_csv(csv_file, sep=';', encoding='utf-8')
    except UnicodeDecodeError:
        print(f"UTF-8 decoding failed for {csv_file}. Trying 'latin1' encoding.")
        df = pd.read_csv(csv_file, sep=';', encoding='latin1')

    # Verify column names
    expected_columns = ['Review', 'Class']
    if not all(col in df.columns for col in expected_columns):
        missing_cols = [col for col in expected_columns if col not in df.columns]
        raise ValueError(f"CSV must contain columns: {expected_columns}. Missing: {missing_cols}")
    if df.empty:
        raise ValueError(f"The CSV file '{csv_file}' is empty or contains no data rows.")
    if df['Review'].isnull().any() or df['Class'].isnull().any():
        print("Warning: Missing values found in 'Review' or 'Class' columns. Dropping the affected rows.")
        df.dropna(subset=['Review', 'Class'], inplace=True) # Drop rows with NaN in critical columns
    if df.empty: # Check again after dropping NaNs
        raise ValueError("After cleaning missing values, the DataFrame is empty.")

except Exception as e:
    raise Exception(f"Error loading or processing CSV file '{csv_file}': {str(e)}")

print(f"Successfully loaded {len(df)} reviews from '{csv_file}'.")

# --- Task 1: Sentiment Classification ---
print("\n--- Task 1: Sentiment Classification ---")
# Initialize the sentiment analysis pipeline
try:
    # device=-1 runs the pipeline on the CPU; pass a GPU index (e.g. device=0) to use CUDA
    sentiment_classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english', device=-1)
    print("Sentiment classifier loaded successfully.")
except Exception as e:
    raise Exception(f"Error initializing sentiment classifier model. Please check your internet connection or model name. Error: {str(e)}")

# Classify the sentiment of each review
predicted_labels = []
for i, review in enumerate(df['Review']):
    try:
        if not isinstance(review, str):
            print(f"Skipping review at index {i} due to non-string type: {type(review)}. Value: {review}")
            predicted_labels.append('UNKNOWN')  # Placeholder; mapped to 0 (negative) in the binary mapping below
            continue

        # Truncate review if it's too long for the model (typically 512 tokens)
        # This is a heuristic; for production, consider more advanced chunking/summarization
        if len(review.split()) > 400: # Rough token estimate
            review_for_model = ' '.join(review.split()[:400])
            # print(f"Warning: Review {i} is very long, truncating for sentiment analysis.")
        else:
            review_for_model = review

        result = sentiment_classifier(review_for_model)[0]['label']
        predicted_labels.append(result)
    except Exception as e:
        print(f"Error classifying review {i}: '{review[:70]}...' Error: {str(e)}. Falling back to 'NEGATIVE'.")
        predicted_labels.append('NEGATIVE') # Fallback to avoid breaking for individual review errors

# Map predicted labels to binary {0, 1} (POSITIVE=1, NEGATIVE=0)
predictions = [1 if label.upper() == 'POSITIVE' else 0 for label in predicted_labels]

# Map ground truth labels to binary {0, 1}
try:
    true_labels = []
    for i, label in enumerate(df['Class']):
        if isinstance(label, str):
            true_labels.append(1 if label.upper() == 'POSITIVE' else 0)
        else:
            print(f"Warning: Ground truth label at index {i} is not a string ({type(label)}). Assuming 0 (Negative). Value: {label}")
            true_labels.append(0) # Default for non-string ground truth labels
except Exception as e:
    raise Exception(f"Error processing ground truth labels from 'Class' column: {str(e)}")

# Calculate accuracy and F1 score
try:
    accuracy_result = accuracy_score(true_labels, predictions)
    f1_result = f1_score(true_labels, predictions)
    print("Sentiment classification metrics calculated.")
except ValueError as e:
    print(f"Error calculating metrics. Ensure true_labels and predictions are non-empty and compatible: {str(e)}")
    accuracy_result = 0.0
    f1_result = 0.0


# --- Task 2: Translation and BLEU Score ---
print("\n--- Task 2: Translation and BLEU Score ---")
# Initialize the translation pipeline
try:
    translator = pipeline('translation_en_to_es', model='Helsinki-NLP/opus-mt-en-es', device=-1)
    print("Translator loaded successfully.")
except Exception as e:
    raise Exception(f"Error initializing translator model. Please check your internet connection or model name. Error: {str(e)}")

# Extract the first two sentences of the first review
text_to_translate = ""
try:
    if df.empty:
        raise ValueError("Cannot perform translation: DataFrame is empty or has no first review.")
    first_review = df['Review'].iloc[0] # Use .iloc for robust integer-based indexing
    if not isinstance(first_review, str):
        raise TypeError(f"First review is not a string: {type(first_review)}. Value: {first_review}")

    sentences = nltk.sent_tokenize(first_review)
    if len(sentences) < 2:
        # Fallback for reviews with fewer than 2 sentences: use the whole review
        print(f"Warning: First review has fewer than 2 sentences ({len(sentences)}). Using the entire review for translation.")
        text_to_translate = ' '.join(sentences)
        if not text_to_translate: # If the review was empty
            raise ValueError("First review is empty after tokenization.")
    else:
        text_to_translate = ' '.join(sentences[:2])
    print(f"Original text for translation: '{text_to_translate[:100]}...'")
except Exception as e:
    raise Exception(f"Error preparing text for translation from first review: {str(e)}")

# Perform translation
translated_review = ""
try:
    if text_to_translate: # Ensure there's something to translate
        translation_output = translator(text_to_translate, max_length=512)
        translated_review = translation_output[0]['translation_text']
        print(f"Translated text: '{translated_review[:100]}...'")
    else:
        print("Warning: No text to translate. translated_review will be empty.")
except Exception as e:
    raise Exception(f"Error during translation process: {str(e)}")

# Load reference translations
reference_translations = []
try:
    with open(ref_file, 'r', encoding='utf-8') as f:
        reference_translations = [line.strip() for line in f if line.strip()]
    if not reference_translations:
        raise ValueError(f"Reference translations file '{ref_file}' is empty or contains no valid lines.")
    print(f"Loaded {len(reference_translations)} reference translations.")
except Exception as e:
    raise Exception(f"Error loading reference translations from '{ref_file}': {str(e)}")

# Calculate BLEU score
bleu_score = 0.0
try:
    if not translated_review:
        print("Cannot calculate BLEU score: translated_review is empty.")
    elif not reference_translations:
        print("Cannot calculate BLEU score: reference_translations are empty.")
    else:
        # BLEU expects a list of reference token lists, and one candidate token list
        # So, reference_tokens should be [[ref1_words], [ref2_words], ...]
        reference_tokens = [nltk.word_tokenize(ref) for ref in reference_translations]
        candidate_tokens = nltk.word_tokenize(translated_review)
        bleu_score = sentence_bleu(reference_tokens, candidate_tokens)
        print("BLEU score calculated.")
except Exception as e:
    print(f"Error calculating BLEU score: {str(e)}. Setting BLEU to 0.0.")
    bleu_score = 0.0


# --- Task 3: Extractive Question Answering ---
print("\n--- Task 3: Extractive Question Answering ---")
# Initialize the QA pipeline
try:
    qa_pipeline = pipeline('question-answering', model='deepset/minilm-uncased-squad2', device=-1)
    print("Question-answering pipeline loaded successfully.")
except Exception as e:
    raise Exception(f"Error initializing QA pipeline model. Please check your internet connection or model name. Error: {str(e)}")

# Define question and context for the second review
question = "What did he like about the brand?"
context = ""
try:
    if len(df) < 2:
        raise ValueError("Cannot perform QA: DataFrame has fewer than 2 reviews.")
    context = df['Review'].iloc[1] # Use .iloc for robust integer-based indexing
    if not isinstance(context, str):
        raise TypeError(f"Second review (context for QA) is not a string: {type(context)}. Value: {context}")
    if not context.strip():
        raise ValueError("Second review (context for QA) is empty or just whitespace.")
    print(f"QA Question: '{question}'")
    print(f"QA Context: '{context[:100]}...'")
except Exception as e:
    raise Exception(f"Error preparing context for QA from second review: {str(e)}")

# Get the answer
answer = "No answer found."
try:
    if question and context: # Ensure both are available
        qa_result = qa_pipeline(question=question, context=context)
        answer = qa_result['answer']
        print(f"QA Answer found: '{answer}'")
    else:
        print("Warning: Question or context missing for QA. Answer will be default.")
except Exception as e:
    print(f"Error during Question Answering: {str(e)}. Setting answer to default.")
    answer = "An error occurred while trying to find an answer."


# --- Task 4: Summarization ---
print("\n--- Task 4: Summarization ---")
# Initialize the summarization pipeline
try:
    summarizer = pipeline('summarization', model='facebook/bart-large-cnn', device=-1)
    print("Summarization pipeline loaded successfully.")
except Exception as e:
    raise Exception(f"Error initializing summarization model. Please check your internet connection or model name. Error: {str(e)}")

# Summarize the last review (aim for ~50-55 tokens)
summarized_text = "No summary generated."
try:
    if df.empty:  # The last review exists whenever there is at least one row
        print("Warning: No reviews in the dataset. Skipping summarization.")
    else:
        last_review = df['Review'].iloc[-1] # Use .iloc[-1] for the last review
        if not isinstance(last_review, str):
            raise TypeError(f"Last review for summarization is not a string: {type(last_review)}. Value: {last_review}")
        if not last_review.strip():
            raise ValueError("Last review for summarization is empty or just whitespace.")
        print(f"Original text for summarization: '{last_review[:100]}...'")

        # Set max_length and min_length for token count control
        # Note: These are token lengths, not word counts. 50-55 tokens is a tight range.
        summarized = summarizer(last_review, max_length=55, min_length=50, do_sample=False)
        summarized_text = summarized[0]['summary_text']
        print(f"Summarized text: '{summarized_text}'")
except Exception as e:
    print(f"Error during summarization: {str(e)}. Setting summary to default.")
    summarized_text = "An error occurred during summarization."


# --- Print Results for Verification ---
print("\n--- Final Results ---")
print(f"Sentiment Classification Accuracy: {accuracy_result:.4f}")
print(f"Sentiment Classification F1 Score: {f1_result:.4f}")
print(f"Translated Review (English to Spanish): {translated_review}")
print(f"BLEU Score (Translation Quality): {bleu_score:.4f}")
print(f"QA Question: {question}")
print(f"QA Context (second review snippet): {context[:70]}...")
print(f"QA Answer: {answer}")
print(f"Summarized Last Review (approx. 50-55 tokens): {summarized_text}")

Checking NLTK data...
'punkt' tokenizer found.
'wordnet' corpus not found, downloading...
'wordnet' corpus downloaded.
NLTK data check complete.
Successfully loaded 5 reviews from 'car_reviews.csv'.

--- Task 1: Sentiment Classification ---


[nltk_data] Downloading package wordnet to /home/repl/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
Device set to use cpu


Sentiment classifier loaded successfully.
Sentiment classification metrics calculated.

--- Task 2: Translation and BLEU Score ---


Device set to use cpu


Translator loaded successfully.
Original text for translation: 'I am very satisfied with my 2014 Nissan NV SL. I use this van for my business deliveries and persona...'


Some weights of the model checkpoint at deepset/minilm-uncased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


Translated text: 'Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y us...'
Loaded 2 reference translations.
BLEU score calculated.

--- Task 3: Extractive Question Answering ---
Question-answering pipeline loaded successfully.
QA Question: 'What did he like about the brand?'
QA Context: 'The car is fine. It's a bit loud and not very powerful. On one hand, compared to its peers, the inte...'
QA Answer found: 'ride quality, reliability'

--- Task 4: Summarization ---


Device set to use cpu


Summarization pipeline loaded successfully.
Original text for summarization: 'I've been dreaming of owning an SUV for quite a while, but I've been driving cars that were already ...'
Summarized text: 'The Nissan Rogue provides me with the desired SUV experience without burdening me with an exorbitant payment. Handling and styling are great; I have hauled 12 bags of mulch in the back with the seats down and could have held more. The engine delivers strong'

--- Final Results ---
Sentiment Classification Accuracy: 0.8000
Sentiment Classification F1 Score: 0.8571
Translated Review (English to Spanish): Estoy muy satisfecho con mi Nissan NV SL 2014. Uso esta camioneta para mis entregas de negocios y uso personal.
BLEU Score (Translation Quality): 0.7794
QA Question: What did he like about the brand?
QA Context (second review snippet): The car is fine. It's a bit loud and not very powerful. On one hand, c...
QA Answer: ride quality, reliability
Summarized Last Review (approx. 50-55 tokens):
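
The reported accuracy of 0.8000 and F1 score of 0.8571 on five reviews are consistent with a single misclassification: four of five correct gives 0.8, and F1 = 2·TP / (2·TP + FP + FN) = 6/7 ≈ 0.8571 implies three true positives and one error. The output does not say whether that error was a false positive or a false negative. The sketch below reproduces the arithmetic in plain Python; the label vectors are hypothetical values chosen to match those counts, not the actual dataset, and `accuracy_and_f1` is an illustrative helper rather than the scikit-learn implementation.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and binary F1 (positive class = 1) from label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); treated as 0.0 when the denominator is zero
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels (NOT the real dataset): five reviews, one false positive.
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, f1={f1:.4f}")  # prints accuracy=0.8000, f1=0.8571
```

Swapping the single error for a false negative while keeping three true positives yields the same two numbers, which is why the metrics alone cannot identify the misclassified review.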