## Import Unsloth Libraries

In [None]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install unsloth
# Get latest Unsloth
!pip install --upgrade --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# %%

In [None]:
!pip install evaluate bert-score nltk rouge_score


Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting bert-score
  Downloading bert_score-0.3.13-py3-none-any.whl.metadata (15 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.0/84.0 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading bert_score-0.3.13-py3-none-any.whl (61 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.1/61.1 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... [?25l[?25hdone
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=418df58fc6b4def8a54569e3e4c24ee644fb6cfdafafc04896df65bbfb30a736
  Stored in directory: /root/.cache/pip/wheels/1e/19/43/8a442dc83660ca25e

## Function to Load the Fine-tunned Models

In [None]:
import unsloth
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

def load_unsloth_model(model_name):
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        dtype=dtype,
        load_in_4bit=load_in_4bit,
    )
    return model, tokenizer


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-04-16 06:43:20.917589: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744785801.165144      31 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744785801.238449      31 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Pipeline Inputs

In [97]:
review = "I do not understand the enthusiastic reviews on the basis of which I ordered this bag. Looks very cheap. As the skin of a young dermatine is said."
input_category = "health"

## First Phase of Pipeline

- Load `AbuSalehMd/FakeReviewDetection_Mistral_7B_FineTuned` & `AbuSalehMd/ProductCategoryClassificationFinal_Mistral_7B_FineTuned` for detecting fake review and classify product category

In [112]:
# Assign models to specific GPUs
model_1, tokenizer_1 = load_unsloth_model("AbuSalehMd/FakeReviewDetection_Mistral_7B_FineTuned")
model_2, tokenizer_2 = load_unsloth_model("AbuSalehMd/ProductCategoryClassificationFinal_Mistral_7B_FineTuned")


==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Prediction Functions for Fake Review and Product Category

- Defines prompt-based inference functions to detect whether a review is *real or fake* and to classify its *product category*.
- Uses a custom prompt format and Unsloth's `FastLanguageModel` for generating model outputs.
- Handles prediction post-processing to extract clean labels like "real", "fake", or one of five categories.


In [113]:
def generate_test_prompt_single_fake(review_text):
    return f"""
    Determine if the review enclosed in square brackets is real or fake based on its content.
    Return the answer as either "real" or "fake".

    [{review_text}] =
    """.strip()

def predict_fake_review(review_text, model, tokenizer):
    from unsloth import FastLanguageModel
    FastLanguageModel.for_inference(model)
    prompt = generate_test_prompt_single_fake(review_text)

    # Make sure tensor goes to the correct GPU/device
    input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **input_ids,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=1,
        temperature=0.0
    )
    result = tokenizer.decode(outputs[0])
    answer = result.split("=")[-1].strip().lower()

    if "real" in answer:
        return "real"
    elif "fake" in answer:
        return "fake"
    else:
        return "none"


In [114]:
def generate_test_prompt_single_product(review_text):
    return f"""
    Determine the class if the review enclosed in square brackets is automotive or fashion or home or electronics or health category class based on its content.
    Return the answer as either "automotive" or "fashion" or "home" or "electronics" or "health".

    [{review_text}] =
    """.strip()

def predict_category(review_text, model, tokenizer):
    from unsloth import FastLanguageModel
    FastLanguageModel.for_inference(model)
    prompt = generate_test_prompt_single_product(review_text)
    input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**input_ids, pad_token_id=tokenizer.eos_token_id, max_new_tokens=1, temperature=0.0)
    result = tokenizer.decode(outputs[0])
    answer = result.split("=")[-1].strip().lower()
    if "autom" in answer:
        return "automotive"
    elif "fashion" in answer:
        return "fashion"
    elif "home" in answer:
        return "home"
    elif "electron" in answer:
        return "electronics"
    elif "health" in answer:
        return "health"
    else:
        return "none"


## Fake Review and Category Detection Pipeline

- Implements a 2-step pipeline to first detect *fake reviews*, then verify if the predicted *category* matches the input.
- Automatically filters out irrelevant or fake reviews before proceeding further.
- Logs time taken to run the full pipeline for performance evaluation.


In [115]:
def first_pipeline(review,input_category):
    print(f"🔍 Review: {review}\n📦 Input Category: {input_category}")

    # Step 1: Fake Review Detection
    fake_result = predict_fake_review(review, model_1, tokenizer_1)
    print("🕵️ Fake Review Detection:", fake_result)

    if fake_result == "fake":
        print("❌ Detected as fake review.")
        return "Fake"
    else:
        # Step 2: Category Classification
        category = predict_category(review, model_2, tokenizer_2)
        print("🏷️ Predicted Category:", category)
        if category != input_category.lower():
            print("⚠️ Irrelevant category.")
            return "Irrelevant"
        else:
            return "Rrelevant"

In [117]:
import time

# Start timer
start_time = time.time()

# Run your pipeline
output1 = first_pipeline(review, input_category)

# End timer
end_time = time.time()

# Calculate and print duration
duration1 = end_time - start_time
print(f"\n⏱️ Time taken 1st Pipeline: {duration1:.2f} seconds")


🔍 Review: I do not understand the enthusiastic reviews on the basis of which I ordered this bag. Looks very cheap. As the skin of a young dermatine is said.
📦 Input Category: health
🕵️ Fake Review Detection: real
🏷️ Predicted Category: health

⏱️ Time taken 1st Pipeline: 0.68 seconds


## Second Phase of Pipeline

- load `AbuSalehMd/Review_Response_Generation_Mistral_7B_FineTuned` & `AbuSalehMd/Sentiment_Analysis_Mistral_7B_FineTuned` for sentiment and response generation.

In [125]:
# ✅ Unload both models after loop
del model_1, model_2, tokenizer_1, tokenizer_2
gc.collect()
torch.cuda.empty_cache()

model_1, tokenizer_1 = load_unsloth_model("AbuSalehMd/Review_Response_Generation_Mistral_7B_FineTuned")
model_2, tokenizer_2 = load_unsloth_model("AbuSalehMd/Sentiment_Analysis_Mistral_7B_FineTuned")

==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Sentiment Classification and Reply Generation

- `predict_sentiment`: Classifies a review as *positive*, *neutral*, or *negative* using a prompt-based method.
- `generate_review_reply`: Produces a personalized reply based on the review content, its sentiment, and category using the Alpaca prompt format.
- Ensures the response is contextually relevant and informative for end users.


In [126]:
def predict_sentiment(review_text, model, tokenizer):
    from unsloth import FastLanguageModel
    FastLanguageModel.for_inference(model)

    prompt = f"""
    Determine if the review enclosed in square brackets is positive, neutral or negative based on its content.
    Return the answer as either "positive", "neutral" or "negative".

    [{review_text}] =
    """.strip()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=1, temperature=0.0)
    answer = tokenizer.decode(outputs[0]).split("=")[-1].strip().lower()

    if "positive" in answer:
        return "positive"
    elif "neutral" in answer:
        return "neutral"
    elif "negative" in answer:
        return "negative"
    else:
        return "none"


In [127]:

# Define the prompt format
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def generate_review_reply(review, sentiment, category_label, model, tokenizer, alpaca_prompt):
    from unsloth import FastLanguageModel
    FastLanguageModel.for_inference(model)

    # Format the prompt
    prompt = alpaca_prompt.format(
        "Generate a helpful and context-aware reply based on the review, sentiment, and category.",
        f"Review: {review}\nSentiment: {sentiment}\nCategory: {category_label}",
        ""
    )

    # Tokenize and move inputs to the correct device
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    # Generate output
    outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)

    # Decode output
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract only the generated reply
    if "### Response:" in decoded_output:
        reply = decoded_output.split("### Response:")[-1].strip()
    else:
        reply = decoded_output.strip()

    return reply


## Sentiment & Reply Generation Pipeline with Timing

- Runs sentiment classification and reply generation only if the review passes fake detection and category matching.
- Measures and prints execution time for Part 2 as well as the total time for the full pipeline.
- Ensures that responses are only generated for valid, real, and category-relevant reviews.


In [128]:
def second_pipeline(review,input_category):
    # Step 3: Sentiment Classification
    sentiment = predict_sentiment(review, model_2, tokenizer_2)
    print("💬 Sentiment:", sentiment)

    # Step 4: Generate Reply
    reply = generate_review_reply(review, sentiment, input_category, model_1, tokenizer_1, alpaca_prompt)
    print("✍️ Generated Reply:")
    return reply

In [129]:
import time

# Start total timer
start_time = time.time()

# Run part 2 only if relevant
if output1 == "Rrelevant":
    output2 = second_pipeline(review, input_category)
    print("✅ Final Output (Reply):", output2)
else:
    print("🚫 Final Output:", output1)

# End total timer
end_time = time.time()
duration2 = end_time - start_time
# Show execution duration
print(f"\n⏱️ Total Time Taken 2nd Pipeline: {duration2:.2f} seconds")


print(f"\n⏱️ Total Time for Full Pipeline: {duration1+duration2:.2f} seconds")

💬 Sentiment: negative
✍️ Generated Reply:
✅ Final Output (Reply): Thank you for your feedback! We appreciate your honesty and understand that the product didn't meet your expectations. We are constantly working to improve and offer more options that may be a better fit.

⏱️ Total Time Taken 2nd Pipeline: 2.94 seconds

⏱️ Total Time for Full Pipeline: 3.62 seconds


In [111]:
# ✅ Unload both models after loop
del model_1, model_2, tokenizer_1, tokenizer_2
gc.collect()
torch.cuda.empty_cache()

## Test Dataset for Full Inference Pipeline for Evaluation

- Contains 14 labeled review samples with expected outputs for fake review detection, product category classification, sentiment analysis, and reply generation.
- Each entry includes:
  - `review`: User review text.
  - `input_category`: User-provided product category.
  - `expected_fake`, `expected_category`, `expected_sentiment`, `expected_reply`: Expected outputs for evaluation.
- Out of 14 examples, **10 are valid** (i.e., real and correctly categorized) and passed to the second pipeline stage for sentiment and reply generation.


In [None]:
test_data = [
    {
        "review": "Very satisfied with the product, it is really quite strong. Only bad point is that it does not turn on any light during the Batt's carrageway to know if the load is complete with wonderful rest, I recommend!",
        "input_category": "automotive",
        "expected_fake": "real",
        "expected_category": "automotive",
        "expected_sentiment": "positive",
        "expected_reply": "That's great to hear! Strong and reliable products are always appreciated. Thanks for the positive note!"
    },
    {
        "review": "I really liked I can take out my quiet dog",
        "input_category": "fashion",
        "expected_fake": "real",
        "expected_category": "fashion",
        "expected_sentiment": "positive",
        "expected_reply": "That's great to hear! Thanks for the positive note and your creative use of the product."
    },
    {
        "review": "nice nice nice nice nice",
        "input_category": "fashion",
        "expected_fake": "fake",
        "expected_category": "fashion",
        "expected_sentiment": "positive",
        "expected_reply": ""
    },
    {
        "review": "These seat covers fit perfectly in my car. They are very easy to install and look great!",
        "input_category": "automotive",
        "expected_fake": "real",
        "expected_category": "automotive",
        "expected_sentiment": "positive",
        "expected_reply": "Perfect fit is always a good sign! Thanks for the great review and your support."
    },
    {
        "review": "Really high-quality bags, buying worth...",
        "input_category": "fashion",
        "expected_fake": "real",
        "expected_category": "fashion",
        "expected_sentiment": "positive",
        "expected_reply": "That's great to hear! Thanks for the positive note and your support."
    },
    {
        "review": "Order received successfully and faster than expected. The covers are as described. Perfect size and suitable color. Very satisfied with the purchase.",
        "input_category": "electronics",
        "expected_fake": "real",
        "expected_category": "electronics",
        "expected_sentiment": "positive",
        "expected_reply": "That’s always great news—thanks for the note!"
    },
    {
        "review": "I like Xro for my sofa no m served quality fabric",
        "input_category": "home",
        "expected_fake": "real",
        "expected_category": "home",
        "expected_sentiment": "positive",
        "expected_reply": "Thanks for the great feedback! We are glad the quality and fit met your expectations."
    },
    {
        "review": "The FIta or belt is thin because the price is already good.",
        "input_category": "health",
        "expected_fake": "real",
        "expected_category": "health",
        "expected_sentiment": "neutral",
        "expected_reply": "Thank you for your feedback! We appreciate your input, and we understand that the product didn't meet your expectations. We are always working to improve, so feel free to check out our other models."
    },
    {
        "review": "Delivery Latvia 10 days. Original case, sound, aplication good",
        "input_category": "electronics",
        "expected_fake": "real",
        "expected_category": "electronics",
        "expected_sentiment": "positive",
        "expected_reply": "Thank you for your feedback! We appreciate your input, and we understand that the product didn't meet your full expectations. We are always working to improve, so feel free to check out our other models."
    },
    {
        "review": "The vacuum cleaner broke down after only two weeks. I am very disappointed with the quality.",
        "input_category": "home",
        "expected_fake": "real",
        "expected_category": "home",
        "expected_sentiment": "negative",
        "expected_reply": "We’re sorry for your experience. Please reach out to our support so we can help resolve this issue for you."
    },
    {
        "review": "Good product. Very good. I like. Will buy again. Good product. Very good.",
        "input_category": "electronics",
        "expected_fake": "fake",
        "expected_category": "electronics",
        "expected_sentiment": "positive",
        "expected_reply": ""
    },
    {
        "review": "thank you all the dependable",
        "input_category": "automotive",
        "expected_fake": "fake",
        "expected_category": "automotive",
        "expected_sentiment": "positive",
        "expected_reply": ""
    },
    {
        "review": "super quality and bistra ane shipping",
        "input_category": "health",
        "expected_fake": "fake",
        "expected_category": "health",
        "expected_sentiment": "positive",
        "expected_reply": ""
    },
    {
        "review": "the case is great a the picture the speech speed is also excellent",
        "input_category": "home",
        "expected_fake": "fake",
        "expected_category": "home",
        "expected_sentiment": "positive",
        "expected_reply": ""
    },
]

passed_examples=10

## Step 1 & 2: Fake Review Detection and Category Classification

- Loads and applies fine-tuned Unsloth models for fake review detection and product category classification.
- Immediately filters out reviews predicted as **fake** before proceeding to category classification.
- Classifies remaining reviews into one of five base categories and filters out mismatches with the input category.
- Stores only valid and relevant reviews for sentiment analysis and reply generation in Part 2.


In [78]:
import gc
import torch

# Step 1 & 2 storage
fake_preds, fake_labels = [], []
cat_preds, cat_labels = [], []

# This will hold items that pass both steps for Part 2
passed_samples = []

# ✅ Load models once before loop
model_1, tokenizer_1 = load_unsloth_model("AbuSalehMd/FakeReviewDetection_Mistral_7B_FineTuned")
model_2, tokenizer_2 = load_unsloth_model("AbuSalehMd/ProductCategoryClassificationFinal_Mistral_7B_FineTuned")

print("\n📋 Step 1 & 2 Predictions:")
print("────────────────────────────────────────────────────────")

for idx, item in enumerate(test_data):
    review = item["review"]
    input_category = item["input_category"]

    # Step 1: Fake detection
    fake = predict_fake_review(review, model_1, tokenizer_1)
    fake_preds.append(fake)
    fake_labels.append(item["expected_fake"])

    # If fake, reject immediately
    if fake == "fake":
        print(f"\n🔁 Sample {idx+1}")
        print(f"📝 Review: {review}")
        print(f"🔍 Predicted Fake/Real: {fake} | Expected: {item['expected_fake']}")
        print(f"❌ Filtered out before Part 2 (Fake Review)")
        continue

    # Step 2: Category classification
    cat = predict_category(review, model_2, tokenizer_2)
    cat_preds.append(cat)
    cat_labels.append(item["expected_category"])

    # 📊 Print outcomes
    print(f"\n🔁 Sample {idx+1}")
    print(f"📝 Review: {review}")
    print(f"🔍 Predicted Fake/Real: {fake} | Expected: {item['expected_fake']}")
    print(f"🏷️ Predicted Category:   {cat}  | Input Category: {input_category}")

    # Check if predicted category matches input category
    if cat == input_category:
        passed_samples.append({
            "review": review,
            "category": cat,
            "expected_sentiment": item["expected_sentiment"],
            "expected_reply": item["expected_reply"]
        })
        print("✅ Passed to Part 2")
    else:
        print("❌ Filtered out before Part 2 (Category Mismatch)")

# ✅ Unload both models after loop
del model_1, model_2, tokenizer_1, tokenizer_2
gc.collect()
torch.cuda.empty_cache()


==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

📋 Step 1 & 2 Predictions:
───────────────────────────────────────────────────────

## Step 3 & 4: Sentiment Analysis and Reply Generation

- Loads fine-tuned models for sentiment classification and context-aware reply generation.
- Predicts the **sentiment** (positive, neutral, negative) for each review passed from previous stages.
- Generates a helpful and personalized **reply** using a structured Alpaca-style prompt format.
- Compares predicted sentiment and reply against expected labels for evaluation and inspection.


In [79]:
# Step 3 & 4 storage
sentiment_preds, sentiment_labels = [], []
reply_preds, reply_labels = [], []

# ✅ Load sentiment and reply models ONCE
model_1, tokenizer_1 = load_unsloth_model("AbuSalehMd/Review_Response_Generation_Mistral_7B_FineTuned")
model_2, tokenizer_2 = load_unsloth_model("AbuSalehMd/Sentiment_Analysis_Mistral_7B_FineTuned")

print("\n📋 Step 3 & 4 Predictions (for passed samples):")
print("────────────────────────────────────────────────────────")

for idx, item in enumerate(passed_samples):
    review = item["review"]
    category = item["category"]

    # Step 3: Sentiment
    sentiment = predict_sentiment(review, model_2, tokenizer_2)
    sentiment_preds.append(sentiment)
    sentiment_labels.append(item["expected_sentiment"])

    # Step 4: Reply
    reply = generate_review_reply(review, sentiment, category, model_1, tokenizer_1, alpaca_prompt)
    reply_preds.append(reply)
    reply_labels.append(item["expected_reply"])

    # 📊 Print outcomes
    print(f"\n🔁 Passed Sample {idx+1}")
    print(f"📝 Review: {review}")
    print(f"💬 Predicted Sentiment: {sentiment} | Expected: {item['expected_sentiment']}")
    print(f"📝 Generated Reply: {reply}")
    print(f"✅ Expected Reply:  {item['expected_reply']}")
    print("────────────────────────────────────────────────────────")

# ✅ Unload models after loop
del model_1, model_2, tokenizer_1, tokenizer_2
gc.collect()
torch.cuda.empty_cache()


==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

📋 Step 3 & 4 Predictions (for passed samples):
──────────────────────────────────

## Evaluation and Output Inspection

- Calculates **accuracy** and **F1 score** for fake review detection, product category classification, and sentiment analysis.
- Evaluates reply generation using **ROUGE**, **METEOR**, and **BERTScore** for natural language quality.
- Displays detailed results including predicted sentiment, generated reply, and comparison with expected values.
- Reports how many reviews passed all pipeline stages and the overall correct rate against expected valid samples.


In [80]:
from sklearn.metrics import accuracy_score, f1_score
from evaluate import load

rouge = load("rouge")
meteor = load("meteor")
bertscore = load("bertscore")

print("\n🔍 Fake Review Detection")
print("Accuracy:", accuracy_score(fake_labels, fake_preds))
print("F1 Score:", f1_score(fake_labels, fake_preds, pos_label="fake"))

print("\n🏷️ Category Classification")
print("Accuracy:", accuracy_score(cat_labels, cat_preds))

if sentiment_preds:
    print("\n💬 Sentiment Classification")
    print("Accuracy:", accuracy_score(sentiment_labels, sentiment_preds))
    print("F1 Score:", f1_score(sentiment_labels, sentiment_preds, average="macro"))

if reply_preds:
    print("\n✍️ Reply Generation")
    print("ROUGE:", rouge.compute(predictions=reply_preds, references=reply_labels))
    print("METEOR:", meteor.compute(predictions=reply_preds, references=reply_labels))
    print("BERTScore:", bertscore.compute(predictions=reply_preds, references=reply_labels, lang="en"))

print(f"\n✅ Full Pipeline Passed: {len(passed_samples)}/{len(test_data)} = {len(passed_samples)/len(test_data):.2%}")
print(f"\n✅ Full Pipeline Correct Rate: {len(passed_samples)}/{passed_examples} = {len(passed_samples)/passed_examples:.2%}")


[nltk_data] Downloading package wordnet to /usr/share/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /usr/share/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /usr/share/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!



🔍 Fake Review Detection
Accuracy: 1.0
F1 Score: 1.0

🏷️ Category Classification
Accuracy: 0.8888888888888888

💬 Sentiment Classification
Accuracy: 0.875
F1 Score: 0.7948717948717948

✍️ Reply Generation
ROUGE: {'rouge1': 1.0, 'rouge2': 1.0, 'rougeL': 1.0, 'rougeLsum': 1.0}
METEOR: {'meteor': 0.950050609449925}


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERTScore: {'precision': [1.000000238418579, 0.9999998807907104, 0.9999998211860657, 0.9999998807907104, 0.937644362449646, 0.9999999403953552, 1.0, 0.9999998211860657], 'recall': [1.000000238418579, 0.9999998807907104, 0.9999998211860657, 0.9999998807907104, 0.978985071182251, 0.9999999403953552, 1.0, 0.9999998211860657], 'f1': [1.000000238418579, 0.9999998807907104, 0.9999998211860657, 0.9999998807907104, 0.9578688740730286, 0.9999999403953552, 1.0, 0.9999998211860657], 'hashcode': 'roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.51.1)'}

✅ Full Pipeline Passed: 8/14 = 57.14%

✅ Full Pipeline Correct Rate: 8/10 = 80.00%


In [81]:
print("\n📋 Inspection of Passed Samples with Predictions:\n" + "-"*50)
for i, sample in enumerate(passed_samples):
    print(f"\n🔁 Review: {sample['review']}")
    print(f"🏷️  Input Category: {sample['category']}")
    print(f"💬 Predicted Sentiment: {sentiment_preds[i]}")
    print(f"💬 Expected Sentiment:  {sentiment_labels[i]}")
    print(f"📝 Generated Reply:     {reply_preds[i]}")
    print(f"✅ Expected Reply:      {reply_labels[i]}")
    print("-" * 50)



📋 Inspection of Passed Samples with Predictions:
--------------------------------------------------

🔁 Review: Very satisfied with the product, it is really quite strong...
🏷️  Input Category: automotive
💬 Predicted Sentiment: positive
💬 Expected Sentiment:  positive
📝 Generated Reply:     That's great to hear! Strong and reliable products are always appreciated. Thanks for the positive note!
✅ Expected Reply:      That's great to hear! Strong and reliable products are always appreciated. Thanks for the positive note!
--------------------------------------------------

🔁 Review: I really liked I can take out my quiet dog
🏷️  Input Category: fashion
💬 Predicted Sentiment: positive
💬 Expected Sentiment:  positive
📝 Generated Reply:     That's great to hear! Thanks for the positive note and your creative use of the product.
✅ Expected Reply:      That's great to hear! Thanks for the positive note and your creative use of the product.
--------------------------------------------------

🔁 

# Full pipeline (Not Worked)

- The device can't load the 4 models in onece.
- It gives error on loading

In [None]:
# Global config
max_seq_length = 2048
dtype = None
load_in_4bit = True
from unsloth import FastLanguageModel
import torch
# Alpaca prompt template for reply generation
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Load model on specific device
def load_unsloth_model(model_name, device):
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        dtype=dtype,
        load_in_4bit=load_in_4bit,
    )
    model.to(device)
    return model, tokenizer


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-04-15 17:58:42.146684: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744739922.562972      31 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744739922.684205      31 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
model_fake, tokenizer_fake = load_unsloth_model("AbuSalehMd/FakeReviewDetection_Mistral_7B_FineTuned", device="cuda:0")
model_category, tokenizer_category = load_unsloth_model("AbuSalehMd/ProductCategoryClassificationFinal_Mistral_7B_FineTuned", device="cuda:0")
model_sentiment, tokenizer_sentiment = load_unsloth_model("AbuSalehMd/Sentiment_Analysis_Mistral_7B_FineTuned", device="cuda:0")
model_reply, tokenizer_reply = load_unsloth_model("AbuSalehMd/Review_Response_Generation_Mistral_7B_FineTuned", device="cuda:0")


In [None]:
# ---- INFERENCE FUNCTIONS ---- #

def predict_fake_review(review, model, tokenizer):
    FastLanguageModel.for_inference(model)
    prompt = f"""
    Determine if the review enclosed in square brackets is real or fake based on its content.
    Return the answer as either "real" or "fake".

    [{review}] =
    """.strip()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1, temperature=0.0)
    answer = tokenizer.decode(outputs[0]).split("=")[-1].strip().lower()
    return "real" if "real" in answer else "fake" if "fake" in answer else "none"

def predict_category(review, model, tokenizer):
    FastLanguageModel.for_inference(model)
    prompt = f"""
    Determine the class if the review enclosed in square brackets is automotive or fashion or home or electronics or health category class based on its content.
    Return the answer as either "automotive" or "fashion" or "home" or "electronics" or "health".

    [{review}] =
    """.strip()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1, temperature=0.0)
    answer = tokenizer.decode(outputs[0]).split("=")[-1].strip().lower()

    if "autom" in answer:
        return "automotive"
    elif "fashion" in answer:
        return "fashion"
    elif "home" in answer:
        return "home"
    elif "electron" in answer:
        return "electronics"
    elif "health" in answer:
        return "health"
    else:
        return "none"

def predict_sentiment(review, model, tokenizer):
    FastLanguageModel.for_inference(model)
    prompt = f"""
    Determine if the review enclosed in square brackets is positive, neutral or negative based on its content.
    Return the answer as either "positive", "neutral" or "negative".

    [{review}] =
    """.strip()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1, temperature=0.0)
    answer = tokenizer.decode(outputs[0]).split("=")[-1].strip().lower()
    return answer if answer in ["positive", "neutral", "negative"] else "none"

def generate_review_reply(review, sentiment, category, model, tokenizer, prompt_template, device):
    FastLanguageModel.for_inference(model)
    prompt = prompt_template.format(
        "Generate a helpful and context-aware reply based on the review, sentiment, and category.",
        f"Review: {review}\nSentiment: {sentiment}\nCategory: {category}",
        ""
    )
    inputs = tokenizer([prompt], return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded.split("### Response:")[-1].strip() if "### Response:" in decoded else decoded.strip()


In [None]:
# ---- MAIN PIPELINE ---- #

def full_pipeline(review, input_category):
    print(f"🔍 Review: {review}\n📦 Input Category: {input_category}")

    # Step 1: Fake Review Detection
    fake_result = predict_fake_review(review, model_fake, tokenizer_fake)
    print("🕵️ Fake Review Detection:", fake_result)

    if fake_result == "fake":
        return "❌ Detected as fake review."

    # Step 2: Category Classification
    category = predict_category(review, model_category, tokenizer_category)
    print("🏷️ Predicted Category:", category)

    if category != input_category.lower():
        return "⚠️ Irrelevant category."

    # Step 3: Sentiment Classification
    sentiment = predict_sentiment(review, model_sentiment, tokenizer_sentiment)
    print("💬 Sentiment:", sentiment)

    # Step 4: Generate Reply
    reply = generate_review_reply(review, sentiment, category, model_reply, tokenizer_reply, alpaca_prompt, device="cuda:0")
    print("✍️ Generated Reply:")
    return reply


In [None]:
# ---- EXAMPLE USAGE ---- #

review = "I do not understand the enthusiastic reviews on the basis of which I ordered this bag. Looks very cheap. As the skin of a young dermatine is said."
input_category = "health"

output = full_pipeline(review, input_category)
print(output)

🔍 Review: I do not understand the enthusiastic reviews on the basis of which I ordered this bag. Looks very cheap. As the skin of a young dermatine is said.
📦 Input Category: health
==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 14.12 MiB is free. Process 6397 has 14.72 GiB memory in use. Of the allocated memory 14.57 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)