# Personalized Shopping Assistant with Gemini 2.0 Flash & Gradio (Kaggle 2023 Amazon Dataset Version)

## Step 1: Upload Google Cloud Service Account Key

### 🔑 Service Account Upload
This cell uploads a Google Cloud service account key file through the Colab file picker and points the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at it, so that later Google Cloud client calls can authenticate.

In [1]:
from google.colab import files
import os

uploaded = files.upload()
service_account_path = list(uploaded.keys())[0]
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = service_account_path
print("Service account uploaded.")

Saving shopping-assistant-llm-26f4ad1d2bd4.json to shopping-assistant-llm-26f4ad1d2bd4.json
Service account uploaded.
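Before relying on the uploaded file, it can help to confirm that it looks like a service account key. The sketch below is an illustration only: the helper name `looks_like_service_account_key` is hypothetical, and it checks just a few fields that service account key files normally contain.

```python
import json

# Fields that a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def looks_like_service_account_key(path):
    """Return True if the JSON file at `path` has the expected key fields."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, json.JSONDecodeError):
        return False  # unreadable file or invalid JSON
    if not isinstance(data, dict):
        return False
    return data.get("type") == "service_account" and REQUIRED_FIELDS.issubset(data)
```

In the notebook this could be called as `looks_like_service_account_key(service_account_path)` right after the upload.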


## Step 2: Install Required Packages

### 🛠️ Install Required Packages

This cell ensures all required Python packages are installed in the Colab environment. It installs `google-cloud-aiplatform` for Vertex AI access, `faiss-cpu` for vector search, `sentence-transformers` for embeddings, `gradio` for the UI, and `datasets` for handling data.

In [2]:
!pip install -q --upgrade google-cloud-aiplatform faiss-cpu sentence-transformers gradio datasets

## Step 3: Import Libraries

### 📦 Library Imports
This section imports the libraries used throughout the notebook: the Vertex AI SDK for Gemini access, Sentence Transformers and FAISS for embeddings and vector search, pandas and NumPy for data manipulation, Gradio for the user interface, and `datasets` for data loading.

In [3]:
from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel
from sentence_transformers import SentenceTransformer
import faiss
import pandas as pd
import numpy as np
import gradio as gr
import datetime
from datasets import load_dataset

## Step 4: Initialize Vertex AI

### ☁️ Vertex AI Initialization

Initializes Google Cloud Vertex AI using the specified project ID and region. This setup is required for connecting to the Gemini Flash model and other Vertex AI services.

In [4]:
aiplatform.init(project="shopping-assistant-llm", location="us-central1")
print("Vertex AI initialized.")

Vertex AI initialized.


## Step 5: Load Amazon Dataset from Kaggle

### 📥 Dataset Download
This cell installs `kagglehub`, downloads the `karkavelrajaj/amazon-sales-dataset` dataset, lists the downloaded files, and reads `amazon.csv` into a pandas DataFrame for a first preview.

In [5]:

# Step 1: Install kagglehub if not already installed
!pip install -q kagglehub

# Step 2: Import the package
import kagglehub
import os
import pandas as pd

# Step 3: Download the dataset
path = kagglehub.dataset_download("karkavelrajaj/amazon-sales-dataset")
print("Path to dataset files:", path)

# Step 4: List all files to identify the correct CSV file
print("Files in dataset folder:", os.listdir(path))

# Step 5: Read the CSV file (adjust file name if different)
csv_file = os.path.join(path, "amazon.csv")  # Replace with actual file name from Step 4 if needed
amazon_df = pd.read_csv(csv_file)

# Step 6: View a sample
print("📦 Dataset Preview:")
amazon_df.head()


Path to dataset files: /kaggle/input/amazon-sales-dataset
Files in dataset folder: ['amazon.csv']
📦 Dataset Preview:


Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...
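The download cell hard-codes `amazon.csv` and asks the reader to adjust the name if the listing differs. As a small sketch of one way to avoid that manual step, the hypothetical helper `find_csv_files` below (not part of `kagglehub`) collects every CSV file in the downloaded folder.

```python
import os

def find_csv_files(folder):
    """Return the sorted paths of all .csv files directly inside `folder`."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".csv")
    )
```

With the variables from the cell above, `csv_file = find_csv_files(path)[0]` would select the only CSV file in this dataset.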


### 📏 Checking Dataset Dimensions

Displays the shape (rows, columns) of the loaded dataset to verify it was loaded correctly and contains sufficient data for analysis.

In [6]:
amazon_df.shape

(1465, 16)

### 🔍 Inspect Dataset Structure and Types

Uses `.info()` to summarize the dataset’s column names, types, and non-null counts. This helps validate the integrity of the raw product dataset.

In [7]:
amazon_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1465 non-null   object
 1   product_name         1465 non-null   object
 2   category             1465 non-null   object
 3   discounted_price     1465 non-null   object
 4   actual_price         1465 non-null   object
 5   discount_percentage  1465 non-null   object
 6   rating               1465 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1465 non-null   object
 9   user_id              1465 non-null   object
 10  user_name            1465 non-null   object
 11  review_id            1465 non-null   object
 12  review_title         1465 non-null   object
 13  review_content       1465 non-null   object
 14  img_link             1465 non-null   object
 15  product_link         1465 non-null   object
dtypes: obj

### 🧾 Listing Column Names

Prints the original column names of the dataset to guide which fields are kept, renamed, or transformed during preprocessing.

In [8]:
amazon_df.columns

Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'user_id', 'user_name', 'review_id', 'review_title',
       'review_content', 'img_link', 'product_link'],
      dtype='object')

### 🧹 Data Cleaning and Column Renaming

Selects and retains relevant columns for recommendation logic, such as `product_name`, `category`, `about_product`, `review_content`, etc. Then renames them to simpler forms like `product`, `description`, and `price`. Also normalizes the category to lowercase and resets the index.

In [9]:
# Keep and rename relevant columns
amazon_df = amazon_df[['product_name', 'category', 'about_product', 'rating', 'actual_price', 'review_content']].dropna()
# 'rating' and 'category' already have the desired names, so they are not listed here
amazon_df.rename(columns={
    'product_name': 'product',
    'about_product': 'description',
    'actual_price': 'price',
    'review_content': 'review'
}, inplace=True)

# Normalize category
amazon_df['category'] = amazon_df['category'].str.lower()
amazon_df.reset_index(drop=True, inplace=True)
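After this step the `price` column still holds strings such as `₹1,099`, as the preview and the `.info()` output show. If numeric prices are needed (for example to filter by budget), a conversion along the following lines could be applied. This is a sketch: `parse_rupees` is a hypothetical helper, and it assumes every value follows the `₹` plus thousands-separator pattern seen in the preview.

```python
def parse_rupees(text):
    """Convert a price string such as '₹1,099' to a float (1099.0)."""
    cleaned = str(text).replace("₹", "").replace(",", "").strip()
    return float(cleaned)
```

For example, `amazon_df['price'].map(parse_rupees)` would produce a numeric series that could be stored in a separate column, leaving the display string untouched.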

### 🏷️ Unique Category Extraction

Displays all unique product categories in the dataset. This is useful to understand the scope of product domains available for personalization and filtering.

In [10]:
unique_values_category = amazon_df['category'].unique()
print(unique_values_category)

['computers&accessories|accessories&peripherals|cables&accessories|cables|usbcables'
 'computers&accessories|networkingdevices|networkadapters|wirelessusbadapters'
 'electronics|hometheater,tv&video|accessories|cables|hdmicables'
 'electronics|hometheater,tv&video|televisions|smarttelevisions'
 'electronics|hometheater,tv&video|accessories|remotecontrols'
 'electronics|hometheater,tv&video|televisions|standardtelevisions'
 'electronics|hometheater,tv&video|accessories|tvmounts,stands&turntables|tvwall&ceilingmounts'
 'electronics|hometheater,tv&video|accessories|cables|rcacables'
 'electronics|homeaudio|accessories|speakeraccessories|mounts'
 'electronics|hometheater,tv&video|accessories|cables|opticalcables'
 'electronics|hometheater,tv&video|projectors'
 'electronics|homeaudio|accessories|adapters'
 'electronics|hometheater,tv&video|satelliteequipment|satellitereceivers'
 'computers&accessories|accessories&peripherals|cables&accessories|cables|dvicables'
 'electronics|hometheater,tv&

##### Keeping the first string as the category

### 🧪 Category Normalization (Handling Pipe Separator)

Cleans multi-valued categories by splitting at the `|` symbol and keeping the first value. Trims whitespace and prints cleaned unique categories to ensure consistency.

In [11]:
# Apply the split operation and take the first element
amazon_df['category'] = amazon_df['category'].str.split('|').str[0]

# Optional: Remove leading/trailing whitespace after splitting
amazon_df['category'] = amazon_df['category'].str.strip()

# Optional: Print unique values again to verify
print("Unique categories after splitting:")
print(amazon_df['category'].unique())

Unique categories after splitting:
['computers&accessories' 'electronics' 'musicalinstruments'
 'officeproducts' 'home&kitchen' 'homeimprovement' 'toys&games'
 'car&motorbike' 'health&personalcare']


### ✅ Defining Allowed Product Categories

Lists a fixed set of valid product categories for the shopping assistant. This helps avoid hallucinations and ensures recommendations stay within known bounds.

In [12]:
allowed_categories = ['computers&accessories','electronics','musicalinstruments',
 'officeproducts','home&kitchen','homeimprovement','toys&games','car&motorbike',
 'health&personalcare'
 ]

## Step 6: Initialize Sentence Transformer and Category Embeddings

### 🧠 Category Embedding & Similarity Match

Uses the `SentenceTransformer` model to embed allowed categories and compute cosine similarity with the query. This helps map vague or misspelled inputs to a known category using semantic similarity.

In [13]:
model = SentenceTransformer('all-MiniLM-L6-v2')
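# Normalize embeddings so that the dot product below equals cosine similarity
category_embeddings = model.encode(allowed_categories, normalize_embeddings=True)

def closest_category(query):
    query_emb = model.encode([query], normalize_embeddings=True)[0]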
    scores = np.dot(category_embeddings, query_emb)
    best_index = int(np.argmax(scores))
    return allowed_categories[best_index]
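
`closest_category` scores categories with a dot product, which matches cosine similarity only when the vectors have unit length. The sketch below uses made-up 2-D vectors (not real model output) to show how the two scores can disagree when one vector has a large norm; `cosine_scores` is a hypothetical helper.

```python
import numpy as np

def cosine_scores(matrix, vector):
    """Cosine similarity between each row of `matrix` and `vector`."""
    matrix = np.asarray(matrix, dtype=float)
    vector = np.asarray(vector, dtype=float)
    row_norms = np.linalg.norm(matrix, axis=1)
    return (matrix @ vector) / (row_norms * np.linalg.norm(vector))

# Toy "category embeddings"; the third row has a much larger norm.
category_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
query_vector = np.array([1.0, 0.1])

# The raw dot product favours the long vector (index 2) ...
raw_best = int(np.argmax(category_vectors @ query_vector))
# ... while cosine similarity picks the best-aligned direction (index 0).
cosine_best = int(np.argmax(cosine_scores(category_vectors, query_vector)))
```

When embeddings are normalized to unit length, both rankings coincide.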


## Step 7: Simulate User Preferences Based on Purchase History

### 📋 Defining Sample User History

Creates a sample user preference DataFrame with previous purchases. Extracts and ranks preferred categories based on frequency to simulate personalization.

In [14]:
user_history_df = pd.DataFrame({
    "user_id": [1, 1, 1],
    "product": ["smartwatches", "juicers", "keyboards"],
    "category": ["electronics", "home&kitchen", "computers&accessories"]
})

preferred_categories = user_history_df['category'].value_counts().index.tolist()
preference_summary = ", ".join(preferred_categories)
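
In the sample history above every category appears once, so the frequency ranking is decided by ties. The following sketch uses a hypothetical, longer purchase history to show how `value_counts()` orders categories when the counts differ.

```python
import pandas as pd

# Hypothetical purchase history with repeated categories.
history = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1],
    "category": ["electronics", "home&kitchen", "electronics",
                 "computers&accessories", "electronics", "home&kitchen"],
})

# value_counts() sorts by frequency in descending order.
ranked = history["category"].value_counts().index.tolist()
summary = ", ".join(ranked)
```

Here `electronics` (3 purchases) ranks ahead of `home&kitchen` (2) and `computers&accessories` (1).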

## Step 8: FAISS Search with Embedding of Static Dataset

### 🔎 Product Similarity Search with FAISS

Encodes product text (title + description), builds a FAISS index, and retrieves top-K similar products based on the user query. This allows efficient vector-based retrieval.

In [15]:
def retrieve_similar_products(user_query, product_df, k=5):
    product_texts = (product_df['product'] + " " + product_df['description']).tolist()
    product_embeddings = model.encode(product_texts).astype("float32")

    dim = product_embeddings.shape[1]
    index = faiss.IndexFlatL2(dim)
    index.add(product_embeddings)

    query_embedding = model.encode([user_query]).astype("float32")
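    # Cap k at the number of indexed products; FAISS pads missing results with -1,
    # which .iloc would silently interpret as "last row"
    k = min(k, len(product_df))
    distances, indices = index.search(query_embedding, k)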

    return product_df.iloc[indices[0]]
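
`faiss.IndexFlatL2` performs an exact search ranked by squared L2 distance. To make that concrete without FAISS, the sketch below reproduces the same ranking by brute force in NumPy on toy vectors; `brute_force_l2_search` is a hypothetical helper, not a FAISS function.

```python
import numpy as np

def brute_force_l2_search(vectors, query, k):
    """Return (squared distances, indices) of the k nearest rows to `query`."""
    vectors = np.asarray(vectors, dtype="float32")
    query = np.asarray(query, dtype="float32")
    k = min(k, len(vectors))  # never ask for more neighbours than exist
    squared = np.sum((vectors - query) ** 2, axis=1)
    order = np.argsort(squared)[:k]
    return squared[order], order

# Four toy 2-D "product embeddings" and one query vector.
vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
distances, indices = brute_force_l2_search(vectors, [0.9, 0.1], k=2)
```

The nearest row is index 1, followed by index 0. Note that the function above rebuilds the embeddings and index on every call; for a larger catalogue, building them once and reusing them would be the usual choice.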

## Step 9: Gemini Response with Personalization + Category Check

### 🧠 Generate Gemini Flash Prompt with Context

Creates a prompt for Gemini using user preferences and retrieved products. The prompt is designed to guide the LLM in generating helpful, tailored shopping advice.

In [16]:
gemini = GenerativeModel(model_name="gemini-2.0-flash-lite-001")

def generate_gemini_flash_response(user_query, retrieved_df):
    # Prices in the dataset already include the ₹ symbol, so no currency prefix is added
    product_info = "\n".join([
        f"- {row['product']} ({row['category']}): {row['price']}, Rating: {row.get('rating', 'NA')}"
        for _, row in retrieved_df.iterrows()
    ])

    user_profile_context = f"User has shown interest in: {preference_summary}"

    prompt = f"""
You are a helpful shopping assistant. Use the user's past preferences to guide your recommendations.

{user_profile_context}

User Query: "{user_query}"

Matching Products:
{product_info}

Please recommend the best product options with brief reasoning.
"""

    response = gemini.generate_content(prompt)
    return response.text
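
To inspect what the model actually receives, the product list can be rendered without calling Gemini. The sketch below uses made-up rows and a hypothetical helper `format_product_lines`; the price strings carry the ₹ symbol as in the dataset, so no currency prefix is added.

```python
import pandas as pd

def format_product_lines(retrieved_df):
    """Render one bullet line per retrieved product for the prompt."""
    return "\n".join(
        f"- {row['product']} ({row['category']}): {row['price']}, "
        f"Rating: {row.get('rating', 'NA')}"
        for _, row in retrieved_df.iterrows()
    )

# Made-up rows in the same shape as the cleaned dataset.
sample = pd.DataFrame({
    "product": ["Example Juicer", "Example Blender"],
    "category": ["home&kitchen", "home&kitchen"],
    "price": ["₹1,299", "₹2,499"],
    "rating": ["4.1", "3.9"],
})
product_info = format_product_lines(sample)
```

Printing `product_info` before sending the prompt is a quick way to catch formatting problems in the retrieved rows.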

## Step 10: Gradio Web App with Feedback Collection for Evaluation

### 💬 Assistant Interface Logic
This function handles the full query-response loop: it validates the input, maps the query to a category, retrieves similar products, invokes Gemini for the recommendation text, optionally logs user feedback, and returns the product table, the reply, and the list of matched product titles. Evaluation metrics such as Precision@k and NDCG@k are computed in a later cell, not inside this function.

`shopping_assistant_interface()` serves as the backend logic for the Gradio-powered shopping assistant. It takes a user query and, optionally, feedback. First, it uses semantic category detection via `closest_category()` to map the query to a known product category. It then filters the dataset to that category and retrieves similar products using FAISS-based vector search, which compares dense embeddings of the query with embeddings of the product names and descriptions.

Next, it generates a response using Gemini Flash, summarizing and recommending items based on both the query and filtered products. The model output is parsed using regex and fuzzy matching (get_close_matches) to map generated suggestions back to actual product titles from the dataset. The final matched titles are stored for evaluation and feedback logging. If user feedback is provided, it’s appended to a CSV log with timestamp and query metadata. The Gradio interface then presents the matched product table, Gemini’s recommendation text, and (optionally) the internal list of matched product titles.
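
The title-matching step described above can be sketched in isolation. The example assumes the model marks product names in bold (`**...**`), which is only the first of the two patterns the function tries; the catalogue, the reply text, and the helper name `match_titles` are made up for illustration.

```python
import re
from difflib import get_close_matches

def match_titles(text, available_titles, cutoff=0.6):
    """Extract **bold** spans from `text` and map them to known titles."""
    candidates = re.findall(r"\*\*(.*?)\*\*", text)
    matched = []
    for candidate in candidates:
        # Keep only the single closest title above the similarity cutoff.
        close = get_close_matches(candidate, available_titles, n=1, cutoff=cutoff)
        if close:
            matched.append(close[0])
    return matched

catalog = ["Example Cold Press Juicer 500W", "Example USB-C Cable 1m"]
reply = "I suggest the **Example Cold Press Juicer** for daily use."
matches = match_titles(reply, catalog)
```

The shortened name in the reply is mapped back to the full catalogue title. A lower `cutoff` tolerates more paraphrasing but also raises the risk of matching the wrong product.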

In [17]:

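import re
import traceback
from difflib import get_close_matches

def shopping_assistant_interface(user_query, feedback=None):
    try:
        if not user_query or not user_query.strip():
            # Always return three values so direct callers can unpack the result
            return pd.DataFrame(), "❗ Please enter a valid product query.", []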

        # Assuming closest_category, retrieve_similar_products, generate_gemini_flash_response are defined elsewhere
        category = closest_category(user_query)
        print(f"Determined category: {category}") # Diagnostic print

        # Check if the determined category exists in the dataframe
        # This check might be redundant if closest_category only returns from allowed_categories,
        # but keeping it for safety if allowed_categories is a subset of df categories.
        if category not in amazon_df['category'].unique():
             print(f"Determined category '{category}' not found in amazon_df['category']. Unique categories in df: {amazon_df['category'].unique()}")

        # Filtering the dataframe based on category determined by semantic search
        filtered_df = amazon_df[amazon_df['category'].str.contains(category, case=False, na=False)].copy()

        if filtered_df.empty:
            print(f"Filtering by category '{category}' resulted in an empty DataFrame.") # Diagnostic print
            return pd.DataFrame(), f"❗ No products found in category '{category}'.", []

        # Retrieve similar products using FAISS on the filtered dataframe
        retrieved_df = retrieve_similar_products(user_query, filtered_df)

        if retrieved_df.empty:
            return pd.DataFrame(), "❗ No similar products found.", []

        # Generate Gemini response based on retrieved products and user preference
        recommendation_text = generate_gemini_flash_response(user_query, retrieved_df)

        # --- Start: Extracting product titles from Gemini's output and matching ---
        recommended_product_titles_list = []
        if isinstance(recommendation_text, str):
            # Attempt to find titles formatted like **Title**
            titles = re.findall(r"\*\*(.*?)\*\*", recommendation_text)
            if not titles:
                 # If not found, attempt to find titles formatted like * Title: or • Title:
                 titles = re.findall(r"[*•]\s+(.*?)(?::|\n)", recommendation_text)

            available_titles = amazon_df['product'].dropna().unique().tolist()

            for title in titles:
                # Use get_close_matches to handle potential variations in Gemini's output
                close_matches = get_close_matches(title, available_titles, n=1, cutoff=0.6)
                if close_matches:
                    # Store the matched title in the desired format for later use
                    recommended_product_titles_list.append({'title': close_matches[0]})

            if recommended_product_titles_list:
                 print("✅ Matched Gemini output to actual product titles.")
            else:
                 print("⚠️ Could not match Gemini output to actual product titles.")

        # --- End: Extracting product titles ---

        # Prepare the table output for Gradio
        table = retrieved_df[['product', 'category', 'price', 'rating']]

        # Log feedback (optional based on user interaction)
        if feedback:
            log_entry = pd.DataFrame([{
                "timestamp": datetime.datetime.now().isoformat(),
                "query": user_query,
                "category": category,
                "feedback": feedback,
                "recommendation": recommendation_text,
                "matched_recommended_products": str(recommended_product_titles_list) # Log the matched titles
            }])
            # Write the header only when the log file does not exist yet
            log_path = "recommendation_feedback_log.csv"
            log_entry.to_csv(log_path, mode="a", index=False, header=not os.path.exists(log_path))

        # Return the table and the full recommendation text
        return table, recommendation_text, recommended_product_titles_list # Also return the processed list of recommended products

    except Exception as e:
        return pd.DataFrame(), f"⚠️ Error:\n{traceback.format_exc()}", [] # Return empty list for recommended products on error


iface = gr.Interface(
    fn=shopping_assistant_interface,
    inputs=[
        gr.Textbox(lines=2, placeholder="e.g. Looking for affordable fitness smartwatch"),
        gr.Radio(["Very helpful", "Helpful", "Not helpful"], label="Was this recommendation helpful?")
    ],
    outputs=[
        gr.Dataframe(label="Top Matching Products"),
        gr.Textbox(label="Gemini's Recommendation"),
        #gr.Variable(label="Matched Recommended Product Titles") # Add an output to capture the processed list
    ],
    title="🛍️ Personalized Shopping Assistant",
    description="Enter your product preference and get a Gemini-powered recommendation in real-time."
)

iface.launch(share=True)

# Note: Gradio does not expose the function's return values as global variables.
# The evaluation cell below therefore calls shopping_assistant_interface()
# directly to obtain 'recommended_product_titles_list'.

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://afedaa7ff388d0a9b8.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
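
The feedback logging inside the interface function appends one row per submission and writes the header only when the log file is new. A standalone sketch of that pattern follows; it uses the standard `csv` module rather than pandas, and the helper name `append_feedback` is hypothetical.

```python
import csv
import os

def append_feedback(path, row):
    """Append `row` (a dict) to a CSV file, writing the header only once."""
    is_new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row.keys()))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)
```

Calling it twice with the same keys yields one header line followed by two data rows.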




## Step 11: Evaluate Gemini Recommendations

This code powers a personalized shopping assistant by combining multiple types of similarity matching to enhance product recommendations. It uses semantic similarity via `SentenceTransformer` to embed product titles and descriptions, enabling FAISS to perform efficient vector-based nearest-neighbor search. This helps identify conceptually similar products even if exact words don't match. The assistant also applies fuzzy string matching using `difflib.get_close_matches()` to align the Gemini model's free-text outputs with real product titles from the dataset.

A key part of the personalization logic involves the use of user-preferred categories, which are inferred from the user’s past purchases. These categories form the ground truth for evaluating how relevant the recommended products are. The evaluation step compares the top-k Gemini-suggested products against those preferred categories using metrics like Precision@k, NDCG@k, and Hit@k, providing quantitative feedback on recommendation quality.


In [20]:
# 🧪 Evaluation Metrics for Gemini Recommendations (Safe & Aligned with Notebook)

import numpy as np
import re # Ensure re and difflib are imported for the evaluation cell
from difflib import get_close_matches
import traceback # Import traceback for the error handling in the function

# --- Metric functions ---
def precision_at_k(y_true, y_pred, k):
    relevant_items = set(y_true)
    retrieved_items = y_pred[:k]
    return len(set(retrieved_items) & relevant_items) / k if retrieved_items else 0.0

def ndcg_at_k(y_true, y_pred, k):
    dcg = sum(1 / np.log2(i + 2) for i, item in enumerate(y_pred[:k]) if item in y_true)
    ideal_dcg = sum(1 / np.log2(i + 2) for i in range(min(len(y_true), k)))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0

def hit_at_k(y_true, y_pred, k):
    return int(any(item in y_true for item in y_pred[:k]))

# --- Gradio Interface Function (Keep the return values) ---
def shopping_assistant_interface(user_query, feedback=None):
    try:
        if not user_query or not user_query.strip():
            # Return empty outputs including the list on invalid query
            return pd.DataFrame(), "❗ Please enter a valid product query.", []

        category = closest_category(user_query)
        print(f"Determined category: {category}")

        # Check if the determined category exists in the dataframe
        if category not in amazon_df['category'].unique():
             print(f"Determined category '{category}' not found in amazon_df['category']. Unique categories in df: {amazon_df['category'].unique()}")

        filtered_df = amazon_df[amazon_df['category'].str.contains(category, case=False, na=False)].copy()

        if filtered_df.empty:
            # Return empty outputs including the list if no products found in category
            print(f"Filtering by category '{category}' resulted in an empty DataFrame.")
            return pd.DataFrame(), f"❗ No products found in category '{category}'.", []

        retrieved_df = retrieve_similar_products(user_query, filtered_df)

        if retrieved_df.empty:
            # Return empty outputs including the list if no similar products found
            return pd.DataFrame(), "❗ No similar products found.", []

        recommendation_text = generate_gemini_flash_response(user_query, retrieved_df)

        recommended_product_titles_list = []
        if isinstance(recommendation_text, str):
            titles = re.findall(r"\*\*(.*?)\*\*", recommendation_text)
            if not titles:
                 titles = re.findall(r"[*•]\s+(.*?)(?::|\n)", recommendation_text)

            available_titles = amazon_df['product'].dropna().unique().tolist()

            for title in titles:
                close_matches = get_close_matches(title, available_titles, n=1, cutoff=0.6)
                if close_matches:
                    recommended_product_titles_list.append({'title': close_matches[0]})

            if recommended_product_titles_list:
                 print("✅ Matched Gemini output to actual product titles.")
            else:
                 print("⚠️ Could not match Gemini output to actual product titles.")

        table = retrieved_df[['product', 'category', 'price', 'rating']]

        # Log feedback (optional based on user interaction)
        if feedback:
            log_entry = pd.DataFrame([{
                "timestamp": datetime.datetime.now().isoformat(),
                "query": user_query,
                "category": category,
                "feedback": feedback,
                "recommendation": recommendation_text,
                "matched_recommended_products": str(recommended_product_titles_list)
            }])
            if not os.path.exists("recommendation_feedback_log.csv"):
                log_entry.to_csv("recommendation_feedback_log.csv", mode="a", index=False, header=True)
            else:
                 log_entry.to_csv("recommendation_feedback_log.csv", mode="a", index=False, header=False)

        # Return all three outputs: table, text, and the list
        return table, recommendation_text, recommended_product_titles_list

    except Exception as e:
        # Return empty outputs including the list on error
        return pd.DataFrame(), f"⚠️ Error:\n{traceback.format_exc()}", []

# --- Gradio Interface Definition (Remove gr.Variable output) ---
# NOTE: This interface definition remains the same as in the original code cell
# where it's defined and launched. We are just illustrating the corrected definition.
# You should apply this change to the Gradio interface definition block in your notebook.
"""
iface = gr.Interface(
    fn=shopping_assistant_interface,
    inputs=[
        gr.Textbox(lines=2, placeholder="e.g. Looking for affordable fitness smartwatch"),
        gr.Radio(["Very helpful", "Helpful", "Not helpful"], label="Was this recommendation helpful?")
    ],
    outputs=[
        gr.Dataframe(label="Top Matching Products"),
        gr.Textbox(label="Gemini's Recommendation")
        # Removed gr.Variable as it's not a valid component for display
    ],
    title="🛍️ Personalized Shopping Assistant",
    description="Enter your product preference and get a Gemini-powered recommendation in real-time."
)

# iface.launch(share=True) # Keep the launch command in your notebook
"""

# --- Evaluation Code (Call the function directly to get the data) ---
# Instead of relying on Gradio to make a variable available globally,
# call the function directly with a sample query to populate the variables needed for evaluation.
# You can use a sample query that you'd typically use in the Gradio interface.

try:
    # Define a sample query for evaluation
    sample_query = "Looking for a kitchen juicers below ₹10,000 " # Or any other relevant query
    #sample_query = "Looking for an affordable adapter " # Or any other relevant query
    #sample_query = "Looking for a keyboard " # Or any other relevant query

    # Call the shopping assistant function directly to get the outputs
    # The function now returns three values: table_df, recommendation_text, and the list of titles
    _, _, recommended_product_titles_list = shopping_assistant_interface(sample_query)

    # Check required variables populated by the function call
    if 'amazon_df' not in globals():
        raise ValueError("❗ 'amazon_df' is missing.")

    if 'preferred_categories' not in globals():
        raise ValueError("❗ 'preferred_categories' (user interest) not defined.")

    # Ensure the variable holding the processed recommended titles is populated and not empty
    if not recommended_product_titles_list:
        raise ValueError(f"❗ 'recommended_product_titles_list' is empty after calling the function with query '{sample_query}'. Check the function's output for this query.")

    # Build relevance ground truth: products matching user's preferred categories
    true_relevant_products = amazon_df[
        amazon_df['category'].isin(preferred_categories)
    ]['product'].dropna().unique().tolist()

    # Get titles from the processed list
    retrieved_products = [p['title'] for p in recommended_product_titles_list if 'title' in p]

    k = 5
    print("\n🔍 Evaluation Results Based on Gemini Recommendations:")
    print(f"Precision@{k}: {precision_at_k(true_relevant_products, retrieved_products, k):.2f}")
    print(f"NDCG@{k}: {ndcg_at_k(true_relevant_products, retrieved_products, k):.2f}")
    print(f"Hit@{k}: {hit_at_k(true_relevant_products, retrieved_products, k)}")

    print("\n📝 Explanation:")
    print("- Precision@k: % of top-k recommended products that match user's interests.")
    print("- NDCG@k: Rewards highly ranked relevant products more.")
    print("- Hit@k: 1 if any relevant product is found in top-k, else 0.")

except Exception as e:
    print(f"❗ Evaluation could not be performed: {e}")

Determined category: home&kitchen
✅ Matched Gemini output to actual product titles.

🔍 Evaluation Results Based on Gemini Recommendations:
Precision@5: 0.60
NDCG@5: 0.72
Hit@5: 1

📝 Explanation:
- Precision@k: % of top-k recommended products that match user's interests.
- NDCG@k: Rewards highly ranked relevant products more.
- Hit@k: 1 if any relevant product is found in top-k, else 0.
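
To make the three metrics concrete, here is a toy calculation on made-up item labels. The functions mirror the binary-relevance definitions used in the evaluation cell but rely only on the standard library.

```python
import math

def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k slots filled with relevant items."""
    top = ranked[:k]
    return len(set(top) & set(relevant)) / k if top else 0.0

def ndcg_at_k(relevant, ranked, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(1 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def hit_at_k(relevant, ranked, k):
    """1 if at least one relevant item appears in the top k, else 0."""
    return int(any(item in relevant for item in ranked[:k]))

relevant = {"a", "c"}          # ground-truth relevant items
ranked = ["a", "b", "c", "d"]  # recommendation order
k = 3

p = precision_at_k(relevant, ranked, k)  # 2 of 3 slots are relevant
n = ndcg_at_k(relevant, ranked, k)       # (1 + 1/2) / (1 + 1/log2(3))
h = hit_at_k(relevant, ranked, k)
```

Here Precision@3 is 2/3, NDCG@3 is about 0.92 because the second relevant item sits at rank 3 instead of rank 2, and Hit@3 is 1.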


### 🧪 Debug: Filter Products by Keyword (Optional)

A temporary debug section to test filtering of rows containing specific keywords (e.g., “juicer”) in the product name. This helps inspect dataset content manually.

In [19]:
#amazon_df.columns
#amazon_df.tail()

# Example: Filter rows where 'product' column contains the word "juicer"
#filtered_dft = amazon_df[amazon_df['product'].str.contains("juicer", case=False, na=False)]

# Print the matching rows
#print(filtered_dft)