# Phone Recommendation System

## Introduction
This notebook builds a phone recommendation system from a dataset of items and reviews. It covers loading the data, preprocessing, building content-based and hybrid recommenders, and evaluating the results.

### Import Required Libraries

In [56]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

### Load the Dataset

In [57]:
# Load the CSV files
items = pd.read_csv("items.csv")
reviews = pd.read_csv("reviews.csv")

# Display the first few rows of the data
print("Shape of Items Data:", items.shape)
print("Items Data Sample:")
print(items.head())

print("\nShape of Reviews Data:", reviews.shape)
print("\nReviews Data Sample:")
print(reviews.head())



## Data Preprocessing
Before building the recommendation system, we drop items with a missing brand, because the brand is part of the text used to compare items.

In [58]:
# Drop items with a missing brand (alternative: fill them with "Unknown")
print("Before dropping NaN items:", items.shape)
# items["brand"] = items["brand"].fillna("Unknown")
items = items.dropna(subset=["brand"])
print("After dropping NaN items:", items.shape)
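The two ways of handling a missing brand behave differently: dropping shrinks the catalog, while filling keeps every item but gives it a placeholder brand. A minimal sketch on a made-up three-row frame (`toy_items` and its values are illustrative, not taken from `items.csv`):

```python
import pandas as pd

# Toy stand-in for items.csv; the values are invented for illustration.
toy_items = pd.DataFrame({
    "asin": ["A1", "A2", "A3"],
    "brand": ["Nokia", None, "Samsung"],
    "title": ["Nokia 3310", "Unbranded flip phone", "Galaxy S7"],
})

# Option 1: drop rows with a missing brand (the approach used in this notebook).
dropped = toy_items.dropna(subset=["brand"])

# Option 2: keep every row and substitute a placeholder brand.
filled = toy_items.fillna({"brand": "Unknown"})

print(len(dropped), len(filled))   # 2 3
print(filled["brand"].tolist())    # ['Nokia', 'Unknown', 'Samsung']
```

Which option is better depends on how many items lack a brand; the null counts of the raw data are a reasonable thing to check first.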



### Combine Text Features

In [59]:
# Combine brand and title into one text field; a missing title is treated as empty
# so that the concatenation does not produce NaN
items["combined_text"] = items["brand"] + " " + items["title"].fillna("")
items



## Feature Extraction
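We use TF-IDF vectorization to convert each item's combined text into a sparse numerical feature vector. Terms that are frequent in one item but rare across the catalog receive the highest weights.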

### Apply TF-IDF Vectorization

In [60]:
# Apply TF-IDF Vectorization
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(items["combined_text"])

### Experiment with Different Feature Extraction Techniques

In [61]:
# Experiment with different feature extraction techniques
from sklearn.feature_extraction.text import CountVectorizer

# Using Count Vectorizer
count_vectorizer = CountVectorizer(stop_words="english")
count_matrix = count_vectorizer.fit_transform(items["combined_text"])

# Using TF-IDF Vectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf_vectorizer.fit_transform(items["combined_text"])

# Choose the feature extraction method (uncomment one)
# feature_matrix = count_matrix
feature_matrix = tfidf_matrix

### Compute Cosine Similarity
We will compute the cosine similarity between the feature vectors to measure the similarity between items.

In [62]:
# Compute Cosine Similarity
cosine_sim = cosine_similarity(feature_matrix, feature_matrix)
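Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on direction rather than magnitude. A small sketch comparing a manual computation with `cosine_similarity` on two hand-picked vectors (the vectors are arbitrary examples):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two arbitrary example vectors, each given as a single-row matrix.
a = np.array([[1.0, 1.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])

# Manual formula: (a . b) / (|a| * |b|)
manual = (a @ b.T) / (np.linalg.norm(a) * np.linalg.norm(b))

# Library call used in this notebook.
lib = cosine_similarity(a, b)

print(manual[0, 0], lib[0, 0])  # both are 1/sqrt(2), roughly 0.7071
```

Note that the full item-by-item matrix grows quadratically with the number of items, which is worth keeping in mind for larger catalogs.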

## Build the Recommendation System
We will build a content-based recommendation system using the cosine similarity matrix.

### Create a Mapping of ASINs to Indices

In [63]:
# Create a mapping of phone ASINs to indices
asin_to_index = {asin: idx for idx, asin in enumerate(items["asin"])}

### Function to Get Recommendations


In [64]:
# Function to get recommendations based on ASIN
def recommend_phones(asin, top_n=5):
    if asin not in asin_to_index:
        return "ASIN not found."

    idx = asin_to_index[asin]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Exclude the query item by position; with duplicate listings it may not sort first
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    # Positions in `items` of the most similar phones
    phone_indices = [i[0] for i in sim_scores]
    return items.iloc[phone_indices][["asin", "brand", "title", "rating"]]
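When two listings have identical text, their similarity is 1.0 and the queried item is not guaranteed to sort first, so dropping only the first sorted entry can leave the query itself in the results. A minimal sketch of excluding the query by position instead (`top_n_similar` and the example scores are hypothetical, introduced only for illustration):

```python
import numpy as np

def top_n_similar(sim_row, self_idx, top_n=5):
    """Return positions of the top_n most similar items, excluding self_idx."""
    # Sort by descending similarity; a stable sort keeps ties in original order.
    order = np.argsort(-sim_row, kind="stable")
    # Remove the queried item wherever it landed in the ranking.
    order = order[order != self_idx]
    return order[:top_n].tolist()

# Item 1 is the query; item 0 is an exact duplicate of it (similarity 1.0).
sim_row = np.array([1.0, 1.0, 0.2, 0.9])
print(top_n_similar(sim_row, self_idx=1, top_n=2))  # [0, 3]
```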

## Hybrid Recommendation System
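We combine content-based and collaborative filtering: the hybrid recommender returns the union of the top items from the cosine similarity matrix and the top items predicted for the user by a truncated SVD of the user-item rating matrix.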

In [65]:
# Rename 'name' column to 'user_id' (reviewer names stand in for user IDs here)
reviews.rename(columns={'name': 'user_id'}, inplace=True)

# Collaborative Filtering using user-item interactions
from sklearn.decomposition import TruncatedSVD

# Create a user-item interaction matrix; pivot_table averages repeated (user, item)
# ratings, whereas pivot raises an error on duplicate pairs
user_item_matrix = reviews.pivot_table(index='user_id', columns='asin', values='rating', aggfunc='mean').fillna(0)

# Apply SVD for collaborative filtering (n_components must be smaller than the number of items)
svd = TruncatedSVD(n_components=50, random_state=42)
user_factors = svd.fit_transform(user_item_matrix)
item_factors = svd.components_.T

# Function to get hybrid recommendations
def hybrid_recommend_phones(user_id, asin, top_n=5):
    if asin not in asin_to_index:
        return "ASIN not found."
    if user_id not in user_item_matrix.index:
        return "User not found."

    # Content-based recommendations (positions in `items`, excluding the query item)
    idx = asin_to_index[asin]
    content_sim_scores = [s for s in enumerate(cosine_sim[idx]) if s[0] != idx]
    content_sim_scores = sorted(content_sim_scores, key=lambda x: x[1], reverse=True)[:top_n]
    content_phone_indices = [i[0] for i in content_sim_scores]

    # Collaborative filtering: predicted ratings, one per column of user_item_matrix
    user_idx = user_item_matrix.index.get_loc(user_id)
    user_ratings = user_factors[user_idx].dot(item_factors.T)
    collab_sim_scores = sorted(enumerate(user_ratings), key=lambda x: x[1], reverse=True)[:top_n]
    # These positions refer to user_item_matrix columns, not to rows of `items`,
    # so map them through the ASIN before indexing into `items`
    collab_asins = [user_item_matrix.columns[i[0]] for i in collab_sim_scores]
    collab_phone_indices = [asin_to_index[a] for a in collab_asins if a in asin_to_index]

    # Combine recommendations, keeping order and removing duplicates
    combined_indices = list(dict.fromkeys(content_phone_indices + collab_phone_indices))
    return items.iloc[combined_indices][["asin", "brand", "title", "rating"]]
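The collaborative part can be checked in isolation on a tiny rating table. The sketch below uses invented users, ASINs, and ratings (`toy_reviews` is an assumption, not data from `reviews.csv`); it builds the matrix with `pivot_table`, fits a two-component SVD, and reads the top predictions back as ASIN labels rather than column positions:

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Invented ratings; u1 rated A1 twice to show how duplicates are averaged.
toy_reviews = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2", "u3", "u3"],
    "asin":    ["A1", "A1", "A2", "A1", "A3", "A2", "A4"],
    "rating":  [5, 4, 3, 4, 5, 2, 4],
})

# Rows are users, columns are ASINs, unrated cells become 0.
toy_matrix = toy_reviews.pivot_table(
    index="user_id", columns="asin", values="rating", aggfunc="mean"
).fillna(0)

toy_svd = TruncatedSVD(n_components=2, random_state=42)
toy_user_factors = toy_svd.fit_transform(toy_matrix)

# Low-rank reconstruction: one predicted score per (user, item) pair.
predicted = toy_user_factors @ toy_svd.components_

# Attach ASIN labels so the ranking is expressed in ASINs, not positions.
scores_u1 = pd.Series(predicted[toy_matrix.index.get_loc("u1")],
                      index=toy_matrix.columns)
top_asins = scores_u1.sort_values(ascending=False).head(2).index.tolist()
print(top_asins)
```

Labeling the scores with ASINs matters because the column order of the rating matrix generally differs from the row order of the items table.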



### Incorporate User Feedback
We incorporate user feedback by writing the new rating into the user-item matrix and refitting the SVD, so that later hybrid recommendations reflect it.

In [None]:
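# Function to incorporate user feedback
def incorporate_feedback(user_id, asin, feedback):
    # Rebind the module-level matrix and factors; without `global` the refit
    # would only create local variables and be discarded on return
    global user_item_matrix, user_factors, item_factors
    if asin not in asin_to_index:
        return "ASIN not found."

    # Update user-item interaction matrix with feedback
    # (an unseen user or item gets a new row or column, filled with 0)
    user_item_matrix.loc[user_id, asin] = feedback
    user_item_matrix = user_item_matrix.fillna(0)

    # Recompute SVD
    user_factors = svd.fit_transform(user_item_matrix)
    item_factors = svd.components_.T

    return "Feedback incorporated successfully."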
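An alternative to updating shared state is to return the refreshed matrix and factors from the function. The sketch below is one possible shape for that, not the notebook's method: `refit_with_feedback` and the toy ratings are hypothetical, and a two-component SVD is used only because the toy matrix has three items.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

def refit_with_feedback(matrix, user_id, asin, rating, n_components=2):
    """Return an updated copy of the rating matrix and freshly fitted factors."""
    updated = matrix.copy()
    # .loc adds a new row or column if the user or item is unseen.
    updated.loc[user_id, asin] = rating
    updated = updated.fillna(0)

    svd = TruncatedSVD(n_components=n_components, random_state=42)
    user_factors = svd.fit_transform(updated)
    item_factors = svd.components_.T
    return updated, user_factors, item_factors

# Invented ratings: rows are users, columns are ASINs.
toy_matrix = pd.DataFrame(
    [[5.0, 3.0, 0.0], [4.0, 0.0, 5.0], [0.0, 2.0, 4.0]],
    index=["u1", "u2", "u3"], columns=["A1", "A2", "A3"],
)

# A new user "u4" rates A2; the original matrix is left untouched.
updated, uf, itf = refit_with_feedback(toy_matrix, "u4", "A2", 5)
print(updated.shape, uf.shape, itf.shape)  # (4, 3) (4, 2) (3, 2)
```

Refitting on every rating is simple but costly; batching several feedback events before each refit is a common compromise.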

In [None]:
# Example usage
asin_to_recommend = "B0009N5L7K"  # Example ASIN
recommendations = recommend_phones(asin_to_recommend, top_n=5)
print(recommendations)



In [None]:
# Example usage of hybrid recommendation system
user_id_to_recommend = "A3SGXH7AUHU8GW"  # Example user ID
asin_to_recommend = "B0009N5L7K"  # Example ASIN
hybrid_recommendations = hybrid_recommend_phones(user_id_to_recommend, asin_to_recommend, top_n=5)
print(hybrid_recommendations)

# Example usage of incorporating user feedback
feedback = 5  # Example feedback rating
feedback_result = incorporate_feedback(user_id_to_recommend, asin_to_recommend, feedback)
print(feedback_result)

## Evaluation
We evaluate the content-based recommender with precision and recall against an example set of relevant ASINs.

In [None]:
# Define a dictionary with relevant ASINs for each ASIN (this is just an example)
relevant_asins = {
    "B0009N5L7K": ["B0009N5L7L", "B0009N5L7M", "B0009N5L7N", "B0009N5L7O", "B0009N5L7P"],
    # Add more ASINs and their relevant ASINs here
}

# Function to calculate precision: share of recommended items that are relevant
def precision(recommended_asins, relevant_asins):
    if not recommended_asins:
        return 0.0  # nothing recommended, so avoid dividing by zero
    relevant_and_recommended = set(recommended_asins).intersection(set(relevant_asins))
    return len(relevant_and_recommended) / len(recommended_asins)

# Function to calculate recall: share of relevant items that were recommended
def recall(recommended_asins, relevant_asins):
    if not relevant_asins:
        return 0.0  # no relevant items labeled, so avoid dividing by zero
    relevant_and_recommended = set(recommended_asins).intersection(set(relevant_asins))
    return len(relevant_and_recommended) / len(relevant_asins)

# Function to evaluate recommendations
def evaluate_recommendations(asin, top_n=5):
    if asin not in relevant_asins:
        return "No relevant ASINs found for evaluation."

    recommendations = recommend_phones(asin, top_n)
    if isinstance(recommendations, str):
        return recommendations  # recommend_phones returned an error message
    recommended_asins = recommendations["asin"].tolist()
    relevant_asins_list = relevant_asins[asin]

    prec = precision(recommended_asins, relevant_asins_list)
    rec = recall(recommended_asins, relevant_asins_list)

    return {"precision": prec, "recall": rec}

# Example usage
asin_to_evaluate = "B0009N5L7K"  # Example ASIN
evaluation_results = evaluate_recommendations(asin_to_evaluate, top_n=5)
print(evaluation_results)
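A worked example makes the two metrics concrete. With five recommendations, four relevant items, and two items in common, precision is 2/5 and recall is 2/4. The ASIN-like labels below are placeholders, and `precision_at_k` and `recall_at_k` are illustrative names rather than functions from this notebook:

```python
def precision_at_k(recommended, relevant):
    """Fraction of recommended items that are relevant."""
    if not recommended:
        return 0.0
    return len(set(recommended) & set(relevant)) / len(recommended)

def recall_at_k(recommended, relevant):
    """Fraction of relevant items that were recommended."""
    if not relevant:
        return 0.0
    return len(set(recommended) & set(relevant)) / len(relevant)

# Placeholder labels: P2 and P5 appear in both lists.
recommended = ["P1", "P2", "P3", "P4", "P5"]
relevant = ["P2", "P5", "P9", "P10"]

print(precision_at_k(recommended, relevant))  # 0.4
print(recall_at_k(recommended, relevant))     # 0.5
```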



In [69]:
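# Reload the raw datasets to count missing values per column (update paths accordingly)
items_df = pd.read_csv("items.csv")
reviews_df = pd.read_csv("reviews.csv")

# Print both results; a notebook cell only displays its last bare expression
print(items_df.isnull().sum())
print(reviews_df.isnull().sum())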

