<a href="https://colab.research.google.com/github/lizzzb/Implicit-feedback_Amazon-Product-Data/blob/main/ImplicitFeedbackAmazon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Steps:

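1. **Load and Explore the Dataset**: Load the Amazon Product Data and identify usable implicit signals. Typical signals are purchases, clicks, or views; this dataset contains only reviews, so the existence of a review serves as the signal.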

2. **Preprocessing and Feature Engineering**: Transform the data into a user-item interaction matrix, focusing on implicit interactions like purchase counts or clicks.

3. **Choosing a Machine Learning Model**:
   - Collaborative Filtering (e.g., Matrix Factorization).
   - Alternating Least Squares (ALS) – commonly used for implicit feedback.
   - Neural Collaborative Filtering (NCF).

4. **Evaluation Metrics**: Use metrics like AUC, Precision@k, or Recall@k to evaluate how well the model performs in recommending relevant products to users based on implicit feedback.

5. **Tuning and Improving the Model**: Experiment with different hyperparameters and models to improve recommendation quality.
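
As an illustration of step 4, here is a minimal sketch of Precision@k and Recall@k for a single user. It assumes a set of held-out relevant items per user, which this notebook does not create; the function name `precision_recall_at_k` and the toy item IDs are hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(recommended: Sequence[int], relevant: Set[int], k: int) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one user's ranked recommendations."""
    top_k = list(recommended)[:k]
    # A "hit" is a recommended item that also appears in the held-out set
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: five ranked item indices and three held-out items (made-up values)
p, r = precision_recall_at_k([17, 39, 37, 40, 26], {39, 26, 8}, k=5)
print(f"precision@5={p:.2f}, recall@5={r:.2f}")
```

Averaging these values over all users with at least one held-out item gives a dataset-level score.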

Data source: Stanford SNAP datasets, https://snap.stanford.edu/data/index.html

In [10]:
import pandas as pd
from scipy.sparse import csr_matrix


In [2]:
url = "http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz"
amazon_data = pd.read_json(url, lines=True)

# Explore the data
amazon_data.head(10)

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime
0,AO94DHGC771SJ,528881469,amazdnu,"[0, 0]",We got this GPS for my husband who is an (OTR)...,5,Gotta have GPS!,1370131200,"06 2, 2013"
1,AMO214LNFCEI4,528881469,Amazon Customer,"[12, 15]","I'm a professional OTR truck driver, and I bou...",1,Very Disappointed,1290643200,"11 25, 2010"
2,A3N7T0DY83Y4IG,528881469,C. A. Freeman,"[43, 45]","Well, what can I say. I've had this unit in m...",3,1st impression,1283990400,"09 9, 2010"
3,A1H8PY3QHMQQA0,528881469,"Dave M. Shaw ""mack dave""","[9, 10]","Not going to write a long review, even thought...",2,"Great grafics, POOR GPS",1290556800,"11 24, 2010"
4,A24EV6RXELQZ63,528881469,Wayne Smith,"[0, 0]",I've had mine for a year and here's what we go...,1,"Major issues, only excuses for support",1317254400,"09 29, 2011"
5,A2JXAZZI9PHK9Z,594451647,"Billy G. Noland ""Bill Noland""","[3, 3]",I am using this with a Nook HD+. It works as d...,5,HDMI Nook adapter cable,1388707200,"01 3, 2014"
6,A2P5U7BDKKT7FW,594451647,Christian,"[0, 0]",The cable is very wobbly and sometimes disconn...,2,Cheap proprietary scam,1398556800,"04 27, 2014"
7,AAZ084UMH8VZ2,594451647,"D. L. Brown ""A Knower Of Good Things""","[0, 0]",This adaptor is real easy to setup and use rig...,5,A Perfdect Nook HD+ hook up,1399161600,"05 4, 2014"
8,AEZ3CR6BKIROJ,594451647,Mark Dietter,"[0, 0]",This adapter easily connects my Nook HD 7&#34;...,4,A nice easy to use accessory.,1405036800,"07 11, 2014"
9,A3BY5KCNQZXV5U,594451647,Matenai,"[3, 3]",This product really works great but I found th...,5,This works great but read the details...,1390176000,"01 20, 2014"


`asin` stands for Amazon Standard Identification Number. It's a unique identifier for each product on Amazon.

`overall` is the rating that a reviewer gave to a product.

Implicit Feedback in the Amazon Product Data:

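Typical implicit signals include:

- Purchases (whether a user bought a product).
- Clicks (whether a user viewed a product page).
- Cart additions (whether a user added a product to their cart).

The review dataset records none of these directly, so the existence of a review is used as a stand-in for an interaction.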

Updated Approach for Implicit Feedback:

1. Focus on Interactions: Instead of using the rating value, we treat the existence of a review as implicit feedback. If a user reviewed a product, we record a positive interaction; otherwise we assume no interaction.

2. Binary Interaction Matrix: We’ll create a binary matrix where:
   - 1 indicates that the user reviewed the product.
   - 0 indicates no observed interaction.

3. Train an Implicit Recommender System: We will use ALS (Alternating Least Squares) from the `implicit` library, which is designed to handle implicit feedback.
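
The binary matrix in step 2 can also be built directly in sparse form, which avoids materializing a dense table. The sketch below is one possible approach on toy data, not the method used in the cells that follow; the DataFrame `reviews` and its values are made up, and it assumes duplicate (user, product) pairs have already been dropped, since repeated pairs would be summed.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for the review log (hypothetical values)
reviews = pd.DataFrame({
    "reviewerID": ["u1", "u1", "u2", "u3", "u3"],
    "asin": ["p1", "p2", "p2", "p1", "p3"],
})

# Map string IDs to contiguous integer codes (row and column positions)
user_codes, user_ids = pd.factorize(reviews["reviewerID"])
item_codes, item_ids = pd.factorize(reviews["asin"])

# One stored entry per observed review; everything else is an implicit 0
data = np.ones(len(reviews), dtype=np.float32)
binary_matrix = csr_matrix(
    (data, (user_codes, item_codes)),
    shape=(len(user_ids), len(item_ids)),
)

print(binary_matrix.toarray())
```

`user_ids` and `item_ids` keep the original labels, so a row or column position can be translated back to a `reviewerID` or `asin` later.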

In [3]:
# Keep only relevant columns
implicit_data = amazon_data[['reviewerID', 'asin', 'overall']].copy()
implicit_data.head(10)


Unnamed: 0,reviewerID,asin,overall
0,AO94DHGC771SJ,528881469,5
1,AMO214LNFCEI4,528881469,1
2,A3N7T0DY83Y4IG,528881469,3
3,A1H8PY3QHMQQA0,528881469,2
4,A24EV6RXELQZ63,528881469,1
5,A2JXAZZI9PHK9Z,594451647,5
6,A2P5U7BDKKT7FW,594451647,2
7,AAZ084UMH8VZ2,594451647,5
8,AEZ3CR6BKIROJ,594451647,4
9,A3BY5KCNQZXV5U,594451647,5


The next cell adds a new column named `interaction` to the `implicit_data` DataFrame and sets every value in it to 1.

This is a common technique for implicit feedback. Rather than using the explicit rating in `overall`, we interpret the presence of a review as an interaction between the user (`reviewerID`) and the product (`asin`). The value 1 simply signifies that an interaction occurred.

This new column supplies the values of the user-item matrix used to train the recommender.

In [8]:
# Mark every review as a positive interaction (binary implicit feedback).
# The rating in `overall` is kept for reference but is not used as the signal.
implicit_data['interaction'] = 1

# Keep one row per (user, product) pair, even if the ratings differ
implicit_data = implicit_data.drop_duplicates(subset=['reviewerID', 'asin'])

# Subset the data: Choose a smaller number of users and products
# Let's say we want to work with only 500 users and 500 products
subset_users = implicit_data['reviewerID'].unique()[:500]  # Select the first 500 unique users
subset_products = implicit_data['asin'].unique()[:500]     # Select the first 500 unique products

# Filter the dataset to include only these users and products
subset_data = implicit_data[(implicit_data['reviewerID'].isin(subset_users)) &
                            (implicit_data['asin'].isin(subset_products))]

# Check the subset size
print(f"Subset Data Shape: {subset_data.shape}")
subset_data.head(10)


Subset Data Shape: (566, 4)


Unnamed: 0,reviewerID,asin,overall,interaction
0,AO94DHGC771SJ,528881469,5,1
1,AMO214LNFCEI4,528881469,1,1
2,A3N7T0DY83Y4IG,528881469,3,1
3,A1H8PY3QHMQQA0,528881469,2,1
4,A24EV6RXELQZ63,528881469,1,1
5,A2JXAZZI9PHK9Z,594451647,5,1
6,A2P5U7BDKKT7FW,594451647,2,1
7,AAZ084UMH8VZ2,594451647,5,1
8,AEZ3CR6BKIROJ,594451647,4,1
9,A3BY5KCNQZXV5U,594451647,5,1
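
Taking the first 500 users and the first 500 products leaves only 566 rows, so most selected users have a single interaction. One alternative, sketched here as an assumption rather than as this notebook's method, is to keep the most active users and the most reviewed products. The helper `top_n_subset` and the toy data are hypothetical.

```python
import pandas as pd


def top_n_subset(df: pd.DataFrame, n_users: int, n_items: int) -> pd.DataFrame:
    """Keep rows whose user and product are both among the most frequent ones."""
    top_users = df["reviewerID"].value_counts().head(n_users).index
    top_items = df["asin"].value_counts().head(n_items).index
    return df[df["reviewerID"].isin(top_users) & df["asin"].isin(top_items)]


# Toy interaction log (made-up IDs): u1 has 3 reviews, u2 has 2, u3 has 1
toy = pd.DataFrame({
    "reviewerID": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "asin": ["p1", "p2", "p3", "p1", "p2", "p1"],
})

dense_subset = top_n_subset(toy, n_users=2, n_items=2)
print(dense_subset)
```

A subset chosen this way is denser, but it is also biased toward popular products, which should be kept in mind when reading evaluation results.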


In [9]:
# Create a user-item interaction matrix (users as rows, items as columns)
user_item_matrix = subset_data.pivot_table(index='reviewerID', columns='asin', values='interaction', fill_value=0)

# Check the matrix dimensions and a few rows
print(f"User-Item Matrix Shape: {user_item_matrix.shape}")
user_item_matrix.head()

User-Item Matrix Shape: (500, 46)


reviewerID,0528881469,0594451647,0594481813,0972683275,1400501466,1400501520,1400501776,1400532620,1400532655,140053271X,...,B00004VX3T,B00004W3ZQ,B00004WCGF,B00004WHFL,B00004WLJ8,B00004X0ZO,B00004X107,B00004X10C,B00004Y2MM,B00004YKDQ
A102RLOGIBBDMW,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A1038957GWRBP375RU5T,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A1089S59XSJT2T,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A10BOETDPAFJ4C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A10S9NK38WEQ65,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Train the ALS Model on the Subset

In [11]:
pip install implicit

Collecting implicit
  Downloading implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl.metadata (6.1 kB)
Downloading implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl (8.9 MB)
Installing collected packages: implicit
Successfully installed implicit-0.7.2


In [12]:
import implicit
import scipy.sparse as sparse

In [14]:
# Convert the user-item matrix to a sparse format (this helps in handling large datasets efficiently)
user_item_sparse = sparse.csr_matrix(user_item_matrix.values)

# Initialize the ALS model for implicit feedback
als_model = implicit.als.AlternatingLeastSquares(factors=20, regularization=0.1, iterations=50)

# ALS model works with confidence scores; we scale the interactions using a confidence factor
alpha = 15
als_model.fit(alpha * user_item_sparse)

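
For reference, the confidence weighting that the comment in the cell above alludes to is usually written as in Hu, Koren and Volinsky (2008). This is the textbook formulation, not a description of the library's internals; how `implicit` maps the stored matrix values to confidences may differ between versions, so treat the numbers below as an illustration.

$$
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases}, \qquad c_{ui} = 1 + \alpha \, r_{ui}
$$

$$
\min_{x, y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2 + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
$$

Here $r_{ui}$ is the raw interaction value (1 for a review, 0 otherwise), $x_u$ and $y_i$ are the user and item factor vectors (`factors=20`), and $\lambda$ corresponds to `regularization=0.1`. Under this formulation, with $\alpha = 15$ a reviewed pair gets confidence $c_{ui} = 16$ and an unobserved pair gets $c_{ui} = 1$.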

In [20]:
# Example: Recommend products for a specific user from the subset
# You need the user's row position in the matrix, so we look it up by ID
user_id = subset_users[0]  # Pick the first user in the subset
user_index = user_item_matrix.index.get_loc(user_id)  # Row position of this user

# Get top 5 product recommendations for the user
recommendations = als_model.recommend(user_index, user_item_sparse[user_index], N=5)
print(f"Top 5 recommendations for user {user_id}:")
recommendations


Top 5 recommendations for user AO94DHGC771SJ:


(array([17, 39, 37, 40, 26], dtype=int32),
 array([0.17902419, 0.11556099, 0.08689363, 0.08689357, 0.08432439],
       dtype=float32))
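
The arrays returned by `recommend` hold column positions in the user-item matrix and their scores, not ASINs. A minimal sketch of mapping positions back to product IDs follows; `item_labels`, `item_indices`, and `scores` are hypothetical stand-ins for `user_item_matrix.columns` and the two returned arrays.

```python
import numpy as np
import pandas as pd

# Stand-ins: column labels of the pivoted matrix and the output of recommend()
item_labels = pd.Index(["0528881469", "0594451647", "0594481813", "0972683275"], name="asin")
item_indices = np.array([3, 0], dtype=np.int32)
scores = np.array([0.18, 0.12], dtype=np.float32)

# Positional lookup turns matrix column positions into ASINs
recommended = pd.DataFrame({
    "asin": item_labels[item_indices],
    "score": scores,
})
print(recommended)
```

In the notebook itself the same lookup would be `user_item_matrix.columns[recommendations[0]]`, assuming the matrix columns are still in the order used to build `user_item_sparse`.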