



---

# **MARKET-BASKET ANALYSIS**

## Massive Algorithm - Data Science for Economics

**Angelica Longo, Melissa Rizzi**

The goal of this project is to implement a system for **detecting frequent itemsets**, commonly known as **market-basket analysis**.
In this notebook, the detector treats each user’s reviewed books as a basket, with books serving as items.

The project is based on the **[Amazon Books Review](https://www.kaggle.com/datasets/mohamedbakhet/amazon-books-reviews)** dataset, published on Kaggle under the public domain CC0 license. The data can be downloaded at run time through the Kaggle API and contains variables describing users and their reviews of purchased books.

Given the large volume of data (3 million rows), a smaller subsample is created with **PySpark** by keeping about 20% of the users (295,872 rows in the run shown below), while keeping the code scalable to the full dataset.

The project is structured as follows:

- **Preprocessing** – This phase includes data cleaning, checking data integrity, handling null values, removing duplicates, and computing the overall mean to verify consistency with the selected subsample.
- **Subsampling** – A subset of data is created while maintaining a representative distribution of user choices and ratings.
- **Frequent Itemset Mining** – The final step involves implementing an algorithm to identify frequent itemsets within the dataset.

This structured approach ensures both **efficiency** and **scalability** while maintaining **data integrity**.

## **Table of Contents**
- [1. Data Import](#1-data-import)
- [2. Data PreProcessing](#2-data-preprocessing)
  - [2.1 Data Integrity](#21-data-integrity)
  - [2.2 Missing Data](#22-missing-data)
  - [2.3 Data Duplicates](#23-data-duplicates)
  - [2.4 Rating Means](#24-rating-means)
- [3. Subsample Creation](#3-subsample-creation)
  - [3.1 Sample Reliability](#31-sample-reliability)
- [4. Algorithm Implementation](#4-algorithm-implementation)
  - [4.1 A-priori Algorithms](#41-a-priori-algorithms)
  - [4.2 SON Algorithm](#42-son-algorithm)


---
## **1. Data Import**

In [1]:
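# Uncomment the cells in this section to download the dataset via the Kaggle API
#import os
#import zipfile

In [2]:
# Replace the placeholder with your own Kaggle API key (never publish a real key)
#os.environ['KAGGLE_USERNAME'] = "melissarizzi"
#os.environ['KAGGLE_KEY'] = "<your-kaggle-api-key>"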

In [3]:
#!kaggle datasets download --unzip mohamedbakhet/amazon-books-reviews

---
## **2. Data PreProcessing**

In [4]:
# Import necessary libraries
# Note: min, max and sum imported from pyspark.sql.functions shadow the Python built-ins
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, min, max, sum, when, collect_set, count
from pyspark.sql.types import DoubleType
from pyspark.ml.fpm import FPGrowth
import pyspark.sql.functions as F

In [5]:
# Create Spark Session
spark = SparkSession.builder.appName("MapReduce").getOrCreate()

In [6]:
# Import data
data = spark.read.csv("books_rating.csv", header=True, inferSchema=True)
#data.show(5)

In [7]:
data.count()

3000000

In [8]:
# Select only the useful columns and rename 'review/score' to 'score'
df = data.select("Id", "Title", "User_id", "review/score").withColumnRenamed("review/score", "score")
#df.show(5)

### **2.1 Data Integrity**

In [9]:
df.printSchema()

root
 |-- Id: string (nullable = true)
 |-- Title: string (nullable = true)
 |-- User_id: string (nullable = true)
 |-- score: string (nullable = true)



In [10]:
# Cast the 'score' column to double
df = df.withColumn("score", col("score").cast(DoubleType()))
df.printSchema()

root
 |-- Id: string (nullable = true)
 |-- Title: string (nullable = true)
 |-- User_id: string (nullable = true)
 |-- score: double (nullable = true)



In [11]:
# Check score range
df.select(min(col("score")).alias("min_score"), max(col("score")).alias("max_score")).show()

+---------+----------+
|min_score| max_score|
+---------+----------+
|      1.0|1.295568E9|
+---------+----------+



In [12]:
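# Keep only rows whose 'score' lies in the valid range [1, 5]
# (a maximum such as 1.295568E9 looks like a Unix timestamp, which suggests some CSV rows were parsed with shifted columns)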
df = df.filter((col("score") >= 1) & (col("score") <= 5))
df.select(min("score").alias("min_score"), max("score").alias("max_score")).show()

+---------+---------+
|min_score|max_score|
+---------+---------+
|      1.0|      5.0|
+---------+---------+



### **2.2 Missing Data**

In [13]:
# Count null values for each variable
null_counts = df.select(
    [sum(when(col(c).isNull(), 1).otherwise(0)).alias(c) for c in df.columns]
)

#null_counts.show()

What stands out right away, especially for the purpose of our analysis, is the large number of missing values in the User_id variable. One possible reason is that reviewers who are not registered have no user ID. Since our goal is to identify baskets of books reviewed by the same user, rows without a user ID cannot be used. We considered using profile names instead, assigning a dummy ID to users with the same name, but this could produce inaccurate baskets because different users may share a name. Moreover, profile names had even more missing values than user IDs, which made this solution unfeasible. In the end, we decided to **drop the rows with missing values**, as we could not identify a suitable way to impute them.

In [14]:
# Remove null values
df_clean = df.dropna()
#df_clean.show(5)

In [15]:
# Check data size
n_rows_clean = df_clean.count()
print(f"Number of Rows - After cleaning: {n_rows_clean}")

Number of Rows - After cleaning: 2420237


In [16]:
# Verify that no null values remain
null_counts = df_clean.select(
    [sum(when(col(c).isNull(), 1).otherwise(0)).alias(c) for c in df_clean.columns]
)

null_counts.show()

+---+-----+-------+-----+
| Id|Title|User_id|score|
+---+-----+-------+-----+
|  0|    0|      0|    0|
+---+-----+-------+-----+



### **2.3 Data Duplicates**

In [17]:
# Remove duplicated rows
df_clean = df_clean.dropDuplicates()

n_rows_clean = df_clean.count()
print(f"Number of Rows - After duplicates removal: {n_rows_clean}")

Number of Rows - After duplicates removal: 2383199


Up until now, we’ve performed a general cleaning of the dataset. From here on, we’ll focus exclusively on the three columns that are relevant to our analysis (Id, User_id, and score), forming a new dataset: df_short.

In [18]:
# Keep only the columns relevant to the analysis
df_short = df_clean.select("Id", "User_id", "score")
#df_short.show(5)

In [19]:
# Remove rows that are duplicated on the three selected columns
df_short = df_short.dropDuplicates()

Since the same user may have rated the same book more than once with different scores, we compute the mean of the scores that each user gave to each book.

In [20]:
# Find (Id, User_id) pairs that appear with more than one score
duplicati = df_short.groupBy("Id", "User_id").count().filter("count > 1")

In [21]:
# Compute average score for every (Id, User_id)
score_mean = df_short.groupBy('Id', 'User_id').agg(F.mean('score').alias('mean_score'))
df_final = df_short.join(score_mean, on=['Id', 'User_id'], how='left')

# Creation of the final preprocessed dataset
df_final = df_final.select('Id','User_id', 'mean_score')
df_final = df_final.dropDuplicates()
#df_final.show(5)

In [22]:
#n_rows_final = df_final.count()
#print(f"Number of Rows - Final dataset: {n_rows_final}")

Before continuing the analysis, we kept only ratings of at least 3, since low ratings may indicate a lack of engagement with the book.

In [23]:
# Keep only rows with a mean rating >= 3
df_final = df_final.filter(col("mean_score") >= 3)
#df_final.count()

We then kept only users who rated at least 2 books. A user with a single positively rated book forms a one-item basket, which cannot contribute to any pair or larger itemset, so it tells us nothing about which other books are bought together.
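The number of pairs a basket contributes grows quadratically with its size: a user who rated $n$ distinct books yields

$$
\binom{n}{2} = \frac{n(n-1)}{2}
$$

pairs, so a single-book basket yields none, two books yield one pair, and 100 books already yield 4,950. This is also the dominant cost of the pair-counting steps in Section 4.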

In [24]:
# Keep only users who rated more than one book
user_counts = df_final.groupBy("User_id").agg(count("Id").alias("book_count"))
users_with_multiple_books = user_counts.filter(col("book_count") > 1).select("User_id")

df_final = df_final.join(users_with_multiple_books, on="User_id", how="inner")
#df_final.count()

In [25]:
#df_final.show(5)

### **2.4 Rating Means**

- Overall mean score:

We compute the overall average score so that we can later check whether the subsample preserves it.

In [26]:
df_final = df_final.withColumn("mean_score", F.col("mean_score").cast("double"))

overall_mean = df_final.agg(F.avg("mean_score")).collect()[0][0]
print(f"Overall mean score - Final dataset: {overall_mean}")

Overall mean score - Final dataset: 4.542176703967902


---
## **3. Subsample Creation**

We aim to create a subsample that remains consistent with the original dataset. To achieve this, we select a fraction of users while ensuring that all their reviews are included. This approach allows us to better represent their purchasing behavior and rating patterns, preserving the integrity of the data.
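Sampling whole users, rather than individual rows, is what keeps every basket intact. Note that `sample()` applied to a `distinct()` result is not guaranteed to return the same users if Spark recomputes the lineage for a later action. A minimal pure-Python sketch of one reproducible alternative, assuming a hash-based rule that is *not* what this notebook uses: a user is kept when a stable hash of their `User_id` falls below the sampling fraction. `keep_user`, `sample_users`, and the toy rows are hypothetical. In PySpark the same idea could presumably be written as a filter on a column hash such as `F.hash("User_id")`; that is an untested suggestion.

```python
import hashlib


def keep_user(user_id: str, fraction: float) -> bool:
    """Return True if this user belongs to the sample.

    MD5 is used only as a stable hash (Python's built-in hash() of a str
    is salted per process); its first 32 bits are mapped to [0, 1).
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32
    return bucket < fraction


def sample_users(rows: list[tuple[str, str, float]], fraction: float) -> list[tuple[str, str, float]]:
    """Keep every (user_id, book_id, score) row of the selected users."""
    return [row for row in rows if keep_user(row[0], fraction)]


# Hypothetical toy rows: all rows of a user are kept or dropped together
rows = [("u1", "b1", 5.0), ("u1", "b2", 4.0), ("u2", "b1", 3.0)]
print(sample_users(rows, 0.5))
```

Because the decision depends only on the user id, rerunning the sampling (or running it on another machine) selects exactly the same users.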

In [27]:
num_users = df_final.select("User_id").distinct().count()
print(f"Total number of different users - Original dataset: {num_users}")

Total number of different users - Original dataset: 268637


In [28]:
df_final.select("Id").distinct().count()

162125

In [29]:
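# Sample approximately 20% of the distinct users (Bernoulli sampling, so the exact share varies slightly)
sample_fraction = 0.2
# Cache the sample so later actions reuse the same users instead of possibly re-drawing them when Spark recomputes the lineage
user_sample = df_final.select("User_id").distinct().sample(fraction=sample_fraction, seed=42).cache()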

In [30]:
# Create the subsample with the selected users
df_sampled = df_final.join(user_sample, on="User_id", how="inner")
#df_sampled.show(5)

In [31]:
# Check subsample size
#n_rows_sample = df_sampled.count()
#print(f"Number of Rows - Sample: {n_rows_sample}")

In [32]:
# Check data integrity
df_sampled.printSchema()

root
 |-- User_id: string (nullable = true)
 |-- Id: string (nullable = true)
 |-- mean_score: double (nullable = true)



### **3.1 Sample Reliability**

In [33]:
# Check the average number of books rated by each user
#avg_rows_original = df_final.groupBy("User_id").count().agg(F.mean("count")).collect()[0][0]
#avg_rows_sampled = df_sampled.groupBy("User_id").count().agg(F.mean("count")).collect()[0][0]

#print(f"Books rated by each user - Original: {avg_rows_original:.2f}")
#print(f"Books rated by each user - Subsample: {avg_rows_sampled:.2f}")

In [34]:
# Check the score mean and standard deviation
#df_final.select(F.mean("mean_score"), F.stddev("mean_score")).show()
#df_sampled.select(F.mean("mean_score"), F.stddev("mean_score")).show()

---
## **4. Algorithm Implementation**

In [35]:
# Choose whether to work with the subsample or the entire dataset (also prints the number of rows)
def use_sample(is_subsample):
    if is_subsample:
        data = df_sampled
    else:
        data = df_final
    print(data.count())
    return data

In [36]:
df_filtered = use_sample(True)

295872


### **4.1 A-priori Algorithms**

The Apriori algorithm is an association-rule mining method used to discover frequent patterns in large datasets. It first identifies the frequent itemsets, i.e. the sets of items that appear together in at least a minimum number (or fraction) of baskets, and then extracts association rules that express relationships between them. The algorithm works level by level and relies on monotonicity: every subset of a frequent itemset is also frequent, so candidate itemsets of size k+1 are built only from frequent itemsets of size k, and infrequent itemsets are pruned early. It is commonly used in market-basket analysis.
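A minimal pure-Python sketch of the level-wise search described above, assuming baskets are lists of item ids and the threshold is an absolute count. `apriori_levelwise` is a hypothetical helper for illustration, not MLxtend's `apriori`, and the toy baskets are made up.

```python
from itertools import combinations


def apriori_levelwise(baskets, min_count):
    """Return {frozenset(itemset): count} for every itemset contained in
    at least `min_count` baskets, found level by level (A-priori)."""
    baskets = [frozenset(b) for b in baskets]

    # Level 1: count single items
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset (monotonicity)
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One pass over the baskets to count the surviving candidates
        counts = {cand: 0 for cand in candidates}
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(apriori_levelwise(baskets, 3))
```

With a threshold of 3 the three single items and the three pairs survive, while the triple `{a, b, c}` (present in only 2 baskets) is counted but discarded.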

#### **4.1.1 MLxtend**

In the context of Apriori, MLxtend provides an easy-to-use implementation for association rule mining. Its `apriori` function identifies frequent itemsets from transaction data encoded as a one-hot DataFrame (one row per transaction, one boolean column per item), while the `association_rules` function generates association rules from these frequent itemsets. It is widely used for tasks like market-basket analysis, where the goal is to find associations between products frequently bought together.

In [37]:
# Import necessary libraries
from pyspark.sql.functions import collect_set
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd
from scipy.sparse import lil_matrix

In [38]:
# Group the books by user
transactions = df_filtered.groupBy("User_id").agg(collect_set("Id").alias("books"))
#transactions.show(5)

In [39]:
pandas_df = transactions.toPandas()
pandas_df.head()

|   | User_id | books |
|---|---|---|
| 0 | A07084061WTSSXN6VLV92 | [0808510002, B000Q34B8I, 0521639522, B000FC1BY... |
| 1 | A100NGGXRQF0AQ | [B000U424PU, 1568521332, 0066620694, 158160032... |
| 2 | A100TQ7ZRE0W02 | [0971237034, 0976325608, 0971237018, 0972800522] |
| 3 | A1019451ZPRJ0N3JO06M | [B000N2HBZM, 0001050087, 0786197382, 0786192550] |
| 4 | A1033RWNZWEMR5 | [B0006FD9NE, B0006AEPTG, B000IY4THI] |


In [40]:
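# Map every distinct book Id to a column index
unique_books = set(item for sublist in pandas_df["books"] for item in sublist)
book_index = {book: idx for idx, book in enumerate(unique_books)}
# Empty boolean matrix: one row per user, one column per book
sparse_matrix = lil_matrix((len(pandas_df), len(unique_books)), dtype=bool)

In [41]:
# Mark every book reviewed by each user
for row_idx, books in enumerate(pandas_df["books"]):
    for book in books:
        col_idx = book_index[book]
        sparse_matrix[row_idx, col_idx] = True

# One-hot encoded sparse DataFrame, the input format used by MLxtend's apriori
encoded_df = pd.DataFrame.sparse.from_spmatrix(sparse_matrix, columns=list(book_index))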
encoded_df



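|   | B0007F3TGA | B000KINT0A | 156280183X | B000OVOJX4 | 1843580284 | B00087M31Y | 0195167899 | 0875812597 | 1560253185 | 1555472583 | ... | 0786716215 | 0312102682 | B000P57QES | 1892284731 | 0962254606 | 0451205618 | 0312495412 | 067179695X | B00071J5C6 | B000JPK0YW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53923 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53924 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53925 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53926 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |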


- The **apriori** function from the MLxtend library

In [42]:
# min_support is a fraction of baskets: 0.012 * 53,927 users is about 647 users in the run shown
frequent_itemsets = apriori(encoded_df, min_support=0.012, use_colnames=True)


In [43]:
# Keep only itemsets with at least 2 items
filtered_itemsets = frequent_itemsets[frequent_itemsets['itemsets'].apply(lambda x: len(x) >= 2)]

# Sort by support in descending order
sorted_itemsets = filtered_itemsets.sort_values(by='support', ascending=False)

# Print the results
print("Top 15 frequent itemsets:")
sorted_itemsets.head(15)


Top 15 frequent itemsets:


|   | support | itemsets |
|---|---|---|
| 29 | 0.01298 | (B000ILIJE0, B000NWU3I4) |
| 23 | 0.012869 | (B000PC54NG, B000ILIJE0) |
| 24 | 0.01285 | (B000PC54NG, B000NWU3I4) |
| 79 | 0.01285 | (B000PC54NG, B000ILIJE0, B000NWU3I4) |
| 30 | 0.012832 | (B000ILIJE0, B000NWQXBA) |
| 25 | 0.012832 | (B000PC54NG, B000NWQXBA) |
| 80 | 0.012832 | (B000PC54NG, B000NWQXBA, B000ILIJE0) |
| 34 | 0.012813 | (B000NWQXBA, B000NWU3I4) |
| 84 | 0.012813 | (B000PC54NG, B000NWQXBA, B000NWU3I4) |
| 93 | 0.012813 | (B000ILIJE0, B000NWQXBA, B000NWU3I4) |


- Association rules
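For reference, writing $N$ for the number of baskets (users) and $\mathrm{cnt}(X)$ for the number of baskets that contain every book in $X$, the three metrics reported for a rule $A \Rightarrow B$ are

$$
\mathrm{support}(A \Rightarrow B) = \frac{\mathrm{cnt}(A \cup B)}{N}, \qquad
\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{cnt}(A \cup B)}{\mathrm{cnt}(A)}, \qquad
\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{cnt}(B)/N} = \frac{N\,\mathrm{cnt}(A \cup B)}{\mathrm{cnt}(A)\,\mathrm{cnt}(B)}
$$

The last form shows that lift is symmetric in $A$ and $B$, which is why a rule and its reverse (e.g. rows 0 and 1 of the rules table) share the same lift. The same formulas are used for the `support`, `confidence_AtoB`, and `lift` columns in the from-scratch implementation.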

In [44]:
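# Generate the rules with confidence >= 0.2
# (recent MLxtend releases, 0.23.2 and later, may also expect a num_itemsets argument)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)
print("\nAssociation rules:")
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]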


Association rules:


|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (B000GQG5MA) | (B000H9R1Q0) | 0.012239 | 0.980684 | 76.869622 |
| 1 | (B000H9R1Q0) | (B000GQG5MA) | 0.012239 | 0.959302 | 76.869622 |
| 2 | (B000PC54NG) | (B000GQG5MA) | 0.012350 | 0.959654 | 76.897817 |
| 3 | (B000GQG5MA) | (B000PC54NG) | 0.012350 | 0.989599 | 76.897817 |
| 4 | (B000ILIJE0) | (B000GQG5MA) | 0.012480 | 0.958689 | 76.820513 |
| ... | ... | ... | ... | ... | ... |
| 4595 | (B000NDSX6C) | (B000PC54NG, B000NWQXBA, B000Q032UY, B000H9R1Q... | 0.012220 | 0.979198 | 76.864875 |
| 4596 | (B000Q032UY) | (B000PC54NG, B000NWQXBA, B000NDSX6C, B000H9R1Q... | 0.012220 | 0.957849 | 78.383721 |
| 4597 | (B000H9R1Q0) | (B000PC54NG, B000NWQXBA, B000NDSX6C, B000Q032U... | 0.012220 | 0.957849 | 78.383721 |
| 4598 | (B000ILIJE0) | (B000PC54NG, B000NWQXBA, B000NDSX6C, B000Q032U... | 0.012220 | 0.938746 | 76.820513 |


#### **4.1.2 From Scratch**

In [45]:
# Import necessary library
from itertools import combinations

In [46]:
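def apriori_frequent_books(rdd, threshold):
    """Return (pair, count) for every book pair reviewed by at least `threshold` users.

    Note: all pairs in every basket are counted directly; unlike full A-priori,
    pairs are not first restricted to books that are frequent on their own.
    """

    # Step 1: Create the sorted book pairs for each user
    pairs_rdd = rdd.flatMap(lambda books: [tuple(sorted(pair)) for pair in combinations(books, 2)])

    # Step 2: Count how many users share each pair
    pair_counts_rdd = pairs_rdd.map(lambda pair: (pair, 1)).reduceByKey(lambda a, b: a + b)

    # Step 3: Keep pairs whose count >= threshold
    frequent_pairs_rdd = pair_counts_rdd.filter(lambda pair_count: pair_count[1] >= threshold)

    return frequent_pairs_rdd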
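As written, `apriori_frequent_books` enumerates every pair in every basket, so it is equivalent to a brute-force pair count. A minimal pure-Python sketch of the two-pass A-priori idea for pairs, for comparison; `apriori_pairs` is a hypothetical helper that takes an absolute `threshold` like the cell above. In PySpark the same structure would presumably map to two passes over the baskets RDD, with the set of frequent books broadcast between them; that mapping is our assumption, not code from this notebook.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(baskets, threshold):
    """Two-pass A-priori restricted to pairs; returns {(item_a, item_b): count}.

    Pass 1 counts single items. Pass 2 counts only pairs whose items are both
    frequent, since a pair cannot occur in more baskets than either of its items.
    """
    # Pass 1: number of baskets containing each item
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= threshold}

    # Pass 2: count pairs built from frequent items only (sorted, so (a, b) == (b, a))
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: n for pair, n in pair_counts.items() if n >= threshold}


baskets = [["b1", "b2", "b3"], ["b1", "b2"], ["b1", "b4"], ["b2", "b1"]]
print(apriori_pairs(baskets, 2))  # {('b1', 'b2'): 3}
```

The result is identical to counting all pairs, but pass 2 never generates pairs involving `b3` or `b4`, which is where the savings come from on sparse data with many rare books.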

In [47]:
# Convert df_filtered into an RDD of baskets (one list of distinct book Ids per user)
rdd = df_filtered.rdd.map(lambda row: (row["User_id"], row["Id"])) \
                     .groupByKey() \
                     .map(lambda x: list(set(x[1])))

In [48]:
threshold = 50  # absolute support: minimum number of users who reviewed both books

In [49]:
frequent_book_pairs = apriori_frequent_books(rdd, threshold)

In [50]:
frequent_book_pairs.sortBy(lambda x: -x[1]).take(15)

[(('B000ILIJE0', 'B000NWU3I4'), 710),
 (('B000NWU3I4', 'B000PC54NG'), 704),
 (('B000ILIJE0', 'B000PC54NG'), 704),
 (('B000NWQXBA', 'B000PC54NG'), 702),
 (('B000NWQXBA', 'B000NWU3I4'), 702),
 (('B000ILIJE0', 'B000NWQXBA'), 702),
 (('B000H9R1Q0', 'B000PC54NG'), 696),
 (('B000NWQXBA', 'B000Q032UY'), 696),
 (('B000ILIJE0', 'B000Q032UY'), 696),
 (('B000H9R1Q0', 'B000Q032UY'), 696),
 (('B000H9R1Q0', 'B000NWU3I4'), 696),
 (('B000H9R1Q0', 'B000ILIJE0'), 696),
 (('B000PC54NG', 'B000Q032UY'), 696),
 (('B000H9R1Q0', 'B000NWQXBA'), 696),
 (('B000NWU3I4', 'B000Q032UY'), 696)]

In [51]:
frequent_df = frequent_book_pairs.toDF(["pair", "count"])

In [52]:
book_counts = rdd.flatMap(lambda books: [(book, 1) for book in books]) \
                 .reduceByKey(lambda a, b: a + b) \
                 .toDF(["book", "count"])

In [53]:
rules_df = frequent_df.select(
    col("pair").getField("_1").alias("book_A"),
    col("pair").getField("_2").alias("book_B"),
    col("count").alias("pair_count")
)

In [54]:
rules_df = rules_df.join(book_counts.withColumnRenamed("book", "book_A"), "book_A") \
                   .withColumnRenamed("count", "count_A") \
                   .join(book_counts.withColumnRenamed("book", "book_B"), "book_B") \
                   .withColumnRenamed("count", "count_B")

In [55]:
# Count the baskets once instead of triggering a Spark job for every rdd.count() call
n_baskets = rdd.count()

rules_df = rules_df.withColumn("support", col("pair_count") / n_baskets) \
                   .withColumn("confidence_AtoB", col("pair_count") / col("count_A")) \
                   .withColumn("confidence_BtoA", col("pair_count") / col("count_B")) \
                   .withColumn("lift", col("confidence_AtoB") / (col("count_B") / n_baskets))

rules_df.show(15, False)

+----------+----------+----------+-------+-------+---------------------+------------------+---------------+-----------------+
|book_B    |book_A    |pair_count|count_A|count_B|support              |confidence_AtoB   |confidence_BtoA|lift             |
+----------+----------+----------+-------+-------+---------------------+------------------+---------------+-----------------+
|B000N33NTE|B000FML2C8|140       |140    |140    |0.002597065316192702 |1.0               |1.0            |385.05           |
|B000L8X5UI|B0007FY0G8|51        |51     |51     |9.46073793755913E-4  |1.0               |1.0            |1057.0           |
|1581730470|0395051150|60        |60     |60     |0.0011130279926540153|1.0               |1.0            |898.4499999999999|
|8437615607|0395051150|60        |60     |60     |0.0011130279926540153|1.0               |1.0            |898.4499999999999|
|0708982581|0395051150|60        |60     |60     |0.0011130279926540153|1.0               |1.0            |898.4499999

### **4.2 SON Algorithm**

In [58]:
rdd = df_filtered.rdd.map(lambda row: (row["User_id"], row["Id"])) \
                     .groupByKey() \
                     .map(lambda x: (x[0], list(set(x[1]))))  # Remove duplicates and build one list of Ids per user

# Use 'repartition' to split the RDD into 5 partitions
rdd_partitioned = rdd.repartition(5)


In [59]:
from pyspark.rdd import RDD
from itertools import combinations

In [60]:
def map_candidates(record):
    """Map each user to their candidate itemsets of size 1"""
    user, items = record
    return [((item,), 1) for item in set(items)]

def reduce_frequencies(a, b):
    """Sum the itemset frequencies"""
    return a + b

def map_generate_candidates(record):
    """Generate all candidate pairs of books in a user's basket"""
    user, items = record
    items = list(set(items))  # Remove duplicates
    return [(tuple(sorted(comb)), 1) for comb in combinations(items, 2)]

def find_frequent_itemsets(rdd: RDD, s: int = 50):
    """
    MapReduce algorithm to find frequent itemset pairs (book pairs only).
    Returns each book pair together with its frequency.
    """
    # Count every candidate pair (note: map_candidates, the single-item pass, is not used here)
    candidates_k2 = rdd.flatMap(map_generate_candidates).reduceByKey(reduce_frequencies)

    # Keep the pairs with frequency >= s
    frequent_itemsets_k2 = candidates_k2.filter(lambda x: x[1] >= s)

    # Return the book pairs with their frequency
    return frequent_itemsets_k2

In [61]:
# Run the function on the partitioned RDD, using the same support threshold as above
result = find_frequent_itemsets(rdd_partitioned, s=threshold).collect()


In [62]:
sorted_result = sorted(result, key=lambda x: x[1], reverse=True)
print(sorted_result[:15])

[(('B000ILIJE0', 'B000NWU3I4'), 710), (('B000ILIJE0', 'B000PC54NG'), 704), (('B000NWU3I4', 'B000PC54NG'), 704), (('B000NWQXBA', 'B000PC54NG'), 702), (('B000ILIJE0', 'B000NWQXBA'), 702), (('B000NWQXBA', 'B000NWU3I4'), 702), (('B000NWQXBA', 'B000Q032UY'), 696), (('B000H9R1Q0', 'B000PC54NG'), 696), (('B000H9R1Q0', 'B000NWU3I4'), 696), (('B000NWU3I4', 'B000Q032UY'), 696), (('B000H9R1Q0', 'B000NWQXBA'), 696), (('B000H9R1Q0', 'B000ILIJE0'), 696), (('B000PC54NG', 'B000Q032UY'), 696), (('B000ILIJE0', 'B000Q032UY'), 696), (('B000H9R1Q0', 'B000Q032UY'), 696)]
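`find_frequent_itemsets` performs one global pass over all pairs, so the repartitioning alone does not add the two phases of SON, and the counts are the same as in the from-scratch cell. A minimal pure-Python sketch of SON for pairs, for comparison, assuming the baskets are split into `num_chunks` in-memory chunks that stand in for Spark partitions. `local_frequent_pairs` and `son_pairs` are hypothetical helpers; in PySpark, phase 1 would typically run inside `mapPartitions` and the candidate set could be broadcast for phase 2, which is our assumption rather than code from this notebook.

```python
from collections import Counter
from itertools import combinations


def local_frequent_pairs(chunk, local_threshold):
    """Phase 1 on one chunk: pairs frequent within the chunk (two-pass A-priori)."""
    item_counts = Counter(item for basket in chunk for item in set(basket))
    frequent_items = {item for item, n in item_counts.items() if n >= local_threshold}
    pair_counts = Counter()
    for basket in chunk:
        pair_counts.update(combinations(sorted(set(basket) & frequent_items), 2))
    return {pair for pair, n in pair_counts.items() if n >= local_threshold}


def son_pairs(baskets, threshold, num_chunks):
    """SON for pairs: local candidates per chunk, then one global counting pass."""
    total = len(baskets)
    # Hypothetical split standing in for Spark partitions
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]

    # Phase 1: a pair frequent overall must be frequent in at least one chunk
    # once the threshold is scaled by that chunk's share of the baskets
    candidates = set()
    for chunk in chunks:
        if chunk:
            candidates |= local_frequent_pairs(chunk, threshold * len(chunk) / total)

    # Phase 2: count only the candidates over the whole dataset
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            if pair in candidates:
                counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}


baskets = [["b1", "b2", "b3"], ["b1", "b2"], ["b1", "b4"], ["b2", "b1"]]
print(son_pairs(baskets, 2, num_chunks=2))  # {('b1', 'b2'): 3}
```

Phase 1 may produce false positives (here `('b1', 'b3')` is locally frequent in one chunk), but phase 2 removes them, and the scaled threshold guarantees that no globally frequent pair is missed.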


---
## **5. Full Dataset**

In [68]:
# Run A-priori on the full dataset

In [69]:
df_filtered = use_sample(False)

1488780


In [70]:
# Rebuild the baskets RDD from the full dataset
rdd = df_filtered.rdd.map(lambda row: (row["User_id"], row["Id"])) \
                     .groupByKey() \
                     .map(lambda x: list(set(x[1])))

In [71]:
frequent_book_pairs = apriori_frequent_books(rdd, threshold)
frequent_book_pairs.sortBy(lambda x: -x[1]).take(15)

[(('B000ILIJE0', 'B000NWU3I4'), 3450),
 (('B000ILIJE0', 'B000PC54NG'), 3428),
 (('B000NWQXBA', 'B000PC54NG'), 3424),
 (('B000ILIJE0', 'B000NWQXBA'), 3424),
 (('B000NWU3I4', 'B000PC54NG'), 3423),
 (('B000NWQXBA', 'B000NWU3I4'), 3418),
 (('B000NWQXBA', 'B000Q032UY'), 3400),
 (('B000ILIJE0', 'B000Q032UY'), 3400),
 (('B000PC54NG', 'B000Q032UY'), 3400),
 (('B000H9R1Q0', 'B000PC54NG'), 3396),
 (('B000H9R1Q0', 'B000Q032UY'), 3396),
 (('B000H9R1Q0', 'B000ILIJE0'), 3396),
 (('B000H9R1Q0', 'B000NWQXBA'), 3396),
 (('B000NWU3I4', 'B000Q032UY'), 3394),
 (('B000H9R1Q0', 'B000NWU3I4'), 3390)]