In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%load_ext lab_black
# or nb_black

# Idea

Let us add LLM embeddings to another recommender system. Among other options, there are two ways to integrate an LLM embedding directly into the Association Rules (AR) pipeline:

- First Association Rules, then LLM:
    1) Training: mine the AR rules, then
    2) embed the antecedents.
    3) Instead of requiring exact matches, we look for close neighbours of the selected products among the embedded antecedents (for an exact match, the distance is 0).
    4) => The same rules can be used for a wider range of inputs.
- LLM, clustering, then AR:
    1) Embed the product descriptions before AR training,
    2) cluster the embeddings to find bubbles,
    3) mine Association Rules on the bubbles found.
    4) => For prediction, we check the distance from a bubble. To find the consequents, we keep the original product belonging to each point of the bubble.

In this notebook, and in the repository, I have implemented the first idea.
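
The first idea can be illustrated with a minimal sketch. It is not the repository's `ARLLMRecommender`: the toy rules, the 3-dimensional vectors standing in for LLM embeddings, the `recommend` helper and the `max_distance` threshold are all assumptions made for illustration.

```python
import numpy as np

# Toy rules as (antecedent, consequent) pairs; hypothetical data.
rules = [
    ("PINK TEACUP", "GREEN TEACUP"),
    ("RED LUNCH BAG", "BLUE LUNCH BAG"),
]

# Hypothetical 3-d vectors standing in for real LLM embeddings.
toy_embeddings = {
    "PINK TEACUP": np.array([1.0, 0.0, 0.0]),
    "ROSE TEACUP": np.array([0.9, 0.1, 0.0]),
    "RED LUNCH BAG": np.array([0.0, 1.0, 0.0]),
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; identical directions give 0."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend(query, max_distance=0.2):
    """Return consequents of rules whose antecedent lies close to the query."""
    q = toy_embeddings[query]
    scored = [
        (cosine_distance(q, toy_embeddings[antecedent]), consequent)
        for antecedent, consequent in rules
    ]
    # Closest antecedents first; drop rules that are too far away.
    return [consequent for dist, consequent in sorted(scored) if dist <= max_distance]


exact = recommend("PINK TEACUP")  # antecedent matches exactly, distance 0
near = recommend("ROSE TEACUP")   # no rule for this item, but a close neighbour
```

Here "ROSE TEACUP" is not the antecedent of any rule, yet it triggers the "PINK TEACUP" rule because the two vectors are close. This is the "wider use of the same rules" described in step 4.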

# Imports and data

In [3]:
import pickle
import json
from ecommercerecommendation.utils.data import get_data, venn_sets

from ecommercerecommendation.models.arllmrecommender import ARLLMRecommender

from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse

In [4]:
df = get_data("clean_data")

In [5]:
print("Columns: " + ", ".join(df.columns))

Columns: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country


## Train-test-split

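We will use an 80-20 split by InvoiceDate: the earliest 80% of the rows form the training set and the remaining rows the test set. Splitting by time prevents data leakage from later purchases into training.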

In [6]:
cut_off_date = df["InvoiceDate"].quantile(0.8)
df["train"] = df["InvoiceDate"] <= cut_off_date

In [7]:
X_train = df[df["train"]].drop(columns="train")
X_test = df[~df["train"]].drop(columns="train")
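
To see what the quantile-based cut-off does, here is a small check on made-up data. The frame `toy` and its values are hypothetical and only mirror the `InvoiceDate` column used above.

```python
import pandas as pd

# Toy stand-in for the invoice data: ten rows, one per day (hypothetical values).
toy = pd.DataFrame(
    {
        "InvoiceNo": range(1, 11),
        "InvoiceDate": pd.date_range("2011-01-01", periods=10, freq="D"),
    }
)

# Same logic as in the cells above: the 0.8 quantile of the dates is the cut-off.
toy_cut_off = toy["InvoiceDate"].quantile(0.8)
toy_train = toy[toy["InvoiceDate"] <= toy_cut_off]
toy_test = toy[toy["InvoiceDate"] > toy_cut_off]

# Every training row is earlier than every test row.
latest_train = toy_train["InvoiceDate"].max()
earliest_test = toy_test["InvoiceDate"].min()
print(len(toy_train), len(toy_test), latest_train < earliest_test)
```

With ten evenly spaced dates the split is exactly 8 to 2. On real data the proportions are only approximately 80-20, because all rows that share a timestamp fall on the same side of the cut-off.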

In [8]:
from ecommercerecommendation.models.arrecommender import ARRecommender

In [9]:
ar_recommender = ARRecommender()

In [10]:
ar_recommender.fit(X_train)

In [11]:
ar_recommender.rules[ar_recommender.rules["confidence"] >= 0.5].iloc[50:52][
    ["antecedents", "consequents"]
]

|      | antecedents    | consequents |
|------|----------------|-------------|
| 1077 | (22698)        | (22697)     |
| 913  | (22697, 22423) | (22699)     |


In [12]:
X_train[X_train["StockCode"] == 22698][["StockCode", "Description"]].drop_duplicates()

|       | StockCode | Description                    |
|-------|-----------|--------------------------------|
| 12636 | 22698     | PINK REGENCY TEACUP AND SAUCER |


In [13]:
# 22698 PINK REGENCY TEACUP AND SAUCER
recommendation = ar_recommender.get_recommendations([22698])
recommendation

[22697, 22699, 22423]

In [14]:
X_train[X_train["StockCode"].isin(recommendation)][
    ["StockCode", "Description"]
].drop_duplicates()

|      | StockCode | Description                     |
|------|-----------|---------------------------------|
| 845  | 22423     | REGENCY CAKESTAND 3 TIER        |
| 1038 | 22699     | ROSES REGENCY TEACUP AND SAUCER |
| 1047 | 22697     | GREEN REGENCY TEACUP AND SAUCER |


## AR-LLM algorithm

In [15]:
arllm_recommender = ARLLMRecommender()

In [16]:
arllm_recommender.fit(X_train)

Now the rules also carry the embedding of their antecedents, in the column `antecedents_embedded`:

In [17]:
arllm_recommender.rules[arllm_recommender.rules["confidence"] >= 0.5].iloc[50:52][
    ["antecedents", "consequents", "antecedents_embedded"]
]

|      | antecedents | consequents | antecedents_embedded |
|------|-------------|-------------|----------------------|
| 110  | (22386 JUMBO BAG PINK POLKADOT, 21931 JUMBO ST... | (85099 JUMBO BAG RED RETROSPOT) | [[0.026957354, 0.08470354, -0.014616151, 0.003... |
| 1208 | (23174 REGENCY SUGAR BOWL GREEN) | (23175 REGENCY MILK JUG PINK) | [[0.050384134, -0.023970352, -0.017461983, -0.... |


In [18]:
selection0 = ["PINK REGENCY TEACUP AND SAUCER"]

Compared with the plain AR recommender above, we now get additional, similar items, such as the Regency tea plates:

In [19]:
arllm_recommender.get_recommendations(selection0)

['22423 REGENCY CAKESTAND 3 TIER',
 '22697 GREEN REGENCY TEACUP AND SAUCER',
 '22699 ROSES REGENCY TEACUP AND SAUCER',
 '23171 REGENCY TEA PLATE GREEN',
 '23170 REGENCY TEA PLATE ROSES']

We can also use it for arbitrary free text:

In [20]:
selection1 = ["red paper clip"]

In [21]:
arllm_recommender.get_recommendations(selection1)

['21080 SET 20 RED RETROSPOT PAPER NAPKINS',
 '21086 SET 6 RED SPOTTY PAPER CUPS',
 '21094 SET 6 RED SPOTTY PAPER PLATES',
 '85123 WHITE HANGING HEART T-LIGHT HOLDER',
 '20723 STRAWBERRY CHARLOTTE BAG']

In [22]:
selection2 = ["WHITE HANGING HEART T-LIGHT HOLDER"]

In [23]:
arllm_recommender.get_recommendations(selection2)

['82494 WOODEN FRAME ANTIQUE WHITE',
 '82482 WOODEN PICTURE FRAME WHITE FINISH',
 '84970 HANGING HEART ZINC T-LIGHT HOLDER',
 '23322 LARGE WHITE HEART OF WICKER',
 '23321 SMALL WHITE HEART OF WICKER']

In [24]:
selection3 = ["REGENCY TEA PLATE ROSES", "REGENCY TEA PLATE PINK"]

In [25]:
arllm_recommender.get_recommendations(selection3)

['23171 REGENCY TEA PLATE GREEN',
 '22699 ROSES REGENCY TEACUP AND SAUCER',
 '23174 REGENCY SUGAR BOWL GREEN',
 '23175 REGENCY MILK JUG PINK',
 '22423 REGENCY CAKESTAND 3 TIER']

In [26]:
arllm_recommender.save_pickle("ARLLM_model.pkl")