### Alternating Least Squares


- **ALS (Alternating Least Squares)** is an algorithm used to **recommend products** and **product categories** that suit each user. It works by iteratively optimizing a factorization of the user-product **interaction matrix**.

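- At each iteration, ALS moves closer to a factorized representation of the original data. In other words, the algorithm decomposes the **interaction matrix** into two smaller matrices, one representing user **preferences** and one representing product **characteristics**. "Alternating" refers to holding one of the two matrices fixed while solving a least-squares problem for the other, then swapping roles. ALS then uses these factors to **predict** how much a user will like products they have not interacted with yet.

- The notebook below does not run the ALS loop itself: it factorizes the product × customer matrix with scikit-learn's `TruncatedSVD` and recommends products whose latent vectors are strongly correlated with a product the customer has already bought.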
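The alternating step can be illustrated with a minimal NumPy sketch: with the product factors fixed, each user's factor vector is the solution of a small ridge (regularized least-squares) problem, and vice versa. The rating matrix, the function name `als`, and the values of `k`, `reg` and `n_iters` are made up for illustration; this is not code the notebook runs.

```python
import numpy as np

def als(R, mask, k=2, reg=0.1, n_iters=20, seed=0):
    """Factorize R ~= U @ V.T with alternating least squares (illustrative sketch).

    R    : (n_users, n_items) rating matrix
    mask : same shape, 1.0 where a rating was observed, 0.0 otherwise
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))  # user preference factors
    V = rng.normal(scale=0.1, size=(n_items, k))  # product characteristic factors
    ridge = reg * np.eye(k)
    for _ in range(n_iters):
        # Step 1: hold V fixed; each user's vector is a ridge regression on their observed ratings
        for u in range(n_users):
            obs = mask[u] > 0
            Vo = V[obs]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, obs])
        # Step 2: hold U fixed; solve each product's vector the same way
        for i in range(n_items):
            obs = mask[:, i] > 0
            Uo = U[obs]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, i])
    return U, V

# Made-up ratings; 0 means "not rated"
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = (R > 0).astype(float)
U, V = als(R, mask)
scores = U @ V.T  # predicted affinity for every (user, product), including unrated cells
```

Because each half-step solves its subproblem exactly, the regularized squared error on the observed ratings never increases from one iteration to the next.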

In [1]:
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD

In [2]:
import gdown

file_id ='1FdhKYX7QfInl0DN6UQGYqYnipbHIqgj6'
output_path = 'Olist_Recommendation_Dataset.csv'
gdown.download(f"https://drive.google.com/uc?id={file_id}", output_path, quiet=True,fuzzy=True)

'Olist_Recommendation_Dataset.csv'

In [3]:
Olist_db = pd.read_csv('Olist_Recommendation_Dataset.csv')

In [4]:
Olist_db.columns

Index(['Customer State', 'Review Score', 'Purchase Timestamp', 'Purchase Date',
       'Monetary', 'Customer Segment', 'Marketing Action', 'Customer_ID',
       'Order_ID', 'Product_ID', 'product_name', 'aisle', 'department',
       'Product Category'],
      dtype='object')

In [5]:
Olist_db

,Customer State,Review Score,Purchase Timestamp,Purchase Date,Monetary,Customer Segment,Marketing Action,Customer_ID,Order_ID,Product_ID,product_name,aisle,department,Product Category
0,RJ,5,2017-09-13 08:59:02,2017-09-13 00:00:00,281.30,Potential Loyalists,Cross Sell Recommendations and Discount coupons,15438,0,8446,Utility Lighter,more household,household,more household
1,GO,5,2017-06-28 11:52:20,2017-06-28 00:00:00,73.86,Lost Customers,Don't spend too much trying to re-acquire,22529,7229,8446,Utility Lighter,more household,household,more household
2,MG,4,2018-05-18 10:25:53,2018-05-18 00:00:00,108.32,Potential Loyalists,Cross Sell Recommendations and Discount coupons,17139,31508,8446,Utility Lighter,more household,household,more household
3,PR,5,2017-08-01 18:38:42,2017-08-01 00:00:00,472.06,Loyal Customers,Loyality programs;Cross Sell,33008,42252,8446,Utility Lighter,more household,household,more household
4,MG,5,2017-08-10 21:48:40,2017-08-10 00:00:00,1415.74,VVIP - Can't Loose Them,No Price Incentives; Offer Limited edition and...,41377,47321,8446,Utility Lighter,more household,household,more household
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100846,RJ,1,2017-12-09 13:51:24,2017-12-09 00:00:00,618.88,Potential Loyalists,Cross Sell Recommendations and Discount coupons,3664,94603,14949,California Pinot Noir Red Wine,missing,missing,Hand Luggage
100847,SP,4,2017-07-13 11:24:43,2017-07-13 00:00:00,157.30,Hibernating - Almost Lost,Aggressive price incentives,10847,87800,4042,Chicken Curry with Seasoned Basmati Rice,frozen meals,frozen,frozen meals
100848,SP,3,2018-07-27 10:44:02,2018-07-27 00:00:00,68.35,Potential Loyalists,Cross Sell Recommendations and Discount coupons,32799,88387,8881,16inch Macbook Pro -2.6GHz Intel Core i7,ice cream ice,frozen,Apple Macbook
100849,SC,5,2018-08-21 11:29:05,2018-08-21 00:00:00,314.32,Potential Loyalists,Cross Sell Recommendations and Discount coupons,40805,90090,13612,Roja Parfumes Luxe 3.4 oz,candy chocolate,snacks,Perfumes


In [6]:
features = ['Customer_ID', 'product_name', 'Product_ID', 'Review Score']
Olist_db_ = Olist_db[features]
Olist_db_ = Olist_db_.rename(columns={'Review Score': 'Ratings'})
Olist_db_ = Olist_db_.drop_duplicates()

In [7]:
from scipy.sparse import csr_matrix

In [8]:
# Work on the first 20,000 rows only
Olist_db_ = Olist_db_[:20000]

In [9]:
Olist_db_.Customer_ID.unique()

array([15438, 22529, 17139, ...,  6650, 27126, 18524])

Pivot the `Olist_db_` DataFrame into a customer × product rating matrix, fill missing values with 0, and convert it to a sparse matrix.

In [10]:
# Pivot ratings into a customer x product matrix (pivot_table averages duplicate pairs)
df_product_features = Olist_db_.reset_index().pivot_table(
    index='Customer_ID',
    columns='Product_ID',
    values='Ratings'
).fillna(0)  # unrated products become 0
# Convert the DataFrame of product features to a SciPy sparse (CSR) matrix
mat_product_features = csr_matrix(df_product_features.values)
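`pivot_table(...).fillna(0)` first materializes a dense 15,534 × 1,172 frame and only then compresses it. A minimal sketch that builds an equivalent CSR matrix directly from (customer, product, rating) triples, assuming the column names used in this notebook; `ratings_to_csr` and the toy data are hypothetical. Duplicate pairs are averaged first, because `csr_matrix` would otherwise sum them instead of taking the mean as `pivot_table` does.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

def ratings_to_csr(df, user_col="Customer_ID", item_col="Product_ID", rating_col="Ratings"):
    # Average duplicate (user, item) pairs, mirroring pivot_table's default aggfunc="mean"
    agg = df.groupby([user_col, item_col], as_index=False)[rating_col].mean()
    # Map raw IDs to contiguous row/column positions (sorted, like pivot_table)
    users = np.sort(agg[user_col].unique())
    items = np.sort(agg[item_col].unique())
    rows = np.searchsorted(users, agg[user_col].to_numpy())
    cols = np.searchsorted(items, agg[item_col].to_numpy())
    mat = csr_matrix((agg[rating_col].to_numpy(dtype=float), (rows, cols)),
                     shape=(len(users), len(items)))
    return mat, users, items

# Made-up triples; customer 12 rated product 151 twice
toy = pd.DataFrame({"Customer_ID": [5, 5, 9, 12, 12],
                    "Product_ID": [63, 69, 63, 151, 151],
                    "Ratings": [5, 3, 4, 2, 4]})
mat, users, items = ratings_to_csr(toy)
```

`TruncatedSVD` accepts SciPy sparse input, so the transposed matrix (`mat.T`) could be factorized without the dense detour.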

In [11]:
df_product_features.head()

Customer_ID \ Product_ID,63,69,151,161,185,219,226,238,255,311,...,32021,32050,32052,32083,32115,32134,32164,32170,32171,32203
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [12]:
df_product_features.shape

(15534, 1172)

In [13]:
X = df_product_features.T
X.head()

Product_ID \ Customer_ID,5,9,12,15,16,19,20,24,27,29,...,41410,41412,41413,41416,41418,41419,41421,41422,41424,41425
63,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
69,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
151,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
161,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
X.index[:-20]

Index([   63,    69,   151,   161,   185,   219,   226,   238,   255,   311,
       ...
       31567, 31575, 31585, 31605, 31607, 31631, 31654, 31668, 31671, 31692],
      dtype='int64', name='Product_ID', length=1152)

In [15]:
# Create a TruncatedSVD object with n_components set to 10.
# TruncatedSVD is a dimensionality-reduction technique commonly used on sparse data.
SVD = TruncatedSVD(n_components=10)

# Fit TruncatedSVD on X (products x customers) and project each product onto 10 latent components.
decomposed_matrix = SVD.fit_transform(X)
pd.DataFrame(decomposed_matrix)

,0,1,2,3,4,5,6,7,8,9
0,0.004619,0.001387,0.003657,0.003463,0.000871,-0.000108,0.004085,-0.001224,0.009034,0.009207
1,0.000004,0.000002,0.000006,0.000006,0.000001,0.000002,0.000015,-0.000004,-0.000010,0.000015
2,0.007524,0.002272,0.013613,0.006708,0.019425,-0.005280,0.044534,-0.003494,0.018694,0.386938
3,0.060293,0.026168,0.328496,0.332439,0.158521,-0.124320,0.513800,0.075819,-0.185482,-0.075047
4,0.018949,0.004320,0.274828,0.286672,-0.136943,-0.071229,-0.028863,0.012094,-0.025490,-0.002678
...,...,...,...,...,...,...,...,...,...,...
1167,0.001109,0.001373,0.000391,-0.000332,0.000181,0.001914,0.001242,0.004659,-0.000283,0.003844
1168,0.111189,0.205104,-0.010844,-0.000782,-0.019955,0.003858,-0.007213,0.001466,-0.007323,-0.007303
1169,0.003024,0.001929,0.002550,0.002624,0.004603,0.001089,0.005126,-0.003093,0.002553,0.009997
1170,0.001963,0.000705,0.002965,0.000924,0.000817,-0.001521,0.003288,0.000159,-0.004570,-0.003575
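To see what `fit_transform` returns, here is a toy run on a made-up 6 × 5 product × customer matrix: each product gets one row of latent coordinates, one column per component. `random_state=0` is an addition for reproducibility; without it, the default randomized solver can return slightly different components between runs.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy product x customer matrix (rows = products, columns = customers); values are made up
X_toy = np.array([[5, 4, 0, 0, 1],
                  [4, 5, 0, 0, 0],
                  [0, 0, 5, 4, 0],
                  [0, 1, 4, 5, 0],
                  [1, 0, 0, 0, 5],
                  [0, 0, 1, 0, 4]], dtype=float)

svd = TruncatedSVD(n_components=2, random_state=0)
item_factors = svd.fit_transform(X_toy)      # one 2-d latent vector per product
print(item_factors.shape)                    # (6, 2)
print(svd.explained_variance_ratio_.sum())   # share of variance captured by the 2 components
```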


In [16]:
# Compute the correlation matrix of the decomposed matrix with np.corrcoef.
# np.corrcoef treats each row as a variable, so the result holds the Pearson correlation between every pair of products (1172 x 1172).
correlation_matrix = np.corrcoef(decomposed_matrix)
correlation_matrix.shape

(1172, 1172)
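`np.corrcoef` treats each row of its input as one variable, which is why the 1172 × 10 matrix of product vectors becomes a 1172 × 1172 product-by-product matrix. Each entry is the Pearson correlation between two products' 10 latent coordinates. A small check on made-up vectors against a hand-written Pearson formula (`pearson` is a helper introduced here):

```python
import numpy as np

# Made-up latent vectors: 3 products x 4 components
factors = np.array([[0.9, 0.1, 0.3, 0.0],
                    [0.8, 0.2, 0.4, 0.1],
                    [0.0, 0.7, 0.1, 0.9]])

corr = np.corrcoef(factors)  # rows are variables -> (3, 3)

def pearson(a, b):
    # Center both vectors, then take the cosine of the centered vectors
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

manual = np.array([[pearson(r1, r2) for r2 in factors] for r1 in factors])
```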

In [17]:
pd.DataFrame(correlation_matrix)

,0,1,2,3,4,5,6,7,8,9,...,1162,1163,1164,1165,1166,1167,1168,1169,1170,1171
0,1.000000,0.208639,0.614612,-0.201620,0.098402,-0.041768,0.110011,-0.027197,-0.317066,0.309078,...,0.643809,-0.102802,0.109027,-0.556272,0.227403,-0.185654,-0.157927,0.690777,-0.484055,0.666227
1,0.208639,1.000000,0.575150,0.530144,0.176313,0.084615,0.204429,0.021305,-0.139018,-0.016094,...,0.718738,0.378744,0.098737,0.361068,-0.138926,0.181152,-0.085808,0.683366,0.361428,0.461315
2,0.614612,0.575150,1.000000,-0.222264,-0.104631,-0.061584,0.173478,-0.020661,-0.418703,0.037742,...,0.911980,-0.339920,-0.033564,-0.496920,-0.446886,0.463938,-0.200107,0.793753,-0.460109,0.466557
3,-0.201620,0.530144,-0.222264,1.000000,0.460064,0.444103,0.364494,0.365376,0.408058,-0.081605,...,0.018215,0.912635,0.299105,0.769504,0.516619,-0.266538,-0.183018,0.050387,0.842592,-0.098375
4,0.098402,0.176313,-0.104631,0.460064,1.000000,0.038722,0.580375,0.010144,0.604589,-0.065805,...,-0.090860,0.350182,0.612424,0.127542,0.211401,-0.273806,-0.068356,-0.115216,0.341625,-0.144588
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1167,-0.185654,0.181152,0.463938,-0.266538,-0.273806,-0.376878,-0.247866,-0.306382,-0.341463,-0.076671,...,0.395795,-0.483312,-0.019340,-0.160581,-0.750642,1.000000,-0.001471,-0.083087,-0.203645,-0.229381
1168,-0.157927,-0.085808,-0.200107,-0.183018,-0.068356,0.012206,-0.019890,-0.002872,-0.250013,-0.096951,...,-0.132361,0.050334,-0.029751,0.214934,-0.144646,-0.001471,1.000000,-0.153756,0.182511,-0.063358
1169,0.690777,0.683366,0.793753,0.050387,-0.115216,0.239221,0.241772,0.218022,-0.328089,0.131494,...,0.825473,0.075025,-0.096715,-0.207625,-0.024987,-0.083087,-0.153756,1.000000,-0.195188,0.746361
1170,-0.484055,0.361428,-0.460109,0.842592,0.341625,0.419398,0.233286,0.341204,0.481761,0.055920,...,-0.180001,0.847222,0.364382,0.958387,0.287883,-0.203645,0.182511,-0.195188,1.000000,-0.154554


In [18]:
# Reference product per customer: the largest Product_ID among their purchases
fave_prod = Olist_db_.groupby('Customer_ID')['Product_ID'].max().to_frame()

In [19]:
fave_prod = fave_prod.reset_index()
fave_prod

,Customer_ID,Product_ID
0,5,10630
1,9,6927
2,12,15675
3,15,27919
4,16,10630
...,...,...
15529,41419,13331
15530,41421,18573
15531,41422,15250
15532,41424,21641
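Taking the maximum of `Product_ID` per customer selects the product with the largest ID, which is not necessarily the one the customer liked most. If the intent is the customer's highest-rated product, a sketch assuming the `Ratings` column created above (`top_rated_product` and the toy data are hypothetical):

```python
import pandas as pd

def top_rated_product(df):
    """Return one row per customer with the Product_ID they rated highest."""
    # idxmax gives the index label of the first row with the maximum rating in each group,
    # so ties go to whichever product appears first in df
    best_rows = df.groupby("Customer_ID")["Ratings"].idxmax()
    return df.loc[best_rows, ["Customer_ID", "Product_ID"]].reset_index(drop=True)

# Made-up purchase history for two customers
toy = pd.DataFrame({"Customer_ID": [5, 5, 9, 9],
                    "Product_ID": [10630, 63, 6927, 151],
                    "Ratings": [1, 5, 4, 4]})
best = top_rated_product(toy)
```

On the notebook's data this would be called as `top_rated_product(Olist_db_)`; the resulting recommendations would differ from the outputs shown here.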


In [20]:
def get_prod_id(customer_id):

    product_id = fave_prod[fave_prod.Customer_ID == customer_id]['Product_ID']
    return product_id


prd_id = get_prod_id(41422)
Product_id = prd_id.iloc[0]
Product_id

15250

In [21]:
product_names = list(X.index)
product_ID_Idx = product_names.index(Product_id)
product_ID_Idx

531

In [22]:
correlation_product_ID = correlation_matrix[product_ID_Idx]
correlation_product_ID

array([-0.02091154,  0.54267382,  0.62982029, ...,  0.35789595,
        0.07406533, -0.0368989 ])

### Recommending highly correlated products (correlation > 0.70, listed in Product_ID order)

In [23]:
Recommend = list(X.index[correlation_product_ID > 0.70])

# Removes the item already bought by the customer
Recommend.remove(Product_id)
Recommend[0:20]

[1477,
 3417,
 4835,
 7564,
 12089,
 12782,
 16684,
 16929,
 16961,
 19005,
 20721,
 28058,
 30062,
 32134]
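The cell above keeps every product whose correlation exceeds 0.70 in index (Product_ID) order, so `Recommend[0:20]` is not the 20 most correlated products. A sketch that ranks candidates by correlation before truncating; `top_k_correlated` and the toy correlations are hypothetical, and the 0.70 threshold is the one used in this notebook:

```python
import numpy as np

def top_k_correlated(corr_row, product_ids, exclude_id, k=10, min_corr=0.70):
    """Return up to k product IDs ranked by correlation, highest first."""
    product_ids = np.asarray(product_ids)
    order = np.argsort(-corr_row, kind="stable")  # positions sorted by descending correlation
    ranked = product_ids[order]
    # NaN correlations sort last and fail the threshold test, so they are dropped
    keep = (corr_row[order] > min_corr) & (ranked != exclude_id)
    return ranked[keep][:k].tolist()

ids = [63, 69, 151, 161, 185]
row = np.array([0.75, 0.20, 1.00, 0.95, 0.71])  # made-up correlations with product 151
print(top_k_correlated(row, ids, exclude_id=151, k=3))  # [161, 63, 185]
```

In the notebook the call would look like `top_k_correlated(correlation_product_ID, X.index, Product_id)`.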

In [24]:
## Getting Product names from prediction

# Build a DataFrame from the first 20 recommended product IDs.
predictions = pd.DataFrame(Recommend[:20])
predictions.columns = ['Product_ID']
predictions

,Product_ID
0,1477
1,3417
2,4835
3,7564
4,12089
5,12782
6,16684
7,16929
8,16961
9,19005


In [25]:
predictions['Product Name'] = predictions.Product_ID.apply(lambda x : Olist_db_[Olist_db_.Product_ID == x]['product_name'].unique()[0])
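The `apply` above scans all of `Olist_db_` once per recommended product. A vectorized alternative builds a Product_ID → name lookup once and uses `Series.map`; `attach_names` and the toy catalog are hypothetical. One behavioral difference: an ID missing from the catalog yields `NaN` here instead of raising an `IndexError`.

```python
import pandas as pd

def attach_names(predictions, catalog):
    """Add a 'Product Name' column by looking each Product_ID up once in `catalog`."""
    # Keep the first name seen per Product_ID, like .unique()[0] in the cell above
    names = catalog.drop_duplicates("Product_ID").set_index("Product_ID")["product_name"]
    out = predictions.copy()
    out["Product Name"] = out["Product_ID"].map(names)
    return out

catalog = pd.DataFrame({"Product_ID": [1477, 3417, 1477],
                        "product_name": ["Wild Mushroom Cauliflower Hempseed Burgers",
                                         "Chocolate Creme Cookies",
                                         "Wild Mushroom Cauliflower Hempseed Burgers"]})
preds = pd.DataFrame({"Product_ID": [3417, 1477]})
named = attach_names(preds, catalog)
```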

In [26]:
predictions[:10]

,Product_ID,Product Name
0,1477,Wild Mushroom Cauliflower Hempseed Burgers
1,3417,Chocolate Creme Cookies
2,4835,Strong Thai Sweet Chili Energy Bar
3,7564,Protein And Fiber Chocolate Peanut Butter Bar
4,12089,Citrus + Mint Soap
5,12782,Bun Length Beef Franks
6,16684,Honey Butter Crescent Rolls
7,16929,Dried Shiitake Mushrooms
8,16961,45 Calories & Delightful Healthy Multi-Grain B...
9,19005,Jet-Puffed Marshmallows


In [27]:
Olist_db_.to_csv('Olist_db_ALS.csv', index=False)

In [28]:
X.to_csv('X_ALS.csv')

In [29]:
correlation_matrix

array([[ 1.        ,  0.2086393 ,  0.61461225, ...,  0.69077716,
        -0.48405497,  0.66622694],
       [ 0.2086393 ,  1.        ,  0.57515043, ...,  0.68336621,
         0.36142784,  0.46131488],
       [ 0.61461225,  0.57515043,  1.        , ...,  0.79375307,
        -0.46010891,  0.4665574 ],
       ...,
       [ 0.69077716,  0.68336621,  0.79375307, ...,  1.        ,
        -0.19518829,  0.74636145],
       [-0.48405497,  0.36142784, -0.46010891, ..., -0.19518829,
         1.        , -0.15455382],
       [ 0.66622694,  0.46131488,  0.4665574 , ...,  0.74636145,
        -0.15455382,  1.        ]])

In [30]:
correlation_matrix.shape

(1172, 1172)

In [31]:
import json

with open('correlation_matrix_ALS.txt', 'w') as filehandle:
    json.dump(correlation_matrix.tolist(), filehandle)
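JSON stores the 1172 × 1172 matrix (about 1.37 million floats) as decimal text and has to be converted back with `np.array(...)` after loading. `np.save`/`np.load` write NumPy's binary `.npy` format, which keeps shape and dtype. A sketch under that alternative; the file name `correlation_matrix_ALS.npy` and the helper names are assumptions:

```python
import os
import tempfile
import numpy as np

def save_matrix(path, matrix):
    np.save(path, matrix)  # binary .npy file: shape and dtype are stored with the data

def load_matrix(path):
    return np.load(path)   # returns an ndarray directly

corr = np.array([[1.0, 0.2086393], [0.2086393, 1.0]])  # small stand-in for correlation_matrix
path = os.path.join(tempfile.mkdtemp(), "correlation_matrix_ALS.npy")
save_matrix(path, corr)
restored = load_matrix(path)
```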

### Prediction Pipeline

In [32]:
# Use the three saved files to generate predictions for a customer

In [33]:
Olist_db_ = pd.read_csv('Olist_db_ALS.csv')

In [34]:
X = pd.read_csv('X_ALS.csv', index_col=0)
X

Product_ID,5,9,12,15,16,19,20,24,27,29,...,41410,41412,41413,41416,41418,41419,41421,41422,41424,41425
63,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
69,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
151,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
161,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32134,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
32164,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
32170,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
32171,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [35]:
X.columns[4000:4020]

Index(['10670', '10671', '10675', '10677', '10679', '10680', '10681', '10682',
       '10685', '10686', '10689', '10694', '10696', '10698', '10699', '10700',
       '10703', '10712', '10713', '10714'],
      dtype='object')

In [36]:
import json
with open('correlation_matrix_ALS.txt') as f:
    correlation_matrix = json.load(f)

In [37]:
correlation_matrix = np.array(correlation_matrix)

In [38]:
X.index

Index([   63,    69,   151,   161,   185,   219,   226,   238,   255,   311,
       ...
       32021, 32050, 32052, 32083, 32115, 32134, 32164, 32170, 32171, 32203],
      dtype='int64', name='Product_ID', length=1172)

### Streamlit

In [39]:
def Recommendations_ALS(Customer_id):
    # Uses the global Olist_db_, X and correlation_matrix loaded above
    # Reference product per customer: the largest Product_ID among their purchases
    fave_prod = Olist_db_.groupby('Customer_ID')['Product_ID'].max().to_frame()
    fave_prod = fave_prod.reset_index()

    prd_id = fave_prod[fave_prod.Customer_ID == Customer_id]['Product_ID']
    Product_id = prd_id.iloc[0]
    product_names = list(X.index)
    product_ID_Idx = product_names.index(Product_id)

    correlation_product_ID = correlation_matrix[product_ID_Idx]

    Recommend = list(X.index[correlation_product_ID > 0.70])
    # Removes the item already bought by the customer
    Recommend.remove(Product_id)

    ## Getting product names from prediction
    predictions = pd.DataFrame(Recommend[:20])
    predictions.columns = ['Product_ID']

    predictions['Product Name'] = predictions.Product_ID.apply(lambda x : Olist_db_[Olist_db_.Product_ID == x]['product_name'].unique()[0])
    Recommendations = predictions[:10]
    return Recommendations

In [40]:
Recommendations = Recommendations_ALS(10679)
Recommendations[:10000]

,Product_ID,Product Name
0,185,Chicken Nugget Meal
1,504,Velveeta Shells & Cheese Bold Jalapeno
2,2694,Superberry Kombucha
3,3947,16inch Macbook Pro -2.6GHz Intel Core i7
4,4575,"Beef Jerky, Teriyaki"
5,4621,Organic Granny Smith Apple Bag
6,5140,Lean & Fit Coconut Lemongrass Chicken
7,6367,Sugar Free Concord Grape Jam
8,6488,Grape Water Enhancer
9,7006,Original Alcohol Free Witch Hazel With Aloe Vera


In [41]:
import pandas as pd
import numpy as np
import pickle
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
import json

# Function to load and prepare data
def load_and_prepare_data(olist_db_path, x_als_path, correlation_matrix_path):
    """Loads and prepares the data for the recommendation model.

    Args:
        olist_db_path (str): Path to the Olist_db_ALS.csv file.
        x_als_path (str): Path to the X_ALS.csv file.
        correlation_matrix_path (str): Path to the correlation_matrix_ALS.txt file.

    Returns:
        tuple: A tuple containing the following:
            - Olist_db_ (pd.DataFrame): The Olist dataset with customer and product information.
            - X (pd.DataFrame): The product features matrix.
            - correlation_matrix (np.ndarray): The correlation matrix between products.
    """
    Olist_db_ = pd.read_csv(olist_db_path)
    X = pd.read_csv(x_als_path, index_col=0)
    with open(correlation_matrix_path) as f:
        correlation_matrix = json.load(f)
    correlation_matrix = np.array(correlation_matrix)
    return Olist_db_, X, correlation_matrix

# Function to get recommendations
def recommendations_als(customer_id, olist_db_, x, correlation_matrix, num_recommendations=10):
    """Generates product recommendations for a given customer using ALS.

    Args:
        customer_id (int): The ID of the customer.
        olist_db_ (pd.DataFrame): The Olist dataset with customer and product information.
        x (pd.DataFrame): The product features matrix.
        correlation_matrix (np.ndarray): The correlation matrix between products.
        num_recommendations (int, optional): The number of recommendations to generate. Defaults to 10.

    Returns:
        pd.DataFrame: A DataFrame containing the recommended products and their names.
    """
    # Reference product: the largest Product_ID the customer bought
    fave_prod = olist_db_.groupby('Customer_ID')['Product_ID'].max().to_frame()
    fave_prod = fave_prod.reset_index()
    prd_id = fave_prod[fave_prod.Customer_ID == customer_id]['Product_ID']
    product_id = prd_id.iloc[0]
    product_names = list(x.index)  # use the x argument, not a global X
    product_id_idx = product_names.index(product_id)
    correlation_product_id = correlation_matrix[product_id_idx]
    recommend = list(x.index[correlation_product_id > 0.70])
    recommend.remove(product_id)  # Remove already bought product
    # Keep the requested number of candidates (no silent cap at 20)
    predictions = pd.DataFrame(recommend[:num_recommendations], columns=['Product_ID'])
    predictions['Product Name'] = predictions.Product_ID.apply(
        lambda pid: olist_db_[olist_db_.Product_ID == pid]['product_name'].unique()[0])
    return predictions

# Function to save the model
def save_model(olist_db_path, x_als_path, correlation_matrix_path, filename):
    """Saves the recommendation model to a pickle file.

    Args:
        olist_db_path (str): Path to the Olist_db_ALS.csv file.
        x_als_path (str): Path to the X_ALS.csv file.
        correlation_matrix_path (str): Path to the correlation_matrix_ALS.txt file.
        filename (str): The name of the pickle file to save the model to.
    """
    olist_db_, x, correlation_matrix = load_and_prepare_data(olist_db_path, x_als_path, correlation_matrix_path)
    model = {
        'olist_db_': olist_db_,
        'x': x,
        'correlation_matrix': correlation_matrix,
        'recommendations_als': recommendations_als
    }
    with open(filename, 'wb') as f:
        pickle.dump(model, f)

# Example usage (assuming data files are in the same directory)
if __name__ == '__main__':
    olist_db_path = 'Olist_db_ALS.csv'
    x_als_path = 'X_ALS.csv'
    correlation_matrix_path = 'correlation_matrix_ALS.txt'
    filename = 'SVD_ALS_model.pkl'
    save_model(olist_db_path, x_als_path, correlation_matrix_path, filename)
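`pickle` serializes functions by reference (module and qualified name), not by value. The `'recommendations_als'` entry saved from a notebook is recorded as `__main__.recommendations_als`, so a separate script (for example a Streamlit app) that unpickles the file typically fails with an `AttributeError` unless its own `__main__` defines that function. A sketch that pickles only the data and keeps the function in ordinary importable code; `save_artifacts`, `load_artifacts` and the in-memory buffer are assumptions for illustration:

```python
import io
import pickle
import numpy as np
import pandas as pd

def save_artifacts(fileobj, olist_db_, x, correlation_matrix):
    # Pickle data only; the recommendation function stays in regular, importable code
    pickle.dump({"olist_db_": olist_db_, "x": x,
                 "correlation_matrix": correlation_matrix}, fileobj)

def load_artifacts(fileobj):
    return pickle.load(fileobj)

buf = io.BytesIO()  # in-memory stand-in for 'SVD_ALS_model.pkl'
save_artifacts(buf,
               pd.DataFrame({"Customer_ID": [5], "Product_ID": [63]}),
               pd.DataFrame({"5": [5.0]}, index=pd.Index([63], name="Product_ID")),
               np.eye(1))
buf.seek(0)
artifacts = load_artifacts(buf)
```

The loading script would then import `recommendations_als` from wherever it is defined and call it with `artifacts['olist_db_']`, `artifacts['x']` and `artifacts['correlation_matrix']`.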

In [42]:
import pickle

# Load the model
with open('SVD_ALS_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Get recommendations for a customer (e.g., customer ID 10679)
recommendations = model['recommendations_als'](10679, model['olist_db_'], model['x'], model['correlation_matrix'], num_recommendations=15)

# Print the recommendations
print(recommendations)

    Product_ID                                      Product Name
0          185                               Chicken Nugget Meal
1          504            Velveeta Shells & Cheese Bold Jalapeno
2         2694                               Superberry Kombucha
3         3947          16inch Macbook Pro -2.6GHz Intel Core i7
4         4575                              Beef Jerky, Teriyaki
5         4621                    Organic Granny Smith Apple Bag
6         5140             Lean & Fit Coconut Lemongrass Chicken
7         6367                      Sugar Free Concord Grape Jam
8         6488                              Grape Water Enhancer
9         7006  Original Alcohol Free Witch Hazel With Aloe Vera
10        8108                             Spaghetti Style Pasta
11        8721                                 Lite Rice Vinegar
12        8812    Tidy Cats Clumping Litter 4-in-1 Strength Size
13        9442                                  Charentais Melon
14        9510           