liu-yihong/BPRH

BPRH

This repository implements the model from Qiu, Huihuai, et al. "BPRH: Bayesian personalized ranking for heterogeneous implicit feedback" Information Sciences 453 (2018): 80-98.

Platform and Packages

The code was written and tested on Python 3.7.6. It should also run on other versions of Python.

bprH.py implements the basic model, wrapped in a class for convenient usage. The following packages are required to run bprH.py:

  • pickle
  • random
  • numpy==1.18.1
  • pandas==1.0.1
  • tqdm==4.42.1
  • livelossplot==0.5.1
  • scikit-learn==0.22.1

The BPRH model involves repeated vector and matrix operations, so bprH_gpu.py uses NVIDIA GPUs for acceleration. The CuPy package is required to run bprH_gpu.py. See the CuPy Installation Guide for installation help. We used cupy-cuda101==7.3.0 with CUDA 10.1.
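The GPU and CPU versions can share most of their array code because CuPy mirrors much of the NumPy API. The sketch below shows one common way to switch backends. It is an assumption, not a description of how bprH_gpu.py is organized. `score_all_items`, `to_host` and `GPU_AVAILABLE` are hypothetical names. The sketch only checks that CuPy can be imported; on a machine without a usable GPU, the first CuPy call may still fail.

```python
# Backend-selection sketch (assumption): use CuPy if it can be imported,
# otherwise fall back to NumPy. CuPy mirrors much of the NumPy API.
try:
    import cupy as xp  # needs an NVIDIA GPU and a CuPy build matching the CUDA version
    GPU_AVAILABLE = True
except ImportError:
    import numpy as xp
    GPU_AVAILABLE = False


def score_all_items(U, V):
    """Hypothetical helper: user-by-item score matrix U @ V on the selected backend."""
    return xp.asarray(U) @ xp.asarray(V)


def to_host(a):
    """Hypothetical helper: return the array as a NumPy array in host memory."""
    if GPU_AVAILABLE:
        return xp.asnumpy(a)  # copy device memory back to the host
    return xp.asarray(a)


print("GPU backend (CuPy)" if GPU_AVAILABLE else "CPU backend (NumPy)")
```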

Sobazaar_cleaning.ipynb is the Jupyter Notebook that cleans the raw Sobazaar data "Sobazaar-hashID.csv.gz" in the data folder. You may need to unzip it manually before executing Sobazaar_cleaning.ipynb. Note that the Like action is not used. Only the View action is processed, as auxiliary feedback alongside the target Purchase action, in bprH_gpu.py and bprH.py.
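The sketch below shows the action grouping listed in item 1 of the Implementation Detail section. It uses only the standard library and keeps just Purchase and View rows. It assumes each raw record has already been read as a `(user, item, raw_action)` tuple. The real column names and layout of "Sobazaar-hashID.csv.gz" are not given here. `ACTION_GROUPS` and `group_actions` are hypothetical names, not the notebook's code.

```python
# Raw Sobazaar action names grouped into feedback types
# (taken from item 1 of the Implementation Detail section).
ACTION_GROUPS = {
    "purchase:buy_clicked": "Purchase",
    "content:interact:product_clicked": "View",
    "content:interact:product_detail_viewed": "View",
    "product_detail_clicked": "View",
    "content:interact:product_wanted": "Like",
    "product_wanted": "Like",
}


def group_actions(rows, keep=("Purchase", "View")):
    """Hypothetical helper: map (user, item, raw_action) rows to (user, item, feedback).

    Rows whose raw action is not in ACTION_GROUPS, or whose feedback type is not
    in `keep`, are dropped. By default Like rows are removed, since only View
    and Purchase are used by the models.
    """
    out = []
    for user, item, action in rows:
        feedback = ACTION_GROUPS.get(action)
        if feedback in keep:
            out.append((user, item, feedback))
    return out


rows = [
    ("u1", "i1", "purchase:buy_clicked"),
    ("u1", "i2", "product_wanted"),
    ("u2", "i1", "product_detail_clicked"),
]
print(group_actions(rows))  # [('u1', 'i1', 'Purchase'), ('u2', 'i1', 'View')]
```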

BRPH_50_1000_0.00001_0.1_0.1.ipynb illustrates the usage and training process of BPRH on a GPU.

Parameter Sensitivity Analysis

| gamma | lambda_u, lambda_v | lambda_b | P@5 | P@10 | R@5 | R@10 | AUC |
|---|---|---|---|---|---|---|---|
| 0.1 | 0.00001 | 0.00001 | 0.014 | 0.011 | 0.061 | 0.091 | 0.857 |
| 0.1 | 0.00001 | 0.0001 | 0.014 | 0.011 | 0.062 | 0.094 | 0.858 |
| 0.1 | 0.00001 | 0.001 | 0.018 | 0.013 | 0.075 | 0.105 | 0.861 |
| 0.1 | 0.00001 | 0.01 | 0.033 | 0.021 | 0.146 | 0.175 | 0.866 |
| 0.1 | 0.00001 | 0.1 | 0.052 | 0.033 | 0.224 | 0.276 | 0.89 |
| 0.1 | 0.00001 | 1.0 | 0.052 | 0.034 | 0.22 | 0.285 | 0.902 |
| 0.1 | 0.0001 | 0.00001 | 0.014 | 0.011 | 0.06 | 0.092 | 0.86 |
| 0.1 | 0.0001 | 0.0001 | 0.015 | 0.011 | 0.064 | 0.091 | 0.856 |
| 0.1 | 0.0001 | 0.001 | 0.016 | 0.012 | 0.071 | 0.106 | 0.86 |
| 0.1 | 0.001 | 0.00001 | 0.013 | 0.01 | 0.054 | 0.087 | 0.858 |
| 0.1 | 0.001 | 0.0001 | 0.014 | 0.011 | 0.058 | 0.089 | 0.859 |
| 0.1 | 0.001 | 0.001 | 0.016 | 0.011 | 0.069 | 0.097 | 0.859 |
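For reference, the per-user metrics in the table are sketched below using their standard definitions: Precision@k, Recall@k, and AUC over (relevant, non-relevant) item pairs. The README does not say which candidate items the repository uses for AUC or how it averages over users, so both are assumptions here. The function names are hypothetical.

```python
def precision_recall_at_k(ranked_items, relevant, k):
    """Precision@k and Recall@k for one user.

    ranked_items: recommendation list, best first.
    relevant: set of held-out (test) items for the user.
    """
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def auc(scores, relevant, candidates):
    """Per-user AUC: share of (relevant, non-relevant) candidate pairs in which
    the relevant item scores higher. Ties count as half a correct pair."""
    pos = [scores[i] for i in candidates if i in relevant]
    neg = [scores[i] for i in candidates if i not in relevant]
    if not pos or not neg:
        return None  # undefined without both relevant and non-relevant items
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.05}
ranked = sorted(scores, key=scores.get, reverse=True)  # ['a', 'b', 'c', 'd']
print(precision_recall_at_k(ranked, {"b", "d"}, k=2))   # (0.5, 0.5)
print(auc(scores, {"b", "d"}, scores.keys()))           # 0.25
```

Reported numbers such as P@5 are then typically averages of these per-user values over the test users.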

We set the number of iterations to 720,000 for the table above. The selected parameter setting is then evaluated with 5-fold cross-validation over 600,000 iterations. The results are presented below.

| Fold | P@5 | P@10 | R@5 | R@10 | AUC |
|---|---|---|---|---|---|
| 0 | 0.047167488 | 0.031958128 | 0.190700122 | 0.252367529 | 0.877665012 |
| 1 | 0.048883666 | 0.032961222 | 0.203459515 | 0.270577332 | 0.888670492 |
| 2 | 0.050859514 | 0.033135744 | 0.213800679 | 0.270629746 | 0.881617605 |
| 3 | 0.050421179 | 0.032430806 | 0.20976547 | 0.262078064 | 0.881358235 |
| 4 | 0.047374702 | 0.032159905 | 0.194494702 | 0.262413134 | 0.888949442 |
| AVG | 0.04894131 | 0.032529161 | 0.202444098 | 0.263613161 | 0.883652157 |
| STD | 0.001693631 | 0.000506637 | 0.009807132 | 0.007549718 | 0.004962164 |
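The AVG and STD rows can be reproduced from the per-fold values. The check below does this for the P@5 and AUC columns. The STD row matches the sample standard deviation (denominator n - 1, as computed by `statistics.stdev`) rather than the population standard deviation.

```python
from statistics import mean, stdev

# Per-fold P@5 and AUC values copied from the cross-validation table.
p_at_5 = [0.047167488, 0.048883666, 0.050859514, 0.050421179, 0.047374702]
auc_folds = [0.877665012, 0.888670492, 0.881617605, 0.881358235, 0.888949442]

for name, values in (("P@5", p_at_5), ("AUC", auc_folds)):
    # stdev() divides by n - 1 (sample standard deviation)
    print(f"{name}: mean={mean(values):.9f} std={stdev(values):.9f}")
```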

Implementation Detail

This section describes implementation details not mentioned in Qiu, Huihuai, et al. "BPRH: Bayesian personalized ranking for heterogeneous implicit feedback" Information Sciences 453 (2018): 80-98.

  1. The original Sobazaar dataset contains nine types of action. We group 'purchase:buy_clicked' as Purchase; 'content:interact:product_clicked', 'content:interact:product_detail_viewed', and 'product_detail_clicked' as View; and 'content:interact:product_wanted' and 'product_wanted' as Like. This yields 4712 users and 7015 items, with 15208 purchases, 126846 views, and 96689 likes, which is consistent with Table 4 in Qiu, Huihuai, et al. "BPRH: Bayesian personalized ranking for heterogeneous implicit feedback" Information Sciences 453 (2018): 80-98.
  2. For the correlation between auxiliary and target actions, we only consider View with Purchase. Hence, . Moreover, on the Sobazaar dataset it is possible that , which causes a 0/0 division error when calculating . Therefore, we set in this case.
  3. For item-set coselection, when item is purchased by only one user, then according to the definition, is an empty set since . However, according to the paper, should contain item regardless of the size of . We fix this issue in our code.
  4. We believe the item-set coselection steps in Algorithm 1 of the BPRH paper contain some typos. Take Lines 20-21 as an example. To construct the item-set , we first randomly select item ; then should come from , not . The same holds for item-set . We fix this issue in bprH_gpu.py - Line 286, Line 301, Line 317.
  5. The BPRH model does not include a bias term. We therefore append an all-ones column as the last column of the user matrix and use the last row of the item matrix as the item bias (bprH_gpu.py - Line 255). The user and item matrices are initialized from a normal distribution with mean 0 and standard deviation 0.1.
  6. When constructing item-sets , we may encounter empty item-sets because the train and test sets are split randomly. bprH_gpu.py - Line 363 addresses this issue. For example, when , the objective function of BPRH and its gradients reduce to those of the COFISET model.
  7. When recommending items, a user might appear in the test set but not in the training set. Our implementation offers two options for such users. The first is to ignore them, i.e. make no recommendations for them. The second is to recommend items by their popularity for the target action in the training data. bprH_gpu.py - Line 520 handles this. In addition, we exclude each user's purchased items from that user's recommendation list.
  8. User online updating scheme. [Figure: User Updating Scheme]
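Items 5 and 7 above are sketched below with NumPy. The sketch is an illustration under assumptions, not the code in bprH_gpu.py. `init_factors` and `recommend` are hypothetical helpers, and their argument names and data layouts are assumptions. The sketch assumes the user matrix U is (users x (dim + 1)) with a fixed all-ones last column, and the item matrix V is ((dim + 1) x items) with its last row holding the item bias, so that `U[u] @ V[:, i]` adds b_i to the latent-factor score. It also assumes the bias row is drawn from the same N(0, 0.1^2) as the other entries. For users absent from training, the sketch either ranks by training popularity of the target action or returns nothing. Purchased items are always pushed to the end.

```python
import numpy as np


def init_factors(n_users, n_items, dim, seed=0):
    """Draw user/item factors from N(0, 0.1^2) and append the bias slots.

    U: (n_users, dim + 1), last column fixed to 1.
    V: (dim + 1, n_items), last row acts as the item bias b_i,
    so U[u] @ V[:, i] = latent score + b_i.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(0.0, 0.1, size=(n_users, dim + 1))
    V = rng.normal(0.0, 0.1, size=(dim + 1, n_items))
    U[:, -1] = 1.0  # all-ones column; should not be updated during training
    return U, V


def recommend(user, U, V, user_index, purchased, popularity, k=10, cold_start="popularity"):
    """Top-k item indices for `user` (hypothetical helper).

    user_index: dict mapping training user ids to rows of U.
    purchased:  dict mapping user id -> set of item indices to exclude.
    popularity: per-item count of the target action (Purchase) in training.
    cold_start: "popularity" or "ignore" for users absent from training.
    """
    if user in user_index:
        scores = np.asarray(U[user_index[user]] @ V, dtype=float)
    elif cold_start == "popularity":
        scores = np.asarray(popularity, dtype=float).copy()
    else:
        return None  # ignore users that never appear in training
    excluded = sorted(purchased.get(user, ()))
    if excluded:
        scores[excluded] = -np.inf  # never recommend already-purchased items
    order = np.argsort(-scores, kind="stable")  # highest score first
    return order[:k].tolist()


U, V = init_factors(n_users=3, n_items=5, dim=4)
popularity = np.array([5, 1, 9, 3, 0])
print(recommend("new_user", U, V, {"a": 0}, {}, popularity, k=3))  # [2, 0, 3]
```

Keeping the bias inside the factor matrices means a single matrix product yields biased scores. The training step must then skip the gradient for U's last column so that it stays equal to 1.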

Mathematical Detail

For the mathematical details, please see my blog.

Copyright

This repository is released under the MIT License. Please cite this repository if you use our code.

About

Unofficial Implementation of BPRH: Bayesian personalized ranking for heterogeneous implicit feedback
