We provide the final results and all intermediate outcomes in JSON or pkl format for verification. To save storage space, all results are compressed into ZIP archives.
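The archived results can be read without fully extracting the ZIP files. The helper below is a minimal sketch; the archive and member names in the usage comment are placeholders, not actual repo paths.

```python
# Minimal helper for reading a .json or .pkl result straight from a ZIP archive.
import json
import pickle
import zipfile

def load_result(zip_path, member):
    """Load a .json or .pkl member directly from a results ZIP archive."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as f:
            if member.endswith(".json"):
                return json.load(f)
            return pickle.load(f)

# Hypothetical example (substitute the real archive and member names):
# user2reasons = load_result("results.zip", "ml-latest-small/user2reasons.json")
```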
- Step 1: Guideline Generation
  - Raw explanations generated by OpenAI are at candidate_reasons/openai_reasons_cache/.
  - The constructed database is located at reasons_cf_datasets/.
- Step 2: Counterfactual Refinement
  - Selected explanations are available at reasons_cf_datasets/dataset_name/user2reasons.json.
- Step 3: Guided Reasoning
  - The recommendation prompt is provided at recommendation_prompt4LLM/with_reasons_outputs.
  - Recommendation results can be found at recommendation_prompt4LLM/with_reasons_results.
cuml==24.4.0
torch==2.3.0
recbole==1.2.0
scipy==1.12.0
scikit-learn==1.4.1
sentence-transformers==2.7.0
transformers==4.40.2
vllm==0.4.2

Please copy the scripts from the recbole/ directory into your Python RecBole package directory, replacing the existing files.
recbole/
├── trainer/
│ └── trainer.py
├── data/
│ ├── dataloader/
│ │ └── general_dataloader.py
│ └── utils.py
└── sampler/
    └── sampler.py

We perform our experiments on the MovieLens, Amazon_CDs_and_Vinyl, and Amazon_Books datasets. Data is converted to the RecBole-specific format using their official script, available at RecBole Datasets. Processed data and our subset-creation scripts are located in the data/ folder.
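The file replacement described above can be scripted. This is a sketch only: `package_dir` and `patch_recbole` are illustrative helpers (not part of the repo), and it assumes the repo's recbole/ folder is passed in by the caller.

```python
# Sketch: locate the installed RecBole package and overwrite the stock files
# with the patched ones listed in the tree above. Helper names are illustrative.
import importlib.util
import shutil
from pathlib import Path

def package_dir(name):
    """Return the install directory of a package, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec and spec.submodule_search_locations:
        return Path(list(spec.submodule_search_locations)[0])
    return None

# Files from the tree above that must replace their installed counterparts.
PATCHED = [
    "trainer/trainer.py",
    "data/dataloader/general_dataloader.py",
    "data/utils.py",
    "sampler/sampler.py",
]

def patch_recbole(repo_recbole_dir):
    """Copy each patched file over the installed RecBole copy."""
    dst = package_dir("recbole")
    if dst is None:
        raise RuntimeError("RecBole is not installed in this environment.")
    for rel in PATCHED:
        shutil.copy2(Path(repo_recbole_dir) / rel, dst / rel)
```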
This setup defaults to the MovieLens dataset. For the Amazon_CDs_and_Vinyl dataset, change the dataset variable at the beginning of each script to amazoncd; for the Amazon_Books dataset, change it to amazonbook. Make sure all paths are absolute to avoid bugs caused by relative addressing.
Generate candidate explanations using the OpenAI API:
# Get the prompts for explanation generation
python get_rs_reasons_prompt.py
# Generate explanations with the API
# Note: the OpenAI Batch API can also be used to reduce cost; see candidate_reasons/openai_batch.ipynb
python candidate_reasons/openai.py
# Combine the prompts with the results
python candidate_reasons/correct_id.py
# Construct the database of candidate explanations
python candidate_reasons/dbscan_embedding_openai.py

Alternatively, you can run bash script/explanation_generation.sh directly.
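The database-construction step clusters candidate explanations so that near-duplicates collapse into one entry. As a rough illustration of DBSCAN over sentence embeddings (the real script embeds explanations with an embedding model; the 2-D vectors below are dummies):

```python
# Illustration of the clustering idea behind dbscan_embedding_openai.py:
# near-duplicate explanation embeddings fall into the same cluster, so each
# cluster can contribute a single representative explanation to the database.
import numpy as np
from sklearn.cluster import DBSCAN

embeddings = np.array([
    [0.00, 0.00],   # e.g. "likes classic rock"
    [0.02, 0.01],   # near-duplicate of the above
    [0.90, 0.95],   # e.g. "prefers recent releases"
    [0.92, 0.94],   # near-duplicate
])
labels = DBSCAN(eps=0.1, min_samples=2).fit_predict(embeddings)
# Points 0-1 and 2-3 form two separate clusters; a label of -1 would mark noise.
```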
# Calculate item popularity
python calculate_popularity.py
# Train the debias model
bash rs_models/scripts/CF_model_train_movielens_pop.sh
bash rs_models/scripts/CF_model_train_amazoncd_pop.sh
bash rs_models/scripts/CF_model_train_amazonbook_pop.sh
# Retrieve selected explanations
python CF_model_test_debias.py --dataset ml-latest-small --debias_coeffiecient 5 --best_model_path your_model_save_path
python CF_model_test_debias.py --dataset Amazon_CDs_and_Vinyl_small --debias_coeffiecient 1 --best_model_path your_model_save_path
python CF_model_test_debias.py --dataset Amazon_Books_small --debias_coeffiecient 1 --best_model_path your_model_save_path

# Generate recommendation prompts
python recommendation_prompt_generation.py --dataset ml-latest-small --topk_reasons 5 --debias_coffecient 5
python recommendation_prompt_generation.py --dataset Amazon_CDs_and_Vinyl_small --topk_reasons 1 --debias_coffecient 1
python recommendation_prompt_generation.py --dataset Amazon_Books_small --topk_reasons 10 --debias_coffecient 1
# Rank with LLMs
python recommendation_prompt4LLM/openai_recommendation.py
# Convert format for evaluation
python recommendation_prompt4LLM/combinations.py
# Evaluate
python llm_evaluation.py

Alternatively, you can just run bash script/LLM_recommendation.sh.
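The debias coefficient used above implements popularity debiasing on top of item popularity from calculate_popularity.py. The exact formula lives in the repo scripts; the sketch below only illustrates the general idea, and the log1p penalty form is an assumption.

```python
# Illustrative popularity-based debiasing (the exact formulas used by
# calculate_popularity.py / CF_model_test_debias.py may differ).
import math
from collections import Counter

def item_popularity(interactions):
    """Per-item interaction counts, normalized by the most popular item."""
    counts = Counter(item for _user, item in interactions)
    top = max(counts.values())
    return {item: c / top for item, c in counts.items()}

def debias(scores, popularity, coefficient):
    """Penalize popular items: s_i - coefficient * log(1 + pop_i)."""
    return {item: s - coefficient * math.log1p(popularity.get(item, 0.0))
            for item, s in scores.items()}
```

A larger coefficient (5 for MovieLens vs. 1 for the Amazon datasets above) applies a stronger penalty to popular items.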
We also provide implementation details for some baseline models, along with guidance on using local LLMs with adjusted hyperparameters to obtain properly formatted outputs.
Note: The setup details for the baseline models have not yet been fully validated. Hyperparameter adjustments may be required, and some bugs may still need to be resolved.
# Step 1
# Get the prompts for explanation generation
python get_rs_reasons.py
python correct_id.py
# Construct the database of candidate explanations
python candidate_reasons/dbscan_embedding.py
# Step 2 is the same as above
# Calculate item popularity
python calculate_popularity.py
# Train the debias model
bash rs_models/scripts/CF_model_train_movielens_pop.sh
bash rs_models/scripts/CF_model_train_amazoncd_pop.sh
# Retrieve selected explanations
python CF_model_test_debias.py
# Step 3
python LLMRS_prompt.py