ItemRAG

This repository is an official implementation of ItemRAG, an item-based retrieval-augmented generation technique for LLM-based recommendation.

Paper information

Title: ItemRAG: Item-based Retrieval-Augmented Generation for LLM-based Recommendation.
Authors: Sunwoo Kim, Geon Lee, Kyungho Kim, Jaemin Yoo, and Kijung Shin.
Venue: SIGIR 2026 (short paper)

Datasets

In this work, we use four datasets from https://amazon-reviews-2023.github.io/.

Name	# Users	# Items
Sports & Outdoors	25,363	15,701
Toys & Games	19,026	14,718
Beauty & Personal Care	45,490	31,151
Arts, Crafts & Sewing	24,511	18,884

The entire dataset, including (1) user-item interactions and (2) item titles, is available at the link below:

Link: https://www.dropbox.com/scl/fo/olpz13hyfcdn5jg6a4tzy/AN1Na5w_ySyO4nnYFyYnRpE?rlkey=5i6iyloeq9fpa48tz29ztmaqj&st=ibnulph8&dl=0
Description: Refer to the README.txt file within the link.

Running code

ItemRAG consists of the four steps below:

Step 1: Find semantically similar items based on the textual similarity of item titles, which is implemented in step1_get_similar_title_items.py
Step 2: Retrieve relevant items, which is implemented in step2_do_retrieval.py
Step 3: Generate a summary of the relevant items, which is implemented in step3_generate_summary.py
Step 4: Perform final LLM-based recommendation, which is implemented in evaluation.py

The repository is expected to have the following layout:

/
 - dataset
   - /dataset/sports_outdoors_full.pickle
   - /dataset/toys_games_full.pickle
   - /dataset/beauty_care_full.pickle
   - /dataset/arts_full.pickle
 - step1_get_similar_title_items.py
 - step2_do_retrieval.py
 - step3_generate_summary.py
 - evaluation.py
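
Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name missing_files is hypothetical, and the file names are simply copied from the hierarchy above.

```python
from pathlib import Path

# Dataset keys and script names copied from the hierarchy above.
DATASETS = ["sports_outdoors", "toys_games", "beauty_care", "arts"]
SCRIPTS = [
    "step1_get_similar_title_items.py",
    "step2_do_retrieval.py",
    "step3_generate_summary.py",
    "evaluation.py",
]


def missing_files(root):
    """Return the expected repository paths that do not exist under root.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(root)
    expected = [root / "dataset" / f"{name}_full.pickle" for name in DATASETS]
    expected += [root / script for script in SCRIPTS]
    return [str(p.relative_to(root)) for p in expected if not p.exists()]
```

Calling missing_files(".") from the repository root returns an empty list when all four pickle files and all four scripts are present.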

All scripts take a --dataset argument, which gives the name of the target dataset. It should be one of:

Sports & Outdoors -> 'sports_outdoors'
Toys & Games -> 'toys_games'
Beauty & Personal Care -> 'beauty_care'
Arts, Crafts & Sewing -> 'arts'
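
The mapping above, together with the file names in the repository layout, suggests how a dataset key relates to its pickle file. The sketch below is illustrative only: how the scripts actually parse --dataset is not shown in this README, and build_parser and dataset_path are hypothetical names.

```python
import argparse

# Display name -> value passed to --dataset (taken from the list above).
DATASET_KEYS = {
    "Sports & Outdoors": "sports_outdoors",
    "Toys & Games": "toys_games",
    "Beauty & Personal Care": "beauty_care",
    "Arts, Crafts & Sewing": "arts",
}


def dataset_path(key):
    """Map a --dataset value to its pickle path, following the layout above."""
    if key not in DATASET_KEYS.values():
        raise ValueError(f"unknown dataset key: {key!r}")
    return f"dataset/{key}_full.pickle"


def build_parser():
    """Assumed parser: rejects any --dataset value outside the four keys."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dataset", required=True, choices=sorted(DATASET_KEYS.values())
    )
    return parser
```

For example, build_parser().parse_args(["--dataset", "arts"]).dataset yields "arts", and dataset_path("arts") yields "dataset/arts_full.pickle".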

We detail each Python file:

Step 1: Finding semantically similar items. '--device' indicates the GPU device on which the backbone language model runs. '--K' indicates the number of semantically similar items to find for each item.

python3 step1_get_similar_title_items.py --dataset sports_outdoors --device cuda:0 --K 5
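
To illustrate what "the K most similar items for each item" means, here is a minimal sketch. It assumes each item title has already been turned into an embedding vector and that similarity is cosine similarity; neither the embedding model nor the similarity measure used by step1_get_similar_title_items.py is stated in this README, so treat both as assumptions. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_similar(embeddings, k):
    """For each item index, return the indices of its k most similar other items.

    Assumption: embeddings[i] is the title embedding of item i.
    """
    result = []
    for i, u in enumerate(embeddings):
        scored = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        # Highest similarity first; break ties by the smaller item index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        result.append([j for _, j in scored[:k]])
    return result
```

This brute-force version compares every pair of items, which is fine for a toy example but would be slow on the full datasets; a real implementation would more likely use batched matrix operations on the GPU.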

Step 2: Retrieving relevant items. '--K' indicates the number of items to retrieve.

python3 step2_do_retrieval.py --dataset sports_outdoors --K 50

Step 3: Generating summary. Note that you must set a valid GPT API key in the file before running it.

python3 step3_generate_summary.py --dataset sports_outdoors
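
If you prefer not to hard-code the key, one option is to read it from an environment variable. This is a sketch of an alternative, not what the script does out of the box: the variable name OPENAI_API_KEY and the helper load_api_key are assumptions, and step3_generate_summary.py would need to be edited to use them.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running step 3."
        )
    return key
```

This keeps the key out of version control, which matters if you fork or share the repository.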

Step 4: Performing LLM-based recommendation.

python3 evaluation.py --dataset sports_outdoors
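
The four commands above can also be chained for one dataset. The following is a minimal sketch of such a wrapper; it is not part of the repository, the names build_commands and run_pipeline are hypothetical, and the default values simply mirror the example commands above.

```python
import subprocess
import sys


def build_commands(dataset, device="cuda:0", k_similar=5, k_retrieve=50):
    """Build the four step commands in order, using the flags documented above."""
    py = sys.executable
    return [
        [py, "step1_get_similar_title_items.py", "--dataset", dataset,
         "--device", device, "--K", str(k_similar)],
        [py, "step2_do_retrieval.py", "--dataset", dataset,
         "--K", str(k_retrieve)],
        [py, "step3_generate_summary.py", "--dataset", dataset],
        [py, "evaluation.py", "--dataset", dataset],
    ]


def run_pipeline(dataset, **kwargs):
    """Run the steps sequentially and stop at the first failing step."""
    for cmd in build_commands(dataset, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("sports_outdoors") from the repository root runs the steps in order; check=True raises an error if a step exits with a non-zero status, so later steps never run on incomplete outputs.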
