kswoo97/ItemRAG

ItemRAG

This repository is an official implementation of ItemRAG, an item-based retrieval-augmented generation technique for LLM-based recommendation.

Paper information

  • Title: ItemRAG: Item-based Retrieval-Augmented Generation for LLM-based Recommendation.
  • Authors: Sunwoo Kim, Geon Lee, Kyungho Kim, Jaemin Yoo, and Kijung Shin.
  • Venue: SIGIR 2026 (short paper)

Datasets

In this work, we use four datasets from https://amazon-reviews-2023.github.io/.

Name                      # Users   # Items
Sports & Outdoors          25,363    15,701
Toys & Games               19,026    14,718
Beauty & Personal Care     45,490    31,151
Arts, Crafts & Sewing      24,511    18,884

The entire dataset, including (1) user-item interactions and (2) item titles, is available at the link below:

Running code

ItemRAG consists of four steps:

  • Step 1: Find semantically similar items based on the textual similarity of their titles, which is implemented in step1_get_similar_title_items.py
  • Step 2: Retrieve relevant items, which is implemented in step2_do_retrieval.py
  • Step 3: Generate a summary of the relevant items, which is implemented in step3_generate_summary.py
  • Step 4: Perform final LLM-based recommendation, which is implemented in evaluation.py
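
To make Step 1 concrete, here is a minimal sketch of a top-K similarity search over item embeddings. It is an illustration only, not the code in step1_get_similar_title_items.py: the use of cosine similarity, the helper names cosine and top_k_similar, and the hand-written toy vectors (standing in for language-model embeddings of item titles) are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(embeddings, k):
    """Map each item to its k most similar other items, most similar first.

    embeddings: dict of item id -> vector (hypothetical input format).
    """
    result = {}
    for item, vec in embeddings.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != item
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[item] = [other for _, other in scored[:k]]
    return result


# Toy vectors standing in for title embeddings (assumption, not real data).
toy_embeddings = {
    "yoga mat": [0.9, 0.1, 0.0],
    "exercise mat": [0.8, 0.2, 0.1],
    "tent": [0.1, 0.9, 0.2],
    "sleeping bag": [0.2, 0.8, 0.3],
}
neighbors = top_k_similar(toy_embeddings, k=1)
```

This brute-force version compares every pair of items, which is fine for a toy example; at the scale of the datasets above, a batched matrix product on the GPU would be the more practical choice.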

The repository is expected to have the following layout:

/
 - dataset
   - /dataset/sports_outdoors_full.pickle
   - /dataset/toys_games_full.pickle
   - /dataset/beauty_care_full.pickle
   - /dataset/arts_full.pickle
 - step1_get_similar_title_items.py
 - step2_do_retrieval.py
 - step3_generate_summary.py
 - evaluation.py
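
Before running anything, it can help to confirm that the files listed above are in place. The following is a small sketch of such a check; the function name missing_files and the constant EXPECTED_FILES are hypothetical and are not part of the repository.

```python
from pathlib import Path

# File names copied from the layout above, relative to the repository root.
EXPECTED_FILES = [
    "dataset/sports_outdoors_full.pickle",
    "dataset/toys_games_full.pickle",
    "dataset/beauty_care_full.pickle",
    "dataset/arts_full.pickle",
    "step1_get_similar_title_items.py",
    "step2_do_retrieval.py",
    "step3_generate_summary.py",
    "evaluation.py",
]


def missing_files(root="."):
    """Return the expected files that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files():
        print(f"missing: {rel}")
```

Running it from the repository root prints one line per missing file and prints nothing when the layout is complete.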

All scripts take a --dataset argument, which names the target dataset. It should be one of:

  • Sports & Outdoors -> 'sports_outdoors'
  • Toys & Games -> 'toys_games'
  • Beauty & Personal Care -> 'beauty_care'
  • Arts, Crafts & Sewing -> 'arts'
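
The mapping above can be written down as a lookup table, which is handy when scripting runs over several datasets. This is a sketch only: DATASET_KEYS and dataset_pickle_path are hypothetical names, and the path pattern is inferred from the file names in the layout shown earlier.

```python
# Display name -> value accepted by --dataset (taken from the list above).
DATASET_KEYS = {
    "Sports & Outdoors": "sports_outdoors",
    "Toys & Games": "toys_games",
    "Beauty & Personal Care": "beauty_care",
    "Arts, Crafts & Sewing": "arts",
}


def dataset_pickle_path(key):
    """Return the expected pickle path for a --dataset value, rejecting unknown values."""
    if key not in DATASET_KEYS.values():
        valid = ", ".join(sorted(DATASET_KEYS.values()))
        raise ValueError(f"unknown dataset {key!r}; expected one of: {valid}")
    return f"dataset/{key}_full.pickle"
```

For example, dataset_pickle_path("arts") gives "dataset/arts_full.pickle", while passing a display name such as "Toys & Games" raises a ValueError.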

We detail each Python file:

  • Step 1: Finding semantically similar items. '--device' indicates the GPU device on which the backbone language model runs. '--K' indicates the number of semantically similar items to find for each item.
python3 step1_get_similar_title_items.py --dataset sports_outdoors --device cuda:0 --K 5
  • Step 2: Retrieving relevant items. '--K' indicates the number of items to retrieve.
python3 step2_do_retrieval.py --dataset sports_outdoors --K 50
  • Step 3: Generating summary. Note that you must set a valid GPT API key in the file before running it.
python3 step3_generate_summary.py --dataset sports_outdoors
  • Step 4: Performing LLM-based recommendation.
python3 evaluation.py --dataset sports_outdoors
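
The four commands above can also be chained from a small driver script. The sketch below is one possible way to do so; build_commands and run_pipeline are hypothetical helpers, and the default values for device and K are simply copied from the example commands, so they are not necessarily the scripts' own defaults.

```python
import subprocess
import sys


def build_commands(dataset, device="cuda:0", k_similar=5, k_retrieve=50):
    """Return the four step commands, in order, as argument lists."""
    py = sys.executable  # use the same interpreter that runs this driver
    return [
        [py, "step1_get_similar_title_items.py", "--dataset", dataset,
         "--device", device, "--K", str(k_similar)],
        [py, "step2_do_retrieval.py", "--dataset", dataset, "--K", str(k_retrieve)],
        [py, "step3_generate_summary.py", "--dataset", dataset],
        [py, "evaluation.py", "--dataset", dataset],
    ]


def run_pipeline(dataset, **kwargs):
    """Run the steps in order, stopping at the first one that fails."""
    for cmd in build_commands(dataset, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("sports_outdoors") from the repository root would run all four steps for that dataset. Step 3 still needs the API key to be set in step3_generate_summary.py beforehand.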

About

Published as a SIGIR 2026 short paper.
