From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization
Cross-domain recommendation (CDR) is crucial for improving recommendation accuracy and generalization, yet traditional methods are often hindered by their reliance on shared user/item IDs, which are unavailable in most real-world scenarios. Consequently, many efforts have focused on learning disentangled representations through multi-domain joint training to bridge domain gaps. Although recent Large Language Model (LLM)-based approaches show promise, they still face two critical challenges: 1. The item ID tokenization dilemma, which leads to vocabulary explosion and fails to capture high-order collaborative knowledge; 2. Insufficient domain-specific modeling of the complex evolution of user interests and item semantics. To address these limitations, we propose GenCDR, a novel Generative Cross-Domain Recommendation framework. GenCDR first employs a Domain-adaptive Tokenization module, which generates disentangled semantic IDs for items by dynamically routing between a universal encoder and domain-specific adapters. Symmetrically, a Cross-domain Autoregressive Recommendation module models user preferences by fusing universal and domain-specific interests. Finally, a Domain-aware Prefix-tree enables efficient and accurate generation. Extensive experiments on multiple real-world datasets demonstrate that GenCDR significantly outperforms state-of-the-art baselines.
- Universal encoder for cross-domain knowledge
- Domain-specific adapters for specialized modeling
- Dynamic routing mechanism
- Fuses universal and domain-specific interests
- Handles user preference evolution
- Maintains semantic consistency
- Efficient generation strategy
- Accurate recommendation ranking
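To illustrate the idea behind the Domain-aware Prefix-tree, here is a minimal sketch of prefix-constrained decoding over semantic IDs. It is not the repository's implementation: the class `DomainPrefixTree`, the function `constrained_greedy_decode`, the toy token sequences, and the scoring function are all hypothetical and chosen only for illustration. The sketch assumes each item is identified by a fixed-length sequence of integer tokens and that generation in a given domain should only ever produce sequences belonging to real items of that domain.

```python
class DomainPrefixTree:
    """Hypothetical trie over semantic-ID token sequences, one root per domain."""

    def __init__(self):
        self.roots = {}  # domain name -> nested dict acting as a trie

    def add_item(self, domain, tokens):
        # Walk (and create) one trie node per token of the item's semantic ID.
        node = self.roots.setdefault(domain, {})
        for tok in tokens:
            node = node.setdefault(tok, {})

    def allowed_next(self, domain, prefix):
        # Return the tokens that can legally follow `prefix` in `domain`.
        node = self.roots.get(domain, {})
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return []  # prefix matches no item in this domain
        return sorted(node)


def constrained_greedy_decode(tree, domain, score_fn, length):
    """Greedy decoding that only considers tokens permitted by the trie."""
    prefix = []
    for _ in range(length):
        candidates = tree.allowed_next(domain, prefix)
        if not candidates:
            break
        prefix.append(max(candidates, key=lambda t: score_fn(prefix, t)))
    return prefix


# Toy example: two items in one domain, one item in another (made-up IDs).
tree = DomainPrefixTree()
tree.add_item("Sports_and_Outdoors", [1, 4, 2])
tree.add_item("Sports_and_Outdoors", [1, 5, 0])
tree.add_item("Clothing_Shoes_and_Jewelry", [1, 7, 3])

# Stand-in for model logits: simply prefer the larger token value.
decoded = constrained_greedy_decode(
    tree, "Sports_and_Outdoors", lambda prefix, tok: tok, length=3
)
```

In a real system the scoring function would be replaced by the language model's next-token scores, and beam search would typically be used instead of greedy selection. The point of the sketch is only that restricting candidates to trie children guarantees every generated sequence maps to an existing item in the target domain.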
conda create --name gencdr python=3.10
conda activate gencdr
pip install -r requirements.txt
If using GPT-based APIs to assist tokenization or quantization, make sure to set your OpenAI API credentials.
Update quantization/rqvae_config.yaml:
openai_api_key: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
openai_base_url: https://
python quantization_rqvae.py --datasets Clothing_Shoes_and_Jewelry Sports_and_Outdoors
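For readers unfamiliar with residual quantization, the sketch below shows the general idea of turning an item embedding into a short sequence of discrete codes, which is the role semantic IDs play here. It is an assumption-laden illustration and does not reflect the internals of `quantization_rqvae.py`: the functions `nearest` and `residual_quantize` are hypothetical, the codebooks are hand-written rather than learned, and no encoder or decoder network is involved.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Pick one code per level, each time quantizing what the previous level left over."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen code so the next level models the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


# Toy two-level codebooks over 2-D vectors (values are made up for illustration).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],      # level 1: coarse
    [[0.0, 0.0], [0.25, -0.25]],   # level 2: refinement
]
codes, residual = residual_quantize([1.2, 0.8], codebooks)
```

Here `codes` is the discrete ID for the toy embedding, and `residual` is the reconstruction error left after both levels. In an actual RQ-VAE the codebooks are trained jointly with an encoder and decoder, so the hand-written values above should be read only as placeholders.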
python quantization_adapter.py \
--config quantization/rqvae_config.yaml \
--domain Sports_and_Outdoors \
--general_model_path .pth
Please refer to LlamaFactory.
