Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models
π Accepted by ACM SIGKDD 2026 Research Track
All required datasets are hosted on Hugging Face. You can download them here.
data/
βββ train_data/
β βββ raw_data/
β β βββ graphr1/
β β β βββ sft_train_data_filter2048_10000_balance.csv # GraphR1 SFT raw data (for SFT data generation)
β β βββ {arxiv,children,computer,history,photo,pubmed,reddit,sports,wn18rr}/
β β β βββ {dataset}_summary.jsonl # Node summaries (for RL data generation)
β β β βββ {dataset}_graph_bert_encoded.pt # Graph BERT encodings (for RL data generation)
β βββ rl_data/
β βββ test_1050.parquet # GraphR1 validation data (for RL validation data)
βββ test_data/
βββ raw_data/
βββ gofa_test_data/
βββ gofa_test_data_53114.csv # GOFA benchmark (for evaluation)
βββ gofa_supply_data_35603.csv # GOFA supplement benchmark (for evaluation)
outputs/synthetic_data/
βββ sft_data/
β βββ graph_sft_data.json # SFT training data (generated by sft_data_generator.py)
βββ rl_data/
βββ train_9799_graphr2_subgraph.parquet # RL training data (generated by rl_data_generator.py)
βββ test_600_graphr2_subgraph.parquet # RL validation data (generated by rl_data_generator.py)
data/train_data/raw_data/graphr1/: Source data for SFT prompt construction and teacher reasoning.data/train_data/raw_data/{dataset}/: Graph datasets with node summaries and BERT encodings, used for RL subgraph sampling (viapreprocess.py) and prompt construction.data/train_data/rl_data/test_1050.parquet: GraphR1 validation set, converted to SSR prompt format during RL data generation.data/test_data/: GOFA benchmark datasets for model evaluation.outputs/synthetic_data/: Generated training/validation data.
Some environment-specific configurations (e.g., cluster node hostnames, Docker image paths, volume mounts, model checkpoint paths, and vLLM service addresses) are provided as placeholders in the scripts and Python source files. Please search for and replace these placeholders to match your own environment before running.
Generate the SFT dataset by running the following script.
Data prerequisites: Download the GraphR1 SFT data and place it at data/train_data/raw_data/graphr1/sft_train_data_filter2048_10000_balance.csv.
Model prerequisites: Deploy the teacher model and diversity model via vLLM on your GPU nodes using ./scripts/deploy_api.sh. Update the node list, model path, and served model name in the script, then:
cd scripts
bash deploy_api.sh- Update
choose_nameandURL_LIST_DSin./utils/utils.pyfor the teacher model. - Update
choose_nameandURL_LIST_DVin./data_generation/quality_check/graph_diversity.pyfor the diversity model.
The script executes three steps automatically:
- Construct prompts: Extract graph information from the GraphR1 SFT data and build SSR prompts.
- Teacher reasoning + quality filtering: Use the teacher model to generate reasoning traces, compute structural diversity scores, and filter samples by answer correctness and diversity. This step loops until the correct ratio reaches 90%.
- Construct training data: Convert the filtered data into the final SFT training format.
cd ./data_generation/pipelines
python sft_data_generator.pyNote: We also provide pre-generated SFT data for convenience. Download and place it at outputs/synthetic_data/sft_data/graph_sft_data.json.
We utilize LlamaFactory for the SFT process. We recommend using the official Docker image.
Configuration:
- Update
model_name_or_pathandoutput_dirin./supervised_finetuning/full_sft.yaml. - Update the node list,
DATA_DIR, andYAML_FILEin./supervised_finetuning/run_sft_shell.sh.
Execution: Run the following script to launch distributed SFT training across the specified nodes:
cd ./supervised_finetuning
bash run_sft_shell.shGenerate the RL dataset by running the following script.
Data prerequisites:
- Download the graph datasets (arxiv, children, computer, history, photo, pubmed, reddit, sports, wn18rr) and place them under
data/train_data/raw_data/{dataset}/. Each dataset directory should contain{dataset}_summary.jsonland{dataset}_graph_bert_encoded.pt. - Generate subgraph data by running:
cd ./data_generation/pipelines
python preprocess.py- Download
test_1050.parquet(GraphR1 validation data) and place it atdata/train_data/rl_data/.
Model prerequisites: Deploy the SFT model via vLLM on your GPU nodes using ./scripts/deploy_api.sh. Update the node list, model path, and served model name in the script, then:
cd scripts
bash deploy_api.sh- Update
choose_nameandURL_LIST_RLin./data_generation/pipelines/rl_data_generator.pyfor the SFT model.
The script executes three steps automatically:
- Construct RL training data: Sample subgraphs from the graph datasets, construct SSR prompts, and use the SFT model to assess question difficulty (easy/medium/hard). This step loops until the required number of samples per difficulty level is collected.
- Convert training data: Convert the filtered JSONL data into Parquet format for RL training.
- Construct validation data: Transform the GraphR1 validation data into the SSR prompt format and convert to Parquet.
cd ./data_generation/pipelines
python rl_data_generator.pyNote: We also provide pre-generated RL data for convenience. Download and place them at outputs/synthetic_data/rl_data/.
We use verl v0.6.x to carry out the RL process. To align with our adaptive subgraph denoising paradigm, we modified several core components:
-
Training process:
reinforcement_learning\verl\trainer\config\ppo_trainer.yaml: Added parameters for the second-stage RL initiation.reinforcement_learning\verl\trainer\ppo\ray_trainer.py: Modifiedfitand_validatefunctions to customize the training/validation loop.
-
Actor rollout:
reinforcement_learning\verl\workers\rollout\vllm_rollout\vllm_rollout_spmd.py: Optimized thegenerate_sequencesfunction for multi-path reward calculation.
-
Reward function:
reinforcement_learning\verl\workers\reward_manager\naive.py: Customized the__call__function for graph-specific reward logic.reinforcement_learning\verl\utils\reward_score\__init__.py: Import the reward function that we have implemented.reinforcement_learning\verl\utils\reward_score\subgraph_size.py: Our core implementation of the reward function.
Execution: We recommend the verl Docker image. The RL training is split into two independent stages:
-
Stage 1: Authenticity-Reinforced RLVR (
in_second_stage=false): Trains the model to strictly follow the Sample-Select-Reason pipeline. The reward function$R_1$ uses nested logic to enforce subgraph authenticity ($\text{Status}{real}$), selection consistency ($\text{Status}{consist}$), and answer correctness ($\text{Status}_{ans}$ ). -
Stage 2: Denoising-Reinforced RLVR (
in_second_stage=true): Built upon Stage 1, this stage adds a structural parsimony reward to encourage selecting purer (smaller) subgraphs. The reward function$R_2$ extends$R_1$ with a size-based bonus for correct answers.
Configuration: Update the model path, data path, and output path in ./reinforcement_learning/run_rl_stage1.sh and ./reinforcement_learning/run_rl_stage2.sh to match your local settings.
Step 1: Start the Ray cluster
Update the following variables in ./scripts/start_ray_cluster.sh:
NODES: List of node hostnames (the first node serves as the Ray head)IMAGE_PATH: Path to the verl Docker image tar file
Then execute:
cd scripts
bash start_ray_cluster.shStep 2: Run Stage 1 training
Update the following parameters in ./reinforcement_learning/run_rl_stage1.sh:
data.train_files: Path to the training Parquet file (e.g.,outputs/synthetic_data/rl_data/train_*_graphr2_subgraph.parquet)data.val_files: Path to the validation Parquet file (e.g.,outputs/synthetic_data/rl_data/test_*_graphr2_subgraph.parquet)actor_rollout_ref.model.path: Path to the SFT model checkpointtrainer.default_local_dir: Output directory for Stage 1 checkpointstrainer.n_gpus_per_node: Number of GPUs per nodetrainer.nnodes: Number of nodes (must match the Ray cluster size)
SSH into the head node and execute the training script inside the Docker container:
ssh <HEAD_NODE>
docker exec -it verl bash
cd reinforcement_learning
bash run_rl_stage1.shStep 3: Run Stage 2 training
After Stage 1 completes, update the following parameters in ./reinforcement_learning/run_rl_stage2.sh:
actor_rollout_ref.model.path: Path to a Stage 1 checkpoint (e.g.,.../verl_grpo_graph_rl_stage1/global_step_*/actor/huggingface)data.train_filesanddata.val_files: Paths to the training and validation Parquet filestrainer.default_local_dir: Output directory for Stage 2 checkpointstrainer.n_gpus_per_nodeandtrainer.nnodes: Cluster configurationtrainer.second_stage_reward_lambda: Controls the denoising intensity (default: 0.1)
SSH into the head node and execute:
ssh <HEAD_NODE>
docker exec -it verl bash
cd reinforcement_learning
bash run_rl_stage2.shThe final trained model is available on Hugging Face.
Model prerequisites: Deploy the trained model via vLLM on your GPU nodes using ./scripts/deploy_api.sh. Update the node list, model path, and served model name in the script, then:
cd scripts
bash deploy_api.sh- Update
choose_nameandURL_LIST_EVALin./evaluation/evaluate_trained_model.pyfor the evaluation servers.
Execution: Run the evaluation script:
cd evaluation
bash run_evaluate.shIf you find this work helpful, please consider citing:
@article{li2026beyond,
title={Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models},
author={Li, Fengzhi and Zhang, Liang and Zuo, Yuan and Zhao, Ruiqing and Liu, YanSong and Ma, Yunfei and Meng, Fanyu and Feng, Junlan},
journal={arXiv preprint arXiv:2603.02938},
year={2026}
}