GraphCleaner

This repository contains the codes for paper 'GraphCleaner: Detecting Mislabelled Samples in Popular Graph Prediction Benchmarks'.

File Framework

Directories:

dataset: data directory
checkpoints: to save trained GNNs
gnn_results: to save the prediction results of GNNs
mislabel_results: to save the mislabel detection results
tensorboard_logs: to save the loss info for visualization
aum_data: to save internal results when training AUM
case_studies: to give the manual judgement files of case study and the two new variants of PubMed dataset

Code Files:

noise_generator.py: download dataset and add symmetric / asymmetric noises to a specific dataset
Utils.py: shared functions called by mislabel detection methods
run_GNNs_validation.py: called by mislabel detection methods to train GNNs
run_baseline/Confident_Learning/AUM/DY-bootstrap.py: code to run different mislabel detection methods
run_CL_negsamp.py: code to run our GraphCleaner
run_CL_negsamp_withoutCL.py: code to run no Cl version of our GraphCleaner
evaludate_different_methods.py: code to evaluate different mislabel detection methods

Usage

Step0: Preparation

pip install -r requirements.txt

Step1: Generate Noise

python noise_generator.py --dataset <the name of dataset> --noise_type <symmetric or asymmetric> --mislabel_rate <mislabel rate>

The dataset directory will be created, and datasets will be downloaded automatically. This step will generate noisy class map which will be saved under ./dataset/<dataset dir>/raw/ and ground truth mislabel transition matrices.

Step2: Run Mislabel Detection Methods

Run Our GraphCleaner

CUDA_VISIBLE_DEVICES=<gpu id> python run_CL_negsamp.py --model <base classifier name> --noise_type <symmetric or asymmetric> --mislabel_rate <mislabel_rate> --dataset <the name of dataset>

Run Other Mislabel Detection Methods

CUDA_VISIBLE_DEVICES=<gpu id> python run_<method name>.py --model <base classifier name> --noise_type <symmetric or asymmetric> --mislabel_rate <mislabel_rate> --dataset <the name of dataset>

The checkpoints, gnn_results, mislabel_results and tensorboard_logs directories will be created. This step will generate the leaned mislabel transition matrices.

Step3: Evaluation

Concatenate method names with '+' can evaluate several methods at the same time (ours means GraphCleaner):

python evaluate_different_methods.py --method <example: baseline+CL+AUM> --model <base classifier name> --noise_type <symmetric or asymmetric> --mislabel_rate <mislabel_rate> --dataset <the name of dataset>

Visualization

To see the loss curve when training GNNs:

tensorboard --logdir=tensorboard_logs --port=6006

then ssh to the server and open http://localhost:6006/ in local browser.

Case Studies

X_analysis.txt are manual judgement files for dataset X (X = cora / citeseer / pubmed / arxiv). pubmed_clean.pt is the new dataset PubMedCleaned, which removes all zero-labelled and multi-labelled nodes and fixes all detected mislabels. pubmed_multi.pt is the new dataset PubMedMulti, which keeps multi-labelled nodes by using binary label vectors, and fixes all detected mislabels.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
case_studies		case_studies
.gitignore		.gitignore
GNN_models.py		GNN_models.py
Utils.py		Utils.py
evaluate_different_methods.py		evaluate_different_methods.py
noise_generator.py		noise_generator.py
readme.md		readme.md
requirements.txt		requirements.txt
run_AUM.py		run_AUM.py
run_CL_negsamp.py		run_CL_negsamp.py
run_CL_negsamp_without_CL.py		run_CL_negsamp_without_CL.py
run_Confident_Learning.py		run_Confident_Learning.py
run_DY-bootstrap.py		run_DY-bootstrap.py
run_GNNs_validation.py		run_GNNs_validation.py
run_baseline.py		run_baseline.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GraphCleaner

File Framework

Usage

Step0: Preparation

Step1: Generate Noise

Step2: Run Mislabel Detection Methods

Run Our GraphCleaner

Run Other Mislabel Detection Methods

Step3: Evaluation

Visualization

Case Studies

About

Releases

Packages

Languages

lywww/GraphCleaner

Folders and files

Latest commit

History

Repository files navigation

GraphCleaner

File Framework

Usage

Step0: Preparation

Step1: Generate Noise

Step2: Run Mislabel Detection Methods

Run Our GraphCleaner

Run Other Mislabel Detection Methods

Step3: Evaluation

Visualization

Case Studies

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages