Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023, PDF)

by Dongmin Park¹, Seola Choi¹, Doyoung Kim¹, Hwanjun Song^{1, 2}, Jae-Gil Lee¹.

¹ KAIST, ² Amazon AWS AI

Sep 22, 2023: Our work is accepted at NeurIPS 2023.

Brief Summary

Prune4ReL is a new data pruning method for Re-labeling models (e.g., DivideMix & SOP+) showing state-of-the-art performance under label noise.
Inspired by a re-labeling theory, Prune4ReL finds the desired data subset by maximizing the total reduced neighborhood confidence, thereby maximizing re-labeling & generalization performance.
With a greedy approximation, Prune4ReL is efficient and scalable to large datasets including Clothing-1M & ImageNet-1K.
On four real noisy datasets (e.g., CIFAR-10/100N, WebVision, & Clothing-1M), Prune4Rel outperforms data pruning baselines with Re-labeling models by 9.1%, and those with a standard model by 21.6%.

How to run

Prune4ReL

Please follow Table 7 for hyperparameters. For CIFAR-10N dataset with SOP+ as Re-labeling model,

python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
          --dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
          --fraction $fraction --selection Prune4Rel --save-log True \
          --metric cossim --uncertainty LeastConfidence --tau 0.975 --eta 1 --balance True

More detailed scripts for other datasets can be found in scripts/ folder.

Data Pruning Baselines: Uniform, SmallLoss, Margin, Forgetting, GraNd, Moderate, etc

Basically, the script is similar to that of Prune4ReL. For example,

python3 main_label_noise.py --gpu 0 --model 'PreActResNet18' --robust-learner 'SOP' -rc 0.9 -rb 0.1 \
          --dataset CIFAR10 --noise-type $noise_type --n-class 10 --lr-u 10 -se 10 --epochs 300 \
          --fraction $fraction --selection *$pruning_algorithm* --save-log True \

where *$pruning_algorithm* must be from [Uniform, SmallLoss, Uncertainty, Forgetting, GraNd, ...], each of which is a class name in deep_core/methods/~~.py.

Citation

@article{park2023robust,
  title={Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy},
  author={Park, Dongmin and Choi, Seola and Kim, Doyoung and Song, Hwanjun and Lee, Jae-Gil},
  journal={NeurIPS 2023},
  year={2023}
}

References

We thank the DeepCore library, on which we built most of our repo. Hope our project helps extend the open-source library of data pruning.

DeepCore library [code] : DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning, Guo et al. 2022.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
deepcore		deepcore
robustlearner		robustlearner
scripts		scripts
README.md		README.md
arguments.py		arguments.py
load_noise_data.py		load_noise_data.py
main_label_noise.py		main_label_noise.py
randaugment.py		randaugment.py
requirements.txt		requirements.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deepcore

deepcore

robustlearner

robustlearner

scripts

scripts

README.md

README.md

arguments.py

arguments.py

load_noise_data.py

load_noise_data.py

main_label_noise.py

main_label_noise.py

randaugment.py

randaugment.py

requirements.txt

requirements.txt

utils.py

utils.py

Repository files navigation

Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023, PDF)

Brief Summary

How to run

Prune4ReL

Data Pruning Baselines: Uniform, SmallLoss, Margin, Forgetting, GraNd, Moderate, etc

Citation

References

About

Releases

Packages

Languages

kaist-dmlab/Prune4Rel

Folders and files

Latest commit

History

Repository files navigation

Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023, PDF)

Brief Summary

How to run

Prune4ReL

Data Pruning Baselines: Uniform, SmallLoss, Margin, Forgetting, GraNd, Moderate, etc

Citation

References

About

Resources

Stars

Watchers

Forks

Languages