Data Distillation and Condensation (DDC) is a data-centric task in which a representative (i.e., small but training-effective) set of samples is generated from a large dataset. Models trained on this small set can reach test performance similar to that of models trained on the full dataset. The distilled images sometimes preserve certain semantic aspects of the annotated objects in the full dataset, making them interpretable to human users. A brief demonstration of the task is shown below:
*Figure: DDC basic pipeline*
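As a toy illustration of the "small but training-effective" idea, consider distilling a linear-regression dataset. This is only a hedged sketch, not any of the methods cited below: real DDC methods optimize synthetic image pixels iteratively, whereas for linear least squares a synthetic set that matches the full-data gradient at every model weight can be built in closed form, since matching only requires reproducing the moments `X.T @ X` and `X.T @ y`.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Full" dataset: 1000 samples of a noisy linear relation.
n, d = 1000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Distill to m = d = 5 synthetic points. The mean gradient of the squared
# loss is g(w) = X.T @ (X @ w - y) / n, so a synthetic set (Xs, ys) matches
# it for *every* w iff
#   Xs.T @ Xs / m == X.T @ X / n   and   Xs.T @ ys / m == X.T @ y / n.
m = d
L = np.linalg.cholesky(X.T @ X)                    # X.T @ X = L @ L.T
Xs = np.sqrt(m / n) * L.T                          # m x d synthetic inputs
ys = np.sqrt(m / n) * np.linalg.solve(L, X.T @ y)  # m synthetic labels

# Models fit on the two datasets coincide: 5 synthetic points are
# training-equivalent to the 1000 real ones for this model class.
w_full, *_ = np.linalg.lstsq(X, y, rcond=None)
w_syn, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(np.max(np.abs(w_full - w_syn)))  # negligible: identical fits
```

The closed form exists only because the model is linear; for neural networks the works below instead optimize the synthetic set by gradient descent, e.g. by matching training gradients or whole training trajectories.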
Some recommended works in this domain include:
- (2018) Dataset Distillation [ArXiv] [Code] [Project]
- (CVPR 2022) Dataset Distillation by Matching Training Trajectories [ArXiv] [Code] [Project] [Workshop Version]
- (ICML 2022 Oral) Privacy for Free: How does Dataset Condensation Help Privacy? [ArXiv] [Poster]