
[20221114] Distillation Design


Offline Distillation Dataloader Generation Utils

  • Preprocessor: A utility for generating an offline distillation dataloader; it attaches the teacher's prediction result to each sample in the data (see the first sketch after this list).

    • preprocess_labels: Pre-runs the teacher predict function on the whole dataset and stores the prediction results for replay and distillation.
    • create_replay_dataloader: After preprocess_labels, the preprocessor switches to replay mode. create_replay_dataloader then returns a dataloader similar to the user-designed one; the only difference is that each iteration additionally yields the batched teacher prediction results: (batch,) -> (teacher_results, batch).
  • _DistilStorage: Saves and loads the prediction results in the keyword format {uid: result}. NNI provides MemoryStorage, FileStorage, and HDF5Storage (not ready yet); users can customize their own storage, e.g. SqliteStorage, by inheriting _DistilStorage (see the storage sketch after this list).

  • _UidDataset: A subclass of PyTorch Dataset that wraps the original dataset. If the original dataset returns sample from __getitem__, then after being wrapped by _UidDataset it returns (uid, sample). NNI provides IndexedDataset, HashedDataset, and AugmentationDataset; users can customize their own uid generation, e.g. FilePathDataset, by inheriting _UidDataset (see the uid dataset sketch after this list).

  • uid: Users do not need to know what a uid is unless they customize _UidDataset. A uid must correspond one-to-one (or many-to-one) with the samples in the original dataset, and one-to-one with the prediction results.
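
To illustrate the preprocess-then-replay flow, the sketch below implements the same idea in a minimal, self-contained way. It is not the actual NNI implementation: IndexedDatasetSketch, SimplePreprocessor, teacher_predict, and train_dataset are illustrative names, the storage is a plain in-memory dict, and the uid is simply the dataset index.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class IndexedDatasetSketch(Dataset):
    """Wrap a dataset so that __getitem__ returns (uid, sample); the uid is the index."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return idx, self.dataset[idx]


class SimplePreprocessor:
    """Pre-run the teacher once, cache {uid: result}, then replay from the cache."""

    def __init__(self, dataset, teacher_predict):
        self.dataset = IndexedDatasetSketch(dataset)
        self.teacher_predict = teacher_predict
        self.storage = {}  # in-memory {uid: teacher result}

    def preprocess_labels(self):
        # Run the teacher on every sample and store its result keyed by uid.
        with torch.no_grad():
            for idx in range(len(self.dataset)):
                uid, sample = self.dataset[idx]
                self.storage[uid] = self.teacher_predict(sample)

    def create_replay_dataloader(self, **dataloader_kwargs):
        preprocessor = self

        class ReplayDataset(Dataset):
            def __len__(self):
                return len(preprocessor.dataset)

            def __getitem__(self, idx):
                uid, sample = preprocessor.dataset[idx]
                # Prepend the cached teacher result: (batch,) -> (teacher_results, batch).
                return preprocessor.storage[uid], sample

        return DataLoader(ReplayDataset(), **dataloader_kwargs)


# Usage sketch (train_dataset and teacher_predict are placeholders):
# pre = SimplePreprocessor(train_dataset, teacher_predict)
# pre.preprocess_labels()
# loader = pre.create_replay_dataloader(batch_size=32, shuffle=True)
# for teacher_results, batch in loader:
#     ...  # distillation training step
```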
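For the storage customization mentioned above, the exact abstract interface of _DistilStorage is not spelled out here, so the following sketch only illustrates the {uid: result} save/load contract with a pickle-backed store. The class and method names (PickleFileStorage, save, load, dump, restore) are assumptions; a real subclass would override whatever methods _DistilStorage actually defines.

```python
import pickle


class PickleFileStorage:
    """A file-backed {uid: result} store; illustrative, not a real _DistilStorage subclass."""

    def __init__(self, path):
        self.path = path
        self._cache = {}  # in-memory view of {uid: result}

    def save(self, uid, result):
        self._cache[uid] = result

    def load(self, uid):
        return self._cache[uid]

    def dump(self):
        # Persist the whole {uid: result} dict to disk.
        with open(self.path, 'wb') as f:
            pickle.dump(self._cache, f)

    def restore(self):
        # Reload the {uid: result} dict, e.g. to replay in a later run.
        with open(self.path, 'rb') as f:
            self._cache = pickle.load(f)
```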
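For the uid customization, the sketch below shows a path-keyed dataset in the spirit of FilePathDataset: the uid is the sample's file path, so it stays stable across runs even if the dataset is re-shuffled or re-indexed. The class name and constructor arguments are illustrative; a real implementation would inherit _UidDataset.

```python
from torch.utils.data import Dataset


class FilePathDatasetSketch(Dataset):
    """Return (uid, sample) where the uid is the sample's file path."""

    def __init__(self, file_paths, load_fn):
        # file_paths: list of paths; load_fn: callable turning a path into a sample.
        self.file_paths = list(file_paths)
        self.load_fn = load_fn

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        path = self.file_paths[idx]
        # The path is a stable uid: one-to-one with samples and with prediction results.
        return path, self.load_fn(path)
```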
