
[20221114] Distillation Design


Offline Distillation Dataloader Generation Utils

  • Preprocessor: A utility for generating an offline distillation dataloader; it attaches the teacher's prediction result to each sample in the data (see the first sketch after this list).

    • preprocess_labels: Pre-runs the teacher predict function on the whole dataset and stores the prediction results for replay and distillation.
    • create_replay_dataloader: After preprocess_labels, the preprocessor switches to replay mode. create_replay_dataloader then returns a dataloader similar to the user-designed one; the only difference is that each iteration additionally yields the batched teacher prediction results: (batch,) -> (teacher_results, batch).
  • _DistilStorage: Saves and loads the prediction results in the keyword format {uid: result}. NNI provides MemoryStorage, FileStorage, and HDF5Storage (not ready yet); users can customize their own storage, e.g. SqliteStorage, by inheriting _DistilStorage (see the storage sketch after this list).

  • _UidDataset: A subclass of PyTorch Dataset that wraps the original dataset. If the original dataset returns sample from __getitem__, then after being wrapped by _UidDataset it returns (uid, sample). NNI provides IndexedDataset, HashedDataset, and AugmentationDataset; users can customize their own uid generation, e.g. FilePathDataset, by inheriting _UidDataset (see the uid dataset sketch after this list).

  • uid: Users do not need to know what a uid is unless they customize _UidDataset. A uid must correspond one-to-one (or many-to-one) with the samples in the original dataset, and one-to-one with the prediction results.
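
To illustrate the preprocess-then-replay flow, the sketch below implements the same idea in a minimal, self-contained way. It is not the actual NNI implementation: IndexedDatasetSketch, SimplePreprocessor, teacher_predict, and train_dataset are illustrative names, the storage is a plain in-memory dict, and the uid is simply the dataset index.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class IndexedDatasetSketch(Dataset):
    """Wrap a dataset so that __getitem__ returns (uid, sample); the uid is the index."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return idx, self.dataset[idx]


class SimplePreprocessor:
    """Pre-run the teacher once, cache {uid: result}, then replay from the cache."""

    def __init__(self, dataset, teacher_predict):
        self.dataset = IndexedDatasetSketch(dataset)
        self.teacher_predict = teacher_predict
        self.storage = {}  # in-memory {uid: teacher result}

    def preprocess_labels(self):
        # Run the teacher on every sample and store its result keyed by uid.
        with torch.no_grad():
            for idx in range(len(self.dataset)):
                uid, sample = self.dataset[idx]
                self.storage[uid] = self.teacher_predict(sample)

    def create_replay_dataloader(self, **dataloader_kwargs):
        preprocessor = self

        class ReplayDataset(Dataset):
            def __len__(self):
                return len(preprocessor.dataset)

            def __getitem__(self, idx):
                uid, sample = preprocessor.dataset[idx]
                # Prepend the cached teacher result: (batch,) -> (teacher_results, batch).
                return preprocessor.storage[uid], sample

        return DataLoader(ReplayDataset(), **dataloader_kwargs)


# Usage sketch (train_dataset and teacher_predict are placeholders):
# pre = SimplePreprocessor(train_dataset, teacher_predict)
# pre.preprocess_labels()
# loader = pre.create_replay_dataloader(batch_size=32, shuffle=True)
# for teacher_results, batch in loader:
#     ...  # distillation training step
```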
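For the storage customization mentioned above, the exact abstract interface of _DistilStorage is not spelled out here, so the following sketch only illustrates the {uid: result} save/load contract with a pickle-backed store. The class and method names (PickleFileStorage, save, load, dump, restore) are assumptions; a real subclass would override whatever methods _DistilStorage actually defines.

```python
import pickle


class PickleFileStorage:
    """A file-backed {uid: result} store; illustrative, not a real _DistilStorage subclass."""

    def __init__(self, path):
        self.path = path
        self._cache = {}  # in-memory view of {uid: result}

    def save(self, uid, result):
        self._cache[uid] = result

    def load(self, uid):
        return self._cache[uid]

    def dump(self):
        # Persist the whole {uid: result} dict to disk.
        with open(self.path, 'wb') as f:
            pickle.dump(self._cache, f)

    def restore(self):
        # Reload the {uid: result} dict, e.g. to replay in a later run.
        with open(self.path, 'rb') as f:
            self._cache = pickle.load(f)
```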
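For the uid customization, the sketch below shows a path-keyed dataset in the spirit of FilePathDataset: the uid is the sample's file path, so it stays stable across runs even if the dataset is re-shuffled or re-indexed. The class name and constructor arguments are illustrative; a real implementation would inherit _UidDataset.

```python
from torch.utils.data import Dataset


class FilePathDatasetSketch(Dataset):
    """Return (uid, sample) where the uid is the sample's file path."""

    def __init__(self, file_paths, load_fn):
        # file_paths: list of paths; load_fn: callable turning a path into a sample.
        self.file_paths = list(file_paths)
        self.load_fn = load_fn

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        path = self.file_paths[idx]
        # The path is a stable uid: one-to-one with samples and with prediction results.
        return path, self.load_fn(path)
```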
