[Feature] Add tools to convert distill ckpt to student-only ckpt. #381

Merged 4 commits on Dec 8, 2022
46 changes: 46 additions & 0 deletions tools/model_converters/convert_kd_ckpt_to_student.py
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
from pathlib import Path

import torch


def parse_args():
    parser = argparse.ArgumentParser(
        description='Process a checkpoint to be published')
Collaborator:
Please update the description.

    parser.add_argument('checkpoint', help='input checkpoint filename')
    parser.add_argument(
        '--model-only', action='store_true', help='only save model')

Collaborator:
--model-only is unnecessary. A checkpoint has five parts: meta, state_dict, optimizer, message_hub, and param_schedulers. optimizer, message_hub, and param_schedulers are only used to resume training.

Once the student part is cut off, the checkpoint can only be used for inference, deployment, or as pre-training weights, so there is no need to support resuming training; optimizer, message_hub, and param_schedulers can simply be deleted.

meta is important! It holds a lot of meta info about the dataset, which may be needed in the visualization or inference phase.

This script should keep only meta and state_dict by default, regardless of other situations.
    parser.add_argument(
        '--inplace', action='store_true', help='replace origin ckpt')
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    checkpoint = torch.load(args.checkpoint, map_location='cpu')
    new_state_dict = dict()

    for key, value in checkpoint['state_dict'].items():
        if key.startswith('architecture.'):
            new_key = key.replace('architecture.', '')
            new_state_dict[new_key] = value

    if args.model_only:
        checkpoint = dict()

    checkpoint['state_dict'] = new_state_dict

    if args.inplace:
        torch.save(checkpoint, args.checkpoint)
    else:
        ckpt_path = Path(args.checkpoint)
        ckpt_name = ckpt_path.stem
        ckpt_dir = ckpt_path.parent
        new_ckpt_path = ckpt_dir / f'{ckpt_name}_student.pth'
        torch.save(checkpoint, new_ckpt_path)
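For reference, the renaming loop above keeps only keys prefixed with 'architecture.' and strips that prefix; the 'teacher.' key below is a hypothetical example of a non-student entry that gets dropped:

```python
state_dict = {
    'architecture.backbone.conv1.weight': 1,
    'architecture.head.fc.weight': 2,
    'teacher.backbone.conv1.weight': 3,  # hypothetical non-student key, dropped
}
new_state_dict = {}
for key, value in state_dict.items():
    if key.startswith('architecture.'):
        # Strip only the leading prefix (count=1 guards against the
        # unlikely case of 'architecture.' recurring inside a key).
        new_state_dict[key.replace('architecture.', '', 1)] = value
print(new_state_dict)
# {'backbone.conv1.weight': 1, 'head.fc.weight': 2}
```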
Collaborator:
Adding an --out-path argument would be better. If --out-path is None, save the new checkpoint to args.checkpoint's directory; otherwise, save it to --out-path.
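The --out-path suggestion could look like this sketch; resolve_out_path is a hypothetical helper and the path values are illustrative:

```python
from pathlib import Path


def resolve_out_path(checkpoint, out_path=None):
    # With --out-path unset, save next to the input checkpoint with a
    # '_student' suffix; otherwise honor --out-path verbatim.
    if out_path is not None:
        return Path(out_path)
    ckpt_path = Path(checkpoint)
    return ckpt_path.with_name(f'{ckpt_path.stem}_student.pth')


print(resolve_out_path('work_dirs/kd/epoch_100.pth'))
print(resolve_out_path('work_dirs/kd/epoch_100.pth', 'out.pth'))
```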



if __name__ == '__main__':
    main()