Script for distilling zero-shot classifier to more efficient student #10244

Merged
merged 18 commits into huggingface:master from zero-shot-distillation on Feb 18, 2021

Conversation

@joeddav (Contributor) commented Feb 17, 2021

This PR introduces a script that provides a way to improve the speed and memory performance of a zero-shot classifier by training a more efficient student model from the zero-shot teacher's predictions over an unlabeled dataset.

For a given sequence, the zero-shot classification pipeline requires each possible label to be fed through the large NLI model separately. This slows inference considerably, particularly for tasks with a large number of classes K.
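For reference, a standard zero-shot pipeline call looks roughly like the following (the input text and candidate labels are only illustrative); each candidate label becomes its own premise/hypothesis pair and requires a separate forward pass through the NLI model:

from transformers import pipeline

# Each of the K candidate labels costs one forward pass through the large
# NLI model, so latency grows roughly linearly with K.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")
output = classifier(
    "Who are you voting for in 2020?",  # illustrative input
    candidate_labels=["politics", "sports", "science"],  # illustrative class names
)
print(output["labels"], output["scores"])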

Given (1) an unlabeled corpus and (2) a set of candidate class names, this script allows a user to train a standard classification head with K output dimensions. The script generates a softmax distribution for the provided data & class names, and a student classifier is then fine-tuned on these proxy labels. The resulting student model can be used for classifying novel text instances over these K classes with an order-of-magnitude boost in inference speed in addition to decreased memory usage.
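Conceptually, the student is trained to match the teacher's softmax outputs as soft targets. A minimal sketch of that objective, assuming plain soft-label cross-entropy (the script's actual loss and details such as temperature may differ):

import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_probs):
    # Cross-entropy between the student's predicted distribution and the
    # teacher's softmax distribution over the K class names (proxy labels).
    log_probs = F.log_softmax(student_logits, dim=-1)
    return -(teacher_probs * log_probs).sum(dim=-1).mean()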

A teacher NLI model can be distilled to a student model by running distill_classifier.py like so:

python distill_classifier.py \
--data_file unlabeled_data.txt \
--class_names_file class_names.txt \
--output_dir ./distilled_model

A number of other args are provided as well, such as --teacher_name_or_path and --student_name_or_path for specifying the pre-trained teacher and student models to be used (roberta-large-mnli and distilbert-base-uncased by default), and --hypothesis_template for customizing the hypothesis template used by the teacher zero-shot model. Training is implemented via Trainer, so any TrainingArguments can be specified as well.
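For example, a more fully specified invocation might look like this (the flag values are illustrative; --per_device_train_batch_size and --fp16 come from the standard TrainingArguments):

python distill_classifier.py \
--data_file unlabeled_data.txt \
--class_names_file class_names.txt \
--teacher_name_or_path roberta-large-mnli \
--student_name_or_path distilbert-base-uncased \
--hypothesis_template "This text is about {}." \
--per_device_train_batch_size 32 \
--fp16 \
--output_dir ./distilled_model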

The resulting model can then be used trivially in a text classification pipeline, or in any other way a standard sequence classification model can:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model = AutoModelForSequenceClassification.from_pretrained("./distilled_model")
tokenizer = AutoTokenizer.from_pretrained("./distilled_model")
distilled_classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
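A quick usage sketch (the input text is a placeholder and the comment only shows the expected output format; actual label names come from class_names.txt):

preds = distilled_classifier("Who are you voting for in 2020?")
# e.g. [{'label': 'politics', 'score': 0.98}]  # illustrative output

The student scores all K classes in a single forward pass, which is where the speed and memory savings come from.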

See the included README.md for more details and examples.

Soon I'll introduce a similar script for self-training an NLI model on unlabeled data alone, boosting its performance; the resulting model can then be distilled with this script like any other NLI model.

Update: I've also just added a link to a working Colab notebook demo.

@joeddav joeddav added the Distillation (related to model distillation) and Examples (related to examples in general) labels Feb 17, 2021
@LysandreJik (Member) left a comment

Fantastic that you're using the Trainer for that. Pinging Sylvain for review.

joeddav and others added 2 commits February 18, 2021 10:45
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@sgugger (Collaborator) left a comment

Great new example! After reading everything, I'm not sure it supports distributed training, so the PR should either be amended to support it or clearly state in the README that it does not (same for TPUs).

@joeddav (Contributor, Author) commented Feb 18, 2021

@LysandreJik cool thanks for the feedback.

@sgugger Thanks, I added fp16 support for the teacher predictions. It will also now throw an error if someone tries to run it with distributed training or TPUs, and I added a note about that in the README as well. It can still do multi-GPU, and will do so automatically if multiple GPUs are available on the machine; it just can't do multi-node.

@sgugger (Collaborator) commented Feb 18, 2021

Yes, I meant distributed multi-GPU. I did see that it will use all GPUs available on the machine, however :-)

Comment on lines +256 to +259
if training_args.local_rank != -1:
raise ValueError("Distributed training is not currently supported.")
if training_args.tpu_num_cores is not None:
raise ValueError("TPU acceleration is not currently supported.")

Great!

@joeddav joeddav merged commit c6fe175 into huggingface:master Feb 18, 2021
@joeddav joeddav deleted the zero-shot-distillation branch February 18, 2021 22:08