Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scriptable Tokenizer for Text Classification Example #1691

Merged
merged 10 commits into from
Jul 12, 2022

Conversation

mreso
Copy link
Collaborator

@mreso mreso commented Jun 17, 2022

…ptable tokenizer into a single artifact

Description

This example shows how to combine a model with a scriptable tokenizer and script the two together with TorchScript to achieve a single artifact.
The combination of model and tokenizer makes sure that the same tokenizer is used in training and inference.
Additionally, it decreases handler complexity and makes it easy to deploy these in the future C++ based backend.

Training the model depends on torchdata which is mentioned in the README. Inference does not require the package.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Test plan:

  • Execute commands in examples/text_classification_with_scriptable_tokenzier/README.md
  • As well as examples/text_classification/README.md (text_handler.py was slightly modified)

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Copy link
Member

@msaroufim msaroufim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matthias, super glad to see this work in. Left a bunch of feedback. The important ones is we need some test, updates to the README to clarify benefits to users and a few more comments in the train.py

@mreso mreso changed the title Adds an example for combining a text_classification model with a scri… [WIP] Adds an example for combining a text_classification model with a scri… Jun 17, 2022
@mreso mreso changed the title [WIP] Adds an example for combining a text_classification model with a scri… Adds an example for combining a text_classification model with a scri… Jun 22, 2022
@mreso mreso requested a review from msaroufim June 22, 2022 20:04
@mreso
Copy link
Collaborator Author

mreso commented Jun 23, 2022

Just adding the logs for running the unit test manually:
image

Copy link
Member

@msaroufim msaroufim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - although I think we should revert the changes to test/pytest and create_mar.sh let's fix those properly in another PR pending review in the CONTRIBUTING-dev.md spec I shared

@@ -16,7 +16,7 @@ function cleanup {
trap cleanup EXIT

# Download and Extract model's source code
sudo apt-get install zip unzip
sudo apt-get install -y zip unzip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change needed?

Copy link
Collaborator Author

@mreso mreso Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should revert the changes to test/pytest and create_mar.sh let's fix those properly in another PR

Sounds good, reverted the changes

@msaroufim msaroufim changed the title Adds an example for combining a text_classification model with a scri… Scriptable Tokenizer for Text Classification Example Jul 2, 2022
@lxning
Copy link
Collaborator

lxning commented Jul 5, 2022

@mreso could you run sanity and regression test to make sure the pytest you added is verified?

@mreso
Copy link
Collaborator Author

mreso commented Jul 6, 2022

Thank you @lxning! Here are the logs for the regression tests: https://gist.github.com/mreso/0bcaed895517f74a09cbd4e80621969c
Let me know if you need anything else!

@msaroufim msaroufim merged commit 42c3225 into pytorch:master Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants