This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

implement BertTensorizer and RoBERTaTensorizer in TorchScript #1053


chenyangyu1988
Contributor

Summary:
Implement BertTensorizer and RoBERTaTensorizer in TorchScript.
ScriptTensorizer has two APIs:

  1. numberize: processes a single line of input (a single string for classification, or a pair of strings for pair classification); the output is
    a list of token ids (e.g. the token's index in the vocab)
  2. tensorize: processes multiple lines of input by calling numberize on each line and batching all the results together to generate the output tensor used as the model input
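
The split between the two APIs can be illustrated with a minimal, pure-Python sketch. Everything in it is an assumption made for illustration: `ToyScriptTensorizer`, the whitespace tokenizer, the toy vocab, the `[CLS]`/`[SEP]` layout, and the padding-mask output are hypothetical and are not taken from this PR. The actual tensorizers are TorchScript modules that return tensors; nested lists stand in for tensors here.

```python
from typing import Dict, List, Optional, Tuple


class ToyScriptTensorizer:
    """Hypothetical stand-in for a ScriptTensorizer; not the PyText class."""

    def __init__(self, vocab: Dict[str, int], pad: str = "[PAD]", unk: str = "[UNK]"):
        self.vocab = vocab
        self.pad_idx = vocab[pad]
        self.unk_idx = vocab[unk]

    def _lookup(self, tokens: List[str]) -> List[int]:
        # Map each token to its vocab index, falling back to the unknown id.
        return [self.vocab.get(t, self.unk_idx) for t in tokens]

    def numberize(self, text: str, text_pair: Optional[str] = None) -> List[int]:
        # One line of input in, one list of token ids out.
        # Assumed BERT-style layout: [CLS] a [SEP], plus "b [SEP]" for pairs.
        tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
        if text_pair is not None:
            tokens += text_pair.lower().split() + ["[SEP]"]
        return self._lookup(tokens)

    def tensorize(
        self, rows: List[Tuple[str, Optional[str]]]
    ) -> Tuple[List[List[int]], List[List[int]]]:
        # Many lines in: numberize each one, then pad to a common length.
        # Assumes `rows` is non-empty.
        numberized = [self.numberize(a, b) for a, b in rows]
        max_len = max(len(ids) for ids in numberized)
        token_ids = [ids + [self.pad_idx] * (max_len - len(ids)) for ids in numberized]
        pad_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in numberized]
        return token_ids, pad_mask


vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}
tensorizer = ToyScriptTensorizer(vocab)

single = tensorizer.numberize("hello world")          # classification input
pair = tensorizer.numberize("hello", "world there")   # pair classification input
token_ids, pad_mask = tensorizer.tensorize(
    [("hello world", None), ("hello", "world there")]
)
```

In this sketch `numberize` never pads, so it can be checked one line at a time, while `tensorize` owns all batch-level concerns (padding and masking). Whether the real implementation draws the line in exactly the same place is not stated in the summary above.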

Differential Revision: D17941983

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Oct 15, 2019
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D17941983

chenyangyu1988 added a commit to chenyangyu1988/pytext that referenced this pull request Oct 18, 2019
…okresearch#1053)

Summary:
Pull Request resolved: facebookresearch#1053

Implement BertTensorizer and RoBERTaTensorizer in TorchScript.
ScriptTensorizer has two APIs:
1. numberize: processes a single line of input (a single string for classification, or a pair of strings for pair classification); the output is
a list of token ids (e.g. the token's index in the vocab)
2. tensorize: processes multiple lines of input by calling numberize on each line and batching all the results together to generate the output tensor used as the model input

Reviewed By: hudeven

Differential Revision: D17941983

fbshipit-source-id: dbf2619bcbd25e3747d1f07fdd7b83d1d18f6ded

…okresearch#1053)

Summary:
Pull Request resolved: facebookresearch#1053

Implement BertTensorizer and RoBERTaTensorizer in TorchScript.
ScriptTensorizer has two APIs:
1. numberize: processes a single line of input (a single string for classification, or a pair of strings for pair classification); the output is
a list of token ids (e.g. the token's index in the vocab)
2. tensorize: processes multiple lines of input by calling numberize on each line and batching all the results together to generate the output tensor used as the model input

Reviewed By: hudeven

Differential Revision: D17941983

fbshipit-source-id: 9ade19e4af0bf78a3efc3dee8a6d382406216feb

@facebook-github-bot
Contributor

This pull request has been merged in a0a0fc4.

Labels
CLA Signed Do not delete this pull request or issue due to inactivity. Merged