refactor ScriptTensorizer with general tensorize API #1117

chenyangyu1988 · 2019-11-08T04:50:32Z

Summary:
This diff introduces a general API for handling different kinds of input.

In the most general case, we expect the inputs to be either

1) multiple rows, where each row contains a list of texts (in most cases a single sentence or a sentence pair) ===> List[List[str]]
2) multiple rows, where each row contains a list of pre-processed token lists (in most cases a single sentence or a sentence pair) ===> List[List[List[str]]]

For single-sentence classification tasks, we expect the inputs to be either

1) multiple rows, where each row contains a single text ===> List[str]
2) multiple rows, where each row contains a single list of pre-processed tokens ===> List[List[str]]
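
To make the four input shapes concrete, here is a small illustration. The variable names `single_texts` and `single_tokens` and the sample sentences are made up for this example; only the type annotations come from the summary above.

```python
from typing import List

# Paired-sentence rows as raw text: List[List[str]]
texts_list: List[List[str]] = [
    ["how are you", "i am fine"],
    ["what time is it", "it is noon"],
]

# The same rows, already tokenized: List[List[List[str]]]
tokens_list: List[List[List[str]]] = [
    [["how", "are", "you"], ["i", "am", "fine"]],
    [["what", "time", "is", "it"], ["it", "is", "noon"]],
]

# Single-sentence rows as raw text: List[str]
single_texts: List[str] = ["how are you", "what time is it"]

# Single-sentence rows, already tokenized: List[List[str]]
single_tokens: List[List[str]] = [
    ["how", "are", "you"],
    ["what", "time", "is", "it"],
]
```

Each tokenized form carries exactly one more level of list nesting than its text counterpart.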

This refactoring provides two general APIs:

    def tensorize(
        self,
        texts_list: Optional[List[List[str]]] = None,
        tokens_list: Optional[List[List[List[str]]]] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:

    def tensorize_single(
        self,
        texts_list: Optional[List[str]] = None,
        tokens_list: Optional[List[List[str]]] = None,
    ):

Internally, it automatically handles whether the passed inputs are texts or tokens.
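
The dispatch between texts and tokens could look roughly like the sketch below. This is not the code from this diff: `SketchTensorizer`, its whitespace `tokenize`, the on-the-fly vocabulary, and the "exactly one argument" check are all assumptions made for illustration. Plain nested lists stand in for `torch.Tensor` so the sketch runs without PyTorch, and the three returned values (token ids, segment ids, lengths) are placeholders, since the summary does not say what the three tensors hold.

```python
from typing import Dict, List, Optional, Tuple

Rows = List[List[int]]


class SketchTensorizer:
    """Hypothetical stand-in; not the PyText ScriptTensorizer implementation."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {"<pad>": 0}

    def tokenize(self, text: str) -> List[str]:
        # Assumption: whitespace tokenization, purely for illustration.
        return text.lower().split()

    def lookup(self, token: str) -> int:
        # Grow the vocabulary on the fly so the sketch needs no vocab file.
        return self.vocab.setdefault(token, len(self.vocab))

    def tensorize(
        self,
        texts_list: Optional[List[List[str]]] = None,
        tokens_list: Optional[List[List[List[str]]]] = None,
    ) -> Tuple[Rows, Rows, List[int]]:
        # Assumption: exactly one of the two arguments must be given.
        if (texts_list is None) == (tokens_list is None):
            raise ValueError("pass exactly one of texts_list or tokens_list")
        if tokens_list is None:
            # Texts were passed: tokenize them so both paths share one code path.
            tokens_list = [[self.tokenize(t) for t in row] for row in texts_list]

        ids: Rows = []
        segments: Rows = []
        for row in tokens_list:
            row_ids: List[int] = []
            row_segs: List[int] = []
            for seg, sentence in enumerate(row):
                row_ids.extend(self.lookup(tok) for tok in sentence)
                row_segs.extend([seg] * len(sentence))
            ids.append(row_ids)
            segments.append(row_segs)

        # Right-pad every row with 0 up to the longest row in the batch.
        lengths = [len(r) for r in ids]
        max_len = max(lengths, default=0)
        ids = [r + [0] * (max_len - len(r)) for r in ids]
        segments = [r + [0] * (max_len - len(r)) for r in segments]
        return ids, segments, lengths

    def tensorize_single(
        self,
        texts_list: Optional[List[str]] = None,
        tokens_list: Optional[List[List[str]]] = None,
    ) -> Tuple[Rows, Rows, List[int]]:
        # Wrap each row in a one-element list and reuse the general API.
        def wrap(rows):
            return None if rows is None else [[r] for r in rows]

        return self.tensorize(wrap(texts_list), wrap(tokens_list))
```

In this sketch, `tensorize_single(texts_list=["how are you", "fine"])` and `tensorize_single(tokens_list=[["how", "are", "you"], ["fine"]])` produce the same result, because texts are tokenized up front and then follow the token path.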

Differential Revision: D18386345

facebook-github-bot · 2019-11-08T04:50:49Z

This pull request was exported from Phabricator. Differential Revision: D18386345

facebook-github-bot · 2019-11-08T22:29:12Z

This pull request was exported from Phabricator. Differential Revision: D18386345

facebook-github-bot · 2019-11-08T22:30:34Z

This pull request was exported from Phabricator. Differential Revision: D18386345

facebook-github-bot · 2019-11-08T22:34:07Z

This pull request was exported from Phabricator. Differential Revision: D18386345

facebook-github-bot · 2019-11-09T00:45:14Z

This pull request has been merged in 5f5b164.

facebook-github-bot added the CLA Signed label Nov 8, 2019

chenyangyu1988 force-pushed the export-D18386345 branch from 3af3589 to 6a42620 on November 8, 2019 22:29

chenyangyu1988 force-pushed the export-D18386345 branch from 6a42620 to bdf049a on November 8, 2019 22:30

chenyangyu1988 force-pushed the export-D18386345 branch from bdf049a to 4e28314 on November 8, 2019 22:34

facebook-github-bot closed this in 5f5b164 Nov 9, 2019

facebook-github-bot added the Merged label Nov 9, 2019
