This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

refactor ScriptTensorizer with general tensorize API #1117

chenyangyu1988
Contributor

Summary:
This diff introduces a general API for handling different input types.

In the most general case, we expect inputs to be one of the following:

  1. multiple rows, where each row contains a list of texts (in most cases a single sentence or a pair) ===> List[List[str]]
  2. multiple rows, where each row contains a list of pre-processed token lists (in most cases for a single sentence or a pair) ===> List[List[List[str]]]

For single-sentence classification tasks, we expect inputs to be one of the following:

  1. multiple rows, where each row contains a single text ===> List[str]
  2. multiple rows, where each row contains a single list of pre-processed tokens ===> List[List[str]]
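
To make the four input shapes concrete, here is a small illustration. The sample sentences and the whitespace split are assumptions made for the example only; the real tokenizer may produce different tokens.

```python
from typing import List

# Pair inputs: one row per example, each row holds one or two texts.
texts_list: List[List[str]] = [
    ["how are you", "i am fine"],
    ["good morning", "good night"],
]

# The same rows, already tokenized (whitespace split is an assumption).
tokens_list: List[List[List[str]]] = [
    [text.split() for text in row] for row in texts_list
]

# Single-sentence inputs drop one level of nesting.
single_texts: List[str] = ["how are you", "good morning"]
single_tokens: List[List[str]] = [text.split() for text in single_texts]
```

Each single-sentence shape is the corresponding general shape with the per-row list removed.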

This refactoring provides two general APIs:

  1. def tensorize(
    self,
    texts_list: Optional[List[List[str]]] = None,
    tokens_list: Optional[List[List[List[str]]]] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:

  2. def tensorize_single(
    self,
    texts_list: Optional[List[str]] = None,
    tokens_list: Optional[List[List[str]]] = None,
    ):

Internally, each method automatically handles whether the passed inputs are texts or tokens.
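
The dispatch between texts and tokens could look roughly like the sketch below. This is not the code from this diff: `SketchTensorizer`, `_resolve_tokens`, the whitespace tokenizer, the `<pad>` token, and the rule that exactly one argument must be given are all hypothetical. It returns padded token lists and lengths as plain Python lists instead of the three `torch.Tensor` values in the real signature, so that it runs without PyTorch.

```python
from typing import List, Optional, Tuple


class SketchTensorizer:
    """Hypothetical stand-in; not the PyText ScriptTensorizer."""

    def tokenize(self, text: str) -> List[str]:
        # Assumption: lowercase + whitespace split replaces the real tokenizer.
        return text.lower().split()

    def _resolve_tokens(
        self,
        texts_list: Optional[List[List[str]]],
        tokens_list: Optional[List[List[List[str]]]],
    ) -> List[List[List[str]]]:
        # Assumption: exactly one of the two inputs must be provided.
        if (texts_list is None) == (tokens_list is None):
            raise ValueError("pass exactly one of texts_list or tokens_list")
        if tokens_list is not None:
            return tokens_list  # already tokenized, use as-is
        return [[self.tokenize(text) for text in row] for row in texts_list]

    def tensorize(
        self,
        texts_list: Optional[List[List[str]]] = None,
        tokens_list: Optional[List[List[List[str]]]] = None,
    ) -> Tuple[List[List[str]], List[int]]:
        rows = self._resolve_tokens(texts_list, tokens_list)
        # Concatenate the sentences of each row, then pad to the longest row.
        flat = [[tok for sent in row for tok in sent] for row in rows]
        lengths = [len(row) for row in flat]
        max_len = max(lengths, default=0)
        padded = [row + ["<pad>"] * (max_len - len(row)) for row in flat]
        return padded, lengths

    def tensorize_single(
        self,
        texts_list: Optional[List[str]] = None,
        tokens_list: Optional[List[List[str]]] = None,
    ) -> Tuple[List[List[str]], List[int]]:
        # Wrap each row in a one-element list and reuse the general path.
        wrapped_texts = None if texts_list is None else [[t] for t in texts_list]
        wrapped_tokens = None if tokens_list is None else [[t] for t in tokens_list]
        return self.tensorize(wrapped_texts, wrapped_tokens)
```

With this sketch, `tensorize_single(texts_list=["Hello world", "hi"])` and `tensorize_single(tokens_list=[["hello", "world"], ["hi"]])` go through the same code path after the input is resolved to tokens.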

Differential Revision: D18386345

@facebook-github-bot added the "CLA Signed" label (Do not delete this pull request or issue due to inactivity.) on Nov 8, 2019
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D18386345

chenyangyu1988 added a commit to chenyangyu1988/pytext that referenced this pull request Nov 8, 2019
…h#1117)

Summary:
Pull Request resolved: facebookresearch#1117

fbshipit-source-id: 90d19a7d8dad57d16f274a3b389445fa71c8d105

chenyangyu1988 added a commit to chenyangyu1988/pytext that referenced this pull request Nov 8, 2019
…h#1117)

Summary:
Pull Request resolved: facebookresearch#1117

fbshipit-source-id: 7f767e3958d3053801137e454f15a1dd5ae37757

…h#1117)

Summary:
Pull Request resolved: facebookresearch#1117

fbshipit-source-id: 0061b0968b908c1e7d08bc2f73759e1ebd74b9f4

@facebook-github-bot
Contributor

This pull request has been merged in 5f5b164.
