BERTTensorizerBaseImpl to reimplement BERTTensorizerBase to be TorchScriptable #1163

chenyangyu1988 · 2019-11-22T01:36:37Z

Summary:
BERTTensorizerBaseImpl to reimplement BERTTensorizerBase to be TorchScriptable
Overall design:

A PyText Tensorizer (for example, RoBERTaTensorizer) will delegate its numberize and tensorize logic to a scripted tensorizer implementation (for example, RoBERTaTensorizerImpl).

This requires reimplementing the numberize() and tensorize() logic in a TorchScriptable form. The good news is that such an implementation already exists in pytext/torchscript/tensorizer, so only minor changes are needed.
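As a rough illustration of what a TorchScriptable numberize()/tensorize() pair can look like, here is a minimal sketch assuming PyTorch is installed. ToyBERTTensorizerImpl, its vocabulary convention (index 0 for padding, 1 for unknown tokens), and its truncation rule are hypothetical simplifications, not the actual code in pytext/torchscript/tensorizer; real BERT tensorizers also add special tokens such as [CLS]/[SEP], which this sketch omits. `@torch.jit.export` makes torch.jit.script compile numberize() and tensorize() alongside forward().

```python
from typing import Dict, List, Tuple

import torch


class ToyBERTTensorizerImpl(torch.nn.Module):
    # Hypothetical, simplified stand-in for a scripted tensorizer impl;
    # it is NOT the real class in pytext/torchscript/tensorizer.

    def __init__(self, vocab: List[str], max_seq_len: int = 128) -> None:
        super().__init__()
        # Assumption: index 0 is padding and index 1 is the unknown token.
        self.vocab_index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        self.pad_idx: int = 0
        self.unk_idx: int = 1
        self.max_seq_len: int = max_seq_len

    @torch.jit.export
    def numberize(
        self, per_sentence_tokens: List[List[str]]
    ) -> Tuple[List[int], List[int], int, List[int]]:
        # Map tokens to ids; the segment label is the sentence's index in the row.
        tokens: List[int] = []
        segment_labels: List[int] = []
        for segment in range(len(per_sentence_tokens)):
            for token in per_sentence_tokens[segment]:
                if len(tokens) < self.max_seq_len:  # truncate long rows
                    tokens.append(self.vocab_index.get(token, self.unk_idx))
                    segment_labels.append(segment)
        seq_len = len(tokens)
        positions: List[int] = [i for i in range(seq_len)]
        return tokens, segment_labels, seq_len, positions

    @torch.jit.export
    def tensorize(
        self,
        tokens: List[List[int]],
        segment_labels: List[List[int]],
        seq_lens: List[int],
        positions: List[List[int]],
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
        # Right-pad every row to the longest sequence in the batch.
        max_len = 0
        for n in seq_lens:
            max_len = max(max_len, n)
        padded_tokens: List[List[int]] = []
        padded_segments: List[List[int]] = []
        padded_positions: List[List[int]] = []
        for i in range(len(tokens)):
            pad = max_len - seq_lens[i]
            padded_tokens.append(tokens[i] + [self.pad_idx] * pad)
            padded_segments.append(segment_labels[i] + [0] * pad)
            padded_positions.append(positions[i] + [0] * pad)
        return (
            torch.tensor(padded_tokens, dtype=torch.long),
            torch.tensor(padded_segments, dtype=torch.long),
            torch.tensor(seq_lens, dtype=torch.long),
            torch.tensor(padded_positions, dtype=torch.long),
        )

    def forward(
        self, rows: List[List[List[str]]]
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
        # End-to-end path: numberize each row, then tensorize the batch.
        tokens: List[List[int]] = []
        segment_labels: List[List[int]] = []
        seq_lens: List[int] = []
        positions: List[List[int]] = []
        for row in rows:
            t, s, n, p = self.numberize(row)
            tokens.append(t)
            segment_labels.append(s)
            seq_lens.append(n)
            positions.append(p)
        return self.tensorize(tokens, segment_labels, seq_lens, positions)


scripted = torch.jit.script(ToyBERTTensorizerImpl(["[PAD]", "[UNK]", "hello", "world"]))
print(scripted.numberize([["hello", "world"], ["foo"]]))
# prints ([2, 3, 1], [0, 0, 1], 3, [0, 1, 2])
```

Keeping numberize() per row and tensorize() per batch mirrors the split described above: a Python-side wrapper can call each scripted method separately during training, while forward() offers a single scripted entry point for export.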

On the PyText Tensorizer side, numberize() and tensorize() simply delegate to tensorizer_impl:

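```python
from typing import Any, Dict, Tuple

import torch


def numberize(self, row: Dict) -> Tuple[Any, ...]:
    # Tokenize each configured text column, then hand the tokens to the scripted impl.
    per_sentence_tokens = [
        self.tokenizer.tokenize(row[column]) for column in self.columns
    ]
    return self.tensorizer_impl.numberize(per_sentence_tokens)


def tensorize(self, batch) -> Tuple[torch.Tensor, ...]:
    # Regroup the batch of numberized rows into one sequence per field.
    tokens, segment_labels, seq_lens, positions = zip(*batch)
    return self.tensorizer_impl.tensorize(
        tokens, segment_labels, seq_lens, positions
    )
```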
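For reference, `zip(*batch)` in tensorize() transposes a list of numberized rows, each a `(tokens, segment_labels, seq_len, positions)` tuple, into one tuple per field. A tiny illustration with made-up token IDs (the values are hypothetical, chosen only to show the shape of the data):

```python
# Two hypothetical numberized rows: (tokens, segment_labels, seq_len, positions).
batch = [
    ([101, 7, 102], [0, 0, 0], 3, [0, 1, 2]),
    ([101, 9, 8, 102], [0, 0, 1, 1], 4, [0, 1, 2, 3]),
]

# zip(*batch) regroups the per-row tuples into per-field tuples.
tokens, segment_labels, seq_lens, positions = zip(*batch)

print(seq_lens)  # (3, 4)
print(tokens)    # ([101, 7, 102], [101, 9, 8, 102])
```

Each field comes back as a tuple rather than a list; depending on how the scripted impl's signature is typed, converting with `list(...)` before the call may be necessary.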

Differential Revision: D18651538

facebook-github-bot · 2019-11-22T01:37:06Z

This pull request was exported from Phabricator. Differential Revision: D18651538

…criptable (facebookresearch#1163)

Summary:
Pull Request resolved: facebookresearch#1163

Reviewed By: rutyrinott

Differential Revision: D18651538

fbshipit-source-id: fdb5bb099cd3a4894f90df460650398516177220

facebook-github-bot · 2019-12-13T01:05:10Z

This pull request was exported from Phabricator. Differential Revision: D18651538

facebook-github-bot · 2019-12-13T02:48:50Z

This pull request has been merged in 39467dc.

facebook-github-bot added the CLA Signed and fb-exported labels Nov 22, 2019

chenyangyu1988 force-pushed the export-D18651538 branch from 595e669 to 23ec49d December 13, 2019 01:05

facebook-github-bot closed this in 39467dc Dec 13, 2019

facebook-github-bot added the Merged label Dec 13, 2019
