
Optimize T5 for sequence generation #2054

Merged: 10 commits into pytorch:main from optimize-t5 on Feb 17, 2023

Conversation

@joecummings (Contributor) commented Feb 11, 2023

This PR makes the following changes to T5 to improve its generation capabilities:

  • Adds a prepare_inputs_for_generation function to be compliant with the GenerationWrapper API (see the sketch after this list).
  • Adds get_encoder and get_decoder helper functions.
  • Uses past_key_values to implement incremental decoding. This also involves a custom reorder-cache function that can be used for beam search.
  • Updates docstrings.
  • Updates model weights to fit the new architecture.
  • Removes T5Wrapper.
  • Fixes TorchScripting with the new APIs.
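
For context, here is a minimal, hedged sketch of the prepare_inputs_for_generation pattern for incremental decoding with past_key_values. It is not the PR's exact code: the dictionary keys, argument names, and trimming logic are illustrative assumptions.

```python
from typing import Any, Dict, Optional

from torch import Tensor


def prepare_inputs_for_generation(
    decoder_tokens: Tensor,
    encoder_outputs: Dict[str, Tensor],
    past_key_values: Optional[Any] = None,
) -> Dict[str, Any]:
    """Assemble per-step decoder inputs for a GenerationWrapper-style loop.

    Sketch only; the key names here are hypothetical, not the PR's actual API.
    """
    if past_key_values is not None:
        # With cached key/value states, only the most recent token needs to be
        # fed to the decoder at each generation step.
        decoder_tokens = decoder_tokens[:, -1:]
    return {
        "decoder_tokens": decoder_tokens,
        "encoder_outputs": encoder_outputs,
        "past_key_values": past_key_values,
        "return_past_key_values": True,
    }
```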

Testing:

  • Passes existing tests
  • More tests to come with the generation integration tests (link)

Todo:

  • Add license to files
  • Move to main folder
  • Add model to README

@joecummings force-pushed the optimize-t5 branch 3 times, most recently from daa19d7 to 73465d8 on February 17, 2023 at 02:10
@joecummings changed the title from "Optimize t5" to "Optimize T5 for sequence generation" on Feb 17, 2023
@joecummings marked this pull request as ready for review on February 17, 2023 at 06:00
@mthrok (Contributor) left a comment:

stamp

@@ -79,7 +76,7 @@ def _t5_get_encoder(self, model, model_input, encoder_output):
     encoder = model.get_encoder()
     # Need to set the tgt_key_padding_mask to ensure the same results
     encoder_padding_mask = model_input.eq(model.padding_idx)
-    output_from_get_encoder = encoder(tgt=model_input, tgt_key_padding_mask=encoder_padding_mask)["encoder_output"]
+    output_from_get_encoder = encoder(model_input, src_key_padding_mask=encoder_padding_mask)["encoder_output"]
Contributor:

Is this change using a different set of existing arguments, or changing the argument names?
If it's renaming arguments, that's BC-breaking unless this is prototype.

joecummings (Contributor, Author):

Changing the name of the arguments, but yes, this is prototype until tomorrow :)
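
For reference, a short sketch of the new call shape shown in the diff above; `model` and `model_input` are assumed to be the T5 model and an input token tensor, as in the test being modified.

```python
# Previously: encoder(tgt=model_input, tgt_key_padding_mask=encoder_padding_mask)
# Now the input is positional and the mask is named src_key_padding_mask.
encoder = model.get_encoder()
encoder_padding_mask = model_input.eq(model.padding_idx)
encoder_output = encoder(model_input, src_key_padding_mask=encoder_padding_mask)["encoder_output"]
```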

@@ -56,13 +55,13 @@ def __post_init__(self):
     self.activation = "gelu_new"


-# NOTE: Comparable HuggingFace implentation can be found at https://github.com/huggingface/transformers/blob/8581a798c0a48fca07b29ce2ca2ef55adcae8c7e/src/transformers/models/t5/modeling_t5.py#L1269
Contributor:

Is this context no longer applicable?

joecummings (Contributor, Author):

We include in the header that several functions are based on HF, and I call it out in the docstrings of those functions as well. There's no need to say there is a comparable HF implementation for the ones that are just the normal Enc/Dec forward functions.

@torch.jit.export
def _reorder_cache(
self, past: List[Tuple[Tensor, Tensor, Tensor, Tensor]], beam_idx: Tensor
) -> List[Tuple[Tensor, Tensor, Tensor, Tensor]]:
Contributor:

It would be nice to have a comment or docstring explaining the why and the what here, for future developers.

for layer_past_states in past:
# get the correct batch idx from layer past batch dim
# batch dim of `past` is at 2nd position
reordered_layer_past_states = ()
Contributor:

A list would be semantically better, but is this for TorchScript compatibility?

joecummings (Contributor, Author):

yup :(
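
To help with the docstring request above, here is a hedged sketch (not the PR's exact code) of what a reorder-cache routine typically does during beam search, assuming each layer caches four tensors (self- and cross-attention keys and values) with the batch dimension leading. `beam_idx` is a LongTensor of the source-beam indices selected at the current search step.

```python
from typing import List, Tuple

from torch import Tensor


def reorder_cache(
    past: List[Tuple[Tensor, Tensor, Tensor, Tensor]], beam_idx: Tensor
) -> List[Tuple[Tensor, Tensor, Tensor, Tensor]]:
    """Re-index every cached key/value tensor with the beam indices chosen at
    the last search step, so the cache stays aligned with the surviving beams."""
    reordered: List[Tuple[Tensor, Tensor, Tensor, Tensor]] = []
    for self_key, self_value, cross_key, cross_value in past:
        reordered.append(
            (
                self_key.index_select(0, beam_idx),
                self_value.index_select(0, beam_idx),
                cross_key.index_select(0, beam_idx),
                cross_value.index_select(0, beam_idx),
            )
        )
    return reordered
```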

Comment on lines 195 to 203
) -> Dict[
str,
Union[
Tensor,
Dict[str, Union[Optional[Tensor], List[Tensor], List[Optional[Tensor]]]],
Optional[List[Tuple[Tensor, Tensor, Tensor, Tensor]]],
bool,
],
]:
Contributor:

This annotation is complex and it seems to be repeated. Can we define a variable to store the annotation?
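
One possible shape for that, as a sketch: the alias names below are hypothetical (not defined in the PR) and assume TorchScript resolves module-level type aliases used in annotations.

```python
from typing import Dict, List, Optional, Tuple, Union

from torch import Tensor

# Hypothetical alias names for the repeated annotation.
PAST_KEY_VALUES = List[Tuple[Tensor, Tensor, Tensor, Tensor]]
ENCODER_OUTPUTS = Dict[str, Union[Optional[Tensor], List[Tensor], List[Optional[Tensor]]]]
DECODER_OUTPUTS = Dict[str, Union[Tensor, ENCODER_OUTPUTS, Optional[PAST_KEY_VALUES], bool]]
```

The return annotation on lines 195 to 203 could then be written once as `-> DECODER_OUTPUTS`.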

@joecummings merged commit 19f8bc9 into pytorch:main on Feb 17, 2023