Add TFSpeech2Text #15113
Conversation
Tagging @sgugger as a core dev, @patil-suraj as a core dev + original creator of our PT Speech2Text model. Some pipeline tests are failing, almost surely due to the changes in generate.

Very nice addition, great work!
@@ -756,6 +759,8 @@ def generate(
        )
        # expand encoder_outputs
        encoder_outputs = (tf.gather(encoder_outputs[0], expanded_batch_idxs, axis=0),)
        if "attention_mask" in locals():  # vision models don't have this
This check is a bit weird. attention_mask is in the keyword arguments of this function, so it will show up in the locals. Maybe check if it's not set to None?
Excellent point, for some reason I didn't notice that it was in the keyword arguments (and thus the current version would fail) 👍 Checking against None now
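The pitfall can be reproduced in plain Python (a hypothetical minimal signature, not the actual generate one): a keyword argument is always present in locals(), even when it was never passed.

```python
def generate(input_ids=None, attention_mask=None):
    # keyword arguments always live in locals(), even when left at their
    # default of None, so this membership test is always True
    in_locals = "attention_mask" in locals()
    is_set = attention_mask is not None
    return in_locals, is_set

print(generate())                    # attention_mask never passed
print(generate(attention_mask=[1]))  # attention_mask explicitly passed
```

This is why checking against None is the reliable way to detect "the caller did not provide this tensor".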
# past drops the `encoder_outputs` during the loop and it may be needed (encoder-decoder models)
if len(past) > 1 and encoder_outputs is not None:
    past = (encoder_outputs, past)
Let's double check with @patrickvonplaten for this change.
I've reworked this part to be less intrusive: the past variable is not touched, encoder_outputs is passed to prepare_inputs_for_generation a few lines below. This means that it will be ignored as part of kwargs in most models, and encoder-decoder models can now explicitly access it if they wish to do so (like TFSpeech2Text does).
This change also solves the assert comments below, in prepare_inputs_for_generation.
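A minimal sketch of the described flow, with hypothetical simplified signatures (the real methods take many more arguments): most models silently ignore extra kwargs such as encoder_outputs, while an encoder-decoder model can opt in to it explicitly.

```python
def prepare_inputs_default(decoder_input_ids, past=None, **kwargs):
    # most models ignore unknown kwargs such as encoder_outputs
    return {"decoder_input_ids": decoder_input_ids, "past": past}

def prepare_inputs_encoder_decoder(decoder_input_ids, past=None, encoder_outputs=None, **kwargs):
    # an encoder-decoder model (like TFSpeech2Text) picks it up explicitly
    return {
        "decoder_input_ids": decoder_input_ids,
        "past": past,
        "encoder_outputs": encoder_outputs,
    }

common_kwargs = {"past": None, "encoder_outputs": ("enc",)}
print(prepare_inputs_default([0], **common_kwargs))
print(prepare_inputs_encoder_decoder([0], **common_kwargs))
```

Since `past` is left untouched, models that never packed encoder_outputs into it keep working unchanged.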
self.num_heads = num_heads
self.dropout = tf.keras.layers.Dropout(dropout)
self.head_dim = embed_dim // num_heads
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
We'll need to fix this assert in the original model (normally no new asserts in the modeling files, but this one is a "copied from").
Copied the PT check (that raises a ValueError) to both models
Well, it turns out that many models used a copy of this 😅 Changed them all 👍
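The PT-style check referred to above replaces the assert with an explicit ValueError; a framework-agnostic sketch with a simplified class and hypothetical message text:

```python
class Attention:
    def __init__(self, embed_dim: int, num_heads: int):
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # raise instead of assert: the check survives `python -O` and gives
        # callers a catchable, descriptive exception
        if self.head_dim * num_heads != self.embed_dim:
            raise ValueError(
                f"embed_dim must be divisible by num_heads (got embed_dim: {embed_dim}"
                f" and num_heads: {num_heads})."
            )

Attention(512, 8)  # fine
try:
    Attention(512, 7)
except ValueError as err:
    print(err)
```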
src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py (3 outdated review threads, resolved)
tests/test_modeling_tf_common.py (outdated)
if model_class.__name__ in ["TFSpeech2TextModel", "TFSpeech2TextForConditionalGeneration"]:
    inputs = {
        "decoder_input_ids": tf.keras.Input(
            batch_shape=(2, max_input),
            name="decoder_input_ids",
            dtype="int32",
        ),
        "input_ids": tf.keras.Input(batch_shape=(2, 32, max_input), name="input_ids", dtype="float32"),
    }
Is there another (more general) check we could use here?
By more general do you mean instead of hardcoding the possible names in model_class.__name__? We have TF_MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES, but it doesn't cover TFSpeech2TextModel :(
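A sketch of the trade-off being discussed (the mapping contents below are an assumption shown for illustration, not the real auto-mapping): the mapping-based check is more general, but it only lists the generation head, so the bare base model still needs the hardcoded name.

```python
# assumed contents for illustration -- the real mapping lives in
# transformers.models.auto and only lists the ForConditionalGeneration class
TF_MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES = {
    "speech_to_text": "TFSpeech2TextForConditionalGeneration",
}

HARDCODED_SPEECH_CLASSES = {"TFSpeech2TextModel", "TFSpeech2TextForConditionalGeneration"}

def needs_speech_inputs(class_name: str) -> bool:
    # the mapping misses TFSpeech2TextModel, hence the extra hardcoded set
    in_mapping = class_name in TF_MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES.values()
    return in_mapping or class_name in HARDCODED_SPEECH_CLASSES

print(needs_speech_inputs("TFSpeech2TextModel"))  # only caught by the hardcoded set
```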
@@ -756,6 +759,8 @@ def generate(
        )
        # expand encoder_outputs
        encoder_outputs = (tf.gather(encoder_outputs[0], expanded_batch_idxs, axis=0),)
        if attention_mask is not None:  # vision models don't have this
            attention_mask = tf.gather(attention_mask, expanded_batch_idxs, axis=0)
Is this only needed for the new model or was it a bug to not have done this previously?
I'm a bit worried about this as it could lead to some unexpected errors in the more complicated models like TFRag. Could we maybe run the following tests here to double-check that this change is ok?

RUN_SLOW=1 pytest tests/test_modeling_tf_bart.py
RUN_SLOW=1 pytest tests/test_modeling_tf_t5.py
RUN_SLOW=1 pytest tests/test_modeling_tf_rag.py (note that you need some more dependencies to be installed here)
Spot on, it failed for those models.

I've reworked these lines to behave like PyTorch's _expand_inputs_for_generation(), which is used in all modalities. I've tested the changes against bart, bert, t5, vit, and s2t -- let me know what you think, @patrickvonplaten
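A NumPy sketch of the expansion logic under discussion (the real code uses tf.gather and handles more tensors; names are simplified): each batch row is repeated num_beams times, and optional tensors are only expanded when they are actually present.

```python
import numpy as np

def expand_inputs_for_generation(input_ids, attention_mask=None, num_beams=2):
    # repeat every batch index num_beams times: [0, 1] -> [0, 0, 1, 1]
    expanded_idx = np.repeat(np.arange(input_ids.shape[0]), num_beams)
    input_ids = input_ids[expanded_idx]
    if attention_mask is not None:  # e.g. vision models have no attention mask
        attention_mask = attention_mask[expanded_idx]
    return input_ids, attention_mask

ids = np.array([[1, 2], [3, 4]])
mask = np.array([[1, 1], [1, 0]])
expanded_ids, expanded_mask = expand_inputs_for_generation(ids, mask, num_beams=3)
print(expanded_ids.shape)  # (6, 2)
```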
Cool! Could you please also run the slow TFRag tests? It's a bit annoying as it's a huge and slow test, but it's the model that is most prone to break if there are changes in TF generate.
I confirm that it passes TFRag tests as well (had to spin up a larger machine 😅 )
Cool! Good for me then
Looks good to me in general! Maybe just two final things before merging:
- Quickly run some slow tests to make sure we haven't broken generate accidentally for some of the more complicated models
- Should we maybe leave past for now until we improve the naming of generate more globally?
…gface#15262)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix TF Causal LM models' returned logits
* Fix expected shape in the tests
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

…nizerBase` `__init__` (huggingface#15454)
* replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`
* add test
* fix kwargs
* reformat test
* format
* format
* fix typo to render the documentation

…st version is available (huggingface#15319)
* add new test
* update test
* remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`
* add `tokenizer_file` for the fast only tokenizer
* change global variables layoutxml
* remove `"tokenizer_file"` from DPR tokenizer's Global variables
* remove `tokenizer_file` from herbert slow tokenizer init
* `"tokenizer_file"` from LED tokenizer's Global variables
* remove `tokenizer_file` from mbart slow tokenizer init
* remove `tokenizer_file` from slow tokenizer template
* adapt to versioning
* adapt the `test_tokenizer_mismatch_warning` test
* clean test
* clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py
* Revert "remove `tokenizer_file` from mbart slow tokenizer init" (this reverts commit 0dbb723)
* Revert "`"tokenizer_file"` from LED tokenizer's Global variables" (this reverts commit 5a3f879)
* Revert "remove `tokenizer_file` from herbert slow tokenizer init" (this reverts commit f5e1000)
* Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables" (this reverts commit da08953)
* set `tokenizer_file` in super `__init__` of mbart
Cloned your branch and did some experimentation and LGTM! I noticed one issue -
@@ -1131,8 +1119,9 @@ def _generate_beam_search(
    done = [False for _ in range(batch_size)]

    while cur_len < max_length:
        # import pdb; pdb.set_trace()
can we delete this?
🤦
smh not using breakpoint() in the year 2022
        shape=(-1,),
    )
# prepares text-based inputs
if len(shape_list(input_ids)) == 2:
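The rank check above distinguishes the two input modalities; a NumPy sketch (shape_list replaced by ndim; the feature size of 80 is illustrative):

```python
import numpy as np

def is_text_input(inputs) -> bool:
    # text: (batch, seq_len) integer ids -> rank 2
    # speech: (batch, seq_len, feature_dim) float features -> rank 3
    return inputs.ndim == 2

token_ids = np.zeros((2, 16), dtype=np.int32)
speech_features = np.zeros((2, 16, 80), dtype=np.float32)
print(is_text_input(token_ids), is_text_input(speech_features))
```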
Slightly worried here about the TF vision tests. Can you check this one as well?

RUN_SLOW=1 pytest tests/test_modeling_tf_vision_encoder_decoder.py
Can confirm that they pass 👍
One final vision test as mentioned before, and then it's good for me as well.
Pending the results of automated tests, this should be the last planned commit.

@patrickvonplaten -- one important final change that I'd like to ask for a double-check is the removal of the positional embedding weights from the PT model. To check the changes, I've run:

EDIT: TF models uploaded.
@@ -153,9 +153,9 @@ def make_weights(self, num_embeddings: int, embedding_dim: int, padding_idx: Opt
    # in forward put the weights on the correct dtype and device of the param
    emb_weights = emb_weights.to(dtype=self.weights.dtype, device=self.weights.device)

-   self.weights = nn.Parameter(emb_weights)
+   self.weights = emb_weights
@patil-suraj - is this change ok? Why exactly do we have to do this here actually?
I'm not sure we can just save a tensor like this in PyTorch. When doing model.to(device) does this tensor correctly move to the device as well?
I don't really understand why this would cause an issue in TF<=>PT exactly - what was the issue? BTW, we always wrap the static sinusoidal encodings in an nn.Parameter() or nn.Embedding(...), e.g. ViT's positional embeddings, DistilBERT's position embeddings. Why should we do it differently here?
But maybe it's totally fine. What do you think @LysandreJik @sgugger ?
what was the issue?
This is the only model that I could find that may change the size of the embedding layer in the forward call (here). The TF version for other embeddings allocates these named model parameters in build(), which is expected to happen before the first call(). Since nn.Parameter() is mostly a wrapper to add the variable to the model's list of parameters (AFAIK), then dropping it means we don't need a named parameter in TF, which means we can simplify the code and avoid solving the resizing issue.
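For context, the weights in question are static sinusoidal position encodings. A NumPy sketch of a common construction (the exact interleaving used by Speech2Text's fairseq-style variant differs in detail, so treat the layout below as an assumption): since the values are deterministic, any framework can rebuild them, which is why storing them as a named parameter is a choice rather than a necessity.

```python
import numpy as np

def sinusoidal_embeddings(num_positions: int, dim: int) -> np.ndarray:
    # deterministic, non-learned weights: sin on even columns, cos on odd ones
    positions = np.arange(num_positions, dtype=np.float64)[:, None]
    div_term = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))
    emb = np.zeros((num_positions, dim))
    emb[:, 0::2] = np.sin(positions * div_term)
    emb[:, 1::2] = np.cos(positions * div_term)
    return emb

emb = sinusoidal_embeddings(num_positions=4, dim=6)
print(emb.shape)  # (4, 6)
```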
Mmm, but dropping it also removes it from the checkpoint, no? This might prevent proper loading/saving AFAICT.
Facebook checkpoints do not have it and they load correctly in TF and PT regardless of this change. I'm assuming that, for the same reason, we suppress errors when loading a checkpoint in PT where this variable doesn't exist, as it is in _keys_to_ignore_on_load_missing (here).

The issue that prompted me to make this change was uncovered in the equivalence tests, where we try to load in TF a PT model that does not come from a checkpoint (here). We also have this var in _keys_to_ignore_on_save, but the test attempts to load a PT model in TF without saving it first -- i.e., having the var in nn.Parameter() breaks the test because it is not excluded, unlike when loading from a checkpoint.
Technically, someone who fine-tuned (or just saved and reshared) a Speech2Text model will have this weight in the checkpoint. I don't know how it's used exactly, so I don't know if it's breaking to ignore it (we would have to put it in the _keys_to_ignore_on_load_unexpected).
(sorted on Slack -- conclusion: reverting this change since it is needed for the to method)
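A sketch of the key-ignoring mechanics discussed above (hypothetical simplified logic and an assumed pattern; the real filtering happens inside from_pretrained): regex patterns in _keys_to_ignore_on_load_missing suppress warnings for weights the model can rebuild on its own, such as these positional embeddings.

```python
import re

# pattern assumed for illustration: the positional embedding weights are
# listed so their absence from a checkpoint is not reported as an error
KEYS_TO_IGNORE_ON_LOAD_MISSING = [r"embed_positions\.weights"]

def warn_worthy_missing(missing_keys):
    # keep only the missing keys that do NOT match an ignore pattern
    return [
        key for key in missing_keys
        if not any(re.search(pattern, key) for pattern in KEYS_TO_IGNORE_ON_LOAD_MISSING)
    ]

missing = ["model.encoder.embed_positions.weights", "lm_head.weight"]
print(warn_worthy_missing(missing))  # only lm_head.weight remains
```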
@patrickvonplaten reverted the previous change and added the embedding weights as named variables in TF.
Thanks for working on it!
* Add wrapper classes
* convert inner layers to tf
* Add TF Encoder and Decoder layers
* TFSpeech2Text models
* Loadable model
* TF model with same outputs as PT model
* test skeleton
* correct tests and run the fixup
* correct attention expansion
* TFSpeech2Text pask_key_values with TF format
What does this PR do?
This PR adds a TF port of Speech2Text. A summary of the changes:
- changes to tests in test_modeling_tf_common.py that didn't quite fit this new kind of model.

TODO:
- from_pt=True