Fixing slow pipeline tests #14260
Merged

Commits (7, all by Narsil):
- e00a3d6: Fiixng slow pipeline tests
- c7ead4a: Remove the image-segmentaiton override.
- 443fcba: Fixing clamping only in training.
- f307ef6: Wav2vec2.
- dca3788: Remove last mention of `no_grad`.
- 0809fe3: Fixing copies.
- bbb9b27: Rename.
The visible hunk (`@@ -93,76 +93,74 @@`, immediately after `__init__(self, args_parser=TableQuestionAnsweringArgumentHandler(), ...)`) is in the table question answering pipeline. The change removes the `with torch.no_grad():` wrapper from both `batch_inference` and `sequential_inference`; their bodies are otherwise unchanged, just dedented one level. After the change the two methods read:

```python
def batch_inference(self, **inputs):
    return self.model(**inputs)

def sequential_inference(self, **inputs):
    """
    Inference used for models that need to process sequences in a sequential fashion, like the SQA models which
    handle conversational query related to a table.
    """
    all_logits = []
    all_aggregations = []
    prev_answers = None
    batch_size = inputs["input_ids"].shape[0]

    input_ids = inputs["input_ids"].to(self.device)
    attention_mask = inputs["attention_mask"].to(self.device)
    token_type_ids = inputs["token_type_ids"].to(self.device)
    token_type_ids_example = None

    for index in range(batch_size):
        # If sequences have already been processed, the token type IDs will be created according to the previous
        # answer.
        if prev_answers is not None:
            prev_labels_example = token_type_ids_example[:, 3]  # shape (seq_len,)
            model_labels = np.zeros_like(prev_labels_example.cpu().numpy())  # shape (seq_len,)

            token_type_ids_example = token_type_ids[index]  # shape (seq_len, 7)
            for i in range(model_labels.shape[0]):
                segment_id = token_type_ids_example[:, 0].tolist()[i]
                col_id = token_type_ids_example[:, 1].tolist()[i] - 1
                row_id = token_type_ids_example[:, 2].tolist()[i] - 1

                if row_id >= 0 and col_id >= 0 and segment_id == 1:
                    model_labels[i] = int(prev_answers[(col_id, row_id)])

            token_type_ids_example[:, 3] = torch.from_numpy(model_labels).type(torch.long).to(self.device)

        input_ids_example = input_ids[index]
        attention_mask_example = attention_mask[index]  # shape (seq_len,)
        token_type_ids_example = token_type_ids[index]  # shape (seq_len, 7)
        outputs = self.model(
            input_ids=input_ids_example.unsqueeze(0),
            attention_mask=attention_mask_example.unsqueeze(0),
            token_type_ids=token_type_ids_example.unsqueeze(0),
        )
        logits = outputs.logits

        if self.aggregate:
            all_aggregations.append(outputs.logits_aggregation)

        all_logits.append(logits)

        dist_per_token = torch.distributions.Bernoulli(logits=logits)
        probabilities = dist_per_token.probs * attention_mask_example.type(torch.float32).to(
            dist_per_token.probs.device
        )

        coords_to_probs = collections.defaultdict(list)
        for i, p in enumerate(probabilities.squeeze().tolist()):
            segment_id = token_type_ids_example[:, 0].tolist()[i]
            col = token_type_ids_example[:, 1].tolist()[i] - 1
            row = token_type_ids_example[:, 2].tolist()[i] - 1
            if col >= 0 and row >= 0 and segment_id == 1:
                coords_to_probs[(col, row)].append(p)

        prev_answers = {key: np.array(coords_to_probs[key]).mean() > 0.5 for key in coords_to_probs}

        logits_batch = torch.cat(tuple(all_logits), 0)

    return (logits_batch,) if not self.aggregate else (logits_batch, torch.cat(tuple(all_aggregations), 0))
```

The `def __call__(self, *args, **kwargs)` definition that follows in the hunk is unchanged context. One inline review comment on the removed `with torch.no_grad():` line reads: "Nice!"
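Where gradient tracking is disabled instead is not visible in this capture (the dca3788 commit message only says the last mention of `no_grad` was removed). As a general PyTorch pattern, and purely as a hedged sketch rather than what this PR actually does, autograd can be switched off once at the call site instead of inside each helper; the function and argument names below are only illustrative:

```python
import torch

def run_pipeline_forward(model, encoded_inputs):
    # Illustrative caller-side wrapper: disables autograd once around the whole
    # forward pass instead of inside every inference helper.
    with torch.no_grad():  # torch.inference_mode() also works on recent PyTorch
        return model(**encoded_inputs)
```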
Review discussion:

@stas00 Is that OK to remove at inference time?
In theory, yes. In practice, it depends on how the model was pre-trained.
The model weights don't change during inference, so we don't need to keep things in check all the time.
However, if the pre-trained model's weights lead to an overflow in a single iteration during training, as is the case with some mt5 models under mixed precision, then this can occur just as well during inference.
This is primarily an issue with models pre-trained in bf16 and then fine-tuned or run for inference in fp16 (mixed or non-mixed precision). If a model was pre-trained with fp16/mixed precision, the clamping is almost certainly not needed.
To give you a more intelligent answer would require running some tests with the actual DETR models and checking their activation magnitudes at the point you're asking about. That should be pretty trivial using https://huggingface.co/transformers/debugging.html#underflow-and-overflow-detection, which can be plugged into the HF Trainer and the examples with a single command-line argument: `--debug underflow_overflow`.
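The same detector can also be attached outside the Trainer. A minimal sketch of that standalone usage, following the linked documentation (the checkpoint and input below are placeholders, not from this PR):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.debug_utils import DebugUnderflowOverflow

# Placeholder model and input, only to show where the detector hooks in.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# Registers hooks on every submodule; if an inf/nan appears in weights or
# activations, it reports min/max statistics for the last forward frames.
debug_overflow = DebugUnderflowOverflow(model)

inputs = tokenizer("test sentence", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
```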
To be honest, I think this code was just badly copy-pasted, so I'm more in favor of disabling this hack for training (as it is done now).
OK, if everyone is in favor, then let's do this.
You must have meant for inference, right, Patrick?
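That is, as the commit message "Fixing clamping only in training" says, the overflow hack is kept for training and skipped at inference. The actual DETR lines are not shown in this capture; a hedged sketch of the pattern being discussed, with a toy module standing in for the real layer:

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in layer illustrating a training-only fp16 overflow clamp."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = self.linear(hidden_states)
        # Keep the fp16 overflow workaround only while training; at inference
        # the weights are frozen, so the clamp is skipped entirely.
        if self.training and hidden_states.dtype == torch.float16:
            clamp_value = torch.finfo(hidden_states.dtype).max - 1000
            hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
        return hidden_states
```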