
BUG: onnxruntime output NAN value #13798

Closed
yangsp5 opened this issue Dec 1, 2022 · 2 comments
Labels
ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.

Comments


yangsp5 commented Dec 1, 2022

Describe the issue

Bug Report

  • The model is defined like this:
import torch
from torch import nn
from transformers import BertConfig, BertModel
from transformers.models.bert.modeling_bert import BertPreTrainedModel

class Bert(BertPreTrainedModel):
    def __init__(self,
                 model_name_or_path='clue/roberta_chinese_clue_tiny',
                 add_pooling_layer=False,
                 emb_size=32):
        self.config = BertConfig.from_pretrained(model_name_or_path)
        super().__init__(self.config)

        self.bert = BertModel(self.config, add_pooling_layer=add_pooling_layer)
        self.emb_size = emb_size
        self.projection = nn.Linear(self.config.hidden_size, emb_size)
        self.activation = nn.Tanh()

    def forward(self,
                input_ids,
                token_type_ids=None,
                attention_mask=None):
        outputs = self.bert(
            input_ids=input_ids,
            token_type_ids=token_type_ids,
            attention_mask=attention_mask
        )
        last_hidden_state = outputs.last_hidden_state

        # ignore padding
        select_length = attention_mask.sum(axis=1).detach() 
        feature = last_hidden_state * attention_mask.unsqueeze(2)
        feature = feature.sum(axis=1) / select_length.unsqueeze(1)

        feature = self.projection(feature)
        feature = self.activation(feature)

        return feature

And the torch -> ONNX export looks like this:

model = Bert()

device = 'cuda'
model.to(device)
model.eval()

input_ids = torch.zeros(1, 32, dtype=torch.int64, device=device)
attention_mask = torch.zeros(1, 32, dtype=torch.int64, device=device)
token_type_ids = torch.zeros(1, 32, dtype=torch.int64, device=device)

torch.onnx.export(
    model,
    (input_ids, attention_mask, token_type_ids),
    input_names=['input_ids', 'attention_mask', 'token_type_ids'],
    output_names=['feature'],
    f='./test.onnx',
    do_constant_folding=True,
    export_params=True,
    opset_version=11,
    dynamic_axes={
        'input_ids':      {0: 'batch_size', 1: 'seq_len'},
        'token_type_ids': {0: 'batch_size', 1: 'seq_len'},
        'attention_mask': {0: 'batch_size', 1: 'seq_len'},
        'feature':        {0: 'batch_size',}
    }
)

# Predict
import numpy as np
import onnxruntime
from transformers import BertTokenizer

worker = onnxruntime.InferenceSession('./test.onnx', providers=['CUDAExecutionProvider'])
tokenizer = BertTokenizer.from_pretrained('clue/roberta_chinese_clue_tiny')

tensor = tokenizer('test this', padding="max_length", truncation=True, max_length=32, return_tensors="pt", add_special_tokens=True)
inputs = {
    'input_ids': tensor['input_ids'].numpy().astype(np.int64),
    'token_type_ids': tensor['token_type_ids'].numpy().astype(np.int64),
    'attention_mask': tensor['attention_mask'].numpy().astype(np.int64)
}
outputs = worker.run(['feature'], inputs)

The output is all NaN:

[array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan]], dtype=float32)]
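A quick programmatic check for this condition (plain NumPy; `outputs` here is a stand-in for the list returned by `worker.run`):

```python
import numpy as np

# stand-in for the list returned by worker.run(['feature'], inputs)
outputs = [np.full((1, 32), np.nan, dtype=np.float32)]

print(np.isnan(outputs[0]).any())  # True: the feature contains NaN
```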

How can I fix it?

To reproduce

TODO

Urgency

No response

Platform

Linux

OS Version

Ubuntu 18.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnx 1.12.0, onnxruntime-gpu 1.13.1

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.3

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. labels Dec 1, 2022

tianleiwu commented Dec 1, 2022

@yangsp5,
You might notice that attention_mask and token_type_ids are switched in the ONNX export:

```
forward(self, input_ids, token_type_ids=None, attention_mask=None):

input_names=['input_ids', 'attention_mask', 'token_type_ids'],
```

The order of input_names must match the argument order of the forward function exactly.

Because of the swap, the zeros of token_type_ids are passed as attention_mask, so every token is masked and there is no valid token in the input.
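The failure mode can be reproduced without ONNX Runtime: the model's masked-mean step divides by the mask sum, so an all-zero mask produces 0/0. A minimal NumPy sketch with toy values:

```python
import numpy as np

# toy hidden states: batch 1, seq_len 4, hidden size 2
last_hidden_state = np.ones((1, 4, 2), dtype=np.float32)

# all-zero attention mask, as happens when token_type_ids
# is accidentally wired to the attention_mask input
attention_mask = np.zeros((1, 4), dtype=np.float32)

select_length = attention_mask.sum(axis=1)               # 0.0
feature = last_hidden_state * attention_mask[:, :, None]
feature = feature.sum(axis=1) / select_length[:, None]   # 0/0 -> NaN

print(np.isnan(feature).all())  # True
```

With the mask wired correctly, `select_length` is the number of real tokens and the division is finite.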


yangsp5 commented Dec 1, 2022

Thanks. The order matters!
