
BUG: onnxruntime output NAN value #13798

Closed
yangsp5 opened this issue Dec 1, 2022 · 2 comments
Labels
ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.

Comments


yangsp5 commented Dec 1, 2022

Describe the issue

Bug Report

  • The model is defined like this:
import torch
from torch import nn
from transformers import BertConfig, BertModel
from transformers.models.bert.modeling_bert import BertPreTrainedModel

class Bert(BertPreTrainedModel):
    def __init__(self,
                 model_name_or_path='clue/roberta_chinese_clue_tiny',
                 add_pooling_layer=False,
                 emb_size=32):
        self.config = BertConfig.from_pretrained(model_name_or_path)
        super().__init__(self.config)

        self.bert = BertModel(self.config, add_pooling_layer=add_pooling_layer)
        self.emb_size = emb_size
        self.projection = nn.Linear(self.config.hidden_size, emb_size)
        self.activation = nn.Tanh()

    def forward(self,
                input_ids,
                token_type_ids=None,
                attention_mask=None):
        outputs = self.bert(
            input_ids=input_ids,
            token_type_ids=token_type_ids,
            attention_mask=attention_mask
        )
        last_hidden_state = outputs.last_hidden_state

        # ignore padding
        select_length = attention_mask.sum(axis=1).detach() 
        feature = last_hidden_state * attention_mask.unsqueeze(2)
        feature = feature.sum(axis=1) / select_length.unsqueeze(1)

        feature = self.projection(feature)
        feature = self.activation(feature)

        return feature

And the torch -> ONNX export looks like this:

model = Bert()

device = 'cuda'
model.to(device)
model.eval()

input_ids = torch.zeros(1, 32, dtype=torch.int64, device=device)
attention_mask = torch.zeros(1, 32, dtype=torch.int64, device=device)
token_type_ids = torch.zeros(1, 32, dtype=torch.int64, device=device)

torch.onnx.export(
    model,
    (input_ids, attention_mask, token_type_ids),
    input_names=['input_ids', 'attention_mask', 'token_type_ids'],
    output_names=['feature'],
    f='./test.onnx',
    do_constant_folding=True,
    export_params=True,
    opset_version=11,
    dynamic_axes={
        'input_ids':      {0: 'batch_size', 1: 'seq_len'},
        'token_type_ids': {0: 'batch_size', 1: 'seq_len'},
        'attention_mask': {0: 'batch_size', 1: 'seq_len'},
        'feature':        {0: 'batch_size',}
    }
)

# Predict
import numpy as np
import onnxruntime
from transformers import BertTokenizer

worker = onnxruntime.InferenceSession('./test.onnx', providers=['CUDAExecutionProvider'])
tokenizer = BertTokenizer.from_pretrained('clue/roberta_chinese_clue_tiny')

tensor = tokenizer('test this', padding="max_length", truncation=True, max_length=32, return_tensors="pt", add_special_tokens=True)
inputs = {
    'input_ids': tensor['input_ids'].numpy().astype(np.int64),
    'token_type_ids': tensor['token_type_ids'].numpy().astype(np.int64),
    'attention_mask': tensor['attention_mask'].numpy().astype(np.int64)
}
outputs = worker.run(['feature'], inputs)

The output is all NaN:

[array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan]], dtype=float32)]
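A quick programmatic check for this condition (plain NumPy; `outputs` here is a stand-in for the list returned by `worker.run`):

```python
import numpy as np

# stand-in for the list returned by worker.run(['feature'], inputs)
outputs = [np.full((1, 32), np.nan, dtype=np.float32)]

print(np.isnan(outputs[0]).any())  # True: the feature contains NaN
```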

How can I fix it?

To reproduce

TODO

Urgency

No response

Platform

Linux

OS Version

Ubuntu 18.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnx 1.12.0, onnxruntime-gpu 1.13.1

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.3

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. labels Dec 1, 2022

tianleiwu commented Dec 1, 2022

@yangsp5,
You might notice that attention_mask and token_type_ids are switched in the ONNX export:

```
forward(self, input_ids, token_type_ids=None, attention_mask=None):

input_names=['input_ids', 'attention_mask', 'token_type_ids'],
```

The order of input_names must match the argument order of the forward function exactly.

Because of the swap, the zeros of token_type_ids are passed as attention_mask, so every token is masked and there is no valid token in the input.
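The failure mode can be reproduced without ONNX Runtime: the model's masked-mean step divides by the mask sum, so an all-zero mask produces 0/0. A minimal NumPy sketch with toy values:

```python
import numpy as np

# toy hidden states: batch 1, seq_len 4, hidden size 2
last_hidden_state = np.ones((1, 4, 2), dtype=np.float32)

# all-zero attention mask, as happens when token_type_ids
# is accidentally wired to the attention_mask input
attention_mask = np.zeros((1, 4), dtype=np.float32)

select_length = attention_mask.sum(axis=1)               # 0.0
feature = last_hidden_state * attention_mask[:, :, None]
feature = feature.sum(axis=1) / select_length[:, None]   # 0/0 -> NaN

print(np.isnan(feature).all())  # True
```

With the mask wired correctly, `select_length` is the number of real tokens and the division is finite.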


yangsp5 commented Dec 1, 2022

Thanks. The order matters!
