Verify memory usage is not prohibitively high in the ONNX export #1012

Closed
fxmarty opened this issue Apr 24, 2023 · 6 comments
Labels: feature-request (New feature or request), onnx (Related to the ONNX export)

@fxmarty (Contributor) commented Apr 24, 2023

Feature request

I have not checked whether the ONNX export e.g. triples the memory usage for decoder models. I believe it should not, but it would be worth making sure we do not use substantially more RAM than vanilla torch.onnx.export, which would make exporting large models difficult.
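
One way to check is to profile peak RSS for both export paths on the same checkpoint; a minimal sketch assuming the memory_profiler package is available (the gpt2 checkpoint, dummy input shapes, and output paths are only illustrative):

from memory_profiler import memory_usage

import torch
from transformers import AutoModelForCausalLM

from optimum.exporters.onnx import main_export


def vanilla_export():
    # Baseline: plain torch.onnx.export of a small decoder checkpoint.
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    dummy = {
        "input_ids": torch.ones(2, 8, dtype=torch.long),
        "attention_mask": torch.ones(2, 8, dtype=torch.long),
    }
    torch.onnx.export(
        model,
        (dummy,),
        f="gpt2_vanilla.onnx",
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
    )


def export_with_optimum():
    # Same checkpoint exported through Optimum, validation disabled so only the
    # export itself is measured.
    main_export("gpt2", "gpt2_onnx", task="text-generation", do_validation=False)


print("Peak RSS, torch.onnx.export: %s MiB" % memory_usage(vanilla_export, max_usage=True))
print("Peak RSS, optimum main_export: %s MiB" % memory_usage(export_with_optimum, max_usage=True))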

cc @xenova

Motivation

/

Your contribution

/

@xenova (Contributor) commented May 19, 2023

Encountered another OOM issue for EleutherAI/gpt-neo-1.3B, even with 25 GB of RAM on Google Colab.

Here's the Colab RAM graph (until the runtime was killed):
[image: Colab RAM usage graph]

@fxmarty (Contributor, Author) commented May 31, 2023

@xenova At which point do you OOM? I realize validation actually takes a fair amount of memory, as the model outputs (and especially the past key values) are held in memory and may be large. For the export itself, it does not appear we do worse than vanilla torch.onnx.export.

from memory_profiler import memory_usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

from optimum.exporters.onnx import main_export


def f():
    # Baseline: vanilla torch.onnx.export of gpt2-large with a small padded batch.
    model = AutoModelForCausalLM.from_pretrained("gpt2-large")

    tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
    tokenizer.add_special_tokens({"pad_token": "a"})

    fake_inp = tokenizer(["This is me", "This is you and me"], padding=True, return_tensors="pt")
    fake_inp = {"input_ids": fake_inp["input_ids"], "attention_mask": fake_inp["attention_mask"]}

    torch.onnx.export(
        model,
        (fake_inp,),
        f="fake.onnx",
        input_names=["input_ids", "attention_mask"],
        output_names=["logits", "past_key_values"],
        dynamic_axes={
            "input_ids": {0: "batch_size"},
            "attention_mask": {0: "batch_size"},
            "logits": {0: "batch_size"},
            "past_key_values": {0: "batch_size"},
        },
    )


def optimum_export():
    # Optimum export of the same checkpoint, with validation disabled.
    main_export("gpt2-large", "gpt2_large_onnx", no_post_process=True, task="text-generation", do_validation=False)


# Profile one of the two export paths (swap in f to profile the vanilla export).
mem_usage = memory_usage(optimum_export)
print("Memory usage (in chunks of .1 seconds): %s" % mem_usage)
print("Maximum memory usage: %s" % max(mem_usage))
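
For comparison, the same wrapper can be pointed at a run with validation enabled; a minimal sketch reusing the imports above (do_validation=True and the output directory name are assumptions for illustration, not taken from the original snippet):

def optimum_export_with_validation():
    # Assumption: do_validation=True re-enables the parity check that holds both
    # the PyTorch and ONNX outputs (notably the past key values) in memory.
    main_export("gpt2-large", "gpt2_large_onnx_val", no_post_process=True, task="text-generation", do_validation=True)


mem_usage_val = memory_usage(optimum_export_with_validation)
print("Maximum memory usage with validation: %s" % max(mem_usage_val))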

@xenova (Contributor) commented May 31, 2023

Will test now 👍 It may be good enough to just skip validation.

@xenova (Contributor) commented May 31, 2023

Can confirm OOM occurs during validation. Skipping validation seems to be a suitable workaround for now.

With validation: [image: Colab RAM usage graph]

Without validation: [image: Colab RAM usage graph]


Let me convert some larger models I've had problems with before (like whisper-large-v2; xenova/transformers.js#102) and I'll get back to you.
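
For reference, the validation-skipping workaround applied to the EleutherAI/gpt-neo-1.3B checkpoint that OOM'd above would look roughly like this; a minimal sketch reusing the main_export call shown earlier (the output directory name is an arbitrary choice):

from optimum.exporters.onnx import main_export

# Sketch of the workaround: export the checkpoint that previously hit OOM,
# skipping the validation step that holds the model outputs in memory.
main_export(
    "EleutherAI/gpt-neo-1.3B",
    "gpt_neo_1.3B_onnx",
    no_post_process=True,
    task="text-generation",
    do_validation=False,
)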

@xenova (Contributor) commented May 31, 2023

@fxmarty (Contributor, Author) commented Jun 15, 2023

Fixed in #1111. Kind of shameful we had this bug...

fxmarty closed this as completed Jun 15, 2023