Verify memory usage is not prohibitively high in the ONNX export #1012
Comments
Encountered another OOM issue for EleutherAI/gpt-neo-1.3B, even with 25 GB of RAM on Google Colab.
@xenova At which point do you OOM? I realize validation actually takes a fair amount of memory, as the model outputs (and especially the past key values) are written to memory and may be large. For the export itself, we do not appear to do worse than vanilla `torch.onnx.export`:

```python
from memory_profiler import memory_usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from optimum.exporters.onnx import main_export


# Baseline: vanilla torch.onnx.export of gpt2-large.
def f():
    model = AutoModelForCausalLM.from_pretrained("gpt2-large")
    tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
    tokenizer.add_special_tokens({'pad_token': 'a'})
    fake_inp = tokenizer(["This is me", "This is you and me"], padding=True, return_tensors="pt")
    fake_inp = {"input_ids": fake_inp["input_ids"], "attention_mask": fake_inp["attention_mask"]}
    torch.onnx.export(
        model,
        (fake_inp,),
        f="fake.onnx",
        input_names=["input_ids", "attention_mask"],
        output_names=["logits", "past_key_values"],
        dynamic_axes={
            "input_ids": {0: "batch_size"},
            "attention_mask": {0: "batch_size"},
            "logits": {0: "batch_size"},
            "past_key_values": {0: "batch_size"},
        },
    )


# Optimum export, with post-processing and validation disabled.
def optimum_export():
    main_export("gpt2-large", "gpt2_large_onnx", no_post_process=True, task="text-generation", do_validation=False)


# Profile one of the two export paths (swap in `f` to profile the vanilla export).
mem_usage = memory_usage(optimum_export)
print('Memory usage (in chunks of .1 seconds): %s' % mem_usage)
print('Maximum memory usage: %s' % max(mem_usage))
```
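For a sense of scale, here is a back-of-envelope sketch of the past-key-values footprint that validation holds in memory. The gpt-neo-1.3B shape constants (24 layers, 16 attention heads, hidden size 2048) are assumed from the public model config, and the batch/sequence sizes are illustrative:

```python
# Rough past_key_values size for EleutherAI/gpt-neo-1.3B in fp32.
# Shape constants are assumed from the public config; adjust if they differ.
num_layers, num_heads, hidden_size = 24, 16, 2048
head_dim = hidden_size // num_heads
batch_size, seq_len, bytes_per_float = 2, 1024, 4

# One key and one value tensor per layer, each of shape
# (batch_size, num_heads, seq_len, head_dim).
pkv_bytes = num_layers * 2 * batch_size * num_heads * seq_len * head_dim * bytes_per_float
print('pkv: %.2f GiB' % (pkv_bytes / 2**30))  # ~0.75 GiB for these shapes
```

Since validation compares the PyTorch outputs against the ONNX Runtime outputs, both copies are alive at once, so the effective footprint is roughly double this, on top of the model weights themselves.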
Will test now 👍 It may be good enough to just skip validation.
Can confirm the OOM occurs during validation. Skipping validation seems to be a suitable workaround for now. Let me convert some larger models I've had problems with before (like whisper-large-v2; xenova/transformers.js#102) and I'll get back to you.
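For reference, a minimal sketch of the skip-validation workaround, reusing the `main_export(..., do_validation=False)` call from the profiling script above; the output directory here is illustrative:

```python
from optimum.exporters.onnx import main_export

# Export without the post-export validation pass that triggers the OOM.
# The output directory name is illustrative.
main_export(
    "EleutherAI/gpt-neo-1.3B",
    "gpt_neo_1.3B_onnx",
    task="text-generation",
    no_post_process=True,
    do_validation=False,
)
```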
Fixed in #1111. Kind of shameful we had this bug...
Feature request
I have not checked whether the ONNX export e.g. triples the memory usage for decoder models. I believe it should not be the case, but it could be worth making sure we don't use substantially more RAM than the vanilla `torch.onnx.export`, which would make the export of large models difficult. cc @xenova
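A minimal sketch of one way to check this, assuming the two export functions `f` and `optimum_export` from the profiling script earlier in the thread:

```python
from memory_profiler import memory_usage

# Compare peak RSS of the two export paths; `f` (vanilla torch.onnx.export)
# and `optimum_export` are the functions defined in the script above.
for export_fn in (f, optimum_export):
    peak = max(memory_usage(export_fn))
    print('%s: peak %.0f MiB' % (export_fn.__name__, peak))
```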
Motivation
/
Your contribution
/