Saving external data for large ONNX models #255

Closed
wants to merge 15 commits

Conversation

@NouamaneTazi (Member) commented Jul 1, 2022

What does this PR do?

Fixes #254 and #377

We can now load and save ORT models that have external data 🚀

@HuggingFaceDocBuilderDev commented Jul 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@NouamaneTazi (Member, Author) commented Jul 2, 2022

With the latest commit, we're now able to do:

model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    cache_dir="model_cache",
    onnx_cache_dir="./onnx_cache",  # saves the ONNX model (with external data if the model is large) to "./onnx_cache"
)

model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    cache_dir="model_cache",  # previous behaviour, where onnx_cache_dir defaults to cache_dir
)

And model.save_pretrained(save_path) would just copy files from onnx_cache_dir to the provided save_path

@NouamaneTazi (Member, Author) commented Jul 2, 2022

The following should be working now:

from optimum.onnxruntime import ORTModelForCausalLM

# load small ONNX model
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-small-testing-onnx", use_auth_token=True)
# load large ONNX model (>2GB) by specifying the folder containing the model's weights
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-350m-onnx-folder", use_auth_token=True, onnx_folder="onnx")

Example of uploading a large ONNX model (>2GB) to the Hub:

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import shutil
from huggingface_hub import HfApi

model_ckpt = "bigscience/bloom-350m"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    onnx_cache_dir="./onnx_cache",  # saves ONNX model to "./onnx_cache"
)

# save to local folder
model.save_pretrained(save_path / "onnx")
shutil.move(save_path / "onnx" / "config.json", save_path / "config.json")
tokenizer.save_pretrained(save_path)

# push to hub
repo_id = "nouamanetazi/bloom-350m-onnx-folder-test"
api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_folder(folder_path=save_path, repo_id=repo_id, path_in_repo=".", repo_type="model")

@philschmid (Member) left a comment

Can you make sure tests are passing and style is correct?

@mht-sharma mentioned this pull request Dec 1, 2022
@NouamaneTazi (Member, Author) commented:

We can now save/load a large ORTModelForSeq2SeqLM:

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
)

# save to local folder
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)

@fxmarty (Collaborator) commented Dec 7, 2022

Awesome! I think it would be great to add tests, essentially checking that saving/reloading works well in the encoder-only and encoder-decoder cases.
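
As a rough illustration (not from this PR), an encoder-only round-trip test could look like the sketch below; the tiny checkpoint and the tolerance are placeholders:

import tempfile

import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "hf-internal-testing/tiny-random-bert"  # placeholder tiny checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello world", return_tensors="pt")

# export to ONNX and run once
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
logits_before = model(**inputs).logits

# save, reload from disk, and check the outputs match
with tempfile.TemporaryDirectory() as tmpdirname:
    model.save_pretrained(tmpdirname)
    reloaded = ORTModelForSequenceClassification.from_pretrained(tmpdirname)
    logits_after = reloaded(**inputs).logits

assert torch.allclose(logits_before, logits_after, atol=1e-5)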

Review thread on optimum/onnxruntime/utils.py:
import onnx
from onnx.external_data_helper import ExternalDataInfo, _get_initializer_tensors

model_paths = src_file_names.copy()
Member:

It's a list of Paths so I don't think it copies anything here, we might as well start from a new empty list, and fill it.

Member Author (@NouamaneTazi):

I checked without the copy() and the extending does modify the list src_file_names unfortunately :/

Member:

Yes, my point is that you can create an empty list

model_paths = []

And fill it as you go?

My point here is that src_file_names[0] will be the same instance as model_paths[0].

Member Author (@NouamaneTazi):

I'm sorry, I don't quite get it. I only use model_paths to iterate over the input src_file_names. Then I keep on filling src_file_names:

    model_paths = []
    for model_path in model_paths:
        # load model graph
        model = onnx.load(str(model_path), load_external_data=False)
        # filter out tensors that are not external data
        model_tensors = _get_initializer_tensors(model)
        model_tensors_ext = [
            ExternalDataInfo(tensor).location
            for tensor in model_tensors
            if tensor.HasField("data_location") and tensor.data_location == onnx.TensorProto.EXTERNAL
        ]
        src_paths.extend([model_path.parent / tensor_name for tensor_name in model_tensors_ext])
        dst_file_names.extend(model_tensors_ext)
    return src_paths, dst_file_names

So this shouldn't work
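
For reference, a small standalone snippet (not from the PR) illustrating the copy() semantics under discussion: list.copy() is a shallow copy, so the contained Path objects are shared between the two lists, but extending the copy does not modify the original.

from pathlib import Path

src_file_names = [Path("decoder_model.onnx")]  # hypothetical input
model_paths = src_file_names.copy()

model_paths.append(Path("decoder_model.onnx_data"))
print(len(src_file_names))                  # 1 -- the original list is unchanged
print(model_paths[0] is src_file_names[0])  # True -- both lists share the same Path instance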

NouamaneTazi and others added 2 commits December 7, 2022 16:18
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
@NouamaneTazi (Member, Author) commented:

For the tests, it would be cool if we could enforce saving a small model in external data format. I tried looking quickly for a way, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit.
Will try to add tests once I have time @fxmarty

@mht-sharma (Contributor) commented:

> For the tests, it would be cool if we could enforce saving a small model in external data format. I tried looking quickly for a way, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit.
> Will try to add tests once I have time @fxmarty

You could use the following API to convert a small model to external data: converting-an-onnx-model-to-external-data

The size threshold has to be low so that it can create the files.
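
For example, a minimal sketch of that conversion (the file names and threshold are placeholders):

import os

import onnx
from onnx.external_data_helper import convert_model_to_external_data

# load a small exported ONNX model
model = onnx.load("model.onnx")

# mark (almost) every initializer as external data by using a very low size threshold
convert_model_to_external_data(
    model,
    all_tensors_to_one_file=True,
    location="model.onnx_data",  # written next to the saved model file
    size_threshold=16,
)

# saving now writes model.onnx plus the external data file into the target folder
os.makedirs("model_external", exist_ok=True)
onnx.save_model(model, "model_external/model.onnx")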

@mht-sharma (Contributor) commented Dec 8, 2022

Hi @NouamaneTazi, thanks for the PR. It would require a small change to handle one more use case for modeling_seq2seq and modeling_decoder.

Taking the Seq2Seq class as an example: it generates 3 different models, encoder.onnx, decoder.onnx, and decoder_with_past.onnx. These are currently generated in the same folder. If there are external data files, there is a chance of an overwrite when they have the same names. See: 26983

A possible fix is to save them in sub-folders, e.g. encoder/encoder.onnx, decoder/decoder.onnx, etc. The same change would be required in the exporters.
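
To make the proposed layout concrete, a standalone sketch (file names are placeholders, not this PR's implementation) of saving each sub-model into its own folder so that identically named external data files cannot collide:

from pathlib import Path

import onnx

export_dir = Path("onnx_export")

# assumes the three sub-models were already exported as single flat files
for name in ("encoder", "decoder", "decoder_with_past"):
    sub_dir = export_dir / name
    sub_dir.mkdir(parents=True, exist_ok=True)
    proto = onnx.load(f"{name}.onnx")
    onnx.save_model(
        proto,
        str(sub_dir / f"{name}.onnx"),
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        location=f"{name}.onnx_data",  # external data stays inside this sub-model's folder
    )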

@fxmarty (Collaborator) commented Dec 8, 2022

We should probably do the same in exporters, actually.

@NouamaneTazi (Member, Author) commented:

I'm trying to write tests for saving/loading with external data, but it's not as trivial as it seems.
Trying to apply your suggestion @mht-sharma by using:

model = ORTModelForSeq2SeqLM.from_pretrained(self.ONNX_SEQ2SEQ_MODEL_ID, use_cache=True)
model.save_pretrained(tmpdirname)

# load model proto
onnx_model = onnx.load(str(model.model_path))

# save external data
os.makedirs(str(model.model_path.parent / "external_data"), exist_ok=True)
onnx.save_model(
    onnx_model,
    str(model.model_path.parent / "external_data" / "model.onnx"),
    save_as_external_data=True,
    all_tensors_to_one_file=False,
    size_threshold=8,
    convert_attribute=False,
)

# need to do this for encoder/decoder/decoder_with_past

But again, this wouldn't test our model.save_pretrained API at all, because in our API the export to ONNX is done using torch.onnx.export here, which doesn't accept an argument to specify the external data format.

I'm open to suggestions, or else we can merge this for now.

@michaelbenayoun (Member) left a comment

LGTM!

@fxmarty (Collaborator) commented Dec 12, 2022

@NouamaneTazi Why not use actual >2GB models, randomly initialized and saved from transformers (so no download time)? Then there's no need for custom logic.

@NouamaneTazi (Member, Author) commented:

@fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, ORTModelForSequenceClassification from a BertForSequenceClassification instance?

@fxmarty (Collaborator) commented Dec 13, 2022

> @fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, ORTModelForSequenceClassification from a BertForSequenceClassification instance?

You can call save_pretrained() on the PreTrainedModel, and then use from_pretrained() on the local folder with the ORTModel class.
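
For instance, a rough sketch of that workflow (sizes and names are placeholders; a real external-data test would need a config large enough to exceed 2GB, and the tokenizer is saved alongside so the ONNX export can build dummy inputs):

import tempfile

from transformers import AutoTokenizer, BertConfig, BertForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification

# randomly initialized model, so no large download is needed
config = BertConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=2, intermediate_size=128)
pt_model = BertForSequenceClassification(config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

with tempfile.TemporaryDirectory() as tmpdirname:
    pt_model.save_pretrained(tmpdirname)   # save_pretrained() on the PreTrainedModel
    tokenizer.save_pretrained(tmpdirname)
    # then load from the local folder with the ORTModel class
    ort_model = ORTModelForSequenceClassification.from_pretrained(tmpdirname, from_transformers=True)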

@JingyaHuang mentioned this pull request Dec 14, 2022
@PoodleWang commented:

@NouamaneTazi
from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    from_transformers=True,
)
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)

Log:
<Trial 2015437 worker_0> genius $ python3 /opt/tiger/genius/tensorrt/load.py
2022-12-15 09:13:01.539465: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:239: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:246: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:912: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:100: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min))
Traceback (most recent call last):
  File "/opt/tiger/genius/tensorrt/load.py", line 55, in <module>
    from_transformers=True,
  File "/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_ort.py", line 280, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/optimum/modeling_base.py", line 263, in from_pretrained
    **model_kwargs,
  File "/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_seq2seq.py", line 597, in _from_transformers
    output=save_dir.joinpath(ONNX_DECODER_NAME),
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 353, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 204, in export_pytorch
    raise err
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 189, in export_pytorch
    opset_version=opset,
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/__init__.py", line 280, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/utils.py", line 706, in _export
    val_add_node_names, val_use_external_data_format, model_file_location)
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

Environment: optimum 1.5.1

@PoodleWang commented:

I have a similar issue, mentioned here: #589 (comment)

@NouamaneTazi (Member, Author) commented:

Migrated this PR to #586
