Saving external data for large ONNX models #255

Closed
wants to merge 15 commits

Conversation

@NouamaneTazi (Member) commented Jul 1, 2022

What does this PR do?

Fixes #254 and #377

We can now load and save ORT models that have external data 🚀

@HuggingFaceDocBuilderDev commented Jul 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@NouamaneTazi (Member, Author) commented Jul 2, 2022

With the latest commit, we're now able to do:

model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    cache_dir="model_cache",
    onnx_cache_dir="./onnx_cache",  # saves the ONNX model (with external data if the model is large) to "./onnx_cache"
)

model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    cache_dir="model_cache",  # previous behaviour, where onnx_cache_dir defaults to cache_dir
)

And model.save_pretrained(save_path) would just copy files from onnx_cache_dir to the provided save_path

@NouamaneTazi (Member, Author) commented Jul 2, 2022

The following should be working now:

from optimum.onnxruntime import ORTModelForCausalLM

# load small ONNX model
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-small-testing-onnx", use_auth_token=True)
# load large ONNX model (>2GB) by specifying the folder containing the model's weights
model = ORTModelForCausalLM.from_pretrained("nouamanetazi/bloom-350m-onnx-folder", use_auth_token=True, onnx_folder="onnx")

Example of uploading a large ONNX model (>2GB) to the Hub:

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import shutil
from huggingface_hub import HfApi

model_ckpt = "bigscience/bloom-350m"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForCausalLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
    onnx_cache_dir="./onnx_cache",  # saves ONNX model to "./onnx_cache"
)

# save to local folder
model.save_pretrained(save_path / "onnx")
shutil.move(save_path / "onnx" / "config.json", save_path / "config.json")
tokenizer.save_pretrained(save_path)

# push to hub
repo_id = "nouamanetazi/bloom-350m-onnx-folder-test"
api = HfApi()
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_folder(folder_path=save_path, repo_id=repo_id, path_in_repo=".", repo_type="model")

@philschmid (Member) left a comment

Can you make sure tests are passing and style is correct?

@mht-sharma mentioned this pull request Dec 1, 2022
@NouamaneTazi (Member, Author) commented:

We can now save/load a large ORTModelForSeq2SeqLM:

from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt, use_auth_token=True)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    use_auth_token=True,
    from_transformers=True,
)

# save to local folder
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)

@fxmarty (Collaborator) commented Dec 7, 2022

Awesome! I think it would be great to add tests, essentially checking that saving/reloading works well in the encoder-only and encoder-decoder cases.
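
As a rough illustration (not from this PR), an encoder-only round-trip test could look like the sketch below; the tiny checkpoint and the tolerance are placeholders:

import tempfile

import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "hf-internal-testing/tiny-random-bert"  # placeholder tiny checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello world", return_tensors="pt")

# export to ONNX and run once
model = ORTModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
logits_before = model(**inputs).logits

# save, reload from disk, and check the outputs match
with tempfile.TemporaryDirectory() as tmpdirname:
    model.save_pretrained(tmpdirname)
    reloaded = ORTModelForSequenceClassification.from_pretrained(tmpdirname)
    logits_after = reloaded(**inputs).logits

assert torch.allclose(logits_before, logits_after, atol=1e-5)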

Review thread on optimum/onnxruntime/utils.py:
import onnx
from onnx.external_data_helper import ExternalDataInfo, _get_initializer_tensors

model_paths = src_file_names.copy()
Member:

It's a list of Paths so I don't think it copies anything here, we might as well start from a new empty list, and fill it.

Member Author (@NouamaneTazi):

I checked without the copy() and the extending does modify the list src_file_names unfortunately :/

Member:

Yes, my point is that you can create an empty list

model_paths = []

And fill it as you go?

My point here is that src_file_names[0] will be the same instance as model_paths[0].

Member Author (@NouamaneTazi):

I'm sorry, I don't quite get it. I only use model_paths to iterate over the input src_file_names. Then I keep on filling src_file_names:

    model_paths = []
    for model_path in model_paths:
        # load model graph
        model = onnx.load(str(model_path), load_external_data=False)
        # filter out tensors that are not external data
        model_tensors = _get_initializer_tensors(model)
        model_tensors_ext = [
            ExternalDataInfo(tensor).location
            for tensor in model_tensors
            if tensor.HasField("data_location") and tensor.data_location == onnx.TensorProto.EXTERNAL
        ]
        src_paths.extend([model_path.parent / tensor_name for tensor_name in model_tensors_ext])
        dst_file_names.extend(model_tensors_ext)
    return src_paths, dst_file_names

So this shouldn't work
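
For reference, a small standalone snippet (not from the PR) illustrating the copy() semantics under discussion: list.copy() is a shallow copy, so the contained Path objects are shared between the two lists, but extending the copy does not modify the original.

from pathlib import Path

src_file_names = [Path("decoder_model.onnx")]  # hypothetical input
model_paths = src_file_names.copy()

model_paths.append(Path("decoder_model.onnx_data"))
print(len(src_file_names))                  # 1 -- the original list is unchanged
print(model_paths[0] is src_file_names[0])  # True -- both lists share the same Path instance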

NouamaneTazi and others added 2 commits December 7, 2022 16:18
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
@NouamaneTazi (Member, Author) commented:

For the tests, it would be cool if we could enforce saving a small model in external data format. I tried looking quickly for a way, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit.
Will try to add tests once I have time @fxmarty

@mht-sharma (Contributor) commented:

> For the tests, it would be cool if we could enforce saving a small model in external data format. I tried looking quickly for a way, but there doesn't seem to be an easy way to bypass the 2GB protobuf file limit.
> Will try to add tests once I have time @fxmarty

You could use the following API to convert a small model to external data: converting-an-onnx-model-to-external-data

The size threshold has to be low so that it can create the files.
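
For example, a minimal sketch of that conversion (the file names and threshold are placeholders):

import os

import onnx
from onnx.external_data_helper import convert_model_to_external_data

# load a small exported ONNX model
model = onnx.load("model.onnx")

# mark (almost) every initializer as external data by using a very low size threshold
convert_model_to_external_data(
    model,
    all_tensors_to_one_file=True,
    location="model.onnx_data",  # written next to the saved model file
    size_threshold=16,
)

# saving now writes model.onnx plus the external data file into the target folder
os.makedirs("model_external", exist_ok=True)
onnx.save_model(model, "model_external/model.onnx")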

@mht-sharma (Contributor) commented Dec 8, 2022

Hi @NouamaneTazi, thanks for the PR. It would require a small change to handle one more use case for modeling_seq2seq and modeling_decoder.

Taking the Seq2Seq class as an example: it generates 3 different models, encoder.onnx, decoder.onnx, and decoder_with_past.onnx. These are currently generated in the same folder. If there are external data files, there is a chance of an overwrite when they have the same names. See: 26983

A possible fix is to save them in sub-folders, e.g. encoder/encoder.onnx, decoder/decoder.onnx, etc. The same change would be required in the exporters.
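
To make the proposed layout concrete, a standalone sketch (file names are placeholders, not this PR's implementation) of saving each sub-model into its own folder so that identically named external data files cannot collide:

from pathlib import Path

import onnx

export_dir = Path("onnx_export")

# assumes the three sub-models were already exported as single flat files
for name in ("encoder", "decoder", "decoder_with_past"):
    sub_dir = export_dir / name
    sub_dir.mkdir(parents=True, exist_ok=True)
    proto = onnx.load(f"{name}.onnx")
    onnx.save_model(
        proto,
        str(sub_dir / f"{name}.onnx"),
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        location=f"{name}.onnx_data",  # external data stays inside this sub-model's folder
    )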

@fxmarty (Collaborator) commented Dec 8, 2022

We should probably do the same in exporters, actually.

@NouamaneTazi (Member, Author) commented:

I'm trying to write tests for saving/loading with external data, but it's not as trivial as it seems.
Trying to apply your suggestion @mht-sharma by using:

model = ORTModelForSeq2SeqLM.from_pretrained(self.ONNX_SEQ2SEQ_MODEL_ID, use_cache=True)
model.save_pretrained(tmpdirname)

# load model proto
onnx_model = onnx.load(str(model.model_path))

# save external data
os.makedirs(str(model.model_path.parent / "external_data"), exist_ok=True)
onnx.save_model(
    onnx_model,
    str(model.model_path.parent / "external_data" / "model.onnx"),
    save_as_external_data=True,
    all_tensors_to_one_file=False,
    size_threshold=8,
    convert_attribute=False,
)

# need to do this for encoder/decoder/decoder_with_past

But again, this wouldn't test our model.save_pretrained API at all, because in our API the export to ONNX is done using torch.onnx.export here, which doesn't accept an argument to specify the external data format.

I'm open to suggestions, or else we can merge this for now.

@michaelbenayoun (Member) left a comment

LGTM!

@fxmarty (Collaborator) commented Dec 12, 2022

@NouamaneTazi Why not use actual >2GB models, randomly initialized and saved from transformers (so no download time)? Then there's no need for custom logic.

@NouamaneTazi (Member, Author) commented:

@fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, ORTModelForSequenceClassification from a BertForSequenceClassification instance?

@fxmarty (Collaborator) commented Dec 13, 2022

> @fxmarty Yes, definitely! I can use a randomly initialized model, but it seems there's no exposed API to load, for example, ORTModelForSequenceClassification from a BertForSequenceClassification instance?

You can call save_pretrained() on the PreTrainedModel, and then use from_pretrained() on the local folder with the ORTModel class.
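
For instance, a rough sketch of that workflow (sizes and names are placeholders; a real external-data test would need a config large enough to exceed 2GB, and the tokenizer is saved alongside so the ONNX export can build dummy inputs):

import tempfile

from transformers import AutoTokenizer, BertConfig, BertForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification

# randomly initialized model, so no large download is needed
config = BertConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=2, intermediate_size=128)
pt_model = BertForSequenceClassification(config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

with tempfile.TemporaryDirectory() as tmpdirname:
    pt_model.save_pretrained(tmpdirname)   # save_pretrained() on the PreTrainedModel
    tokenizer.save_pretrained(tmpdirname)
    # then load from the local folder with the ORTModel class
    ort_model = ORTModelForSequenceClassification.from_pretrained(tmpdirname, from_transformers=True)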

@JingyaHuang mentioned this pull request Dec 14, 2022
@PoodleWang commented:

@NouamaneTazi
from pathlib import Path
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_ckpt = "facebook/mbart-large-en-ro"
save_path = Path(f"saved_model/{model_ckpt}")
save_path.mkdir(parents=True, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_ckpt,
    from_transformers=True,
)
model.save_pretrained(save_path / "onnx")
tokenizer.save_pretrained(save_path)

Log:
<Trial 2015437 worker_0> genius $ python3 /opt/tiger/genius/tensorrt/load.py
2022-12-15 09:13:01.539465: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:239: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:246: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:912: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
/home/tiger/.local/lib/python3.7/site-packages/transformers/models/mbart/modeling_mbart.py:100: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min))
Traceback (most recent call last):
  File "/opt/tiger/genius/tensorrt/load.py", line 55, in <module>
    from_transformers=True,
  File "/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_ort.py", line 280, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/optimum/modeling_base.py", line 263, in from_pretrained
    **model_kwargs,
  File "/usr/local/lib/python3.7/dist-packages/optimum/onnxruntime/modeling_seq2seq.py", line 597, in _from_transformers
    output=save_dir.joinpath(ONNX_DECODER_NAME),
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 353, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 204, in export_pytorch
    raise err
  File "/home/tiger/.local/lib/python3.7/site-packages/transformers/onnx/convert.py", line 189, in export_pytorch
    opset_version=opset,
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/__init__.py", line 280, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/onnx/utils.py", line 706, in _export
    val_add_node_names, val_use_external_data_format, model_file_location)
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

Environment: optimum 1.5.1

@PoodleWang commented:

I have a similar issue, mentioned here: #589 (comment)

@NouamaneTazi (Member, Author) commented:

Migrated this PR to #586
