Skip to content

Do not force download by default in ORTModel#356

Merged
regisss merged 3 commits into
huggingface:mainfrom
fxmarty:do-not-force-download-ortmodel
Aug 24, 2022
Merged

Do not force download by default in ORTModel#356
regisss merged 3 commits into
huggingface:mainfrom
fxmarty:do-not-force-download-ortmodel

Conversation

@fxmarty
Copy link
Copy Markdown
Contributor

@fxmarty fxmarty commented Aug 23, 2022

We passed by default force_download=True, although the caching works well when passing force_download=False. Especially when passing from_transformers=False, everything works as expected as the onnx model is loaded from the cache hub/ subfolder.

When passing from_transformers=True, although we cache only the PyTorch model (before the new transformers cache system in transformers/ cache subfolder, in hub/ subfolder in the next transformers release), the force_download argument is anyway ignored in _from_transformers so this PR has no impact there:

@classmethod
def _from_transformers(
cls,
model_id: str,
save_dir: Union[str, Path] = default_cache_path,
use_auth_token: Optional[Union[bool, str, None]] = None,
revision: Optional[Union[str, None]] = None,
force_download: bool = False,
cache_dir: Optional[str] = None,
**kwargs,
):
"""
Converts a vanilla Transformers model into an optimized model using `transformers.onnx.export_onnx`.
Arguments:
model_id (`str` or `Path`):
Directory from which to load
save_dir (`str` or `Path`):
Directory where the onnx model should be saved, default to `transformers.file_utils.default_cache_path`, which is the cache dir for
transformers.
use_auth_token (`str` or `bool`):
Is needed to load models from a private repository
revision (`str`):
Revision is the specific model version to use. It can be a branch name, a tag name, or a commit id
cache_dir (`Union[str, Path]`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the
standard cache should not be used.
force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force the (re-)download of the model weights and configuration files, overriding the
cached versions if they exist.
kwargs (`Dict`, *optional*):
kwargs will be passed to the model during initialization
"""
# create local save dir in cache dir
save_dir = Path(save_dir).joinpath(model_id)
save_dir.mkdir(parents=True, exist_ok=True)
kwargs["model_save_dir"] = save_dir
# reads pipeline task from ORTModelForXXX class if available else tries to extract from hub
if cls.export_feature is not None:
task = cls.export_feature
else:
task = HfApi().model_info(model_id, revision=revision).pipeline_tag
if task in ["sentiment-analysis", "text-classification", "zero-shot-classification"]:
task = "sequence-classification"
elif task in ["feature-extraction", "fill-mask"]:
task = "default"
# 2. convert to temp dir
# FIXME: transformers.onnx conversion doesn't support private models
preprocessor = get_preprocessor(model_id)
model = FeaturesManager.get_model_from_feature(task, model_id)
_, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=task)
onnx_config = model_onnx_config(model.config)
# export model
export(
preprocessor=preprocessor,
model=model,
config=onnx_config,
opset=onnx_config.default_onnx_opset,
output=save_dir.joinpath(ONNX_WEIGHTS_NAME),
)
kwargs["config"] = model.config.__dict__
# 3. load normal model
return cls._from_pretrained(save_dir.as_posix(), **kwargs)

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Copy link
Copy Markdown
Contributor

@regisss regisss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @fxmarty!

If the force_download argument is not used in _from_transformers, maybe we should remove it what do you think?

Copy link
Copy Markdown
Contributor

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's awesome! But we have to add test to validate that it is working as expected. Especially since we are storing the "model.onnx" files at the top level with out any hash. So there is no way for the library to know if the cached "model.onnx" is the correct one or not.

We can add the local_files_only argument to the from_pretrained method and have then working and failing tests for "cached" and regular models. As well as for loading different models after each other to make sure we have the correct ones loaded.
See documentation: https://huggingface.co/docs/huggingface_hub/package_reference/file_download#huggingface_hub.hf_hub_download.local_files_only

We should also check that this works for the seq2seq models.

And we should check if the new cache system allows better distribution of onnx files. Currently, when a user loads and saves a onnx file it is saved at the top cache directory with model.onnx, which means if he loads a new model "from" cache it uses the old model. Thats why i went with "force_download" in the beginning.

@fxmarty
Copy link
Copy Markdown
Contributor Author

fxmarty commented Aug 23, 2022

When loading an onnx model with from_transformers=False, here's where it is loaded from in the hub/ subfolder:

image

For seq2seq models (with from_transformers=False), we have:

image

which does work as well. Thanks for the suggestion to add tests, I will try to add some!

When from_transformers=True, the argument force_download is ignored.

In case we passed from_transformers=True, in the old caching system, we indeed get e.g. a transformers/facebook/levit-256/model.onnx file with no hash recorded. However, even if this file exists, it is NOT loaded into the InferenceSession and the cached pytorch model is converted to onnx and loaded into an InferenceSession (the file is overwritten at each load).

Under the new caching system, if we pass from_transformers=True, we get e.g. a hub/facebook/levit-256/model.onnx with no hash recorded as well. Looking at the timestamps, this file is overwritten at every conversion.

In any case, I think the case from_transformers=True is independent from this PR.

@fxmarty fxmarty requested a review from philschmid August 23, 2022 14:02
Copy link
Copy Markdown
Contributor

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@regisss regisss merged commit 122a9d8 into huggingface:main Aug 24, 2022
JingyaHuang added a commit that referenced this pull request Sep 7, 2022
* Override export of ORTSeq2SeqTrainer

* Do not force download by default in ORTModel (#356)

* Update OnnxConfigWithLoss wrapper

* ORT optimizer refactorization (#294)

* Refactorization of ORTOptimizer

* Refactorization of ORTModel

* Adapt examples according to refactorization

* Adapt tests

* Fix style

* Remove quantizer modification

* Fix style

* Apply modifications from #270 for quantizer and optimizer to have same behavior

* Add test for optimization of Seq2Seq models

* Fix style

* Add ort config saving when optimizing a model

* Add ort config saving when quantizing a model

* Add tests

* Fix style

* Adapt optimization examples

* Fix readme

* Remove unused parameter

* Adapt quantization examples

* Fix quantized model and ort config saving

* Add documentation

* Add model configuration saving to simplify loading of optimized model

* Fix style

* Fix description

* Fix quantization tests

* Remove opset argument which is onnx config default opset when exporting with ORTModels

* Fix import (#360)

* Fix export of decoders

* Add flag to export only decoders

* Fix ORTTrainer inference ort subclass parsing

* Fix filenames when empty suffix given (#363)

* fix(optimization): handle empty file suffix

* fix(quantization): handle empty file suffix

* use pathlibfor save_dir

* run test again

* Update optimum/onnxruntime/quantization.py

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

* ReRun test that failed because of cache (network)

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

* Override the evaluation and prediction loop in ORTSeq2SeqTrainer

* Fix documentation (#369)

* fix class

* Update optimization.mdx

* Fix label smoother device prob

* Fix lm_logits and labels dimension mismatch

* Clean up

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Co-authored-by: Pierre Snell <ierezell@gmail.com>
Co-authored-by: Pierre Snell <pierre.snell@botpress.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
JingyaHuang added a commit that referenced this pull request Oct 2, 2022
* Inference with ORTModel

* Clean up unused imports

* Replace Inference session by ort model

* Inference with ORTModel

* Clean up unused imports

* Replace Inference session by ort model

* Update modeling for custom tasks

* Replace in evaluation_loop

* refectoring prediction_loop

* ORTSeq2SeqTrainer refactoring - Inference with ORTModel (#359)

* Override export of ORTSeq2SeqTrainer

* Do not force download by default in ORTModel (#356)

* Update OnnxConfigWithLoss wrapper

* ORT optimizer refactorization (#294)

* Refactorization of ORTOptimizer

* Refactorization of ORTModel

* Adapt examples according to refactorization

* Fix ORTTrainer inference ort subclass parsing

* Replace datasets.load_metric by evaluate

* Add summarization example

* Enable ORT inference

* Fix inference args

* Mention ORT inference in READMEs

* Remove repetitve code in Trainer

* Update examples to trfrs 4.22.1

* Fix qa example prediction error

* Update summarization/README.md

* Fix logger consistency

* Make readme consistent with trfrs

* Put back onnx config with past and loss test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants