
Save model with smaller default max_shard_size for HuggingFace flavor #8567

Merged
7 commits merged into mlflow:master on May 31, 2023

Conversation

@wenfeiy-db (Contributor) commented May 30, 2023

Related Issues/PRs

Follow-up of #8448

What changes are proposed in this pull request?

Save transformers models with a smaller shard size by default (500MB). An environment variable MLFLOW_HUGGINGFACE_MODEL_MAX_SHARD_SIZE is added so that users who prefer different behavior can override the 500MB default. Hugging Face's own default is 10GB.
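As a rough illustration of the described behavior, the sketch below shows how a 500MB default with an environment-variable override could be resolved. The helper function and resolution logic are assumptions for illustration only; MLflow's actual implementation uses its own environment-variable abstraction (`MLFLOW_HUGGINGFACE_MODEL_MAX_SHARD_SIZE.get()`).

```python
import os

# Hypothetical helper illustrating the described behavior; not MLflow's real code.
ENV_KEY = "MLFLOW_HUGGINGFACE_MODEL_MAX_SHARD_SIZE"
DEFAULT_MAX_SHARD_SIZE = "500MB"  # the PR's new default; Hugging Face's own default is "10GB"


def get_max_shard_size() -> str:
    """Resolve the shard size, preferring the user's env var override."""
    return os.environ.get(ENV_KEY, DEFAULT_MAX_SHARD_SIZE)


os.environ.pop(ENV_KEY, None)
assert get_max_shard_size() == "500MB"  # default when no override is set
os.environ[ENV_KEY] = "2GB"
assert get_max_shard_size() == "2GB"    # user override wins
```

The resolved string would then be forwarded to `save_pretrained(..., max_shard_size=...)`.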

This PR also saves the underlying model directly instead of the wrapping pipeline. The pipeline was previously saved to work around a backward-compatibility issue that no longer applies now that a minimum transformers version is enforced.

How is this patch tested?

  • Existing unit/integration tests
  • Manual tests (describe details, including test results, below)

Manually tested with a wheel.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

mlflow.transformers.save_model now saves models with a maximum shard size of 500 MB by default. Users can override it via the environment variable MLFLOW_HUGGINGFACE_MODEL_MAX_SHARD_SIZE.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: wenfeiy-db <wenfei.yan@databricks.com>
@mlflow-automation (Collaborator) commented May 31, 2023

Documentation preview for 9b462ae will be available here when this CircleCI job completes successfully.


mlflow/transformers.py
@@ -426,7 +428,10 @@ def save_model(
     flavor_conf.update({_PROCESSOR_TYPE_KEY: _get_instance_type(processor)})

     # Save the pipeline object
-    built_pipeline.save_pretrained(save_directory=path.joinpath(_PIPELINE_BINARY_FILE_NAME))
+    built_pipeline.model.save_pretrained(
Collaborator:

Are we sure that we can stop saving the pipeline entirely in this way? I think there's a risk that the max_shard_size parameter may not be supported for all pipeline types (though I'm not sure). What do we think about the following approach:

  1. Try to save_pretrained on the model
  2. If that doesn't work, just save the pipeline
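The two-step fallback suggested above could look roughly like the sketch below. The function and class names are hypothetical, and the stubs stand in for Hugging Face objects so the control flow can be exercised without transformers installed; this is not the code the PR ultimately adopted (the PR kept the direct model save without a fallback).

```python
# Hypothetical sketch of the suggested fallback; names are illustrative
# and do not appear in MLflow.
def save_with_fallback(pipeline, model_dir, pipeline_dir, max_shard_size="500MB"):
    try:
        # Step 1: try save_pretrained on the bare model with max_shard_size.
        pipeline.model.save_pretrained(
            save_directory=model_dir, max_shard_size=max_shard_size
        )
        return "model"
    except TypeError:
        # Step 2: a model whose overridden save_pretrained rejects the kwarg
        # falls back to saving the whole pipeline, as before the PR.
        pipeline.save_pretrained(save_directory=pipeline_dir)
        return "pipeline"


class ShardingModel:  # accepts max_shard_size, like transformers.PreTrainedModel
    def save_pretrained(self, save_directory, max_shard_size="10GB"):
        pass


class LegacyModel:  # an override that dropped the max_shard_size parameter
    def save_pretrained(self, save_directory):
        pass


class StubPipeline:
    def __init__(self, model):
        self.model = model

    def save_pretrained(self, save_directory):
        pass


assert save_with_fallback(StubPipeline(ShardingModel()), "m", "p") == "model"
assert save_with_fallback(StubPipeline(LegacyModel()), "m", "p") == "pipeline"
```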

Contributor Author:

max_shard_size parameter may not be supported for all pipeline types (though not sure)

Do you mean that maybe not all "model" types support the max_shard_size param? It seems to me that it's supported for any transformers.PreTrainedModel, based on its documentation.

But yeah, I agree we can play it safe. I can add the try/except. Do you have a suggestion on which errors to catch? I'm not super familiar with HF errors and don't want to accidentally swallow them.

Collaborator:

If you think it's broadly supported, I'm good with it; this comment mostly originates from me not understanding the full flexibility of Hugging Face pipelines.

Collaborator:

We're calling this on the base PreTrainedModel, which provides a save_pretrained method with max_shard_size. It's possible that some models that override save_pretrained may be missing it, though.
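One way to guard against such overrides, sketched here purely as an illustration (MLflow did not add this check), is to inspect the method's signature before passing the parameter. The function and class names below are hypothetical.

```python
import inspect


def accepts_max_shard_size(model) -> bool:
    """Illustrative check: does this model's save_pretrained take max_shard_size?"""
    params = inspect.signature(model.save_pretrained).parameters
    return "max_shard_size" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


class BaseLike:  # mirrors the base signature discussed above
    def save_pretrained(self, save_directory, max_shard_size="10GB"):
        pass


class NarrowOverride:  # an override that lost the parameter
    def save_pretrained(self, save_directory):
        pass


assert accepts_max_shard_size(BaseLike())
assert not accepts_max_shard_size(NarrowOverride())
```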

Contributor Author:

For models that override save_pretrained and break its signature, I don't think the MLflow transformers flavor is guaranteed to work anyway, right?
I'll keep this unchanged (i.e., no try/except) for now. We can revisit this if we later find the assumption is broken.

@ankit-db changed the title from "Save Hugginface model with smaller shard size" to "Support max_shard_size parameter for HuggingFace flavor" on May 31, 2023
-    built_pipeline.save_pretrained(save_directory=path.joinpath(_PIPELINE_BINARY_FILE_NAME))
+    built_pipeline.model.save_pretrained(
+        save_directory=path.joinpath(_MODEL_BINARY_FILE_NAME),
+        max_shard_size=MLFLOW_HUGGINGFACE_MODEL_MAX_SHARD_SIZE.get(),
Collaborator:

Can we also check whether the parameter is passed in kwargs, so that we can support users overriding it?

Contributor Author:

They can override it with the env var now. Do we also want to expose it in mlflow.transformers.save_model? I think it depends on how easy we want overriding to be, and I thought we didn't want it to be too easy.
Let me know whether we want (1) the env var, (2) kwargs, or (3) both, and I'll change it accordingly.

Member:

I think restricting this behavior purely to environment variables is fine. Can we add a note in models.rst explaining the behavior and the env var key that would need to be overridden to change the shard size in edge cases?

@ankit-db (Collaborator) commented:

Btw, tests were triggered (several non-audio tests are also failing).

@github-actions bot added the labels area/models (MLmodel format, model serialization/deserialization, flavors) and rn/feature (Mention under Features in Changelogs) on May 31, 2023
@BenWilson2 (Member) left a comment:

LGTM with a small request to add a note in models.rst explaining how to override this setting

@ankit-db (Collaborator) left a comment:

Thanks for the PR - let's get it in before the next MLflow release!

@wenfeiy-db (Contributor Author) commented May 31, 2023

@BenWilson2 I added a note in models.rst, but some tests (all of which were passing before) are failing for some reason. My last commit only changes the doc and one comment, so the new failing tests are quite confusing to me. Do you know what might be going wrong?

Edit: The tests also pass in my local env, so I'm making an empty commit to trigger another run.

@BenWilson2 merged commit 67ed9b7 into mlflow:master on May 31, 2023
35 checks passed
@wenfeiy-db changed the title from "Support max_shard_size parameter for HuggingFace flavor" to "Save model with smaller default max_shard_size for HuggingFace flavor" on Jun 1, 2023
dbczumar added a commit to dbczumar/mlflow that referenced this pull request Jul 1, 2023