[FR] Improve performance by lowering amount of calls to retrieve model #5507
Comments
Hi @Davidswinkels, have you taken a look at the model registry functionality? The ability to retrieve a particular model with a single API call is in there, allowing you to get an artifact by specifying a version or a stage directly. This might simplify your use case. As for tightly coupling the tracking server and artifact retrieval into a single API call, I'm afraid it wouldn't buy any performance improvement (they are separate services) and would only complicate the APIs. Hopefully the model registry (and perhaps also Projects, https://www.mlflow.org/docs/latest/projects.html) can help reduce the number of lines of code in your work if that is the concern. Please let me know if there are any other points that you'd like to discuss.
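For readers following along, here is a minimal sketch (not taken from the thread) of the single-call retrieval that the model registry enables; the registered model name "energy_forecast" is a hypothetical placeholder.

```python
# Sketch: load a registered model with one load_model() call via a models:/ URI.
# "energy_forecast" is a hypothetical registered model name.
import mlflow.sklearn

# By explicit version ...
model = mlflow.sklearn.load_model("models:/energy_forecast/1")

# ... or by stage (e.g. the version currently promoted to Production).
model = mlflow.sklearn.load_model("models:/energy_forecast/Production")
```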
Hi Ben. Thanks for the answer! We will look further into the model registry documentation to see whether it improves performance and lets us fetch models more easily with less code, and then we will get back here. Yes, agreed: from a loose-coupling perspective it's nice to keep the tracking server and artifact retrieval separated, so it's probably not wise to combine them. Do you or others think it would be interesting to be able to search_runs based on experiment_name? Or is there no need for that with the model registry?
There certainly won't be a need to search by experiment name while using the model registry, since only a very small subset of "production-capable" models would be registered. (See mlflow/mlflow/store/tracking/sqlalchemy_store.py, lines 382 to 396 at 3ab6fbf.)
Hi @Davidswinkels, if you're up for creating a search_runs_by_experiment_name() implementation that performs the client-side resolution of experiment names to experiment_ids and then submits those to the search_runs() API, please feel free to file a PR and we'll be more than happy to review and provide feedback.
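A rough sketch of what such a helper could look like; the function name and its placement are hypothetical, while get_experiment_by_name and search_runs are existing fluent MLflow APIs.

```python
# Sketch of a client-side helper: resolve the experiment name to its ID,
# then delegate everything else to the existing search_runs() API.
import mlflow

def search_runs_by_experiment_name(experiment_name, **kwargs):
    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        raise ValueError(f"Experiment '{experiment_name}' was not found")
    return mlflow.search_runs(experiment_ids=[experiment.experiment_id], **kwargs)

# Usage: latest run of the experiment, returned as a one-row pandas DataFrame.
# runs = search_runs_by_experiment_name("energy_forecast_10001_Amsterdam", max_results=1)
```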
Hey Ben. We were thinking of adding this since we were using MLflow with a file-based backend-store-uri. We are currently switching to a database backend to improve performance, and with a database backend we can also switch to the model registry. We still have to test how much performance would improve by going through the model registry calls. From an initial performance check, get_experiment_by_name no longer seems to be the bottleneck:
Performance comparison of MLflow model retrieval (file-based vs. SQLite database) over 10 calls:
The requested feature to get a run by experiment_name would still improve performance quite a bit for people who use a file-based backend, but we won't develop it for now, since with a database backend, getting an experiment by experiment_name is less of a performance issue for us.
I'll give this a try.
This issue was resolved by PR #5564 and the mlflow 1.25.0 release. I did a small test on MLflow==1.25.0 with a SQLite database. Performance did improve! It varied quite a bit compared to before, probably due to the environment (local vs. Kubernetes cluster, and file-based vs. SQLite) and also how many runs/models were stored. Summary of the performance check for model retrieval, per code chunk:
Tracking registry model retrieval
- Retrieve model via name + experiment + run: 1.48 s ± 44.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
- Retrieve model via name + run: 1.46 s ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Model registry model retrieval
- Retrieve model via version + model registry: 1.62 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
- Retrieve model via stage None + model registry: 1.46 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
- Retrieve model via stage Production + model registry: 1.49 s ± 66.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Thanks restless for implementing this. Much neater to be able to get a run based on experiment_name directly from the tracking server :)
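For reference, a minimal sketch of the "name + run" retrieval pattern measured above, assuming MLflow >= 1.25.0 where, per the PR referenced in this comment, search_runs() appears to accept experiment names directly via an experiment_names argument; the experiment name and artifact path reuse the example from the proposal below.

```python
# Sketch only: assumes mlflow>=1.25.0 and that search_runs() accepts an
# experiment_names argument, as the resolution above suggests.
import os
import mlflow
import mlflow.sklearn

runs = mlflow.search_runs(experiment_names=["energy_forecast_10001_Amsterdam"],
                          max_results=1)
model = mlflow.sklearn.load_model(os.path.join(runs.artifact_uri[0], "model/"))
```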
Thank you for submitting a feature request. Before proceeding, please review MLflow's Issue Policy for feature requests and the MLflow Contributing Guide.
Please fill in this feature request template to ensure a timely and thorough response.
Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?
Proposal Summary
Retrieve models more efficiently by lowering the required number of requests.
Currently, to retrieve a model we have to make 3 requests:
experiment_name="energy_forecast_10001_Amsterdam"
experiment = mlflow.get_experiment_by_name(experiment_name)
run = mlflow.search_runs(experiment.experiment_id, max_results=1)
model = mlflow.sklearn.load_model(os.path.join(run.artifact_uri[0], "model/"))
It would be nice if this could be sped up by getting the model in only 1 request:
model = mlflow.sklearn.load_latest_model(experiment_name)
or 2 requests:
run = mlflow.search_runs(experiment_name, max_results=1)
model = mlflow.sklearn.load_model(os.path.join(run.artifact_uri[0], "model/"))
Motivation
Performance: loading models faster benefits all users who load models.
It's more difficult, or impossible, to improve performance at a higher level when the lower-level calls are not performant.
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Languages
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

Details
(Use this section to include any additional information about the feature. If you have a proposal for how to implement this feature, please include it here. For implementation guidelines, please refer to the Contributing Guide.)