MLflow works extremely slowly with PostgreSQL #1763
Comments
@DimaZhu Can you try this again on MLflow 1.2 and confirm that the issue persists? The latest release contains some performance improvements for the SQLAlchemyStore. Are you logging a large number of metrics to MLflow? The 1.2 release should help significantly in that case. Regarding your question about run deletion, MLflow provides a |
@dbczumar Just tried v1.2, didn't notice any difference.
We have approximately 40 parameters and 50 metrics per run, plus artifacts. I'm sorry, my question wasn't clear enough. In the MLflow UI I can see approximately 500 runs in total, but in psql I can see 5500 experiments; the deleted ones are still in the database, marked as 'deleted'. Could this be a bottleneck? I read in the docs that with the file-system store I should manually remove deleted experiments from the trash. Do I need to do something similar with PostgreSQL? |
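For anyone who wants to check how many rows are soft-deleted, here is a minimal sketch. It assumes the tracking tables carry a `lifecycle_stage` column whose value is 'deleted' for soft-deleted rows, as described in this comment. The table below is a simplified stand-in in an in-memory SQLite database, not the real MLflow schema, and `count_by_lifecycle_stage` is a hypothetical helper, not part of MLflow.

```python
import sqlite3


def count_by_lifecycle_stage(conn, table):
    """Return {lifecycle_stage: row_count} for the given table.

    Hypothetical helper for illustration; not an MLflow API.
    """
    # The table name is interpolated into SQL, so only allow known names.
    if table not in {"runs", "experiments"}:
        raise ValueError(f"unexpected table name: {table!r}")
    rows = conn.execute(
        f"SELECT lifecycle_stage, COUNT(*) FROM {table} GROUP BY lifecycle_stage"
    ).fetchall()
    return dict(rows)


# Simplified stand-in table (assumption: the real schema has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_uuid TEXT PRIMARY KEY, lifecycle_stage TEXT)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [("a", "active"), ("b", "deleted"), ("c", "deleted")],
)

counts = count_by_lifecycle_stage(conn, "runs")
print(counts)
```

The same `GROUP BY` query could be run directly in psql against the real database to see how many rows are marked 'deleted' versus 'active'.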
I tried and didn't notice any difference, and the mlflow server didn't ask me to update the DB schema. Maybe I can share a copy of the DB with you? |
@dbczumar Yes, I'm pretty sure. I created a new conda env for this test and installed mlflow following the instructions in the Dockerfile. And |
@DimaZhu What is the output of |
|
If you've installed MLflow from source, the version displayed should be Line 4 in 007bf13
|
@dbczumar, I followed the instructions. |
@DimaZhu were you asked to upgrade your database upon running |
@dbczumar No, the mlflow server didn't ask me to upgrade, and there are no logs in the browser. Maybe something is wrong with access to the DB. I'll check. |
@dbczumar 🎉 That's much, much, much better! 30 s vs 240 s. And already-loaded runs open almost instantly, so you don't have to wait another 4 minutes! Thank you very much! |
@DimaZhu Awesome! Glad to hear that. If possible, it would be great to see how we could improve things further by using your database as an example workload. If you still don't mind sharing it, feel free to send me ( |
@dbczumar To do this right, I should discuss this with my boss. I'll let you know |
@dbczumar I've found something interesting. I ran npm install from what was probably master and it worked fine. Now I've tried it from the branch in the pull request:
And it failed. Installing from master fixes it. |
@DimaZhu Does running |
This class of performance issues has been addressed by #1767 and #1805. These changes will be incorporated into the next MLflow release. In the meantime, you can test the changes by cloning the MLflow repository and running the MLflow tracking server on the |
Hi, everyone. We started using MLflow several weeks ago and decided to use PostgreSQL to speed up processing. But now an experiment with ~300 runs takes seemingly forever to load, exceeding any reasonable time limit. Then I updated to v1.1 and it started to work better (~2 minutes), but that is still painfully slow. I looked in the database and found that there are 5500 runs. Could this be the reason MLflow is so slow? And if so, is there an API to clean deleted runs from the DB?