Performance for accessing run data, FileSystem backend, caching, other workarounds? #1902
Can you add a tag with the date or an expiry date? Then you could filter by that.
Have you tried using the SQL store with a MySQL backend?
I also find retrieval quite slow, and it sometimes even times out, resulting in an "Internal Server Error". Even as few as 10 runs in an experiment can sometimes time out. I'm also using S3 for artifact storage. I could switch to a MySQL backend, but that shouldn't really help with something as small as 10 runs, right? So there must be another reason why this is so slow. I'll post here if I come up with a solution.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has not been interacted with for 60 days since it was marked as stale! As such, the issue is now closed. If you have this issue or more information, please re-open the issue!
Hi, we're currently exploring options to solve the main issue we have with MLflow: performance when reading run data with the filesystem backend. We can imagine this is highly related to the way MLflow stores data, e.g. every single param and metric goes into a separate file. Depending on the project, we can track hundreds of data points per run, so we end up with an mlruns folder containing 50k–100k files, even for a low number of runs.
We are also considering SQL backends, but we wonder whether, even with the filesystem store, some workarounds are available, like caching or aggregating, especially for stale/closed runs.
This, plus the fact that we can't use the MLflow UI to compare runs across different experiments (we use one backend per project), greatly reduces the usefulness of the MLflow UI for projects with a large number of runs. We end up producing a huge dataframe (which we can update in around ~1 min for a few thousand runs), which is itself cached and refreshed only on a need basis.
That is blazingly fast, but we lose the MLflow UI, so we were wondering whether some kind of caching could be enabled for the MLflow server/UI, refreshed periodically, something like that.
thanks!