
Performance for accessing run data, FileSystem backend, caching, other workarounds? #1902

Closed
rquintino opened this issue Oct 3, 2019 · 5 comments

@rquintino

Hi, we're currently exploring options to address the main issue we have with mlflow right now: performance when reading run data (filesystem backend) :(. We imagine it is highly related to the way mlflow stores data, e.g. every single param and metric goes into a separate file. Depending on the project we can track hundreds of data points per run, so we end up with an mlruns folder of 50k–100k files, even for a modest number of runs.

We are also considering SQL backends, but we wonder whether some workarounds might be possible even with the filesystem backend, like caching or aggregating, especially for stale/closed runs.

This, together with the fact that we can't use the mlflow UI to compare runs across different experiments (we use one backend per project), greatly reduces the usefulness of the mlflow UI for projects with a large number of runs. We end up producing a huge dataframe (which takes around ~1 min to update for a few thousand runs), which is itself cached and refreshed only on an as-needed basis.

That is blazingly fast, but we lose the mlflow UI, so we were wondering whether some kind of caching could be enabled for the mlflow server/UI, refreshed periodically, something like that.
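For illustration, the aggregate-and-cache idea can be sketched with nothing but the standard library. This is a hypothetical sketch, not mlflow's own API: it assumes a FileStore-like layout of `mlruns/<experiment>/<run>/params/<name>` with one value per file, collapses all params into a single JSON file, and rebuilds that file only when something under `mlruns/` is newer than the cache.

```python
# Sketch of the caching workaround: aggregate the many per-param files of a
# FileStore-style layout into one JSON file, rebuilt only when stale.
# The layout (mlruns/<experiment>/<run>/params/<name>) mirrors mlflow's
# FileStore, but this code is independent of mlflow itself.
import json
from pathlib import Path

def build_cache(mlruns: Path, cache_file: Path) -> dict:
    """Aggregate every run's params into one dict and persist it as JSON."""
    runs = {}
    for params_dir in mlruns.glob("*/*/params"):
        run_id = params_dir.parent.name
        runs[run_id] = {p.name: p.read_text().strip()
                        for p in params_dir.iterdir() if p.is_file()}
    cache_file.write_text(json.dumps(runs))
    return runs

def load_runs(mlruns: Path, cache_file: Path) -> dict:
    """Serve from the cache unless some file under mlruns/ is newer."""
    if cache_file.exists():
        cache_mtime = cache_file.stat().st_mtime
        if all(p.stat().st_mtime <= cache_mtime for p in mlruns.rglob("*")):
            return json.loads(cache_file.read_text())
    return build_cache(mlruns, cache_file)
```

Reading one JSON file instead of tens of thousands of small files is where the speedup comes from; the mtime scan keeps the cache honest for runs that are still being written to.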

thanks!

@sabman

sabman commented Oct 4, 2019

Can you add a tag with the date, or an expiry date, and then filter by that?
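The filtering half of that suggestion might look like the sketch below. The `expiry` tag name and ISO date format are assumptions, and the runs are plain dicts rather than mlflow objects; with mlflow itself the tag would be set at logging time (e.g. via `mlflow.set_tag`) and the filtering done in the UI's search box.

```python
# Sketch of filtering runs by a hypothetical "expiry" tag (ISO date string).
from datetime import date

def active_runs(runs: list, today: date) -> list:
    """Keep only runs whose 'expiry' tag has not passed; untagged runs stay."""
    return [r for r in runs
            if date.fromisoformat(r["tags"].get("expiry", "9999-12-31")) >= today]
```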

@ankmathur96
Contributor

Have you tried using the SQL store with a MySQL backend?
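For reference, pointing the tracking server at a SQL store is a matter of the `--backend-store-uri` flag; the host, database name, credentials, and bucket below are placeholders:

```shell
# Placeholder host/database/credentials and bucket -- substitute your own.
# Requires a MySQL driver in the server's environment, e.g.: pip install pymysql
mlflow server \
  --backend-store-uri mysql+pymysql://mlflow_user:mlflow_pass@db.example.com/mlflow \
  --default-artifact-root s3://my-mlflow-artifacts/ \
  --host 0.0.0.0 --port 5000
```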

@DoktorMike

I also find retrieval quite slow, and it sometimes even times out, resulting in an "Internal Server Error". Even as few as 10 runs in an experiment can sometimes time out. I'm also using S3 for artifact storage. I could switch to a MySQL backend, but that shouldn't really matter for something as small as 10 runs, right? So there must be another reason this is so slow. I'll post here if I come up with a solution.

@stale

stale bot commented Nov 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 4, 2019
@stale

stale bot commented Jan 3, 2020

This issue has not been interacted with for 60 days since it was marked as stale, so it is now closed. If you are still affected or have more information, please re-open the issue!


4 participants