Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching parsed output files #273

Merged
merged 6 commits into from
Nov 6, 2023
Merged

Conversation

gpetretto
Copy link
Contributor

Summary

Following up on #268 I wrote an initial implementation for caching the output files. It is based on a modified version of the lru_cache decorator. This should make it easy switch to helper function for storing different properties, while still allowing Custodian to clear the cache after each time the loop on the handlers/validators has been executed.

One potential drawback of the implementation is that the set of cached functions is stored as a class attribute of tracked_lru_cache. This means for example that in a multithreading execution (memory shared), with multiple instances of Custodian running, the cache will be cleared more frequently than expected. It should not lead to overlaps, due to the caching being based on the absolute paths. However, a multithreading execution does not seem very likely to be used with Cusodian. No problem will happen in a multiprocessing execution of multiple instances of Custodian (no memory shared).

Making the set of cached function a property of a single Custodian instance is also feasible, but it will need to be passed to the handlers so that the decorated functions can be tracked, making the implementation of the cached functions a bit more involved.

I would like to first have a feedback on this solution. If it is fine I will proceed to write specific tests for the new functionality.

If there is also an interest for switching to the parsing of specific values, instead of the full vasprun.xml/OUTCAR, I will do that in a separate PR.

@shyuep
Copy link
Member

shyuep commented Jul 20, 2023

This generally looks good to me.

@gpetretto
Copy link
Contributor Author

Thanks for looking into this.
I have added the tests.

@janosh
Copy link
Member

janosh commented Nov 5, 2023

@gpetretto Looks like this fell under the radar. If you could resolve the conflicts, we'll get this merged!

@janosh janosh added enhancement perf Performance issues or improvements labels Nov 5, 2023
@gpetretto
Copy link
Contributor Author

Thanks for getting back on this @janosh. It should be fine now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead of a new io module, we just define

ChachedOutcar = tracked_lru_cache(Outcar)
ChachedVasprun = tracked_lru_cache(Vasprun)

directly in custodian/vasp/handlers.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be done, but since this is also used in the validators module they should be imported from the handlers. I think that having a generic io module could be fine, but I have no problem moving them to handlers.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess no need to move to handlers.py.

Comment on lines +17 to +24
@pytest.fixture(autouse=True)
def _clear_tracked_cache():
"""
Clear the cache of the stored functions between the tests.
"""
from custodian.utils import tracked_lru_cache

tracked_lru_cache.tracked_cache_clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fixture currently runs on every test, not just the handler tests which would be unnecessary work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can move it to custodian/vasp/handlers.py but keep autouse=True to only auto apply it to those tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is also used in the validators, I think the fixture should either be duplicated in the handlers and validators files, or added it explicitly to the tests that need it, removing the autouse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably duplicating it in both files makes it more explicit too. People are more likely to see it and become aware this is happening.

@shyuep shyuep merged commit 1d31ba5 into materialsproject:master Nov 6, 2023
3 checks passed
@janosh
Copy link
Member

janosh commented Nov 6, 2023

@gpetretto Perhaps you could address my comments in a new PR?

@gpetretto gpetretto deleted the cache branch November 16, 2023 00:04
@gpetretto
Copy link
Contributor Author

Hi @janosh , sorry for the delay. I can indeed adress these comments in a separate PR. Can you maybe clarify what would be your preferences in the comments above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement perf Performance issues or improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants