-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make disk_cache persist across runs #195
Make disk_cache persist across runs #195
Conversation
sciunit/models/backends.py
Outdated
if isinstance(from_kwargs, Path) or isinstance(from_kwargs, str): | ||
location = str(from_kwargs) | ||
else: | ||
location = Path(tempfile.mkdtemp()) / "cache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rgerkin, as mentioned in the comment, we'll need to add config.set_to_disk()
for the commented-out version to work.
Before I commence with that, however, how about another idea.
We can use Path(os.getcwd()) / 'cache'
, which will create the cache
file in the folder the user executes their script from.
This way we don't need to mess with the config, and the user keeps direct control over the size of their cache file.
Do you see any drawbacks to this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds appealing, but most applications hide their cache somewhere that the user is unlikely to mess with it. Maybe this is undesirable for something like sciunit where the modal user is also a developer. But I could still imagine that cache contents get accidentally git committed, for example.
Also, the user might execute their script from different locations (cwd could be a directory full of notebooks one time and the project root another time) and then we end up with unnecessary computation and two caches!
So I'd prefer to have tempfile decide where the cache goes, and if a user wants to explicitly pass their current directory as the location they can, or if they want to change the default locations used by tempfile they can do that.
Some decision has to be made about how relative paths will be handled. I don't like appending to os.getcwd()
but perhaps appending to the .sciunit directory in the user's home directory? That is where the config is already stored I believe.
We could also have explicit cache purging or cache pruning operations (prune cache entries older than N days, for example, though this would require a parallel cache to store dates).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some decision has to be made about how relative paths will be handled.
Why, when a user passes their "use_disk_cache": "./relative/path"
?
explicit cache purging & delete-cache-entries-every-n-days
Agreed about the cache purging function.
About delete-cache-entries-every-n-days -
tempfile
already tries to store files in folders that tend to be cleaned occasionally.
Storing cache location in
{home}/.sciunit/config.json
To explain my concerns over this loss of control over the cache location -
- I'd prefer my cached sim results not to disappear (on reboot or in n hours, what if it took me 30 minutes to run the sim),
- I'd like to keep an eye on the cache file size (and to be able to clean my cache without reading the docs),
- I'd like my cache to be per-project and not per-user (what if I .deleteCache() and that deletes the cache of another project of mine too).
This is not the position I'd normally take with cache, this is the position I take knowing netpyne can casually take 10 minutes to run the sim 🤔
In either case, the user can still do the local cache file. If you confirm you certainly prefer the {home}/.sciunit/config.json
& tempfile
default - let's go this way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I prefer that, and it will persist until the user deletes. It please do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To set the record straight - we don't want tempfile
to handle the cache location anymore (and thus we don't need to store any paths in $HOME/.sciunit/config.json
), we want a persistent cache in $HOME/.sciunit/cache
.
Yes, let’s do the .sciunit location (something like $HOME/.sciunit/cache) for disk_cache=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could still allow it with some special path location argument (e.g. "temp") but I'm not sure what the use case would be for this, so I think you can get rid of it.
sciunit/models/backends.py
Outdated
if isinstance(from_kwargs, Path) or isinstance(from_kwargs, str): | ||
location = str(from_kwargs) | ||
else: | ||
location = Path(tempfile.mkdtemp()) / "cache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds appealing, but most applications hide their cache somewhere that the user is unlikely to mess with it. Maybe this is undesirable for something like sciunit where the modal user is also a developer. But I could still imagine that cache contents get accidentally git committed, for example.
Also, the user might execute their script from different locations (cwd could be a directory full of notebooks one time and the project root another time) and then we end up with unnecessary computation and two caches!
So I'd prefer to have tempfile decide where the cache goes, and if a user wants to explicitly pass their current directory as the location they can, or if they want to change the default locations used by tempfile they can do that.
Some decision has to be made about how relative paths will be handled. I don't like appending to os.getcwd()
but perhaps appending to the .sciunit directory in the user's home directory? That is where the config is already stored I believe.
We could also have explicit cache purging or cache pruning operations (prune cache entries older than N days, for example, though this would require a parallel cache to store dates).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See slack and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few changes suggested
myModel = Model() | ||
backend = Backend() | ||
backend.model = myModel | ||
backend.init_backend(use_disk_cache="/some/good/path", use_memory_cache=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this fake path possibly work? Can you instead use the tempfile
module to just explicitly create an real temporary path? I assume that init_backend
doesn't actually try to write to this location (so it doesn't matter if it is real or not), but in that case it isn't much of a test of whether a manual location can work. You should instead have each of these cases actually try writing data to that cache, checking if it is there, etc.
'''A class for creating and deleting a folder in "./unit_test/". | ||
''' | ||
|
||
path = Path(__file__).parent / "unit_test" / "delete_after_tests" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are moving to doing everything inside HOME/.sciunit, so maybe the temp test folder should be there as well, i.e. HOME/.sciunit/unit_test. A user could theoretically be running code from a folder in which they don't have write access, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A user could theoretically be running code from a folder in which they don't have write access, for example.
And by the user you mean the sciunit developer who runs the tests? (we only use this util for tests)
Would you prefer your temporary test files to be in HOME/.sciunit
, somewhere outside of your git tracker?
If something bad happens and test files don't get deleted I'd certainly like to see them as my unstaged files.
If something doesn't work I'd want to uncomment the line that deletes the file, and check the file myself (without going to my HOME folder).
Do these concerns sound unrelatable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are probably arguments both ways so I think it is fine as you have it.
No description provided.