Add persistence to cache.FileCache #90
Merged

Is it possible to split the `preload` and the `freeze` of the cache? In the current implementation, the user has to do the following to `freeze` the cache and make it multi-process safe: load => preserve => preload.
If we can split these two functionalities, then we can simply do "load" => "freeze". Moreover, the cache can be modified before "freeze".

What do you mean by "load"?
If "load" means writing data into cache files, then, if I understand correctly, that means creating the file cache under a specific name. That is unacceptable, because it leaves partially written named files behind after a job failure, and they will definitely corrupt the cache and any subsequent re-run of the job.
Also, the principle of the cache in this module is that data is immutable and deterministic, so there should be no modification. What do you mean by "modified"?

Sorry for the confusion. By "load" I mean reading data from remote storage. If I understand correctly, in order to use this module, a user has to do the following:
My suggestion is to have a "freeze" function, so that we can:
to support multiprocessing. I do not think it would increase the risk of having a broken cache.
By "modifying", I do not mean changing the data itself, but the ability to add new cache entries.

I believe it definitely happens. For example, take `put(1, b'bar')`. Even in this case a partial write may happen: `b'ba'` was written to disk but `b'r'` wasn't. Then, after the job is restarted, `put(1, b'bar')` cannot overwrite the entry, as data 1 is immutable.
Also, in this case it is possible neither to put data nor to run `get(1)` once a partial write has happened. This is a typical scenario of broken data, and broken cache files must be thrown away.
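
To make the partial-write scenario concrete, here is a minimal sketch of one common way to keep half-written files from ever appearing under their final name. It is not the code in this PR: `write_then_publish` and its `fail_after` parameter are hypothetical names used only for illustration. The data is written to a temporary file in the same directory and renamed to its final name only after the write has completed.

```python
import os
import tempfile


def write_then_publish(path, data, fail_after=None):
    """Write data to a temp file, then rename it to path in one step.

    fail_after is a hypothetical knob that simulates a crash after
    that many bytes have been written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            if fail_after is not None:
                f.write(data[:fail_after])
                raise RuntimeError("simulated crash during write")
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Rename only after the full write succeeded; the temp file is in
        # the same directory, so this stays on one filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Discard the partial temp file; the final name is never touched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern, a job that dies while writing `b'bar'` leaves nothing under the final name, so a re-run starts from a clean state rather than from `b'ba'`.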

I think we are misunderstanding each other. In your code, you integrated the "freeze" into `preserve`, which I had misunderstood, so step 3 is not necessary. But I think that design forces the user to preserve the cache, which is not always necessary. For example, they may just want to enable multiprocessing, which only needs "freezing" the cache. Preserving the data disables the automatic deletion of the cache.
In my example, the cache should not be frozen until all the data is loaded, and after "freeze", no data can be added to the cache. So when the job fails at a "put", which is necessarily before the "freeze", the cache will be neither frozen nor preserved. That is why I think such a design will not increase the risk of having a broken cache.
Sorry for the confusion.
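
The proposed semantics can be sketched with a toy in-memory class. This is only an illustration under stated assumptions: `FreezableCache` and its methods are hypothetical names, not the `cache.FileCache` API from this PR, and no files or processes are involved.

```python
class FreezableCache:
    """Toy in-memory cache; NOT the cache.FileCache API from this PR."""

    def __init__(self):
        self._data = {}
        self._frozen = False

    def put(self, key, value):
        if self._frozen:
            raise RuntimeError("cache is frozen; no new entries allowed")
        if key in self._data:
            return False  # entries are immutable: never overwrite
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def freeze(self):
        # After this call the cache is read-only.
        self._frozen = True

    @property
    def frozen(self):
        return self._frozen


cache = FreezableCache()
cache.put(1, b"bar")   # "load": entries may still be added
cache.freeze()         # from here on, only get() is allowed
```

A failure inside `put` necessarily happens while `frozen` is still false, which is the point of the argument above: an interrupted load never yields a cache that looks frozen.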

Automatic deletion of the cache does not work correctly across multiple processes, because there is no synchronization on closing the file objects: child processes can't read the contents after the parent process exits. This is because Python's `tempfile` module explicitly unlinks named temporary files. Please try the following script:
You'll see this:
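
As a single-process illustration of the `tempfile` behavior described above (a sketch, not the multi-process script from the discussion), the snippet below shows that a `NamedTemporaryFile` created with the default `delete=True` disappears from the filesystem as soon as it is closed. Any other process that only knows the path can no longer open the file after that point.

```python
import os
import tempfile

# delete=True is the default: the file is unlinked when it is closed.
f = tempfile.NamedTemporaryFile()
f.write(b"bar")
f.flush()
path = f.name
exists_while_open = os.path.exists(path)

f.close()
exists_after_close = os.path.exists(path)

print(exists_while_open, exists_after_close)  # prints: True False
```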
I see, thank you for the explanation. Can you fix the other issues?