Revisit file rehashing policy #1062
Comments
In At the time, I worked around this problem by making
I think this was happening for external user-defined files, the equivalent of
I was hoping to find out which systems support trustworthy file timestamps, but it looks like this information is too varied and too brittle.
I think the right call may be to enhance the
Remarks:
On second thought, the more complicated file cue would give users more to think about. For format = "file", I like the file threshold rule more. Smaller files are likely to change more often, and the general approach has worked for years. Maybe something else would work better, but I still want to fully automate it.
New plan:
Implemented in 720a865.
Prework
Problem
The current file rehashing policy is coded in file_should_rehash() (targets/R/class_file.R, lines 58 to 63 in c64a7e8).
In particular, small files are always rehashed, which is a bottleneck for pipelines with large numbers of small files. Because time stamps have low resolution on some platforms (e.g. Windows), they are only trusted when the file is large.
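For readers without the source open, the policy described above can be paraphrased roughly as follows. This is only an illustrative sketch, not the code in class_file.R: the function name `should_rehash_sketch()`, the `old` record layout, and the `small_bytes` threshold value are all assumptions made for the example.

```r
# Hypothetical size threshold; the value used by targets may differ.
small_bytes <- 1e5

# `old` is an assumed record of the metadata stored at the last hash:
# list(time = <POSIXct>, size = <numeric bytes>).
should_rehash_sketch <- function(old, time, size, threshold = small_bytes) {
  small <- size <= threshold                            # small files: always rehash
  touched <- as.numeric(time) != as.numeric(old$time)   # time stamp changed
  resized <- as.numeric(size) != as.numeric(old$size)   # size changed
  small || touched || resized
}
```

Under this sketch, a large file with an unchanged time stamp and size skips the hash, while every small file pays for a rehash on each check, which is the bottleneck described above.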
Proposal
For small files in _targets/objects/, which the user should not modify by hand, I propose we try to avoid rehashing them. We might just compare the modification time to Sys.time() and trust the time stamp if it is older than a second. We could reduce this threshold on non-Windows machines. It would be good to revisit the actual timestamp resolution on various platforms.
I plan to keep the existing policy for format = "file" because those files are controlled by the user and it is harder to make the required simplifying assumptions for more nuanced cache invalidation.
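The time stamp check proposed for files in _targets/objects/ might look something like the sketch below. The helper name `timestamp_is_trustworthy()` and its `tolerance` and `now` arguments are hypothetical; the one-second default mirrors the threshold suggested above and is not a measured timestamp resolution for any platform.

```r
# Trust a file's time stamp only if the file was last modified more than
# `tolerance` seconds before `now`. Missing files are never trusted.
timestamp_is_trustworthy <- function(path, tolerance = 1, now = Sys.time()) {
  mtime <- file.mtime(path)  # base R; returns NA if the file does not exist
  !is.na(mtime) &&
    as.numeric(difftime(now, mtime, units = "secs")) > tolerance
}
```

A caller could then skip rehashing when `timestamp_is_trustworthy(path)` is `TRUE` and the recorded time stamp and size are unchanged, and fall back to rehashing otherwise. Lowering `tolerance` on non-Windows machines would be a one-argument change.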