-
-
Notifications
You must be signed in to change notification settings - Fork 628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a 'current' symlink to the task-versioned prefix of the workdir. #4220
Conversation
When working on custom tasks it's common to have to bump their versions, and it quickly gets confusing to figure out which workdir is the current one to poke around in.
task_version_prefix = os.path.join(root_dir, sha1(task_version).hexdigest()[:12]) | ||
stable_prefix = os.path.join(root_dir, self._STABLE_DIR_NAME) | ||
safe_mkdir(task_version_prefix) | ||
safe_delete(stable_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested these three statements for backwards compatibility with old workdirs? It looks like it should work, but symlinks can be tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.
The CacheKeyGenerator would be an easy place to add this as a hash_value
or something
@@ -172,20 +172,34 @@ def __init__(self, cache_manager, target, cache_key): | |||
super(VersionedTarget, self).__init__(cache_manager, [self]) | |||
self.id = target.id | |||
|
|||
def _results_dir_prefix(self, root_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method having sideeffects makes me a bit uncomfortable. Could this logic move under create_results_dir
?
But it is also being called once per target. Best might be for it to move into a static method called during setup by the cachemanager so that each target isn't executing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about add a root_dir or results_prefix kwarg to the cache_manager constructor?
Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.
I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mateor What was the motivation behind your refactor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - sorry for the late reply - I got sick as a dog this week.
I did not give you a great link w/r/t to refactor itself. The changes and reasoning behind it should be mostly complete in this commit: mateor@3c1e239.
The refactor was broken out of the work I did wiping invalid results_dirs. I ended up reducing the scope because the existing test coverage was a bit light and if I added any bad behavior it could have been very challenging to track.
But the refactor was done because I thought the existing code was pretty hard to reason about.
- Variable names are misleading
- Relies heavily on mutating properties
- Some methods have very surprising side effects (deleting this line removed ~400 file copies per cache hit in our repo) .
There was just a lot of older code there. But as evidenced by your question - without a change to hang it on, the refactor somewhat lost its purpose. If you do take a look and think I should pick it back up, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - one other motivation - it included the change you just added, so we at minimum agreed on that part :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would favor adding the prefix dir as a kwarg, which I detail a little in the comments. But if that is somehow objectionable, then this patch is probably fine, since nothing is relying on it.
It would probably make the most sense to just pass the task instance to the cache_manager constructor since ~90% of the args are task attributes. However, the public API requirement can make that big of a change unappetizing. So I thought adding a kwarg could possibly be a reasonable middle ground
@@ -172,20 +172,34 @@ def __init__(self, cache_manager, target, cache_key): | |||
super(VersionedTarget, self).__init__(cache_manager, [self]) | |||
self.id = target.id | |||
|
|||
def _results_dir_prefix(self, root_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about add a root_dir or results_prefix kwarg to the cache_manager constructor?
Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.
I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268
task_version_prefix = os.path.join(root_dir, sha1(task_version).hexdigest()[:12]) | ||
stable_prefix = os.path.join(root_dir, self._STABLE_DIR_NAME) | ||
safe_mkdir(task_version_prefix) | ||
safe_delete(stable_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.
The CacheKeyGenerator would be an easy place to add this as a hash_value
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some unit tests for this?
OK, I've completely changed the implementation. Now the cache manager creates the prefix. In fact, this nicely simplified parts of the invalidator API, so I think it's a sounder approach than my earlier one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks muuuch better. Thanks.
Added a couple of tests: One for this new symlink, and another for the existing result dir symlink. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks Benjy. Sorry for the late comments.
self._results_dir_prefix = os.path.join(results_dir_root, sha1(self._task_version).hexdigest()[:12]) | ||
safe_mkdir(self._results_dir_prefix) | ||
stable_prefix = os.path.join(results_dir_root, self._STABLE_DIR_NAME) | ||
safe_delete(stable_prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously doesn't hurt, but I think relative_symlink will gracefully repoint the existing symlink without requiring the delete.
We had to more anal about it in create_results_dirs because we explicitly wanted to catch places where the symlink had been replaced by a real dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and I responded to your earlier question here: c509d80#r99345117, in case that is hard to find otherwise.
…antsbuild#4220) When working on custom tasks it's common to have to bump their versions, and it quickly gets confusing to figure out which workdir is the current one to poke around in.
…antsbuild#4220) When working on custom tasks it's common to have to bump their versions, and it quickly gets confusing to figure out which workdir is the current one to poke around in.
…antsbuild#4220) When working on custom tasks it's common to have to bump their versions, and it quickly gets confusing to figure out which workdir is the current one to poke around in.
When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.