Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a 'current' symlink to the task-versioned prefix of the workdir. #4220

Merged
merged 4 commits into from
Feb 1, 2017

Conversation

benjyw
Copy link
Sponsor Contributor

@benjyw benjyw commented Jan 30, 2017

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.
@benjyw benjyw requested a review from mateor January 30, 2017 01:54
task_version_prefix = os.path.join(root_dir, sha1(task_version).hexdigest()[:12])
stable_prefix = os.path.join(root_dir, self._STABLE_DIR_NAME)
safe_mkdir(task_version_prefix)
safe_delete(stable_prefix)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested these three statements for backwards compatibility with old workdirs? It looks like it should work, but symlinks can be tricky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.

The CacheKeyGenerator would be an easy place to add this as a hash_value or something

@@ -172,20 +172,34 @@ def __init__(self, cache_manager, target, cache_key):
super(VersionedTarget, self).__init__(cache_manager, [self])
self.id = target.id

def _results_dir_prefix(self, root_dir):
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method having sideeffects makes me a bit uncomfortable. Could this logic move under create_results_dir?

But it is also being called once per target. Best might be for it to move into a static method called during setup by the cachemanager so that each target isn't executing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add a root_dir or results_prefix kwarg to the cache_manager constructor?

Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.

I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mateor What was the motivation behind your refactor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - sorry for the late reply - I got sick as a dog this week.

I did not give you a great link w/r/t to refactor itself. The changes and reasoning behind it should be mostly complete in this commit: mateor@3c1e239.

The refactor was broken out of the work I did wiping invalid results_dirs. I ended up reducing the scope because the existing test coverage was a bit light and if I added any bad behavior it could have been very challenging to track.

But the refactor was done because I thought the existing code was pretty hard to reason about.

  • Variable names are misleading
  • Relies heavily on mutating properties
  • Some methods have very surprising side effects (deleting this line removed ~400 file copies per cache hit in our repo) .

There was just a lot of older code there. But as evidenced by your question - without a change to hang it on, the refactor somewhat lost its purpose. If you do take a look and think I should pick it back up, let me know.

Copy link
Member

@mateor mateor Feb 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - one other motivation - it included the change you just added, so we at minimum agreed on that part :)

Copy link
Member

@mateor mateor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would favor adding the prefix dir as a kwarg, which I detail a little in the comments. But if that is somehow objectionable, then this patch is probably fine, since nothing is relying on it.

It would probably make the most sense to just pass the task instance to the cache_manager constructor since ~90% of the args are task attributes. However, the public API requirement can make that big of a change unappetizing. So I thought adding a kwarg could possibly be a reasonable middle ground

@@ -172,20 +172,34 @@ def __init__(self, cache_manager, target, cache_key):
super(VersionedTarget, self).__init__(cache_manager, [self])
self.id = target.id

def _results_dir_prefix(self, root_dir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add a root_dir or results_prefix kwarg to the cache_manager constructor?

Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.

I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268

task_version_prefix = os.path.join(root_dir, sha1(task_version).hexdigest()[:12])
stable_prefix = os.path.join(root_dir, self._STABLE_DIR_NAME)
safe_mkdir(task_version_prefix)
safe_delete(stable_prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.

The CacheKeyGenerator would be an easy place to add this as a hash_value or something

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some unit tests for this?

@benjyw
Copy link
Sponsor Contributor Author

benjyw commented Jan 31, 2017

OK, I've completely changed the implementation. Now the cache manager creates the prefix. In fact, this nicely simplified parts of the invalidator API, so I think it's a sounder approach than my earlier one.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks muuuch better. Thanks.

@benjyw
Copy link
Sponsor Contributor Author

benjyw commented Feb 1, 2017

Added a couple of tests: One for this new symlink, and another for the existing result dir symlink.

@benjyw benjyw merged commit 3641dd3 into pantsbuild:master Feb 1, 2017
@benjyw benjyw deleted the current_link branch February 1, 2017 23:46
Copy link
Member

@mateor mateor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks Benjy. Sorry for the late comments.

self._results_dir_prefix = os.path.join(results_dir_root, sha1(self._task_version).hexdigest()[:12])
safe_mkdir(self._results_dir_prefix)
stable_prefix = os.path.join(results_dir_root, self._STABLE_DIR_NAME)
safe_delete(stable_prefix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously doesn't hurt, but I think relative_symlink will gracefully repoint the existing symlink without requiring the delete.

We had to more anal about it in create_results_dirs because we explicitly wanted to catch places where the symlink had been replaced by a real dir.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and I responded to your earlier question here: c509d80#r99345117, in case that is hard to find otherwise.

@ity ity added engine and removed engine labels Feb 3, 2017
peiyuwang pushed a commit to twitter/pants that referenced this pull request Feb 16, 2017
…antsbuild#4220)

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.
peiyuwang pushed a commit to twitter/pants that referenced this pull request Feb 17, 2017
…antsbuild#4220)

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.
lenucksi pushed a commit to lenucksi/pants that referenced this pull request Apr 25, 2017
…antsbuild#4220)

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants