Add a 'current' symlink to the task-versioned prefix of the workdir. #4220

Merged
merged 4 commits into from Feb 1, 2017

Conversation

Projects
None yet
5 participants
@benjyw
Contributor

benjyw commented Jan 30, 2017

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.

Add a 'current' symlink to the task-versioned prefix of the workdir.
When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.

@benjyw benjyw requested a review from mateor Jan 30, 2017

+ task_version_prefix = os.path.join(root_dir, sha1(task_version).hexdigest()[:12])
+ stable_prefix = os.path.join(root_dir, self._STABLE_DIR_NAME)
+ safe_mkdir(task_version_prefix)
+ safe_delete(stable_prefix)

This comment has been minimized.

@stuhood

stuhood Jan 30, 2017

Member

Have you tested these three statements for backwards compatibility with old workdirs? It looks like it should work, but symlinks can be tricky.

@stuhood

stuhood Jan 30, 2017

Member

Have you tested these three statements for backwards compatibility with old workdirs? It looks like it should work, but symlinks can be tricky.

This comment has been minimized.

@mateor

mateor Jan 30, 2017

Member

in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.

The CacheKeyGenerator would be an easy place to add this as a hash_value or something

@mateor

mateor Jan 30, 2017

Member

in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.

The CacheKeyGenerator would be an easy place to add this as a hash_value or something

@@ -172,20 +172,34 @@ def __init__(self, cache_manager, target, cache_key):
super(VersionedTarget, self).__init__(cache_manager, [self])
self.id = target.id
+ def _results_dir_prefix(self, root_dir):

This comment has been minimized.

@stuhood

stuhood Jan 30, 2017

Member

This method having sideeffects makes me a bit uncomfortable. Could this logic move under create_results_dir?

But it is also being called once per target. Best might be for it to move into a static method called during setup by the cachemanager so that each target isn't executing it.

@stuhood

stuhood Jan 30, 2017

Member

This method having sideeffects makes me a bit uncomfortable. Could this logic move under create_results_dir?

But it is also being called once per target. Best might be for it to move into a static method called during setup by the cachemanager so that each target isn't executing it.

This comment has been minimized.

@mateor

mateor Jan 30, 2017

Member

How about add a root_dir or results_prefix kwarg to the cache_manager constructor?

Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.

I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268

@mateor

mateor Jan 30, 2017

Member

How about add a root_dir or results_prefix kwarg to the cache_manager constructor?

Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.

I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268

This comment has been minimized.

@benjyw

benjyw Jan 31, 2017

Contributor

@mateor What was the motivation behind your refactor?

@benjyw

benjyw Jan 31, 2017

Contributor

@mateor What was the motivation behind your refactor?

This comment has been minimized.

@mateor

mateor Feb 3, 2017

Member

Hi - sorry for the late reply - I got sick as a dog this week.

I did not give you a great link w/r/t to refactor itself. The changes and reasoning behind it should be mostly complete in this commit: mateor@3c1e239.

The refactor was broken out of the work I did wiping invalid results_dirs. I ended up reducing the scope because the existing test coverage was a bit light and if I added any bad behavior it could have been very challenging to track.

But the refactor was done because I thought the existing code was pretty hard to reason about.

  • Variable names are misleading
  • Relies heavily on mutating properties
  • Some methods have very surprising side effects (deleting this line removed ~400 file copies per cache hit in our repo) .

There was just a lot of older code there. But as evidenced by your question - without a change to hang it on, the refactor somewhat lost its purpose. If you do take a look and think I should pick it back up, let me know.

@mateor

mateor Feb 3, 2017

Member

Hi - sorry for the late reply - I got sick as a dog this week.

I did not give you a great link w/r/t to refactor itself. The changes and reasoning behind it should be mostly complete in this commit: mateor@3c1e239.

The refactor was broken out of the work I did wiping invalid results_dirs. I ended up reducing the scope because the existing test coverage was a bit light and if I added any bad behavior it could have been very challenging to track.

But the refactor was done because I thought the existing code was pretty hard to reason about.

  • Variable names are misleading
  • Relies heavily on mutating properties
  • Some methods have very surprising side effects (deleting this line removed ~400 file copies per cache hit in our repo) .

There was just a lot of older code there. But as evidenced by your question - without a change to hang it on, the refactor somewhat lost its purpose. If you do take a look and think I should pick it back up, let me know.

This comment has been minimized.

@mateor

mateor Feb 3, 2017

Member

Oh - one other motivation - it included the change you just added, so we at minimum agreed on that part :)

@mateor

mateor Feb 3, 2017

Member

Oh - one other motivation - it included the change you just added, so we at minimum agreed on that part :)

@mateor

I would favor adding the prefix dir as a kwarg, which I detail a little in the comments. But if that is somehow objectionable, then this patch is probably fine, since nothing is relying on it.

It would probably make the most sense to just pass the task instance to the cache_manager constructor since ~90% of the args are task attributes. However, the public API requirement can make that big of a change unappetizing. So I thought adding a kwarg could possibly be a reasonable middle ground

@@ -172,20 +172,34 @@ def __init__(self, cache_manager, target, cache_key):
super(VersionedTarget, self).__init__(cache_manager, [self])
self.id = target.id
+ def _results_dir_prefix(self, root_dir):

This comment has been minimized.

@mateor

mateor Jan 30, 2017

Member

How about add a root_dir or results_prefix kwarg to the cache_manager constructor?

Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.

I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268

@mateor

mateor Jan 30, 2017

Member

How about add a root_dir or results_prefix kwarg to the cache_manager constructor?

Task.py could pass self.workdir as a kwarg, and would break no current callers. Then the cache_manager could just hold the reference to the prefix as a property or something.

I got ~ 95% done with a refactor that added this but decided to stop since I wasn't confident it would be accepted into master: https://github.com/mateor/pants/blob/b77f27644e06ff7ef3753753f8bacfea83c17d1f/src/python/pants/invalidation/cache_manager.py#L268

+ task_version_prefix = os.path.join(root_dir, sha1(task_version).hexdigest()[:12])
+ stable_prefix = os.path.join(root_dir, self._STABLE_DIR_NAME)
+ safe_mkdir(task_version_prefix)
+ safe_delete(stable_prefix)

This comment has been minimized.

@mateor

mateor Jan 30, 2017

Member

in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.

The CacheKeyGenerator would be an easy place to add this as a hash_value or something

@mateor

mateor Jan 30, 2017

Member

in the same vein - the hashing logic should probably be isolated into just one place, since it would silently break if the hashing scheme did.

The CacheKeyGenerator would be an easy place to add this as a hash_value or something

@baroquebobcat

Could you add some unit tests for this?

@benjyw

This comment has been minimized.

Show comment
Hide comment
@benjyw

benjyw Jan 31, 2017

Contributor

OK, I've completely changed the implementation. Now the cache manager creates the prefix. In fact, this nicely simplified parts of the invalidator API, so I think it's a sounder approach than my earlier one.

Contributor

benjyw commented Jan 31, 2017

OK, I've completely changed the implementation. Now the cache manager creates the prefix. In fact, this nicely simplified parts of the invalidator API, so I think it's a sounder approach than my earlier one.

@stuhood

stuhood approved these changes Feb 1, 2017

This looks muuuch better. Thanks.

@benjyw

This comment has been minimized.

Show comment
Hide comment
@benjyw

benjyw Feb 1, 2017

Contributor

Added a couple of tests: One for this new symlink, and another for the existing result dir symlink.

Contributor

benjyw commented Feb 1, 2017

Added a couple of tests: One for this new symlink, and another for the existing result dir symlink.

@benjyw benjyw merged commit 3641dd3 into pantsbuild:master Feb 1, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@benjyw benjyw deleted the benjyw:current_link branch Feb 1, 2017

@mateor

Looks good, thanks Benjy. Sorry for the late comments.

+ self._results_dir_prefix = os.path.join(results_dir_root, sha1(self._task_version).hexdigest()[:12])
+ safe_mkdir(self._results_dir_prefix)
+ stable_prefix = os.path.join(results_dir_root, self._STABLE_DIR_NAME)
+ safe_delete(stable_prefix)

This comment has been minimized.

@mateor

mateor Feb 3, 2017

Member

Obviously doesn't hurt, but I think relative_symlink will gracefully repoint the existing symlink without requiring the delete.

We had to more anal about it in create_results_dirs because we explicitly wanted to catch places where the symlink had been replaced by a real dir.

@mateor

mateor Feb 3, 2017

Member

Obviously doesn't hurt, but I think relative_symlink will gracefully repoint the existing symlink without requiring the delete.

We had to more anal about it in create_results_dirs because we explicitly wanted to catch places where the symlink had been replaced by a real dir.

This comment has been minimized.

@mateor

mateor Feb 3, 2017

Member

Oh, and I responded to your earlier question here: c509d80#r99345117, in case that is hard to find otherwise.

@mateor

mateor Feb 3, 2017

Member

Oh, and I responded to your earlier question here: c509d80#r99345117, in case that is hard to find otherwise.

@ity ity added engine and removed engine labels Feb 3, 2017

peiyuwang pushed a commit to twitter/pants that referenced this pull request Feb 16, 2017

Add a 'current' symlink to the task-versioned prefix of the workdir. (#…
…4220)

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.

peiyuwang pushed a commit to twitter/pants that referenced this pull request Feb 17, 2017

Add a 'current' symlink to the task-versioned prefix of the workdir. (#…
…4220)

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.

lenucksi added a commit to lenucksi/pants that referenced this pull request Apr 25, 2017

Add a 'current' symlink to the task-versioned prefix of the workdir. (#…
…4220)

When working on custom tasks it's common to have to bump their versions,
and it quickly gets confusing to figure out which workdir is the current
one to poke around in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment