New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hard link or copy ivy and coursier cache #6246

Merged
merged 14 commits into from Jul 30, 2018

Conversation

Projects
None yet
4 participants
@wisechengyi
Copy link
Contributor

wisechengyi commented Jul 26, 2018

Resolves #6218

Problem

Symlinks makes snapshot difficult.

Solution

Use hard links instead if possible, copy otherwise.

Result

Coursier and ivy artifacts under .pants.d are now snapshottable

$ target/debug/fs_util --local-store-path="${HOME}/.cache/pants/lmdb_store" directory save --root=/Users/yic/workspace/pants/.pants.d 'ivy/**'
f67fd38c09de0eb230a7fb539920c5e9eba74ce2ff47b85df5f4012de064917f 78
$ target/debug/fs_util --local-store-path="${HOME}/.cache/pants/lmdb_store" directory save --root=/Users/yic/workspace/pants/.pants.d 'coursier/**'
2f041d753c6d7fe8452ad3c1173ad3afac649995c6fe04fbb1fce9cc82d7213e 83
@wisechengyi
Copy link
Contributor

wisechengyi left a comment

Essence of change highlighted. Other things are just rename and documentation.

try:
os.symlink(path, symlink)
os.link(path, hardlink)

This comment has been minimized.

@wisechengyi

wisechengyi Jul 26, 2018

Contributor

Essence of change for ivy

This comment has been minimized.

@jsirois

jsirois Jul 26, 2018

Member

Is the code structured in such a way that path and hardlink are guaranteed to be on the same device? If not the hard link will fail.

This comment has been minimized.

@wisechengyi

wisechengyi Jul 26, 2018

Contributor

There is no specific check for same device, but I'd expect an immediate error like https://stackoverflow.com/questions/42392600/oserror-errno-18-invalid-cross-device-link, which should be okay?

cc @illicitonion whether we can accept the error, or should we actively handle it.

This comment has been minimized.

@jsirois

jsirois Jul 26, 2018

Member

I expect that error as well and I can't imagine it would be ok. AFAICT that would amount to a systemic error the use could only fix by relocating their pants workdir to the same device as their ivy cache. That is neither an obvious fix nor a friendly one.

This comment has been minimized.

@stuhood

stuhood Jul 27, 2018

Member

@wisechengyi : This needs to do something like:

If we can hardlink where we currently symlink, do so.
If not, copy the file.

...which might amount to a utility function that attempts to hardlink, and then falls back to copying. I don't know if that needs any further optimization, or if just catching the failure to hardlink and then retrying is fine (I expect it is rare enough that that would be fine).

This comment has been minimized.

@jsirois

jsirois Jul 27, 2018

Member

Exactly.

This comment has been minimized.

@wisechengyi

wisechengyi Jul 27, 2018

Contributor

Thanks! It makes sense.

I was also thinking about the case for coursier to resolve against ~/.m2/repositories because coursier would not copy it into $coursier_cache_dir/, so this case also be covered.

This comment has been minimized.

@wisechengyi
@@ -613,7 +613,7 @@ def _map_coord_to_resolved_jars(cls, result, coursier_cache_path, pants_jar_path

if not os.path.exists(pants_path):
safe_mkdir(os.path.dirname(pants_path))
os.symlink(jar_path, pants_path)
os.link(jar_path, pants_path)

This comment has been minimized.

@wisechengyi

wisechengyi Jul 26, 2018

Contributor

Essence of change for coursier

wisechengyi added some commits Jul 26, 2018

@jsirois
Copy link
Member

jsirois left a comment

This LGTM except for the pex.common.safe_copy use. Its ugly, but I'd be happier with copying that code into pants.util.fileutil.py and adding a test (which the pex code does not have). I'd be interested to hear your and other reviewers opinions though.

@@ -17,6 +16,7 @@
from functools import total_ordering

import six
from pex.common import safe_copy

This comment has been minimized.

@jsirois

jsirois Jul 27, 2018

Member

I'm all for code re-use, but this seems off. For one, jvm code depending on pex doesn't make sense. Probably more important though, I'm pretty sure pex.common is not intended to be public.

This comment has been minimized.

@wisechengyi

wisechengyi Jul 27, 2018

Contributor

Done

wisechengyi added some commits Jul 27, 2018

@@ -216,65 +218,43 @@ def test_second_noop_does_not_invoke_coursier(self):
task.execute()
task.runjava.assert_not_called()

def test_when_invalid_artifact_symlink_should_trigger_resolve(self):
def test_when_invalid_hardlink_and_coursier_cache_should_trigger_resolve(self):

This comment has been minimized.

@wisechengyi

wisechengyi Jul 27, 2018

Contributor

This test changed.

Before

  1. Run task.execute()
  2. Remove symlink under .pants.d/coursier and realpath(symlink)
  3. Run task.execute() should call coursier again

Now it's all hardlink/actual file, 3) would noop, hence changing it to

  1. Run task.execute()
  2. Remove hardlink under .pants.d/coursier and the entire coursier cache dir
  3. Run task.execute() should call coursier again

@wisechengyi wisechengyi changed the title [WIP] Hard link ivy and coursier cache [WIP] Hard link or copy ivy and coursier cache Jul 27, 2018

@wisechengyi wisechengyi changed the title [WIP] Hard link or copy ivy and coursier cache Hard link or copy ivy and coursier cache Jul 27, 2018

@wisechengyi wisechengyi requested a review from illicitonion Jul 27, 2018

@wisechengyi

This comment has been minimized.

Copy link
Contributor

wisechengyi commented Jul 27, 2018

Ready.

@illicitonion
Copy link
Contributor

illicitonion left a comment

Looks great, thanks!

@@ -795,7 +794,7 @@ def _exec_ivy(cls, ivy, confs, ivyxml, args, jvm_options, executor,
ivy_args.extend(args)

ivy_jvm_options = list(jvm_options)
# Disable cache in File.getCanonicalPath(), makes Ivy work with -symlink option properly on ng.
# Disable cache in File.getCanonicalPath(), makes Ivy work with -hardlink option properly on ng.

This comment has been minimized.

@illicitonion

illicitonion Jul 27, 2018

Contributor

This probably isn't necessary any more (but is probably worth keeping anyway, because caching symlink traversals is scary)

This comment has been minimized.

@wisechengyi

wisechengyi Jul 27, 2018

Contributor

Done. Good catch!

context = self.context(target_roots=[jar_lib])
task = self.execute(context)
compile_classpath = context.products.get_data('compile_classpath')
self.assertFalse(os.path.islink(path))

This comment has been minimized.

@illicitonion

illicitonion Jul 27, 2018

Contributor

Maybe remove this assertion? Out of the context of this change, it seems a little out of nowhere?

This comment has been minimized.

@wisechengyi

wisechengyi Jul 27, 2018

Contributor

Done

@@ -37,3 +39,33 @@ def line_count(filename):
'nosize': lambda srcs: 0,
'random': lambda srcs: random.randint(0, 10000),
}


# Copied from pex for JVM related code not to depend on pex

This comment has been minimized.

@jsirois

jsirois Jul 27, 2018

Member

The comment could be dropped; especially if you add a unit test - which would be great.

This comment has been minimized.

@wisechengyi

wisechengyi Jul 30, 2018

Contributor

Comment removed and tests added

shutil.copyfile(source, temp_dest)
os.rename(temp_dest, dest)

# If the platform supports hard-linking, use that and fall back to copying.

This comment has been minimized.

@jsirois

jsirois Jul 27, 2018

Member

This comment and check are very out of character for the codebase - since we don't support windows. I'm fine with it staying perhaps, but it does add complexity to the function and the resulting test, which will need to mock or somesuch to test the else branch.

This comment has been minimized.

@wisechengyi

wisechengyi Jul 30, 2018

Contributor

Windows portion removed.

wisechengyi added some commits Jul 30, 2018

@stuhood stuhood merged commit a43c71c into pantsbuild:master Jul 30, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@wisechengyi wisechengyi deleted the wisechengyi:hard_link branch Jul 30, 2018

CMLivingston pushed a commit to CMLivingston/pants that referenced this pull request Aug 27, 2018

Hard link or copy ivy and coursier cache (pantsbuild#6246)
### Problem

Symlinks makes snapshot difficult.

### Solution

Use hard links instead if possible, copy otherwise.

### Result

Coursier and ivy artifacts under `.pants.d` are now snapshottable. Resolves pantsbuild#6218.

```
$ target/debug/fs_util --local-store-path="${HOME}/.cache/pants/lmdb_store" directory save --root=/Users/yic/workspace/pants/.pants.d 'ivy/**'
f67fd38c09de0eb230a7fb539920c5e9eba74ce2ff47b85df5f4012de064917f 78
$ target/debug/fs_util --local-store-path="${HOME}/.cache/pants/lmdb_store" directory save --root=/Users/yic/workspace/pants/.pants.d 'coursier/**'
2f041d753c6d7fe8452ad3c1173ad3afac649995c6fe04fbb1fce9cc82d7213e 83
```

stuhood added a commit that referenced this pull request Oct 29, 2018

Move ivy/coursier link farms under versioned task directories (#6686)
### Problem

As described in #6679, the ivy and coursier link farms are not versioned along with the tasks that consume them. This had the effect of breaking backwards compatibility when those links converted from being symlinks to hardlinks in #6246.

### Solution

Move the "task versioned workdir" logic from `CacheManager` to `Task`, to allow it to be used in more places, and then use it for both `IvyTaskMixin` and `CoursierResolve`.

### Result

Rather than having a "global" link farm at `.pants.d/{ivy,coursier}`, the installed instances of tasks extending `IvyTaskMixin` or `CoursierResolve` will each have their own independent, versioned link farms.

Fixes #6679.

stuhood added a commit that referenced this pull request Oct 29, 2018

Move ivy/coursier link farms under versioned task directories (#6686)
### Problem

As described in #6679, the ivy and coursier link farms are not versioned along with the tasks that consume them. This had the effect of breaking backwards compatibility when those links converted from being symlinks to hardlinks in #6246.

### Solution

Move the "task versioned workdir" logic from `CacheManager` to `Task`, to allow it to be used in more places, and then use it for both `IvyTaskMixin` and `CoursierResolve`.

### Result

Rather than having a "global" link farm at `.pants.d/{ivy,coursier}`, the installed instances of tasks extending `IvyTaskMixin` or `CoursierResolve` will each have their own independent, versioned link farms.

Fixes #6679.

stuhood added a commit that referenced this pull request Oct 29, 2018

Move ivy/coursier link farms under versioned task directories (#6686)
As described in #6679, the ivy and coursier link farms are not versioned along with the tasks that consume them. This had the effect of breaking backwards compatibility when those links converted from being symlinks to hardlinks in #6246.

Move the "task versioned workdir" logic from `CacheManager` to `Task`, to allow it to be used in more places, and then use it for both `IvyTaskMixin` and `CoursierResolve`.

Rather than having a "global" link farm at `.pants.d/{ivy,coursier}`, the installed instances of tasks extending `IvyTaskMixin` or `CoursierResolve` will each have their own independent, versioned link farms.

Fixes #6679.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment