New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consume the bootstrapper and modify zinc to allow remote exec #6463

Merged
merged 19 commits into from Sep 14, 2018

Conversation

Projects
None yet
3 participants
@blorente
Copy link
Contributor

blorente commented Sep 6, 2018

(This work was made together with @dotordogh and @illicitonion. Even if we used my fork for convenience, we programmed most of the code together. This work is based on @ity's work on the same problem.)

Problem

See related issues below for discussion.

Solution

We modify the zinc subsystem to call the newly created bootstraper jar.
That jar will compile the compiler-bridge that we pass to every invocation of zinc.
The compiled jar is cached in a fingerprinted directory, in the Pants working directory.
The compilation is made via an ExecuteProcessRequest, which will enable remote execution of the compilation.

Result

Depends on (#6462 )
Closes (#6155)

@illicitonion
Copy link
Contributor

illicitonion left a comment

Looking great! :)

],
main=Zinc.ZINC_COMPILE_MAIN,
custom_rules=shader_rules)
'bootstrap',

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

zinc-bootstrapper?

JarDependency('org.pantsbuild', 'zinc-extractor_2.11', '0.0.4')
])

# TODO(borja) is this the right way to do it? Alternative is fully specifying the version by hand:

This comment has been minimized.

@illicitonion

This comment has been minimized.

@stuhood

stuhood Sep 10, 2018

Member

This is close, but won't work as it stands.

It's important to differentiate the scala version that Zinc's classpath uses (which is whatever the pantsbuild/pants repo is using: 2.11 currently; hence all of the hardcoded references to 2.11 in here), from the version of scala that someone using pants is using (which is configured on the ScalaPlatform subsystem with the --version flag, which is fully qualified as --scala-version).

Hardcoding 2.11 for this particular case ignores the ScalaPlatform --version . So rather than using ScalaPlatform._create_compiler_jardep('2.11'), you should imitate the example set by the ScalaPlatform._tool_classpath(..) method, which consumes the --version option internally. You will probably need to expose new methods over there to do that.

hasher.update(os.path.relpath(cp_entry, self.workdir))
key = hasher.hexdigest()[:12]

return os.path.join(self.workdir, 'zinc', key)

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

Let's throw in a compiler-bridge dir between zinc and key

PathGlobs(tuple([self._make_relative(jar) for jar in bootstrapper_args])),
text_type(self.workdir)
),))
input_jars_digest = context._scheduler.merge_directories(

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

capture_snapshots returns one element per element in the tuple it's called with, so input_jar_snapshots here is a tuple containing exactly one thing, so you should be able to just [0] it out, rather than requesting they be merged


def _make_relative(self, path):
"""A utility function to create relative paths to the work dir"""
return fast_relpath(path, self.workdir)

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

hermetic execution generally happens relative to the buildroot, rather than workdir; you probably want to get_buildroot() here rather than self.workdir (and below, where you capture a Snapshot)

output_files=(self._make_relative(bridge_jar),),
output_directories=(self._make_relative(self._compiler_bridge_cache_dir),),
description="bootstrap compiler bridge.",
jdk_home=text_type(self.dist.home),

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

Does this work without text_type? If not, we should probably make Distribution.home always a string, rather than converting here

description="bootstrap compiler bridge.",
jdk_home=text_type(self.dist.home),
)
res = context.execute_process_synchronously(req, 'zinc-subsystem', [WorkUnitLabel.COMPILER])

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

This doesn't throw if the command fails; you probably either want to make a product_request inside a workunit (possibly making a helper function to do that), or check res for its exit code. I'd lean towards the latter (or making execute_process_synchronously raise if the command failed), because I can see this being a common mistake. Sorry for the bad API!

This comment has been minimized.

@blorente

blorente Sep 10, 2018

Contributor

I think I lean against making execute_process_synchronously raise an exception, as then it would be doing judgement calls about the value of the return code. I think the best thing to address this is to create a helper function to call the process and raise if it fails, or something of the sort.

)
res = context.execute_process_synchronously(req, 'zinc-subsystem', [WorkUnitLabel.COMPILER])

#TODO(borja) We should only materialize if we are running locally

This comment has been minimized.

@illicitonion

illicitonion Sep 7, 2018

Contributor

context has an options field, from which you can find out whether both: 1. Zinc's execution-strategy is hermetic, and 2. whether a remote-execution-server is set.

@blorente

This comment has been minimized.

Copy link
Contributor

blorente commented Sep 10, 2018

Regarding #6463 (comment),
I chose to extract a helper function. It should probably not go on Zinc, what would be the best place for it?

@blorente blorente changed the title [WIP] Consume the bootstrapper and modify zinc to allow remote exec Consume the bootstrapper and modify zinc to allow remote exec Sep 10, 2018

@blorente

This comment has been minimized.

Copy link
Contributor

blorente commented Sep 10, 2018

This is ready for review @stuhood @ity @dotordogh @illicitonion

@illicitonion illicitonion requested review from stuhood , ity and dotordogh Sep 10, 2018

@illicitonion
Copy link
Contributor

illicitonion left a comment

Generally looks good to me! :)

# TODO: Kill zinc-cache-dir: https://github.com/pantsbuild/pants/issues/6155
# But for now, this will probably fail remotely because the homedir probably doesn't exist.
zinc_args.extend(['-zinc-cache-dir', self._zinc_cache_dir])
#zinc_args.extend(['-compiler-interface', compiler_interface])

This comment has been minimized.

@illicitonion

illicitonion Sep 10, 2018

Contributor

Remove, rather than comment :)

This comment has been minimized.

@stuhood
@@ -399,3 +400,14 @@ def execute_process_synchronously(self, execute_process_request, name, labels=No
workunit.output("stderr").write(result.stderr)
workunit.set_outcome(WorkUnit.FAILURE if result.exit_code else WorkUnit.SUCCESS)
return result

def execute_process_synchronously_or_raise(self, execute_process_request, name, labels=None):

This comment has been minimized.

@illicitonion

illicitonion Sep 10, 2018

Contributor

Maybe rename the other to fallibly_execute_process_synchronously or execute_process_synchronously_without_raising or similar, so it's obvious that if you don't want that behaviour you should look for another function?

@@ -154,7 +154,7 @@ def home(self):
if self._is_executable(os.path.join(jdk_dir, 'bin', 'javac')):
home = jdk_dir
self._home = home
return self._home
return text_type(self._home)

This comment has been minimized.

@illicitonion

illicitonion Sep 10, 2018

Contributor

Ideally we should have this guarantee when we construct the object... What codepaths fail if you remove this?

cc @Eric-Arellano

@stuhood
Copy link
Member

stuhood left a comment

Thanks!


@classmethod
def register_options(cls, register):
super(Zinc.Factory, cls).register_options(register)

zinc_rev = '1.0.3'


shader_rules = [

This comment has been minimized.

@stuhood

stuhood Sep 10, 2018

Member

A lot of the indentation in here seems to have changed unnecessarily. Would make it easier to review if you reverted that.

JarDependency('org.pantsbuild', 'zinc-extractor_2.11', '0.0.4')
])

# TODO(borja) is this the right way to do it? Alternative is fully specifying the version by hand:

This comment has been minimized.

@stuhood

stuhood Sep 10, 2018

Member

This is close, but won't work as it stands.

It's important to differentiate the scala version that Zinc's classpath uses (which is whatever the pantsbuild/pants repo is using: 2.11 currently; hence all of the hardcoded references to 2.11 in here), from the version of scala that someone using pants is using (which is configured on the ScalaPlatform subsystem with the --version flag, which is fully qualified as --scala-version).

Hardcoding 2.11 for this particular case ignores the ScalaPlatform --version . So rather than using ScalaPlatform._create_compiler_jardep('2.11'), you should imitate the example set by the ScalaPlatform._tool_classpath(..) method, which consumes the --version option internally. You will probably need to expose new methods over there to do that.

custom_rules=shader_rules)
'scala-library',
classpath=[
ScalaPlatform._create_runtime_jardep('2.11')

This comment has been minimized.

@stuhood

stuhood Sep 10, 2018

Member

Ditto above. This is user-defined.

JarDependency(
org='org.scala-lang',
name='scala-reflect',
rev='2.11.12',

This comment has been minimized.

@stuhood

stuhood Sep 10, 2018

Member

Ditto. Needs to match the user-defined value.

# TODO: Kill zinc-cache-dir: https://github.com/pantsbuild/pants/issues/6155
# But for now, this will probably fail remotely because the homedir probably doesn't exist.
zinc_args.extend(['-zinc-cache-dir', self._zinc_cache_dir])
#zinc_args.extend(['-compiler-interface', compiler_interface])

This comment has been minimized.

@stuhood

stuhood added a commit that referenced this pull request Sep 11, 2018

Add bootstrapper jar to compile the compile-bridge. (#6462)
### Problem

The `_zinc_cache_dir` option relies on the homedir existing to work, which is a blocker for remoting.
Discussion in #6155 .

### Solution

Implement a new Scala tool, `bootstrapper`, that takes the relevant jars as arguments and compiles the compiler bridge. This will be called, possibly remotely, from the `zinc` subsystem.

### Result

- A bit of refactoring of the `compiler` project.
- The creation of a `bootstrapper` target.

This PR is related to #6463

### Acknowledgements

This work was made together with @dotordogh and @illicitonion. Even if we used my fork for convenience, we programmed most of the code together. This work is based on @ity's work on the same problem.

@blorente blorente force-pushed the blorente:bdd/6155-python branch 2 times, most recently from 144281b to b07f8ae Sep 11, 2018

@blorente blorente force-pushed the blorente:bdd/6155-python branch from 5f1e1d5 to 8867080 Sep 12, 2018

context._scheduler.materialize_directories((
DirectoryToMaterialize(get_buildroot(), res.output_directory_digest),
))
return bridge_jar

This comment has been minimized.

@illicitonion

illicitonion Sep 12, 2018

Contributor

This should return a ClasspathEntry whose path is bridge_jar, and whose directory_digest is req.output_directory_digest

Then the consumer should merge the directory digest into its set of directory digests :)

@blorente blorente force-pushed the blorente:bdd/6155-python branch from 8867080 to 79cb42b Sep 12, 2018

stuhood added a commit that referenced this pull request Sep 12, 2018

Apply workaround similer to #6409 to bootstrapper (#6498)
## Problem & Solution

The new zinc-bootstrapper jar had the same issue as the compiler, described in #6160.
This PR applies a similar workaround to #6409, as a step towards passing regex-based CI tests in #6463.

This needs a pants committer to publish zinc-bootstrapper.
@stuhood
Copy link
Member

stuhood left a comment

Thanks! For more information on how scala versions are configured, please see: https://www.pantsbuild.org/scala.html#scala-version

.copy(intransitive=True)
])

register_scala_tools_for_versions(

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

Since you're calling this method with lists, it doesn't look like it needs to be a method. Alternatively, could move the loops out and have just the register call be wrapped in a method.

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

:return: The absolute path to the compiled scala-compiler-bridge jar.
"""
bridge_jar = os.path.join(self._compiler_bridge_cache_dir, 'scala-compiler-bridge.jar')
is_executing_remotely = context.options.for_global_scope().remote_execution_server

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

I'm not sure that this decision needs to be made explicitly here.

I think that what this method can do instead is always return a ClasspathEntry for the bridge jar, and have logic that looks approximately like:

if os.path.exists(bridge_jar):
  # Snapshot the bridge jar with `scheduler.capture_snapshot` and return a `ClasspathEntry`.
else:
  # Run `_run_bootstrapper`, materialize locally, and return a `ClasspathEntry`.

The only downside of this simplification is that when we're running completely remotely, we don't technically need to materialize the jar locally. But I think not having the is_executing_remotely logic is worth it.

This comment has been minimized.

@blorente

blorente Sep 12, 2018

Contributor

I honestly could be persuaded either way, don't really have a strong intuition about the cost of materializing.

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

The overheads of either capturing or materializing a Snapshot shouldn't be relevant as long as it is memoized to only occur once per pants run.

# But for now, this will probably fail remotely because the homedir probably doesn't exist.
zinc_args.extend(['-zinc-cache-dir', self._zinc_cache_dir])
compiler_bridge_classpath_entry = self._zinc.compile_compiler_bridge(self.context)
zinc_args.extend(["-compiled-bridge-jar", compiler_bridge_classpath_entry.path])

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

Single quotes for consistency here probably.

)
return context.execute_process_synchronously_or_raise(req, 'zinc-subsystem', [WorkUnitLabel.COMPILER])

def compile_compiler_bridge(self, context):

This comment has been minimized.

@stuhood

stuhood Sep 12, 2018

Member

This method is called for every zinc compile. Either the zinc task should memoize it, or this subsystem should. Following the example from #6491, I'd suggest memoizing it here by marking this method @memoized.

@blorente blorente force-pushed the blorente:bdd/6155-python branch from 10a0cba to 3e26ea8 Sep 13, 2018

@blorente blorente force-pushed the blorente:bdd/6155-python branch from 3e26ea8 to 8b13014 Sep 13, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 13, 2018

Had an offline conversation that indicated that there was one more perf improvement to be made here before landing. Will hold off.

@blorente blorente force-pushed the blorente:bdd/6155-python branch from 6921d8f to 0bdda82 Sep 14, 2018

@blorente blorente force-pushed the blorente:bdd/6155-python branch from 0bdda82 to 5414948 Sep 14, 2018

@blorente

This comment has been minimized.

Copy link
Contributor

blorente commented Sep 14, 2018

This passes CI with reasonable times, but I'm still a bit hesitant about the way we retrieve the necessary jars. Currently, we register a tool scalac for every version of scala we might want here. Scalac brings with it all the jars we need (scala-compiler, scala-library and scala-reflect). The part I'm iffy about is how to retrieve them. Currently, we just do string matching on scalac's classpath here. I can't think of a situation in which this doesn't work, but string matching is not usually desirable, so I'd like an extra pair of eyes on this (@stuhood )?

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 14, 2018

Currently, we just do string matching on scalac's classpath here. I can't think of a situation in which this doesn't work, but string matching is not usually desirable, so I'd like an extra pair of eyes on this (@stuhood )?

We used to use this zinc code to do it:

/**
* Distinguish the compiler and library jars.
*/
def splitScala(jars: Seq[File], excluded: Set[String] = Set.empty): Option[ScalaJars] = {
val filtered = jars filterNot (excluded contains _.getName)
val (compiler, other) = filtered partition (_.getName matches ScalaCompiler.pattern)
val (library, extra) = other partition (_.getName matches ScalaLibrary.pattern)
if (compiler.nonEmpty && library.nonEmpty) Some(ScalaJars(compiler(0), library(0), extra)) else None
}

So two ways to think about this:

  1. We could continue to have the zinc code do it by just passing in the entire scala compiler classpath
  2. It's fine for the python code to do it the same way the scala code used to.

I'm fine with either.

@stuhood stuhood merged commit 8e340d8 into pantsbuild:master Sep 14, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment