New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compile.rsc] Add strategy for compiling with Rsc and Zinc #6408

Merged
merged 12 commits into from Sep 18, 2018

Conversation

Projects
None yet
5 participants
@baroquebobcat
Copy link
Contributor

baroquebobcat commented Aug 28, 2018

Rsc is an experimental Scala compiler focused on compilation speed (https://github.com/twitter/rsc).

Rsc implements so-called "outlining" for Scala, i.e. the process of computing type signatures for public, protected and private[package] members. These type signatures are saved to disk in SemanticDB format.

Rsc is also part of a toolchain that can produce jars containing type signatures in the format that the Scala compiler can understand. Using that tool chain we hope to be able to parallelize Scala compiles by first computing outlines for the dependency tree and then compiling the entire dependency tree in parallel.

This patch provides a first cut at implementing support for that toolchain.

@xeno-by
Copy link
Contributor

xeno-by left a comment

Super glad to see this. Thank you for your hard work! 🎉

Since my knowledge of Pants internals is superficial, this review focuses on naming, since I believe it would be useful to establish a common dictionary of terms across Scalameta, Rsc and Pants.

I've also left a few implementation-related comments, but those are pretty minor.

log_dir, zinc_args_file, sources)
self.rsc_mjar_file = rsc_mjar_file
self.rsc_outline_dir = rsc_outline_dir
self.outline_dir = outline_dir

This comment has been minimized.

@xeno-by

xeno-by Aug 28, 2018

Contributor

I'd like propose the following directory structure of Rsc-based artifacts and naming scheme for those artifacts:

  • ... referred to as target_workdir (in this PR: the same)
    • rsc_compile (in this PR: mjar)
      • index referred to as rsc_index_dir (in this PR: semanticdb aka rsc_outline_dir)
      • outline referred to as rsc_outline_dir (in this PR: rsc aka outline_dir)
      • mjar.jar referred to as rsc_mjar_file (in this PR: the same)
    • zinc_compile
      • ...

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

So, like this?

rsc_compile/
  - index/ 
       semanticdbs from metacp/metai
  - outline/
       semanticdbs from running rsc/metai
  mjar.jar -- the mjar produced from the outlines
zinc_compile/
  <usual zinc compile elements>

This comment has been minimized.

@xeno-by

xeno-by Aug 29, 2018

Contributor

Yes. Thank you!

Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py
for cp_entry in jvm_lib_jars_abs:
metai_classpath.append(desandboxify_pantsd_loc(status_elements[cp_entry]))

metacp_classpath_entries = metai_classpath

This comment has been minimized.

@xeno-by

xeno-by Aug 28, 2018

Contributor

Perhaps we can replace metacp_classpath_entries with metai_classpath since they seem to do the same thing?

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

That's a good point.

name = "bin",
main = "org.pantsbuild.testproject.mutual.A",
dependencies = [":mutual"],
)

This comment has been minimized.

@xeno-by

xeno-by Aug 28, 2018

Contributor

This doesn't seem to test the situation when one Rsc-based compilation depends on another. Perhaps, that was intended, but I figured it'd make sense to note this.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

no, this just covers the mutually dependent files case. I should add one for multiple rsc compiles too

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

'jvm-platform': {'compiler': 'rsc'}
}

with self.temporary_workdir() as workdir:

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

All pants runs run in a custom workdir by default, so it shouldn't be necessary to use run_pants_with_workdir unless you want to run multiple runs in the same temporary workdir. Can use run_pants instead.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

Fair point. I think this comes from copying from the javac integration tests. Those probably have the same issue.

def relpathify(collection):
# Converts a list of paths into relpath'd paths.
buildroot = get_buildroot()
def relpath(p):

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

The error message from fast_relpath should already give this info (and a bit more), so I don't think this method carries its weight.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

I'll Inline it then. The "x does not contain y" message was confusing for me at the time, but I think the error will be confusing either way.

return [relpath(c) for c in collection]


def relpathalone(path):

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

Ditto.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

k. The purpose of these was to explicitly call out where I was relpathing without adding too much clutter.

scalameta_version = '4.0.0-M10'

# TODO: it would be better to have a less adhoc approach to handling
# optional dependencies. See: https://github.com/pantsbuild/pants/issues/6390

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

Are they actually optional? If they're required to use metacp for a particular usecase, it seems like maybe a separate artifact for the main method that explicitly has these deps would be a good idea.

This comment has been minimized.

@xeno-by

xeno-by Aug 29, 2018

Contributor

They are optional in the sense of <optional>true</optional> in the corresponding POM file. For example: https://oss.sonatype.org/content/repositories/releases/org/scalatest/scalatest_2.11/3.0.0/scalatest_2.11-3.0.0.pom.

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

But why would your non-test code depend on scalatest? Based on my understanding of poms/maven, that should be a "test scoped" dependency, not an optional dependency...?

This comment has been minimized.

@xeno-by

xeno-by Aug 29, 2018

Contributor

@baroquebobcat Please correct me if I'm wrong, but here's I think what's going on here.

Metacp eagerly converts all classfiles on a given classpath into SemanticDB type signatures. As a result, for every jar_library, it typically needs all libraries that it depends on - both regular and optional dependencies.

Now, if I understand correctly, self._zinc.compile_classpath that is used to compute the main argument of Metacp, doesn't include optional dependencies.

Again, if I understand correctly, to work around this, this code gets together the optional dependencies of some popular 3rdparty libraries used in Source (workaround-metacp-dependency-classpath) and passes them to all Metacp invocations - regardless of whether these invocations actually depend on those things or not. In many cases that's an overkill, like with scalatest and non-test targets.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

Originally, this was just a list of optional dependencies for scalac and scala-library. I think I ended up adding more at some point. I'll xx the additional ones.

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

I think that requiring folks to download all optional dependencies to use this compilation strategy is probably going to be a blocker in the long run. Should obviously leave this in place for now, but I don't think the solution described in #6390 is the right one.

(and thanks for the offline discussion to clear this up!)

This comment has been minimized.

@xeno-by

xeno-by Sep 12, 2018

Contributor

Here's a workaround that makes Metacp ignore missing dependencies: scalameta/scalameta#1767.

JarDependency(org = 'org.testng', name = 'testng', rev = '6.8.7'),
JarDependency(org = 'org.scalacheck', name = 'scalacheck_2.11', rev = '1.13.1'),
JarDependency(org = 'org.jmock', name = 'jmock-legacy', rev = '2.5.1'),
JarDependency(org = 'org.easymock', name = 'easymockclassextension', rev = '3.1'),

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

This definitely doesn't seem like it should be a runtime dep of this tool.

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

Upgrading to Scalameta 4.0.0-RC1 and using --stub-broken-signatures should fix that. Although I'm not sure whether you guys would prefer to do that in this pull request or in a follow-up.

super(RscCompile, self).create_empty_extra_products()

compile_classpath = self.context.products.get_data('compile_classpath')
classpath_product = self.context.products.get_data('rsc_outline_classpath')

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

It would be good to have an explanation somewhere of what the rsc_outline_classpath product is, and what its relation to the runtime_classpath is. If it's not meant to be exposed outside of the task, should it actually be a product?

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

That's a fair point. I don't expect that the mjars will be used outside of rsc. I mainly made it a product because all of the code that constructs and updates classpaths use products to represent them. I suppose I could see what making it an instance variable does instead.

# rsc is specified.
return BaseZincCompile.execute(self)

def rsc_outline_key_for_target(self, compile_target):

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

Should keep methods private _* where possible.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

Done.

safe_delete(ctx.analysis_file)
# Work around https://github.com/pantsbuild/pants/issues/3670
safe_rmtree(ctx.classes_dir)

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

This method looks like a near-copy of the ZincCompile version of the method. Would be good to dedupe it... maybe before landing this.

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

That's fair.

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

I think that of all of my comments, this is the only real blocker for landing this. The rest would be reasonable for a followup.

def create_compile_context(self, target, target_workdir):
sources = self._compute_sources_for_target(target)
return [
RscCompileContext(

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

Might be sufficiently large to justify using named arguments / kwargs here.

]

def _runtool(self, main, tool_name, args, distribution, tgt=None, input_files=tuple(), output_dir=None):
if self.execution_strategy == self.HERMETIC:

This comment has been minimized.

@stuhood

stuhood Aug 29, 2018

Member

Would it be reasonable to support only one execution mode here?

This comment has been minimized.

@xeno-by

xeno-by Aug 29, 2018

Contributor

What's the difference between hermetic and non-hermetic (?) execution modes? It fair to say that the former relates to v2 and the latter relates to v1? Or it's about the former being compatible with remoting and the latter being incompatible?

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

hermetic is v2 remoting. I like having the fallback of non-hermetic (subprocess or nailgun), but you are right that it makes things more complex.

safe_rmtree(ctx.classes_dir)

dep_context = DependencyContext.global_instance()
tgt, = vts.targets

This comment has been minimized.

@wisechengyi

wisechengyi Aug 29, 2018

Contributor

Is the comma intentional? there's another instance below

>>> t, = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> t, = [5,6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

It comes from upstream. I'm not sure why it's there but I didn't want to change it here without updating there. I think I'll leave it in when I dedupe the above so as not to further confuse the diff.

metacp_result = json.loads(metacp_stdout)
metai_classpath = []

def desandboxify_pantsd_loc(path):

This comment has been minimized.

@wisechengyi

wisechengyi Aug 29, 2018

Contributor

nit: assuming pantsd means workdir here, can we use workdir or _pants_d_?

This comment has been minimized.

@baroquebobcat

baroquebobcat Aug 29, 2018

Contributor

I tried that and it didn't work with tests because of how tests place their temp workdirs. I have an idea for a different approach now though.

def stdout_contents(wu):
if isinstance(wu, FallibleExecuteProcessResult):
return wu.stdout.rstrip()
with open(wu.output_paths()['stdout']) as f:

This comment has been minimized.

@illicitonion

illicitonion Aug 29, 2018

Contributor

For py3-ness, I think you need a , 'rb' on this open.


epr = ExecuteProcessRequest(
argv=tuple(cmd),
env=dict(),

This comment has been minimized.

@illicitonion

illicitonion Aug 29, 2018

Contributor

A bunch of these are default values and can be dropped :)

from pants_test.backend.jvm.tasks.jvm_compile.base_compile_integration_test import BaseCompileIT


class RscCompileIntegration(BaseCompileIT):

This comment has been minimized.

@illicitonion

illicitonion Aug 29, 2018

Contributor

I'd love to see tests of java, scala, and mixed java+scala targets, as there appears to be code which covers each of them?

pathglobs.extend(input_files)
root = PathGlobsAndRoot(
PathGlobs(tuple(pathglobs)),
text_type(get_buildroot()))

This comment has been minimized.

@illicitonion

illicitonion Aug 29, 2018

Contributor

get_buildroot() should already return a text_type (and below)

@stuhood
Copy link
Member

stuhood left a comment

Thanks! As soon as this is green with a minimal set of tests, lets land it and begin iterating.

classpath=[
JarDependency(
org='org.scalameta',
name='metai_2.11',

This comment has been minimized.

@stuhood

stuhood Sep 17, 2018

Member

As a followup task, we need to figure out which of these tools need to match the scala version that is in use (ie, which mix with the classpath of the code under compilation, and which don't), and extract settings for them.

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

None of these tools need to match the Scala version used to compile Scala targets. When the output of the tool varies between target Scala version, i.e. for Rsc and Mjar, the corresponding tool has an --abi CLI parameter (and an accompanying field in the Settings object if used from Scala or Java).

This comment has been minimized.

@baroquebobcat

baroquebobcat Sep 18, 2018

Contributor

I've noted the abi param in my list of things to break out into further tickets.

Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/rsc/rsc_compile.py
@xeno-by
Copy link
Contributor

xeno-by left a comment

I left a few more suggestions, but I don't have a preference for whether they need to be addressed in this PR or in followups.

zinc_args_file,
sources,
index_dir,
outline_dir):

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

Let's call these parameters and associated fields rsc_index_dir and rsc_outline_dir? I believe these names would benefit from qualification, since "index" has many meanings, and outlines can be produced by at least three more tools - semanticdb-scalac, semanticdb-javac and metacp.

This comment has been minimized.

@baroquebobcat

baroquebobcat Sep 18, 2018

Contributor

Done.

JarDependency(org = 'org.testng', name = 'testng', rev = '6.8.7'),
JarDependency(org = 'org.scalacheck', name = 'scalacheck_2.11', rev = '1.13.1'),
JarDependency(org = 'org.jmock', name = 'jmock-legacy', rev = '2.5.1'),
JarDependency(org = 'org.easymock', name = 'easymockclassextension', rev = '3.1'),

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

Upgrading to Scalameta 4.0.0-RC1 and using --stub-broken-signatures should fix that. Although I'm not sure whether you guys would prefer to do that in this pull request or in a follow-up.

classpath=[
JarDependency(
org='org.scalameta',
name='metai_2.11',

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

None of these tools need to match the Scala version used to compile Scala targets. When the output of the tool varies between target Scala version, i.e. for Rsc and Mjar, the corresponding tool has an --abi CLI parameter (and an accompanying field in the Settings object if used from Scala or Java).

elif compile_target.has_sources('.scala'):
return "rsc({})".format(compile_target.address.spec)
else:
raise TaskError('unexpected target for compiling with rsc_outline .... {}'.format(compile_target))

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

"For compiling with rsc"?

'--dependency-classpath', os.pathsep.join(scalac_classpath_path_entries),
# NB: The directory to dump the semanticdb jars generated by metacp.
# TODO: break this out into a separate job so that we can apply
# it once per 3rdparty target.

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

Is there a ticket for that? If not, could you open one and mention it in the comment?

# Currently, the metai step depends on the fact that materializing
# ignores existing files. It should write the files to a different
# location, either by providing inputs from a different location,
# or invoking a script that does the copying

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

How much of a blocker is this in the grand scheme of things?

This comment has been minimized.

@baroquebobcat

baroquebobcat Sep 18, 2018

Contributor

I thought it'd be a big one, but right now it doesn't seem to be an issue. Once remoting's validation is tightened up further it may become one though.

This comment has been minimized.

@xeno-by

xeno-by Sep 18, 2018

Contributor

Do you think there should be a ticket for this?

)

# Create the zinc compile jobs.
# - Scala zinc compile jobs depend on the results of running rsc_outline on the scala target.

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

"The results of running rsc"?

return ccs[1]

def create_compile_context(self, target, target_workdir):
# workdir layout:

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

Thank you for documenting this!

input_files=input_files_directory_digest,
output_files=tuple(),
output_directories=(output_dir,),
timeout_seconds=15*60,

This comment has been minimized.

@xeno-by

xeno-by Sep 17, 2018

Contributor

Is there something special about a timeout of 15 minutes, or it's picked simply as a period of time that is reasonably long?

This comment has been minimized.

@baroquebobcat

baroquebobcat Sep 18, 2018

Contributor

It's the reasonably long thing. It may be a current default somewhere. At some point, we should probably thread config through for it, but this patch doesn't do that.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Sep 18, 2018

There was a definitely-unrelated flake on an integration test shard. Restarted it.

@baroquebobcat baroquebobcat merged commit 0614842 into pantsbuild:master Sep 18, 2018

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment