Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc.scaladoc very large command-line option sets passed via external file #7620

Conversation

Projects
None yet
3 participants
@thoward
Copy link
Contributor

commented Apr 25, 2019

Problem

There is an upper limit to the size of commandline options that can be passed to the Scaladoc command. For targets that involve a lot of source files and classpaths, this can go over the limits, causing the job to silently fail. See issue #7435 for details.

Solution

Detect option sets that are larger than the limit, and write them out to a external text file, then pass that text file to the Scaladoc command.

Result

This allows users to run Scaladoc against large dependency graphs, especially when --combined is used (required for cross-linked Scaladocs). It causes an additional transient artifact to be written to the .pants.d/ directory. That file can be large (example, in my use case, which led me to deal with this error, the option set was a 1MB string, and thus, created a 1MB text file as a side effect of the doc build process).

This file is left in place, for debugging purposes, but it's truncated and overwritten on subsequent runs. That means, only one concurrent run can happen in the same directory/repository, but that's already true for --combined builds, which output to a fixed directory location and so aren't concurrency-tolerant.

@cosmicexplorer
Copy link
Contributor

left a comment

This file is left in place, for debugging purposes, but it's truncated and overwritten on subsequent runs. That means, only one concurrent run can happen in the same directory/repository, but that's already true for --combined builds, which output to a fixed directory location and so aren't concurrency-tolerant.

But is it correct that the current iteration of this PR would break some current functionality when --combined is not on? Is it worth considering only enabling this feature when --combined is on?

Show resolved Hide resolved src/python/pants/backend/jvm/tasks/scaladoc_gen.py Outdated
# practical middle ground would be the MAX_ARG_STRLEN fixed value.
#
# So, given that rationale, if the arg list is longer than MAX_ARG_STRLEN
# we should put it in an external file, just in case.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 25, 2019

Contributor

Is there a reason we wouldn't want to dump everything into an argfile on every invocation? It would seem to remove the guesswork here, unless using an argfile removes some other functionality.

This comment has been minimized.

Copy link
@thoward

thoward Apr 26, 2019

Author Contributor

Could use an arg file for every invocation, but it would be unnecessary for most invocations. Running this with --combined across large dependency graphs is kind of a "large Scala monorepo"-specific problem. I'm not sure how Pant's user base looks. Are most users in large monorepos with lots of Scala? How commonly is this task used in this manner?

Not that there's much harm in using an arg file when it's not needed. I was just following the principle of least change (e.g. change the behaviour as little as possible to get the desired effect).

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 26, 2019

Member

I agree with @cosmicexplorer : it's nice to try and avoid this, but I don't think it's worth the extra codepath, so would just go with the argfile approach by default.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 27, 2019

Contributor

The way I usually think about these situations is that command-line tools offer interfaces for a user as well as interfaces for other programs -- while using arguments on the actual command line is likely the more common and preferred way to invoke this tool by hand, in Pants, we can feel free to use approaches such as arg files, or maybe config files generated on the fly, etc.

Show resolved Hide resolved src/python/pants/backend/jvm/tasks/scaladoc_gen.py
Update src/python/pants/backend/jvm/tasks/scaladoc_gen.py
Co-Authored-By: thoward <thoward37@gmail.com>
@thoward

This comment has been minimized.

Copy link
Contributor Author

commented Apr 26, 2019

This file is left in place, for debugging purposes, but it's truncated and overwritten on subsequent runs. That means, only one concurrent run can happen in the same directory/repository, but that's already true for --combined builds, which output to a fixed directory location and so aren't concurrency-tolerant.

But is it correct that the current iteration of this PR would break some current functionality when --combined is not on? Is it worth considering only enabling this feature when --combined is on?

I don't think it would break any functionality when --combined is off. The problem isn't specific to --combined, it's just endemic to it, because when running a combined build, so many different files & classpaths are needed that the arg list gets very large. In theory, a single target could also exceed these limits (though in practice I doubt that would happen), so if we made this behaviour only applicable to --combined runs, then it would not help those hypothetically very large single targets.

# practical middle ground would be the MAX_ARG_STRLEN fixed value.
#
# So, given that rationale, if the arg list is longer than MAX_ARG_STRLEN
# we should put it in an external file, just in case.

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 26, 2019

Member

I agree with @cosmicexplorer : it's nice to try and avoid this, but I don't think it's worth the extra codepath, so would just go with the argfile approach by default.

java_executor = SubprocessExecutor(DistributionLocator.cached())
runner = java_executor.runner(jvm_options=self.jvm_options,
classpath=tool_classpath,
main='scala.tools.nsc.ScalaDoc',
args=args)

self.context.log.debug("SCALADOCS COMMAND: {}".format(runner.command))

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 26, 2019

Member

FTR: Fine to add this, but it should also be rendered in ./pants server --open, and possible to extract that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.