Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build pipelining #5703

Merged
merged 1 commit into from
Aug 6, 2020
Merged

Build pipelining #5703

merged 1 commit into from
Aug 6, 2020

Conversation

eed3si9n
Copy link
Member

Ref sbt/zinc#744

pipelining

This implements ThisBuild / usePipelining, which configures incremental subproject pipelining available from Zinc 1.4.0.

The basic idea is to start subproject compilation as soon as pickle JARs (early output) becomes available. This is in part enabled by Scala compiler's new flags -Ypickle-java and -Ypickle-write.

The other part of magic is the use of Def.promise:

earlyOutputPing := Def.promise[Boolean]

This notifies compileEarly task, which to the rest of the tasks would look like a normal task but in fact it is promise-blocked. In other words, without calling full compile task together, compileEarly will never return, forever waiting for the earlyOutputPing.

@eed3si9n
Copy link
Member Author

eed3si9n commented Aug 4, 2020

Rebasing on top of develop since it became off sync. I'll merge this once it passes the tests.

@@ -87,32 +87,38 @@ object ClientTest extends AbstractServerTest {
assert(client("compile;willFail;willSucceed") == 1)
}
test("compi completions") { _ =>
val expected = Vector(
val expected = Set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you switch this back to Vector? Part of the spec for client completions is that the outputs be returned in sorted order.

Copy link
Contributor

@eatkins eatkins Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to merge without re-running CI, I can switch back to Vector in one of my outstanding branches.

> dep/compile

> use/checkPickle
$ sleep 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the sleep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We time wrap the stamp, and some of the platforms do not have the system time resolution, so I think I've put that in there to make sure things validate or invalidate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So maybe we could set the modified time manually instead of sleeping?

@@ -0,0 +1,8 @@
# done this way because last modified times often have ~1s resolution
Copy link
Contributor

@eatkins eatkins Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to define a different read stamps instance that always hashes rather than using sleep to ensure that the modified time changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively you could just set the modified times manually of the files after the copy-file. I am nagging about this because CI is already very slow and I'd like to do whatever possible to slow the rate of growth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pipelining is somewhat special case situation because it's new and timing needs to be tested for both Linux and Windows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where and why timing matters. My assumption is that timestamps matter because the second time you run compile after a copy-file, it isn't guaranteed to run due to file hash caching using timestamps. Is that not the reason for the sleep?

Regardless, it seems to me that we should have a setting to disable timestamp caching since it could break incremental compilation for file systems that don't even support setting the last modified (I think that's a thing).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible that I had misattributed some during-development failure to timestamp issue, and sleep is no longer needed. It would be good to probably confirm and smoke out exactly what invalidation gets affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed sleep in 002f97c.

@eatkins
Copy link
Contributor

eatkins commented Aug 4, 2020

Would it be possible to move a lot of the changes in Defaults.scala into a separate file? It already takes forever to format and compile Defaults.scala. It doesn't necessarily have to be in this PR, but I feel like all of the classpath and compile related settings should be moved into different files.

@eed3si9n
Copy link
Member Author

eed3si9n commented Aug 4, 2020

Would it be possible to move a lot of the changes in Defaults.scala into a separate file? It already takes forever to format and compile Defaults.scala. It doesn't necessarily have to be in this PR, but I feel like all of the classpath and compile related settings should be moved into different files.

I actually did try to do that by creating main/src/main/scala/sbt/internal/ClasspathImpl.scala and moving the not-so-relevant implementation details in there. The net change should be negative for Defaults.scala in this PR.

Ref sbt/zinc#744

This implements `ThisBuild / usePipelining`, which configures subproject pipelining available from Zinc 1.4.0.

The basic idea is to start subproject compilation as soon as pickle JARs (early output) becomes available. This is in part enabled by Scala compiler's new flags `-Ypickle-java` and `-Ypickle-write`.

The other part of magic is the use of `Def.promise`:

```
earlyOutputPing := Def.promise[Boolean],
```

This notifies `compileEarly` task, which to the rest of the tasks would look like a normal task but in fact it is promise-blocked. In other words, without calling full `compile` task together, `compileEarly` will never return, forever waiting for the `earlyOutputPing`.
@igor-ramazanov
Copy link

If this setting brings only advantages (speeding up) and no disadvantages, then does it make a sense making it enabled by default?

@eed3si9n
Copy link
Member Author

@igor-ramazanov The speedup effect seem to vary depending on the subproject layout and available CPU + heap. If we imagine a tree 🌴 where there is a big chunk of code in core (without macro or Java) and a bunch of small apps that use the core, pipeline can bring speedup. On the other hand, if the tree is reversed and you have a bunch of small utility subprojects and a big app at the end, pipeline doesn't seem to help much.

It's also not without downsides. Since the compilation tasks would happen almost arbitrary order, managing compilation failure is more complicated, and it's currently implemented by throwing exceptions and shutting down threads, which could get messy.

@igor-ramazanov
Copy link

@eed3si9n Thank you for the description! It's much clearer now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants