New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci:last-only] Remove parallel collections from scala-library #5603

Merged
merged 1 commit into from Mar 15, 2017

Conversation

Projects
None yet
8 participants
@szeiger
Contributor

szeiger commented Dec 16, 2016

They will be moved into a separate module as part of the ongoing
modularization of the Scala standard library.

@szeiger szeiger added the WIP label Dec 16, 2016

@scala-jenkins scala-jenkins added this to the 2.13.0-M1 milestone Dec 16, 2016

szeiger added a commit to scala/scala-parallel-collections that referenced this pull request Dec 16, 2016

Initial version of parallel collections from the Scala standard library
Moved over from scala/scala @ 36d6e48da258648413e50e5a9d46c94cfc062634
with the necessary adaptations to make it work as a separate module
on top of scala/scala#5603.

- Since the sequential collections cannot implement `Parallelizable`
  anymore, all conversions have to be performed via extension methods
  provided by `scala.collection.parallel.CollectionConverters._`

- `ParIterableLike.sequentially` was changed to use a `newCombiner`
  for building the parallel version from the temporary sequential
  collection.

- `Combiner` has a new convenience method `fromSequential`. `par`
  in `Parallelizable` is now implemented on top of this.

The build-related parts were copied from
https://github.com/scala/scala-swing.
@SethTisue

This comment has been minimized.

Show comment
Hide comment

@szeiger szeiger changed the title from Remove parallel collections from scala-library to [ci:last-only] Remove parallel collections from scala-library Jan 11, 2017

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Jan 11, 2017

Contributor

Rebased on current 2.13.x & added a commit to remove all tests for parallel collections

Contributor

szeiger commented Jan 11, 2017

Rebased on current 2.13.x & added a commit to remove all tests for parallel collections

@SethTisue

This comment has been minimized.

Show comment
Hide comment
@SethTisue

SethTisue Jan 11, 2017

Member

@jvican fyi on modularization

Member

SethTisue commented Jan 11, 2017

@jvican fyi on modularization

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Jan 12, 2017

Contributor

Scabot didn't care about my ci:last-only after an earlier build had already failed, but the new one is green, so this is good to go. Removing wip label.

Contributor

szeiger commented Jan 12, 2017

Scabot didn't care about my ci:last-only after an earlier build had already failed, but the new one is green, so this is good to go. Removing wip label.

@szeiger szeiger removed the WIP label Jan 12, 2017

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Jan 12, 2017

Contributor

Here are the removed tests in scala-parallel-colletions: scala/scala-parallel-collections#6

Contributor

szeiger commented Jan 12, 2017

Here are the removed tests in scala-parallel-colletions: scala/scala-parallel-collections#6

szeiger added a commit to scala/scala-parallel-collections that referenced this pull request Jan 18, 2017

Initial version of parallel collections from the Scala standard library
Moved over from scala/scala @ 36d6e48da258648413e50e5a9d46c94cfc062634
with the necessary adaptations to make it work as a separate module
on top of scala/scala#5603.

- Since the sequential collections cannot implement `Parallelizable`
  anymore, all conversions have to be performed via extension methods
  provided by `scala.collection.parallel.CollectionConverters._`

- `ParIterableLike.sequentially` was changed to use a `newCombiner`
  for building the parallel version from the temporary sequential
  collection.

- `Combiner` has a new convenience method `fromSequential`. `par`
  in `Parallelizable` is now implemented on top of this.

- All parallel collections tests are now in two subprojects, for JUnit
  and ScalaCheck tests respectively. The main codebase is moved to the
  `core` subproject. We are currently using a ScalaCheck build for 2.12
  but it will become incompatible once we start getting serious with the
  new collections library, so this is not a long-term solution. We may
  have to insource ScalaCheck into this project as well, just as we did
  for scala/scala. At least the separation of ScalaCheck and JUnit tests
  will make it easy to temporarily disable the ScalaCheck tests if no
  compatible ScalaCheck build is available.

- Running on the old Travis infrastructure to avoid test failures
  (see travis-ci/travis-ci#3775)

- The build-related parts were copied from
  https://github.com/scala/scala-swing.
@adriaanm

This comment has been minimized.

Show comment
Hide comment
@adriaanm

adriaanm Feb 7, 2017

Member

Proposed next steps in modularization: https://github.com/scala/scala/compare/eb9fdcd...adriaanm:modularize-213-step1?expand=1.

#5677 lays the groundwork in 2.12.x

Member

adriaanm commented Feb 7, 2017

Proposed next steps in modularization: https://github.com/scala/scala/compare/eb9fdcd...adriaanm:modularize-213-step1?expand=1.

#5677 lays the groundwork in 2.12.x

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 14, 2017

Member

Opinions on trying to get this merged before M1? Is the new module (https://github.com/scala/scala-parallel-collections) already in a state where it could be published against M1?

Member

lrytz commented Mar 14, 2017

Opinions on trying to get this merged before M1? Is the new module (https://github.com/scala/scala-parallel-collections) already in a state where it could be published against M1?

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 14, 2017

Contributor

Everything is ready except for the publishing. I haven't yet found the time to work on this again. What's the best way to bootstrap? The code in the module overlaps with what we remove here, so we can't publish the module first. We need a version of Scala without parallel collections to publish the module against.

Contributor

szeiger commented Mar 14, 2017

Everything is ready except for the publishing. I haven't yet found the time to work on this again. What's the best way to bootstrap? The code in the module overlaps with what we remove here, so we can't publish the module first. We need a version of Scala without parallel collections to publish the module against.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 14, 2017

Member

Should the module be part of scala-library-all? Then we need to build it as part of our bootstrap job, I can take care of that. But if scala-library-all goes away in 2.13, we can skip this step. In this case, we could merge this PR, release M1, and then publish the new module afterwards.

Maybe we better run the community build before merging. Could you rebase to fix the merge conflict?

Member

lrytz commented Mar 14, 2017

Should the module be part of scala-library-all? Then we need to build it as part of our bootstrap job, I can take care of that. But if scala-library-all goes away in 2.13, we can skip this step. In this case, we could merge this PR, release M1, and then publish the new module afterwards.

Maybe we better run the community build before merging. Could you rebase to fix the merge conflict?

@SethTisue

This comment has been minimized.

Show comment
Hide comment
@SethTisue

SethTisue Mar 14, 2017

Member

I would support eliminating scala-library-all. (https://github.com/search?p=4&q=scala-library-all&type=Code&utf8=✓ only shows 258 results GitHub-wide, which is not zero but seems like a fairly negligible number to me. And I've never seen anyone recommend using it — books, blogs, chat rooms, etc.)

Member

SethTisue commented Mar 14, 2017

I would support eliminating scala-library-all. (https://github.com/search?p=4&q=scala-library-all&type=Code&utf8=✓ only shows 258 results GitHub-wide, which is not zero but seems like a fairly negligible number to me. And I've never seen anyone recommend using it — books, blogs, chat rooms, etc.)

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 14, 2017

Contributor

If we keep scala-library-all then parallel-collections should be a part of it IMHO, but I don't know why we'd want to keep scala-library-all, either.

I'll rebase this PR to get it ready to merge again and port any changes that have been done in the meantime to the parallel-collections repo.

Contributor

szeiger commented Mar 14, 2017

If we keep scala-library-all then parallel-collections should be a part of it IMHO, but I don't know why we'd want to keep scala-library-all, either.

I'll rebase this PR to get it ready to merge again and port any changes that have been done in the meantime to the parallel-collections repo.

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 14, 2017

Contributor

Rebased. There was one merge conflict in the imports of WrappedArray and a bunch of scalacheck tests were moved around in #5661. Nothing new that needs to be ported to the separate module.

Contributor

szeiger commented Mar 14, 2017

Rebased. There was one merge conflict in the imports of WrappedArray and a bunch of scalacheck tests were moved around in #5661. Nothing new that needs to be ported to the separate module.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 14, 2017

Member

scalacheck tests for parallel collections got restored accidentally, probably caused by #5661

[info] Compiling 62 Scala sources to /home/jenkins/workspace/scala-2.13.x-validate-test/target/scalacheck/test-classes...
[error] /home/jenkins/workspace/scala-2.13.x-validate-test/test/scalacheck/scala/collection/parallel/ParallelHashTrieCheck.scala:21: not found: type ParHashMap
[error]   type CollType = ParHashMap[K, V]
[error]                   ^
Member

lrytz commented Mar 14, 2017

scalacheck tests for parallel collections got restored accidentally, probably caused by #5661

[info] Compiling 62 Scala sources to /home/jenkins/workspace/scala-2.13.x-validate-test/target/scalacheck/test-classes...
[error] /home/jenkins/workspace/scala-2.13.x-validate-test/test/scalacheck/scala/collection/parallel/ParallelHashTrieCheck.scala:21: not found: type ParHashMap
[error]   type CollType = ParHashMap[K, V]
[error]                   ^
@dwijnand

This comment has been minimized.

Show comment
Hide comment
@dwijnand

dwijnand Mar 15, 2017

Member

Does the decision for what's in or not scala-library-all determine what's present in the scala REPL?

Or is that a separate decision?

Should scala 2.13 include the parallel collections?

Member

dwijnand commented Mar 15, 2017

Does the decision for what's in or not scala-library-all determine what's present in the scala REPL?

Or is that a separate decision?

Should scala 2.13 include the parallel collections?

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 15, 2017

Member

Does the decision for what's in or not scala-library-all determine what's present in the scala REPL?

Good point, we do ship these modules with the downloadable distribution. If we remove scala-library-all, we have to decide what to do on this front. It would be more elegant to get them from maven-central, have a list of default dependencies, and allow users to add more.

Should scala 2.13 include the parallel collections?

The plan is to move them to a module, out of scala-library.

Member

lrytz commented Mar 15, 2017

Does the decision for what's in or not scala-library-all determine what's present in the scala REPL?

Good point, we do ship these modules with the downloadable distribution. If we remove scala-library-all, we have to decide what to do on this front. It would be more elegant to get them from maven-central, have a list of default dependencies, and allow users to add more.

Should scala 2.13 include the parallel collections?

The plan is to move them to a module, out of scala-library.

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 15, 2017

Contributor

Updated and squashed.

Contributor

szeiger commented Mar 15, 2017

Updated and squashed.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 15, 2017

Member
[info] Compiling 44 Scala sources to /home/jenkins/workspace/scala-2.13.x-validate-test/target/scalacheck/test-classes...
[error] /home/jenkins/workspace/scala-2.13.x-validate-test/test/scalacheck/scala/pc.scala:5: object parallel is not a member of package collection
[error] import scala.collection.parallel._
[error]                         ^

I guess you missed a few.. In the meantime, I started a community build:
https://scala-ci.typesafe.com/view/scala-2.13.x/job/scala-2.13.x-integrate-community-build/271/console

Member

lrytz commented Mar 15, 2017

[info] Compiling 44 Scala sources to /home/jenkins/workspace/scala-2.13.x-validate-test/target/scalacheck/test-classes...
[error] /home/jenkins/workspace/scala-2.13.x-validate-test/test/scalacheck/scala/pc.scala:5: object parallel is not a member of package collection
[error] import scala.collection.parallel._
[error]                         ^

I guess you missed a few.. In the meantime, I started a community build:
https://scala-ci.typesafe.com/view/scala-2.13.x/job/scala-2.13.x-integrate-community-build/271/console

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 15, 2017

Contributor

Bah, I thought it was only JUnit tests. I didn't run partest locally.

Contributor

szeiger commented Mar 15, 2017

Bah, I thought it was only JUnit tests. I didn't run partest locally.

@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 15, 2017

Contributor

Oh, I see. I ran the wrong set of JUnit tests.

Contributor

szeiger commented Mar 15, 2017

Oh, I see. I ran the wrong set of JUnit tests.

Remove parallel collections from scala-library
They will be moved into a separate module as part of the ongoing
modularization of the Scala standard library.

Tests specific to parallel collections will be reinstated in the new
scala-parallel-collections project as JUnit tests or pure ScalaCheck
tests (run directly from sbt).

Some test cases here contain useful tests for both, parallel collections
and other features of the standard library. They are split up into
separate JUnit tests, which leads to some new JUnit tests in this
project, too.
@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 15, 2017

Member

There will be some things to handle, as shown by the community build:

[scala-js] [error] /home/jenkins/workspace/scala-2.13.x-integrate-community-build/target-0.9.7-RC1/project-builds/scala-js-7a00b322d46fa67b6a790f73709ab9e9984ef334/tools/jvm/src/main/scala/org/scalajs/core/tools/linker/frontend/optimizer/ParIncOptimizer.scala:14: object parallel is not a member of package collection
[scala-js] [error] import scala.collection.parallel.mutable.{ParTrieMap, ParArray}
[scala-js] [error]                         ^
Member

lrytz commented Mar 15, 2017

There will be some things to handle, as shown by the community build:

[scala-js] [error] /home/jenkins/workspace/scala-2.13.x-integrate-community-build/target-0.9.7-RC1/project-builds/scala-js-7a00b322d46fa67b6a790f73709ab9e9984ef334/tools/jvm/src/main/scala/org/scalajs/core/tools/linker/frontend/optimizer/ParIncOptimizer.scala:14: object parallel is not a member of package collection
[scala-js] [error] import scala.collection.parallel.mutable.{ParTrieMap, ParArray}
[scala-js] [error]                         ^
@szeiger

This comment has been minimized.

Show comment
Hide comment
@szeiger

szeiger Mar 15, 2017

Contributor

As expected. All projects that use parallel collections will need the new project as a dependency. If they use .par they also need an additional import because that is now an extension method.

Contributor

szeiger commented Mar 15, 2017

As expected. All projects that use parallel collections will need the new project as a dependency. If they use .par they also need an additional import because that is now an extension method.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 15, 2017

Member

If we want to merge this and get a green community build, we need to add https://github.com/scala/scala-parallel-collections to the community build. Projects (or our forks) need to add a dependency, probably in a way that it only applies if the Scala version is 2.13*.

Member

lrytz commented Mar 15, 2017

If we want to merge this and get a green community build, we need to add https://github.com/scala/scala-parallel-collections to the community build. Projects (or our forks) need to add a dependency, probably in a way that it only applies if the Scala version is 2.13*.

@SethTisue

This comment has been minimized.

Show comment
Hide comment
@SethTisue

SethTisue Mar 15, 2017

Member

from https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/271/consoleFull

[info] >>> The dbuild result is------------: FAILED (failed: scala-js, twitter-util, fs2, specs2)

lots of projects depend on scala-js and specs2 especially, so those are probably masking additional downstream failures

I think we should bite the bullet here, even if it initially causes the 2.13 community build to contract substantially. we can gradually chip away at growing it again, one project at a time.

Member

SethTisue commented Mar 15, 2017

from https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/271/consoleFull

[info] >>> The dbuild result is------------: FAILED (failed: scala-js, twitter-util, fs2, specs2)

lots of projects depend on scala-js and specs2 especially, so those are probably masking additional downstream failures

I think we should bite the bullet here, even if it initially causes the 2.13 community build to contract substantially. we can gradually chip away at growing it again, one project at a time.

@SethTisue

This comment has been minimized.

Show comment
Hide comment
@SethTisue

SethTisue Mar 15, 2017

Member

Note that that test run still includes 40 SUCCESSful projects (of 94 total)

Member

SethTisue commented Mar 15, 2017

Note that that test run still includes 40 SUCCESSful projects (of 94 total)

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Mar 15, 2017

Member

Ah, you're suggesting to shrink the community build for releasing M1 to the projects that don't depend on parallel collections? That would be easier than trying to fix those projects beforehand, because with M1 they have a 2.13 version that they can use themselves, and there will also have a release of the new module. +1

Member

lrytz commented Mar 15, 2017

Ah, you're suggesting to shrink the community build for releasing M1 to the projects that don't depend on parallel collections? That would be easier than trying to fix those projects beforehand, because with M1 they have a 2.13 version that they can use themselves, and there will also have a release of the new module. +1

@SethTisue

This comment has been minimized.

Show comment
Hide comment
@SethTisue

SethTisue Mar 15, 2017

Member

Ah, you're suggesting to shrink the community build for releasing M1 to the projects that don't depend on parallel collections

yes...

...except that before M1 I intend to fix 2 or 3 key projects like scala-js and specs2 myself, in forks, so we know that this all works for at least a few projects, and because that will allow more downstream projects to be built

I've made a ticket on it (scala/community-builds#495) and added a checkbox to scala/scala-dev#332, too

Member

SethTisue commented Mar 15, 2017

Ah, you're suggesting to shrink the community build for releasing M1 to the projects that don't depend on parallel collections

yes...

...except that before M1 I intend to fix 2 or 3 key projects like scala-js and specs2 myself, in forks, so we know that this all works for at least a few projects, and because that will allow more downstream projects to be built

I've made a ticket on it (scala/community-builds#495) and added a checkbox to scala/scala-dev#332, too

@SethTisue SethTisue merged commit e2a2cba into scala:2.13.x Mar 15, 2017

5 checks passed

cla @szeiger signed the Scala CLA. Thanks!
Details
integrate-ide [140] SUCCESS. Took 32 s.
Details
validate-main [144] SUCCESS. Took 62 min.
Details
validate-publish-core [144] SUCCESS. Took 5 min.
Details
validate-test [116] SUCCESS. Took 56 min.
Details
@SethTisue

This comment has been minimized.

Show comment
Hide comment
@SethTisue

SethTisue Mar 17, 2017

Member

I wonder how much this knocks off our PR validation time.

Member

SethTisue commented Mar 17, 2017

I wonder how much this knocks off our PR validation time.

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jun 1, 2017

Member

I noticed that this PR has improved the performance of our compiler benchmarks. I haven't investigated the root cause the improvement, but I'd wager that reducing the complexity of base type sequences of collections (which are computed any time someone compiles against the collections, which is pretty much every compilation) is a key factor.

https://scala-ci.typesafe.com/grafana/dashboard/db/pr-validation?var-scalaSha=ef63cb91bf%7Ce2a2cbacd0&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@&from=now-6M&to=now

image

Member

retronym commented Jun 1, 2017

I noticed that this PR has improved the performance of our compiler benchmarks. I haven't investigated the root cause the improvement, but I'd wager that reducing the complexity of base type sequences of collections (which are computed any time someone compiles against the collections, which is pretty much every compilation) is a key factor.

https://scala-ci.typesafe.com/grafana/dashboard/db/pr-validation?var-scalaSha=ef63cb91bf%7Ce2a2cbacd0&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@&from=now-6M&to=now

image

@jvican

This comment has been minimized.

Show comment
Hide comment
@jvican

jvican Jun 1, 2017

Member

Wow... Do you guys have benchmarks with other projects to check if it's reproducible outside of better-files?

Member

jvican commented Jun 1, 2017

Wow... Do you guys have benchmarks with other projects to check if it's reproducible outside of better-files?

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jun 1, 2017

Member

We also see it in Vector.scala (the third chart), and to a lesser degree in scalap (the second chart).

Given that base type sequences are cached, it would seem large compilation batches would not see the improvement (the one-time extra cost of the more complex BTS computation is amortized against lots of places that use the type).

But, reducing overhead for compiling a single file is still really helpful, as that's what people experience in incremental compilation.

Member

retronym commented Jun 1, 2017

We also see it in Vector.scala (the third chart), and to a lesser degree in scalap (the second chart).

Given that base type sequences are cached, it would seem large compilation batches would not see the improvement (the one-time extra cost of the more complex BTS computation is amortized against lots of places that use the type).

But, reducing overhead for compiling a single file is still really helpful, as that's what people experience in incremental compilation.

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jun 1, 2017

Member

We also have a benchmark series for compiling scala-{library,reflect,compiler} jointly to test "big projects", but we that is slower to execute, so we don't collect fine-grained results for it.

Member

retronym commented Jun 1, 2017

We also have a benchmark series for compiling scala-{library,reflect,compiler} jointly to test "big projects", but we that is slower to execute, so we don't collect fine-grained results for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment