
cached resolution falsely evicts netty (spark + cassandra) #2046

Closed
mighdoll opened this issue Jun 3, 2015 · 12 comments


mighdoll commented Jun 3, 2015

I ran into a failure with cached resolution; a minimized example is below. The bug doesn't occur with cached resolution disabled.

Spark includes a dependency on netty-3.8.0.Final, and the cassandra driver includes a dependency on netty-handler-4.x. These two variants of netty don't conflict, and both need to be included. If cached resolution is enabled, however, the 3.8 version is incorrectly evicted.

A workaround is to force the version of the cassandra driver with dependencyOverrides. That doesn't change the version of the cassandra driver selected, but it avoids the bug.

A quick way to see the problem is to copy the following build.sbt, and then:

 $ sbt
 > deps netty

then comment out the dependency override and:

> reload
> update
> deps netty

Note that netty 3.8 isn't on the list in the second case.

val vSpark = "1.3.1"
val sparkCore             = "org.apache.spark"         %% "spark-core"                % vSpark
val sparkCassandra        = "com.datastax.spark"       %% "spark-cassandra-connector" % "1.3.0-M1"
val cassandraDriver       = "com.datastax.cassandra"    %  "cassandra-driver-core"    % "2.1.6"

import sbt.complete.DefaultParsers._

lazy val root = (project in file(".")).
  settings(
    scalaVersion := "2.11.6",
    updateOptions := updateOptions.value.withCachedResolution(true),  // this breaks it
    dependencyOverrides += cassandraDriver,   // this fixes it
    libraryDependencies ++= Seq(
      sparkCore,
      cassandraDriver,
      sparkCassandra
    ),
    deps := {
      val matchString = (Space ~> token(StringBasic,"<search string>")).parsed
      val classpath = (fullClasspath in Compile).value
      val found = classpath.map(_.data.toString).filter(_.contains(matchString))
      found foreach println
    }
  )

lazy val deps = inputKey[Unit]("search classpath")
@eed3si9n eed3si9n added the area/library_management label Jun 3, 2015

eed3si9n commented Jun 3, 2015

Thanks for the report!

Spark includes a dependency on netty-3.8.0.Final, and the cassandra driver includes a dependency on netty-handler-4.x. These two variants of netty don't conflict, and both need to be included. If cached resolution is enabled, however, the 3.8 version is incorrectly evicted.

Could you run debug and then update with cached resolution enabled to see if the log includes some hint on why it did that, and paste it into a gist?


mighdoll commented Jun 3, 2015

Here's the debug update output.

There's this bit:

[debug] - conflict in runtime-internal:com.datastax.cassandra:cassandra-driver-core (com.datastax.cassandra:cassandra-driver-core:2.1.5, com.datastax.cassandra:cassandra-driver-core:2.1.6)
[debug] :: merging module report for com.codahale.metrics:metrics-core:3.0.2 - Vector((Artifact(metrics-core,bundle,jar,None,Vector(master),None,Map()),/usr/local/google/home/mighdoll/.ivy2/cache/com.codahale.metrics/metrics-core/bundles/metrics-core-3.0.2.jar))
[debug] :::: transitively evicted runtime-internal: io.netty:netty

which is consistent with the dependencyOverrides workaround.

But I don't see any real clues. Do you?


eed3si9n commented Jul 7, 2015

Here's how we can get the update report.

$ CONSCRIPT_OPTS=-Dsbt.log.noformat=true # This works if you use conscripted sbt.
$ sbt "show update" 2>&1 | tee update_cachedresolution.txt

Netty

[info]  io.netty:netty
[info]      - 3.8.0.Final
[info]          status: release
[info]          publicationDate: Thu Nov 07 10:23:12 EST 2013
[info]          resolver: sbt-chain
[info]          artifactResolver: sbt-chain
[info]          evicted: true
[info]          evictedReason: latest-revision
[info]          homepage: http://netty.io/
[info]          isDefault: false
[info]          configurations: compile, runtime, compile(*), runtime(*), master(compile), master
[info]          licenses: (Apache License, Version 2.0,Some(http://www.apache.org/licenses/LICENSE-2.0))
[info]          callers: org.spark-project.akka:akka-remote_2.11:2.3.4-spark
[info]      - 3.9.0.Final
[info]          status: release
[info]          publicationDate: Sun Dec 22 05:55:40 EST 2013
[info]          resolver: sbt-chain
[info]          artifactResolver: sbt-chain
[info]          evicted: true
[info]          evictedReason: transitive-evict
[info]          homepage: http://netty.io/
[info]          isDefault: false
[info]          configurations: compile, runtime, compile(*), runtime(*), master(compile), master
[info]          licenses: (Apache License, Version 2.0,Some(http://www.apache.org/licenses/LICENSE-2.0))
[info]          callers: com.datastax.cassandra:cassandra-driver-core:2.1.5
[info] 

Netty 3.8.0.Final was evicted with the reason "latest-revision", which means there was a newer version in the graph, and Netty 3.9.0.Final was evicted transitively when its caller com.datastax.cassandra:cassandra-driver-core:2.1.5 was evicted by cassandra-driver-core 2.1.6.

This is in fact a bug in cached resolution since Netty 3.8.0.Final should have survived, because org.spark-project.akka:akka-remote_2.11:2.3.4-spark survived.
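The survival rule described above can be sketched in a few lines of Scala. This is a hypothetical model, not sbt's internal API: a transitively-evicted module should stay on the graph if at least one of its callers survived conflict resolution.

```scala
// Hypothetical model (not sbt's actual types): a module survives transitive
// eviction if any of its callers survived conflict resolution.
case class Module(org: String, name: String, rev: String)

def shouldSurvive(callersOf: Map[Module, Set[Module]],
                  evicted: Set[Module])(module: Module): Boolean =
  callersOf.getOrElse(module, Set.empty).exists(c => !evicted(c))

val netty38    = Module("io.netty", "netty", "3.8.0.Final")
val akkaRemote = Module("org.spark-project.akka", "akka-remote_2.11", "2.3.4-spark")
val driver215  = Module("com.datastax.cassandra", "cassandra-driver-core", "2.1.5")

val callersOf = Map(netty38 -> Set(akkaRemote))

// driver 2.1.5 lost to 2.1.6, but akka-remote survived, so netty 3.8 must stay:
println(shouldSurvive(callersOf, Set(driver215))(netty38)) // true
```

The bug reported here amounts to cached resolution answering false for this check.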

@eed3si9n eed3si9n added the Bug label Jul 7, 2015
eed3si9n added a commit that referenced this issue Jul 9, 2015
This fixes the minigraph stitching logic by first sorting the graph
based on the level of inter-dependencies, and gradually resolving
conflict from the leaves that do not depend on other libraries.
For each eviction, transitive evictions are propagated right away to
avoid double eviction observed in #2046

For the transitive eviction checking I needed to bring back the caller
information, which is notorious for its size. I am stuffing all
ModuleIDs into one ModuleID for the graph, and recovering them only
during the merging process.
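The sorting step the commit message describes can be sketched as a Kahn-style topological order (assumed semantics; sbt's actual stitching code differs): modules that depend on no other libraries come first, so each eviction can be propagated before any dependent minigraph is merged.

```scala
// Sketch only: order modules leaves-first so conflicts are resolved from
// libraries that depend on nothing, propagating evictions as we go.
def leafFirstOrder[A](deps: Map[A, Set[A]]): List[A] = {
  @annotation.tailrec
  def loop(remaining: Map[A, Set[A]], acc: List[A]): List[A] =
    if (remaining.isEmpty) acc
    else {
      // peel off every node whose dependencies are all already ordered
      val (ready, rest) = remaining.partition { case (_, ds) => ds.isEmpty }
      if (ready.isEmpty) acc ++ remaining.keys.toList // dependency cycle: bail out
      else {
        val done = ready.keySet
        loop(rest.map { case (k, ds) => k -> (ds -- done) },
             acc ++ ready.keys.toList)
      }
    }
  loop(deps, Nil)
}

// netty depends on nothing; the driver pulls in netty; spark pulls in the driver.
val order = leafFirstOrder(Map(
  "spark-core"       -> Set("cassandra-driver"),
  "cassandra-driver" -> Set("netty"),
  "netty"            -> Set.empty[String]
))
// List(netty, cassandra-driver, spark-core)
```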
@eed3si9n eed3si9n self-assigned this Jul 10, 2015
@eed3si9n

The generalization of the bug:

  • X1 depends on an older library A1.
  • Y1 depends on an older library B1.
  • A1 depends on a newer library B2.
  • A newer library A2 no longer depends on B2.
  • Your project depends on X1, Y1, and A2.
  1. B2 evicts B1.
  2. A2 evicts A1.
  3. The eviction of A1 causes the eviction of B2.

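The steps above can be replayed with a small sketch (hypothetical names and data structures, not sbt code), showing how both revisions of B end up evicted:

```scala
// Replay of the generalized scenario: X1 -> A1, Y1 -> B1, A1 -> B2,
// project depends on X1, Y1, A2.
case class M(name: String, rev: Int)
val (x1, y1) = (M("X", 1), M("Y", 1))
val (a1, a2) = (M("A", 1), M("A", 2))
val (b1, b2) = (M("B", 1), M("B", 2))

// who depends on whom, per the list above
val callersOf: Map[M, Set[M]] = Map(a1 -> Set(x1), b1 -> Set(y1), b2 -> Set(a1))

// Steps 1 and 2: latest-revision resolution evicts B1 (beaten by B2)
// and A1 (beaten by A2).
val afterConflicts: Set[M] = Set(b1, a1)

// Step 3 (the bug): A1's eviction transitively evicts B2 without checking
// whether B2 has any surviving caller left.
val afterTransitive = afterConflicts ++
  callersOf.collect { case (m, cs) if cs.nonEmpty && cs.subsetOf(afterConflicts) => m }

// Both revisions of B are now gone, even though Y1 still needs one:
val survivingB = Set(b1, b2) -- afterTransitive
println(survivingB.isEmpty) // true -- the double eviction
```

The fix restores B1 in step 3: once B2 loses its only caller, the earlier B2-evicts-B1 decision must be revisited.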
jsuereth added a commit that referenced this issue Jul 10, 2015
Fixes #2046. Cached resolution: Fixes double eviction
@mighdoll

Thanks Eugene! Can you point me at instructions for how to run a locally built sbt? I'll try your patch.


@eed3si9n

I just released 0.13.9-RC2. Could you try and see if it fixes your issue?

@edeustace
hi @eed3si9n - I tried 0.13.9-RC2 and the graph.json files are smaller than before, but still pretty large (max is ~130mb); if I completely remove the callers the graph.json files are roughly a tenth of the size. Forgive the dumb question, but would it be right to say that the ModuleReport.callers property is only of use when ModuleReport.evicted = true? If that's the case, could one populate callers only when evicted == true? If you want me to create a separate issue for this, let me know.

@eed3si9n

@edeustace wrote:

would it be right to say that ModuleReport.callers property is only of use when ModuleReport.evicted = true?

No. What this issue has uncovered is that I need to sort the dependency graph by the degree of dependency before resolving the conflicts. That sorting requires callers. The problem with Ivy is that it seems to include both the direct callers and the transitive callers, which results in bloating of the callers, causing #1763.
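One trimming idea implied by the last sentence can be sketched as follows. This is an assumption on my part (the thread doesn't establish that Ivy's reports can be filtered this way), using hypothetical types: keep only the callers that directly declare the module among their dependencies, dropping the transitive callers Ivy also reports.

```scala
// Hypothetical sketch: filter a reported caller set down to direct callers.
case class Module(org: String, name: String)

def directCallersOnly(
    module: Module,
    reportedCallers: Set[Module],
    declaredDeps: Map[Module, Set[Module]] // caller -> its declared dependencies
): Set[Module] =
  reportedCallers.filter(c => declaredDeps.getOrElse(c, Set.empty).contains(module))

val netty = Module("io.netty", "netty")
val akka  = Module("org.spark-project.akka", "akka-remote_2.11")
val spark = Module("org.apache.spark", "spark-core_2.11")

// Ivy reports both akka (direct) and spark (transitive, via akka) as callers:
val trimmed = directCallersOnly(netty, Set(akka, spark),
  Map(akka -> Set(netty), spark -> Set(akka)))
// Set(akka) -- the transitive caller is dropped
```

The level-sorting described above would still work on such trimmed sets, since it only needs direct edges.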

@edeustace

Thanks for the clarification. What issue number should I be tracking for further reduction of the bloat?

@eed3si9n

@edeustace I've been using #1763 as the placeholder, but 0.13.9-RC2 should work with lila, so I've closed that one. If you want, please file a new issue for this.

@mighdoll

Thanks @eed3si9n. Verified that 0.13.9-RC2 works, with cached resolution enabled, for both the sample project (archived here) and the upstream one.

@eed3si9n

@mighdoll Awesome. Thanks for confirming the fix!
