Takes a very long time to shadow hive-exec #384

Closed
ericsun2 opened this issue Jun 6, 2018 · 9 comments

Comments

@ericsun2

ericsun2 commented Jun 6, 2018

Please check the User Guide before submitting "how do I do 'x'?" questions!

Shadow Version

2.0.3

Gradle Version

4.3.1

Expected Behavior

We have to shadow a few popular dependencies in hive-exec to avoid conflicts, such as Guava, Jline, Jackson...
The shadowJar() task is supposed to finish in seconds.

Actual Behavior

It takes more than 1.5 hours to complete the shadowJar() task.

Gradle Build Script(s)

plugins {
  id 'java'
  id 'com.github.johnrengelman.plugin-shadow' version '2.0.3'
}

ext.hiveVersion = '2.1.1'

dependencies {
  compile("org.apache.hive:hive-exec:${hiveVersion}") {
    exclude group: 'org.apache.spark'
    exclude group: 'org.apache.tez'
    exclude group: 'org.apache.hive', module: 'hive-llap-tez'
    exclude group: 'org.apache.hive', module: 'hive-ant'
    exclude group: 'org.apache.hadoop', module: 'hadoop-archives'
    exclude module: 'hadoop-annotations'
    exclude module: 'hadoop-yarn-api'
    exclude module: 'hadoop-yarn-common'
    exclude module: 'hadoop-yarn-server-applicationhistoryservice'
    exclude module: 'hadoop-yarn-server-common'
    exclude module: 'hadoop-yarn-server-resourcemanager'
    exclude module: 'hadoop-yarn-server-web-proxy'
  }
}

configurations {
  shadow
}

shadowJar {
  zip64 true

  // need to shade "com.google.guava" to avoid Guava conflict
  relocate 'com.google', 'shadow.com.google'
  relocate 'org.codehaus', 'shadow.org.codehaus'
  relocate 'jline', 'shadow.jline'

  classifier 'all'
  // baseName 'hive-exec-shade'
  version "${hiveVersion}"
  mergeServiceFiles()

  exclude 'LICENSE'
  exclude(
      'org/xml/**',
      'javax/**',
      'com/sun/**'
  )
  // archiveName = "${baseName}-${spec.version}.${extension}"
}

artifacts {
  shadow(archives(shadowJar) {
    builtBy shadowJar
  })
}

Content of Shadow JAR (jar tf <jar file> - post link to GIST if too long)

$ sed 's/:/\n/g' hive-exec-shade.classpath |cut -d/ -f8-9 |sort |uniq

antlr/antlr
aopalliance/aopalliance
asm/asm
com.fasterxml.jackson.core/jackson-annotations
com.fasterxml.jackson.core/jackson-core
com.fasterxml.jackson.core/jackson-databind
com.google.code.findbugs/jsr305
com.google.code.gson/gson
com.google.guava/guava
com.google.inject.extensions/guice-servlet
com.google.inject/guice
com.google.protobuf/protobuf-java
commons-cli/commons-cli
commons-codec/commons-codec
commons-collections/commons-collections
commons-dbcp/commons-dbcp
commons-httpclient/commons-httpclient
commons-io/commons-io
commons-lang/commons-lang
commons-logging/commons-logging
commons-pool/commons-pool
com.sun.jersey.contribs/jersey-guice
com.sun.jersey/jersey-client
com.sun.jersey/jersey-core
com.sun.jersey/jersey-json
com.sun.jersey/jersey-server
com.sun.xml.bind/jaxb-impl
hive-exec-shade/libs
io.netty/netty
javax.activation/activation
javax.inject/javax.inject
javax.servlet/servlet-api
javax.xml.bind/jaxb-api
javax.xml.stream/stax-api
jline/jline
log4j/log4j
net.hydromatic/eigenbase-properties
org.antlr/antlr-runtime
org.antlr/ST4
org.antlr/stringtemplate
org.apache.ant/ant
org.apache.ant/ant-launcher
org.apache.calcite/calcite-avatica
org.apache.calcite/calcite-core
org.apache.calcite/calcite-linq4j
org.apache.commons/commons-compress
org.apache.curator/curator-client
org.apache.curator/curator-framework
org.apache.hadoop/hadoop-annotations
org.apache.hadoop/hadoop-yarn-api
org.apache.hadoop/hadoop-yarn-common
org.apache.hadoop/hadoop-yarn-server-applicationhistoryservice
org.apache.hadoop/hadoop-yarn-server-common
org.apache.hadoop/hadoop-yarn-server-resourcemanager
org.apache.hadoop/hadoop-yarn-server-web-proxy
org.apache.hive/hive-exec
org.apache.hive/hive-shims
org.apache.hive.shims/hive-shims-0.23
org.apache.hive.shims/hive-shims-common
org.apache.hive.shims/hive-shims-scheduler
org.apache.httpcomponents/httpclient
org.apache.httpcomponents/httpcore
org.apache.ivy/ivy
org.apache.logging.log4j/log4j-1.2-api
org.apache.logging.log4j/log4j-api
org.apache.logging.log4j/log4j-core
org.apache.logging.log4j/log4j-slf4j-impl
org.apache.thrift/libthrift
org.apache.zookeeper/zookeeper
org.codehaus.groovy/groovy-all
org.codehaus.jackson/jackson-core-asl
org.codehaus.jackson/jackson-jaxrs
org.codehaus.jackson/jackson-mapper-asl
org.codehaus.jackson/jackson-xc
org.codehaus.janino/commons-compiler
org.codehaus.janino/janino
org.codehaus.jettison/jettison
org.datanucleus/datanucleus-core
org.fusesource.leveldbjni/leveldbjni-all
org.mortbay.jetty/jetty
org.mortbay.jetty/jetty-util
org.pentaho/pentaho-aggdesigner-algorithm
org.slf4j/slf4j-api
org.slf4j/slf4j-log4j12
org.sonatype.sisu.inject/cglib
stax/stax-api
ericsun2 changed the title from "Takes a long time to shadow hive-exec" to "Takes a very long time to shadow hive-exec" on Jun 6, 2018
@johnrengelman
Owner

Please try version 2.0.4. It has some internal changes and an update to ASM, which was having some performance problems.

@maguro
Contributor

maguro commented Aug 25, 2018

There is no 2.0.4 version of plugin-shadow. Please see #381.

@maguro
Contributor

maguro commented Aug 25, 2018

I have similar problems with my very small plugin.

@maguro
Contributor

maguro commented Aug 26, 2018

I figured out my problem: the gradleApi() dependency was adding a gradle-api-x.x.jar file to the shadow JAR. This JAR file is over 100MB in size and has over 35k classes. I added this to my build.gradle file:

configurations.compile.dependencies.remove dependencies.gradleApi()

And my build times are now back to sane levels.
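A quick way to spot an oversized dependency like this is to list the files Shadow would merge, largest first. The following is a minimal sketch, not part of the original comment: the task name `printRuntimeJarSizes` is hypothetical, and it assumes that the `runtime` configuration is what shadowJar bundles in this version of the plugin.

```groovy
// Hypothetical helper task: prints each file on the runtime configuration,
// largest first, so that very large JARs (such as gradle-api) stand out.
// Assumption: shadowJar merges the 'runtime' configuration in this setup.
task printRuntimeJarSizes {
  doLast {
    configurations.runtime.files
        .sort { -it.length() }
        .each { jar ->
          // Size in whole kilobytes, followed by the file name.
          println "${jar.length().intdiv(1024)} KB  ${jar.name}"
        }
  }
}
```

Running `gradle printRuntimeJarSizes` before and after removing a dependency shows whether it is still being pulled in.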

@johnrengelman
Owner

johnrengelman commented Sep 12, 2018

If you aren't building a Gradle plugin, you shouldn't be using com.github.johnrengelman.plugin-shadow. You should be using com.github.johnrengelman.shadow.

If you are building a plugin, you should add gradleApi() and localGroovy() to the shadow configuration instead of to compile.

See https://github.com/johnrengelman/shadow/blob/8af7f0f93b19d8f154798bc2b88bcd4f35656aca/gradle/dependencies.gradle
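For a build that is not a Gradle plugin, such as the hive-exec script in the original report, the first suggestion amounts to swapping the plugin id. A minimal sketch follows; the version number is an assumption taken from the 2.0.4 recommendation earlier in this thread and should be checked against the Gradle Plugin Portal.

```groovy
plugins {
  id 'java'
  // Regular (non-plugin) projects use the plain Shadow plugin id rather than
  // 'com.github.johnrengelman.plugin-shadow'.
  // Assumption: version 2.0.4 is published under this id; verify before use.
  id 'com.github.johnrengelman.shadow' version '2.0.4'
}
```

The rest of the build script (dependencies, relocations, excludes) would stay as it is.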

@maguro
Contributor

maguro commented Sep 17, 2018

Adding gradleApi() and localGroovy() to the shadow configuration instead of to compile is not sufficient. Your root build.gradle does the same:

https://github.com/johnrengelman/shadow/blob/master/build.gradle#L31

@johnrengelman
Owner

That’s because the java-gradle-plugin adds it to the compile classpath.

If you aren’t using that, then the gradleApi shouldn’t be in the classpath.

@maguro
Contributor

maguro commented Sep 18, 2018

The context of my thread is that I am building a Gradle plugin, so I am using java-gradle-plugin. When you say

If you are building a plugin, you should add gradleApi() and localGroovy() to the shadow configuration instead of to compile.

It leads me to believe that, when building a Gradle plugin, all I need to do is shadow the usual Gradle plugin compile dependencies, which is not the case. This is why I pointed out that one must also remove the compile dependencies explicitly.

@johnrengelman
Owner

Yeah, I believe java-gradle-plugin adds gradleApi() to the compile configuration. So you do need to remove this yourself.
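Putting the two points from this thread together, a plugin build might look roughly like the sketch below. It combines the removal line posted above with the advice to declare gradleApi() and localGroovy() on the shadow configuration. It has not been verified against a specific Shadow release, and it assumes that dependencies on the `shadow` configuration are left out of the merged JAR.

```groovy
plugins {
  id 'java-gradle-plugin'
  id 'com.github.johnrengelman.plugin-shadow' version '2.0.3'
}

// java-gradle-plugin adds gradleApi() to 'compile'; remove it so that
// shadowJar does not try to merge the very large gradle-api JAR.
configurations.compile.dependencies.remove dependencies.gradleApi()

dependencies {
  // Assumption: dependencies declared on 'shadow' are not bundled into
  // the shadow JAR.
  shadow gradleApi()
  shadow localGroovy()
}
```

The removal has to run after java-gradle-plugin is applied, which is why it sits below the plugins block.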
