Using BigQuery connector in HDInsights cluster: Guava version conflict #63

Closed
juliankeppel opened this issue Aug 24, 2017 · 2 comments

@juliankeppel

I want to use the BigQuery connector in an HDInsights cluster with Spark 2.1.0 (Hortonworks Data Platform 2.6). If I run my job locally it works fine, but when I deploy it to the cluster (via Livy, but this shouldn't matter here) I get:

ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase$ParentTimestampUpdateIncludePredicate.create(GoogleHadoopFileSystemBase.java:789)
...

I have had similar issues in the past with Spark and third-party libraries (especially libraries that use Google Guava). As far as I know, the best solution is usually to shade the conflicting libraries explicitly. I use Maven as my build tool, so I use the Maven Shade plugin for this:

<plugin>
	<artifactId>maven-shade-plugin</artifactId>
	<version>3.0.0</version>
	<executions>
		<execution>
			<!-- ...... --> 
			<phase>package</phase>
			<goals>
				<goal>shade</goal>
			</goals>
			<configuration>
				<transformers>
					<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
					</transformer>
				</transformers>
				<filters>
					<filter>
						<artifact>*:*</artifact>
						<excludes>
							<exclude>META-INF/*.SF</exclude>
							<exclude>META-INF/*.DSA</exclude>
							<exclude>META-INF/*.RSA</exclude>
						</excludes>
					</filter>
				</filters>
			</configuration>
		</execution>
	</executions>
</plugin>

My next attempt was to explicitly relocate Guava via the Shade plugin:

<relocation>
	<pattern>com.google.guava</pattern>
	<shadedPattern>shaded.com.google.guava</shadedPattern>
</relocation>

But I got the same error message. I also tried setting spark.driver.userClassPathFirst=true and spark.executor.userClassPathFirst=true, also without success. With those settings I got something like:

Caused by: java.lang.RuntimeException: java.lang.ClassCastException: cannot assign instance of scala.concurrent.duration.FiniteDuration to field org.apache.spark.rpc.RpcTimeout.duration of type scala.concurrent.duration.FiniteDuration in instance of org.apache.spark.rpc.RpcTimeout
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
...

which is not thrown from the Google BigQuery connector. Is this a separate error, meaning it has nothing to do with the library conflict above? Or does it also originate from the library issue?

I'm out of ideas. Does anybody know what could be going wrong here, or what I am missing? I'd appreciate any tips. Thank you very much.

@pmkc
Contributor

pmkc commented Aug 24, 2017

We distribute all of our libraries with a shaded classifier. Were you using that version?
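
For readers unsure what "shaded classifier" means in practice, here is a minimal sketch of how such a dependency might be declared in a pom.xml. The Maven coordinates are an assumption (they are not stated in this thread), and `bigquery.connector.version` is a hypothetical property standing in for whichever release you actually use:

```xml
<!-- Sketch only: groupId/artifactId are assumed, not taken from this thread. -->
<dependency>
    <groupId>com.google.cloud.bigdataoss</groupId>
    <artifactId>bigquery-connector</artifactId>
    <!-- Hypothetical property; define it in <properties> with your real version. -->
    <version>${bigquery.connector.version}</version>
    <!-- Selects the artifact variant published with its dependencies relocated. -->
    <classifier>shaded</classifier>
</dependency>
```

Without the `<classifier>` element Maven resolves the regular, unshaded artifact, which expects a compatible Guava on the cluster classpath.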

@juliankeppel
Author

juliankeppel commented Aug 31, 2017

I solved the problem by simply using the Maven Shade plugin the right way. My XML structure was wrong. Maven doesn't complain about that, but I read the Shade plugin documentation again and noticed how to use it correctly.

Something like this works fine:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.1.0</version>
    <configuration>
        <finalName>some-name</finalName>
        <outputDirectory>some-path</outputDirectory>
        <relocations>
            <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>com.shaded.google.common</shadedPattern>
            </relocation>
        </relocations>
        <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>your.package.structure.ClassName</mainClass>
            </transformer>
        </transformers>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>

One hint, though: without shading it doesn't work for me. I really had to shade the com.google.common package.
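
The working configuration in this comment leaves out the signature-file filters from the first attempt in the opening post. If any bundled dependency is a signed jar, the merged jar can fail at startup with a SecurityException about an invalid signature file digest. A sketch of a `<configuration>` element that combines the relocation with those filters, assuming nothing beyond what already appears in this thread:

```xml
<configuration>
    <relocations>
        <!-- Pattern is a Java package prefix (Guava's classes live in com.google.common). -->
        <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>com.shaded.google.common</shadedPattern>
        </relocation>
    </relocations>
    <filters>
        <!-- Drop signature files from every bundled artifact. -->
        <filter>
            <artifact>*:*</artifact>
            <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
            </excludes>
        </filter>
    </filters>
</configuration>
```

Whether the filters are needed depends on the dependencies being bundled; the relocation alone was enough in the case described here.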
