Tests fail on spark-1.6.X branch with pyspark 1.6.3 #5

Closed
robertjrodger opened this issue May 31, 2017 · 8 comments

@robertjrodger

Hello,

The Maven build, as outlined in the README, goes fine, but the suggested test fails with the following output:

======================================================================
ERROR: testWorkflow (jpmml_sparkml.tests.JPMMLTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/rodgerr/dev/jpmml-sparkml-package/src/main/python/jpmml_sparkml/tests/__init__.py", line 41, in testWorkflow
    pmmlBytes = toPMMLBytes(self.sc, df, pipelineModel)
  File "/Users/rodgerr/dev/jpmml-sparkml-package/src/main/python/jpmml_sparkml/__init__.py", line 22, in toPMMLBytes
    return javaConverter.toPMMLByteArray(javaSchema, javaPipelineModel)
  File "/Users/rodgerr/miniconda2/lib/python2.7/site-packages/py4j/java_gateway.py", line 1154, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/opt/apache-spark@1.6/libexec/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/Users/rodgerr/miniconda2/lib/python2.7/site-packages/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling z:org.jpmml.sparkml.ConverterUtil.toPMMLByteArray.
: java.lang.NoClassDefFoundError: com/google/common/collect/Iterables
	at org.jpmml.sparkml.FeatureMapper.getOnlyFeature(FeatureMapper.java:216)
	at org.jpmml.sparkml.feature.StringIndexerModelConverter.encodeFeatures(StringIndexerModelConverter.java:42)
	at org.jpmml.sparkml.FeatureMapper.append(FeatureMapper.java:71)
	at org.jpmml.sparkml.feature.RFormulaModelConverter.encodeFeatures(RFormulaModelConverter.java:60)
	at org.jpmml.sparkml.FeatureMapper.append(FeatureMapper.java:71)
	at org.jpmml.sparkml.ConverterUtil.toPMML(ConverterUtil.java:117)
	at org.jpmml.sparkml.ConverterUtil.toPMMLByteArray(ConverterUtil.java:213)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Iterables
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 18 more

In case it's relevant, I'm using py4j-0.10.5, which was released after the most recent branch commit.

@vruusmann
Member

I've tested JPMML-SparkML-Package with Apache Spark 1.6.0, 1.6.1, and 1.6.2, but not with 1.6.3. It must be the case that the Google Guava dependency was relocated between versions 1.6.2 and 1.6.3.

You can bypass the tests like this:

$ mvn -Dmaven.test.skip=true clean install

At runtime, simply add the Google Guava dependency (com.google.guava:guava:[16.0, 20.0]) to your application classpath.
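
For example, a sketch assuming the application is launched with spark-submit (the script name my_app.py is just a placeholder; any Guava version in the [16.0, 20.0] range should do):

$ spark-submit --packages com.google.guava:guava:19.0 my_app.py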

I will investigate potential fixes. One option is to introduce a build profile that builds a "fat" JAR (which includes Guava) for Apache Spark version 1.6.3, and a "thin" JAR (which excludes Guava) for all earlier versions.
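
A rough sketch of what such a profile might look like (hypothetical; the profile id and the use of maven-shade-plugin are illustrative only, and Guava would have to be moved out of provided scope for shading to pick it up):

<profiles>
	<profile>
		<id>fat-jar</id>
		<build>
			<plugins>
				<plugin>
					<groupId>org.apache.maven.plugins</groupId>
					<artifactId>maven-shade-plugin</artifactId>
					<executions>
						<execution>
							<phase>package</phase>
							<goals>
								<goal>shade</goal>
							</goals>
							<configuration>
								<artifactSet>
									<includes>
										<!-- Pull Guava classes into the JAR -->
										<include>com.google.guava:guava</include>
									</includes>
								</artifactSet>
							</configuration>
						</execution>
					</executions>
				</plugin>
			</plugins>
		</build>
	</profile>
</profiles>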

@robertjrodger
Author

I removed Apache Spark 1.6.3 and installed 1.6.0; again, the Maven build succeeds, but the nosetests fail with the same traceback.

@vruusmann
Member

Believe it or not, everything works as advertised on my computer:

$ export SPARK_HOME=/opt/spark-1.6.2/
$ export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python
$ mvn -Ppyspark clean install
$ cd src/main/python
$ nosetests

End of the output:

17/05/31 18:36:45 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
.
----------------------------------------------------------------------
Ran 1 test in 12.075s

OK
17/05/31 18:36:46 INFO util.ShutdownHookManager: Shutdown hook called
17/05/31 18:36:46 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-afe222bc-6d29-4220-b50c-01abaf059e51

@vruusmann
Member

Maybe it's some Apache Spark packaging issue? What is the name of your distribution: is it the "with Hadoop" or the "without Hadoop" edition?

@robertjrodger
Author

robertjrodger commented May 31, 2017

Spark 1.6.0 Pre-built for Apache Hadoop 2.6, tarball downloaded from spark.apache.org/downloads.html; I get the same result with Spark 1.6.2 Pre-built for Apache Hadoop 2.6. Using Spark 2.0.0 Pre-built for Apache Hadoop 2.7, the tests pass on both branches.

Could it have to do with the jpmml-sparkml Maven JAR? Which Spark distribution do you use?

@asnare

asnare commented Jun 16, 2017

Classpath misery, for the win.

I've just been trying to help Robert understand what's going on here. I must confess I'm a little lost:

  • Guava is declared (upstream) in jpmml-sparkml as version 13.0 with provided scope.
  • The Spark 1.6 binary packages from the Apache project don't include Guava in a usable form. (They have a shaded version.)

So I'm a bit perplexed about where the runtime Guava dependency is supposed to come from.

@vruusmann
Member

So I'm a bit perplexed about where the runtime Guava dependency is supposed to come from.

In your application project directory, execute the Apache Maven command mvn dependency:tree and look for occurrences of "guava".
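
If Guava is on the classpath, the output should contain a line along these lines (the version and scope will vary; 13.0/provided is what jpmml-sparkml itself declares):

[INFO] +- com.google.guava:guava:jar:13.0:provided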

The availability of Guava depends on the Apache Spark version (1.6.X vs 2.0.X) and the packaging ("with Hadoop" or "without Hadoop").

In Robert's application environment (#5 (comment)), there is no Guava dependency available (as indicated by java.lang.NoClassDefFoundError: com/google/common/collect/Iterables). Therefore, add the following to your pom.xml, rebuild, and redeploy:

<dependency>
	<groupId>com.google.guava</groupId>
	<artifactId>guava</artifactId>
	<version>19.0</version>
</dependency>

@robertjrodger
Author

With your suggested addition to the pom.xml, the new build passes the tests. Thank you for looking into this! Additionally, I will open a pull request with the revised pom.xml since, as you mention, there is no guarantee that the user's Spark distribution includes Guava.

vruusmann added a commit to jpmml/jpmml-sparkml that referenced this issue Jun 25, 2017