
spark version validation fails on EMR / EMR Serverless #498

Closed
liurong79 opened this issue May 22, 2023 · 1 comment · Fixed by #501
Comments

@liurong79

Guidelines

Please note that GitHub issues are only meant for bug reports/feature requests. If you have questions on how to use the Neo4j Connector for Apache Spark,
please ask on the Neo4j Discussion Forum instead of creating an issue here.

Expected Behavior (Mandatory)

Successfully import data to Neo4j using Spark on EMR or EMR Serverless

Actual Behavior (Mandatory)

Found the following error in the driver's log:

java.lang.NumberFormatException: For input string: "0-amzn-1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
at org.neo4j.spark.util.ValidateSparkVersion.compare(Validations.scala:47)
at org.neo4j.spark.util.ValidateSparkVersion.$anonfun$validate$7(Validations.scala:58)
at org.neo4j.spark.util.ValidateSparkVersion.$anonfun$validate$7$adapted(Validations.scala:58)
at scala.collection.IndexedSeqOptimized.prefixLengthImpl(IndexedSeqOptimized.scala:41)
at scala.collection.IndexedSeqOptimized.forall(IndexedSeqOptimized.scala:46)
at scala.collection.IndexedSeqOptimized.forall$(IndexedSeqOptimized.scala:46)
at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:49)
at org.neo4j.spark.util.ValidateSparkVersion.validate(Validations.scala:58)
at org.neo4j.spark.util.Validations$.$anonfun$validate$1(Validations.scala:12)
at org.neo4j.spark.util.Validations$.$anonfun$validate$1$adapted(Validations.scala:12)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:141)
at org.neo4j.spark.util.Validations$.validate(Validations.scala:12)
at org.neo4j.spark.DataSource.<init>(DataSource.scala:15)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:726)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:864)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
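
The failure can be reproduced without Spark. Judging from the stack trace, the validator splits the version string on "." and calls `.toInt` on each component; on EMR the patch component carries a vendor suffix ("0-amzn-1"), which `Integer.parseInt` rejects. A minimal sketch (the `parseStrict` helper is hypothetical, not the connector's actual code):

```scala
import scala.util.Try

object Repro {
  // Splitting "3.3.0-amzn-1" on "." yields Array("3", "3", "0-amzn-1");
  // calling .toInt on the last element throws NumberFormatException.
  def parseStrict(version: String): Try[Array[Int]] =
    Try(version.split("\\.").map(_.toInt))
}
```

`Repro.parseStrict("3.3.0")` succeeds, while `Repro.parseStrict("3.3.0-amzn-1")` fails with the same `NumberFormatException: For input string: "0-amzn-1"` seen above.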

How to Reproduce the Problem

Happens to any job running on EMR or EMR Serverless.

Specifications (Mandatory)

Currently used versions

  • Spark: 3.3.0
  • Scala: 2.12
  • Neo4j: 5.8.0 Community Edition
  • Neo4j Connector: neo4j-connector-apache-spark_2.12-5.0.1_for_spark_3

Additional information

  • The code of the Spark job
  • the structure of the Dataframe
  • did you define the constraints/indexes?
  • if you're using any Spark Cloud provider, please specify it (i.e. Databricks): EMR and EMR Serverless
@liurong79 liurong79 added the bug label May 22, 2023
@liurong79 (Author)

On EMR, sparkSession.version returns something like "3.3.0-amzn-1".
Here's the pull request: #499
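
One way to make the comparison tolerant of vendor-suffixed versions is to keep only the leading digits of each dotted component before parsing. A minimal sketch of that idea, assuming a hypothetical `numericParts` helper (this is not the connector's actual fix):

```scala
object LenientVersion {
  // Keep only the leading digits of each dotted component, dropping
  // vendor suffixes such as "-amzn-1" before numeric comparison.
  // Hypothetical helper for illustration only.
  def numericParts(version: String): Seq[Int] =
    version.split("\\.").toSeq
      .map(_.takeWhile(_.isDigit)) // "0-amzn-1" -> "0"
      .takeWhile(_.nonEmpty)       // stop at any fully non-numeric component
      .map(_.toInt)
}
```

With this approach, "3.3.0-amzn-1" and "3.3.0" both normalize to Seq(3, 3, 0), so the major/minor check passes on EMR.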
