
spark version validation fails on EMR / EMR Serverless #498

Closed
liurong79 opened this issue May 22, 2023 · 1 comment · Fixed by #501
Comments

@liurong79

Guidelines

Please note that GitHub issues are only meant for bug reports/feature requests. If you have questions on how to use the Neo4j Connector for Apache Spark,
please ask on the Neo4j Discussion Forum instead of creating an issue here.

Expected Behavior (Mandatory)

Successfully import data to Neo4j using Spark on EMR or EMR Serverless

Actual Behavior (Mandatory)

Found the following error in the driver's log:

java.lang.NumberFormatException: For input string: "0-amzn-1"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
at org.neo4j.spark.util.ValidateSparkVersion.compare(Validations.scala:47)
at org.neo4j.spark.util.ValidateSparkVersion.$anonfun$validate$7(Validations.scala:58)
at org.neo4j.spark.util.ValidateSparkVersion.$anonfun$validate$7$adapted(Validations.scala:58)
at scala.collection.IndexedSeqOptimized.prefixLengthImpl(IndexedSeqOptimized.scala:41)
at scala.collection.IndexedSeqOptimized.forall(IndexedSeqOptimized.scala:46)
at scala.collection.IndexedSeqOptimized.forall$(IndexedSeqOptimized.scala:46)
at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:49)
at org.neo4j.spark.util.ValidateSparkVersion.validate(Validations.scala:58)
at org.neo4j.spark.util.Validations$.$anonfun$validate$1(Validations.scala:12)
at org.neo4j.spark.util.Validations$.$anonfun$validate$1$adapted(Validations.scala:12)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:141)
at org.neo4j.spark.util.Validations$.validate(Validations.scala:12)
at org.neo4j.spark.DataSource.<init>(DataSource.scala:15)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:726)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:864)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
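
The failure can be reproduced without Spark. Judging from the stack trace, the validator splits the version string on "." and calls `.toInt` on each component; on EMR the patch component carries a vendor suffix ("0-amzn-1"), which `Integer.parseInt` rejects. A minimal sketch (the `parseStrict` helper is hypothetical, not the connector's actual code):

```scala
import scala.util.Try

object Repro {
  // Splitting "3.3.0-amzn-1" on "." yields Array("3", "3", "0-amzn-1");
  // calling .toInt on the last element throws NumberFormatException.
  def parseStrict(version: String): Try[Array[Int]] =
    Try(version.split("\\.").map(_.toInt))
}
```

`Repro.parseStrict("3.3.0")` succeeds, while `Repro.parseStrict("3.3.0-amzn-1")` fails with the same `NumberFormatException: For input string: "0-amzn-1"` seen above.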

How to Reproduce the Problem

Happens to any job running on EMR or EMR Serverless.

Specifications (Mandatory)

Currently used versions

  • Spark: 3.3.0
  • Scala: 2.12
  • Neo4j: 5.8.0 Community Edition
  • Neo4j Connector: neo4j-connector-apache-spark_2.12-5.0.1_for_spark_3

Additional information

  • The code of the Spark job
  • the structure of the Dataframe
  • did you define the constraints/indexes?
  • if you're using any Spark Cloud provider, please specify it (i.e. Databricks): EMR and EMR Serverless
@liurong79 liurong79 added the bug label May 22, 2023
@liurong79 (Author)

On EMR, sparkSession.version returns something like "3.3.0-amzn-1".
Here's the pull request: #499
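
One way to make the comparison tolerant of vendor-suffixed versions is to keep only the leading digits of each dotted component before parsing. A minimal sketch of that idea, assuming a hypothetical `numericParts` helper (this is not the connector's actual fix):

```scala
object LenientVersion {
  // Keep only the leading digits of each dotted component, dropping
  // vendor suffixes such as "-amzn-1" before numeric comparison.
  // Hypothetical helper for illustration only.
  def numericParts(version: String): Seq[Int] =
    version.split("\\.").toSeq
      .map(_.takeWhile(_.isDigit)) // "0-amzn-1" -> "0"
      .takeWhile(_.nonEmpty)       // stop at any fully non-numeric component
      .map(_.toInt)
}
```

With this approach, "3.3.0-amzn-1" and "3.3.0" both normalize to Seq(3, 3, 0), so the major/minor check passes on EMR.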
