
Elasticsearch-hadoop Scala version update to 2.12 #1949

Closed
Liana09 opened this issue Apr 8, 2022 · 2 comments
Liana09 commented Apr 8, 2022

Issue Description:
I hope I am addressing this topic in the right place.
I am trying to connect to an Elasticsearch database using Spark, and my code snippet looks like this:

spark = SparkSession.builder.master("local").appName("Spark").getOrCreate()
reader = (
    spark.read.format("org.elasticsearch.spark.sql")
    .option("es.read.metadata", "true")
    .option("es.nodes.wan.only", "true")
    .option("es.port", "9200")
    .option("es.net.ssl", "false")
    .option("es.nodes", "here-ip-adress")
)
df = reader.load("my_index")

When calling df = reader.load("my_index") I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
: java.lang.NoClassDefFoundError: scala/Product$class
    at org.elasticsearch.spark.sql.ElasticsearchRelation.<init>(DefaultSource.scala:191)
    at org.elasticsearch.spark.sql.DefaultSource.createRelation(DefaultSource.scala:93)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.ClassNotFoundException: scala.Product$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 20 more

The error occurs because the elasticsearch-hadoop connector is still built against Scala 2.11, while Spark/PySpark >= 3.0 is built against Scala 2.12. Because of this Scala version mismatch, I am no longer able to use elasticsearch-hadoop to connect to my Elasticsearch database.

This problem would be solved by downgrading to a Spark version < 3.0, but I cannot do that because it would create other issues.

When will elasticsearch-hadoop support Scala 2.12? Or are there other workarounds for this kind of dependency issue?
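Not part of the original report, but as background: Scala libraries encode the Scala binary version as a suffix in the Maven artifact id (e.g. `_2.11` vs `_2.12`), so a quick sanity check is to compare the connector's suffix against the Scala version of your Spark build. A minimal illustrative sketch (the helper name is made up):

```python
def scala_suffix(artifact_id: str) -> str:
    """Extract the Scala binary version encoded in a Maven artifact id,
    e.g. 'elasticsearch-spark-30_2.12' -> '2.12'.
    Returns '' when the artifact id carries no Scala suffix."""
    head, sep, tail = artifact_id.rpartition("_")
    return tail if sep else ""

# Spark 3.x is built against Scala 2.12, so the connector suffix must match:
print(scala_suffix("elasticsearch-spark-30_2.12"))  # -> 2.12
print(scala_suffix("elasticsearch-hadoop"))         # -> '' (no Scala suffix)
```

If the suffix is missing or says `_2.11` while your Spark runs Scala 2.12, you get exactly the `NoClassDefFoundError: scala/Product$class` shown above.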

Version Info

OS: Fedora
JVM : openjdk 11.0.14.1
Hadoop/Spark: 3.2.1
ES-Hadoop : 8.1.2

Thank you!

@Liana09 changed the title from "Elasticsearch-hadoop not compatible with" to "Elasticsearch-hadoop Scala version update to 2.12" Apr 8, 2022
Liana09 commented Apr 8, 2022

Just realized I can use the elasticsearch-spark package from here to solve this issue:
https://search.maven.org/artifact/org.elasticsearch/elasticsearch-spark-30_2.12/7.17.2/jar
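For anyone hitting the same thing: one way to pull that Scala 2.12 build in (coordinates taken from the Maven Central link above; adjust the version to match your Elasticsearch cluster) is Spark's `--packages` flag, e.g.:

```shell
# Fetch the Scala 2.12 build of the connector at startup;
# 7.17.2 is the version from the link above -- use whatever matches your cluster.
pyspark --packages org.elasticsearch:elasticsearch-spark-30_2.12:7.17.2
```

The same coordinate can also be set programmatically via the `spark.jars.packages` config property before the SparkSession is created.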

@Liana09 Liana09 closed this as completed Apr 8, 2022
@abhishekmiet08

Thanks @Liana09, it helped me to resolve the same blocker.
