
Fix missing service files in ES-Hadoop jars #1265

Merged
merged 3 commits into from Mar 19, 2019

Conversation

jbaiera
Member

@jbaiera jbaiera commented Mar 15, 2019

When creating the jar files for ES-Hadoop, each integration copies the contents of the MR jar into itself, since the MR jar contains all of the core code. Once each jar is built, they all contribute their contents to the top-level elasticsearch-hadoop jar (ignoring duplicate code files). A problem occurs during these copies: the contents of META-INF/services are not carried along. Previously this manifested as being unable to create a Spark SQL DataFrame using the short name "es" when using the elasticsearch-hadoop-x.x.x.jar. Creating the DataFrame with the short name works fine when using the elasticsearch-spark-yy_zz-x.x.x.jar, because that jar contains the appropriate service file, but that file is never copied up to the root jar.
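To make the failure concrete: Spark resolves a data source short name such as "es" through Java's `ServiceLoader`, which looks for the file `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` on the classpath. The Python below is a minimal sketch, not part of ES-Hadoop or its build. It treats jars as the zip archives they are, builds two hypothetical in-memory jars (one that ships the service file, and one where only the class files were copied), and reads the provider list following the `ServiceLoader` file format. The provider class `org.example.HypotheticalEsSource` and the helpers `build_jar` and `read_providers` are made up for illustration.

```python
import io
import zipfile

# Spark looks up short names such as "es" by asking java.util.ServiceLoader for
# implementations of this interface, so the jar must contain this exact entry.
SERVICE_FILE = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"

# Hypothetical provider class name, used only for illustration.
PROVIDER = "org.example.HypotheticalEsSource"


def build_jar(entries: dict[str, str]) -> bytes:
    """Build an in-memory jar (a zip archive) from a {path: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for path, text in entries.items():
            jar.writestr(path, text)
    return buf.getvalue()


def read_providers(jar_bytes: bytes, service_file: str) -> list[str]:
    """Return the provider class names listed in a jar's service file.

    Follows the ServiceLoader file format: UTF-8, one class name per line,
    '#' starts a comment, and blank lines are ignored. Returns [] when the
    service file is missing, which is what the root jar looked like.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        if service_file not in jar.namelist():
            return []
        text = jar.read(service_file).decode("utf-8")
    providers = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            providers.append(name)
    return providers


# Integration jar: class file plus the service file that registers the provider.
spark_jar = build_jar({
    "org/example/HypotheticalEsSource.class": "",
    SERVICE_FILE: "# registers the 'es' short name\n" + PROVIDER + "\n",
})
# Root jar: received the class file but not META-INF/services.
root_jar = build_jar({"org/example/HypotheticalEsSource.class": ""})

print(read_providers(spark_jar, SERVICE_FILE))  # ['org.example.HypotheticalEsSource']
print(read_providers(root_jar, SERVICE_FILE))   # []
```

The class is present in both jars, yet only the jar that ships the service file advertises a provider, which matches the symptom described above.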

Now that Kerberos support is integrated, several projects have entries in their services directories that must all be copied into the right jars for the Kerberos features in Hadoop and Spark to function normally.

We did not encounter these problems in testing because our tests use a separate Hadoop testing jar. That jar is built directly from the projects' sources instead of from their jar files, and it includes all of the test and integration test sources.

This PR ensures that the contents of the mr project's META-INF/services directory are copied into the hive, pig, spark, and storm jars, and that the contents of all of the integrations' META-INF/services directories are copied into the root elasticsearch-hadoop jar.
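The PR makes this change in the project's own build; the Python below is not that build logic. It is a minimal, hypothetical sketch of the same idea: when several jars are merged into one, ordinary entries such as class files keep their first copy (as with the duplicate code files mentioned earlier), while every `META-INF/services/` file is carried into the merged jar instead of being dropped. Concatenating same-named service files line by line is an assumption of this sketch about how collisions should be handled. The names `merge_jars`, `build_jar`, `org.example.HypotheticalService`, `org.example.MrProvider`, and `org.example.SparkProvider` are made up for illustration.

```python
import io
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def build_jar(entries: dict[str, str]) -> bytes:
    """Build an in-memory jar (a zip archive) from a {path: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for path, text in entries.items():
            jar.writestr(path, text)
    return buf.getvalue()


def merge_jars(jars: list[bytes]) -> bytes:
    """Merge jars into one, keeping the first copy of ordinary entries but
    combining every META-INF/services file instead of dropping it."""
    entries: dict[str, bytes] = {}
    services: dict[str, list[str]] = {}
    for jar_bytes in jars:
        with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
            for name in jar.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                data = jar.read(name)
                if name.startswith(SERVICES_PREFIX):
                    # Append provider lines, skipping exact duplicates, so that
                    # registrations from every input jar survive the merge.
                    lines = services.setdefault(name, [])
                    for line in data.decode("utf-8").splitlines():
                        entry = line.strip()
                        if entry and entry not in lines:
                            lines.append(entry)
                elif name not in entries:
                    entries[name] = data  # first copy wins for class files
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as merged:
        for name, data in entries.items():
            merged.writestr(name, data)
        for name, lines in services.items():
            merged.writestr(name, "\n".join(lines) + "\n")
    return out.getvalue()


# Usage: two hypothetical jars share a class file and register the same service.
SERVICE = SERVICES_PREFIX + "org.example.HypotheticalService"
mr_jar = build_jar({"org/example/Core.class": "", SERVICE: "org.example.MrProvider\n"})
spark_jar = build_jar({"org/example/Core.class": "", SERVICE: "org.example.SparkProvider\n"})
root = merge_jars([mr_jar, spark_jar])

with zipfile.ZipFile(io.BytesIO(root)) as jar:
    print(jar.read(SERVICE).decode("utf-8").splitlines())
    # ['org.example.MrProvider', 'org.example.SparkProvider']
```

Keeping only the first copy of a service file would silently lose the second jar's registration, which is why this sketch merges those files rather than treating them like duplicate class files.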

Contributor

@jakelandis jakelandis left a comment


LGTM

@jbaiera jbaiera merged commit f61227c into elastic:master Mar 19, 2019
@jbaiera jbaiera deleted the fix-missing-service-files branch March 19, 2019 15:57
jbaiera added 3 commits that referenced this pull request Mar 19, 2019, each with the message:
This PR ensures that the contents of the mr project's META-INF/services directory are copied into the hive, pig, spark, and storm jars, and that the contents of all of the integrations' META-INF/services directories are copied into the root elasticsearch-hadoop jar.