Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Users to specify Elasticsearch Major Version #9696

Merged
merged 7 commits into from
Nov 14, 2020

Conversation

johnc1231
Copy link
Contributor

@johnc1231 johnc1231 commented Nov 10, 2020

Users want to be able to use elasticsearch 7. This adds an argument to make so that we pull in a different version of elasticsearch-spark-20 depending on what they specify for ELASTIC_MAJOR_VERSION.

Copy link
Collaborator

@chrisvittal chrisvittal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Need to add an ENV_VAR call.

hail/Makefile Outdated
@@ -34,6 +34,8 @@ JAVAC ?= javac
JAR ?= jar
endif

ELASTIC_MAJOR_VERSION ?= 6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should move this up into the hail/spark/scala version block, and add a call to ENV_VAR like on line 20.

return "6.8.13"
}
else if (elasticMajorVersion == "7") {
return "7.1.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use 7.1.1 here? It looks like the latest version is 7.10.0.

https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into some version conflict with another dependency when I tried 7.10, just need to sort that out. Changed it to 7.1.1 just to make sure this selection method works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like by running 7.1.1 I can write data to 6.8.x, so I think I just don't trust that matrix at all anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that matrix is "... platform and software configurations that are eligible for support under our subscription offerings".

Yeah, I was also previously using Hail's ES-Hadoop 6.2.4 to write to ES 5.5, which is not officially supported according to that matrix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, found a statement in documentation:

Elasticsearch for Apache Hadoop maintains backwards compatibility with the most recent minor version of Elasticsearch’s previous major release (5.X supports back to 2.4.X, 6.X supports back to 5.6.X, etc…​).
-- https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html#upgrading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, nice! Gnomad uses 6.8 right? I wonder if the right thing to do is just to change the default to use ES-hadoop 7 then. Anyone using 5 is probably pretty out of date already? Idk how much of a lift updating elasticsearch is.

Copy link
Contributor

@nawatts nawatts Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with defaulting to 7. gnomAD is on 6.8, so it's not an issue (and should be upgraded anyway). Everything less than 7 is now end of life (https://www.elastic.co/support/eol).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I landed on 7.8.1. Seems like that's the last one before I start running into py4j issues. I'll figure those out at a later date.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan to me.

chrisvittal
chrisvittal previously approved these changes Nov 12, 2020
Copy link
Collaborator

@chrisvittal chrisvittal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to add a dependency to the $(SHADOW_JAR) target to ensure it's rebuilt when the elasticsearch version changes.

@danking danking merged commit 8479bf2 into hail-is:main Nov 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants