[SW-1501] Check the case of JAR being attached prior the start of PySparkling #1433

Merged
merged 4 commits into from Aug 14, 2019

Conversation

jakubhava
Contributor

No description provided.

Collaborator

@mn-mikke mn-mikke left a comment

Couldn't this change make the underlying Spark unstable? Up until now, if there was a conflict in dependency versions between Spark and H2O-3, the Spark dependencies were loaded first. Aren't we changing this behavior?

py/ai/h2o/sparkling/Initializer.py (outdated)
here = path.abspath(path.dirname(__file__))
if '.zip' in here:
    with zipfile.ZipFile(here[:-len("ai/h2o/sparkling/")], 'r') as archive:
        return archive.read('ai/h2o/sparkling/version.txt').decode('utf-8').strip()
Collaborator

Can it happen that the version file is still in the pysparkling folder?

Contributor Author

@jakubhava jakubhava Aug 14, 2019

Good question, but not anymore: we now generate it into ai/h2o/sparkling. Since ai/h2o/sparkling is our core backend Python library and the implementation of the initializer lives there, I thought it made sense to have the version file there as well.
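
For readers following the thread, here is a minimal sketch of the lookup being discussed: reading version.txt from ai/h2o/sparkling whether the package sits inside a .zip archive or in a plain directory. The function name `read_version` and the plain-directory fallback are assumptions made for illustration; only the zip branch mirrors the diff fragment above, and this is not necessarily how Initializer.py is structured.

```python
import os
import zipfile

PACKAGE_PATH = "ai/h2o/sparkling"  # package location named in the thread
VERSION_FILE = "version.txt"


def read_version(package_dir):
    """Read version.txt from a package directory that may live inside a .zip.

    `read_version` is a hypothetical helper, not part of the PySparkling API.
    """
    normalized = package_dir.replace(os.sep, "/").rstrip("/")
    suffix = "/" + PACKAGE_PATH
    if ".zip" in normalized and normalized.endswith(suffix):
        # Strip the package suffix to recover the path of the archive itself.
        archive_path = normalized[: -len(suffix)]
        with zipfile.ZipFile(archive_path, "r") as archive:
            raw = archive.read(PACKAGE_PATH + "/" + VERSION_FILE)
        return raw.decode("utf-8").strip()
    # Assumed fallback: the package is a regular directory on disk.
    with open(os.path.join(package_dir, VERSION_FILE), "r", encoding="utf-8") as f:
        return f.read().strip()
```

Checking for the full package suffix before slicing makes the archive path computation explicit, rather than relying on the length of a string with a trailing slash happening to match.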

@jakubhava
Contributor Author

@mn-mikke we are not changing the order of dependencies. Spark still loads the dependencies the same way as before. The only thing we changed is that we added verification of the Sparkling Water JAR: we check whether the JAR is already attached to the cluster and, if not, attach it. We were already attaching the JAR before, but we ignored the fact that it might already have been attached.

What changes for users is that they will now see an exception when they have conflicting JARs, but that is a bug fix: before, they could hit weird issues. This way, we warn them early.
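
The check described above can be sketched roughly as follows. This is an illustration only, not the merged implementation: `needs_attach`, `ConflictingJarError`, and the JAR file-name pattern are all hypothetical, and the list of attached JARs is passed in as plain strings rather than queried from Spark.

```python
import os
import re

# Assumed file-name pattern, for illustration only, e.g.
# "sparkling-water-assembly_2.11-3.26.2-2.4-all.jar".
_JAR_PATTERN = re.compile(
    r"^sparkling-water-assembly_[\d.]+-(?P<version>.+)-all\.jar$"
)


class ConflictingJarError(Exception):
    """Raised when an attached JAR does not match the expected version."""


def needs_attach(attached_jars, expected_version):
    """Return True if the JAR still has to be attached, False if it is present.

    Raises ConflictingJarError when a JAR of a different version is attached,
    so the user is warned early instead of hitting obscure failures later.
    """
    found = []
    for jar in attached_jars:
        match = _JAR_PATTERN.match(os.path.basename(jar))
        if match:
            found.append(match.group("version"))
    conflicting = sorted(v for v in set(found) if v != expected_version)
    if conflicting:
        raise ConflictingJarError(
            "Expected version %s but found attached JAR(s) of version %s"
            % (expected_version, ", ".join(conflicting))
        )
    # Nothing matching was found: the caller should attach the JAR.
    return not found
```

With this shape, the order in which Spark loads its own dependencies is untouched; the function only decides whether to attach, skip, or fail fast.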

@jakubhava jakubhava merged commit e4cbab3 into master Aug 14, 2019
@jakubhava jakubhava deleted the jh/sw-1501 branch August 14, 2019 06:22
jakubhava added a commit that referenced this pull request Aug 14, 2019
2 participants