
[SW-539] Fix bug when pysparkling is executed in parallel on the same node #393

Merged
merged 1 commit into from Sep 26, 2017

Conversation

jakubhava (Contributor)

This bug fix reintroduces the cache that @mmalohlava wrote a while ago. However, it also changes the Python egg cache path for tests, so the tests always use the correct latest artifact (this was a source of issues and also the reason why the cache was removed).

Also, when testing SNAPSHOT versions locally against different sparkling-water builds, we make sure to use a temporary cache for Python eggs, again to be sure we run on the latest code.

The cache is fine when it is used by users on released versions.

return sw_jar
cache_path = get_cache_path(zip_filename)
cached_jar = os.path.abspath("{}/sparkling_water/sparkling_water_assembly.jar".format(cache_path))
if os.path.exists(cached_jar) and os.path.isfile(cached_jar):
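The snippet above checks for a previously cached assembly JAR. A minimal sketch of the idea behind the fix, with hypothetical names (`get_cache_path` here is an illustration, not the project's actual implementation): keying the cache path on the artifact version so an upgraded Sparkling Water never resolves to a stale cached file, and falling back to a throwaway directory for SNAPSHOT/local testing.

```python
import os
import tempfile


def get_cache_path(zip_filename, version, use_temp=False):
    """Hypothetical sketch: build a version-specific cache path.

    Including the version in the path means a new release gets a fresh
    cache directory, so an old cached JAR can never shadow a new one.
    """
    if use_temp:
        # SNAPSHOT / local testing: use a fresh temporary directory
        # instead of the shared cache, so we always run the latest code.
        return tempfile.mkdtemp(prefix="pysparkling_")
    base = os.path.expanduser("~/.cache")
    return os.path.join(base, "pysparkling", version, os.path.basename(zip_filename))
```

The key design point is that cache invalidation happens implicitly through the path, rather than by comparing file contents or timestamps.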
Member

Thinking about it more, can we disable caching totally? It could be dangerous if we have a cached old version of the JAR and upgrade Sparkling Water.

@jakubhava (Contributor, Author)

I'm all for not using the cache, as the cache was also a cause of problems before. We can disable it in this case since we need to extract the JAR anyway, but we can extract it to a temporary directory which will be cleaned up at the end of H2OContext.

@jakubhava jakubhava merged commit c93c0ee into master Sep 26, 2017
@jakubhava jakubhava deleted the jh/jira/sw-539 branch September 26, 2017 19:47
jakubhava added a commit that referenced this pull request Oct 10, 2017
jakubhava added a commit that referenced this pull request Oct 10, 2017
jakubhava added a commit that referenced this pull request Oct 10, 2017
jakubhava added a commit that referenced this pull request Oct 18, 2017
@@ -139,12 +139,14 @@ def getOrCreate(spark, conf=None, **kwargs):


def stop_with_jvm(self):
    Initializer.clean_temp_dir()
Member

Update: this seems to cause stack traces to be printed during Spark shutdown. We cannot expect that it will be fully executed. The better solution is to simply skip the cleanup.

Contributor (Author)

Good point! I will work on this tomorrow.
