Upgrade Jobserver to 2.4.4 Spark #1283
* Hive doesn't clean up some of the data after tables are dropped. As the Jobserver project currently has 2 tests which use Hive, add an additional LOCATION parameter to table creation and use different paths for the metastore.
* Also add a dedicated warehouse configuration to hive-site.xml, to distinguish the data in the future.
* Explicitly enable the REST API (spark.master.rest.enabled) for the master, because its default was changed to false in 2.4.4.
Change-Id: Id8604d0156f60970c494dd373bfaaafdbcb4d63f
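The per-test LOCATION idea from the commit above can be sketched as follows. This is only an illustration under assumptions, not the code from this PR: the helper name `create_table_sql`, the table names, and the column list are all hypothetical, and the actual tests may build their statements differently.

```python
import tempfile
from pathlib import Path


def create_table_sql(table: str, base_dir: Path) -> str:
    """Build a CREATE TABLE statement with an explicit, table-specific LOCATION.

    The column list is a made-up example; only the LOCATION clause matters here.
    """
    # Each table gets its own directory, so data left behind after a
    # DROP TABLE cannot leak into a table created by another test.
    location = (base_dir / table).as_uri()
    return f"CREATE TABLE `{table}` (id INT, name STRING) LOCATION '{location}'"


if __name__ == "__main__":
    # A fresh temporary directory stands in for the per-test base path.
    base = Path(tempfile.mkdtemp(prefix="hive-test-"))
    print(create_table_sql("test_table_a", base))
    print(create_table_sql("test_table_b", base))
```

Giving each test its own base directory, rather than sharing one, means the tests do not depend on Hive cleaning up after itself.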
Change-Id: Ic1c66785ea8c8645789c86dae83154e80458b7ec
C* connector 2.4 indirectly depends on commons-configuration, which is brought onto the classpath by Hadoop 2.7. This dependency was changed in Hadoop 3.x, so C* connector 2.4 is broken there. Until it is fixed, Jobserver puts the dependency on the classpath itself.
https://datastax-oss.atlassian.net/browse/SPARKC-566
Change-Id: I532ab22d2bb97dc5fd118c7178f67207b06bf885
Change-Id: Ic5128c5f306250c289f8c59d22f53d31bf87674e
After the upgrade to 2.4.4, the Python tests and contexts started to throw warnings like "You are trying to pass an insecure Py4j gateway to Spark. This presents a security risk." This change addresses the problem by passing a token to the Python subprocess. The subprocess uses the token for communication, and the Py4J gateway only accepts the connection if the token is valid.
Change-Id: I61e82b2996fd830315db1dc72af549578fc9a7a4
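The token handshake described in the commit above can be illustrated with a standard-library-only sketch. It is not the Jobserver implementation and does not use Py4J: the environment variable name `GATEWAY_AUTH_TOKEN` and all function names are hypothetical, and passing the token through the environment is an assumption (the commit does not say how it is passed). With Py4J itself, recent versions accept the token through the `auth_token` argument of `GatewayParameters`.

```python
import hmac
import os
import secrets
import subprocess
import sys

# Hypothetical variable name, chosen only for this sketch.
TOKEN_ENV_VAR = "GATEWAY_AUTH_TOKEN"


def new_auth_token() -> str:
    """Generate a random secret that the parent shares with its subprocess."""
    return secrets.token_hex(32)


def is_authorized(presented: str, expected: str) -> bool:
    """Gateway-side check: accept the client only if the tokens match.

    hmac.compare_digest compares in constant time, which avoids leaking
    information about the token through timing differences.
    """
    return hmac.compare_digest(presented.encode(), expected.encode())


def run_child_with_token(token: str) -> str:
    """Start a Python subprocess with the token in its environment.

    The child simply echoes the token back, standing in for a client
    that presents the token when it connects to the gateway.
    """
    env = dict(os.environ, **{TOKEN_ENV_VAR: token})
    child_code = f"import os; print(os.environ[{TOKEN_ENV_VAR!r}])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    token = new_auth_token()
    presented = run_child_with_token(token)
    print("child accepted:", is_authorized(presented, token))
    print("stranger accepted:", is_authorized("not-the-token", token))
```

The point of the design is that only a process started by the parent knows the secret, so another local process that happens to find the gateway port cannot talk to it.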
Spark 2.4.5 released
@noorul great news, but I would prefer to get 2.4.4 in first, since it is a big change. Going from 2.4.4 to 2.4.5 should be really easy. Btw, this change is currently blocked by a Hive test failure, which I am trying to fix. It works for me locally, but somehow fails on Travis.
The tests that check whether Hive is disabled were failing because the context from the previous test case was not shut down properly and still had Hive enabled. This fix cleans up the context properly and makes sure that it is stopped.
Change-Id: If6f9cb26fcc6f8f2af3243057825ef75585378d8
The open-source Jobserver uses "pycodestyle" to keep the Python files PEP8 compliant. subprocess.py was not compliant, which caused the open-source build to fail.
Change-Id: I93f9718ed30e122441d6e775045fff0711342f08
Change-Id: I692a8ff1387aee99ea4db7863d4676f6dd8fa5c9
LGTM
Pull Request checklist
Current behavior : (link existing issues here : https://help.github.com/articles/basic-writing-and-formatting-syntax/#referencing-issues-and-pull-requests)
Only supports Spark up to 2.3.2
New behavior :
Support for Spark 2.4.4 added
The PR contains 4 commits and each commit has a commit message with more details.
BREAKING CHANGES
Disabling Hive is not yet supported for Python-based contexts. I will push this change soon.
Other information: