[DCOS-38980] Hive integration test for Sentry CREATETABLE bug fix #399
base: master
Conversation
Otherwise the hadoop-hive-kerberos Marathon app fails to deploy, because the resources offered are not under the role slave_public.
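For context, the usual way a Marathon app controls which resource roles it will accept is the `acceptedResourceRoles` field in its app definition. The sketch below is an assumption about the shape of the fix, not the PR's actual JSON; field names follow Marathon's app API, and the exact role list depends on where the app should run.

```python
# Hedged sketch of the relevant Marathon app fields (assumed, not the PR's
# exact definition). Restricting acceptedResourceRoles to "slave_public"
# makes Marathon accept only offers reserved for public agents.
hadoop_hive_kerberos_app = {
    "id": "/hadoop-hive-kerberos",
    "acceptedResourceRoles": ["slave_public"],
    "container": {
        "type": "MESOS",
        "docker": {"image": "susanxhuynh/cdh5-hive-kerberos:latest"},
    },
}
```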
Thanks for this @samvantran.
I have made some quick comments. I do think that the test can be cleaned up a little bit so that the dependencies are a little clearer. A lot of this comes down to naming. It is, for example, not 100% clear what the purpose of each of the *Setup classes is, especially since we access the members directly.
Another question I have is about the templating of the XML files as part of the tests. Should these not be available in the Docker image (and then templated in the running container) directly? I do recall seeing them in Susan's PR, but may have missed some context along the way.
"container": {
    "type": "MESOS",
    "docker": {
        "image": "susanxhuynh/cdh5-hive-kerberos:latest",
Let's add this to the list of Docker images that we need to move to the mesosphere account.
@@ -0,0 +1,58 @@
<configuration>
Can this file not be downloaded from the running hive image?
@@ -0,0 +1,26 @@
<configuration>
Same comment here. Can this not be downloaded from the running hive container?
class KerberosSetup:
    def __init__(self, kerberos_env, hive_agent_ip, hive_agent_hostname):
        self.kerberos_env = kerberos_env
        self.hive_agent_ip = hive_agent_ip
Why is the hive_agent_ip part of the KerberosSetup? Also, should this not be KerberosConfig?
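Following the reviewer's naming suggestion, one way the class could look is a small immutable value object. This is a hypothetical restructuring, not code from the PR; the field names mirror the test's existing KerberosSetup attributes.

```python
from dataclasses import dataclass


# Hypothetical KerberosConfig along the lines of the review comment: a
# frozen dataclass makes the directly-accessed members explicit and
# read-only. Field names are taken from the PR's KerberosSetup.
@dataclass(frozen=True)
class KerberosConfig:
    kerberos_env: object
    hive_agent_ip: str
    hive_agent_hostname: str
```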
        self.hive_config_url = hive_config_url

def _get_agent_ip():
It's not quite clear what this function is supposed to do. Get the IP of the first private agent?
What about using this: https://github.com/mesosphere/dcos-commons/blob/master/testing/sdk_agents.py#L32
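Setting the linked sdk_agents helper aside, "get the first private agent" can be sketched in a self-contained way. The dict shape below mimics the Mesos `/slaves` endpoint, where DC/OS public agents carry resources reserved under the `slave_public` role; the function name and field access are assumptions for illustration, not the PR's code.

```python
def first_private_agent_ip(slaves):
    """Return the hostname of the first agent with no slave_public
    reservation, i.e. the first private agent.

    `slaves` is a list of agent dicts shaped like the Mesos /slaves
    endpoint output (an assumption for this sketch).
    """
    for slave in slaves:
        reserved = slave.get("reserved_resources", {})
        if "slave_public" not in reserved:
            return slave["hostname"]
    raise ValueError("no private agents found")
```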
    time.sleep(120)

    # HDFS client configuration
    core_site_url = _upload_hadoop_config("core-site.xml", kerberos_setup.hive_agent_hostname)
IIRC these are already part of the Docker image?
    core_site_url = _upload_hadoop_config("core-site.xml", kerberos_setup.hive_agent_hostname)
    _upload_hadoop_config("hdfs-site.xml", kerberos_setup.hive_agent_hostname)
    hive_config_url = _upload_hadoop_config("hive-site.xml", kerberos_setup.hive_agent_hostname)
    hdfs_base_url = os.path.dirname(core_site_url)
It seems like we're mixing paths and URLs here?
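To illustrate the reviewer's point: `os.path.dirname` happens to work on URLs on POSIX systems, but it is a filesystem-path operation. A sketch of the URL-aware equivalent using only the standard library (the function name is mine, not the PR's):

```python
from urllib.parse import urlsplit, urlunsplit
import posixpath


def base_url(url):
    """Strip the last path segment from a URL, treating it as a URL
    rather than a local filesystem path."""
    parts = urlsplit(url)
    parent = posixpath.dirname(parts.path)
    # Drop the query and fragment; keep scheme and host intact.
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))
```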
@pytest.mark.xfail(sdk_utils.is_strict_mode(),
                   reason="hadoop_setup requires root access; won't work in strict mode")
I don't quite think that this reasoning is valid. Which part of the application requires root access? Can we not then run the application as the root user (giving Marathon the required permissions to launch it successfully)?
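For what the reviewer is suggesting, Marathon app definitions accept a top-level `user` field. The snippet below is a hedged sketch of that alternative, assuming the cluster's security policy grants Marathon the permission to launch tasks as root; it is not the PR's actual configuration.

```python
# Hedged sketch: run the app as root instead of xfail-ing in strict mode.
# Whether this is permitted depends on the cluster's Marathon permissions.
def run_as_root(app_definition):
    """Return a copy of a Marathon app definition that launches as root."""
    patched = dict(app_definition)
    patched["user"] = "root"
    return patched
```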
    kerberos_args = [
        "--kerberos-principal", ALICE_PRINCIPAL,
        "--keytab-secret-path", "/__dcos_base64___keytab",
Was there some issue with regards to accessing secrets at the root level?
tests/test_hive.py
Outdated
    ] + kerberos_args

    spark_utils.run_tests(app_url=spark_utils.dcos_test_jar_url(),
                          app_args="thrift://{}:9083".format(hadoop_setup.kerberos_setup.hive_agent_hostname),
Should we not read this from the hive-site.xml file instead?
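Reading the metastore address from the config would mean parsing the Hadoop-style XML for the `hive.metastore.uris` property. A minimal standard-library sketch of that idea (the helper name and the example hostname are mine, not from the PR):

```python
import xml.etree.ElementTree as ET

# Example hive-site.xml content in Hadoop's configuration format; the
# hostname here is a placeholder, not a real endpoint.
HIVE_SITE_EXAMPLE = """\
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-agent.example:9083</value>
  </property>
</configuration>
"""


def metastore_uri(hive_site_xml):
    """Return the value of hive.metastore.uris from a hive-site.xml
    document, or None if the property is absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hive.metastore.uris":
            return prop.findtext("value")
    return None
```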
Looks like something else is going on beyond just adding the upstream Spark commit.
This reverts commit 627581a.
Test for running a Spark SQL job against Hive in a (kerberized) Sentry-secured cluster, where a non-admin user has all privileges on a certain database.
The test:
-- The job creates several Spark SQL tables with a Hive backend.
-- HDFS is used as an intermediate data store.
-- The job is based on an example from Apache Spark.
This PR also adds support for passing in a client hive-site.xml config file when running a job. The config file may be specified as a Mesos URI in the submit args:
--conf spark.mesos.uris=http://path/to/hive-site.xml
Known issues: