
[DCOS-38980] Hive integration test for Sentry CREATETABLE bug fix #399

Open · wants to merge 12 commits into master
Conversation

susanxhuynh
Contributor

Test for running a Hive Spark job against Hive in a (kerberized) Sentry-secured cluster, where a non-admin user has all privileges to a certain database.

The test:

  • Sets up a KDC.
  • Adds an "alice" principal to the KDC.
  • Runs the kerberized Hive Sentry Docker image.
  • Grants ALL privileges on Hive database "default" to "alice".
  • Prepares the hdfs-site.xml, core-site.xml, and hive-site.xml client config files.
  • Runs a Spark job - a Hive job - with "alice" as the Kerberos principal.
    -- The job creates several Spark SQL tables with a Hive backend.
    -- HDFS is used as an intermediate data store.
    -- The job is based on an example from Apache Spark.

This PR also adds support for passing in a client hive-site.xml config file when running a job. The config file may be specified as a Mesos URI in the submit args:
--conf spark.mesos.uris=http://path/to/hive-site.xml

Known issues:

  • Instead of sleeping for a fixed period, the test should add a readiness check to determine when the Hive image is ready.
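
The fixed sleep could be replaced with a simple polling helper. A minimal sketch, assuming the real `check` callable would probe the Hive metastore endpoint (e.g. the thrift port 9083); the name `wait_until_ready` and its parameters are illustrative, not from the PR:

```python
import time

def wait_until_ready(check, timeout=600, interval=5):
    """Poll check() until it returns True or timeout seconds elapse.

    Sketch of the readiness check the known-issues note calls for;
    in the real test, check() would attempt a connection to the Hive
    metastore instead of sleeping for a fixed 120 seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check():
            return True
        time.sleep(interval)
    raise TimeoutError("service not ready within {}s".format(timeout))
```
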

Contributor

@elezar left a comment


Thanks for this @samvantran.

I have made some quick comments. I do think that the test can be cleaned up a little so that the dependencies are clearer. A lot of this comes down to naming. It is, for example, not 100% clear what the purpose of each of the *Setup classes is, especially since we access their members directly.

Another question I have is about the templating of the XML files as part of the tests. Should these not be available in the Docker image (and then templated in the running container) directly? I do recall seeing them in Susan's PR, but may have missed some context along the way.

"container": {
"type": "MESOS",
"docker": {
"image": "susanxhuynh/cdh5-hive-kerberos:latest",

Let's add this to the list of docker images that we need to move to the mesosphere account.

@@ -0,0 +1,58 @@
<configuration>

Can this file not be downloaded from the running hive image?

@@ -0,0 +1,26 @@
<configuration>

Same comment here. Can this not be downloaded from the running hive container?

class KerberosSetup:
def __init__(self, kerberos_env, hive_agent_ip, hive_agent_hostname):
self.kerberos_env = kerberos_env
self.hive_agent_ip = hive_agent_ip

Why is hive_agent_ip part of the KerberosSetup? Also, should this not be KerberosConfig?
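
One way the naming suggestion could look, as a hedged sketch (the dataclass form and field types are illustrative, not what the PR contains):

```python
from dataclasses import dataclass

# Sketch of the suggested rename: the same fields the test's
# KerberosSetup holds, expressed as a plain configuration container.
@dataclass
class KerberosConfig:
    kerberos_env: object        # kerberos environment handle from test setup
    hive_agent_ip: str          # the reviewer questions whether this belongs here
    hive_agent_hostname: str
```
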

self.hive_config_url = hive_config_url


def _get_agent_ip():

It's not quite clear what this function is supposed to do. Get the IP of the first private agent?

What about using this: https://github.com/mesosphere/dcos-commons/blob/master/testing/sdk_agents.py#L32

time.sleep(120)

# HDFS client configuration
core_site_url = _upload_hadoop_config("core-site.xml", kerberos_setup.hive_agent_hostname)

IIRC these are already part of the Docker image?

core_site_url = _upload_hadoop_config("core-site.xml", kerberos_setup.hive_agent_hostname)
_upload_hadoop_config("hdfs-site.xml", kerberos_setup.hive_agent_hostname)
hive_config_url = _upload_hadoop_config("hive-site.xml", kerberos_setup.hive_agent_hostname)
hdfs_base_url = os.path.dirname(core_site_url)

It seems like we're mixing paths and URLs here?
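
A sketch of how to avoid applying filesystem-path helpers like `os.path.dirname` to URLs, which the comment above flags; the helper name `url_dirname` is illustrative:

```python
from urllib.parse import urlparse
import posixpath

def url_dirname(url):
    """Return the URL with its last path component stripped.

    Unlike os.path.dirname on the raw string, this only manipulates
    the path component, leaving scheme, host, and port intact.
    """
    parts = urlparse(url)
    return parts._replace(path=posixpath.dirname(parts.path)).geturl()
```
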



@pytest.mark.xfail(sdk_utils.is_strict_mode(),
reason="hadoop_setup requires root access; won't work in strict mode")

I don't quite think this reasoning is valid. Which part of the application requires root access? Can we not then run the application as the root user (giving Marathon the required permissions to launch it successfully)?


kerberos_args = [
"--kerberos-principal", ALICE_PRINCIPAL,
"--keytab-secret-path", "/__dcos_base64___keytab",

Was there some issue with regards to accessing secrets at the root level?

] + kerberos_args

spark_utils.run_tests(app_url=spark_utils.dcos_test_jar_url(),
app_args="thrift://{}:9083".format(hadoop_setup.kerberos_setup.hive_agent_hostname),

Should we not read this from the hive-site.xml file instead?
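
The reviewer's suggestion could be sketched as parsing the thrift endpoint out of the client config rather than hard-coding the hostname; a minimal version, assuming a standard hive-site.xml layout (the function name is illustrative):

```python
import xml.etree.ElementTree as ET

def metastore_uri(hive_site_xml):
    """Extract hive.metastore.uris from a hive-site.xml document string.

    Returns the thrift:// endpoint the Spark job should use, instead of
    rebuilding it from the agent hostname.
    """
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hive.metastore.uris":
            return prop.findtext("value")
    raise KeyError("hive.metastore.uris not found in hive-site.xml")
```
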

Looks like something else is going on beyond just adding the upstream Spark commit.