
SparkMagic and Livy user impersonation #460

Open
ranjitiyer opened this issue May 21, 2018 · 2 comments

@ranjitiyer

ranjitiyer commented May 21, 2018

Background

When a Spark notebook is executed in Jupyter, SparkMagic sends the code to Livy through Livy's REST API. Livy then creates a Spark job and submits it to a YARN cluster for execution. By default, YARN applications submitted this way run as the livy user, but many enterprise organizations want each Jupyter user to be impersonated by Livy instead. This can be achieved by enabling Livy impersonation and adding the proxyUser property to the SparkMagic configuration of each user who needs to be impersonated:

  "session_configs": {
    "driverMemory": "1000M",
    "executorCores": 2,
    "proxyUser": "bob"
  },

The result of this config change is that if bob is the Notebook instance user, bob is now also the user running the YARN application:

Application-Id                    Application-Name  Application-Type  User  Queue
application_1526925378944_0005    livy-session-1    SPARK             bob   default
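For reference, the session_configs object above ends up in the request that creates the Livy session. The sketch below assumes, as a simplification, that SparkMagic forwards session_configs unchanged, together with the session kind, as the JSON body of Livy's POST /sessions request; build_livy_session_body is a hypothetical helper, not a SparkMagic function.

```python
import json


def build_livy_session_body(session_configs, kind="pyspark"):
    """Hypothetical helper: merge the session kind with the user's
    session_configs into the JSON body sent to Livy's POST /sessions.
    Keys from session_configs (driverMemory, executorCores, proxyUser)
    are copied as-is."""
    body = {"kind": kind}
    body.update(session_configs)
    return json.dumps(body, sort_keys=True)


session_configs = {
    "driverMemory": "1000M",
    "executorCores": 2,
    "proxyUser": "bob",
}

print(build_livy_session_body(session_configs))
# {"driverMemory": "1000M", "executorCores": 2, "kind": "pyspark", "proxyUser": "bob"}
```

With impersonation enabled on the Livy server, the proxyUser field in this body is what makes the YARN application run as bob rather than livy.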

Proposal

Since the proxyUser value cannot be known a priori, it must be set individually in every user's SparkMagic config JSON. This is not ideal: it increases configuration complexity for a multi-user enterprise, which must inject this property whenever new users are added to the system.

I'm proposing that SparkMagic support user impersonation by default, meaning that whenever it creates a new Livy session it always sends proxyUser to Livy, set to the name of the OS user running the SparkMagic process. This removes per-user configuration and makes SparkMagic more amenable to enterprise use. An administrator can still explicitly set proxyUser in the session_configs JSON object, and that value takes precedence over the proposed default of using the OS user name for impersonation.

I envision this as a low-complexity change to /sparkmagic/utils/configuration.py, combined with a new configuration property "livy_user_impersonation": true|false. With NO_AUTH, SparkMagic would send the user of the current process as the proxyUser.
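A minimal sketch of the proposed default, assuming the decision is made on a plain dict of session_configs before the session request is sent. resolve_session_configs and the NO_AUTH value below are hypothetical stand-ins, not SparkMagic's actual API; only the livy_user_impersonation flag name comes from this proposal. The OS user name is taken from Python's getpass.getuser().

```python
import getpass

# Assumed stand-in for SparkMagic's "no authentication" setting.
NO_AUTH = "None"


def resolve_session_configs(session_configs, livy_user_impersonation, auth):
    """Hypothetical helper illustrating the proposed default behavior.

    When impersonation is enabled and the auth mode is NO_AUTH, fill in
    proxyUser with the OS user running this process, unless the
    administrator already set a non-empty proxyUser (which always wins).
    Other auth modes are left untouched in this sketch.
    """
    configs = dict(session_configs)  # copy so the caller's dict is not mutated
    if livy_user_impersonation and auth == NO_AUTH and not configs.get("proxyUser"):
        configs["proxyUser"] = getpass.getuser()
    return configs


# Administrator left proxyUser unset: the OS user is filled in.
print(resolve_session_configs({"driverMemory": "1000M"}, True, NO_AUTH))

# Administrator set proxyUser explicitly: that value takes precedence.
print(resolve_session_configs({"proxyUser": "bob"}, True, NO_AUTH))
```

Treating an empty proxyUser the same as a missing one is a design choice in this sketch, so that an empty value is never forwarded to Livy.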

Happy to get some feedback on this proposal.

@maziyarpanahi

This is also an issue when you launch an EMR cluster with Jupyter. When the following Livy configuration is applied to the cluster, a SparkSession cannot be created from Jupyter:

configs:

[
    {
      "Classification": "livy-conf",
      "Properties": {
        "livy.impersonation.enabled": "true"
      }
    },
    {
      "Classification": "core-site",
      "Properties": {
        "hadoop.proxyuser.livy.groups": "*",
        "hadoop.proxyuser.livy.hosts": "*"
      }
    }
]

error:

unexpected parameter proxyUser can not be empty.

@josechudev

Hey! I am currently needing this change. How can I help with this issue?
