Added Kerberos authentication #284

praveen-kanamarlapudi
Contributor

We can now connect to a Kerberos-enabled Livy.

Contributor

@aggFTW left a comment

This looks great! Thanks for the PR.

My general comment would be that I think we should discuss the model for getting new tickets and deleting them once the user cleans up. Could you please clarify your thoughts on this?

It'd also be great to see:

  • Modification to README.md so that users know how to set up different authentication methods.
  • Ability to specify which authentication method to use on the payload for https://github.com/jupyter-incubator/sparkmagic#reconnectsparkmagic
  • Unit tests for any new logic introduced (e.g. selecting different authentication model triggers different behaviors).

# Time in seconds
KERBEROS_TIME_INTERVAL = 6000

NO_AUTH = "no auth"
Contributor

"None"

Contributor Author

@aggFTW, we are changing this; the updated PR will contain these changes.

KERBEROS_TIME_INTERVAL = 6000

NO_AUTH = "no auth"
AUTH_KERBEROS = "kerberos"
Contributor

"Kerberos"

Contributor Author

@aggFTW, we are changing this; the updated PR will contain these changes.


NO_AUTH = "no auth"
AUTH_KERBEROS = "kerberos"
AUTH_SSL = "username/password"
Contributor

"Basic Access"

Contributor Author

@aggFTW, we are changing this; the updated PR will contain these changes.


KERBEROS_KINIT = 'kinit'
# Time in seconds
KERBEROS_TIME_INTERVAL = 6000
Contributor

This should not be a constant, but a configuration.

Contributor

Also, please name it so that the user knows what time unit to use: kerberos_renew_time_interval_seconds or something similar.
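A minimal sketch of what the two comments above ask for, assuming a plain dictionary stands in for the project's configuration mechanism. The key name follows the reviewer's suggestion; `DEFAULT_CONFIG` and `get_renew_interval_seconds` are hypothetical names used only for illustration.

```python
# Hypothetical defaults; the key name carries the time unit so users
# know what value to supply.
DEFAULT_CONFIG = {"kerberos_renew_time_interval_seconds": 6000}


def get_renew_interval_seconds(user_config=None):
    """Return the renewal interval, preferring the user's override."""
    config = dict(DEFAULT_CONFIG)
    config.update(user_config or {})
    interval = config["kerberos_renew_time_interval_seconds"]
    if interval <= 0:
        raise ValueError("kerberos_renew_time_interval_seconds must be positive")
    return interval
```

With this shape, the value in the constants file becomes only a default that a user-level setting can override.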

Contributor Author

@aggFTW, we are changing this; the updated PR will contain these changes.

options={constants.AUTH_KERBEROS: constants.AUTH_KERBEROS, constants.AUTH_SSL: constants.AUTH_SSL,
constants.NO_AUTH: constants.NO_AUTH},
description=u"Auth type:"
)
Contributor

On change, we should hide/show the appropriate fields. So:

  • On change to NO_AUTH: hide username and password.
  • On change to AUTH_KERBEROS: show username and password.
  • On change to AUTH_SSL: show username and password.
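The show/hide rules above can be sketched in plain Python, without JavaScript. `FakeField` and `on_auth_change` are hypothetical names, and `FakeField` only stands in for a real text widget. With ipywidgets, a handler like this would typically be registered through the dropdown's `observe` method, and the attribute used to hide a widget depends on the ipywidgets version, so treat the wiring as an assumption.

```python
NO_AUTH = "no auth"
AUTH_KERBEROS = "kerberos"
AUTH_SSL = "username/password"

# Whether the credential fields are visible for each auth type,
# following the list in the review comment.
SHOWS_CREDENTIALS = {NO_AUTH: False, AUTH_KERBEROS: True, AUTH_SSL: True}


class FakeField(object):
    """Stand-in for a text widget; it only tracks visibility."""

    def __init__(self):
        self.visible = True


def on_auth_change(new_auth_type, username_field, password_field):
    """Show or hide both credential fields for the selected auth type."""
    show = SHOWS_CREDENTIALS[new_auth_type]
    username_field.visible = show
    password_field.visible = show
```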

Contributor Author

@aggFTW We were planning to implement this change later. If it requires JavaScript, can you help me understand where I can add JavaScript for these kinds of changes?

@@ -122,3 +127,12 @@ def get_info_endpoint_widget(self, endpoint, url):
text = "No sessions on this endpoint."

return self.ipywidget_factory.get_html(text, width=width)

def initialize_kerberos(self, endpoint):
Contributor

What about getting the ticket directly from ReliableHttpClient? Then, upon creation of the object, the ticket will be obtained (and renewed when needed) and when the client is not to be used anymore, the client would also call kdestroy.

We could probably achieve this by creating a start and stop method on ReliableHttpClient. We can also call those methods from __enter__ and __exit__ so that the object can also be used with a with clause.
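A rough sketch of the start/stop idea, assuming the ticket operations are passed in as callables. `TicketRenewingClient`, `renew_ticket`, and `destroy_ticket` are hypothetical names and this is not the actual `ReliableHttpClient`; in practice the callables might wrap `kinit` and `kdestroy`.

```python
import threading


class TicketRenewingClient(object):
    """Hypothetical stand-in showing only the ticket lifecycle."""

    def __init__(self, renew_ticket, destroy_ticket, interval_seconds):
        self._renew_ticket = renew_ticket      # e.g. a wrapper around kinit
        self._destroy_ticket = destroy_ticket  # e.g. a wrapper around kdestroy
        self._interval = interval_seconds
        self._stop_flag = threading.Event()
        self._thread = None

    def start(self):
        self._renew_ticket()  # obtain the first ticket synchronously
        self._stop_flag.clear()
        self._thread = threading.Thread(target=self._renew_loop)
        self._thread.daemon = True
        self._thread.start()

    def _renew_loop(self):
        # Event.wait returns False on timeout and True once stop() sets the flag.
        while not self._stop_flag.wait(self._interval):
            self._renew_ticket()

    def stop(self):
        self._stop_flag.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
        self._destroy_ticket()

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()
        return False  # do not swallow exceptions
```

Used in a `with` clause, the client obtains a ticket on entry and cleans up on exit, so the caller never handles the stop flag directly. Whether `stop` should destroy the ticket at all depends on whether other processes share the same ticket cache.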

Contributor Author

@aggFTW, @languy I think it's better not to destroy the Kerberos ticket. There may be multiple processes using the same ticket. Consider Livy and Jupyter running on the same server: destroying the ticket will affect Livy.

Contributor

Would it be a security concern to leave the ticket open?

Contributor Author

@prabhu1984 can you share your thoughts on this issue?

# Stop flag can be used to stop renewing kerberos ticket for the user
stop_flag = Event()
KerberosThread(stop_flag, endpoint).start()
self.kerberos_info[endpoint.username] = stop_flag
Contributor

How can the user use the stop_flag? This sounds like something we could automate for the user as soon as the kernel dies or the endpoint is removed?

Contributor Author

I am working on this. When the endpoint is removed, we can use stop_flag to stop renewing the ticket.

Contributor

I think the thread should be self-contained in the ReliableHttpClient, so the user doesn't have to do any clean up manually.

@@ -17,6 +19,10 @@ def __init__(self, endpoint, headers, retry_policy):
        self._endpoint = endpoint
        self._headers = headers
        self._retry_policy = retry_policy
        if self._endpoint.auth_type == constants.AUTH_KERBEROS:
            self._auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
Contributor

This logic is not being unit tested at the moment. Could you please add tests?

Contributor Author

Sure. The updated PR will contain unit tests for this logic.

@@ -77,6 +77,7 @@ def version(path):
'pandas>=0.17.1',
'numpy',
'requests',
'requests_kerberos',
Contributor

Can you please add it to the requirements.txt file too?

Contributor Author

Sure. The updated PR will contain this change.



class Endpoint(object):
-    def __init__(self, url, username="", password=""):
+    def __init__(self, url, auth_type=constants.NO_AUTH, username="", password=""):
Contributor

Default should be None. If None, then get the value from a configuration called default_authentication_method.

Contributor Author

Does auth_type need a default value?

Contributor

I think default value should be None. If None, then check for the user default configuration.
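The suggested default could look roughly like the sketch below, assuming a dictionary named `CONFIG` stands in for the user's configuration (the dictionary and its value are hypothetical; the key name comes from the review comment above).

```python
# Hypothetical configuration store for illustration only.
CONFIG = {"default_authentication_method": "no auth"}


class Endpoint(object):
    def __init__(self, url, auth_type=None, username="", password=""):
        if auth_type is None:
            # No explicit choice: fall back to the user's configured default.
            auth_type = CONFIG["default_authentication_method"]
        self.url = url
        self.auth_type = auth_type
        self.username = username
        self.password = password
```

An explicit `auth_type` still wins, so existing callers that pass one are unaffected.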

Contributor

aggFTW commented Oct 12, 2016

Let me talk to some people on our team because I'm no Kerberos expert. Thanks!

KerberosThread(stop_flag, endpoint).start()
self.kerberos_info[endpoint.username] = stop_flag
# Wait for kerberos ticket to be obtained
time.sleep(3)

Please define this hardcoded value along with the other Kerberos constants in the suggested configuration.

Contributor Author

Sure. The updated PR will contain this change.


languy commented Oct 13, 2016

A great, elegant way to do Kerberos auth!
A few comments/suggestions:

  • As aggFTW commented, it would be nice to bubble errors back up to the user:
    When getting the ticket: wrong credentials, unable to connect to hostname.
    What happens if the ticket expires (before it is renewed)? Can the end user understand the issue and recover?
  • If I understand the code correctly, a thread is spawned that renews the ticket in an open loop:
    • Once the ticket gets renewed, is there a way to verify that the ticket is actually there (maybe by checking the ticket cache, or does the request actually return the ticket)? This would help error handling.
    • If you can access the ticket after kinit, check its expiration date in order to adjust the renewal time. Your 6000-second renewal delay seems much shorter than the default TGT validity. This would be ideal, but at a minimum, making the delay configurable and setting it to a more typical value would allow users to adapt it to their specific Kerberos setup.
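One way to derive the renewal delay from the ticket's expiration, as suggested above, is sketched here. It assumes the expiration time has already been obtained as a `datetime`; how to read it from the ticket cache is left open. `seconds_until_renewal` and `safety_fraction` are hypothetical names.

```python
from datetime import datetime, timedelta


def seconds_until_renewal(expires_at, now, safety_fraction=0.8):
    """Return how long to wait before renewing the ticket.

    The ticket is renewed once `safety_fraction` of its remaining
    lifetime has elapsed, which leaves a margin before it expires.
    """
    remaining = (expires_at - now).total_seconds()
    if remaining <= 0:
        return 0.0  # already expired: renew immediately
    return remaining * safety_fraction
```

Recomputing the delay after every renewal keeps the loop aligned with whatever lifetime the KDC actually grants, instead of relying on a fixed interval.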


languy commented Oct 13, 2016

On deleting tickets: does this require installing kerberos5 on the host? If so, the krb5 utilities will take care of eventually purging expired tickets. However, it won't invalidate tickets if that's what we want to do upon shutdown.


joychak commented Oct 13, 2016

What happens when the client already has a Kerberos ticket (obtained with kinit) before logging into JupyterHub and opening a notebook? In this scenario, would Jupyter's KDC authentication module use the HTTP service principal to validate the client's Kerberos ticket through HTTP authentication/SPNEGO and pass it to SparkMagic (or would SparkMagic have to pick it up from Jupyter's spawner process) for further communication with Livy?

I am describing a single, integrated Kerberos authentication flow in which the ticket flows from the client machine to JupyterHub/SparkMagic to Livy to Spark.

@praveen-kanamarlapudi
Contributor Author

@languy

As aggFTW commented, it would be nice to bubble errors back up to the user. When getting the ticket: wrong credentials, unable to connect to hostname. What happens if the ticket expires (before it is renewed)? Can the end user understand the issue and recover?

We are working on checking the Kerberos ticket's expiration time and adjusting the renewal time accordingly. There is another scenario: some other process might destroy the ticket between renewals. To cover that, we can renew the ticket whenever we see an HTTP 401 response code.
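The renew-on-401 idea might be sketched as follows, assuming the request and the renewal are supplied as callables. `send_with_ticket_renewal`, `send_request`, and `renew_ticket` are hypothetical names, and `send_request` is assumed to return an HTTP status code.

```python
def send_with_ticket_renewal(send_request, renew_ticket, max_renewals=1):
    """Call send_request(); on HTTP 401, renew the ticket and retry."""
    status = send_request()
    renewals = 0
    while status == 401 and renewals < max_renewals:
        renew_ticket()
        renewals += 1
        status = send_request()
    return status
```

Capping the number of renewals avoids looping forever when the 401 is caused by something other than a missing ticket, such as wrong credentials.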

Once the ticket gets renewed, is there a way to verify that the ticket is actually there? (maybe by checking the ticket cache, or does the request actually return the ticket?) This would help error handling.

We can check the subprocess's exit status, which tells us whether the ticket was obtained.
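A small sketch of the exit-status check, assuming the Kerberos command-line tools are on the PATH. `command_succeeded` is a hypothetical helper. The commented usage relies on `klist -s`, which in MIT Kerberos runs silently and reports through its exit status whether a valid credentials cache exists; other Kerberos distributions may behave differently.

```python
import subprocess


def command_succeeded(command):
    """Return True if the command exits with status 0, False otherwise."""
    try:
        return subprocess.call(command) == 0
    except OSError:
        # The executable is missing (e.g. Kerberos tools are not installed).
        return False


# Assumed usage with MIT Kerberos:
# has_ticket = command_succeeded(["klist", "-s"])
```

A failed check can then be turned into an error message for the user rather than a silent renewal failure.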

If you can access the ticket after kinit, check its expiration date in order to adjust the renewal time. Your 6000-second renewal delay seems much shorter than the default TGT validity. This would be ideal, but at a minimum, making the delay configurable and setting it to a more typical value would allow users to adapt it to their specific Kerberos setup.

Sure. I will change the renewal delay.

Contributor

aggFTW commented Oct 13, 2016

@joychak do you have a suggestion on how to address that use case?


joychak commented Oct 14, 2016

@aggFTW, we are currently working on it and will submit a PR very soon. It involves a new KDCAuthenticator and a new KDCSpawner for JupyterHub, plus changes in SparkMagic to use the client's Kerberos ticket (or, optionally, a SparkMagic-specific keytab) to authenticate with Livy and to add the proxy-user info to the request for Livy.

@praveen-kanamarlapudi
Contributor Author

@joychak Thanks for the update.
I think it's better to have Kerberos enabled in JupyterHub than in sparkmagic. I am closing this PR and will wait to see Kerberos support in JupyterHub.

Contributor

aggFTW commented Oct 14, 2016

Thanks @joychak and @praveenkanamarlapudi!
Looking forward to the PR to continue the discussion.

@pkasinathan

Hi @joychak,

That's good to hear. One quick question.

How will it work if a user wants to run only Jupyter (with sparkmagic), without JupyterHub?

That is, it's very easy for anyone to deploy Jupyter and the sparkmagic package on any host without root/admin access and start Jupyter, whereas installing JupyterHub requires prerequisites such as a node installation, a configurable-http-proxy installation, etc.

Thanks
Prabhu


joychak commented Oct 14, 2016

@prabhu1984, let me look into the Jupyter notebook (without hub) authentication part and get back. But good point indeed.

5 participants