-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Kerberos authentication #284
Added Kerberos authentication #284
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks for the PR.
My general comment would be that I think we should discuss the model for getting new tickets and deleting them once the user cleans up. Could you please clarify your thoughts on this?
It'd also be great to see:
- Modification to
README.md
so that users know how to setup different authentication methods. - Ability to specify which authentication method to use on the payload for https://github.com/jupyter-incubator/sparkmagic#reconnectsparkmagic
- Unit tests for any new logic introduced (e.g. selecting different authentication model triggers different behaviors).
# Time in seconds | ||
KERBEROS_TIME_INTERVAL = 6000 | ||
|
||
NO_AUTH = "no auth" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"None"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aggFTW, we are changing this and updated PR will contain these changes.
KERBEROS_TIME_INTERVAL = 6000 | ||
|
||
NO_AUTH = "no auth" | ||
AUTH_KERBEROS = "kerberos" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Kerberos"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aggFTW, we are changing this and updated PR will contain these changes.
|
||
NO_AUTH = "no auth" | ||
AUTH_KERBEROS = "kerberos" | ||
AUTH_SSL = "username/password" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Basic Access"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aggFTW, we are changing this and updated PR will contain these changes.
|
||
KERBEROS_KINIT = 'kinit' | ||
# Time in seconds | ||
KERBEROS_TIME_INTERVAL = 6000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be a constant, but a configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please name it so that the user knows what time unit to use: kerberos_renew_time_interval_seconds
or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aggFTW, we are changing this and updated PR will contain these changes.
options={constants.AUTH_KERBEROS: constants.AUTH_KERBEROS, constants.AUTH_SSL: constants.AUTH_SSL, | ||
constants.NO_AUTH: constants.NO_AUTH}, | ||
description=u"Auth type:" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On change, we should hide/show the appropriate fields. So:
- On change to NO_AUTH: hide username and password.
- On change to AUTH_KERBEROS: show username and password.
- On change to AUTH_SSL: show username and password.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aggFTW We thought of implementing this change in later. If it's javascript, can you help me understand where can I add javascript for these king of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not. Ipywidgets has a way to make widgets visible or not:
https://github.com/jupyter-incubator/sparkmagic/blob/master/autovizwidget/autovizwidget/widget/encodingwidget.py#L126
@@ -122,3 +127,12 @@ def get_info_endpoint_widget(self, endpoint, url): | |||
text = "No sessions on this endpoint." | |||
|
|||
return self.ipywidget_factory.get_html(text, width=width) | |||
|
|||
def initialize_kerberos(self, endpoint): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about getting the ticket directly from ReliableHttpClient
? Then, upon creation of the object, the ticket will be obtained (and renewed when needed) and when the client is not to be used anymore, the client would also call kdestroy
.
We could probably achieve this by creating a start
and stop
method on ReliableHttpClient
. We can also call those methods from __enter__
and __exit__
so that the object can also be used with a with
clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a security concern to leave the ticket open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prabhu1984 can you put your thoughts on this issue.
# Stop flag can be used to stop renewing kerberos ticket for the user | ||
stop_flag = Event() | ||
KerberosThread(stop_flag, endpoint).start() | ||
self.kerberos_info[endpoint.username] = stop_flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can the user use the stop_flag
? This sounds like something we could automate for the user as soon as the kernel dies or the endpoint is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working on this. When endpoint is removed, we can use stop_flag to stop renewing ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the thread should be self-contained in the ReliableHttpClient, so the user doesn't have to do any clean up manually.
@@ -17,6 +19,10 @@ def __init__(self, endpoint, headers, retry_policy): | |||
self._endpoint = endpoint | |||
self._headers = headers | |||
self._retry_policy = retry_policy | |||
if self._endpoint.auth_type == constants.AUTH_KERBEROS: | |||
self._auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is not being unit tested at the moment. Could you please add tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Updated PR will contain unit tests for the logic.
@@ -77,6 +77,7 @@ def version(path): | |||
'pandas>=0.17.1', | |||
'numpy', | |||
'requests', | |||
'requests_kerberos', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add it to the requirements.txt
file too please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Updated PR will contain this change
|
||
|
||
class Endpoint(object): | ||
def __init__(self, url, username="", password=""): | ||
def __init__(self, url, auth_type=constants.NO_AUTH, username="", password=""): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should be None. If None, then get the value from a configuration called default_authentication_method
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do auth_type needs a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think default value should be None. If None, then check for the user default configuration.
Let me talk to some people on our team because I'm no Kerberos expert. Thanks! |
KerberosThread(stop_flag, endpoint).start() | ||
self.kerberos_info[endpoint.username] = stop_flag | ||
# Wait for kerberos ticket to be obtained | ||
time.sleep(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please define hardcoded value along with the other Kerberos constants in suggested configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Updated PR will contain this change
Great elegant way to do kerberos auth!
|
On deleting tickets: does this require installing kerberos5 on the host? If so, the krb5 utilities will take care of eventually purging expired tickets. However, it won't invalidate tickets if that's what we want to do upon shutdown. |
What happens when the client already had a Kerberos ticket (by doing kinit) before logging into Jupyter-hub and opening a notebook? In this scenario the Jupyter's KDC Authentication module will use the HTTP service principal to validate/verify the client's Kerberos ticket through HTTP-authenticate/spnego and pass it to SparkMagic (or SparkMagic has to pick it up from jupyter's spawner process) to use it for further communication to Livy? What I am talking about is the single integrated kerberos authentication when the ticket flows from client-machine to JupyterHub/SparkMagic to Livy to Spark. |
We are working on checking kerberos ticket expiration time and adjust the renewal time as per it. There is another scenario, some other process might have destroyed the ticket in between the renewal. So we can renew the ticket when we see http response code as 401.
We can check the subprocess exits status. Exit status can help to know the ticket status
Sure. I will change the renewal delay. |
@joychak do you have a suggestion on how to address that use case? |
@aggFTW, we are currently working on it and will submit the PR very soon which involves new KDCAuthenticator for JupyterHub, KDCSpawner for JupyterHub, changes in SparkMagic to use the client's kerberos ticket (or optionally using SparkMagic specific keytab) to authenticate with Livy along with adding the proxy-user info in the request for Livy. |
@joychak Thanks for the update. |
Thanks @joychak and @praveenkanamarlapudi! |
Hi @joychak, That's good to hear. One quick question. How will it work if user wants to run only with Jupyter (spark magic) without Jupyterhub? I.e. It's very easy to setup for anyone to deploy Jupyter and Sparkmagic package on any host without root/admin access and starting jupyter. Whereas, Installing jupyterhub requires pre-requisites like node installation, configurable-http-proxy installation etc. Thanks |
@prabhu1984, let me look into the Jupyter notebook (without hub) authentication part and get back. But good point indeed. |
We can connect to kerberos enabled Livy now.