Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish Shibboleth Integration #48

Merged
merged 13 commits into from
Aug 1, 2019
Merged

Finish Shibboleth Integration #48

merged 13 commits into from
Aug 1, 2019

Conversation

ArmaanT
Copy link
Member

@ArmaanT ArmaanT commented Jul 29, 2019

This PR finalizes the Shibboleth login integration.
When a user logs in through Shibboleth, if it is their first time logging in a new user will be created with the supplied username, first name, last name, email. Note it appears Penn does not send an email through Shibboleth
On every login, user's affiliations are updated.

This PR should be heavily reviewed to ensure no authentication bypasses are possible.

@ArmaanT ArmaanT requested review from davish and ezwang July 29, 2019 16:51
@coveralls
Copy link

coveralls commented Jul 29, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling b75f0af on fix/shibboleth into 301adcc on master.

@kirubarajan
Copy link
Contributor

Is there a TLDR on how this is tested?

@kirubarajan
Copy link
Contributor

I see the unit tests for exception handling but wondering if its rigorous enough for route authorization and whatnot

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

The unit tests are primarily to ensure that a user can log in though Shibboleth. Testing to make sure no one can break it should just involve a code review (including nginx configs) and sending requests to accounts/login/ with fake data to determine if we can log in as another user, or change user data.

I've done a few quick tests, and I'm pretty sure it's secure, but a few additional eyes wouldn't hurt.

@ArmaanT ArmaanT requested a review from kirubarajan July 29, 2019 18:11
@@ -36,8 +33,7 @@ def get(self, request):
user = auth.authenticate(remote_user=pennkey, shibboleth_attributes=shibboleth_attributes)
if user:
auth.login(request, user)
params = request.get_full_path().split('next=')[1]
return redirect('https://platform.pennlabs.org' + params)
return redirect(request.GET.get('next'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GET['next'] contain the redirect URL for success? Is pennlabs.org hardcoded anywhere? Just curious about design choices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's it. It's a relative URL to redirect once logged in. There's no reference to pennlabs.org. In the normal login process next should be accounts/authorize?x=..., Since users will be logging in through OAuth2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. If we don't hardcode pennlabs.org is there any way somebody could spoof the redirect URL back to ours with their own hosted instance of Shibboleth?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, the redirect doesn't contain any user information. At that point the user is already logged into platform and a cookie is set. The redirect is just to let the user continue the OAuth2 flow. There shouldn't be any way to leak data through this.

@kirubarajan
Copy link
Contributor

On every login, user's affiliations are updated.

Why is it on every login?

@kirubarajan
Copy link
Contributor

Note it appears Penn does not send an email through Shibboleth

When should Shibboleth send emails? Is this functionality we need for forgotten passwords or something? Do we handle that?

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

On every login, user's affiliations are updated.

Why is it on every login?

I don't really have a great reason for it, but my thinking was when people become alumni their affiliations will change. Additionally if a student teaches a class like CIS 19x, then their affiliations may change.

Eventually we may want to use this affiliations to determine what people can do. An example being letting professors choose comments on PCR

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

Note it appears Penn does not send an email through Shibboleth

When should Shibboleth send emails? Is this functionality we need for forgotten passwords or something? Do we handle that?

Sorry, this is poor wording. I meant Penn does not send us a user's email through Shibboleth. In certain configurations (not Penn's) Shibboleth provides additional metadata on users who are logging in. It would be nice for us if we got students emails through Shibboleth, but it seems like we won't be able to do that. Penn would have to make that change. If we can't get a directory API key, then it's worth asking them to change, if possible.

@kirubarajan
Copy link
Contributor

Got it. Can't find the affiliation changing code in this PR but it should be fine if we're using the standard Django abstractions. I can't really think of a better way anyway to update affiliation so this flow LGTM

@kirubarajan
Copy link
Contributor

Is searchPennDirectory not implemented yet? Not sure why you want to include the stub but not important enough to request changes so up to your discretion.

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

Got it. Can't find the affiliation changing code in this PR but it should be fine if we're using the standard Django abstractions. I can't really think of a better way anyway to update affiliation so this flow LGTM

This block in backends.py handles the affiliation updating. It just adds the correct affiliation in a many-to-many field in the user.

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

Is searchPennDirectory not implemented yet? Not sure why you want to include the stub but not important enough to request changes so up to your discretion.

We don't have a directory API key that lets us search for the data we want (the data is hidden behind a PennKey login). So I can't yet implement the method. I've requested a key with elevated privileges, so we'll have to wait and see if we can get one.

@kirubarajan
Copy link
Contributor

This block in backends.py handles the affiliation updating. It just adds the correct affiliation in a many-to-many field in the user.

Gotcha. I would personally abstract that into a helper function somewhere, just because from a dev perspective I guess it's not authentication code specifically? But non-breaking of course so maybe a later refactor could handle this

@kirubarajan
Copy link
Contributor

Also for the sake of security updating the Django version number in requirements.txt?

@kirubarajan
Copy link
Contributor

Overall, LGTM. Follows standard Shibboleth flow and OAuth2 conventions/best practices with no glaring vulnerabilities. From a functional standpoint ready to merge, but awaiting approval from @davish and @ezwang.

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

Also for the sake of security updating the Django version number in requirements.txt?

That should be done, but I was waiting for coveralls to update. It pins the urllib3 version to something that is not supported by requests. I've already made a PR on that project to fix it (TheKevJames/coveralls-python#204).

Edit: updated in 37f8d6b

Copy link
Member

@davish davish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me! Most pressing open question for me is how this works as part of the reusable app. But assuming that that's fine, this looks good! Excited for integrating this.

setattr(user, key, self.searchPennDirectory(username, key))

if key != 'affiliation':
setattr(user, key, value if value else self.searchPennDirectory(username, key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we worried about latency issues for an external request to the Penn directory? What if the directory API is down, but shibboleth is still up? We should probably still accept logins in that case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read the thread with @kirubarajan, but still think this is something to think about at least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. Polling the directory only happens on a user's first login, so I don't think latency will be a real issue. We probably should add some error checking if the directory is down, but since we don't have access to it yet, I think we should leave it as is.

user.save()
user = self.configure_user(request, user)

# Update affiliations with every log in
if shibboleth_attributes is not None and 'affiliation' in shibboleth_attributes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably could move the not-None check to encompass the if created branch as well. Not sure exactly how shibboleth_attributes is populated, but would it be a successful login if it was None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below

pennkey = request.META.get('HTTP_EPPN', '').lower().split('@')[0]
first_name = request.META.get('HTTP_GIVENNAME', '').lower().capitalize()
last_name = request.META.get('HTTP_SN', '').lower().capitalize()
email = request.META.get('HTTP_MAIL', '').lower()
shibboleth_attributes = {'first_name': first_name, 'last_name': last_name, 'email': email}
affiliation = request.META.get('HTTP_UNSCOPED_AFFILIATION', '').split(';')
shibboleth_attributes = {'first_name': first_name, 'last_name': last_name, 'email': email,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my comment above, looks like shibboleth_attributes being None would mean that something went wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I added the check since I had previously written a test where shibboleth_attributes was empty, but you're right, it would be impossible for that to happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b75f0af

"""
Log in a user.
WARNING: You must ensure this page is protected by Shibboleth and Clean Headers
See https://github.com/nginx-shib/nginx-http-shibboleth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a cursory read of the linked page, I'm guessing this means that not everyone can hit the LoginView endpoint. I see this is in the accounts package... will we have to add a custom nginx rule on every dokku app that uses this package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to make sure that when platform is (re)deployed in production, we configure nginx to require Shibboleth authentication on the accounts/login/ route. This only needs to be done on platform though.

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

Overall looks good to me! Most pressing open question for me is how this works as part of the reusable app. But assuming that that's fine, this looks good! Excited for integrating this.

The reusable app portion will be django-labs-accounts which I can finish once this PR is merged. The reusable app will just act as an OAuth2 client to interface with platform.

@ArmaanT ArmaanT requested a review from davish July 29, 2019 20:37
@davish
Copy link
Member

davish commented Jul 29, 2019

Overall looks good to me! Most pressing open question for me is how this works as part of the reusable app. But assuming that that's fine, this looks good! Excited for integrating this.

The reusable app portion will be django-labs-accounts which I can finish once this PR is merged. The reusable app will just act as an OAuth2 client to interface with platform.

Does this repo pennlabs/platform not use django-labs-accounts? If it doesn't, I think it might be a good idea to switch it over to the reusable app for the sake of dogfooding once the app has been updated as well.

@kirubarajan
Copy link
Contributor

Does this repo pennlabs/platform not use django-labs-accounts? If it doesn't, I think it might be a good idea to switch it over to the reusable app for the sake of dogfooding once the app has been updated as well.

Completely forgot about django-labs-accounts but 100% agree with this.

@ArmaanT
Copy link
Member Author

ArmaanT commented Jul 29, 2019

Does this repo pennlabs/platform not use django-labs-accounts? If it doesn't, I think it might be a good idea to switch it over to the reusable app for the sake of dogfooding once the app has been updated as well.

So this is where I need to document things better. The way things are set up, platform lets users log in through Shibboleth. Users then have a valid session with platform which is also an OAuth2 provider. django-labs-accounts doesn't interface with Shibboleth at all. It is installed on other apps like clubs-backend and allows clubs-backend to log in users through platform's master database of users. django-labs-accounts gets its information about users from platform. So it doesn't really make sense to put it on platform. Not sure if I'm doing a good job explaining. I'll document the entire auth flow soon.

user = auth.authenticate(remote_user=pennkey, shibboleth_attributes=shibboleth_attributes)
if user:
auth.login(request, user)
params = request.get_full_path().split('next=')[1]
return redirect('https://platform.pennlabs.org' + params)
return redirect(request.GET.get('next', '/'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would protect this using a whitelist, since it could possibly open us up to phishing attacks by making the URL appear to come from us.

It might be a bit difficult, since there are a lot of different products and domains we could use this with. Maybe a simple list of domain names for now and expanding that into a model later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the redirect exposes us to any potential attack vectors. the user.authenticate method creates a cookie valid only on platform.pennlabs.org to verify the user is logged in, that information doesn't get sent through the redirect.

@ArmaanT
Copy link
Member Author

ArmaanT commented Aug 1, 2019

Note CircleCI tests are failing due to the updated version of shortener. This is fixed in #49.

@ArmaanT ArmaanT merged commit ee2d2ed into master Aug 1, 2019
@ArmaanT ArmaanT deleted the fix/shibboleth branch August 1, 2019 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants