Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalize all proxito slashes #8028

Merged
merged 3 commits into from Mar 18, 2021
Merged

Conversation

ericholscher
Copy link
Member

This is currently broken,
eg. this URL works:
https://docs.readthedocs.io/en/latest///index.html

@ericholscher ericholscher requested a review from a team March 17, 2021 21:43
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I just made a comment about the re module.

@@ -173,6 +174,10 @@ def process_request(self, request): # noqa
if hasattr(ret, 'status_code'):
return ret

if '//' in request.path:
# Remove multiple slashes from URL's
return redirect(re.sub('//+', '/', request.get_full_path()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to mention that may be better to use regex module if we are applying the regex on user's input. IIRC, there was some security/performance in the re module that are not present in regex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex is dangerous when the pattern is arbitrary (ReDOS), here the matching is the one that is arbitrary, so all good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this regex is super simple, so I don't think it should be an issue.

def test_slash_redirect(self):
host = 'project.dev.readthedocs.io'

url = '/en/latest////awesome.html'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it will work the same way, but we could add a test with trailing slashes.

  • /en/latest/awesome.html/// should give 404
  • '/en/latest//// should work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't doing any 404 checking here, it will always return a 302 for any path in /, since that's happening before any checking of the code. I added a test checking the // at the end tho.

url_parsed = urlparse(request.get_full_path())
clean_path = re.sub('//+', '/', url_parsed.path)
new_parsed = url_parsed._replace(path=clean_path)
return redirect(urlunparse(new_parsed))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think you can use new_parsed.geturl()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using it the same way in other places in the code, so I'll keep it for now. We can update it all later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm seems like that is a special case, and we're using geturl() too. Will update 👍

@ericholscher ericholscher enabled auto-merge (squash) March 18, 2021 14:43
@ericholscher ericholscher merged commit 0b3d41b into master Mar 18, 2021
3 checks passed
@ericholscher ericholscher deleted the proxito-slash-normalization branch March 18, 2021 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants