New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Canonicalize all proxito slashes #8028
Conversation
This is currently broken, eg. this URL works: https://docs.readthedocs.io/en/latest///index.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I just made a comment about the re
module.
readthedocs/proxito/middleware.py
Outdated
@@ -173,6 +174,10 @@ def process_request(self, request): # noqa | |||
if hasattr(ret, 'status_code'): | |||
return ret | |||
|
|||
if '//' in request.path: | |||
# Remove multiple slashes from URL's | |||
return redirect(re.sub('//+', '/', request.get_full_path())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to mention that may be better to use regex
module if we are applying the regex on user's input. IIRC, there was some security/performance in the re
module that are not present in regex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regex is dangerous when the pattern is arbitrary (ReDOS), here the matching is the one that is arbitrary, so all good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this regex is super simple, so I don't think it should be an issue.
def test_slash_redirect(self): | ||
host = 'project.dev.readthedocs.io' | ||
|
||
url = '/en/latest////awesome.html' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it will work the same way, but we could add a test with trailing slashes.
/en/latest/awesome.html///
should give 404'/en/latest////
should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't doing any 404 checking here, it will always return a 302 for any path in /
, since that's happening before any checking of the code. I added a test checking the //
at the end tho.
readthedocs/proxito/middleware.py
Outdated
url_parsed = urlparse(request.get_full_path()) | ||
clean_path = re.sub('//+', '/', url_parsed.path) | ||
new_parsed = url_parsed._replace(path=clean_path) | ||
return redirect(urlunparse(new_parsed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think you can use new_parsed.geturl()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using it the same way in other places in the code, so I'll keep it for now. We can update it all later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm seems like that is a special case, and we're using geturl()
too. Will update
This is currently broken,
eg. this URL works:
https://docs.readthedocs.io/en/latest///index.html