-
-
Notifications
You must be signed in to change notification settings - Fork 568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate that host_whitelist is not a string #349
Conversation
Attacker can use `https:///evil.com` to make a malformed "hostless" URL that would have a `netloc == ''` -- which is `in` any string. Strings are not documented to be allowed in this config variable anyhow, so just raise a type error if someone passes in a string by accident. (This is a breaking change for people who didn't follow the documented types, but shouldn't affect anyone else.) New test fails on current master.
813e921
to
5b2de2f
Compare
src/lxml/html/clean.py
Outdated
host_whitelist = () | ||
host_whitelist = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary to change, but I saw a lot of code out there passing in an empty tuple. Passing an empty list would make it less likely that someone accidentally passes in ("example.com")
and is confused about the error. But the guard later on should catch it anyhow!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, up to you! I'm fine backing it out if you think it should stay a tuple.
src/lxml/html/clean.py
Outdated
if not isinstance(self.host_whitelist, collection_types): | ||
raise TypeError("host_whitelist must be a collection type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to always turn the value into a frozenset (if non-empty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As defensive copying, and for speed of lookups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both, yes. We should generally own config collections that the user passed into the constructor, and not let future modifications leak in. I'm aware that that wasn't done before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, just noticed that you said non-empty -- why so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think -- any further changes needed?
Hmm, just noticed that you said non-empty -- why so?
Because it wasn't necessary until I understood that copying the input container would generally be a good idea. Now it's generally necessary. ;-)
|
Thanks |
Attacker can use
https:///evil.com
to make a malformed "hostless" URLthat would have a
netloc == ''
-- which isin
any string. Stringsare not documented to be allowed in this config variable anyhow, so just
raise a type error if someone passes in a string by accident.
(This is a breaking change for people who didn't follow the documented
types, but shouldn't affect anyone else.)
New test fails on current master.