-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace dependencies with local impls #32
Conversation
src/matcher.rs
Outdated
@@ -0,0 +1,160 @@ | |||
/// Check if input is a valid domain label | |||
pub fn is_label(input: &str) -> bool { | |||
if input.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DNS labels also have maximum length of 63 octets (RFC1035 2.3.4) so we could check that here also?
if input.is_empty() { | |
if input.is_empty() || input.len() > 63 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this in terms of code separation. The number of DNS labels are checked here at the call site: https://github.com/rushmorem/publicsuffix/blob/master/src/lib.rs#L613
The empty check is required for the following logic to be correct, but I'm wondering whether the 63 octet check should be done where we call matcher::is_label
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this is already checked for here in verify_dns_length
:
https://github.com/rushmorem/publicsuffix/blob/master/src/lib.rs#L706
fn to_ascii(domain: &str) -> Result<String> {
let result = idna::Config::default()
.transitional_processing(true)
.verify_dns_length(true)
.to_ascii(domain);
result.map_err(|error| ErrorKind::Uts46(error).into())
}
And there's a test checking for it here: https://github.com/rushmorem/publicsuffix/blob/master/src/tests.rs#L148
It might be better relying on this mechanism, cause I'm a bit uncertain about ASCII vs UTF-8 here. The idna package should do the right thing with regards to that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DNS addresses (aka hostnames) have two limits: The total length and the label length. The amount of labels can then be derived from this: 253 chars / 2 = 126.5 ~= 127.
is_label()
is validating a DNS label and the specification says the maximum length is 63. So for me it feels it's the proper place to do the validation of this maximum length.
And you are right about idna
crate. It does the validation, but I found it to be slow when given anomalous input (from fuzzing). Issue here: servo/rust-url#595
And IDNA (ASCII vs. Unicode hostnames) makes things more difficult. I'm trying to figure out best way to do the most cheap and simple checks first before going to expensive conversions (like IDNS to_ascii()) to validate everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I don't mind hugely where the test happens, however as long as that slow idna
test is there, maybe we shouldn't double up the logic?
Maybe cleaner to make a commit that both adds the new check – with proper thinking around ASCII vs Unicode – and disable the idna
at the same time?
src/matcher.rs
Outdated
let first = chars.next().unwrap(); | ||
if !first.is_ascii_alphanumeric() { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can eliminate unwrap
by using match
for example like below. That unwrap
should never panic because the length is checked above, but it's also possible to implement same thing without it, so why not?
let first = chars.next().unwrap(); | |
if !first.is_ascii_alphanumeric() { | |
return false; | |
match chars.next() { | |
Some(c) if c.is_ascii_alphanumeric() => (), | |
_ => return false, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've replaced the unwrap
with match which removes the need for checking for is_empty()
FWIW in my hobby project I have ~200 dependencies total and this is the only one bringing error-chain into the mix, so I'd really welcome this change :) |
Is there any update on this? I can also attest that it would be nice to get rid of |
Hi! I'm looking at using publicsuffix in hreq. In hreq I want to minimize the number of dependencies and after some investigation into dependencies that are pulled in for one crate only, I found that publicsuffix was a good candidate.
Looking at publicsuffix dependencies, I figured both
error-chain
andregex
could quite easily be replaced by local implementations.This PR is broken into two commits:
regex
andlazy_static
and uses a simple iterator approach to validate domain names and email local parts.error-chain
by making a hand written impl ofError
.The second commit might be controversial cause it changes the surface area API of the
Error
type – i.e. breaking change. Happy to discuss/work the commits separately or as one.