Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dependencies with local impls #32

Merged
merged 4 commits into from
Feb 22, 2021
Merged

Conversation

algesten
Copy link
Contributor

@algesten algesten commented May 23, 2020

Hi! I'm looking at using publicsuffix in hreq. In hreq I want to minimize the number of dependencies and after some investigation into dependencies that are pulled in for one crate only, I found that publicsuffix was a good candidate.

Looking at publicsuffix dependencies, I figured both error-chain and regex could quite easily be replaced by local implementations.

This PR is broken into two commits:

  • The first removes the need for regex and lazy_static and uses a simple iterator approach to validate domain names and email local parts.
  • The second removes error-chain by making a hand written impl of Error.

The second commit might be controversial cause it changes the surface area API of the Error type – i.e. breaking change. Happy to discuss/work the commits separately or as one.

├── publicsuffix v1.5.4
│   ├── error-chain v0.12.2
│   │   [build-dependencies]
│   │   └── version_check v0.9.1
│   ├── idna v0.2.0
│   │   ├── matches v0.1.8
│   │   ├── unicode-bidi v0.3.4
│   │   │   └── matches v0.1.8 (*)
│   │   └── unicode-normalization v0.1.12
│   │       └── smallvec v1.4.0
│   ├── lazy_static v1.4.0
│   ├── regex v1.3.7
│   │   └── regex-syntax v0.6.17
│   └── url v2.1.1
│       ├── idna v0.2.0 (*)
│       ├── matches v0.1.8 (*)
│       └── percent-encoding v2.1.0 (*)

src/matcher.rs Outdated
@@ -0,0 +1,160 @@
/// Check if input is a valid domain label
pub fn is_label(input: &str) -> bool {
if input.is_empty() {
Copy link

@oherrala oherrala May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNS labels also have maximum length of 63 octets (RFC1035 2.3.4) so we could check that here also?

Suggested change
if input.is_empty() {
if input.is_empty() || input.len() > 63 {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this in terms of code separation. The number of DNS labels are checked here at the call site: https://github.com/rushmorem/publicsuffix/blob/master/src/lib.rs#L613

The empty check is required for the following logic to be correct, but I'm wondering whether the 63 octet check should be done where we call matcher::is_label instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is already checked for here in verify_dns_length:

https://github.com/rushmorem/publicsuffix/blob/master/src/lib.rs#L706

    fn to_ascii(domain: &str) -> Result<String> {
        let result = idna::Config::default()
            .transitional_processing(true)
            .verify_dns_length(true)
            .to_ascii(domain);
        result.map_err(|error| ErrorKind::Uts46(error).into())
    }

And there's a test checking for it here: https://github.com/rushmorem/publicsuffix/blob/master/src/tests.rs#L148

It might be better relying on this mechanism, cause I'm a bit uncertain about ASCII vs UTF-8 here. The idna package should do the right thing with regards to that though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNS addresses (aka hostnames) have two limits: The total length and the label length. The amount of labels can then be derived from this: 253 chars / 2 = 126.5 ~= 127.

is_label() is validating a DNS label and the specification says the maximum length is 63. So for me it feels it's the proper place to do the validation of this maximum length.

And you are right about idna crate. It does the validation, but I found it to be slow when given anomalous input (from fuzzing). Issue here: servo/rust-url#595

And IDNA (ASCII vs. Unicode hostnames) makes things more difficult. I'm trying to figure out best way to do the most cheap and simple checks first before going to expensive conversions (like IDNS to_ascii()) to validate everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I don't mind hugely where the test happens, however as long as that slow idna test is there, maybe we shouldn't double up the logic?

Maybe cleaner to make a commit that both adds the new check – with proper thinking around ASCII vs Unicode – and disable the idna at the same time?

src/matcher.rs Outdated
Comment on lines 9 to 11
let first = chars.next().unwrap();
if !first.is_ascii_alphanumeric() {
return false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can eliminate unwrap by using match for example like below. That unwrap should never panic because the length is checked above, but it's also possible to implement same thing without it, so why not?

Suggested change
let first = chars.next().unwrap();
if !first.is_ascii_alphanumeric() {
return false;
match chars.next() {
Some(c) if c.is_ascii_alphanumeric() => (),
_ => return false,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced the unwrap with match which removes the need for checking for is_empty()

@JordiPolo
Copy link

FWIW in my hobby project I have ~200 dependencies total and this is the only one bringing error-chain into the mix, so I'd really welcome this change :)

@coolreader18
Copy link

coolreader18 commented Nov 16, 2020

Is there any update on this? I can also attest that it would be nice to get rid of regex, as this crate (as a transitive dependency of reqwest) is the only one that depends on it.

@rushmorem rushmorem merged commit bc74978 into rushmorem:master Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants