Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: parser should not add trailing slash unconditionally #808

Closed
wcarmon opened this issue Feb 2, 2023 · 9 comments
Closed

Bug: parser should not add trailing slash unconditionally #808

wcarmon opened this issue Feb 2, 2023 · 9 comments

Comments

@wcarmon
Copy link

wcarmon commented Feb 2, 2023

The parser is incorrectly adding trailing slash here:

self.serialization.push('/');

Urls with and without a trailing slash are different, (although some servers choose to treat them the same way)

If this wasn't a mistake, this behavior should be a ParseOption, so we can interop with other languages.

Most other languages don't do this.

Here are some little test programs in other popular languages which demonstrate expected behavior

Golang example

package main

import (
	"fmt"
	"net/url"
)

func main() {
	uStr := "https://www.abc.net"
	parsed, _ := url.Parse(uStr)

	got := fmt.Sprintf("%v", parsed)

	if uStr != got {
		fmt.Printf("oops, should be the same: got=%v", got)  // never executes
	}
	fmt.Printf("matches exactly as expected")
}

Java example

  public static void main(String[] args) throws Exception {
        var uStr = "http://www.abc.net";
        java.net.URL url = new URL(uStr);
        java.net.URI uri = URI.create(uStr);

        if (!uStr.equals(url.toString())) {
            throw new RuntimeException("url should match exactly"); // never executes
        }

        if (!uStr.equals(uri.toString())) {
            throw new RuntimeException("uri should match exactly"); // never executes
        }

        System.out.println("both match exactly as expected");
  }

Rust example showing the bug

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn should_round_trip() -> Result<(), anyhow::Error> {
        let want = "https://www.abc.net";

        let u = Url::parse(want)?;  // bug happens deep in here

        let got = u.to_string();
        assert_eq!(want, got); // boom!

        Ok(())
    }
}
@valenting
Copy link
Collaborator

According the the URL spec and the reference implementation this is the correct behaviour.

@tmccombs
Copy link
Contributor

tmccombs commented Feb 3, 2023

A request to "http://example.com" would always have a path of / in the http request. An http request with an empty string as the path isn't valid.

@wcarmon
Copy link
Author

wcarmon commented Feb 3, 2023

If you'd rather not change your code, that is 100% your call; It's your repo.

I don't see any standard mentioning what either of you described. The standards either don't mention it or are counter to your comments above.

@tmccombs
Copy link
Contributor

tmccombs commented Feb 3, 2023

At first I thought the same as you, that the whatwg spec didn't add the trailing slash . But on a closer reading, if you follow the algorithm, it does, but it is subtle.

Basically, when parsing in the "path" state, if it reaches the end of the url, it appends the current buffer (in this case an empty string) to the array of path segments. So the deserialiazed path is a single empty segment. Then during serialization, that single segment is added with a "/" prefix .

Now, that said, it's impossible to know just from the text of the spec if this behavior was intentional, or not. It might be a bug in the spec. Or maybe the algorithm was specifically designed not to allow an empty path. It isn't clear to me.

You could file a ticket for the whatwg repo to either explicitly call out how the current algorithm handles this case, or change it to not add an empty segment to the path if the path doesn't contain any "/”.

I'm not sure how the nginx and apache links are relevant. Rust-url is only adding this trailing slash if the path is empty. Having a server that expects an a path of "" instead of "/" would be very unusual. And I think such a request wouldn't comply with the HTTP specs (although I haven't investigated that).

I'm curious what your use case is where this causes problems.

@0xNF
Copy link

0xNF commented Feb 19, 2024

Here's one use case where this causes issues:

Google OAuth2 checks the RedirectURL byte-for-byte. If a user registers a redirect url without a trailing slash, then it becomes impossible to use the Rust URL library to successfully authenticate with Google.

@tmccombs
Copy link
Contributor

That sounds like a bug in google oauth2 to me. It should probably treat example.com and example.com/ the same.

@0xNF
Copy link

0xNF commented Feb 20, 2024

I agree, it does sound like a bug in Google's Oauth implementation. Sadly, Box.com suffers from the same problem.

Much of the internet is broken, and the strictness of the url library makes interfacing with the broken parts of the internet impossible.

@valenting
Copy link
Collaborator

Google OAuth2 checks the RedirectURL byte-for-byte. If a user registers a redirect url without a trailing slash, then it becomes impossible to use the Rust URL library to successfully authenticate with Google.

As far as I know it's the developer who registers the redirect-url, so they should make sure the path is correct, no?

@0xNF
Copy link

0xNF commented Feb 21, 2024

Sure, it'd be nice if everyone did the right thing all the time. But then, Rust itself probably wouldn't exist either, let alone this URL library.

My application isn't in control of what the user does on other systems. I can only work with the information they give me. If the user gives me a URL with no trailing slash, and my program adds one and tries to interface with Google (and box, and others), and they say "no", then my program is broken from their perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants