Bug: parser should not add trailing slash unconditionally #808

wcarmon · 2023-02-02T23:53:36Z

The parser is incorrectly adding trailing slash here:

rust-url/url/src/parser.rs

Line 1143 in 1c1e406

self.serialization.push('/');

Urls with and without a trailing slash are different, (although some servers choose to treat them the same way)

If this wasn't a mistake, this behavior should be a ParseOption, so we can interop with other languages.

Most other languages don't do this.

Here are some little test programs in other popular languages which demonstrate expected behavior

Golang example

package main

import (
	"fmt"
	"net/url"
)

func main() {
	uStr := "https://www.abc.net"
	parsed, _ := url.Parse(uStr)

	got := fmt.Sprintf("%v", parsed)

	if uStr != got {
		fmt.Printf("oops, should be the same: got=%v", got)  // never executes
	}
	fmt.Printf("matches exactly as expected")
}

Java example

  public static void main(String[] args) throws Exception {
        var uStr = "http://www.abc.net";
        java.net.URL url = new URL(uStr);
        java.net.URI uri = URI.create(uStr);

        if (!uStr.equals(url.toString())) {
            throw new RuntimeException("url should match exactly"); // never executes
        }

        if (!uStr.equals(uri.toString())) {
            throw new RuntimeException("uri should match exactly"); // never executes
        }

        System.out.println("both match exactly as expected");
  }

Rust example showing the bug

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn should_round_trip() -> Result<(), anyhow::Error> {
        let want = "https://www.abc.net";

        let u = Url::parse(want)?;  // bug happens deep in here

        let got = u.to_string();
        assert_eq!(want, got); // boom!

        Ok(())
    }
}

The text was updated successfully, but these errors were encountered:

valenting · 2023-02-03T07:45:12Z

According the the URL spec and the reference implementation this is the correct behaviour.

tmccombs · 2023-02-03T15:26:05Z

A request to "http://example.com" would always have a path of / in the http request. An http request with an empty string as the path isn't valid.

wcarmon · 2023-02-03T17:02:11Z

If you'd rather not change your code, that is 100% your call; It's your repo.

I don't see any standard mentioning what either of you described. The standards either don't mention it or are counter to your comments above.

Standard-1: https://url.spec.whatwg.org/#url-serializing (link provided by @valenting) doesn't mention trailing slash required
Standard-2: rfc2732
Standard-3: rfc2396
Standard-4: How nginx handles it, link shows trailing slash urls are distinct from non-trailing-slash, both valid (nginx is arguably the most popular web server)
Standard-5: How apache handles it: link shows trailing slash urls are distinct from non-trailing-slash, both valid (apache is arguably the most popular web server too)
Standard-6: How Java handles it (Java currently being a more heavily used language than Rust), and sample code above
Standard-7: How Golang handles it, and sample code above

tmccombs · 2023-02-03T18:57:59Z

At first I thought the same as you, that the whatwg spec didn't add the trailing slash . But on a closer reading, if you follow the algorithm, it does, but it is subtle.

Basically, when parsing in the "path" state, if it reaches the end of the url, it appends the current buffer (in this case an empty string) to the array of path segments. So the deserialiazed path is a single empty segment. Then during serialization, that single segment is added with a "/" prefix .

Now, that said, it's impossible to know just from the text of the spec if this behavior was intentional, or not. It might be a bug in the spec. Or maybe the algorithm was specifically designed not to allow an empty path. It isn't clear to me.

You could file a ticket for the whatwg repo to either explicitly call out how the current algorithm handles this case, or change it to not add an empty segment to the path if the path doesn't contain any "/”.

I'm not sure how the nginx and apache links are relevant. Rust-url is only adding this trailing slash if the path is empty. Having a server that expects an a path of "" instead of "/" would be very unusual. And I think such a request wouldn't comply with the HTTP specs (although I haven't investigated that).

I'm curious what your use case is where this causes problems.

0xNF · 2024-02-19T05:18:45Z

Here's one use case where this causes issues:

Google OAuth2 checks the RedirectURL byte-for-byte. If a user registers a redirect url without a trailing slash, then it becomes impossible to use the Rust URL library to successfully authenticate with Google.

tmccombs · 2024-02-19T20:15:33Z

That sounds like a bug in google oauth2 to me. It should probably treat example.com and example.com/ the same.

0xNF · 2024-02-20T01:35:54Z

I agree, it does sound like a bug in Google's Oauth implementation. Sadly, Box.com suffers from the same problem.

Much of the internet is broken, and the strictness of the url library makes interfacing with the broken parts of the internet impossible.

valenting · 2024-02-21T09:24:07Z

Google OAuth2 checks the RedirectURL byte-for-byte. If a user registers a redirect url without a trailing slash, then it becomes impossible to use the Rust URL library to successfully authenticate with Google.

As far as I know it's the developer who registers the redirect-url, so they should make sure the path is correct, no?

0xNF · 2024-02-21T11:14:18Z

Sure, it'd be nice if everyone did the right thing all the time. But then, Rust itself probably wouldn't exist either, let alone this URL library.

My application isn't in control of what the user does on other systems. I can only work with the information they give me. If the user gives me a URL with no trailing slash, and my program adds one and tries to interface with Google (and box, and others), and they say "no", then my program is broken from their perspective.

valenting closed this as completed Feb 3, 2023

rkarp mentioned this issue Jul 10, 2024

Use provided GITHUB_* URL variables snok/container-retention-policy#88

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: parser should not add trailing slash unconditionally #808

Bug: parser should not add trailing slash unconditionally #808

wcarmon commented Feb 2, 2023 •

edited

Loading

valenting commented Feb 3, 2023

tmccombs commented Feb 3, 2023

wcarmon commented Feb 3, 2023 •

edited

Loading

tmccombs commented Feb 3, 2023

0xNF commented Feb 19, 2024

tmccombs commented Feb 19, 2024

0xNF commented Feb 20, 2024

valenting commented Feb 21, 2024

0xNF commented Feb 21, 2024 •

edited

Loading

Bug: parser should not add trailing slash unconditionally #808

Bug: parser should not add trailing slash unconditionally #808

Comments

wcarmon commented Feb 2, 2023 • edited Loading

Golang example

Java example

Rust example showing the bug

valenting commented Feb 3, 2023

tmccombs commented Feb 3, 2023

wcarmon commented Feb 3, 2023 • edited Loading

tmccombs commented Feb 3, 2023

0xNF commented Feb 19, 2024

tmccombs commented Feb 19, 2024

0xNF commented Feb 20, 2024

valenting commented Feb 21, 2024

0xNF commented Feb 21, 2024 • edited Loading

wcarmon commented Feb 2, 2023 •

edited

Loading

wcarmon commented Feb 3, 2023 •

edited

Loading

0xNF commented Feb 21, 2024 •

edited

Loading