Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behaviour of Url::join #333

Closed
theduke opened this issue May 10, 2017 · 22 comments
Closed

Behaviour of Url::join #333

theduke opened this issue May 10, 2017 · 22 comments

Comments

@theduke
Copy link

theduke commented May 10, 2017

Url::join has a very confusing name.
I would expect it to append a path segment.
Instead, it replaces the whole base path, which is quite awkward.

If you want to actually join, you have to:

let segments = url.path_segments_mut().unwrap()pop_if_empty().push("new-segment");

Personally I think join should append a new segment, but changing this might be too late...
If so, the time is now, before a 1.0.

Otherwise, I would suggest adding a new method, something like add_segment.

@dtolnay
Copy link
Contributor

dtolnay commented May 12, 2017

From the example code in the documentation, it looks like it appends a path segment as you expected.

extern crate url;

use url::Url;

fn main() {
    let url = Url::parse("https://example.net/foo/").unwrap();
    let url = url.join("bar").unwrap();
    assert_eq!(url.as_str(), "https://example.net/foo/bar");
}

Could you share the input you were using and the output you were expecting?

@theduke
Copy link
Author

theduke commented May 12, 2017

f the joined path does not start with a slash, it replaces the last path segment which works fine if the url ends with a slash, but works unexpectedly (at least for me) when it does not.

If the joined path does start with a slash, it replaces the entire path, regardless of whether the url ends with one.

let u = url::Url::parse("http://domain.com/a/b").unwrap();
  let u2 = u.join("c").unwrap();
  println!("{}", u2);
  
  let u3 = u.join("d/e").unwrap();
  println!("{}", u3);
  
  let u3 = u.join("/f").unwrap();
  println!("{}", u3);

Output:

http://domain.com/a/c
http://domain.com/a/d/e
http://domain.com/f

@theduke
Copy link
Author

theduke commented May 12, 2017

As a comparison, this is the behaviour when joining with PahtBuf:

Joining a path starting with a slash also replaces the whole path, but otherwise it appends a path segment and does not replace the last one.

  let p = PathBuf::from("/a/b");
  let p2 = p.join("c");
  println!("{}", p2.to_str().unwrap());
  
  let p3 = p.join("d/e");
  println!("{}", p3.to_str().unwrap());
  
  let p4 = p.join("/f");
  println!("{}", p4.to_str().unwrap());
/a/b/c
/a/b/d/e
/f

@dtolnay
Copy link
Contributor

dtolnay commented May 12, 2017

Makes sense. I agree that is confusing.

@theduke
Copy link
Author

theduke commented May 12, 2017

I'd also expect .join("/b") to append /b, not to replace the whole path.

Path in std behaves this way as well, though.

@jongiddy
Copy link

jongiddy commented May 12, 2017

Evidently, for both PathBuf and Url, join means "return parameter 2 as a non-contextual (absolute) path, given the context named by parameter 1". It might have better been called move_to. For a path name, that is the chdir behavior. For a URL, it is the HTML follow-link behavior.

Given this similar behavior of join for both paths and URL's, I'd say keep it as-is and choose a new name for a method that actually appends at all times. Since the word "append" appears several times in the descriptions of the expected behavior, that seems the right name.

@theduke theduke changed the title Url::join: Confusing name Behaviour of Url::join May 12, 2017
@SimonSapin
Copy link
Member

For what it’s worth some_base_url.join(some_string) is the same as Url::options().base_url(&some_base_url).parse(some_string), which in HTML is the same as resolving the link in <a href="{some_string}">…</a><base href="{some_base_url.as_str()}">.

For example, foo.html with base http://example.net/bar/baz.html resolves to http://example.net/bar/foo.html, not http://example.net/bar/baz.html/foo.html.

Given backward-compatibility constraints, either or both of these are acceptable:

  • Add a new method with a different name and a different behavior.
  • Add a new method with a different name and the same behavior as join, and add #[deprecated] to join.

Changing the behavior of join would be a breaking change. Even in a future 2.0 version of url it doesn’t seem like a good idea because existing callers expect the current behavior.

@theduke
Copy link
Author

theduke commented May 12, 2017

Yeah I agree, hence my but changing this might be too late... comment.

What would be a good name for a new method?

  • append
  • path_append
  • append_path

@SimonSapin
Copy link
Member

Note that join is not just about the path. In pseudo-code, "https://a/b".join("//c") is https://c/, for example. "http://d/e?f#g".join("?h") is http://d/e?h. How do you want to handle these cases in this new method?

If you only want to manipulate the path without touching other components, path_segments_mut is already there.

@nox
Copy link
Contributor

nox commented Jul 18, 2019

Nobody suggested a better alternative name, so I'm closing this.

@nox nox closed this as completed Jul 18, 2019
@theduke
Copy link
Author

theduke commented Jul 18, 2019

My suggestion would be renaming join to merge, which expresses the behaviour a bit better.

But ,due to the popularity of url, that is probably unwise at this point.

@vpzomtrrfrt
Copy link

Looks like Node.js' url module calls this "resolve".

At the very least, the documentation on join should direct users to path_segments_mut if that's the recommended workaround

@tmccombs
Copy link
Contributor

I do think resolve is a better description of what it does than join. It is possible to rename as resolve and make join a deprecated alias to it. You just can't remove the alias anytime soon

@nox
Copy link
Contributor

nox commented Jul 27, 2020

"Resolve" in the node.js thing is a legacy API, the actual API is just the URL constructor with 2 arguments. I still think join is a better name for this.

@DanielJoyce
Copy link

DanielJoyce commented Mar 26, 2021

This just bit me too. The behavior makes no sense compared to every other url library I've used.

Also the lack of use of enums compared to the weird "can be a base boolean" seems an oversight given Rust powerful type system.

Enum Url{
Base,
Relative,
etc...
}

This api feels rough in places.

@Overdash
Copy link

Overdash commented Aug 28, 2021

"Resolve" in the node.js thing is a legacy API, the actual API is just the URL constructor with 2 arguments. I still think join is a better name for this.

If you intend to keep the name the same, I suggest adding an example to the doc-string explaining the use case of when the joining path (new/path to be appended) is preceded with a slash (it completely replaces the existing path). Theres an example of when the existing path lacks a trailing-slash, but non for the case I proposed (as far as I can see).
Also, is there anything that exists that allows us to append() a path (which is what a lot of people in this ticket use join() for)

@Sytten
Copy link

Sytten commented Oct 1, 2021

I also think there should be an append function that adds it to the end without replacing it.

@SimonSapin
Copy link
Member

What would append do when self has a query string or fragment identifier? If its argument contains a ? or #? If its argument contains /?

To append path segments one at a time (before any query of fragment without affecting them) there’s url.path_segments_mut()?.push(segment) https://docs.rs/url/2.2.2/url/struct.PathSegmentsMut.html#method.push

@Overdash
Copy link

Overdash commented Oct 2, 2021

@SimonSapin that's up to the caller surely. If it contains bad URI symbols it shouldnt be the problem of the library but rather bad usage.
Also, I current use the path_segment_mut() approach as currently it's only way. But that's not good API. The return type is unexpected and overall awkward to deal with. Have a simple wrapper would benefit many people

@RobWalt
Copy link

RobWalt commented Jan 29, 2023

Just adding my two cent comment here to give the discussion more relevance again. I agree with everyone here stating that the interface name join is unintuitive. The word join kind of implies (at least for me) that the two parts that get joined don't really change. Just to highlight that a bit further here are examples of the use of the word join in different domains

  • join two blocks of wood with glue
  • join two points to form a line
  • join two strings into one

I would vote for the long term deprecation solution @tmccombs suggested. In the meantime it would be nice to add a new utility function which wraps the behavior of

url.path_segments_mut()?.push(segment)

which comes closest to what join seems to imply for a lot of people here. Additionally here is a name suggestion: concat_route.

Just to provide some further ideas, here is another alternative name for the join method: switch_to.

@SimonSapin
Copy link
Member

This issue has been closed for 4 years. There is no plan of renaming the method. That said:

I did not make up the name join from first principles. It is from this prior art:

https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin

Construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.

https://docs.python.org/3/library/os.path.html#os.path.join

Join one or more path segments intelligently. The return value is the concatenation of path and all members of *paths, with exactly one directory separator following each non-empty part, except the last. That is, the result will only end in a separator if the last part is either empty or ends in a separator. If a segment is an absolute path (which on Windows requires both a drive and a root), then all previous segments are ignored and joining continues from the absolute path segment.

https://doc.rust-lang.org/std/path/struct.Path.html#method.join

Creates an owned PathBuf with path adjoined to self.

See PathBuf::push for more details on what it means to adjoin a path.

(following that link)

Extends self with path.

If path is absolute, it replaces the current path.

@RobWalt
Copy link

RobWalt commented Jan 31, 2023

Thanks for searching that up! These are actually good reasons for why the name is what it is. I must admit that I was wrong and I'm sorry for the inconvenience of bringing up that long closed issue.

cbeck88 added a commit to cbeck88/rust-url that referenced this issue Jun 2, 2024
this function is an alternative to `Url::join` which addresses issues
discussed in servo#333, mainly that `Url::join` is sensitive to trailing
slashes is in the `Url`, and if the trailing slash is missing, may
remove segments from the base url and replace them with segments
from the joined `Url`.

There are good reasons for `Url::join` to behave that way, because
that is was is specified in the `Url` standard.

However it's still inconvenient because it often leads to situations
where, a service takes some base-url for some API as a config parameter,
uses `Url::join` to append various routes to it and make requests, and if
a trailing `/` is omitted in a config file, you don't figure it out until
deploying and looking at logs and seeing nonsense requests failing.
In many situations in web development these trailing `/` are not significant
so this is easy to forget and can become just an annoying papercut.

One suggestion in servo#333 was to add an alternative utility function that
isn't sensitive to the trailing `/`'s in this way.
This commit adds such a utility function with tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests