Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several questionable .relativeTo() results. #78

Closed
wants to merge 3 commits into from

Conversation

djcsdy
Copy link
Contributor

@djcsdy djcsdy commented Apr 15, 2013

Thanks for fixing #75 – that fixed the specific cases that I raised.

However, there are several more things that I think aren’t quite right about .relativeTo().

May I humbly suggest the attached changeset?

Hopefully the test cases that I have added and changed speak for themselves, but I’m happy to explain my rationale if anything is controversial, or to be corrected if I’m being silly :).

@rodneyrehm
Copy link
Member

I think you went a bit overboard with the merging thing. RFC 3986 celarly states in Section 5.2.2 - Transform References that there are 5 components to be considered: scheme, authority(username, password, hostname, port), path, query, fragment.

As for normalizing both URIs prior to figuring out their relation: yep, good idea.

if there's any other subtlety I've missed, please point me to it.

Cheers,
Rod

@djcsdy
Copy link
Contributor Author

djcsdy commented Apr 15, 2013

With one exception, url === url.relativeTo(base).absoluteTo(base) holds true for every test case, and I think the test cases provide good coverage, so I’m reasonably comfortable that this is sane behaviour.

The one exception is where url is denormalized, so obviously you don’t get the original URI back, but you do get back a URI that is canonically equivalent. Perhaps it’s worth me adjusting the test to prove that rather than skipping the absoluteTo step for that case.

Thanks for all your time today, btw. I’m sorry if I’m distracting you from more important work :).

EDIT: I accidentally posted this too soon, so I’m about to post an additional comment with some more stuff I wanted to say.

@djcsdy
Copy link
Contributor Author

djcsdy commented Apr 15, 2013

The cases that you didn’t mention are the ones I think are the most important:

'same scheme':

URI("http://example.org/foo/bar/bat").
    relativeTo("http://example.com/foo/").
    toString();
  • Your result: "bar/bat" – resolving this will fetch the resource from the wrong host :(.
  • My result: "//example.org/foo/bar/bat" – this is a valid relative URI, honest :).

'credentials' and 'base credentials' demonstrate a similar cases where the hostname is the same but the credentials are different. If any part of the authority differs between the URIs, then the only part that can be made relative is the scheme.

'different scheme':

URI("http://example.org/foo/bar").
    relativeTo("https://example.org/foo/").
    toString();
  • Your result: "bar" – this will use the https scheme instead of http, potentially really bad.
  • My result: "http://example.org/foo/bar" – since the scheme is different, the absolute URI is the best you can do.

'parent top level':

URI("/relative/path?blubber=1#hash1").
    relativeTo("/path/to/file?some=query#hash").
    toString()
  • Your result: "/relative/path?blubber=1#hash1"
  • My result: "../../relative/path?blubber=1#hash1" – Other similar cases use .. to step up to the parent; I don’t see any reason not to use it just because we have to step all the way up to root in this case.

@djcsdy
Copy link
Contributor Author

djcsdy commented Apr 15, 2013

I will check tomorrow to make sure that my merging behaviour is the inverse of the relative URI resolution algorithm defined by RFC3986. One place where I have concerns is that I’m considering the credentials and hostname separately, whereas RFC3986 considers them together as the authority. Possibly my approach leads to some incorrect edge cases, although it should still be more correct than the existing behaviour, which ignores the authority altogether.

@djcsdy
Copy link
Contributor Author

djcsdy commented Apr 16, 2013

Hi! I’m just dropping in to say that I will be having another look at this next week. I don’t think you should merge this PR as-is, as there are a couple of issues I need to look at more closely.

@djcsdy
Copy link
Contributor Author

djcsdy commented Jul 1, 2013

Just leaving this here as a note to myself:

Since the URI standard doesn’t define relativization, we have to define it for ourselves, preferably in a way that makes useful sense in the context of the existing URI standard.

Since the URI standard does define relative URI resolution (.absoluteTo()) and canonical equivalence (.equals()), I think it makes sense to define URI relativization in terms of those parts of the standard.

Therefore, my goal is that uri.relativeTo(base) will return the shortest possible relative or absolute URI result that fulfils the condition result.absoluteTo(base).equals(uri).

At the moment the code for this pull request doesn’t achieve this goal, although it is pretty close.

Here are the known problems:

  • URI("http://example.org/a?foo").relativeTo("http://example.org/a") -> a?foo should be ?foo.
  • URI("http://example.org/a#foo").relativeTo("http://example.org/a") -> a#foo should be #foo.
  • Anything involving a change in credentials but not hostname, or vice-versa, is possibly incorrect. Need to investigate.

@rodneyrehm
Copy link
Member

An additional though here: whatever .relativeTo() should be acceptable by <base href=""> - it should, by itself, be parseable as a valid URI. If the shortest possible URL is not, maybe we should introduce a second function .shortestRelativeTo()

@djcsdy
Copy link
Contributor Author

djcsdy commented Jul 2, 2013

I think I might be missing something...

I took it as read that any URI returned by .relativeTo() has to be valid, otherwise it isn’t very useful :).

(I’m not really sure how a URI could be invalid but still meet the conditions above – how do you absolutize a URI that isn’t representable?)

(?foo and #foo are both valid URIs, by the way. They’re relative-refs with an empty relative-part, which is permitted by RFC 3986.)

@djcsdy
Copy link
Contributor Author

djcsdy commented Jul 2, 2013

There is one case where it might be worth returning a longer-than-necessary URI, though:

URI("http://example.org/a").relativeTo("http://example.org/a")

The shortest possible result here is the empty string, which is a valid URI, but I can imagine that it might cause trouble in some contexts. The next shortest canonically-equivalent URI is ., which might be a better idea.

@rodneyrehm
Copy link
Member

What's the status on this? are there still things missing? I've started merging the open stuff into master

@djcsdy
Copy link
Contributor Author

djcsdy commented Aug 3, 2013

IMHO this is unambiguously better than the implementation it replaces:

  • uri === uri.relativeTo(base).absoluteTo(base) holds true for any normalized uri, so I haven’t broken reasonable expectations.
  • It’s able to handle denormalized paths, which the old implementation can’t do (I can’t remember exactly what the problem is, but it shows up if you run the new tests against the old code).
  • It creates relative paths in some cases where the old implementation gives up and returns a URI with an absolute path.
  • It’s able to create scheme-relative URIs, which the existing implementation isn’t able to do.
  • It passes all the existing tests, except in one case where the test expected relativeTo() to return an absolute path even though it could be relativized. In that one case I fixed the test to expect a relative path.
  • It adds some new tests and passes them; the old implementation would fail the new tests.

There are still some things that could be improved, but they are problems with the existing implementation too:

  • Where a URI could be relativized down to just a query component and/or fragment, this implementation unnecessarily keeps the last path segment (see Sub-optimal result in relativeTo with fragment #95). The result still resolves to the correct URI, but it should be shorter. However, this is still slightly better than the old implementation.
  • Since URI-relativization isn’t standardized anywhere, we could do with tightening up what this function is supposed to do, for the purpose of documentation, and update the documentation accordingly. I’ll raise this as a separate issue.

tldr: I think you’re quite safe to merge this pull request.

@djcsdy
Copy link
Contributor Author

djcsdy commented Aug 3, 2013

Oh wait, I need to check that I haven’t broken anything by attempting to merge the authority component.

I’ll add tests for that now, and come back.

@djcsdy
Copy link
Contributor Author

djcsdy commented Aug 3, 2013

OK, I added some more tests and they pass. I think this PR should be good to merge.

rodneyrehm added a commit that referenced this pull request Aug 4, 2013
@rodneyrehm
Copy link
Member

I've merged this into master - it will be included in the next release. thank you for your help!

@rodneyrehm rodneyrehm closed this Aug 4, 2013
@rodneyrehm
Copy link
Member

I've fixed Issue #95 URI("http://www.example.com:8080/dir/file#abcd").relativeTo('http://www.example.com:8080/dir/file').toString() === "#abcd" (in master)

please go ahead and write up the relativization docs as proposed :)

@djcsdy djcsdy deleted the relative-to branch August 4, 2013 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants