Add urlmaster and testAutomated() for 4225 tests in url_resolution. #550

Closed
wants to merge 16 commits into
from

Conversation

Projects
None yet
4 participants
@deitch
Contributor

deitch commented Jan 6, 2013

Includes automated url testing. Needs to be reviewed for agreement that these are correct.

See tmpvar#531

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 12, 2013

Member

OK, took a bit of time to review this. Sorry for the delay. Now that I've started to sink my teeth into it I should be able to correspond more frequently.

On major thing that sticks out to me: I think testing all those possible bases is probably not the way to go. Base, which is in my mind equivalent to document.location.href, should probably always be an absolute URL (whether file:// \or not.) That should cut down on the number of failures immensely.

More specific points from #531:

  1. If an href is just an authority '//www.not.com', which is legit according to the RFC, it errors out on doc.getElementById("theID").href

This is definitely a bug.

  1. Blank base: jsdom assigns the base of the current js file, which is definitely incorrect. It probably isn't a realistic scenario, but the RFC doesn't make it illegitimate. Either way, the URL of the JS file definitely is not right.

Well, I think the intent here is that the base should always be an absolute URL, as I say above, and if you haven't given it one, it should just use something reasonable like the JS file.

I guess the real issue here is that the detection logic is something like if (!config.url) instead of if (!isAbsoluteUrl(config.url)).

  1. Hash base: base that is only a hash/fragment and a relative ref, e.g. base is '#abc' and ref is 'q/r'. In that case, jsdom just takes 'q/r', but my read of the RFC is that the URL should be absolutized.

I don't think this is important though, since document base's should always be absolute URLs.

Member

domenic commented Jan 12, 2013

OK, took a bit of time to review this. Sorry for the delay. Now that I've started to sink my teeth into it I should be able to correspond more frequently.

On major thing that sticks out to me: I think testing all those possible bases is probably not the way to go. Base, which is in my mind equivalent to document.location.href, should probably always be an absolute URL (whether file:// \or not.) That should cut down on the number of failures immensely.

More specific points from #531:

  1. If an href is just an authority '//www.not.com', which is legit according to the RFC, it errors out on doc.getElementById("theID").href

This is definitely a bug.

  1. Blank base: jsdom assigns the base of the current js file, which is definitely incorrect. It probably isn't a realistic scenario, but the RFC doesn't make it illegitimate. Either way, the URL of the JS file definitely is not right.

Well, I think the intent here is that the base should always be an absolute URL, as I say above, and if you haven't given it one, it should just use something reasonable like the JS file.

I guess the real issue here is that the detection logic is something like if (!config.url) instead of if (!isAbsoluteUrl(config.url)).

  1. Hash base: base that is only a hash/fragment and a relative ref, e.g. base is '#abc' and ref is 'q/r'. In that case, jsdom just takes 'q/r', but my read of the RFC is that the URL should be absolutized.

I don't think this is important though, since document base's should always be absolute URLs.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 13, 2013

Contributor

Except for the bug listed as #1 above, all of these relate to having weird/illegit base hrefs. According to http://www.w3.org/TR/html401/struct/links.html#h-12.4 you get your base from (in descending order):

  • base href in HTML
  • base href given by http headers (seriously, I have never seen this before, but apparently rfc2616, just strange)
  • location of current document

So it is very possible to have an illegit base, set by html or http header, and still be at a legit window.location.href.

On the other side of it, though, you are right, if the base really is blank, even if it is set to href="" in html, then it should default to current location, so you are right on the second one above. I will change the tests to show it. Looking forward to your feedback on the other.

Contributor

deitch commented Jan 13, 2013

Except for the bug listed as #1 above, all of these relate to having weird/illegit base hrefs. According to http://www.w3.org/TR/html401/struct/links.html#h-12.4 you get your base from (in descending order):

  • base href in HTML
  • base href given by http headers (seriously, I have never seen this before, but apparently rfc2616, just strange)
  • location of current document

So it is very possible to have an illegit base, set by html or http header, and still be at a legit window.location.href.

On the other side of it, though, you are right, if the base really is blank, even if it is set to href="" in html, then it should default to current location, so you are right on the second one above. I will change the tests to show it. Looking forward to your feedback on the other.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 13, 2013

Contributor

I just realized I didn't explain that very well (or maybe I did?):

  1. Blank base: should fall back to document location, which is what you said, I will fix it.
  2. Illegit (but non-blank) base: is possible by setting html base node or http header, so should be tested, is what I am positing.
Contributor

deitch commented Jan 13, 2013

I just realized I didn't explain that very well (or maybe I did?):

  1. Blank base: should fall back to document location, which is what you said, I will fix it.
  2. Illegit (but non-blank) base: is possible by setting html base node or http header, so should be tested, is what I am positing.
@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 13, 2013

Member

It looks like illegit base URLs set using <base> (or HTTP I suppose) are converted to absolute URLs:

http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#base-urls

So I don't think it'll be necessary to test those cases, although it will be necessary for jsdom to ensure that the document's URL is always an absolute URL.

Member

domenic commented Jan 13, 2013

It looks like illegit base URLs set using <base> (or HTTP I suppose) are converted to absolute URLs:

http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#base-urls

So I don't think it'll be necessary to test those cases, although it will be necessary for jsdom to ensure that the document's URL is always an absolute URL.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 17, 2013

Contributor

It appears you are right. It looks like the steps for resolving any URL are:

  1. Determine the base URL:
    a) if there is no base element with an href attr, then use document URL
    b) if there is a base element with an href attr, then resolve it relative to the document URL (exactly as if it were an anchor tag relative to the document or a base)
  2. Resolve the anchor tag href relative to the base URL determined in 1 above.

I'll fix up the tests. Nice catch!

Contributor

deitch commented Jan 17, 2013

It appears you are right. It looks like the steps for resolving any URL are:

  1. Determine the base URL:
    a) if there is no base element with an href attr, then use document URL
    b) if there is a base element with an href attr, then resolve it relative to the document URL (exactly as if it were an anchor tag relative to the document or a base)
  2. Resolve the anchor tag href relative to the base URL determined in 1 above.

I'll fix up the tests. Nice catch!

Use latest urlmaster with proper testing for valid location URLs. Upd…
…ate url_resolution() tests automated to use better urlmaster.
@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 21, 2013

Contributor

OK, I am updating the pull request. urlmaster has moved up a number of revisions (0.2.5) to add numerous tests and add support for location+base+ref, and validated the root (location if provided, else base). Essentially, it conforms to the spec as discussed earlier.

Pushing to github now, take a look.

Contributor

deitch commented Jan 21, 2013

OK, I am updating the pull request. urlmaster has moved up a number of revisions (0.2.5) to add numerous tests and add support for location+base+ref, and validated the root (location if provided, else base). Essentially, it conforms to the spec as discussed earlier.

Pushing to github now, take a look.

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 25, 2013

Member

Looking at this now. Having some minor issue with Windows backward slashes, will try to fix locally. Just wanted to let you know I didn't forget.

Member

domenic commented Jan 25, 2013

Looking at this now. Having some minor issue with Windows backward slashes, will try to fix locally. Just wanted to let you know I didn't forget.

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 25, 2013

For Windows compat, this should be locn = toFileUrl(__filename)

For Windows compat, this should be locn = toFileUrl(__filename)

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 26, 2013

Owner

Agreed. I just pushed the change in. Makes it a lot easier and cleaner.

Owner

deitch replied Jan 26, 2013

Agreed. I just pushed the change in. Makes it a lot easier and cleaner.

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 25, 2013

Member

After the above toFileUrl fix, I took a chance to look over them. Most look solid. Here's things I ran into:

ERROR file://www.not.com vs file://www.not.com/

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base  with ref //www.not.com should resolve to file://www.not.com instead of file://www.not.com/

I think the extra slash is correct, especially since later tests on http/https include it. Similarly:

ERROR file://www.not.com#xyz vs file://www.not.com/#xyz

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base /a/b with ref //www.not.com#xyz should resolve to file://www.not.com#xyz instead of file://www.not.com/#xyz

ERROR https://?when=now vs https:///?when=now

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base  with ref https:?when=now should resolve to https://?when=now instead of https:///?when=now

Is it really correct that it throws away the entire path? Same for all the protocol ones, e.g. https: and https:#xyz


ERROR https:///q/r/c/f?when=now#xyz vs https://q/r/c/f?when=now#xyz

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base //www.why.com/a/b/./c/d/.././e/../f?foo=bar with ref https:/q/r/./c/d/.././e/../f?when=now#xyz should resolve to
 https:///q/r/c/f?when=now#xyz instead of https://q/r/c/f?when=now#xyz

You don't seem to get the right number of slashes when transitioning from file to http.


ERROR file:/// vs file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base  with ref / should resolve to file:/// instead of ...

You need to take into account Windows drive letters. It should actually be file:///C:/, not file:///.


But generally, this is in the right direction. It's uncovering lots of bugs.

Here's some patched jsdom code you can test against, in lib/level2/html.js:

  baseUrl: function(document) {
    var baseElements = document.getElementsByTagName('base'),
        baseUrl      = document.URL;

    if (baseElements.length > 0) {
      var baseHref = baseElements.item(0).href;
      if (baseHref) {
        baseUrl = URL.resolve(baseUrl, baseHref);
      }
    }

    return baseUrl;
  },
  resolve: function(document, href) {
    // When switching protocols, the path doesn't get canonicalized (i.e. .s and ..s are still left)
    var intermediate = URL.resolve(this.baseUrl(document), href);

    // This canonicalizes the path, at the cost of overwriting the hash.
    var nextStep = URL.resolve(intermediate, '#');

    // So, insert the hash back in, if there was one.
    var parsed = URL.parse(intermediate);
    return nextStep.slice(0, -1) + (parsed.hash || '');
  },

It seems to give the right thing most of the time?

Member

domenic commented Jan 25, 2013

After the above toFileUrl fix, I took a chance to look over them. Most look solid. Here's things I ran into:

ERROR file://www.not.com vs file://www.not.com/

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base  with ref //www.not.com should resolve to file://www.not.com instead of file://www.not.com/

I think the extra slash is correct, especially since later tests on http/https include it. Similarly:

ERROR file://www.not.com#xyz vs file://www.not.com/#xyz

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base /a/b with ref //www.not.com#xyz should resolve to file://www.not.com#xyz instead of file://www.not.com/#xyz

ERROR https://?when=now vs https:///?when=now

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base  with ref https:?when=now should resolve to https://?when=now instead of https:///?when=now

Is it really correct that it throws away the entire path? Same for all the protocol ones, e.g. https: and https:#xyz


ERROR https:///q/r/c/f?when=now#xyz vs https://q/r/c/f?when=now#xyz

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base //www.why.com/a/b/./c/d/.././e/../f?foo=bar with ref https:/q/r/./c/d/.././e/../f?when=now#xyz should resolve to
 https:///q/r/c/f?when=now#xyz instead of https://q/r/c/f?when=now#xyz

You don't seem to get the right number of slashes when transitioning from file to http.


ERROR file:/// vs file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js base  with ref / should resolve to file:/// instead of ...

You need to take into account Windows drive letters. It should actually be file:///C:/, not file:///.


But generally, this is in the right direction. It's uncovering lots of bugs.

Here's some patched jsdom code you can test against, in lib/level2/html.js:

  baseUrl: function(document) {
    var baseElements = document.getElementsByTagName('base'),
        baseUrl      = document.URL;

    if (baseElements.length > 0) {
      var baseHref = baseElements.item(0).href;
      if (baseHref) {
        baseUrl = URL.resolve(baseUrl, baseHref);
      }
    }

    return baseUrl;
  },
  resolve: function(document, href) {
    // When switching protocols, the path doesn't get canonicalized (i.e. .s and ..s are still left)
    var intermediate = URL.resolve(this.baseUrl(document), href);

    // This canonicalizes the path, at the cost of overwriting the hash.
    var nextStep = URL.resolve(intermediate, '#');

    // So, insert the hash back in, if there was one.
    var parsed = URL.parse(intermediate);
    return nextStep.slice(0, -1) + (parsed.hash || '');
  },

It seems to give the right thing most of the time?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 26, 2013

Contributor

One of these days, I have to learn github-flavoured markdown correctly. I know how you did the code blocks, with 4 backticks before and after, or indented each line 4 times, but how did you get that cool separator between sections?

And how do you quote my comment in yours?

Contributor

deitch commented Jan 26, 2013

One of these days, I have to learn github-flavoured markdown correctly. I know how you did the code blocks, with 4 backticks before and after, or indented each line 4 times, but how did you get that cool separator between sections?

And how do you quote my comment in yours?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 26, 2013

Contributor
ERROR file://www.not.com vs file://www.not.com/

I think the extra slash is correct, especially since later tests on http/https include it. Similarly:

This one is interesting. I just spent time hunting through RFC 3986 again. The scheme is followed by the authority. After that, it is followed by 4 path options. The first one is path-abempty, i.e. an empty path, which can be either '/' or empty. So both www.not.com and www.not.com/ are correct. This is true with a fragment/query and without a fragment/query. Neither of those are part of the path, so it is still path-abempty with or without them. Would be nice if the RFC just said, path-abempty is always '/' or is always empty. Probably lots of divergent browser behaviour history here.m

In our context, it is not correct of the tests to say that the above is an error, when either is legit. I think the test needs to be more tolerant, and accept both if the path is empty. I will change the tests to support it.


ERROR https:///q/r/c/f?when=now#xyz vs https://q/r/c/f?when=now#xyz

You don't seem to get the right number of slashes when transitioning from file to http.

I will investigate this.


ERROR https://?when=now vs https:///?when=now

Is it really correct that it throws away the entire path? Same for all the protocol ones, e.g. https: and https:#xyz

Yes, strange as it sounds. http://tools.ietf.org/html/rfc3986#section-5.2.2

The first section says that if the Reference has a scheme (e.g. https:) then we completely ignore the base.


ERROR file:/// vs file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js

You need to take into account Windows drive letters. It should actually be file:///C:/, not file:///.

Should it? The absolute path is not /Users/Domenic/... but /C:/Users/Domenic/... which means that if the reference is /, then it should replace the entire thing. AFAIK, the file URL has no special understanding of drive letters. The problem is URLs have no way of taking it into account.

Contributor

deitch commented Jan 26, 2013

ERROR file://www.not.com vs file://www.not.com/

I think the extra slash is correct, especially since later tests on http/https include it. Similarly:

This one is interesting. I just spent time hunting through RFC 3986 again. The scheme is followed by the authority. After that, it is followed by 4 path options. The first one is path-abempty, i.e. an empty path, which can be either '/' or empty. So both www.not.com and www.not.com/ are correct. This is true with a fragment/query and without a fragment/query. Neither of those are part of the path, so it is still path-abempty with or without them. Would be nice if the RFC just said, path-abempty is always '/' or is always empty. Probably lots of divergent browser behaviour history here.m

In our context, it is not correct of the tests to say that the above is an error, when either is legit. I think the test needs to be more tolerant, and accept both if the path is empty. I will change the tests to support it.


ERROR https:///q/r/c/f?when=now#xyz vs https://q/r/c/f?when=now#xyz

You don't seem to get the right number of slashes when transitioning from file to http.

I will investigate this.


ERROR https://?when=now vs https:///?when=now

Is it really correct that it throws away the entire path? Same for all the protocol ones, e.g. https: and https:#xyz

Yes, strange as it sounds. http://tools.ietf.org/html/rfc3986#section-5.2.2

The first section says that if the Reference has a scheme (e.g. https:) then we completely ignore the base.


ERROR file:/// vs file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js

You need to take into account Windows drive letters. It should actually be file:///C:/, not file:///.

Should it? The absolute path is not /Users/Domenic/... but /C:/Users/Domenic/... which means that if the reference is /, then it should replace the entire thing. AFAIK, the file URL has no special understanding of drive letters. The problem is URLs have no way of taking it into account.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 26, 2013

Contributor

Screw it, writing the tests this way is way too hard. I am just shifting it to adding the slash for path-abempty.

Drat, it gets even worse. The resolution is actually different depending on the platform. I don't even get that error on Mac. On Mac, the resolution by jsdom gives www.not.com and so complains the other way around.

I am going to have to leave the urlmaster one way or the other consistently, and change the testing in jsdom to be more tolerant, which is just a pain. Need to parse the URL, check for a blank path, and then modify the test.

Contributor

deitch commented Jan 26, 2013

Screw it, writing the tests this way is way too hard. I am just shifting it to adding the slash for path-abempty.

Drat, it gets even worse. The resolution is actually different depending on the platform. I don't even get that error on Mac. On Mac, the resolution by jsdom gives www.not.com and so complains the other way around.

I am going to have to leave the urlmaster one way or the other consistently, and change the testing in jsdom to be more tolerant, which is just a pain. Need to parse the URL, check for a blank path, and then modify the test.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 26, 2013

Contributor

Just pushed changes to handle the first trailing slash issue (1 of 4 you raised).
Still working on the second.
Looking for your feedback on the 3rd and 4th.

Contributor

deitch commented Jan 26, 2013

Just pushed changes to handle the first trailing slash issue (1 of 4 you raised).
Still working on the second.
Looking for your feedback on the 3rd and 4th.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 26, 2013

Contributor

Actually, it looks like urlmaster is correct for the second, and jsdom is incorrect.

The ref is https:/q/r/./c/d/.././e/../f?when=now#xyz which says scheme = https, authority (i.e. host) = '', and the path = '/q/r/./c/d/.././e/../f' (I am ignoring the hash and query for now).

If we use what jsdom gives https://q/r/c/f?when=now#xyz it would imply that the scheme = https (correct), host = 'q' and path = '/r/c/f' (incorrect). '/q' is the beginning of the path and there is no hostname. The URL-correct way to do this is either https:/q/... or (more correctly) https:///q/r/....

Contributor

deitch commented Jan 26, 2013

Actually, it looks like urlmaster is correct for the second, and jsdom is incorrect.

The ref is https:/q/r/./c/d/.././e/../f?when=now#xyz which says scheme = https, authority (i.e. host) = '', and the path = '/q/r/./c/d/.././e/../f' (I am ignoring the hash and query for now).

If we use what jsdom gives https://q/r/c/f?when=now#xyz it would imply that the scheme = https (correct), host = 'q' and path = '/r/c/f' (incorrect). '/q' is the beginning of the path and there is no hostname. The URL-correct way to do this is either https:/q/... or (more correctly) https:///q/r/....

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 28, 2013

Member

When I run the following test in all my web browsers, it gives two slashes, not three:

<html>
<head>
<base href="//www.why.com/a/b/./c/d/.././e/../f?foo=bar" />
</head>
<body>
<a href="https:/q/r/./c/d/.././e/../f?when=now#xyz" id="x"></a>
<script>
alert(document.getElementById("x").href);
</script>
</body>
</html>

It's possible the RFC and browser behavior conflict; for this I would look at the in-progress URL spec maybe?


Sounds good on throwing things away.


For file:// URLs on Windows: there are special rules for drive letters, at least in web browsers. They're in the URL spec I mentioned above, although I'm sad they're not in the RFC. Probably because nobody bothered to spell out what exactly the path was and what "/" means. But yes, in Windows, we need to consider drive letters :-/.

Member

domenic commented Jan 28, 2013

When I run the following test in all my web browsers, it gives two slashes, not three:

<html>
<head>
<base href="//www.why.com/a/b/./c/d/.././e/../f?foo=bar" />
</head>
<body>
<a href="https:/q/r/./c/d/.././e/../f?when=now#xyz" id="x"></a>
<script>
alert(document.getElementById("x").href);
</script>
</body>
</html>

It's possible the RFC and browser behavior conflict; for this I would look at the in-progress URL spec maybe?


Sounds good on throwing things away.


For file:// URLs on Windows: there are special rules for drive letters, at least in web browsers. They're in the URL spec I mentioned above, although I'm sad they're not in the RFC. Probably because nobody bothered to spell out what exactly the path was and what "/" means. But yes, in Windows, we need to consider drive letters :-/.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 29, 2013

Contributor

It's possible the RFC and browser behavior conflict; for this I would look at the in-progress URL spec maybe?

Possible. I couldn't figure out how to pull this out of the in-progress spec. But https://q/r means go to host q via protocol https on the default port to get path /r. It cannot mean anything else.


I see the drive letters now in the whatwg; they're not in the RFC including 3986, which is authoritative. The WHATWG seems to be saying:

If you are using an absolute path on a file URL relative to a base path, and the base path has a drive letter while the ref does not, then you need to grab the drive letter from the base and prepend it to the absolute path in the ref.

So the WHATWG recommends this, while the RFC doesn't do it? What do we do?

Contributor

deitch commented Jan 29, 2013

It's possible the RFC and browser behavior conflict; for this I would look at the in-progress URL spec maybe?

Possible. I couldn't figure out how to pull this out of the in-progress spec. But https://q/r means go to host q via protocol https on the default port to get path /r. It cannot mean anything else.


I see the drive letters now in the whatwg; they're not in the RFC including 3986, which is authoritative. The WHATWG seems to be saying:

If you are using an absolute path on a file URL relative to a base path, and the base path has a drive letter while the ref does not, then you need to grab the drive letter from the base and prepend it to the absolute path in the ref.

So the WHATWG recommends this, while the RFC doesn't do it? What do we do?

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 29, 2013

Member

Possible. I couldn't figure out how to pull this out of the in-progress spec. But https://q/r means go to host q via protocol https on the default port to get path /r. It cannot mean anything else.

Let's look at this in more detail:

  • filename: file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
  • base //www.why.com/a/b/./c/d/.././e/../f?foo=bar
  • ref https:/q/r/./c/d/.././e/../f?when=now#xyz

The question is, what does it even mean to resolve the relative URL https:/q/r/... from file:///www.why.com? I guess you switch to https, then use /q/r/... to build a new absolute URL? To me, on a https protocol, /q/r/... would result in hostname q, path r/.... Maybe you can see this in the same way the browser eliminates triple slashes when you type them in a URL bar (try it).

Pinging @annevk for more insight.

So the WHATWG recommends this, while the RFC doesn't do it? What do we do?

We're trying to emulate browsers, so WHATWG is the way to go :).

Member

domenic commented Jan 29, 2013

Possible. I couldn't figure out how to pull this out of the in-progress spec. But https://q/r means go to host q via protocol https on the default port to get path /r. It cannot mean anything else.

Let's look at this in more detail:

  • filename: file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
  • base //www.why.com/a/b/./c/d/.././e/../f?foo=bar
  • ref https:/q/r/./c/d/.././e/../f?when=now#xyz

The question is, what does it even mean to resolve the relative URL https:/q/r/... from file:///www.why.com? I guess you switch to https, then use /q/r/... to build a new absolute URL? To me, on a https protocol, /q/r/... would result in hostname q, path r/.... Maybe you can see this in the same way the browser eliminates triple slashes when you type them in a URL bar (try it).

Pinging @annevk for more insight.

So the WHATWG recommends this, while the RFC doesn't do it? What do we do?

We're trying to emulate browsers, so WHATWG is the way to go :).

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 29, 2013

Contributor

The question is, what does it even mean to resolve...

That's the real problem. https:/q/r makes no sense on its own. Logic would dictate, "take the https scheme, and then go back a level and grab the authority (host) from the base URL." But the RFC explicitly does not say that.

To me, on a https protocol, /q/r/... would result in hostname q, path r/..
Pretty sure that isn't correct, because a hostname always starts with //, whereas a single / means the beginning of an absolute path.

the same way the browser eliminates triple slashes when you type them in a URL bar
It might depend on the browser, but it doesn't matter: file:///a/b and file:/a/b are identical. If there are two slashes right after the : that ends the scheme, then the next characters until a / are the authority. If there is no authority, then /// and / are the same.


So the WHATWG recommends this, while the RFC doesn't do it? What do we do?
We're trying to emulate browsers, so WHATWG is the way to go :).

I can go for that. We'll be a little more intelligent about the Windows drive letters. The logic in the WHATWG is pretty sound.

Contributor

deitch commented Jan 29, 2013

The question is, what does it even mean to resolve...

That's the real problem. https:/q/r makes no sense on its own. Logic would dictate, "take the https scheme, and then go back a level and grab the authority (host) from the base URL." But the RFC explicitly does not say that.

To me, on a https protocol, /q/r/... would result in hostname q, path r/..
Pretty sure that isn't correct, because a hostname always starts with //, whereas a single / means the beginning of an absolute path.

the same way the browser eliminates triple slashes when you type them in a URL bar
It might depend on the browser, but it doesn't matter: file:///a/b and file:/a/b are identical. If there are two slashes right after the : that ends the scheme, then the next characters until a / are the authority. If there is no authority, then /// and / are the same.


So the WHATWG recommends this, while the RFC doesn't do it? What do we do?
We're trying to emulate browsers, so WHATWG is the way to go :).

I can go for that. We'll be a little more intelligent about the Windows drive letters. The logic in the WHATWG is pretty sound.

@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Jan 29, 2013

In browsers https:/q/r becomes https://q/r, except when the base URL is https itself. Say the base URL is https://x/ then https:/q/r becomes https://x/q/r. The URL Standard defines this in detail and annevk/url implements it to some extent, but needs updating as it does not yet cover the file URL scheme like the URL Standard does.

annevk commented Jan 29, 2013

In browsers https:/q/r becomes https://q/r, except when the base URL is https itself. Say the base URL is https://x/ then https:/q/r becomes https://x/q/r. The URL Standard defines this in detail and annevk/url implements it to some extent, but needs updating as it does not yet cover the file URL scheme like the URL Standard does.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 29, 2013

Contributor

Hey @annevk, welcome to the discussion!

What you are saying is: if (ref has scheme but not authority && base-scheme == ref-scheme) then result should use ref-scheme + base-authority + ref-path

I have no problem with that logic, but I don't see if in the URL Standard. http://tools.ietf.org/html/rfc3986#section-5.2.2 says

if defined(R.scheme) then
         T.scheme    = R.scheme;
         T.authority = R.authority;
         T.path      = remove_dot_segments(R.path);
         T.query     = R.query;
Contributor

deitch commented Jan 29, 2013

Hey @annevk, welcome to the discussion!

What you are saying is: if (ref has scheme but not authority && base-scheme == ref-scheme) then result should use ref-scheme + base-authority + ref-path

I have no problem with that logic, but I don't see if in the URL Standard. http://tools.ietf.org/html/rfc3986#section-5.2.2 says

if defined(R.scheme) then
         T.scheme    = R.scheme;
         T.authority = R.authority;
         T.path      = remove_dot_segments(R.path);
         T.query     = R.query;
@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Jan 30, 2013

http://url.spec.whatwg.org/#scheme-state step 2, substep 4 and 5. (I was not talking about RFC 3986, as @domenic said, it does not match browsers.) And just to link it, my implementation of that is https://github.com/annevk/url but it is out of date at the moment, especially for file URLs, but there are also some other issues.

annevk commented Jan 30, 2013

http://url.spec.whatwg.org/#scheme-state step 2, substep 4 and 5. (I was not talking about RFC 3986, as @domenic said, it does not match browsers.) And just to link it, my implementation of that is https://github.com/annevk/url but it is out of date at the moment, especially for file URLs, but there are also some other issues.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 30, 2013

Contributor

Ugh, I hate reading state machine logic. The RFC's if-then is so much cleaner.

Either way, I'll read it through and get back here.

The bigger question is, should we be RFC compliant or WHATWG compliant?

Contributor

deitch commented Jan 30, 2013

Ugh, I hate reading state machine logic. The RFC's if-then is so much cleaner.

Either way, I'll read it through and get back here.

The bigger question is, should we be RFC compliant or WHATWG compliant?

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Jan 30, 2013

Member

We should be browser-compliant, which is WHATWG, not RFC.

Agreed that the state machine logic is undecipherable. Good luck, brave sir!?

Member

domenic commented Jan 30, 2013

We should be browser-compliant, which is WHATWG, not RFC.

Agreed that the state machine logic is undecipherable. Good luck, brave sir!?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 30, 2013

Contributor

Or, you could spend a few days trying to decipher the federal tax code. After that, the state machine may look like child's play!

Contributor

deitch commented Jan 30, 2013

Or, you could spend a few days trying to decipher the federal tax code. After that, the state machine may look like child's play!

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 31, 2013

Contributor

Added windows drive support, per the WHATWG and your comments above.

Contributor

deitch commented Jan 31, 2013

Added windows drive support, per the WHATWG and your comments above.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Jan 31, 2013

Contributor

@annevk, I don't see it in scheme-state step 2, substeps 4-5. Can you explain what you see?

Contributor

deitch commented Jan 31, 2013

@annevk, I don't see it in scheme-state step 2, substeps 4-5. Can you explain what you see?

@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Feb 2, 2013

What happens is that if input's scheme is equal to base's scheme and that scheme is a relative scheme (one of http/https/...) you continue parsing input as either a relative or authority (to be determined later). If they're not equal, but it's still a relative scheme you continue parsing input expecting authority, and failing the relative scheme criteria you continue parsing input as scheme data.

I'm not sure however if you can trivially patch a parser written towards the RFC to match what browsers do. I have not attempted to do such a thing myself.

annevk commented Feb 2, 2013

What happens is that if input's scheme is equal to base's scheme and that scheme is a relative scheme (one of http/https/...) you continue parsing input as either a relative or authority (to be determined later). If they're not equal, but it's still a relative scheme you continue parsing input expecting authority, and failing the relative scheme criteria you continue parsing input as scheme data.

I'm not sure however if you can trivially patch a parser written towards the RFC to match what browsers do. I have not attempted to do such a thing myself.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 5, 2013

Contributor

Seriously, could write a Big Bang Theory episode just around the logic in the WHATWG!

Contributor

deitch commented Feb 5, 2013

Seriously, could write a Big Bang Theory episode just around the logic in the WHATWG!

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 5, 2013

Contributor

In all seriousness, here is how I read it in scheme-state:

2.5: once I have my scheme (and it isn't "file"), go to 2 expected slashes, then from here until EOF or the next ending of authority character (slash, etc.) is the authority (including host, port, username, password, etc.)

2.4: once I have my scheme, it isn't "file", and I have a real base, and my scheme === base-scheme:

  • if I start with '//' then do like 2.5
  • if I do not start with '//' then I am in relative state, which means I use the scheme, authority, and even path of base, modified by my own path

Which appears to be what you said.

So if I have no authority in my ref, and ref-scheme === base-scheme, just use base-scheme + base-authority + ref-path.

You know, that WHATWG is a hell of a convoluted way to say something that simple! Modifying now...

Contributor

deitch commented Feb 5, 2013

In all seriousness, here is how I read it in scheme-state:

2.5: once I have my scheme (and it isn't "file"), go to 2 expected slashes, then from here until EOF or the next ending of authority character (slash, etc.) is the authority (including host, port, username, password, etc.)

2.4: once I have my scheme, it isn't "file", and I have a real base, and my scheme === base-scheme:

  • if I start with '//' then do like 2.5
  • if I do not start with '//' then I am in relative state, which means I use the scheme, authority, and even path of base, modified by my own path

Which appears to be what you said.

So if I have no authority in my ref, and ref-scheme === base-scheme, just use base-scheme + base-authority + ref-path.

You know, that WHATWG is a hell of a convoluted way to say something that simple! Modifying now...

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 5, 2013

Contributor

Added. Ball back in your court.

Contributor

deitch commented Feb 5, 2013

Added. Ball back in your court.

@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Feb 5, 2013

Well in your "so simple" you didn't explain error reporting and all the other number of possible conditions, so yeah, that is a simpler answer to your specific question :-)

annevk commented Feb 5, 2013

Well in your "so simple" you didn't explain error reporting and all the other number of possible conditions, so yeah, that is a simpler answer to your specific question :-)

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Feb 13, 2013

Member

I am still seeing urlmaster wanting three slashes in some https cases:

ERROR https:///q/r#xyz vs https://q/r#xyz

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
base http://www.why.com/a/b/./c/d/.././e/../f
with ref https:/q/r#xyz

should resolve to https:///q/r#xyz instead of https://q/r#xyz

I'll try to be more responsive about testing after you fix it. We're closing in on this!!

Member

domenic commented Feb 13, 2013

I am still seeing urlmaster wanting three slashes in some https cases:

ERROR https:///q/r#xyz vs https://q/r#xyz

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
base http://www.why.com/a/b/./c/d/.././e/../f
with ref https:/q/r#xyz

should resolve to https:///q/r#xyz instead of https://q/r#xyz

I'll try to be more responsive about testing after you fix it. We're closing in on this!!

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 13, 2013

Contributor

Yeah, that is probably the last piece. Working on it right now. Could use some input here, though.

According to the RFC: http:/q/r resolves to scheme = 'http', host = '', path = '/q/r'.
According to the WHATWG (and most browsers): http:/q/r resolves to scheme = 'http', host = 'q', path = '/r'

We agreed to follow WHATWG, which is fine, doing it now.

But FYI, nodejs require('url').parse('http:/q/r') resolves per the RFC, not the WHATWG. Does it matter?

Contributor

deitch commented Feb 13, 2013

Yeah, that is probably the last piece. Working on it right now. Could use some input here, though.

According to the RFC: http:/q/r resolves to scheme = 'http', host = '', path = '/q/r'.
According to the WHATWG (and most browsers): http:/q/r resolves to scheme = 'http', host = 'q', path = '/r'

We agreed to follow WHATWG, which is fine, doing it now.

But FYI, nodejs require('url').parse('http:/q/r') resolves per the RFC, not the WHATWG. Does it matter?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

Actually, those don't surprise me at all. According to the specs, the steps to resolve a locn,base,ref are essentially:

  1. resolve base relative to locn to get my real base (call it B2)
  2. resolve ref relative to B2 to get my final result

If you resolve http:?foo=bar#abc relative to file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js you get http:///?foo=bar#abc which is an illegitimate complete URL (it is OK as a ref, or even a base, if the locn has the same scheme http and has an authority, as @annevk pointed out with the WHATWG).

So it shows an illegit base and kicks it back.

The question is, how should we resolve it? In the real world, this can happen with a legit (as a base) URL base and a legit locn (always must be legit by definition) with a ref. So how would you resolve it?

Contributor

deitch commented Feb 19, 2013

Actually, those don't surprise me at all. According to the specs, the steps to resolve a locn,base,ref are essentially:

  1. resolve base relative to locn to get my real base (call it B2)
  2. resolve ref relative to B2 to get my final result

If you resolve http:?foo=bar#abc relative to file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js you get http:///?foo=bar#abc which is an illegitimate complete URL (it is OK as a ref, or even a base, if the locn has the same scheme http and has an authority, as @annevk pointed out with the WHATWG).

So it shows an illegit base and kicks it back.

The question is, how should we resolve it? In the real world, this can happen with a legit (as a base) URL base and a legit locn (always must be legit by definition) with a ref. So how would you resolve it?

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Feb 19, 2013

Member

Well, I feel like if there's an absolute ref, it should always use that, no matter what the base---right? So the second case seems obvious.

As for the first, not sure---time to fire up Chrome! It says... empty string! So null doesn't seem so bad, I guess.

Member

domenic commented Feb 19, 2013

Well, I feel like if there's an absolute ref, it should always use that, no matter what the base---right? So the second case seems obvious.

As for the first, not sure---time to fire up Chrome! It says... empty string! So null doesn't seem so bad, I guess.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

FWIW, I just checked it out in Safari, Chrome and FF.

Safari & FF: Just accept it as is and resolve the ref compared to the new base, even though the base is illegit.
Chrome: with the full ref URL, accept as is; with the relative, ignores it entirely and just goes to about:blank

Contributor

deitch commented Feb 19, 2013

FWIW, I just checked it out in Safari, Chrome and FF.

Safari & FF: Just accept it as is and resolve the ref compared to the new base, even though the base is illegit.
Chrome: with the full ref URL, accept as is; with the relative, ignores it entirely and just goes to about:blank

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

Ha, crossed wires! :-)

Contributor

deitch commented Feb 19, 2013

Ha, crossed wires! :-)

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

Seriously, when B2 is illegit, you can get some really weird stuff happening.

Contributor

deitch commented Feb 19, 2013

Seriously, when B2 is illegit, you can get some really weird stuff happening.

@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Feb 19, 2013

I don't really understand what the question is.

annevk commented Feb 19, 2013

I don't really understand what the question is.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

@annevk 2 questions:

Slash after Windows drive

When you have

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
base /
with ref ?when=now#xyz
should resolve to file:///C:?when=now#xyz

Should it resolve to file:///C:/?when=now or file:///C:?when=now

In other words, should there always be a slash after the Windows drive letter?

Illegit Base

If you have

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
base http:?foo=bar#abc
with ref /

Such that resolving base relative to locn gives a non-complete URL, and then you resolve ref against that, does it give null (like Chrome) or resolve it anyways, even though the resulting URL is non-usable (like FF & Safari)?

Contributor

deitch commented Feb 19, 2013

@annevk 2 questions:

Slash after Windows drive

When you have

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
base /
with ref ?when=now#xyz
should resolve to file:///C:?when=now#xyz

Should it resolve to file:///C:/?when=now or file:///C:?when=now

In other words, should there always be a slash after the Windows drive letter?

Illegit Base

If you have

locn file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js
base http:?foo=bar#abc
with ref /

Such that resolving base relative to locn gives a non-complete URL, and then you resolve ref against that, does it give null (like Chrome) or resolve it anyways, even though the resulting URL is non-usable (like FF & Safari)?

@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Feb 19, 2013

In "Illegit Base" base would become http:///?foo=bar#abc and ref would then be http:///.

I think for "Slash after Windows drive" the WHATWG standard currently calls for removal of the slash as the drive occurs after three slashes rather than two, but that might be in error.

annevk commented Feb 19, 2013

In "Illegit Base" base would become http:///?foo=bar#abc and ref would then be http:///.

I think for "Slash after Windows drive" the WHATWG standard currently calls for removal of the slash as the drive occurs after three slashes rather than two, but that might be in error.

@annevk

This comment has been minimized.

Show comment
Hide comment
@annevk

annevk Feb 19, 2013

(BTW, following the RFC except where WHATWG deviates seems like a weird strategy as the latter replaces the former.)

annevk commented Feb 19, 2013

(BTW, following the RFC except where WHATWG deviates seems like a weird strategy as the latter replaces the former.)

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

@annevk you are totally wrong. It doesn't seem like a weird strategy; it is a weird strategy!

Except there are two mitigating factors:

  1. The RFC is far easier to understand, while the WHATWG is not only dense, but very densely written.
  2. I had already implemented the RFC, so modifying to suit WHATWG where necessary (an incremental strategy) led to a saner and survivable approach; redoing from scratch would have taken forever (i.e. never done).

End result is it works, I can live with it.

Contributor

deitch commented Feb 19, 2013

@annevk you are totally wrong. It doesn't seem like a weird strategy; it is a weird strategy!

Except there are two mitigating factors:

  1. The RFC is far easier to understand, while the WHATWG is not only dense, but very densely written.
  2. I had already implemented the RFC, so modifying to suit WHATWG where necessary (an incremental strategy) led to a saner and survivable approach; redoing from scratch would have taken forever (i.e. never done).

End result is it works, I can live with it.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

@annevk yes, correct, on the "Illegit Base." Right now it just says, "that base is illegit, so why bother resolving, null response." That is roughly what Chrome does in some circumstances. @domenic is arguing, and I am starting to lean that way, that we should still resolve it completely (as much as possible), and if the resolved URL is not one we can go to, deal with it later.

Yeah, I just convinced myself. :-)

For "Windows Drive", I can go either way. "file:///C:" and "file:///C:/" mean the same thing. What should we do?

Contributor

deitch commented Feb 19, 2013

@annevk yes, correct, on the "Illegit Base." Right now it just says, "that base is illegit, so why bother resolving, null response." That is roughly what Chrome does in some circumstances. @domenic is arguing, and I am starting to lean that way, that we should still resolve it completely (as much as possible), and if the resolved URL is not one we can go to, deal with it later.

Yeah, I just convinced myself. :-)

For "Windows Drive", I can go either way. "file:///C:" and "file:///C:/" mean the same thing. What should we do?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

Pushed out to handle invalid bases. Try now.

Contributor

deitch commented Feb 19, 2013

Pushed out to handle invalid bases. Try now.

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Feb 19, 2013

Member

Let's do file:///C:/ since it matches browsers.

Also... I think we're finally done!!! :O

Time to work on fixing resourceLoader.baseUrl and resourceLoader.resolve, I think!

Member

domenic commented Feb 19, 2013

Let's do file:///C:/ since it matches browsers.

Also... I think we're finally done!!! :O

Time to work on fixing resourceLoader.baseUrl and resourceLoader.resolve, I think!

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

I can go for that.

Contributor

deitch commented Feb 19, 2013

I can go for that.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Feb 19, 2013

Contributor

Trailing slash fixed (as well as tests in urlmaster for it, of course).

Contributor

deitch commented Feb 19, 2013

Trailing slash fixed (as well as tests in urlmaster for it, of course).

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic Mar 2, 2013

Member

So, tests look good. You want to take a shot at the implementation that makes them pass, or shall I?

Member

domenic commented Mar 2, 2013

So, tests look good. You want to take a shot at the implementation that makes them pass, or shall I?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch Mar 9, 2013

Contributor

My travel schedule is nothing short of insane, done 18,000 miles in 4.5 days this past week, and it keeps going. I would love to, but don't rely on me until sometime in April at earliest. I am sorry.

Contributor

deitch commented Mar 9, 2013

My travel schedule is nothing short of insane, done 18,000 miles in 4.5 days this past week, and it keeps going. I would love to, but don't rely on me until sometime in April at earliest. I am sorry.

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic May 7, 2013

Member

@deitch poke :).

Member

domenic commented May 7, 2013

@deitch poke :).

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 7, 2013

Contributor

I guess neither of us has gotten to it yet, eh?

I'll see if I can get to it.

Contributor

deitch commented May 7, 2013

I guess neither of us has gotten to it yet, eh?

I'll see if I can get to it.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 9, 2013

Contributor

Since it has been a while, can you review before I attack this?

As I understand it, we now all agree that the tests are valid and should pass, we just need to change the source jsdom code to allow the tests to pass, correct? Please confirm.

Since it has been a while, and there have been several commits, should I do a new fork and pull, and re-add the same tests, so the branch is pretty close to current master? Does it matter?

Contributor

deitch commented May 9, 2013

Since it has been a while, can you review before I attack this?

As I understand it, we now all agree that the tests are valid and should pass, we just need to change the source jsdom code to allow the tests to pass, correct? Please confirm.

Since it has been a while, and there have been several commits, should I do a new fork and pull, and re-add the same tests, so the branch is pretty close to current master? Does it matter?

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic May 9, 2013

Member

Confirmed!

You can either do that, or rebase (maybe interactive rebase to do some squashing) and force-push to this branch. Either way works for me.

Awesome, thanks for taking this on!!

On May 9, 2013, at 3:40, "Avi Deitcher" <notifications@github.commailto:notifications@github.com> wrote:

Since it has been a while, can you review before I attack this?

As I understand it, we now all agree that the tests are valid and should pass, we just need to change the source jsdom code to allow the tests to pass, correct? Please confirm.

Since it has been a while, and there have been several commits, should I do a new fork and pull, and re-add the same tests, so the branch is pretty close to current master? Does it matter?


Reply to this email directly or view it on GitHubhttps://github.com/tmpvar/jsdom/pull/550#issuecomment-17653148.

Member

domenic commented May 9, 2013

Confirmed!

You can either do that, or rebase (maybe interactive rebase to do some squashing) and force-push to this branch. Either way works for me.

Awesome, thanks for taking this on!!

On May 9, 2013, at 3:40, "Avi Deitcher" <notifications@github.commailto:notifications@github.com> wrote:

Since it has been a while, can you review before I attack this?

As I understand it, we now all agree that the tests are valid and should pass, we just need to change the source jsdom code to allow the tests to pass, correct? Please confirm.

Since it has been a while, and there have been several commits, should I do a new fork and pull, and re-add the same tests, so the branch is pretty close to current master? Does it matter?


Reply to this email directly or view it on GitHubhttps://github.com/tmpvar/jsdom/pull/550#issuecomment-17653148.

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 9, 2013

Contributor

OK, so I know that it was just as much of a pain to merge in CVS and any other old system when you branched a while back, and then had to merge in, but not matter how many times I do it, I can never get it right and always get confused.

Seems to me the easiest process here is:

  1. git checkout upstream
  2. git fetch upstream (which points to tmpvar/jsdom)
  3. rerun tests to have a clean baseline
  4. git checkout url-testing
  5. git merge upstream
  6. rerun test to make sure it matches the baseline plus just the expected failures
  7. make the repairs

Does that make sense?

Contributor

deitch commented May 9, 2013

OK, so I know that it was just as much of a pain to merge in CVS and any other old system when you branched a while back, and then had to merge in, but not matter how many times I do it, I can never get it right and always get confused.

Seems to me the easiest process here is:

  1. git checkout upstream
  2. git fetch upstream (which points to tmpvar/jsdom)
  3. rerun tests to have a clean baseline
  4. git checkout url-testing
  5. git merge upstream
  6. rerun test to make sure it matches the baseline plus just the expected failures
  7. make the repairs

Does that make sense?

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 9, 2013

Contributor

Yes, hats off, 0.6.1 works like a charm, all tests pass.

Contributor

deitch commented May 9, 2013

Yes, hats off, 0.6.1 works like a charm, all tests pass.

deitch added some commits May 9, 2013

URL parsing properly handles all urlmaster test cases.
1) add changes recommended by @domenic at #550 (comment) to properly handle all URL resolution
2) Make sure to URL.resolve() even absolute href URLs, so that /./ and /../ get resolved
url_testing should standardize resultant paths for absolute URL testi…
…ng via um.addPathEmpty()

Different platforms resolve absolute root paths, e.g. http://www.example.com - some add trailing /, some do not. Either way, we add to empty path in test results so we can test that it is correct.
@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 9, 2013

Contributor

It was mostly your fixes above, with some tweaks, plus some test changes to get it to work. But it now includes all potential URLs... and all tests pass 100%.

Pull! Pull! Pull! :-)

Contributor

deitch commented May 9, 2013

It was mostly your fixes above, with some tweaks, plus some test changes to get it to work. But it now includes all potential URLs... and all tests pass 100%.

Pull! Pull! Pull! :-)

@tmpvar

This comment has been minimized.

Show comment
Hide comment
@tmpvar

tmpvar May 9, 2013

Collaborator

+1

Collaborator

tmpvar commented May 9, 2013

+1

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic May 10, 2013

Member

Still getting lots of test failures having to do with Windows drive letters :(. E.g.

ERROR file:///C:/ vs file:///

locn 'file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js' base 'a/b' with ref '/' should resolve to 'file:///C:/' (file:///C:/) instead of 'file:///' (file:///)

I pushed a cleaner version of this branch up at https://github.com/tmpvar/jsdom/compare/url-resolution. In particular I squashed all the test commits down to one, and converted tabs to spaces.

Let me know if you think you can crack the Windows drive letters; otherwise I'll try to give it a shot this weekend (since I am, after all, the one with the Windows machine).

Member

domenic commented May 10, 2013

Still getting lots of test failures having to do with Windows drive letters :(. E.g.

ERROR file:///C:/ vs file:///

locn 'file:///C:/Users/Domenic/Dropbox/Programming/GitHub/jsdom/test/jsdom/index.js' base 'a/b' with ref '/' should resolve to 'file:///C:/' (file:///C:/) instead of 'file:///' (file:///)

I pushed a cleaner version of this branch up at https://github.com/tmpvar/jsdom/compare/url-resolution. In particular I squashed all the test commits down to one, and converted tabs to spaces.

Let me know if you think you can crack the Windows drive letters; otherwise I'll try to give it a shot this weekend (since I am, after all, the one with the Windows machine).

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 10, 2013

Contributor

I only have one Windows box available, my son's, on which I have never installed node.

But I can crack this without Windows:

  1. Please run the tests and post the entire output of errors (all those locn lines) here or elsewhere I can download it (.txt file is best).
  2. I can fix on that basis (I already extracted the jsdom resolution code into a separate module, so I can run them all).
  3. I will fix, repush, and you can run the tests again on your Windows box to validate.

Plan?

Contributor

deitch commented May 10, 2013

I only have one Windows box available, my son's, on which I have never installed node.

But I can crack this without Windows:

  1. Please run the tests and post the entire output of errors (all those locn lines) here or elsewhere I can download it (.txt file is best).
  2. I can fix on that basis (I already extracted the jsdom resolution code into a separate module, so I can run them all).
  3. I will fix, repush, and you can run the tests again on your Windows box to validate.

Plan?

@domenic

This comment has been minimized.

Show comment
Hide comment
@domenic

domenic May 11, 2013

Member

I got it :).

Merged as 1be2448 and f14c41b for your changes, plus 278543b to get it working on Windows.

THANK YOU SO MUCH <3 for your extensive work on this saga, starting from that tiny little bug #531 six months ago. You deserve a medal, sir!!

I've filed joyent/node#5452 and joyent/node#5453 in Node to complain about why we can't just use URL.resolve directly, BTW.

Member

domenic commented May 11, 2013

I got it :).

Merged as 1be2448 and f14c41b for your changes, plus 278543b to get it working on Windows.

THANK YOU SO MUCH <3 for your extensive work on this saga, starting from that tiny little bug #531 six months ago. You deserve a medal, sir!!

I've filed joyent/node#5452 and joyent/node#5453 in Node to complain about why we can't just use URL.resolve directly, BTW.

@domenic domenic closed this May 11, 2013

@deitch

This comment has been minimized.

Show comment
Hide comment
@deitch

deitch May 12, 2013

Contributor

Awesome! Is it really six months?

It is good to work with @domenic and @tmpvar.

Medal? Maybe we can come up with the "jsdom award of excellent!" LOL!

Contributor

deitch commented May 12, 2013

Awesome! Is it really six months?

It is good to work with @domenic and @tmpvar.

Medal? Maybe we can come up with the "jsdom award of excellent!" LOL!

@deitch deitch deleted the deitch:url-testing branch May 12, 2013

@domenic domenic referenced this pull request in nodejs/node-v0.x-archive May 12, 2013

Open

url: resolve strips drive letters from Windows file URLs #5452

papandreou pushed a commit to papandreou/jsdom that referenced this pull request Apr 2, 2014

URL parsing properly handles all urlmaster test cases.
1) add changes recommended by @domenic at jsdom#550 (comment) to properly handle all URL resolution
2) Make sure to URL.resolve() even absolute href URLs, so that /./ and /../ get resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment