Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undocumented behavior for handling options of http.request #47624

Closed
ZEDCWT opened this issue Apr 19, 2023 · 10 comments · Fixed by #47886
Closed

Undocumented behavior for handling options of http.request #47624

ZEDCWT opened this issue Apr 19, 2023 · 10 comments · Fixed by #47886
Assignees
Labels
http Issues or PRs related to the http subsystem. url Issues and PRs related to the legacy built-in url module.

Comments

@ZEDCWT
Copy link

ZEDCWT commented Apr 19, 2023

Version

v20.0.0

Platform

Microsoft Windows NT 10.0.19045.0 x64

Subsystem

No response

What steps will reproduce the bug?

var Opt = require('url').parse('https://httpbin.org/get');
require('https')
	.request({...Opt,path : Opt.path + '?A=B'},S => S.on('data',D => console.log(D.toString('UTF8'))))
	.end()

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

Running using v19.9.0, it prints

{
  "args": {
    "A": "B"
  },
  "headers": {
    "Host": "httpbin.org",
    ....
  },
  ....
  "url": "https://httpbin.org/get?A=B"
}

What do you see instead?

{
  "args": {},
  "headers": {
    "Host": "httpbin.org",
    ...
  },
  ...
  "url": "https://httpbin.org/get"
}

Additional information

Before #47339, url.isURL checks if property href & origin exists, and it changed to check if property href & protocol exist now.
So before that, the Opt above goes to the else branch of ClientRequest, but now it goes to the else if (isURL(input)) branch, in which it ignores the path property given and just glues pathname & search together.
Reading the document, it says url can be a string or a URL object also never mentions anything about search or pathname.
since I'm not providing a WHATWG URL object, I'm expecting to call this signature http.request(options[, callback]) retaining my path property as what v19 and before do.

@VoltrexKeyva VoltrexKeyva added the http Issues or PRs related to the http subsystem. label Apr 20, 2023
@Bernice55231
Copy link

Bernice55231 commented Apr 24, 2023

I think this problem can be solved by adding validations in urlToHttpOptions(url) function, which is called from the else if (isURL(input)) branch as @ZEDCWT mentioned. Since we deliberately added the query string in path, (like Opt.path + "?A=B"), but we didn't modify the search and query fields in this object. Therefore, although this object is an URL object, the path, search and query fields are inconsistent, so the urlToHttpOptions(url) function cannot recognize the added search parameters.

If it is ok, can I take this issue, thanks!

@travispaul
Copy link

I think we've also encountered this issue.

It seems that in v20, if the URL’s path property is set after parsing the URL, then used for an HTTP request, the path is absent from the request.

I've created this reproducer:

const https = require('https');
const url = require('url');

// Works in v18 fails in v20:
const opts = url.parse('https://postman-echo.com');
opts.path = '/get';

// Works in both versions:
// const opts = url.parse('https://postman-echo.com/get');

// Curiously, deleting either of these properties causes it to work in both versions:
// delete opts.href;
// delete opts.protocol;

console.log('node version:', process.version);
console.log('parsed url:', opts);

https.request(opts, response => {
    console.log('status:', response.statusCode);
    response.on('data', data => process.stdout.write(data));
}).end();
v20.1.0 output:
node version: v20.1.0
parsed url: Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'postman-echo.com',
  port: null,
  hostname: 'postman-echo.com',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/get',
  href: 'https://postman-echo.com/'
}
status: 302
Found. Redirecting to https://docs.postman-echo.com
v18.16.0 output (expected)
node version: v18.16.0
parsed url: Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'postman-echo.com',
  port: null,
  hostname: 'postman-echo.com',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/get',
  href: 'https://postman-echo.com/'
}
status: 200
{
  "args": {},
  "headers": {
    "x-forwarded-proto": "https",
    "x-forwarded-port": "443",
    "host": "postman-echo.com",
    "x-amzn-trace-id": "Root=1-64532010-42007c685001b2c957e08cbc"
  },
  "url": "https://postman-echo.com/get"
}

@benjamingr
Copy link
Member

benjamingr commented May 4, 2023

@nodejs/http probably also @nodejs/url

@mcollina
Copy link
Member

mcollina commented May 4, 2023

cc @Trott

@Trott
Copy link
Member

Trott commented May 4, 2023

FWIW, setting pathname instead of path seems to work as expected (as implied above, but just to make that clear for anyone looking for a quick fix).

@Trott
Copy link
Member

Trott commented May 4, 2023

Is e3bb668 the likely commit that introduced this change? @anonrig

@anonrig
Copy link
Member

anonrig commented May 4, 2023

Is e3bb668 the likely commit that introduced this change? @anonrig

I don't think it is likely that the isURL led to this. Specifically, we selected href and protocol because those two attributes are the easiest & fastest to calculate to use as an identifiable. This was done because of the lazy-loading feature of URL parser.

FWIW, setting pathname instead of path seems to work as expected

Unfortunately, the old url class uses path attribute whereas the WHATWG API uses bot path and pathname for defining the pathname of the URL.

I think the breaking change is related to how urlToHttpOptions is changed. Previously we were not passing ...url referencing at: ad2c3c0#diff-26e6b1e2a90040b1ce20e7f305b2ff3140b483f863ee2573b4c6af9b696ad31eR1111 @aduh95

@anonrig anonrig added the url Issues and PRs related to the legacy built-in url module. label May 4, 2023
@Trott
Copy link
Member

Trott commented May 5, 2023

Still bisecting, but we're getting near the very end of the bisect and it's looking like the commit that changed this behavior is almost certainly one of the three commits in #47339. So, this is likely to come back your way, @anonrig!

@anonrig
Copy link
Member

anonrig commented May 5, 2023

Thanks for the bisect @Trott. I'll take a look at it.

@anonrig anonrig self-assigned this May 5, 2023
@Trott
Copy link
Member

Trott commented May 5, 2023

Bisecting confirms the problem was introduced in either 26a967f or 3867641. I can't be sure which one because 26a967f won't successfully compile without the changes in 3867641 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem. url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants