Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional unhandled 'error' event when http.request with .lookup #48771

Open
loynoir opened this issue Jul 14, 2023 · 16 comments · May be fixed by #51038
Open

Conditional unhandled 'error' event when http.request with .lookup #48771

loynoir opened this issue Jul 14, 2023 · 16 comments · May be fixed by #51038
Labels
confirmed-bug Issues with confirmed bugs. good first issue Issues that are suitable for first-time contributors. http Issues or PRs related to the http subsystem. net Issues and PRs related to the net subsystem.

Comments

@loynoir
Copy link

loynoir commented Jul 14, 2023

Version

20.4.0

Platform

Docker ArchLinux 6.1.35-1-lts

Subsystem

No response

What steps will reproduce the bug?

https://github.com/loynoir/reproduce-node-48771

How often does it reproduce? Is there a required condition?

  • ip blackhole like behavior

What is the expected behavior? Why is that the expected behavior?

Make it possible to catch error, and exit with 0.

What do you see instead?

Error is not caught, and exit with 1.

Additional information

No response

@bnoordhuis
Copy link
Member

I'm not able to reproduce. Let's try to reduce the test case. What happens when you run this?

const lookup = (_0, _1, cb) => cb(null, [{address:"192.168.144.2", family:4}])
const s = require("net").connect({host:"example.com", port:80, lookup})
s.on("error", console.log) // reached?

@loynoir
Copy link
Author

loynoir commented Jul 15, 2023

// @ts-nocheck
const lookup = (_0, _1, cb) => cb(null, [{ address: "192.168.144.2", family: 4 }])
const s = require("net").connect({ host: "example.com", port: 80, lookup })
s.on("error", (err) => {
  console.log({ err })
})
$ sudo ip route del blackhole 192.168.144.2
$ curl -s 192.168.144.2 >/dev/null && echo OK
OK
$ node ./src/extra/reproduce.cjs && echo OK
^C
$ sudo ip route add blackhole 192.168.144.2
$ node ./src/extra/reproduce.cjs && echo OK
{
  err: Error: connect EINVAL 192.168.144.2:80 - Local (0.0.0.0:0)
      at internalConnect (node:net:1087:16)
      at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
      at emitLookup (node:net:1478:9)
      at lookup (/workspaces/loynoir/repo/reproduce-node-48771/src/extra/reproduce.cjs:2:32)
      at emitLookup (node:net:1402:5)
      at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
      at lookupAndConnectMultiple (node:net:1401:3)
      at node:net:1347:7
      at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
      at lookupAndConnect (node:net:1346:5) {
    errno: -22,
    code: 'EINVAL',
    syscall: 'connect',
    address: '192.168.144.2',
    port: 80
  }
}
OK

@bnoordhuis
Copy link
Member

Okay, so that works for you as well. Please try expanding the test until you have a minimal reproducer.

@loynoir
Copy link
Author

loynoir commented Jul 15, 2023

@bnoordhuis

https://github.com/loynoir/reproduce-node-48771

I think this is the minimal reproduce.

I guess, some node code is not wrap with process.nextTick.

So, if user not use

nextTick(() => {
  ...
  callback(...

In some situation, there is unhandled 'error' event within http.request, and leads to

@ShogunPanda
Copy link
Contributor

FYI I tested if my network family autoselection was involved into this. Apparently is not: if you change lines 19 and 23 in `` in your repro repo to this:

if (_options.all) {
  callback(null, [{ address: opt.ip, family: 4 }]);
} else {
  callback(null, opt.ip, 4);
}

to accomodate for both single and multiple DNS lookup, then the problem will happen even if you use --no-network-family-autoselection.

In macOS in order to reproduce the problem you can route that IP to something unreachable. For instance:

sudo route add 192.168.144.2 10.3.0.1

(10.3.0.1 is unreachable from my system, you might have to use another IP).

With the unreachable IP setup, I tried the narrow down example in #48771 (comment) and it worked, so it seems like the problem is not on the net module but rather in http which is not setting the error listener fast enough.

@mcollina Any thoughts on this?

@mcollina
Copy link
Member

It looks like a bug, unfortunately, I don't have time to dig deep on how to fix it. I've never seen this problem happen in practice, but I guess it can happen if the kernel is really fast in responding.

A quick code review spotted the problem: when a socket is assigned to a ClientRequest, we defer to the next tick setting an error handler:

node/lib/_http_client.js

Lines 860 to 900 in a2fc4a3

ClientRequest.prototype.onSocket = function onSocket(socket, err) {
// TODO(ronag): Between here and onSocketNT the socket
// has no 'error' handler.
process.nextTick(onSocketNT, this, socket, err);
};
function onSocketNT(req, socket, err) {
if (req.destroyed || err) {
req.destroyed = true;
function _destroy(req, err) {
if (!req.aborted && !err) {
err = connResetException('socket hang up');
}
if (err) {
req.emit('error', err);
}
req._closed = true;
req.emit('close');
}
if (socket) {
if (!err && req.agent && !socket.destroyed) {
socket.emit('free');
} else {
finished(socket.destroy(err || req[kError]), (er) => {
if (er?.code === 'ERR_STREAM_PREMATURE_CLOSE') {
er = null;
}
_destroy(req, er || err);
});
return;
}
}
_destroy(req, err || req[kError]);
} else {
tickOnSocket(req, socket);
req._flush();
}
}

The 'error' handler is set in tickOnSocket:

socket.on('error', socketErrorListener);

Deferring by a nextTick is fine for every I/O but not a synchronous DNS error:

node/lib/net.js

Lines 1414 to 1418 in a2fc4a3

// net.createConnection() creates a net.Socket object and immediately
// calls net.Socket.connect() on it (that's us). There are no event
// listeners registered yet so defer the error event to the next tick.
process.nextTick(connectErrorNT, self, err);
return;
.

Can that happen?


As a side note, I'd recommend you to use undici as it should better handle this case (and be easier to fix).

@loynoir
Copy link
Author

loynoir commented Jul 19, 2023

@mcollina

I've never seen this problem happen in practice

Can that happen?

When I use .lookup without nextTick, I did see node.js http.request throw un-catch-able error.

Although I cannot reproduce same error, I found ip-blackhole may also let http.request throw un-catch-able error.

@ShogunPanda
Copy link
Contributor

@mcollina I think the only way to fix this is to defer the error emitting (self.destroy(ex)) in internalConnect and internalConnectMultiple via nextTick. WDYT?

@loynoir
Copy link
Author

loynoir commented Aug 2, 2023

As a side note, I'd recommend you to use undici as it should better handle this case (and be easier to fix).

Verified undici don't have this bug.

Error is caught within undici.

$ sudo ip route add blackhole 192.168.144.4
$ cat reproduce.mjs 
import { Agent } from 'undici'

try {
    await fetch('http://example.com', {
        dispatcher: new Agent({
            connect: {
                lookup: (hostname, options, callback) => {
                    // node <20
                    // callback(null, '192.168.144.4', 4)
                    // node 20
                    callback(null, [{ address: '192.168.144.4', family: 4 }])
                }
            }
        })
    })
} catch (caught) {
    console.warn({ caught })
}
$ node reproduce.mjs
{
  caught: TypeError: fetch failed
      at Object.fetch (node:internal/deps/undici/undici:11576:11)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async file:///tmp/tmp.ogEc9I0rez/reproduce.mjs:4:5 {
    cause: Error: connect EINVAL 192.168.144.4:80 - Local (0.0.0.0:0)
        at internalConnect (node:net:1087:16)
        at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
        at emitLookup (node:net:1478:9)
        at lookup (file:///tmp/tmp.ogEc9I0rez/reproduce.mjs:11:21)
        at emitLookup (node:net:1402:5)
        at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
        at lookupAndConnectMultiple (node:net:1401:3)
        at node:net:1347:7
        at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
        at lookupAndConnect (node:net:1346:5) {
      errno: -22,
      code: 'EINVAL',
      syscall: 'connect',
      address: '192.168.144.4',
      port: 80
    }
  }
}

@mcollina
Copy link
Member

mcollina commented Aug 2, 2023

@mcollina I think the only way to fix this is to defer the error emitting (self.destroy(ex)) in internalConnect and internalConnectMultiple via nextTick. WDYT?

@ShogunPanda yes

@mcollina mcollina added good first issue Issues that are suitable for first-time contributors. http Issues or PRs related to the http subsystem. net Issues and PRs related to the net subsystem. confirmed-bug Issues with confirmed bugs. labels Aug 2, 2023
@mertcanaltin
Copy link
Member

@mcollina hello

To fix the issue, I made a modification to the onSocket function. I checked for the presence of an error (err) and if it exists, I immediately emitted the 'error' event using this.emit('error', err) and then called this.destroy() to terminate the request. This way, the error is handled synchronously.

Modified Code:

ClientRequest.prototype.onSocket = function onSocket(socket, err) {
 if (err) {
   this.emit('error', err);
   this.destroy();
   return;
 }

 process.nextTick(onSocketNT, this, socket);
};

Open Questions:

Is the proposed modification a correct and appropriate solution to this issue?
Is there any potential downside or side effect to this modification that I should be aware of?
Looking forward to your feedback on the proposed solution. If this approach is deemed appropriate, I can submit a pull request with the suggested changes.

Thanks!

@ShogunPanda
Copy link
Contributor

@mertcanaltin Thanks for your proposal.
In general it would look fine, except that it just fix the issue in HTTP client. I would have a more generic solution which will handle the error in async way so that all libraries will be impacted.

@mertcanaltin
Copy link
Member

Thanks for your proposal.
In general it would look fine, except that it just fix the issue in HTTP client. I would have a more generic solution which will handle the error in async way so that all libraries will be impacted.

I wonder if I should open a pull request with this update @ShogunPanda

@ShogunPanda
Copy link
Contributor

I think so. Otherwise I already plan to include this issue in an future OSS onboarding event.

@Prateek462003
Copy link

is this issue still open i would like to take a look at it

timothyjrogers added a commit to timothyjrogers/node that referenced this issue Dec 4, 2023
… tick

self.destroy calls in the internalConnect adn internalConnectMultiple
functions have a narrow case where they can throw before an error
handler has been established. This change defers them to the next
tick to allow time for the error handler to be set.

Fixes: nodejs#48771
@timothyjrogers
Copy link

I've opened a PR for this issue here: #51038. It adds the deferral on emitting in internalConnect and internalConnectMultiple described in @ShogunPanda 's comment above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. good first issue Issues that are suitable for first-time contributors. http Issues or PRs related to the http subsystem. net Issues and PRs related to the net subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants