Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

console.log failure while working with worker threads. #30491

Open
Mitschiro opened this issue Nov 15, 2019 · 29 comments
Open

console.log failure while working with worker threads. #30491

Mitschiro opened this issue Nov 15, 2019 · 29 comments
Labels
stdio Issues and PRs related to stdio. worker Issues and PRs related to Worker support.

Comments

@Mitschiro
Copy link

  • Version: 12.4.0
  • Platform: Docker which runs on mac (Darwin Kernel Version 18.7.0, xnu-4903.278.12~1/RELEASE_X86_64 x86_64)
  • Subsystem:

Hello everyone.

Thank you for your hard work on NodeJS.

Now, i developed a small service, which takes 4 csv files, parses them, maps them together and imports them into elasticsearch.

Each file is being parsed on a different thread.

The parsed content of one of the files is being send via an Event to a different file, this file spawns for each set of data, a new thread that will import that set into ES.

In parallel on the main thread, i send the content of one of the files in chunks via an Event with the contents of the remaining 2 files, to a different script again.

Which will spawn a new thread for that chunk of data. That thread will map the given data to the chunk provided, if they match. Send the mapped data back to the main thread, which again will spawn a new thread who will import the mapped data into ES.

The issue i have here is, that once everything is working at the same time, the only console.logs i get are the ones from the main thread. Everything that is being logged on a worker thread, is being lost somewhere, while the main thread is under load.

Note: The actual code is being processed as it should, it is just the console.logs who do not care.

This makes debugging on worker threads really difficult. Maybe i am missing something.

@GrosSacASac
Copy link
Contributor

As a temporary solution, send logs to the main and log them there.

@Mitschiro Mitschiro changed the title console.log failure during while working with worker threads. console.log failure while working with worker threads. Nov 15, 2019
@Mitschiro
Copy link
Author

As a temporary solution, send logs to the main and log them there.

Thanks for answering.
Yes that's what i intended to do for the QA version.

@addaleax addaleax added console Issues and PRs related to the console subsystem. worker Issues and PRs related to Worker support. labels Nov 18, 2019
@addaleax
Copy link
Member

As a temporary solution, send logs to the main and log them there.

Fwiw, this is how console.log(), and more generally process.stdout and process.stderr, in Workers currently work – they send their data for the main thread and wait for that to process it.

The issue happens because when the main thread is busy, then it also won’t process such data coming from the Workers.

@Mitschiro
Copy link
Author

As a temporary solution, send logs to the main and log them there.

Fwiw, this is how console.log(), and more generally process.stdout and process.stderr, in Workers currently work – they send their data for the main thread and wait for that to process it.

The issue happens because when the main thread is busy, then it also won’t process such data coming from the Workers.

I can confirm that. Sending the logs to the main thread does nothing while the main thread is busy.
The only thing that worked for me, was off loading more work from the main thread to the worker threads / event based scripts and even that is tricky to do.

@tniessen
Copy link
Member

The only thing that worked for me, was off loading more work from the main thread to the worker threads / event based scripts

That sounds like a good idea. I usually try not to block the main thread at all in those scenarios, to make sure it remains responsive.

In theory, we could allow threads to do I/O directly, but it will be tricky and might lead to unwanted side effects.

@devsnek
Copy link
Member

devsnek commented Nov 20, 2019

how about new Worker(x, { synchronizedStdio: false })?

@addaleax
Copy link
Member

@devsnek What would that do and how would it work?

@devsnek
Copy link
Member

devsnek commented Nov 20, 2019

@addaleax instead of sending internal messages to the main thread, it would just directly write to stdout/err. messages might be garbled on occasion, but sometimes that isn't a problem.

@addaleax
Copy link
Member

@devsnek Libuv won’t let you create multiple handles for the same fd, plus writing synchronously would be problematic for the same reasons for which it is on the main thread (and fs.writeSync() for writing data fails on Windows for non-ASCII characters).

@devsnek
Copy link
Member

devsnek commented Nov 20, 2019

hmm, aren't they separate event loops?

@addaleax
Copy link
Member

They are separete event loops – see #30507 for example crashes when sharing fds between threads.

@devsnek
Copy link
Member

devsnek commented Nov 20, 2019

but it isn't inherently impossible... you just have to make sure you aren't renumbering, closing, etc.

@addaleax
Copy link
Member

@devsnek You have to close the handle when the thread stops 🙃 But feel free to work on something if you have ideas for how to approach this, or more generally the issues in #30507.

@Mitschiro
Copy link
Author

I have a question, i don't know a lot when it comes to the inner workings of nodejs.

But couldn't a console.log from a worker thread be pushed into the event loop of the main thread?

That way it should be processed at some point since its in the Q, even if it's late or at the end, still better than not getting them at all.

@tniessen
Copy link
Member

instead of sending internal messages to the main thread, it would just directly write to stdout/err. messages might be garbled on occasion, but sometimes that isn't a problem.

If we really want to do that, it might make more sense to have some mutex/locking mechanism to prevent concurrent writes. That would ensure that ouput from console.log is never "garbled".

But couldn't a console.log from a worker thread be pushed into the event loop of the main thread?

I think that is the current behavior, isn't it?

@devsnek
Copy link
Member

devsnek commented Nov 20, 2019

mutex is an interesting idea. I guess we'd have to specialise it to our stdio handles?

@Mitschiro
Copy link
Author

But couldn't a console.log from a worker thread be pushed into the event loop of the main thread?

I think that is the current behavior, isn't it?

I don't know, that's why i'm asking. But if it is the current behavior, than there is an issue no? If it gets pushed into the event loop, than it has to be processed at some point. It can't be freed from the event loop without execution else this would mean the event loop is totally unreliable.

@tniessen
Copy link
Member

mutex is an interesting idea. I guess we'd have to specialise it to our stdio handles?

Yes, but that is assuming that we would prefer allowing worker threads to use I/O directly instead of the current behavior. I hope that @addaleax can shine some light on this.

If it gets pushed into the event loop, than it has to be processed at some point

It depends on what the main thread is doing, if it is busy running synchronous JavaScript code, then it will never have a chance to process messages. If you have some code for us to reproduce the problem, that would be helpful.

@addaleax
Copy link
Member

@tniessen I’m not entirely sure how the mutex approach would work… it seems like that’s something that might still require moving stdio to a separate thread if we really want it to always be available, and then post messages to that thread (which can be just a C++ thread, not a Worker)?

@tniessen
Copy link
Member

If it's just about writing to stdout/stderr we could just write to the file descriptors "directly" (unless libuv prevents that?) and have one mutex per file descriptor, which prevents two threads from writing at the same time (mostly to prevent console.log outputs to interfere with each other).

On the other hand, I think that the current behavior is reasonable, and it is a fair restriction to only write from the main thread. I guess worker threads shouldn't really use console.log in production anyway.

@deregtd
Copy link

deregtd commented Nov 24, 2019

FWIW, I don't believe this is actually the behavior being used, at least not as of 12.9.0 (on MacOS). I just ran a test app that ran an interval on the main thread to output a console.log, and then put console.log lines on the onmessage handler inside the worker thread from the parentport, and on the onmessge handler in the main thread from the worker thread, and then some console.logs inside the worker thread as it does some work, and reports progress out via the port along the way.

The interval regularly reported (expected -- basically nothing's going on on the main thread here).
Anything that was reported over the port was console.logged immediately on the main thread as well. However, both the request from the worker thread and all progress called in console.log on the worker thread wasn't shown in the console until the worker thread entirely finished all work and relinquished control.

If it was just a matter of keeping the main thread unblocked, I'd totally understand that limitation in behavior, but that doesn't appear to be the case at all here -- you need both the worker thread message pump to be free before it can send the console messages back to the main thread, who then ALSO needs its message pump to be free before it can process them. So, I'm going to end up having to basically trap console.log on my worker thread and pipe it back over the message port to the parent thread if I want real-time console reporting of progress (while debugging, etc.)

@addaleax
Copy link
Member

@deregtd I can’t reproduce that behaviour based on your description – would you mind sharing the code you used?

@deregtd
Copy link

deregtd commented Mar 21, 2020

Sorry it took so long to get around to this repro. Life’s been busy…

consoleissue.js:

const path = require('path');
const wt = require('worker_threads');

const worker = new wt.Worker(path.resolve(path.join(__dirname, 'consoleissue-worker.js')));
worker.on('message', msg => {
    console.log('worker response: ' + JSON.stringify(msg));
});

console.log('posting');
worker.postMessage({ dostuff: true });
console.log('posted');

consoleissue-worker.js

const wt = require('worker_threads');

wt.parentPort.on('message', (msg) => {
    console.log('starting');
    let total = 0;
    for (let i=0; i<50000000; i++) {
        total += Math.sin(i);
        if ((i % 5000000) === 0) {
            console.log(i);
            wt.parentPort.postMessage({ msg: i.toString() });
        }
    }
    console.log('finished - ' + total);
    wt.parentPort.postMessage({ done: true });
});

Run node consoleissue.js and you shall see:

posting
posted
starting
worker response: {"msg":"0"}
worker response: {"msg":"5000000"}
worker response: {"msg":"10000000"}
worker response: {"msg":"15000000"}
worker response: {"msg":"20000000"}
worker response: {"msg":"25000000"}
worker response: {"msg":"30000000"}
worker response: {"msg":"35000000"}
worker response: {"msg":"40000000"}
worker response: {"msg":"45000000"}
worker response: {"done":true}
0
5000000
10000000
15000000
20000000
25000000
30000000
35000000
40000000
45000000
finished - -0.013948591772423392

If the console log worked as theoretically stated in here, the worker responses should be right alongside the normal console.logs of the numbers, but instead the console.logs from the worker thread all show up at the same time when the worker thread returns control to the event loop.

@shadowgate15
Copy link

Any calls that send to stdout/stderr, also will not be sent if the process terminates before the main thread becomes available. Any plans for this or is the only option to send all messages via parentPort?

@BridgeAR BridgeAR removed the console Issues and PRs related to the console subsystem. label Nov 29, 2021
lberki pushed a commit to lberki/myenk that referenced this issue Jun 20, 2022
And a copious amount of yak shaving:
* I had to reimplement console logging because it's apparently a
  known bug that it does not work that well on worker threads
  (nodejs/node#30491)
* I had to add a GYP config file because reimplementing logging is
  apparently impossible without adding native code
* I had to change arena.js debug logging to not rely on the
  percent interpolation functionality in console.log which is
  apparently not available anywhere else

...and the rwlock may not even work. I'll still need to add a
stress test.
@maxpain
Copy link

maxpain commented Jun 21, 2022

Any updates?

@GeoffreyBooth
Copy link
Member

how about new Worker(x, { synchronizedStdio: false })?

@addaleax In #44710 @JakobJingleheimer and I have been working on getting the ESM loaders processing to happen in a worker thread, and we’ve run into this issue, that console.logs within the worker don’t print anywhere. For now I’ve worked around it via this:

import { writeFile } from 'node:fs/promises'
await writeFile('/dev/fd/1', 'some message I want printed to stdout')

And this works, even while the main thread is frozen. So I think @devsnek’s idea should be possible.

As you might’ve guessed, I’m on a Mac, so I assume this will probably have difficulty in Windows. Is there an equivalent to /dev/fd/1 that could work for Windows?

More generally, could we provide something like a worker_threads.logToStdout method that is an alias for this, and handles the mapping to stdout as appropriate for whatever platform they’re running on? Or console.log could be such a method, if that doesn’t break anything. Likewise for console.error/stderr.

@addaleax
Copy link
Member

addaleax commented Jan 5, 2023

and we’ve run into this issue, that console.logs within the worker don’t print anywhere

@GeoffreyBooth They do print, just asynchronously (which can also happen with the main thread, to be clear). Is the main thread blocked in this scenario? If so, it might be worth thinking a bit about why that would be necessary.

And this works, even while the main thread is frozen

Yeah, I think that’s expected. For debugging this should be totally fine.

So I think @devsnek’s idea should be possible.

IIUC what @devsnek was suggesting was to make stdio handles in the main thread and worker threads work the same way and refer to the same fds. That’s what doesn’t work.

I would be very reluctant to go the way of using fs operations under the hood here. I still believe that if we think this is an issue worth fixing, we should take care of stdio on a fully separate thread, independent of main thread and worker thread.

Is there an equivalent to /dev/fd/1 that could work for Windows?

You could pass 1 instead of the string '/dev/fd/1' (but may need to take care of character conversions yourself in that case).

More generally, could we provide something like a worker_threads.logToStdout method that is an alias for this, and handles the mapping to stdout as appropriate for whatever platform they’re running on?

If we had this, it should probably just be console.log directly rather than a separate method.

Or console.log could be such a method, if that doesn’t break anything. Likewise for console.error/stderr.

I do think it’s a good thing that console.log and friends are built with a clean architecture where they operate on top of the typical Node.js stdio streams, not lower-level mechanisms that would then have to replicate the streams infrastructure.


From a very practical point of view, I think something that could address the issue that @deregtd brought up above would be to allow for a certain amount of data to be written without the other side acknowleding it, similar to how stdio works in other contexts. That doesn’t address the case in which the main thread event loop is blocked, but it partially addresses the one in which the worker thread event loop is blocked.

@devsnek
Copy link
Member

devsnek commented Jan 5, 2023

my suggestion was that console methods should coordinate natively using a mutex or smth, instead of queuing up on the main thread to be processed. i don't think we really need to worry that much about fairness or anything.

@addaleax
Copy link
Member

i don't think we really need to worry that much about fairness or anything.

Agreed, I'm also not worried about fairness between threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdio Issues and PRs related to stdio. worker Issues and PRs related to Worker support.
Projects
None yet
Development

No branches or pull requests