Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix worker event loop ref/unref + leak #4114

Merged
merged 13 commits into from Aug 12, 2023
Merged

Fix worker event loop ref/unref + leak #4114

merged 13 commits into from Aug 12, 2023

Conversation

paperdave
Copy link
Collaborator

@paperdave paperdave commented Aug 11, 2023

What does this PR do?

Fixes worker tests but it makes these changes to WebWorker.

was reimplemented a little on aug 11:

  • A worker's event loop impacts the main process' event loop. Meaning if you setTimeout in a worker/listen for messages, it keeps main alive.
  • Worker is by default .ref(), but workers will exit if their event loop dies.
  • If you do addEventListener("message") on a message port or the global, this refs the event loop.
  • If you do worker.unref() this no longer makes the worker's event loop affect the caller's. This is what esbuild depends on.
  • We were not freeing WebWorker struct. This was because its a little complex. I solved it by assigning vm to null when the vm is done with, then the while(true) loop for the worker (the source of the UAF) would call real deinit. It's not really that complex with the changes to the worker event loop. .terminate sets a flag to tell the worker to exit the event loop, and the deinit is called from within spin() on the Worker's thread.

I'd like a close review to make sure i did everything right. I triple checked my uses of event loop refs and memory usage in the worker zig file.

  • Documentation or TypeScript types (it's okay to leave the rest blank in this case)
  • Code changes

How did you verify your code works?

I ran unit tests. For testing large scale worker event loop stuff.

import { Worker, isMainThread, workerData } from "worker_threads";

if (isMainThread) {
  for (let i = 0; i < 1000; i++) {
    const w = new Worker("/Users/dave/code/bun/test/js/web/worker-demo.mjs", { ref: false, workerData: i });
    let recieved = false;
    w.on("message", msg => {
      if (msg === "initial message") {
        recieved = true;
      } else {
        console.log("WHAT?", msg);
      }
    });
    setTimeout(() => {
      if (!recieved) {
        w.terminate();
        console.log("worker", i, "did not respond");
      }
    }, 5000 + 500);
  }
} else {
  setTimeout(() => {
    self.postMessage("initial message");
    console.log("workerid=", workerData);
  }, Math.random() * 5000 + 500);
}

Pipe this to a file to see that exactly 1000 lines are output.

worker.queueInitialTask();
}
}
// defer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker.queueInitialTask was removing an eventloop ref way too early. since the ref was now based on the worker's event loop, the initial task to unref isn't needed i think. did not mean to leave as commented code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code here was removed instead of commented

continue;
while (true) {
while (vm.eventLoop().tasks.count > 0 or vm.active_tasks > 0 or vm.uws_event_loop.?.active > 0) {
this.eventloop_poll_ref.refConcurrently(this.parent);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be here

Copy link
Collaborator Author

@paperdave paperdave Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant to have this at the end of the while(true) (didnt push that commit because was looking into other bugs)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not call .refConcurrently() on every tick of the event loop? It should only need to call it once while there are pending tasks.

this.onTerminate();
this.eventloop_poll_ref.unrefConcurrently(this.parent);
vm.eventLoop().tickPossiblyForever();
this.parent.eventLoop().wakeup();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a race condition here where the parent event loop is unref'd

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the case where the parent event loop is not the main thread

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i did not think about workers starting other workers

@github-actions
Copy link

github-actions bot commented Aug 11, 2023

@paperdave 2 files with test failures on bun-darwin-aarch64:

  • test/js/bun/test/test-test.test.ts
  • test/js/node/dns/node-dns.test.js

View test output

#706591f8fb828e8de364e047f8b461c90ea10673

@github-actions
Copy link

github-actions bot commented Aug 11, 2023

@paperdave 1 files with test failures on linux-x64:

  • test/js/third_party/prisma/prisma.test.ts

View test output

#706591f8fb828e8de364e047f8b461c90ea10673

@github-actions
Copy link

github-actions bot commented Aug 11, 2023

@paperdave 1 files with test failures on linux-x64-baseline:

  • test/js/third_party/prisma/prisma.test.ts

View test output

#706591f8fb828e8de364e047f8b461c90ea10673

@github-actions
Copy link

github-actions bot commented Aug 11, 2023

@paperdave 4 files with test failures on bun-darwin-x64-baseline:

  • test/js/bun/spawn/spawn-streaming-stdin.test.ts
  • test/js/bun/sqlite/sqlite.test.js
  • test/js/third_party/webpack/webpack.test.ts
  • test/js/web/timers/setTimeout.test.js

View test output

#706591f8fb828e8de364e047f8b461c90ea10673

@@ -2122,6 +2122,11 @@ pub export fn MarkedArrayBuffer_deallocator(bytes_: *anyopaque, _: *anyopaque) v
// zig's memory allocator interface won't work here
// mimalloc knows the size of things
// but we don't
if (comptime Environment.allow_assert) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[existing bug reproduced by sveltekit build. this assertion makes it fail consistantly]

@@ -82,5 +82,3 @@ import { build, buildSync, transform, transformSync } from "esbuild";
throw new Error("Test failed.");
}
}

process.exit(0);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably was not needed, but a reason to keep this is that the process should exit since there are no ref'd polls.

@paperdave
Copy link
Collaborator Author

bug with terminate() but usable

@Jarred-Sumner Jarred-Sumner merged commit 78defe7 into main Aug 12, 2023
15 of 20 checks passed
@Jarred-Sumner Jarred-Sumner deleted the dave/workertests branch August 12, 2023 20:51
@Douile
Copy link

Douile commented Aug 13, 2023

This PR adds documentation about compiling a specific version of ZLS that works better which is after the Install zig part of the docs. But the version of zig installed by the install zig section (0.11.0-dev.4004+a57608217) is too old to compile the specific version of ZLS (v0.11.0-dev.4332+43b830415 is required). Perhaps its worth adding something that mentions this if this is intended.

78defe7#diff-f0c08b47e6be8e8b574a71f652c3c2c66f49d157b392f3091c58e8585d411010R183-R202

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants