Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Waiting for future causes interpreter crashes in Node 8 when debugging #357

Closed
jeremypenner opened this issue Dec 21, 2017 · 9 comments
Closed

Comments

@jeremypenner
Copy link

Given the following program:

var Future = require('fibers/future');
var Fiber = require('fibers');

function sleepFuture(ms) {
  var future = new Future();
  setTimeout(function() { future.return(); }, ms);
  return future;
}
Fiber(function() {
  for (var i = 0; i < 1000; i ++) {
    sleepFuture(100).wait();
    console.log(i);
  }
}).run();
  • I run it as node --inspect-brk ./fiber-future.js
  • Attach the Chrome Node devtools
  • Set a breakpoint on console.log(i)
  • When the breakpoint hits, enter sleepFuture(500).wait() at the devtools console

I get the following crash from node:

Debugger listening on ws://127.0.0.1:9229/27da74c2-0d87-4443-a933-d179a35a5e66
For help see https://nodejs.org/en/docs/inspector
Debugger attached.
0
1
2
3
4
5
6
7
Error: async hook stack has become corrupted (actual: 0, expected: 38)
 1: node::AsyncWrap::PopAsyncIds(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/jeremy/.nvm/versions/node/v8.9.3/bin/node]
 2: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/Users/jeremy/.nvm/versions/node/v8.9.3/bin/node]
 3: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/jeremy/.nvm/versions/node/v8.9.3/bin/node]
 4: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/Users/jeremy/.nvm/versions/node/v8.9.3/bin/node]
 5: 0x5312170463d
 6: 0x531217f4ed2
 7: 0x531217bf9ce
 8: 0x531217f4ed2

Following the same steps with Node 6, the command pauses for 500ms as expected.

My use case is debugging webdriver.io tests, and using the Chrome devtools as a REPL, which works very reliably under Node 6.

@benjamn
Copy link
Contributor

benjamn commented Dec 22, 2017

I'm not super familiar with the async_hooks API, but my first reaction to this bug (which I can confirm) is that perhaps the Fiber class should extend the require("async_hooks").AsyncResource class. If I'm thinking about this correctly, fibers would call this.emitBefore() at the beginning of Fiber.prototype.run or .throwInto, and this.emitAfter() just before any Fiber.yield.

@benjamn
Copy link
Contributor

benjamn commented Dec 22, 2017

The async_hooks API appears to be unaware of multithreading, unfortunately.

I've raised this concern here, and may open an issue if I don't get a response.

Since the stack of async IDs is not exposed to JS or C++, I don't know if there's anything we can do here until this problem is addressed by Node.

@laverdet
Copy link
Owner

I've run into this issue in my other project isolated-vm. I ended up fixing it there with a hack which uses an internal v8 function to monkey patch my own callback before node's native callback get to run. In the absence of support from the nodejs team, which I doubt I'll get, I think that's what I'll have to end up doing here as well.

@benjamn
Copy link
Contributor

benjamn commented Jan 4, 2018

@laverdet Any luck with applying that hack (or something similar) here?

@laverdet
Copy link
Owner

laverdet commented Jan 5, 2018

Sorry I've been traveling a lot for the holidays and haven't had much time to look at this. Turns out the hack I had mind wasn't relevant but I think I can get this issue resolved in another way. The fix I'm exploring now uses an undocumented API in async_wrap which gives me some access to the async stack. As fibers swap contexts I can swap out the async stacks as well.

Though, without support from the node team I fear I'm just kicking the can down the road until this API changes. It also seems that a new API called async_hooks is supposed to replace async_wrap, and async_hooks doesn't have the features I need-- so this may come back sooner rather than later.

You can monkey-patch this fix directly in fibers.js without needing to recompile:

if (process.fiberLib) {
	module.exports = process.fiberLib;
} else {
	var fs = require('fs'), path = require('path');

	// Seed random numbers [gh-82]
	Math.random();

	// Look for binary for this platform
	var modPath = path.join(__dirname, 'bin', process.platform+ '-'+ process.arch+ '-'+ process.versions.modules, 'fibers');
	try {
		// Pull in fibers implementation
		process.fiberLib = module.exports = require(modPath).Fiber;
	} catch (ex) {
		// No binary!
		console.error(
			'## There is an issue with `node-fibers` ##\n'+
			'`'+ modPath+ '.node` is missing.\n\n'+
			'Try running this to fix the issue: '+ process.execPath+ ' '+ __dirname.replace(' ', '\\ ')+ '/build'
		);
		console.error(ex.stack || ex.message || ex);
		throw new Error('Missing binary. See message above.');
	}

	// !!!! Fix for async hooks starts here
	let aw = process.binding('async_wrap');
	let kExecutionAsyncId = aw.constants.kExecutionAsyncId;
	let kTriggerAsyncId = aw.constants.kTriggerAsyncId;

	function getAndClearStack() {
		let ii = aw.asyncIdStackSize();
		let stack = new Array(ii);
		for (; ii > 0; --ii) {
			let asyncId = aw.async_id_fields[kExecutionAsyncId];
			stack[ii - 1] = {
				asyncId: asyncId,
				triggerId: aw.async_id_fields[kTriggerAsyncId],
			};
			aw.popAsyncIds(asyncId);
		}
		return stack;
	}

	function restoreStack(stack) {
		for (let ii = stack.length - 1; ii >= 0; --ii) {
			aw.pushAsyncIds(stack[ii].asyncId, stack[ii].triggerId);
		}
	}

	let Fiber = module.exports;
	let stacks = new WeakMap;
	let defaultHolder = {};
	Fiber.yield = function(yield) {
		return function(param) {
			stacks.set(Fiber.current || defaultHolder, getAndClearStack());
			yield(param);
			let key = Fiber.current || defaultHolder;
			let stack = stacks.get(key);
			if (stack) {
				stacks.delete(key);
				restoreStack(stack);
			}
		};
	}(Fiber.yield);

	Fiber.prototype.run = function run(run) {
		return function(param) {
			stacks.set(Fiber.current || defaultHolder, getAndClearStack());
			if (this.started) {
				let stack = stacks.get(this);
				stacks.delete(this);
				restoreStack(stack);
			}
			run.call(this, param);
		};
	}(Fiber.prototype.run);
}

This is just experimental code but I wanted to give you an update. The example in the original post works now, but without the upstream patch you mentioned on meteor#9275 the process comes crashing down not long after.

@benjamn
Copy link
Contributor

benjamn commented Jan 5, 2018

Thanks for the update! Swapping out the async stacks seems like the right (only?) way to do it. I'm hopeful that I can turn this into a Node PR to make async_hooks work better with multiple threads (a slightly more general use case than fibers).

Also, I like the idea of moving more of this logic into JS (out of C++), but I was uncertain whether the native C++ always calls the JS implementation of Fiber.prototype.run, or sometimes calls the underlying C++ Run function directly, without going through the wrapped JS method.

Per my previous comment, I tried instantiating an AsyncResource object and calling emitBefore and emitAfter in SwapContext, and it appears implementing the AsyncResource protocol is not enough to fix this bug. In other words, the current async_hooks API is not sufficient to handle threads/fibers, so we'll need to either hack it or submit a PR.

@laverdet
Copy link
Owner

laverdet commented Jan 9, 2018

@jeremypenner @benjamn can you please take master out for a spin and let me know if it's any better

benjamn added a commit to meteor/meteor that referenced this issue Jan 10, 2018
The async_hooks API seems not entirely thread-aware, which can cause
problems when fibers are used, since fibers behave a lot like threads from
the perspective of Node/V8.

The author of node-fibers (@laverdet) has found a way to save/restore the
async_hooks stack, and wants feedback on the effectiveness of the changes:
laverdet/node-fibers#357 (comment)

The relevant commit (not yet published to npm):
laverdet/node-fibers@c6fec5c

If these changes work, I hope to turn them into a proper Node PR, but in
the meantime it seems useful to run Meteor's test suite against them.
benjamn added a commit to meteor/meteor that referenced this issue Jan 10, 2018
The async_hooks API seems not entirely thread-aware, which can cause
problems when fibers are used, since fibers behave a lot like threads from
the perspective of Node/V8.

The author of node-fibers (@laverdet) has found a way to save/restore the
async_hooks stack, and wants feedback on the effectiveness of the changes:
laverdet/node-fibers#357 (comment)

The relevant commit (not yet published to npm):
laverdet/node-fibers@c6fec5c

If these changes work, I hope to turn them into a proper Node PR, but in
the meantime it seems useful to run Meteor's test suite against them.
@benjamn
Copy link
Contributor

benjamn commented Jan 10, 2018

@laverdet All of Meteor's tests pass with this version of fibers, for what it's worth!

@jeremypenner
Copy link
Author

@laverdet I don't see any crashes relating to the async stack anymore, but as you noted, node still crashes if I call sleepFuture(500).wait() from the debugger twice. I have other test cases that mimic webdriver.io's logic that cause a crash more immediately, which I can post if it would help, but I suspect that without @benjamn's v8 patch I'm likely to be out of luck on that front.

Really appreciate both of you digging into this problem.

sasmith pushed a commit to Asana/node-fibers that referenced this issue Apr 17, 2024
These hacks were added to address laverdet#357, which was an issue in Node 8.

The hack works by clearing the async context stack when a fiber runs, and restoring it when the fiber yields. Unfortunately, the restoration isn't complete. At a minimum, it doesn't correctly restore the stack of async resources.

But also, the hack depends on async_wrap, which is now deprecated, and when the hack is removed, the original issues no longer reproduces, and unit tests pass in Asana. So as far as I can tell, the hack is no longer needed in node 18.

Test Plan:
Run the example code from laverdet#357 to try to reproduce the issue in node 18. See that the issue doesn't reproduce.
(Note that this was in Asana's codebase, rather than running pure node)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants