src: limit foreground tasks draining loop #19987

Closed · wants to merge 5 commits

Conversation

@ulan
Contributor

ulan commented Apr 12, 2018

Foreground tasks that repost themselves can force the draining loop
to run indefinitely without giving other tasks a chance to run.

This limits the foreground task draining loop to run only the tasks
that were in the task queue at the beginning of the loop.

Fixes: #19937
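
For illustration, here is a minimal sketch of the queue-snapshot pattern described above. The Task and TaskQueue types are made up for this example, not the actual node_platform.cc classes, and locking/multi-isolate handling is omitted:

#include <memory>
#include <queue>
#include <utility>

struct Task {
  virtual ~Task() = default;
  virtual void Run() = 0;
};

class TaskQueue {
 public:
  void Push(std::unique_ptr<Task> task) { tasks_.push(std::move(task)); }

  // Runs only the tasks that were queued when DrainOnce() was entered.
  // Tasks posted while draining stay in tasks_ until the next call.
  bool DrainOnce() {
    std::queue<std::unique_ptr<Task>> snapshot;
    snapshot.swap(tasks_);              // freeze the current queue contents
    const bool did_work = !snapshot.empty();
    while (!snapshot.empty()) {
      snapshot.front()->Run();          // a reposting task lands in tasks_, not snapshot
      snapshot.pop();
    }
    return did_work;
  }

 private:
  std::queue<std::unique_ptr<Task>> tasks_;
};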

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
@@ -66,6 +67,7 @@ class PerIsolatePlatformData :
int unref();

// Returns true iff work was dispatched or executed.
// New tasks that are posted during flushing of the queue are not run.

@addaleax

addaleax Apr 12, 2018

Member

Tiny nit: Maybe be more specific here and say that they are not run as part of this iteration, rather than implying that they aren’t run at all?

@ulan

ulan Apr 13, 2018

Author Contributor

Thanks. Done.

@jasnell

Member

jasnell commented Apr 12, 2018

hmmm... definitely +1 on this but should this be semver-major?

@hashseed

Member

hashseed commented Apr 13, 2018

@jasnell

Member

jasnell commented Apr 14, 2018

There are some CI failures. It's not clear whether they are unrelated.

@BridgeAR

Member

BridgeAR commented Apr 15, 2018

@BridgeAR

Member

BridgeAR commented Apr 15, 2018

Not sure but there might be issues with Windows.

@BridgeAR BridgeAR removed the author ready label Apr 16, 2018
@BridgeAR

Member

BridgeAR commented Apr 16, 2018

Windows failures seem related.

@ulan

Contributor Author

ulan commented Apr 16, 2018

Thanks. I'll try to reproduce the Windows failure and debug it.

@ulan

Contributor Author

ulan commented Apr 16, 2018

The Windows failure should be fixed now. The problem was an incorrect repost count in the new test, which left over one task that tried to access the Environment after its destruction.
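
For context, here is a hypothetical version of such a reposting task, reusing the made-up TaskQueue sketch from the PR description above (this is not the actual cctest code): each Run() re-posts the task with a decremented budget, so if the budget doesn't line up with the number of drain calls the test performs, a task can be left queued after teardown.

struct RepostingTask : public Task {
  RepostingTask(TaskQueue* queue, int reposts_left)
      : queue_(queue), reposts_left_(reposts_left) {}

  void Run() override {
    if (reposts_left_ > 0) {
      // Re-post with a decremented budget; this lands in the live queue,
      // so it only runs on a later DrainOnce() call.
      queue_->Push(std::make_unique<RepostingTask>(queue_, reposts_left_ - 1));
    }
  }

  TaskQueue* queue_;
  int reposts_left_;
};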

@hashseed

Member

hashseed commented Apr 16, 2018

@ulan

Contributor Author

ulan commented Apr 17, 2018

sequential/test-inspector-stop-profile-after-done is failing on ubuntu1710-x64 on all three CI jobs.
https://ci.nodejs.org/job/node-test-commit-linux/18054/nodes=ubuntu1710-x64/consoleFull

Unfortunately it doesn't reproduce on my machine. I will have to find an Ubuntu 17.10 machine.

11:36:59 not ok 2197 sequential/test-inspector-stop-profile-after-done
11:36:59   ---
11:36:59   duration_ms: 1.293
11:36:59   severity: fail
11:36:59   exitcode: 1
11:36:59   stack: |-
11:36:59     [test] Connecting to a child Node process
11:36:59     [test] Testing /json/list
11:36:59     [err] Debugger listening on ws://127.0.0.1:38959/3e238875-261d-468a-ae06-ed825beac8af
11:36:59     [err] 
11:36:59     [err] For help, see: https://nodejs.org/en/docs/inspector
11:36:59     [err] 
11:36:59     [err] Debugger attached.
11:36:59     [err] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [err] Waiting for the debugger to disconnect...
11:36:59     [err] 
11:36:59     { AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
11:36:59     + expected - actual
11:36:59     
11:36:59     - null
11:36:59     + 0
11:36:59         at runTests (/mnt/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1710-x64/test/sequential/test-inspector-stop-profile-after-done.js:28:10)
11:36:59         at process._tickCallback (internal/process/next_tick.js:178:7)
11:36:59       generatedMessage: true,
11:36:59       name: 'AssertionError [ERR_ASSERTION]',
11:36:59       code: 'ERR_ASSERTION',
11:36:59       actual: null,
11:36:59       expected: 0,
11:36:59       operator: 'strictEqual' }
11:36:59     1
11:36:59   ...
@BridgeAR

Member

BridgeAR commented Apr 17, 2018

@ulan that is a known flake. It is independent of this PR and you can safely ignore it :-)

@hashseed

Member

hashseed commented Apr 17, 2018

Guess this is ready then?

@ulan

Contributor Author

ulan commented Apr 17, 2018

@BridgeAR, thank you!
@hashseed, yes, it is ready.

Member

bnoordhuis left a comment

@apapirovski

Member

apapirovski commented Apr 19, 2018

OK, I know that sequential/test-inspector-stop-profile-after-done is a flake, but might this PR have exacerbated it? In the last CI run it got triggered 5 times, and I'm pretty sure it's highly uncommon to hit that test more than once in a run. Anyone have any thoughts?

@ulan

Contributor Author

ulan commented Apr 19, 2018

@apapirovski yes, that seems strange. Could it be caused by an "unlucky" base revision? (I don't know whether CI rebases the patch onto the tip of the tree for each run.)

@ulan

Contributor Author

ulan commented Apr 19, 2018

After further inspection, I think it might be related, because the test uses the sampling heap profiler, which internally relies on V8's allocation observer. The incremental marker also uses the allocation observer.

There is probably some timing change that is causing the test to fail more frequently. I will investigate.

@ulan

Contributor Author

ulan commented Apr 20, 2018

The sampling heap profiler theory was incorrect; the test is about the CPU profiler.

I found a bug in the initial patch: NodeInspectorClient::runMessageLoopOnPause uses FlushForegroundTasks, so it now has to be called in a loop to preserve the old behaviour. I uploaded a patch that will hopefully fix the failure on the bot.

Member

addaleax left a comment

Member

apapirovski left a comment

Thank you for following up on this.

// Returns true iff work was dispatched or executed.
// New tasks that are posted during flushing of the queue are postponed until
// the next flushing.
// Returns true iff work was dispatched or executed. New tasks that are

@apapirovski

apapirovski Apr 20, 2018

Member

Nit: s/iff/if.

@BridgeAR

BridgeAR Apr 20, 2018

Member

@apapirovski I guess these were intentional?

@apapirovski

apapirovski Apr 20, 2018

Member

Yeah but I know the meaning and I didn't even think of it until you mentioned it... but maybe that just reflects badly on me... 🤔 😆

@apapirovski

apapirovski Apr 20, 2018

Member

I guess what I'm saying: I don't think this is the place for iff. It doesn't make it clearer.

But I'm not going to object if someone prefers this. It's a low priority thing for me.

@ulan

ulan Apr 20, 2018

Author Contributor

Done. The iff was there before my change. I agree that if seems slightly better.

@@ -133,7 +133,10 @@ class NodePlatform : public MultiIsolatePlatform {
double CurrentClockTimeMillis() override;
v8::TracingController* GetTracingController() override;

void FlushForegroundTasks(v8::Isolate* isolate);
// Returns true iff work was dispatched or executed. New tasks that are

@apapirovski

apapirovski Apr 20, 2018

Member

Same here, s/iff/if

@ulan

ulan Apr 20, 2018

Author Contributor

Done.

@ulan

Contributor Author

ulan commented Apr 20, 2018

There are two Windows failures in the recent CI job:
1)

not ok 246 parallel/test-tls-server-verify
  ---
  duration_ms: 60.164
  severity: fail
  stack: |-
    timeout
  ...
2)

FATAL: command execution failed
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:18 

Both failures seem unrelated to this PR.

@BridgeAR

Member

BridgeAR commented Apr 23, 2018

@apapirovski

Member

apapirovski commented Apr 25, 2018

Landed in d3edf2f

Thank you for the contribution @ulan and congrats on becoming a Contributor! 🎉

apapirovski added a commit that referenced this pull request Apr 25, 2018
Foreground tasks that repost themselves can force the draining loop
to run indefinitely without giving other tasks a chance to run.

This limits the foreground task draining loop to run only the tasks
that were in the task queue at the beginning of the loop.

PR-URL: #19987
Fixes: #19937
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Khaidi Chu <i@2333.moe>
@ulan

Contributor Author

ulan commented Apr 25, 2018

@apapirovski thanks for landing!

@@ -316,7 +316,7 @@ class NodeInspectorClient : public V8InspectorClient {
terminated_ = false;
running_nested_loop_ = true;
while (!terminated_ && channel_->waitForFrontendMessage()) {
platform_->FlushForegroundTasks(env_->isolate());
while (platform_->FlushForegroundTasks(env_->isolate())) {}

@Lucretiel

Lucretiel Apr 26, 2018

Doesn't this line preserve the bug? Even though you swap the queue before draining it, you're still running this loop forever if new tasks are constantly added to the original queue.

@ulan

ulan Apr 27, 2018

Author Contributor

The inspector posts foreground tasks and requires that they are all processed before going to the outer loop.

AFAIK this code runs only when the inspector is active and the program is paused. The normal libuv tasks are not processed here. Latency shouldn't be an issue.

@Lucretiel

Lucretiel Apr 28, 2018

My understanding is that the original bug this is trying to fix (#19937) is that foreground tasks can add additional tasks to the queue. The bug was fixed by freezing the queue inside FlushForegroundTasks, but the line I'm commenting on appears to loop that call, so the freeze doesn't actually help: it will still run forever if foreground tasks repost themselves.

Unless there's more than one place where FlushForegroundTasks is called, and that's not an issue for this line?

@apapirovski

apapirovski Apr 29, 2018

Member

There is certainly more than one place. This particular line addresses a very specific interaction.

isurusiri added a commit to isurusiri/node that referenced this pull request Apr 30, 2018
MylesBorins added a commit that referenced this pull request May 4, 2018
@MylesBorins MylesBorins mentioned this pull request May 8, 2018
blattersturm added a commit to citizenfx/node that referenced this pull request Nov 3, 2018