Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http2: Do not call non-reentrant nghttp2 functions from nghttp2 callback #39076

Closed
wants to merge 2 commits into from

Conversation

mdouglass
Copy link
Contributor

The nghttp2 functions nghttp2_session_send, nghttp2_session_mem_send,
nghttp2_session_recv and nghttp2_session_mem_recv are documented as not
being able to be called from nghttp2 callback functions.

Add a tracker so we can tell when we are within the scope of an nghttp2
callback and early return from ConsumeHTTP2Data and SendPendingData
which are the only two methods that use the prohibited methods.

This prevents a use-after-free crash bug that can be triggered otherwise.

Fixes: #38964
Refs: https://nghttp2.org/documentation/programmers-guide.html#remarks

The nghttp2 functions nghttp2_session_send, nghttp2_session_mem_send,
nghttp2_session_recv and nghttp2_session_mem_recv are documented as not
being able to be called from nghttp2 callback functions.

Add a tracker so we can tell when we are within the scope of an nghttp2
callback and early return from ConsumeHTTP2Data and SendPendingData
which are the only two methods that use the prohibited methods.

This prevents a use-after-free crash bug that can be triggered otherwise.

Fixes: nodejs#38964
Refs: https://nghttp2.org/documentation/programmers-guide.html#remarks
@github-actions github-actions bot added c++ Issues and PRs that require attention from people who are familiar with C++. http2 Issues or PRs related to the http2 subsystem. needs-ci PRs that need a full CI run. labels Jun 18, 2021
Copy link
Contributor

@bl-ue bl-ue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG, you are amazing @mdouglass 😄

@mdouglass
Copy link
Contributor Author

OMG, you are amazing @mdouglass smile

😊 thanks :)

@mdouglass
Copy link
Contributor Author

@jasnell I've been looking into the test failures here and picked one pretty much at random -- test-http2-compat-errors.js.
In that case, the test is doing:

  res.write('hello');
  expected = new Error('kaboom');
  res.stream.destroy(expected);

That code is executed inside the OnFrameReceive callback so when the write call triggers a call to SendPendingData we skip it. The problem is that the destroy call isn't skipped, so the stream is destroyed without actually sending the data and the test ends up timing out waiting for data to arrive that won't.

I can see a couple possible solutions:

  1. Delay all calls that change stream state when inside an nghttp2 callback. Kinda brute force, would require identifying all those calls which may be difficult.
  2. When making callbacks from node_http2 to javascript (eg. MakeCallback(env()->http2session_on_headers_function() in HandleHeadersFrame, delay them via setImmediate so that they don't operate within the scope of the nghttp2 callback. This may cause other issues that are in my blind spots (perf maybe?).
  3. Have all the nghttp2 callbacks immediately call setImmediate so we never do anything within the scope of the nghttp2 callback.
  4. ???

Any thoughts?

@jasnell
Copy link
Member

jasnell commented Jun 19, 2021

Definitely appreciate this. I had it in my list to investigate but hadn't yet been able to get to it. Good stuff

@jasnell
Copy link
Member

jasnell commented Jun 19, 2021

Hmmm I don't know right off hand. And I'm not near my laptop at that moment. I'll look at it on Monday morning ..

@addaleax
Copy link
Member

2. This may cause other issues that are in my blind spots (perf maybe?).

Yeah, I think that would be too heavy on the benchmarks to be a good solution :/

3. Have all the nghttp2 callbacks immediately call setImmediate so we never do anything within the scope of the nghttp2 callback.

The native env->SetImmediate() function is quite a bit more efficient than the JS equivalent, so I think this might be doable, but it would probably still have some unfortunate perf overhead?

That being said … if the test is calling .write() and then .destroy() on the underlying stream without waiting for the write() call to succeed, then I think it’s perfectly acceptable to say that the test should not expect the data to actually end up being sent.

@mdouglass
Copy link
Contributor Author

mdouglass commented Jun 20, 2021

That being said … if the test is calling .write() and then .destroy() on the underlying stream without waiting for the write() call to succeed, then I think it’s perfectly acceptable to say that the test should not expect the data to actually end up being sent.

I had wondered if that might be the case, but it seemed like such a straightforward example and I hadn't verified that all the other broken tests had a similar cause that I was hesitant to suggest it.

@mdouglass
Copy link
Contributor Author

mdouglass commented Jun 20, 2021

  1. Have all the nghttp2 callbacks immediately call setImmediate so we never do anything within the scope of the nghttp2 callback.

Responding to my own suggestion -- I don't think this is doable in the general case as the return value of some of the nghttp2 callbacks are meaningful (eg. OnBeginHeadersCallback wants to return 0, NGHTTP2_ERR_CALLBACK_FAILURE or NGHTTP2_ERR_TEMPORAL_CALLBACK_FAILURE).

@mcollina
Copy link
Member

Could this land?

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Jun 28, 2021
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jun 28, 2021
@nodejs-github-bot
Copy link
Collaborator

@mdouglass
Copy link
Contributor Author

Could this land?

Unfortunately, tests don't pass in the current form of this PR. I need some advice from someone more involved with the project on how to handle this:

  1. Treat them as invalid tests because they depended on not guaranteed behavior and then rewrite them
  2. Figure out a way to make them work. My earlier suggestions don't work out very well (see my earlier notes on them).

I'd really like to get a fix in and am happy to do the work, but I'm blocked at this point.

@mcollina
Copy link
Member

If the code above is the source of the problem (write after destroy without any guarantee of data writing down), I'd just recommend to edit the tests.

@kumarak
Copy link
Contributor

kumarak commented Jul 14, 2021

@mdouglass, I looked into the double-free memory issue. It seems the nghttp2 callbacks are getting called in the different event loops and it's not a reentrancy issue. The double-free happens because the event loop is trying to close the already closed stream on calling stream.cancel(). This is because it is passed to the JS side before the stream is closed.

How do you think about delaying the callbacks to javascript side (env->http2session_on_stream_close_function) on stream close callbacks (OnStreamClose) with setImmediate and not for all the nghttp2 callbacks? If the stream cancel event is raised from JS, the event loop will have information that the stream is closed and does not try to close it again. There will be a performance concern but it is only on stream close events. Please let me know your thoughts.

@mdouglass
Copy link
Contributor Author

@mdouglass, I looked into the double-free memory issue. It seems the nghttp2 callbacks are getting called in the different event loops and it's not a reentrancy issue. The double-free happens because the event loop is trying to close the already closed stream on calling stream.cancel(). This is because it is passed to the JS side before the stream is closed.

I am fairly certain I was seeing reentrant calls to the nghttp2 library. I have nghttp2/nghttp2#1591 open against nghttp2 that will warn when reentrant calls are made on the same session object. If I use that PR in place of the vendored version of nghttp then it triggers reliably from my sample code. That being said, I'm new to both these codebases so could be wrong.

How do you think about delaying the callbacks to javascript side (env->http2session_on_stream_close_function) on stream close callbacks (OnStreamClose) with setImmediate and not for all the nghttp2 callbacks? If the stream cancel event is raised from JS, the event loop will have information that the stream is closed and does not try to close it again. There will be a performance concern but it is only on stream close events. Please let me know your thoughts.

I think adding the setImmediate call is likely to work as it is pretty much an internal implementation of my original workaround from the bug report:

There is a workaround available in my example client.js if you comment out line 45 and comment in line 46 (delaying stream.cancel via the use of setImmediate).

The validity of that really rests on rather I am correct or not on the underlying reentrancy issue :)

@kumarak
Copy link
Contributor

kumarak commented Jul 15, 2021

I am also new to the node codebase. Please help me if I miss something.

I am fairly certain I was seeing reentrant calls to the nghttp2 library. I have nghttp2/nghttp2#1591 open against nghttp2 that will warn when reentrant calls are made on the same session object. If I use that PR in place of the vendored version of nghttp then it triggers reliably from my sample code. That being said, I'm new to both these codebases so could be wrong.

You might be seeing calls to the nghttp2 library from the JS side of callbacks. I think that should not be a problem.
The problem I see with the current fix is it will start skipping the JS side of operations which it should not. That's what you notice with the failing test cases and it gets stuck in the io loop.

I think adding the setImmediate call is likely to work as it is pretty much an internal implementation of my original workaround from the bug report:

setImmediate will be a similar work around to handle stream close/cancel but I think it should be handled natively to avoid memory error if one misses it in JS. if not then it should be handled gracefully in the native code.

@mdouglass
Copy link
Contributor Author

You might be seeing calls to the nghttp2 library from the JS side of callbacks. I think that should not be a problem.
The problem I see with the current fix is it will start skipping the JS side of operations which it should not. That's what you notice with the failing test cases and it gets stuck in the io loop.

So got this opened back up enough and back up to speed with where I was a few weeks ago. There are definitely reentrant calls that go nghttp2 library -> nghttp2 callback in node_http2.cc -> nghttp2 library. You can see that in log output from my PR when you see the log line: skipping sending pending data from within nghttp2 callback. If you'd like to play with it more, I have two branches I pushed to my copy of the node repo you can play with:

https://github.com/mdouglass/node/commits/fix/38964-alt - This just adds reentrancy detection to nghttp2, running that with any of the existing http2 tests will trigger an ASAN violation.

https://github.com/mdouglass/node/commits/fix/38964-debug-help - Adds the reentrancy detection on the nghttp2 and nodejs sides as well as a lot of extra logging (from when I was trying to figure out what was going on).

That being said, using setImmediate on the close operation might still be an interesting way to allow the not-yet-working fix in this PR to prevent the reentrant operations while avoiding the behavior changes. I'll try to play with that a bit more tomorrow.

@ssilve1989
Copy link

Any update on this?

@mdouglass
Copy link
Contributor Author

It should probably be closed. It was fixed a while back as a CVE that was separated from this issue: #39423.

@mdouglass mdouglass closed this Oct 8, 2021
@ssilve1989
Copy link

ssilve1989 commented Nov 3, 2021

@mdouglass Is it actually fixed though? I was still able to reproduce what I believe is the same issue on LTS/Current recently in code at my job. I am having a hard time isolating and reproducing the exact conditions to make a minimal repo example though. For me it only seems to happen under extreme load using GHZ to load test + NestJS's gRPC framework, and modifying Nest's source code to cancel the call in a setImmediate alleviates the issue

@mdouglass
Copy link
Contributor Author

My original fix was not used as it had other issues. A fix was made without my involvement as part of the CVE process and I have not retested my poc with it. Sorry, I don't have more info.

@kumarak
Copy link
Contributor

kumarak commented Nov 4, 2021

Hi @ssilve1989, Could you share more details about what you see with your test? The node has a fix to avoid the issue but I believe the actual bug is with nghttp2 that frees up memory early and call the stream close callback with a specific flow of rst streams. If the problem you see is similar and because of the rst streams, I can review it and we can open a discussion with nghttp2 team.

@ssilve1989
Copy link

@kumarak Hi, i re-ran my scenario yesterday and it looks like this was fixed in 16.11.0 where the bump to the nghttp2 module occurred, so it all looks good now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. http2 Issues or PRs related to the http2 subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
8 participants