Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSSL QUIC client goes into DRAINING unexpectedly #23285

Closed
icing opened this issue Jan 12, 2024 · 14 comments
Closed

OpenSSL QUIC client goes into DRAINING unexpectedly #23285

icing opened this issue Jan 12, 2024 · 14 comments
Labels
branch: master Merge to master branch branch: 3.2 Merge to openssl-3.2 triaged: bug The issue/pr is/fixes a bug

Comments

@icing
Copy link

icing commented Jan 12, 2024

Description

Working on integrating OpenSSL 3.2 QUIC API into the curl project, I encounter an error with a >60% likelihood where the QUIC connection goes into DRAINING state. This happens just after sending off the first request.

I traced down the channel state change to quic_channel.c#3022: ch_start_terminating() being called without reason string and cause->error_code == 260.

OS: macOS 14.2.1
Server: same behaviour with local nghttpx or Fastly CDN on curl.se

Caveat: I use an unconnected socket in a dgram BIO, because of #23251.

Expectation

OpenSSL QUIC should be able to keep the connection going for many requests, as ngtcp2 and quiche do in the same network/server environment.

@icing icing added the issue: bug report The issue was opened to report a bug label Jan 12, 2024
icing added a commit to icing/curl that referenced this issue Jan 12, 2024
- opened openssl/openssl#23285
  on OpenSSL for connection draining issue
- shortened pytest nghttpx live check --connect-timeout
  since our unconnected socket does not fail otherwise
@mattcaswell mattcaswell added branch: master Merge to master branch triaged: bug The issue/pr is/fixes a bug branch: 3.2 Merge to openssl-3.2 and removed issue: bug report The issue was opened to report a bug labels Jan 12, 2024
@mattcaswell
Copy link
Member

Do you see anything on the OpenSSL error stack when that happens?

I traced down the channel state change to quic_channel.c#3022: ch_start_terminating() being called without reason string and cause->error_code == 260.

Any chance you might be able to get a stack trace at that point when this happens?

@icing
Copy link
Author

icing commented Jan 12, 2024

Sure:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	    0x7ff80ee137a2 __pthread_kill + 10
1   libsystem_pthread.dylib       	    0x7ff80ee4bf30 pthread_kill + 262
2   libsystem_c.dylib             	    0x7ff80ed6aa49 abort + 126
3   libcrypto.3.dylib             	       0x10d3b453a OPENSSL_die + 26
4   libssl.3.dylib                	       0x10cd2a84f ch_start_terminating + 639
5   libssl.3.dylib                	       0x10cd2ac70 ossl_quic_channel_on_remote_conn_close + 80
6   libssl.3.dylib                	       0x10cd3c599 ossl_quic_handle_frames + 3065
7   libssl.3.dylib                	       0x10cd2ddba ch_tick + 2394
8   libssl.3.dylib                	       0x10cd35ec0 ossl_quic_reactor_tick + 32
9   libssl.3.dylib                	       0x10cd3172a ossl_quic_handle_events + 58
10  libcurl.4.dylib               	       0x10cc76dbf cf_progress_ingress + 79 (curl_osslq.c:1168)
11  libcurl.4.dylib               	       0x10cc73f14 cf_osslq_recv + 804 (curl_osslq.c:1742)
12  libcurl.4.dylib               	       0x10cba96b4 Curl_cf_def_recv + 84 (cfilters.c:103)
13  libcurl.4.dylib               	       0x10cba96b4 Curl_cf_def_recv + 84 (cfilters.c:103)
14  libcurl.4.dylib               	       0x10cba96b4 Curl_cf_def_recv + 84 (cfilters.c:103)
15  libcurl.4.dylib               	       0x10cba9ab9 Curl_conn_recv + 297 (cfilters.c:183)
16  libcurl.4.dylib               	       0x10cc211a0 Curl_read + 208 (sendf.c:794)
17  libcurl.4.dylib               	       0x10cc468b0 readwrite_data + 496 (transfer.c:478)
18  libcurl.4.dylib               	       0x10cc46216 Curl_readwrite + 406 (transfer.c:1098)
19  libcurl.4.dylib               	       0x10cc10f3f multi_runsingle + 5391 (multi.c:2481)
20  libcurl.4.dylib               	       0x10cc0f887 curl_multi_perform + 391 (multi.c:2778)
21  libcurl.4.dylib               	       0x10cbc2c02 easy_transfer + 130 (easy.c:675)
22  libcurl.4.dylib               	       0x10cbc16fe easy_perform + 430 (easy.c:765)
23  libcurl.4.dylib               	       0x10cbc1547 curl_easy_perform + 23 (easy.c:784)
24  curl                          	       0x10c42c619 serial_transfers + 297 (tool_operate.c:2495)
25  curl                          	       0x10c42bf75 run_all_transfers + 101 (tool_operate.c:2686)
26  curl                          	       0x10c42bea9 operate + 857 (tool_operate.c:2802)
27  curl                          	       0x10c42af77 main + 167 (tool_main.c:273)
28  dyld                          	    0x7ff80eac1386 start + 1942

@icing
Copy link
Author

icing commented Jan 12, 2024

It claims to encounter CONN_CLOSE FRAME seen: x1d type=5, so a 1RTT type.

@t8m
Copy link
Member

t8m commented Jan 12, 2024

Yeah, that means the peer closed the connection for some reason.

@icing
Copy link
Author

icing commented Jan 12, 2024

nghttp3 closes the connection with error 260 when it see a close of a "critical" stream, e.g. ctrl, qpack_enc or qpack_dec. I don't see any closing of those stream SSL* in my code. Any good location in the OpenSSL code to trace that?

@hlandau
Copy link
Member

hlandau commented Jan 12, 2024

Are you calling SSL_free on an essential stream causing it to be ended?

@icing
Copy link
Author

icing commented Jan 12, 2024

Are you calling SSL_free on an essential stream causing it to be ended?

For being double certain, I commented out my SSL_free() and SSL_stream_conclude() still shows the same behaviour.

@hlandau
Copy link
Member

hlandau commented Jan 12, 2024

Concluding the HTTP/3 control stream would be a problem.

There is an HTTP/3 demo here: https://github.com/openssl/openssl/tree/master/demos/http3

@mattcaswell
Copy link
Member

nghttp3 closes the connection with error 260 when it see a close of a "critical" stream, e.g. ctrl, qpack_enc or qpack_dec. I don't see any closing of those stream SSL* in my code. Any good location in the OpenSSL code to trace that?

Stream resets get encoded here:

openssl/ssl/quic/quic_txp.c

Lines 2427 to 2466 in dbe66cd

if (stream->want_reset_stream) {
OSSL_QUIC_FRAME_RESET_STREAM f;
if (!ossl_assert(stream->send_state == QUIC_SSTREAM_STATE_RESET_SENT))
return 0;
wpkt = tx_helper_begin(h);
if (wpkt == NULL)
return 0; /* alloc error */
f.stream_id = stream->id;
f.app_error_code = stream->reset_stream_aec;
if (!ossl_quic_stream_send_get_final_size(stream, &f.final_size))
return 0; /* should not be possible */
if (!ossl_quic_wire_encode_frame_reset_stream(wpkt, &f)) {
tx_helper_rollback(h); /* can't fit */
txp_enlink_tmp(tmp_head, stream);
break;
}
if (!tx_helper_commit(h))
return 0; /* alloc error */
*have_ack_eliciting = 1;
tx_helper_unrestrict(h); /* no longer need PING */
stream->txp_sent_reset_stream = 1;
/*
* The final size of the stream as indicated by RESET_STREAM is used
* to ensure a consistent view of flow control state by both
* parties; if we happen to send a RESET_STREAM that consumes more
* flow control credit, make sure we account for that.
*/
if (!ossl_assert(f.final_size <= ossl_quic_txfc_get_swm(&stream->txfc)))
return 0;
stream->txp_txfc_new_credit_consumed
= f.final_size - ossl_quic_txfc_get_swm(&stream->txfc);
}

A stream is marked with FIN (concluded) here:

int ossl_quic_wire_encode_frame_stream_hdr(WPACKET *pkt,
const OSSL_QUIC_FRAME_STREAM *f)
{
uint64_t frame_type = OSSL_QUIC_FRAME_TYPE_STREAM;
if (f->offset != 0)
frame_type |= OSSL_QUIC_FRAME_FLAG_STREAM_OFF;
if (f->has_explicit_len)
frame_type |= OSSL_QUIC_FRAME_FLAG_STREAM_LEN;
if (f->is_fin)
frame_type |= OSSL_QUIC_FRAME_FLAG_STREAM_FIN;
if (!encode_frame_hdr(pkt, frame_type)
|| !WPACKET_quic_write_vlint(pkt, f->stream_id))
return 0;

@icing
Copy link
Author

icing commented Jan 12, 2024

Talking to google.com gives a reason string:

ch_start_terminating - DRAINING, error=260, reason=156:STOP_SENDING received for send control stream

@hlandau
Copy link
Member

hlandau commented Jan 12, 2024

Talking to google.com gives a reason string:

ch_start_terminating - DRAINING, error=260, reason=156:STOP_SENDING received for send control stream

You are sending STOP_SENDING on the HTTP/3 control stream. This is illegal. This will happen automatically if you call SSL_free on the control stream. Do not do this.

@mattcaswell
Copy link
Member

Hmm, but @icing says all SSL_free calls were commented out!

@mattcaswell
Copy link
Member

If it helps to track down where this occurs, the stop sending gets scheduled here:

qs->want_stop_sending = 1;

@icing
Copy link
Author

icing commented Jan 12, 2024

Thanks to you all, I start to see some light here. I did not set the incoming stream policy, which seems to lead to auto-rejection of the control streams the server wants to open. Working on it....

@openssl openssl locked and limited conversation to collaborators Jan 12, 2024
@t8m t8m converted this issue into discussion #23291 Jan 12, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
branch: master Merge to master branch branch: 3.2 Merge to openssl-3.2 triaged: bug The issue/pr is/fixes a bug
Projects
None yet
Development

No branches or pull requests

4 participants