Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERL-521: SSL Connections Close Prematurely Due to Race Condition #3410

Closed
OTP-Maintainer opened this issue Nov 23, 2017 · 3 comments
Closed
Labels
bug Issue is reported as a bug priority:medium team:PS Assigned to OTP team PS
Milestone

Comments

@OTP-Maintainer
Copy link

Original reporter: acaruth
Affected versions: OTP-20.0, OTP-20.1, OTP-20.0.1, OTP-20.0.2, OTP-20.1.1, OTP-20.1.5
Fixed in version: OTP-20.2
Component: ssl
Migrated from: https://bugs.erlang.org/browse/ERL-521


Due to the github commit [7f33b43|https://github.com/erlang/otp/pull/1479/commits/7f33b438a21b65663338a32d05983b76bdb1ef28] the pattern of behaviour below will cause an early {{ssl_closed}} message before all data is delivered:

Consider an Application Data record comprising of 3 1460 byte TCP packets. Also consider that the application sending the packets intends to send no more data after the application data and so sends a TCP {{FIN}} immediately after sending the application data TLS record and possibly even before receiving a TCP {{ACK}} from the receiver (this is how {{openssl s_server}} behaves):

# {{tls_connection:next_tls_record}} calls {{tls_record:get_tls_records}} with first 1460 byte packet.
# The packet is not a complete TLS record so the data is added to the {{protocol_buffers.tls_record_buffer}}.
# {{tls_connection:next_record/1}} is called and we hit the new code for the commit mentioned earlier. This calls {{tls_socket:setopts(...active, once)}} and in this case returns OK, so {{no_record, State}} is returned.
# The second packet is processed. Again it is not a complete record but this time, imagine that the application has closed the socket.
# {{tls_socket:setopts(...active, once])}} gets an {{error, einval}} and returns a {{socket_closed}}
# The {{socket_closed}} propagates even though the {{protocol_buffers.tls_record_buffer}} contains data.
# The process will receive an {{ssl_closed}} on the socket but will not receive the application data record (that data is lost in the buffer mentioned above).

I have a system that reproduces this when connecting to an SSL HTTP server using {{openssl s_server -www ...}}. The problem is visible about 50% of the time.

I propose the following patch to fix the issue, though the SSL maintainers might have a better fix :) Note the following is based on the 20.1.6 tag in github.

{code}
--- otp/lib/ssl/src/tls_connection.erl	2017-11-23 08:57:16.885403500 +0000
+++ otp/lib/ssl/src/tls_connection.fixed.erl	2017-11-23 09:01:11.834836364 +0000
@@ -599,14 +599,18 @@
 	#alert{} = Alert ->
 	    {Alert, State}
     end;
-next_record(#state{protocol_buffers = #protocol_buffers{tls_packets = [], tls_cipher_texts = []},
+next_record(#state{protocol_buffers = 
+		       #protocol_buffers{tls_packets = [], tls_cipher_texts = []}
+		   = Buffers,
 		   socket = Socket,
 		   transport_cb = Transport} = State) ->
     case tls_socket:setopts(Transport, Socket, [{active,once}]) of
 	ok ->
 	    {no_record, State};
+	_ -> when Buffers#protocol_buffers.tls_record_buffer =:= [] ->
+	    {socket_closed, State};
 	_ ->
-	    {socket_closed, State}
+	    {no_record, State}
     end;
 next_record(State) ->
     {no_record, State}.
{code}

I think, in the diff above, the {{Buffers#protocol_buffers.tls_record_buffer}} might be a empty binary, or an empty list, so the final fix would need to take that into account (rather than just checking for an empty list).
@OTP-Maintainer
Copy link
Author

ingela said:

Actually the original patch sent a fake closed message to self()  and we/I thought it was not the thing to do, but we/I might have to take that back, sometimes it can be the thing to do!  That Buffers#protocol_buffers.tls_record_buffer has data does not mean it got enough data to deliver and unfortunately there is no tcp delivery guarantee  at  application level only on transport level. But of course we want to do our best.

@OTP-Maintainer
Copy link
Author

ingela said:

Hi!

Would the following patch also solve it? Trying to make sure I am understanding the problem properly.

{code:java}
diff --git a/lib/ssl/src/tls_connection.erl b/lib/ssl/src/tls_connection.erl
index 96243db..fa6854d 100644
--- a/lib/ssl/src/tls_connection.erl
+++ b/lib/ssl/src/tls_connection.erl
@@ -141,12 +141,14 @@ next_record(#state{protocol_buffers =
     end;
 next_record(#state{protocol_buffers = #protocol_buffers{tls_packets = [], tls_cipher_texts = []},
 		   socket = Socket,
+                   close_tag = CloseTag,
 		   transport_cb = Transport} = State) ->
     case tls_socket:setopts(Transport, Socket, [{active,once}]) of
 	ok ->
 	    {no_record, State};
 	_ ->
-	    {socket_closed, State}
+            self() ! {CloseTag, Socket},
+	    {no_record, State}
     end;
 next_record(State) ->
     {no_record, State}.
@@ -154,15 +156,15 @@ next_record(State) ->
 next_event(StateName, Record, State) ->
     next_event(StateName, Record, State, []).
 
-next_event(StateName, socket_closed, State, _) ->
-    ssl_connection:handle_normal_shutdown(?ALERT_REC(?FATAL, ?CLOSE_NOTIFY), StateName, State),
-    {stop, {shutdown, transport_closed}, State};
+%% next_event(StateName, socket_closed, State, _) ->
+%%     ssl_connection:handle_normal_shutdown(?ALERT_REC(?FATAL, ?CLOSE_NOTIFY), StateName, State
),
+%%     {stop, {shutdown, transport_closed}, State};
 next_event(connection = StateName, no_record, State0, Actions) ->
     case next_record_if_active(State0) of
 	{no_record, State} ->
 	    ssl_connection:hibernate_after(StateName, State, Actions);
-	{socket_closed, State} ->
-	    next_event(StateName, socket_closed, State, Actions);
+	%% {socket_closed, State} ->
+	%%     next_event(StateName, socket_closed, State, Actions);
 	{#ssl_tls{} = Record, State} ->
 	    {next_state, StateName, State, [{next_event, internal, {protocol_record, Record}} | Actions]};

 	{#alert{} = Alert, State} ->

{code}

@OTP-Maintainer
Copy link
Author

acaruth said:

Excellent, that works great as it preserves the ordering of the data messages and the socket closed message.

I've tested this before the patch and it failed 5/10 times (using the openssl s_server). With your new suggested change, I re-tested 20 times and there were no failures.

Thanks!

@OTP-Maintainer OTP-Maintainer added bug Issue is reported as a bug team:PS Assigned to OTP team PS priority:medium labels Feb 10, 2021
@OTP-Maintainer OTP-Maintainer added this to the OTP-20.2 milestone Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug priority:medium team:PS Assigned to OTP team PS
Projects
None yet
Development

No branches or pull requests

1 participant