tls_sender: avoid livelock timeout for distribution data #2703

Merged
merged 1 commit into from
Sep 14, 2020

max-au
Contributor

@max-au max-au commented Aug 3, 2020

Distribution over TLS may exhibit livelock-like behaviour when
there is a constant stream of distribution messages. Limit
TLS record to "distribution buffer busy limit" to avoid this
behaviour.

This commit also recovers ssl_dist_bench_SUITE accidentally
broken when adding "erl_epmd callback listen_port_please".
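The capping idea can be sketched roughly as follows. This is a minimal illustration, not the actual tls_sender code: it assumes pending distribution data is modelled as a plain list of binaries, and the module and function names are hypothetical. The sender gathers data only until a byte limit is reached, then stops so it can handle other events before continuing.

```erlang
-module(dist_batch_sketch).
-export([take_batch/2]).

%% Hypothetical helper, not part of OTP.
%% Takes binaries from the front of Queue until at least Limit bytes
%% have been gathered, then returns {Batch, Remaining}.
-spec take_batch([binary()], pos_integer()) -> {[binary()], [binary()]}.
take_batch(Queue, Limit) when is_integer(Limit), Limit > 0 ->
    take_batch(Queue, Limit, 0, []).

%% Keep taking while the gathered size is still below the limit.
take_batch([Bin | Rest], Limit, Size, Acc) when Size < Limit ->
    take_batch(Rest, Limit, Size + byte_size(Bin), [Bin | Acc]);
%% Limit reached or queue empty: hand back what was gathered.
take_batch(Rest, _Limit, _Size, Acc) ->
    {lists:reverse(Acc), Rest}.
```

With a constant stream of input the queue never drains, so without the limit a loop of this kind would never return; with it, each batch is bounded.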

@rickard-green rickard-green added the team:PS Assigned to OTP team PS label Aug 4, 2020
@IngelaAndin
Contributor

We think that this is a good PR. We have only one thing that we would like you to change: we would rather you use a constant, say ?MAX_PLAIN_TEXT_LENGTH * 16, instead of erlang:system_info(dist_buf_busy_limit). The reason is that the user can change this value and thereby defeat the intention of this code.

@IngelaAndin IngelaAndin added the testing currently being tested, tag is used by OTP internal CI label Aug 26, 2020
@IngelaAndin
Contributor

@max-au Looking good so far in the tests. It will be ready for merge once you address the comment above.

@max-au
Contributor Author

max-au commented Aug 28, 2020

Let me verify this does not lead to a performance degradation. We are using large distribution buffers of 16+ Mb, so the idea behind erlang:system_info(dist_buf_busy_limit) was to use at least the same size.

Quick run of ssl_dist_bench_SUITE reports no degradation, except for "Got 250 busy_dist_port msgs" instead of "Got 0 busy_dist_port msgs" when I use dist_buf_busy_limit.

@IngelaAndin
Contributor

@rickard-green do you have any comment on this? If the suggested constant would cause a problem, maybe we can find a better value or a new config parameter. But using dist_buf_busy_limit is problematic.

@IngelaAndin IngelaAndin added the stalled waiting for input by the Erlang/OTP team label Sep 1, 2020
@rickard-green
Contributor

If the Erlang application manages flow control over distribution channels itself, the dist_buf_busy_limit does not add anything of value, only potential problems. Today you can increase the dist_buf_busy_limit up to 2 GB, and it is not impossible that we will make it possible to disable the busy limit altogether. That is, in such scenarios it is problematic to base this limit on the dist_buf_busy_limit.

I'm not sure what value to use, but just using dist_buf_busy_limit as is, is not a good option. One option can be to use dist_buf_busy_limit up to a certain point (for example up to 100 MB), but never go beyond that point.
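The "use dist_buf_busy_limit up to a certain point" option could look roughly like the sketch below. It only illustrates the suggestion; the module name, function names and the 100 MB cap are assumptions taken from the example figure in the comment, not code from the PR.

```erlang
-module(dist_limit_sketch).
-export([chunk_limit/0, chunk_limit/2]).

%% Hypothetical cap, using the 100 MB figure mentioned as an example.
-define(CAP, (100 * 1024 * 1024)).

%% Follow the configured busy limit, but never exceed the cap.
chunk_limit() ->
    chunk_limit(erlang:system_info(dist_buf_busy_limit), ?CAP).

chunk_limit(BusyLimit, Cap) when is_integer(BusyLimit), is_integer(Cap) ->
    min(BusyLimit, Cap).
```

This way a user who raises the busy limit to gigabytes still gets a bounded chunk size.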

@max-au
Contributor Author

max-au commented Sep 4, 2020

I found that ?MAX_PLAIN_TEXT_LENGTH is 16384 bytes, and 16 times that is 256 Kb.
This value is quite small for long-distance networking (cross-continent latency is 200 ms on average).
Would it work if I start with some very large hardcoded number (e.g. 16 Mb)? Currently there is no limit at all (hence the livelock condition that times out connections).
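For reference, the sizes under discussion work out as in the sketch below. The macro value is the one quoted in the comment above; the module and function names are hypothetical and exist only for this illustration.

```erlang
-module(record_math_sketch).
-export([suggested_limit/0, records_in/1]).

%% Value quoted in the thread for the maximum TLS plaintext size.
-define(MAX_PLAIN_TEXT_LENGTH, 16384).

%% The reviewer's suggested constant: 16 full records = 262144 bytes.
suggested_limit() ->
    ?MAX_PLAIN_TEXT_LENGTH * 16.

%% Number of full-size records needed to carry Bytes (rounded up).
records_in(Bytes) when is_integer(Bytes), Bytes >= 0 ->
    (Bytes + ?MAX_PLAIN_TEXT_LENGTH - 1) div ?MAX_PLAIN_TEXT_LENGTH.
```

So the suggested constant covers 16 records, while a 16 * 1024 * 1024 byte limit covers 1024 of them.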

@IngelaAndin IngelaAndin removed the stalled waiting for input by the Erlang/OTP team label Sep 7, 2020
@IngelaAndin
Contributor

IngelaAndin commented Sep 7, 2020

@max-au I agree that a hard-coded limit would still be an improvement; however, as it feels hard to find a good default value, we would probably want it to be configurable. Maybe a distribution-specific ssl application environment variable that defaults to the big value so you do not have to set it. What do you think @RaimoNiskanen?
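A configurable limit with a large default could be read along these lines. This is only a sketch: the environment key dist_max_chunk_size is made up for illustration (no such ssl parameter is named in this thread), and the 16 * 1024 * 1024 default is the figure proposed above.

```erlang
-module(dist_env_sketch).
-export([max_chunk/0]).

-define(DEFAULT_MAX_CHUNK, (16 * 1024 * 1024)).

%% 'dist_max_chunk_size' is a hypothetical key, not a real ssl setting.
%% Falls back to the default when the key is unset or not a positive integer.
max_chunk() ->
    case application:get_env(ssl, dist_max_chunk_size, ?DEFAULT_MAX_CHUNK) of
        N when is_integer(N), N > 0 -> N;
        _ -> ?DEFAULT_MAX_CHUNK
    end.
```

Users who never set the key get the large default, which matches the "so you do not have to set it" intent.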

@RaimoNiskanen
Contributor

I'd guess that 16 MB would not be unreasonably large.

Do we actually want to get the send buffer size of the socket and use that value?
If so, is there a way to figure out that value and propagate it to the right place in the code?

@IngelaAndin IngelaAndin added the waiting waiting for changes/input from author label Sep 8, 2020
@IngelaAndin
Contributor

We need to decide soon if we want it to be part of 23.1.

Distribution over TLS may exhibit livelock-like behaviour when
there is a constant stream of distribution messages. Limit
TLS record to 16 Mb to avoid this behaviour.

This commit also recovers ssl_dist_bench_SUITE accidentally
broken when adding "erl_epmd callback listen_port_please".
@max-au
Contributor Author

max-au commented Sep 9, 2020

I believe that 16 Mb buffer can be a good starting point.

I attempted to use sndbuf value for an underlying socket, and realised that I do not know how to set it up. Is it expected that "-kernel inet_dist_listen_options [{sndbuf, 12345}]" does not affect TLS distribution sockets?

@IngelaAndin
Contributor

Looking at the code, I do believe we use those listen options. It could then be an option to use the sndbuf value if it is set and otherwise default to 16 MB. What do you think @RaimoNiskanen?
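The "sndbuf if set, otherwise 16 MB" option might be sketched as below. It reads the configured inet_dist_listen_options from the kernel application environment rather than querying a live socket; the module and function names are hypothetical.

```erlang
-module(dist_sndbuf_sketch).
-export([chunk_limit/0, chunk_limit/1]).

-define(DEFAULT_LIMIT, (16 * 1024 * 1024)).

%% Look up the configured distribution listen options, if any.
chunk_limit() ->
    Opts = application:get_env(kernel, inet_dist_listen_options, []),
    chunk_limit(Opts).

%% Use the configured sndbuf when it is a positive integer,
%% otherwise fall back to the default.
chunk_limit(Opts) when is_list(Opts) ->
    case proplists:get_value(sndbuf, Opts) of
        N when is_integer(N), N > 0 -> N;
        _ -> ?DEFAULT_LIMIT
    end.
```

Note that this reflects what was configured, which is not necessarily the buffer size the operating system actually applies to a given socket.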

@max-au
Contributor Author

max-au commented Sep 10, 2020

I attempted to check it at runtime, using this:

max-au$ erl -proto_dist inet_tls -sname me -ssl_dist_optfile ssl_dist_opt.conf -kernel inet_dist_listen_options '[{sndbuf, 11111}]'
1> application:get_env(kernel, inet_dist_listen_options).
{ok,[{sndbuf,11111}]}

(connect from another node)
4> S = sys:get_state(net_kernel).
{state,
       #{<0.104.0> => 'another@max-au-mbp'},
       #{<0.101.0> => 'another@max-au-mbp'},

5> Sender = list_to_pid("<0.101.0>").
6>  sys:get_state(Sender).
{connection,{data,{static,<0.102.0>,server,#Port<0.8>,

7> Port = list_to_port("#Port<0.8>").
8> inet:getopts(Port, [sndbuf]).
{ok,[{sndbuf,65328}]}

However I can clearly see that listening socket itself has this option set:

(me@dane-mbp)10> Ls = list_to_port("#Port<0.2>").
#Port<0.2>
(me@dane-mbp)11> inet:getopts(Ls, [sndbuf]).
{ok,[{sndbuf,11111}]}

So it appears that sndbuf is set only for the listener socket, but not inherited by the accepted socket. Is this intended?
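The same comparison can be made outside of distribution with plain gen_tcp on the loopback interface. The sketch below is an assumption-laden reproduction aid, not part of the PR: the module name is hypothetical, and the values returned depend on the operating system and OTP version, so no particular result is implied.

```erlang
-module(sndbuf_inherit_sketch).
-export([check/1]).

%% Opens a listen socket with the given sndbuf, accepts one loopback
%% connection, and returns {ListenSndbuf, AcceptedSndbuf} as reported
%% by inet:getopts/2.
check(SndBuf) when is_integer(SndBuf), SndBuf > 0 ->
    Loopback = {127, 0, 0, 1},
    {ok, L} = gen_tcp:listen(0, [binary, {active, false},
                                 {ip, Loopback}, {sndbuf, SndBuf}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect(Loopback, Port, [binary, {active, false}]),
    {ok, A} = gen_tcp:accept(L, 5000),
    {ok, [{sndbuf, ListenBuf}]} = inet:getopts(L, [sndbuf]),
    {ok, [{sndbuf, AcceptBuf}]} = inet:getopts(A, [sndbuf]),
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(A),
    ok = gen_tcp:close(L),
    {ListenBuf, AcceptBuf}.
```

Calling sndbuf_inherit_sketch:check(11111) and comparing the two numbers shows whether the accepted socket picked up the listener's setting on a given platform.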

@RaimoNiskanen
Contributor

Ouch!

I am pretty sure the intention was to change the send buffer for the traffic socket, i.e. the accept socket. There is not much point in having a big send buffer on a listening socket. Seems like a fumble nobody has noticed.

I can understand how it has happened. There is room for improvement here...

In inet_drv the current buffer size is inherited, i.e. the internal 'bufsz' option, but inheriting does not affect the kernel 'sndbuf' and 'rcvbuf' values.

But when you set the 'sndbuf' or 'rcvbuf' values, 'bufsz' is set to match...

Linux explicitly doubles the value you set with 'sndbuf' so a getsockopt returns about or at least twice what you set, so inheriting that seems strange, and sure enough 'sndbuf' and 'rcvbuf' are not on the list of options that prim_inet or inet_tcp copies from the listen to the accept socket.

So the behaviour is some programmer's (perhaps not explicit) intention, but not the intention of the one that set 'sndbuf' in inet_dist_listen_options. (We may have advised customers to do this, and they have merrily walked away.)

I guess reading 'rcvbuf' instead would kind of work since it seems to be inherited as-is. Then we probably have a performance bug in that 'sndbuf' and 'rcvbuf' do not match for an accepted socket, which we ought to fix.

Or we fix inheriting of 'sndbuf' and 'rcvbuf' for an accept socket and just ignore that it is doubled. It is just one doubling, after all, and Linux users can be informed about this peculiarity. Though, I browsed past worrying code for UDP that avoids setting a buffer larger than the MTU, so the combination of the kernel's doubling and that code needs an investigation...

Or, just use 16 MB, and we might have to refine that later.

@IngelaAndin
Contributor

Well, it looks like the best way forward is the "hard-code it now and improve later" strategy. We need to merge this on Monday to make it for 23.1.

@max-au
Contributor Author

max-au commented Sep 11, 2020

Or we fix inheriting of 'sndbuf' and 'rcvbuf' for an accept socket and just ignore that it is doubled.

We actually patch inet_drv to do this - but we halve sndbuf on Linux. I am not completely sure whether it can break anything else, so I haven't made a PR.

The current PR version hardcodes 16 Mb. Thanks for the productive conversation. I wonder what would be the best way to proceed with all the other discoveries (e.g. inability to set snd/rcv buf for TLS distribution, listen socket not being reopened on acceptor restart). Would it be appropriate to create tickets in JIRA or GitHub issues?

@IngelaAndin IngelaAndin added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI waiting waiting for changes/input from author labels Sep 12, 2020
@RaimoNiskanen
Contributor

Regarding the buffer size inheritance: I think it is a real problem, and we should put a fix for it in our daily builds to see if it has any ill effects. I think the right place is to put it in the list of options to inherit in tcp_inet, and to not compensate for Linux doubling.

We do not and have never used GitHub Issues (so far), so we should take remaining issues in Jira or GitHub PR:s.

@RaimoNiskanen RaimoNiskanen merged commit 22707e5 into erlang:maint Sep 14, 2020
@max-au max-au deleted the max-au/tls-sender-fix branch June 30, 2021 21:08