Distributed Erlang over TLS does not work correctly using OTP 26 #7497

Closed
lukebakken opened this issue Jul 13, 2023 · 10 comments
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS

Comments

@lukebakken
Contributor

lukebakken commented Jul 13, 2023

Describe the bug

When using OTP 26 (26.0.2 to be precise) with -proto_dist inet_tls -ssl_dist_optfile ..., two Erlang nodes can't connect via distributed Erlang.

Using the latest version of OTP 25 works correctly.

To Reproduce

Reproduction steps that show SUCCESS

Note: clone my repository and change to the main branch to show success:

https://github.com/lukebakken/erlang-otp-7497#reproduction-steps

Reproduction steps that show FAILURE

Note: clone my repository and change to the otp-26 branch to show failure:

https://github.com/lukebakken/erlang-otp-7497/tree/otp-26#reproduction-steps

The main difference between the two branches is the Erlang version specified in the .tool-versions file.

Expected behavior

This code succeeds:

net_kernel:connect('a@HOSTNAME').

Affected versions

I am testing with OTP 26.0.2, which demonstrates the failure. OTP 25.3.2.3 succeeds.

@lukebakken lukebakken added the bug Issue is reported as a bug label Jul 13, 2023
@u3s u3s added the team:PS Assigned to OTP team PS label Jul 14, 2023
@u3s
Contributor

u3s commented Jul 14, 2023

  • are you using the same OTP version on both nodes?
  • what are you configuring in ssl_dist_optfile?
  • is there some error message, stack trace or some other symptoms of failure?

@lukebakken
Contributor Author

lukebakken commented Jul 14, 2023

@u3s I provided comprehensive reproduction steps in this git repository, in two branches:

This succeeds: https://github.com/lukebakken/erlang-otp-7497#reproduction-steps

This fails: https://github.com/lukebakken/erlang-otp-7497/tree/otp-26#reproduction-steps

If you have time, I'd appreciate someone else confirming what I have found. This issue was originally discovered when Team RabbitMQ and Docker tried to upgrade to OTP 26:

docker-library/rabbitmq#652

There is a test that uses TLS-enabled distributed Erlang that used to succeed with OTP 25 and then started to fail with 26. I traced it down to this issue.

> are you using the same OTP version on both nodes?

Yes, they are the same. Please check out the reproduction steps.

> what are you configuring in ssl_dist_optfile?

This is the template that is turned into the optfile. It is identical for the working version and the one that fails. The x509 certs are identical as well.

https://github.com/lukebakken/erlang-otp-7497/blob/main/inet-dist-tls.config.in

> is there some error message, stack trace or some other symptoms of failure?

No, there is not. It would be great to have some sort of debug logging available for distributed Erlang but I am not aware of it.

@RaimoNiskanen
Contributor

Try running net_kernel:verbose(1). on both nodes before connecting. Level 2 is also possible, but start with 1.

@lukebakken
Contributor Author

@RaimoNiskanen on it!

@lukebakken
Contributor Author

Here is the transcript of a successful run using OTP 25.3.2.3 and net_kernel:verbose(2):

Node a

lbakken@shostakovich ~/development/lukebakken/erlang-otp-7497 (main *=)
$ ./run-node.sh a
elixir          1.15.2-otp-26   /home/lbakken/.tool-versions
erlang          25.3.2.3        /home/lbakken/development/lukebakken/erlang-otp-7497/.tool-versions
python          3.11.4          /home/lbakken/.tool-versions
ruby            3.2.2           /home/lbakken/.tool-versions
Erlang/OTP 25 [erts-13.2.2.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

Eshell V13.2.2.2  (abort with ^G)
(a@shostakovich)1> =INFO REPORT==== 14-Jul-2023::05:43:07.749556 ===
{net_kernel,{'EXIT',<0.109.0>,shutdown}}
=INFO REPORT==== 14-Jul-2023::05:43:07.752467 ===
{net_kernel,{net_kernel,1473,nodedown,b@shostakovich}}
=INFO REPORT==== 14-Jul-2023::05:43:07.752525 ===
{net_kernel,{'EXIT',<0.112.0>,shutdown}}

(a@shostakovich)1> init:stop().
ok

Node b

lbakken@shostakovich ~/development/lukebakken/erlang-otp-7497 (main *=)
$ ./run-node.sh b
elixir          1.15.2-otp-26   /home/lbakken/.tool-versions
erlang          25.3.2.3        /home/lbakken/development/lukebakken/erlang-otp-7497/.tool-versions
python          3.11.4          /home/lbakken/.tool-versions
ruby            3.2.2           /home/lbakken/.tool-versions
Erlang/OTP 25 [erts-13.2.2.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

=INFO REPORT==== 14-Jul-2023::05:42:55.575656 ===
{net_kernel,{connect,normal,a@shostakovich}}
Eshell V13.2.2.2  (abort with ^G)
(b@shostakovich)1> ['a@shostakovich']

(b@shostakovich)1> init:stop().
ok

@lukebakken
Contributor Author

lukebakken commented Jul 14, 2023

AHA @RaimoNiskanen using net_kernel:verbose(2) reveals the error:

=INFO REPORT==== 14-Jul-2023::07:55:51.621871 ===
{net_kernel,
    {conn_own_exit,<0.103.0>,
        {ssl_connect_failed,
            {192,168,1,5},
            36843,
            {error,{option,server_only,fail_if_no_peer_cert}}},
        a@shostakovich}}

I did notice this while working on the original issue (docker-library/official-images@7335ae3) but it didn't make sense because there was no useful error logged or returned, and that setting worked just fine with OTP 25.

Before I close this issue I'm going to re-check the OTP 26 READMEs and release announcements to see where this change was (or was not) announced.

@lukebakken
Contributor Author

@RaimoNiskanen in my opinion, the change wasn't announced. I see a brief mention of "Improved error checking and handling of ssl options" here:

https://www.erlang.org/news/164#ssl

...and one mention of fail_if_no_peer_cert here:

https://erlang.org/download/otp_src_26.0.readme

However, isn't fail_if_no_peer_cert a valid client option? Or was it seen as redundant when verify is taken into account?

@IngelaAndin
Contributor

No, it is not a valid client option and never has been. It is documented as a server option, but was previously ignored by the client!
https://www.erlang.org/doc/man/ssl.html
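
For readers hitting the same error, here is a minimal sketch of an ssl_dist_optfile that keeps fail_if_no_peer_cert in the server section only. The certificate paths are hypothetical placeholders, and this is not the template from the linked repository; it only illustrates where the option belongs.

```erlang
%% Sketch of an ssl_dist_optfile (a single Erlang term ending with a period).
%% All file paths below are hypothetical placeholders.
[{server,
  [{cacertfile, "/path/to/ca_certificate.pem"},
   {certfile,   "/path/to/server_certificate.pem"},
   {keyfile,    "/path/to/server_key.pem"},
   {verify, verify_peer},
   %% Server-only option: require the connecting node to present a certificate.
   {fail_if_no_peer_cert, true}]},
 {client,
  [{cacertfile, "/path/to/ca_certificate.pem"},
   {certfile,   "/path/to/client_certificate.pem"},
   {keyfile,    "/path/to/client_key.pem"},
   %% No fail_if_no_peer_cert here: OTP 26 rejects it for the client side.
   {verify, verify_peer}]}].
```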

@lukebakken
Contributor Author

OK! I'll make sure all of the RabbitMQ docs reflect this. Thanks!

@lukebakken lukebakken closed this as not planned Jul 14, 2023
lukebakken added a commit to rabbitmq/rabbitmq-website that referenced this issue Jul 14, 2023
OTP 26 no longer ignores `fail_if_no_peer_cert` for a `client` setting.

See the following issue:

erlang/otp#7497
lukebakken added a commit to rabbitmq/cluster-operator that referenced this issue Jul 16, 2023
OTP 26 no longer ignores `fail_if_no_peer_cert` for a `client` setting.
Instead, distributed Erlang fails without informative error messages.

See the following issues:

* erlang/otp#7497
* rabbitmq/rabbitmq-website#1687
lukebakken added a commit to rabbitmq/cluster-operator that referenced this issue Jul 17, 2023
OTP 26 no longer ignores `fail_if_no_peer_cert` for a `client` setting.
Instead, distributed Erlang fails without informative error messages.

See the following issues:

* erlang/otp#7497
* rabbitmq/rabbitmq-website#1687
lukebakken added a commit to rabbitmq/cluster-operator that referenced this issue Jul 17, 2023
OTP 26 no longer ignores `fail_if_no_peer_cert` for a `client` setting.
Instead, distributed Erlang fails without informative error messages.

See the following issues:

* erlang/otp#7497
* rabbitmq/rabbitmq-website#1687

`customize_hostname_check` is client only
@lukebakken
Contributor Author

FYI, I suspect this change will surprise people as they upgrade Erlang - https://groups.google.com/g/rabbitmq-users/c/-Yqb45pOYfc

We have updated the RabbitMQ docs and examples.
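
For anyone auditing their own configuration before upgrading, the check can be roughly sketched as below. The module name, function name, and the SERVER_ONLY list are assumptions made for illustration: the list contains only the option confirmed in this thread and is not exhaustive. The sketch also assumes the optfile holds a single term with server and client sections.

```erlang
-module(check_dist_optfile).
-export([client_server_only_opts/1]).

%% Options known from this thread to be rejected on the client side in OTP 26.
%% Hypothetical, non-exhaustive list; extend it after checking the ssl docs.
-define(SERVER_ONLY, [fail_if_no_peer_cert]).

%% Returns the server-only options found in the client section of the
%% optfile at Path. An empty list means none were found.
client_server_only_opts(Path) ->
    %% file:consult/1 returns every term in the file; an optfile has one.
    {ok, [Config]} = file:consult(Path),
    ClientOpts = proplists:get_value(client, Config, []),
    [Opt || Opt <- ?SERVER_ONLY, proplists:is_defined(Opt, ClientOpts)].
```

Calling check_dist_optfile:client_server_only_opts/1 with the path to an optfile returns the offending option names, so a non-empty result suggests the file needs editing before moving to OTP 26.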

pstack2021 pushed a commit to rabbitmq/rabbitmq-website that referenced this issue Jul 18, 2023
OTP 26 no longer ignores `fail_if_no_peer_cert` for a `client` setting.

See the following issue:

erlang/otp#7497