Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility with OpenSSL 3.2 #105

Closed
sacovo opened this issue Jan 13, 2024 · 11 comments
Closed

Compatibility with OpenSSL 3.2 #105

sacovo opened this issue Jan 13, 2024 · 11 comments
Labels
bug Something isn't working triggers release Major issue that when fixed, results in an "emergency" release

Comments

@sacovo
Copy link

sacovo commented Jan 13, 2024

Retrieving a website over ssl leads to an infinite loop when using openssl 3.2.0, when compiled with --debug a segfault occurs.

This is the code:

use "net"
use "http"

actor Main
  new create(env: Env) =>
    let client = NetworkClient(TCPConnectAuth(env.root), env.out)

    try
      let url = URL.build("https://echo.sacovo.ch")?
      client.get(url)
    end

class SimpleHandlerFactory is HandlerFactory
  let out: OutStream

  new create(out': OutStream) =>
    out = out'

  fun box apply(session: HTTPSession tag): HTTPHandler ref^ =>
    SimpleHandler(out)


class SimpleHandler is HTTPHandler
  let out: OutStream

  new create(out': OutStream) =>
    out = out'

  fun ref apply(payload: Payload val): None tag =>
    out.print("Status " + payload.status.string())


    try
      for line in payload.body()?.values() do
        out.print(line)
      end
    end

actor NetworkClient
  let auth: TCPConnectAuth
  let out: OutStream

  new create(auth': TCPConnectAuth, out': OutStream) =>
    auth = auth'
    out = out'

  be get(url: URL) =>
    let client = HTTPClient.create(auth)
    out.print("Sending get...")
    try
      let payload = Payload.request("GET", url)
      client.apply(consume payload, SimpleHandlerFactory(out))?
      out.print("Sent request")
    end

The content is retrieved and displayed, but the program doesn't exit.

Compiling with --debug leads to a segfautl with the follwoing call stack:

#74845 0x0000555555572bc5 in net_ssl_SSL_ref_read_Zo (this=0x7ffff75ea440, expect=0)
    at /home/sandro/work/fhnw/pony-ssl-test/_corral/github_com_ponylang_net_ssl/net_ssl/ssl.pony:184
#74846 0x0000555555572bc5 in net_ssl_SSL_ref_read_Zo (this=0x7ffff75ea440, expect=0)
    at /home/sandro/work/fhnw/pony-ssl-test/_corral/github_com_ponylang_net_ssl/net_ssl/ssl.pony:184
#74847 0x0000555555572bc5 in net_ssl_SSL_ref_read_Zo (this=0x7ffff75ea440, expect=0)
    at /home/sandro/work/fhnw/pony-ssl-test/_corral/github_com_ponylang_net_ssl/net_ssl/ssl.pony:184
#74848 0x0000555555572bc5 in net_ssl_SSL_ref_read_Zo (this=0x7ffff75ea440, expect=0)
    at /home/sandro/work/fhnw/pony-ssl-test/_corral/github_com_ponylang_net_ssl/net_ssl/ssl.pony:184
#74849 0x000055555556fd7f in net_ssl_SSLConnection_ref__poll_oo (this=0x7ffff75ea580, conn=0x7ffff761c600)
    at /home/sandro/work/fhnw/pony-ssl-test/_corral/github_com_ponylang_net_ssl/net_ssl/ssl_connection.pony:178
#74850 0x000055555556fea8 in net_ssl_SSLConnection_ref_received_ooZb (this=0x7ffff75ea580, conn=0x7ffff761c600, 
    data=0x7fffe5e06c00, times=1)
    at /home/sandro/work/fhnw/pony-ssl-test/_corral/github_com_ponylang_net_ssl/net_ssl/ssl_connection.pony:96
#74851 0x0000555555580e25 in net_TCPConnection_ref__pending_reads_o (this=0x7ffff761c600)
    at /home/sandro/.local/share/ponyup/ponyc-release-0.58.0-x86_64-linux-ubuntu22/packages/net/tcp_connection.pony:964
#74852 0x00005555555809b7 in net_TCPConnection_tag__event_notify_oIIo (this=0x7ffff761c600, event=0x7fffe5e19e00, 
    flags=1, arg=0)
    at /home/sandro/.local/share/ponyup/ponyc-release-0.58.0-x86_64-linux-ubuntu22/packages/net/tcp_connection.pony:664
#74853 0x0000555555566219 in net_TCPConnection_Dispatch ()
#74854 0x0000555555589302 in ponyint_actor_run ()
#74855 0x000055555559373f in run_thread ()
#74856 0x00007ffff76aa9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#74857 0x00007ffff772e7cc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

So there seems to be an endless recursion when checking if there are more bytes to read from ssl.

# ponyc -v
0.58.0-a161b7c [release]
Compiled with: LLVM 15.0.7 -- Clang-14.0.0-x86_64
Defaults: pic=true

# openssl version
OpenSSL 3.2.0 23 Nov 2023 (Library: OpenSSL 3.2.0 23 Nov 2023)

Downgrading to 3.1.4 fixes this issue.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Jan 13, 2024
@SeanTAllen SeanTAllen added bug Something isn't working help wanted Extra attention is needed triggers release Major issue that when fixed, results in an "emergency" release needs investigation This needs to be looked into before it's "ready for work" labels Jan 13, 2024
@sacovo
Copy link
Author

sacovo commented Jan 13, 2024

I tried to find the commit that introduced the issue with git bisect, and it seems to be openssl/openssl@4030869

@sacovo
Copy link
Author

sacovo commented Jan 14, 2024

Not sure why this commit introduced the bug, but I could fix my issue by returning in SSL.read if @SSL_get_error(_ssl, r) is 2, which according to the openssl docs stands for a closed connection and no more data to read.

      if r <= 0 then
        match @SSL_get_error(_ssl, r)
        | 1 | 5 => _state = SSLError
        | 2 =>
          // SSL buffer has more data but it is not yet decoded (or something)
          _read_buf.truncate(offset)
          return None
        | 6 =>
          // SSL_ERROR_ZERO_RETURN, peer has closed connection, no more data to read
          return None
        end

Maybe other errors should also be handled in some way? _state is never checked in reading at the moment.

@SeanTAllen
Copy link
Member

Thanks @sacovo. I'll take a look in the next couple days and get some CI setup with the OpenSSL 3.2 environment I set up yesterday. I'll get it failing and then I'll take the excellent work you've done and get a fix out. Thanks for all the triaging. You've been awesome.

@SeanTAllen
Copy link
Member

@sacovo

_state is never checked in reading at the moment.

It is checked in ssl_connection.pony though in _poll which all the various methods including reading should be triggering. I think the bug is that

| 1 | 5 | 6 => _state = SSLError

in addition to setting the state, should also do return None.

Can you try:

        match @SSL_get_error(_ssl, r)
        | 1 | 5 | 6 =>
          _state = SSLError
          return None
        | 2 =>
          // SSL buffer has more data but it is not yet decoded (or something)
          _read_buf.truncate(offset)
          return None
        end

`

@sacovo
Copy link
Author

sacovo commented Jan 14, 2024

Yes, this works in fixing the issue.

I still think there is something wrong with the @SSL_has_pending call, for some reason it keeps returning 1, even after the response is finished and @SSL_get_error returns SSL_ERROR_ZERO_RETURN. This starts with the mentioned commit in openssl, before that the call returns 0.

@SeanTAllen
Copy link
Member

SeanTAllen commented Jan 14, 2024

@sacovo do you feel up to opening an issue with the OpenSSL folks and I can make that change (which I think is correct regardless) in our Pony code?

An interesting note from their manpage:

SSL_has_pending() returns 1 if s has buffered data (whether processed or unprocessed) and 0 otherwise. Note that it is possible for SSL_has_pending() to return 1, and then a subsequent call to SSL_read_ex() or SSL_read() to return no data because the unprocessed buffered data when processed yielded no application data (for example this can happen during renegotiation). It is also possible in this scenario for SSL_has_pending() to continue to return 1 even after an SSL_read_ex() or SSL_read() call because the buffered and unprocessed data is not yet processable (e.g. because OpenSSL has only received a partial record so far).

@sacovo
Copy link
Author

sacovo commented Jan 14, 2024

Yes, implementing that change seems reasonable and I'll open an issue in the openssl repo.

The quote from the manpage is interesting. I think it doesn't apply to the situation here, since there really is no more data to process and nothing should be in the buffer.

@SeanTAllen
Copy link
Member

The quote from the manpage is interesting. I think it doesn't apply to the situation here, since there really is no more data to process and nothing should be in the buffer.

I'm wondering if they introduced a bug where it thinks there is data. Something along the lines of an "off by one" or equivalent when they switched how they are determining it in the commit you referenced.

@SeanTAllen
Copy link
Member

Note to future humans. Given this regression requires ponylang/http which depends on this library, I'm going to make the regression test etc on ponylang/http rather than here.

SeanTAllen added a commit to ponylang/http that referenced this issue Jan 14, 2024
Prior to this commit, we were only testing against a single SSL library
on Linux plus also a version on Windows.

We have seen that with http related code, we can encounter errors on
different SSL versions. See ponylang/net_ssl#105.
To adjust to this, with this test, we are adding testing against all our
main supported SSL versions on Linux.

This does not change our MacOS (not currently done) or our Windows testing
matrix for ponylang/http.
SeanTAllen added a commit to ponylang/http that referenced this issue Jan 14, 2024
Prior to this commit, we were only testing against a single SSL library
on Linux plus also a version on Windows.

We have seen that with http related code, we can encounter errors on
different SSL versions. See ponylang/net_ssl#105.
To adjust to this, with this test, we are adding testing against all our
main supported SSL versions on Linux.

This does not change our MacOS (not currently done) or our Windows testing
matrix for ponylang/http.
SeanTAllen added a commit to ponylang/http that referenced this issue Jan 14, 2024
Prior to this commit, we were only testing against a single SSL library
on Linux plus also a version on Windows.

We have seen that with http related code, we can encounter errors on
different SSL versions. See ponylang/net_ssl#105.
To adjust to this, with this test, we are adding testing against all our
main supported SSL versions on Linux.

This does not change our MacOS (not currently done) or our Windows testing
matrix for ponylang/http.
SeanTAllen added a commit to ponylang/http that referenced this issue Jan 14, 2024
This switches us to testing against OpenSSL 3.2.0 from OpenSSL 3.1.3.

3.1.3 seems to work fine for us, but we've identified a bug with
OpenSSL 3.2.0. See ponylang/net_ssl#105 for
more information on the bug.
SeanTAllen added a commit to ponylang/http that referenced this issue Jan 14, 2024
This switches us to testing against OpenSSL 3.2.0 from OpenSSL 3.1.3.

3.1.3 seems to work fine for us, but we've identified a bug with
OpenSSL 3.2.0. See ponylang/net_ssl#105 for
more information on the bug.
@SeanTAllen
Copy link
Member

I have a working regression test in ponylang/http that catches this problem.

https://github.com/ponylang/http/actions/runs/7521779093/job/20473011093?pr=105

@SeanTAllen
Copy link
Member

Fix is verified in CI.

https://github.com/ponylang/http/actions/runs/7521816087/job/20473093908?pr=105

I'm going to finish up #107 then merge it and do a release then update the HTTP PR to use the new net_ssl and do an HTTP release.

Thank you so much for your assistance with this issue @sacovo. I did the final few feet to get releases out and regression testing in place, but none of this would have happened without you. That was some awesome debugging. You rock.

SeanTAllen added a commit that referenced this issue Jan 14, 2024
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Jan 14, 2024
@SeanTAllen SeanTAllen removed help wanted Extra attention is needed needs investigation This needs to be looked into before it's "ready for work" labels Jan 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triggers release Major issue that when fixed, results in an "emergency" release
Projects
None yet
Development

No branches or pull requests

3 participants