Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post 22.1 upgrade lighttpd periodically issues SSL errors #5516

Closed
2 tasks done
shonjir opened this issue Jan 28, 2022 · 47 comments
Closed
2 tasks done

Post 22.1 upgrade lighttpd periodically issues SSL errors #5516

shonjir opened this issue Jan 28, 2022 · 47 comments
Assignees
Labels
upstream Third party issue
Milestone

Comments

@shonjir
Copy link
Contributor

shonjir commented Jan 28, 2022

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

Following update to 22.1 lighttpd periodically reports SSL invalid argument errors. Based on timing it seems this happens when lighttpd reloads. See screenshots for details.

To Reproduce

Nothing to do to reproduce.

Expected behavior

A clear and concise description of what you expected to happen.

Describe alternatives you considered

A clear and concise description of any alternative solutions or workaround you considered.

Screenshots

image

Relevant log files

If applicable, information from log files supporting your claim.

Additional context

Add any other context about the problem here.

Environment

Software version used and hardware type if relevant, e.g.:
OPNsense 22.1-amd64
FreeBSD 13.0-STABLE
OpenSSL 1.1.1m 14 Dec 2021

@fichtner
Copy link
Member

Cc @gstrauss

@fichtner fichtner added the support Community support label Jan 28, 2022
@gstrauss
Copy link

What was the previous version of lighttpd? What is the current version of lighttpd in (opnsense 22.1)? (Looks like lighttpd 1.4.63 in the log above). What was the previous version of openssl?

Note: There were no direct changes to lighttpd src/mod_openssl.c between lighttpd 1.4.60 and lighttpd 1.4.63.

Following update to 22.1 lighttpd periodically reports SSL invalid argument errors. Based on timing it seems this happens when lighttpd reloads. See screenshots for details.

How is lighttpd being reloaded? Gracefully (SIGUSR1)? Or not gracefully (SIGINT, SIGTERM)? What signal is being sent to lighttpd? If lighttpd is not being stopped gracefully, I am not surprised that openssl might complain.

@shonjir
Copy link
Contributor Author

shonjir commented Jan 28, 2022

@fichtner, my apologies - I failed to specify the pre-upgrade condition. I upgraded from opnsense 21.7.8 to 22.1 by booting from USB stick. During the upgrade I converted the filesystem from UFS to ZFS but I don't know if that is relevant.

@gstrauss
Copy link

@shonjir if you'd like my assistance, please try to answer my questions. Thank you.

@shonjir
Copy link
Contributor Author

shonjir commented Jan 29, 2022

@gstrauss I was trying to do so, my apologies to you if I've offended somehow.

I did not restart or reload lighttpd. The timestamps of the SSL errors are 00:01:19, 00:01:21 and 06:01:19 and 06:01:22, six hours apart. Re-review of the logs just now shows an additional 4 events at 12:01:20 and 12:01:22, so whatever is happening is happening every 6 hours. That's strongly suggestive of a cron job or other periodic task.

I had to look through my splunk logs for the version info you requested. Up until the upgrade to 21.7.8 at 01-27 @ 10am I was I was running lighttpd 1.4.61 and openssl 1.1.1l_1,1. Opnsense 21.7.8 installed lighttpd 1.4.63 and openssl 1.1.1m_1,1 which didn't change for opnsense 22.1 at around 01-27 @ 8:30pm.

Looking through my splunk logs I think I found the trigger for the SSL errors. I'm running the opnsense app and TA in splunk which collects system information via the web API every 6 hours, and the timestamps for this collection match the SSL error almost exactly. I went back a month in the logs and there were no instances of the SSL error message before upgrading to 22.1. I was running 21.7.8 for about 10 hours before this but there were no similar SSL errors on 21.7.8 despite running the same versions of lighttpd and openssl.

Relevant logs from Splunk.

_time | _raw
2022-01-28 12:01:22 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-28 12:01:22 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "998KiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-28 12:01:22 | <27>Jan 28 12:01:22 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 12:01:22 | <27>Jan 28 12:01:22 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 12:01:20 | <27>Jan 28 12:01:20 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 12:01:20 | <27>Jan 28 12:01:20 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 06:01:22 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-28 06:01:22 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "998KiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-28 06:01:22 | <27>Jan 28 06:01:22 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 06:01:22 | <27>Jan 28 06:01:22 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 06:01:19 | <27>Jan 28 06:01:19 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 06:01:19 | <27>Jan 28 06:01:19 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 00:01:21 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-28 00:01:21 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "998KiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-28 00:01:21 | <27>Jan 28 00:01:21 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 00:01:21 | <27>Jan 28 00:01:21 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 00:01:19 | <27>Jan 28 00:01:19 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-28 00:01:19 | <27>Jan 28 00:01:19 opnsense.home.arpa lighttpd[23304]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
2022-01-27 18:01:25 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-27 18:01:25 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.22MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-27 12:01:24 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-27 12:01:24 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.22MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-27 06:01:23 | {"name": "openssl", "version": "1.1.1l_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-27 06:01:23 | {"name": "lighttpd", "version": "1.4.61", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.21MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-27 00:01:24 | {"name": "openssl", "version": "1.1.1l_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-27 00:01:24 | {"name": "lighttpd", "version": "1.4.61", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.21MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-26 18:01:25 | {"name": "openssl", "version": "1.1.1l_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-26 18:01:25 | {"name": "lighttpd", "version": "1.4.61", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.21MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-26 12:01:24 | {"name": "openssl", "version": "1.1.1l_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-26 12:01:24 | {"name": "lighttpd", "version": "1.4.61", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.21MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}

@gstrauss
Copy link

Thank you for providing additional information. "I upgraded from opnsense 21.7.8 to 22.1" was not sufficient, which is why I asked for more information.

Looking through my splunk logs I think I found the trigger for the SSL errors. I'm running the opnsense app and TA in splunk which collects system information via the web API every 6 hours, and the timestamps for this collection match the SSL error almost exactly. I went back a month in the logs and there were no instances of the SSL error message before upgrading to 22.1. I was running 21.7.8 for about 10 hours before this but there were no similar SSL errors on 21.7.8 despite running the same versions of lighttpd and openssl.

Since that is more than 6 hours, was the "TA in splunk which collects system information via the web AI every 6 hours" running while you were on 21.7.8? If so, and since there was not an error in the lighttpd error log, perhaps the TA in splunk is using a client or a library which changed between 21.7.8 and 22.1.

Since the openssl library version did not change in your upgrade to 22.1, and, as I noted above, since there were no direct changes to lighttpd src/mod_openssl.c between lighttpd 1.4.60 and lighttpd 1.4.63, this is probably not a new issue with lighttpd, if it even is an issue with lighttpd. (FYI: You could confirm this if you were to temporarily run an older version of lighttpd on your new system, to see if the error trace occurs.)

Most importantly, you have identified that there is one specific client which is triggering the trace in lighttpd, so I would suggest looking more closely at that job.

@shonjir
Copy link
Contributor Author

shonjir commented Jan 29, 2022

Since that is more than 6 hours, was the "TA in splunk which collects system information via the web AI every 6 hours" running while you were on 21.7.8?

Correct, the splunk process was running while opnsense was on 21.7.8. The following logs are from that period of time.

2022-01-27 18:01:25 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-27 18:01:25 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.22MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}
2022-01-27 12:01:24 | {"name": "openssl", "version": "1.1.1m_1,1", "comment": "TLSv1.3 capable SSL and crypto library", "flatsize": "13.6MiB", "locked": "N/A", "automatic": "1", "license": "OpenSSL", "repository": "OPNsense", "origin": "security/openssl", "provided": "1", "installed": "1", "path": "OPNsense/security/openssl", "configured": "1", "collection_type": "package"}
2022-01-27 12:01:24 | {"name": "lighttpd", "version": "1.4.63", "comment": "Secure, fast, compliant, and flexible Web Server", "flatsize": "1.22MiB", "locked": "N/A", "automatic": "1", "license": "BSD3CLAUSE", "repository": "OPNsense", "origin": "www/lighttpd", "provided": "1", "installed": "1", "path": "OPNsense/www/lighttpd", "configured": "1", "collection_type": "package"}

If so, and since there was not an error in the lighttpd error log, perhaps the TA in splunk is using a client or a library which changed between 21.7.8 and 22.1.

No, the TA did not change during this time. It was last updated on 12/17/21 and has been running without error since then.

this is probably not a new issue with lighttpd, if it even is an issue with lighttpd

That makes sense. Another possibility is that it may not be an issue with lighttpd (or openssl) at all but something with the way lighttpd (or openssl) operates (or builds) on fbsd13.

You could confirm this if you were to temporarily run an older version of lighttpd on your new system, to see if the error trace occurs.

I'm more than willing to try! However, going back a month I can't find any trace of this error in the logs until fbsd13 / opnsense 22.1 was introduced.

you have identified that there is one specific client which is triggering the trace in lighttpd, so I would suggest looking more closely at that job.

That certainly makes sense. The TA doesn't do anything special other than initiating a bog-standard TLS connection to opnsense. The only thing that comes to mind is that on my opnsense installation the web UI runs with a locally generated web server certificate issued by a locally generated CA rather than the default self-signed certificate.

I tried to replicate the issue with curl but even authenticating to the API using the API key I'm not able to reproduce what the TA is triggering using either of the endpoints the TA queries.

curl -v --cacert cacert.crt -u "$key":"$secret" -XGET https://$opnsense/api/core/firmware/status
curl -v --cacert cacert.crt -u "$key":"$secret" -XGET https://$opnsense/api/core/firmware/info

Based on the error message (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument I did some digging into lighttpd source. Line 3415 is in mod_ssl_close_notify(). Error -1 is the return code from SSL_shutdown, and Error 5 is SSL_ERROR_SYSCALL.

This message is emitted when ERR_get_error() returns 0 (no error found) yet errno is nonzero, and errno is neither EPIPE nor ECONNRESET. The code contains a comment that refers to an ssl bug and refers to lighttpd ticket #2213 which claims the issue was fixed in 1.4.27 but clearly it wasn't fixed completely.

The ticket seems to suggest they believe this to be an upstream openssl bug, that in this case may be triggered by some change in semantics between fbsd 12 and 13.

Unfortunately lighttpd doesn't return the value of errno which might help track this down further.

Correction: perror() is translating errno into Invalid Argument, so whatever bug is being triggered is probably the result of new or improved input validation on the affected syscall in fbsd13.

@gstrauss
Copy link

gstrauss commented Jan 29, 2022

Unfortunately lighttpd doesn't return the value of errno which might help track this down further.

"Invalid argument" in the error message is the value from strerror(errno) so errno is EINVAL (IIRC, that is 22 on fbsd).

You might truss lighttpd when triggering the TA to try to find out which syscall is returning EINVAL underneath the lighttpd call to openssl SSL_shutdown(). It's probaby send() or similar, but what args are being passed that the error is EINVAL?

Another possibility is that it may not be an issue with lighttpd (or openssl) at all but something with the way lighttpd (or openssl) operates (or builds) on fbsd13.

Hopefully a truss might give a clue. Looking back in the trace for operations on the socket might help to determine the state that the socket is in when the error occurs. (Also, any firewall config changes between opnsense 21.7.8 and 22.1?)

Then again, it is possible that there is something that can be improved in how lighttpd attempts to send the TLS close notify alert.
Here's someone else's rant on how many clients and servers do not implement the TLS close notify alert correctly. (lighttpd does implement TLS close notify alert, though with a timeout.)
http://garrett.damore.org/2017/11/tls-close-notify-what-were-they-thinking.html

@gstrauss
Copy link

BTW, lighttpd supports multiple TLS libraries, so in addition to mod_openssl, you might test with one of mod_gnutls, mod_mbedtls, mod_nss, or mod_wolfssl. Another angle might be to force TLSv1.3 in lighttpd.conf TLS config and see if the problem occurs, or force TLSv1.2 and see if the problem occurs.

@shonjir
Copy link
Contributor Author

shonjir commented Jan 29, 2022

Hopefully a truss might give a clue.

I'll look into this. I was thinking that it's likely the client isn't properly terminating the session so I'm not sure how much new information it'll provide at this point. It looks like the client is using the ssl library from a version of python3.7 included with splunk which may or may not properly implement the connection close.

Also, any firewall config changes between opnsense 21.7.8 and 22.1?

I did make one change but nothing that would affect this. There was an alias that had an incorrect fqdn that I corrected.

lighttpd supports multiple TLS libraries

Is this something I can test with a configuration change? I can change the config easily enough, but rebuilding the package is outside the scope of what I'm able to handle at this point.

Looks like the relevant config is here and has been this way for a year. Looks like there is a "MaxProtocol" option that I can set for force 1.2 only. I'll try to work on that over the weekend.

ssl.openssl.ssl-conf-cmd = (
    "MinProtocol" => "TLSv1.2",
    "Options" => "-ServerPreference",
    "CipherString" => "EECDH+AESGCM:AES256+EECDH:CHACHA20:!SHA1:!SHA256:!SHA384"
)

Other than the OS change there's really nothing I can put my finger on that would result in this error.

@gstrauss
Copy link

Other than the OS change there's really nothing I can put my finger on that would result in this error.

It's great that you identified the source of the issue and can reproduce it. That's big help.

I agree with you that there is no obvious culprit which causes the error trace, which is why I have suggested a few different coarse substitutions to see any substitution can help to narrow down where the issue is. I don't have the system components that you have described, so I can not iterate on this myself. (Also, I do not know if opnsense packages lighttpd with multiple lighttpd TLS modules pre-built or not.)

Yes, "MaxProtocol" => "TLSv1.2" will not permit TLSv1.3, but please remember to remove that after testing. Separately, "MinProtocol" => "TLSv1.3" will force TLSv1.3. Again, that should be for testing only unless you are certain that your environment does not contain any TLSv1.2 clients.

I think that capturing a truss demonstrating the issue is the next step so that we can both review it.

FYI -- and I am not suggesting that you do this unless inspired -- you can build lighttpd on FreeBSD following instructions at lighttpd install from source. You can build and run a test instance of lighttpd locally (without installing it) by shutting down the instance run by opnsense and then temporarily running the test build of lighttpd.

@fichtner
Copy link
Member

Sorry for the weekend silence. Just some data points:

opnsense 21.7.7 hardenedbsd 12.1 lighttpd 1.4.61 openssl 1.1.1l
opnsense 21.7.8 hardenedbsd 12.1 lighttpd 1.4.63 openssl 1.1.1m
opnsense 22.1 freebsd 13-stable lighttpd 1.4.63 openssl 1.1.1m

The build itself didn't change any parameters through FreeBSD ports. From what I would expect the issue is also in 21.7.8 with the same lighttpd/openssl combination but I have no real world setup to verify this at the moment having upgraded everything to 22.1 already.

Cheers,
Franco

@shonjir
Copy link
Contributor Author

shonjir commented Jan 30, 2022

It took me a while but I automated trace collection and obtained three different truss traces for this. I hope it helps.
lighttpd.trace.20220130-000000.txt
lighttpd.trace.20220130-060000.txt
lighttpd.trace.20220130-120000.txt

@kulikov-a
Copy link
Member

I would expect the issue is also in 21.7.8 with the same lighttpd/openssl combination

I have both.
same errors on 22.1
no errors on 21.7.8.
(if it may help)

@gstrauss
Copy link

Looks like the truss showing that openssl is reading EOF from the socket. (The socket is nonblocking and did not get EAGAIN or EWOULDBLOCK.) ...I wonder if the client forcing the socket closed without attempting TLS close notify to gracefully close the TLS connection?

Then when lighttpd asks openssl library to send close notify, sendmsg is returning EINVAL (?!)

23304: 81.903357153 0.000014543 read(11,0x800ae3e03,5) = 0 (0x0)
23304: 81.903468292 0.000012481 sendmsg(11,{NULL,0,[{"\^A\0",2}],1,{{level=IPPROTO_TCP,type=1,data={0x15}}},17,0},0) ERR#22 'Invalid argument'

...I suppose that I could have lighttpd treat EINVAL like lighttpd treats EPIPE and ECONNRESET when attempting to send TLS close notify, but doing so feels wrong. I wonder why EINVAL is being returned from sendmsg?

@kulikov-a would you be able to capture a truss of a successful request on 21.7.8 (which fails on 22.1) so that we can compare?

@gstrauss
Copy link

sendmsg returning EINVAL is suspicious. @shonjir or @kulikov-a, would you try running lighttpd under valgrind? I think trying to rule out memory corruption would be a good next step for a sanity check.

@kulikov-a
Copy link
Member

kulikov-a commented Jan 31, 2022

trussing same action (refresh gui logs page) on 21.7.8 and 22.1
21.7.8_lighttpd_truss.log
22.1_lighttpd_truss.log

@gstrauss
Copy link

@kulikov-a The 22.1 log shows use of sendmsg, whereas the 21.7.8 log does not. To double and triple confirm, are you running the same version of lighttpd and the same version of openssl libraries on both opnsense versions?

@kulikov-a
Copy link
Member

@gstrauss yep. OpenSSL 1.1.1m, 1.4.63
@shonjir do you have acme client plugin installed (uses lighttpd too)?
i dont see this errors after removing acme plugin and reboot (yet?)

@fichtner
Copy link
Member

@gstrauss the difference is 12.1 and 13-stable OS version -- maybe a difference in configure stage? I'm taking a look at source changes in OS now.

@gstrauss
Copy link

gstrauss commented Jan 31, 2022

Is it possible that 13-stable has some features that openssl is trying to use, e.g. ktls?

Edit: I am guessing at this, but if you build openssl with -DOPENSSL_NO_KTLS, then does the error trace for sendmsg() with EINVAL go away?

@shonjir
Copy link
Contributor Author

shonjir commented Jan 31, 2022

@kulikov-a I do not have the acme client installed.

@fichtner
Copy link
Member

@gstrauss errr, yes, there is KTLS now

@fichtner
Copy link
Member

@gstrauss should I submit a KTLS option for the FreeBSD port? Is it recommended to use it this error aside?

@gstrauss
Copy link

gstrauss commented Jan 31, 2022

We do not know for certain if KTLS is involved or not. I was throwing it out as a possibility. lighttpd.conf will allow you to configure openssl with ssl.openssl.ssl-conf-cmd += ("Options" => "-KTLS") if the KTLS patches were backported from openssl 3.0. Want to give that a try?

I found this to suggest that support for KTLS was backported from openssl 3.0 to openssl 1.1.1 in Freebsd 13.
security/openssl: Add support for in-kernel TLS (KTLS)
https://reviews.freebsd.org/D24274

@fichtner
Copy link
Member

I guess that's also why I don't see it with LibreSSL flavour installed... ;) Maybe @kulikov-a can quickly confirm the error is gone when disabling KTLS via configuration.

@fichtner fichtner added cleanup Low impact changes and removed support Community support labels Jan 31, 2022
@fichtner fichtner self-assigned this Jan 31, 2022
@fichtner fichtner added this to the 22.7 milestone Jan 31, 2022
@kulikov-a
Copy link
Member

@fichtner i'll try as soon as i can reproduce errors again. still no errors after reboot )

@kulikov-a
Copy link
Member

hah.
chrome connected with TLS1.3,X25519,AES_128_GCM - no errors
firefox connected with TLS1.3,TLS_AES_128_GCM_SHA256 - errors appers
will try with "-KTLS"

@kulikov-a
Copy link
Member

kulikov-a commented Jan 31, 2022

adding -KTLS to ssl.openssl.ssl-conf-cmd options seems made the trick! no errors with chrome or ff.
still watching..
@shonjir can you confirm?
(kulikov-a@796497a)

@fichtner
Copy link
Member

So while that can make the error disappear I would not want to disable KTLS if it works except for this (cosmetic?) issue and leave it up to @gstrauss to suggest how we proceed.

@kulikov-a
Copy link
Member

kulikov-a commented Jan 31, 2022

sorry, no luck. get errors again after reboot (with -KTLS)

@kulikov-a
Copy link
Member

a little more info: errors still occurs with TLS1.2 forced
will try with test build

@gstrauss
Copy link

So while that can make the error disappear I would not want to disable KTLS if it works except for this (cosmetic?) issue and leave it up to @gstrauss to suggest how we proceed.

As I am neither openssl developer or FreeBSD kernel developer, I must defer. I think an issue should be opened with openssl so that others might find it.

@bsdjhb please forgive the random tag. I see that you have contributed some KTLS support to openssl for FreeBSD. Do you have any suggestions for how to track down this issue? The issue here may or may not be related to KTLS, but is likely somewhere between openssl and the newer FreeBSD kernel version in opnsense 22.1, since the same version of openssl libraries and same version of lighttpd do not exhibit this issue with opnsense 21.7.8.

@kulikov-a
Copy link
Member

a bit more: seems that forcing TLS1.3 with CHACHA20 only not producing errors
ref. https://cgit.freebsd.org/src/commit/?id=6372fd253e32
any chance that "-KTLS" flag can be ignored?

@gstrauss
Copy link

any chance that "-KTLS" flag can be ignored?

I am not sure what you mean. You can configure the opposite in lighttpd with: ssl.openssl.ssl-conf-cmd += ("Options" => "+KTLS"), assuming that openssl was also built with KTLS support.

@kulikov-a
Copy link
Member

@gstrauss sorry, I must be just confused
if I understand opnsense/src@622809b correctly (ktls no longer enabled by default?) and the fact that setting "-KTLS" didn't help, then ktls shouldn't be related to the problem?
but the fact that the restricting the CipherString to only CHACHA20 has kept the log silent for several hours and the fact that ktls support has not been added for this (https://cgit.freebsd.org/src/commit/?id=6372fd253e32) still allows KTLS impact?

@shonjir
Copy link
Contributor Author

shonjir commented Jan 31, 2022

Looking at the ktls(4) manpage I see the following note:

     TLS transmit requires the use of unmapped mbufs.  Unmapped mbufs are not
     enabled by default, but can be enabled by setting the
     kern.ipc.mb_use_ext_pgs sysctl node to 1.

On my opnsense that sysctl is off.

admin@opnsense:~ $ sysctl kern.ipc.mb_use_ext_pgs
kern.ipc.mb_use_ext_pgs: 0

Might this have something to do with the sendmsg(2) issue? If KTLS is enabled by default in openssl on fbsd13 then it makes sense that it needs to be enabled.

I don't know what effect that sysctl might have on the system but I'll give it a try and report if it makes a difference.

@shonjir
Copy link
Contributor Author

shonjir commented Jan 31, 2022

Welp. I enabled the sysctl manually without issue but when I added it via the tunables UI my firewall went offline.

I found some additional info in the freebsd release notes that openSSL includes support for KTLS if the WITH_OPENSSL_KTLS build option is enabled. It looks like the upstream port for 1.1.1m openssl enables KTLS by default and the opnsense package indeed has this option enabled. So it would seem that the openssl default conflicts with the default disabled kernel tunable.

Seems that either the sysctl should be enabled or KTLS for openssl should be disabled by default on fbsd13.

I have the tunable enabled and will see what happens.

@fichtner
Copy link
Member

Well the OpenSSL port enables KTLS by default on FreeBSD 13. However, FreeBSD package builds don't use OpenSSL from ports for package builds. ;)

@shonjir
Copy link
Contributor Author

shonjir commented Jan 31, 2022

I don't know whether lighttpd is using the system library or the openssl package, but the openssl package installed on my firewall definitely has the ktls option compiled in.

Doesn't seem to matter though since the error is still present even with the kernel tunable enabled from boot.

| <27>Jan 31 12:01:22 opnsense.home.arpa lighttpd[83445]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
  | <27>Jan 31 12:01:22 opnsense.home.arpa lighttpd[83445]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
  | <27>Jan 31 12:01:19 opnsense.home.arpa lighttpd[83445]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument
  | <27>Jan 31 12:01:19 opnsense.home.arpa lighttpd[83445]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument 

I don't have a truss for this since the reboot rewrote my crontab and got rid of my trace command... I've added that back and should have a trace in 5 hours or so.

@bsdjhb
Copy link

bsdjhb commented Jan 31, 2022

So the tunable for KTLS should fail much earlier (the setsockopt() to enable KTLS on a socket would fail). sendmsg() shouldn't be used by OpenSSL unless it is using KTLS. The sendmsg() does look to be using the TLS_SET_RECORD_TYPE message type though (1). Does sysctl kern.ipc.tls show any non-zero values?

@shonjir
Copy link
Contributor Author

shonjir commented Jan 31, 2022

From my system. There has been only a single instance of the issue since the system rebooted this morning.

admin@opnsense:~ $ sysctl kern.ipc.tls
kern.ipc.tls.toe.chacha20: 0
kern.ipc.tls.toe.gcm: 0
kern.ipc.tls.toe.cbc: 0
kern.ipc.tls.ifnet.permitted: 0
kern.ipc.tls.ifnet.reset_failed: 0
kern.ipc.tls.ifnet.reset_dropped: 0
kern.ipc.tls.ifnet.reset: 0
kern.ipc.tls.ifnet.chacha20: 0
kern.ipc.tls.ifnet.gcm: 0
kern.ipc.tls.ifnet.cbc: 0
kern.ipc.tls.sw.chacha20: 0
kern.ipc.tls.sw.gcm: 0
kern.ipc.tls.sw.cbc: 0
kern.ipc.tls.tasks_active: 0
kern.ipc.tls.cbc_enable: 1
kern.ipc.tls.enable: 0
kern.ipc.tls.maxlen: 16384
kern.ipc.tls.bind_threads: 1
kern.ipc.tls.allow_unload: 0
kern.ipc.tls.stats.switch_failed: 0
kern.ipc.tls.stats.switch_to_sw: 0
kern.ipc.tls.stats.switch_to_ifnet: 0
kern.ipc.tls.stats.failed_crypto: 0
kern.ipc.tls.stats.corrupted_records: 0
kern.ipc.tls.stats.active: 0
kern.ipc.tls.stats.enable_calls: 0
kern.ipc.tls.stats.offload_total: 0
kern.ipc.tls.stats.sw_rx_inqueue: 0
kern.ipc.tls.stats.sw_tx_inqueue: 0
kern.ipc.tls.stats.sw_tx_pending: 0
kern.ipc.tls.stats.threads: 4

@bsdjhb
Copy link

bsdjhb commented Feb 1, 2022

So that is consistent with the OpenSSL library not trying to enable KTLS, but BIO_get_ktls_send() still returning true.

Hmm, I think I might see a conflict between the internal BIO_FLAGS_IN_EOF (which is not present in OpenSSL 3) and BIO_FLAGS_KTLS_TX in the 1.1.1 backport.

@bsdjhb
Copy link

bsdjhb commented Feb 1, 2022

Ah, this was fixed in OpenSSL 3 itself as a separate commit I just need to backport: openssl/openssl@e278f18

@bsdjhb
Copy link

bsdjhb commented Feb 1, 2022

Building without KTLS (WITHOUT_OPENSSL_KTLS or disabling the port option) is a workaround.

https://reviews.freebsd.org/D34135 - review to fix base system OpenSSL
https://reviews.freebsd.org/D34136 - review to fix security/openssl

The colliding flag was merged to 1.1.1 in 1.1.1e.

@fichtner fichtner added upstream Third party issue and removed cleanup Low impact changes labels Feb 1, 2022
@fichtner
Copy link
Member

fichtner commented Feb 1, 2022

@bsdjhb @gstrauss et al thanks a lot. We will merge these patches into the next release as well and that should be it 😊

@fichtner fichtner closed this as completed Feb 1, 2022
@kulikov-a
Copy link
Member

@gstrauss @bsdjhb thanks a lot!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upstream Third party issue
Development

No branches or pull requests

5 participants