New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Post 22.1 upgrade lighttpd periodically issues SSL errors #5516
Comments
Cc @gstrauss |
What was the previous version of lighttpd? What is the current version of lighttpd in (opnsense 22.1)? (Looks like lighttpd 1.4.63 in the log above). What was the previous version of openssl? Note: There were no direct changes to lighttpd src/mod_openssl.c between lighttpd 1.4.60 and lighttpd 1.4.63.
How is lighttpd being reloaded? Gracefully (SIGUSR1)? Or not gracefully (SIGINT, SIGTERM)? What signal is being sent to lighttpd? If lighttpd is not being stopped gracefully, I am not surprised that openssl might complain. |
@fichtner, my apologies - I failed to specify the pre-upgrade condition. I upgraded from opnsense 21.7.8 to 22.1 by booting from USB stick. During the upgrade I converted the filesystem from UFS to ZFS but I don't know if that is relevant. |
@shonjir if you'd like my assistance, please try to answer my questions. Thank you. |
@gstrauss I was trying to do so, my apologies to you if I've offended somehow. I did not restart or reload lighttpd. The timestamps of the SSL errors are 00:01:19, 00:01:21 and 06:01:19 and 06:01:22, six hours apart. Re-review of the logs just now shows an additional 4 events at 12:01:20 and 12:01:22, so whatever is happening is happening every 6 hours. That's strongly suggestive of a cron job or other periodic task. I had to look through my splunk logs for the version info you requested. Up until the upgrade to 21.7.8 at 01-27 @ 10am I was I was running lighttpd 1.4.61 and openssl 1.1.1l_1,1. Opnsense 21.7.8 installed lighttpd 1.4.63 and openssl 1.1.1m_1,1 which didn't change for opnsense 22.1 at around 01-27 @ 8:30pm. Looking through my splunk logs I think I found the trigger for the SSL errors. I'm running the opnsense app and TA in splunk which collects system information via the web API every 6 hours, and the timestamps for this collection match the SSL error almost exactly. I went back a month in the logs and there were no instances of the SSL error message before upgrading to 22.1. I was running 21.7.8 for about 10 hours before this but there were no similar SSL errors on 21.7.8 despite running the same versions of lighttpd and openssl. Relevant logs from Splunk.
|
Thank you for providing additional information. "I upgraded from opnsense 21.7.8 to 22.1" was not sufficient, which is why I asked for more information.
Since that is more than 6 hours, was the "TA in splunk which collects system information via the web AI every 6 hours" running while you were on 21.7.8? If so, and since there was not an error in the lighttpd error log, perhaps the TA in splunk is using a client or a library which changed between 21.7.8 and 22.1. Since the openssl library version did not change in your upgrade to 22.1, and, as I noted above, since there were no direct changes to lighttpd src/mod_openssl.c between lighttpd 1.4.60 and lighttpd 1.4.63, this is probably not a new issue with lighttpd, if it even is an issue with lighttpd. (FYI: You could confirm this if you were to temporarily run an older version of lighttpd on your new system, to see if the error trace occurs.) Most importantly, you have identified that there is one specific client which is triggering the trace in lighttpd, so I would suggest looking more closely at that job. |
Correct, the splunk process was running while opnsense was on 21.7.8. The following logs are from that period of time.
No, the TA did not change during this time. It was last updated on 12/17/21 and has been running without error since then.
That makes sense. Another possibility is that it may not be an issue with lighttpd (or openssl) at all but something with the way lighttpd (or openssl) operates (or builds) on fbsd13.
I'm more than willing to try! However, going back a month I can't find any trace of this error in the logs until fbsd13 / opnsense 22.1 was introduced.
That certainly makes sense. The TA doesn't do anything special other than initiating a bog-standard TLS connection to opnsense. The only thing that comes to mind is that on my opnsense installation the web UI runs with a locally generated web server certificate issued by a locally generated CA rather than the default self-signed certificate. I tried to replicate the issue with curl but even authenticating to the API using the API key I'm not able to reproduce what the TA is triggering using either of the endpoints the TA queries. curl -v --cacert cacert.crt -u "$key":"$secret" -XGET https://$opnsense/api/core/firmware/status Based on the error message This message is emitted when ERR_get_error() returns 0 (no error found) yet errno is nonzero, and errno is neither EPIPE nor ECONNRESET. The code contains a comment that refers to an ssl bug and refers to lighttpd ticket #2213 which claims the issue was fixed in 1.4.27 but clearly it wasn't fixed completely. The ticket seems to suggest they believe this to be an upstream openssl bug, that in this case may be triggered by some change in semantics between fbsd 12 and 13. Unfortunately lighttpd doesn't return the value of errno which might help track this down further. Correction: perror() is translating errno into Invalid Argument, so whatever bug is being triggered is probably the result of new or improved input validation on the affected syscall in fbsd13. |
"Invalid argument" in the error message is the value from You might
Hopefully a Then again, it is possible that there is something that can be improved in how lighttpd attempts to send the TLS close notify alert. |
BTW, lighttpd supports multiple TLS libraries, so in addition to mod_openssl, you might test with one of mod_gnutls, mod_mbedtls, mod_nss, or mod_wolfssl. Another angle might be to force TLSv1.3 in lighttpd.conf TLS config and see if the problem occurs, or force TLSv1.2 and see if the problem occurs. |
I'll look into this. I was thinking that it's likely the client isn't properly terminating the session so I'm not sure how much new information it'll provide at this point. It looks like the client is using the ssl library from a version of python3.7 included with splunk which may or may not properly implement the connection close.
I did make one change but nothing that would affect this. There was an alias that had an incorrect fqdn that I corrected.
Is this something I can test with a configuration change? I can change the config easily enough, but rebuilding the package is outside the scope of what I'm able to handle at this point. Looks like the relevant config is here and has been this way for a year. Looks like there is a "MaxProtocol" option that I can set for force 1.2 only. I'll try to work on that over the weekend.
Other than the OS change there's really nothing I can put my finger on that would result in this error. |
It's great that you identified the source of the issue and can reproduce it. That's big help. I agree with you that there is no obvious culprit which causes the error trace, which is why I have suggested a few different coarse substitutions to see any substitution can help to narrow down where the issue is. I don't have the system components that you have described, so I can not iterate on this myself. (Also, I do not know if opnsense packages lighttpd with multiple lighttpd TLS modules pre-built or not.) Yes, I think that capturing a FYI -- and I am not suggesting that you do this unless inspired -- you can build lighttpd on FreeBSD following instructions at lighttpd install from source. You can build and run a test instance of lighttpd locally (without installing it) by shutting down the instance run by opnsense and then temporarily running the test build of lighttpd. |
Sorry for the weekend silence. Just some data points: opnsense 21.7.7 hardenedbsd 12.1 lighttpd 1.4.61 openssl 1.1.1l The build itself didn't change any parameters through FreeBSD ports. From what I would expect the issue is also in 21.7.8 with the same lighttpd/openssl combination but I have no real world setup to verify this at the moment having upgraded everything to 22.1 already. Cheers, |
It took me a while but I automated trace collection and obtained three different truss traces for this. I hope it helps. |
I have both. |
Looks like the truss showing that openssl is reading EOF from the socket. (The socket is nonblocking and did not get EAGAIN or EWOULDBLOCK.) ...I wonder if the client forcing the socket closed without attempting TLS close notify to gracefully close the TLS connection? Then when lighttpd asks openssl library to send close notify,
...I suppose that I could have lighttpd treat EINVAL like lighttpd treats EPIPE and ECONNRESET when attempting to send TLS close notify, but doing so feels wrong. I wonder why EINVAL is being returned from @kulikov-a would you be able to capture a |
|
|
@kulikov-a The 22.1 log shows use of |
@gstrauss the difference is 12.1 and 13-stable OS version -- maybe a difference in configure stage? I'm taking a look at source changes in OS now. |
Is it possible that 13-stable has some features that openssl is trying to use, e.g. ktls? Edit: I am guessing at this, but if you build openssl with |
@kulikov-a I do not have the acme client installed. |
@gstrauss errr, yes, there is KTLS now |
@gstrauss should I submit a KTLS option for the FreeBSD port? Is it recommended to use it this error aside? |
We do not know for certain if KTLS is involved or not. I was throwing it out as a possibility. lighttpd.conf will allow you to configure openssl with I found this to suggest that support for KTLS was backported from openssl 3.0 to openssl 1.1.1 in Freebsd 13. |
I guess that's also why I don't see it with LibreSSL flavour installed... ;) Maybe @kulikov-a can quickly confirm the error is gone when disabling KTLS via configuration. |
@fichtner i'll try as soon as i can reproduce errors again. still no errors after reboot ) |
hah. |
adding |
So while that can make the error disappear I would not want to disable KTLS if it works except for this (cosmetic?) issue and leave it up to @gstrauss to suggest how we proceed. |
sorry, no luck. get errors again after reboot (with |
a little more info: errors still occurs with TLS1.2 forced |
As I am neither openssl developer or FreeBSD kernel developer, I must defer. I think an issue should be opened with openssl so that others might find it. @bsdjhb please forgive the random tag. I see that you have contributed some KTLS support to openssl for FreeBSD. Do you have any suggestions for how to track down this issue? The issue here may or may not be related to KTLS, but is likely somewhere between openssl and the newer FreeBSD kernel version in opnsense 22.1, since the same version of openssl libraries and same version of lighttpd do not exhibit this issue with opnsense 21.7.8. |
a bit more: seems that forcing TLS1.3 with CHACHA20 only not producing errors |
I am not sure what you mean. You can configure the opposite in lighttpd with: |
@gstrauss sorry, I must be just confused |
Looking at the ktls(4) manpage I see the following note:
On my opnsense that sysctl is off.
Might this have something to do with the sendmsg(2) issue? If KTLS is enabled by default in openssl on fbsd13 then it makes sense that it needs to be enabled. I don't know what effect that sysctl might have on the system but I'll give it a try and report if it makes a difference. |
Welp. I enabled the sysctl manually without issue but when I added it via the tunables UI my firewall went offline. I found some additional info in the freebsd release notes that openSSL includes support for KTLS if the WITH_OPENSSL_KTLS build option is enabled. It looks like the upstream port for 1.1.1m openssl enables KTLS by default and the opnsense package indeed has this option enabled. So it would seem that the openssl default conflicts with the default disabled kernel tunable. Seems that either the sysctl should be enabled or KTLS for openssl should be disabled by default on fbsd13. I have the tunable enabled and will see what happens. |
Well the OpenSSL port enables KTLS by default on FreeBSD 13. However, FreeBSD package builds don't use OpenSSL from ports for package builds. ;) |
I don't know whether lighttpd is using the system library or the openssl package, but the openssl package installed on my firewall definitely has the ktls option compiled in. Doesn't seem to matter though since the error is still present even with the kernel tunable enabled from boot. | <27>Jan 31 12:01:22 opnsense.home.arpa lighttpd[83445]: (mod_openssl.c.3415) SSL (error): 5 -1: Invalid argument I don't have a truss for this since the reboot rewrote my crontab and got rid of my trace command... I've added that back and should have a trace in 5 hours or so. |
So the tunable for KTLS should fail much earlier (the setsockopt() to enable KTLS on a socket would fail). sendmsg() shouldn't be used by OpenSSL unless it is using KTLS. The sendmsg() does look to be using the TLS_SET_RECORD_TYPE message type though (1). Does |
From my system. There has been only a single instance of the issue since the system rebooted this morning.
|
So that is consistent with the OpenSSL library not trying to enable KTLS, but BIO_get_ktls_send() still returning true. Hmm, I think I might see a conflict between the internal BIO_FLAGS_IN_EOF (which is not present in OpenSSL 3) and BIO_FLAGS_KTLS_TX in the 1.1.1 backport. |
Ah, this was fixed in OpenSSL 3 itself as a separate commit I just need to backport: openssl/openssl@e278f18 |
Building without KTLS (WITHOUT_OPENSSL_KTLS or disabling the port option) is a workaround. https://reviews.freebsd.org/D34135 - review to fix base system OpenSSL The colliding flag was merged to 1.1.1 in 1.1.1e. |
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
Describe the bug
Following update to 22.1 lighttpd periodically reports SSL invalid argument errors. Based on timing it seems this happens when lighttpd reloads. See screenshots for details.
To Reproduce
Nothing to do to reproduce.
Expected behavior
A clear and concise description of what you expected to happen.
Describe alternatives you considered
A clear and concise description of any alternative solutions or workaround you considered.
Screenshots
Relevant log files
If applicable, information from log files supporting your claim.
Additional context
Add any other context about the problem here.
Environment
Software version used and hardware type if relevant, e.g.:
OPNsense 22.1-amd64
FreeBSD 13.0-STABLE
OpenSSL 1.1.1m 14 Dec 2021
The text was updated successfully, but these errors were encountered: