Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threads stuck at READ-INITIAL-REQUEST-LINE #189

Open
casouri opened this issue Feb 27, 2021 · 34 comments
Open

Threads stuck at READ-INITIAL-REQUEST-LINE #189

casouri opened this issue Feb 27, 2021 · 34 comments

Comments

@casouri
Copy link

casouri commented Feb 27, 2021

I noticed that many threads hang around for a very long time and eventually there are too many of them that the server can't accept new requests. All of them hang in HUNCHENTOOT::READ-INITIAL-REQUEST-LINE. The read and write timeout for the acceptor is the default 20. I couldn't figure out what's going on, the functions in the call chain don't seem to have timeout parameters.

Here is the stackframe of one of the threads:

 0: ("bogus stack frame")
 1: (SB-SYS:WAIT-UNTIL-FD-USABLE 11 :INPUT NIL T)
 2: ((:METHOD STREAM-READ-BYTE (CL+SSL::SSL-STREAM)) #<unavailable argument>) [fast-method]
 3: (READ-BYTE #<CL+SSL::SSL-SERVER-STREAM for #<FD-STREAM for "socket 23.239.22.57:4386, peer: 185.26.53.196:44126" {100421C383}>> NIL NIL)
 4: (READ-CHAR* #<CL+SSL::SSL-SERVER-STREAM for #<FD-STREAM for "socket 23.239.22.57:4386, peer: 185.26.53.196:44126" {100421C383}>> NIL NIL)
 5: (READ-LINE* #<CL+SSL::SSL-SERVER-STREAM for #<FD-STREAM for "socket 23.239.22.57:4386, peer: 185.26.53.196:44126" {100421C383}>> NIL)
 6: (HUNCHENTOOT::READ-INITIAL-REQUEST-LINE #<CL+SSL::SSL-SERVER-STREAM for #<FD-STREAM for "socket 23.239.22.57:4386, peer: 185.26.53.196:44126" {100421C383}>>)
 7: (HUNCHENTOOT::GET-REQUEST-DATA #<CL+SSL::SSL-SERVER-STREAM for #<FD-STREAM for "socket 23.239.22.57:4386, peer: 185.26.53.196:44126" {100421C383}>>)
 8: ((:METHOD HUNCHENTOOT:PROCESS-CONNECTION (HUNCHENTOOT:ACCEPTOR T)) #<BLOG-SERVER::SSL-SERVER (host *, port 4386)> #<USOCKET:STREAM-USOCKET {100421C4C3}>) [fast-method]
 9: ((:METHOD HUNCHENTOOT:PROCESS-CONNECTION :AROUND (HUNCHENTOOT:ACCEPTOR T)) #<BLOG-SERVER::SSL-SERVER (host *, port 4386)> #<USOCKET:STREAM-USOCKET {100421C4C3}>) [fast-method]
10: ((FLET HUNCHENTOOT::PROCESS-CONNECTION% :IN HUNCHENTOOT::HANDLE-INCOMING-CONNECTION%) #<BLOG-SERVER::SSL-SERVER (host *, port 4386)> #<USOCKET:STREAM-USOCKET {100421C4C3}>)
11: ((LAMBDA NIL :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS))
12: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
13: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-THREAD::RUN))
14: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
15: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::RUN))
16: (SB-THREAD::RUN)
17: ("foreign function: call_into_lisp")
18: ("foreign function: funcall1")
@casouri casouri changed the title Threads stuck at Threads stuck at HUNCHENTOOT::READ-INITIAL-REQUEST-LINE Feb 27, 2021
@casouri casouri changed the title Threads stuck at HUNCHENTOOT::READ-INITIAL-REQUEST-LINE Threads stuck at READ-INITIAL-REQUEST-LINE Feb 27, 2021
@stassats
Copy link
Member

Don't think there's enough information to diagnose.

@casouri
Copy link
Author

casouri commented Feb 27, 2021

Here's some more info I found out: notice the timeout variable is nil.

 1: (SB-SYS:WAIT-UNTIL-FD-USABLE 12 :INPUT NIL T)
      Locals:
        SB-IMPL::DIRECTION = :INPUT
        SB-IMPL::FD = 12
        SB-IMPL::SERVE-EVENTS = T
        SB-IMPL::SIGNALP = NIL
        SB-IMPL::STOP-SEC = NIL
        SB-IMPL::STOP-USEC = NIL
        TIMEOUT = NIL
        SB-IMPL::TO-SEC = NIL
        SB-IMPL::TO-USEC = NIL

So STREAM-READ-BYTE eventually calls this:

(defun input-wait (stream fd deadline)
  (declare (ignore stream))
  (let ((timeout
   ;; *deadline* is handled by wait-until-fd-usable automatically,
   ;; but we need to turn a user-specified deadline into a timeout
   (when deadline
     (/ (- deadline (get-internal-real-time))
        internal-time-units-per-second))))
    (sb-sys:wait-until-fd-usable fd :input timeout)))

@casouri
Copy link
Author

casouri commented Feb 27, 2021

Ok, the stream object's deadline attribute is nil. That's why there is no timeout.

@stassats
Copy link
Member

Does your acceptor have a timeout?

@casouri
Copy link
Author

casouri commented Feb 27, 2021

Yeah, both read and write timeout are 20.

@stassats
Copy link
Member

Probably doesn't work with ssl.

@casouri
Copy link
Author

casouri commented Feb 27, 2021

Ok, so the problem is in cl+ssl:

(defun make-ssl-server-stream
    (socket &key certificate key password method external-format
                 close-callback (unwrap-stream-p t)
                 (cipher-list *default-cipher-list*))
  "Returns an SSL stream for the server socket descriptor SOCKET.
CERTIFICATE is the path to a file containing the PEM-encoded certificate for
 your server. KEY is the path to the PEM-encoded key for the server, which
may be associated with the passphrase PASSWORD."
  (ensure-initialized :method method)
  (let ((stream (make-instance 'ssl-server-stream
                               :socket socket
                               :close-callback close-callback
                               :certificate certificate
                               :key key)))
    (with-new-ssl (handle)
      (setf socket (install-handle-and-bio stream handle socket unwrap-stream-p))
      (ssl-set-accept-state handle)
      (when (zerop (ssl-set-cipher-list handle cipher-list))
        (error 'ssl-error-initialize :reason "Can't set SSL cipher list"))
      (with-pem-password (password)
        (install-key-and-cert handle key certificate))
      (ensure-ssl-funcall stream handle #'ssl-accept handle)
      (handle-external-format stream external-format))))

When they create the steam from the socket they probably should determine the deadline attribute from the socket's timeout.

@casouri casouri closed this as completed Feb 27, 2021
@casouri
Copy link
Author

casouri commented Feb 27, 2021

Thanks!

@stassats
Copy link
Member

Well, hunchentoot sets timeouts afterwards, with set-timeouts, so maybe it's both.

@casouri
Copy link
Author

casouri commented Feb 27, 2021

I don't know much about how all these works, but I think it might be a different timeout.
The code path is cl+ssl:stream-read-byte -> cl+ssl:ensure-ssl-funcall -> input-wait -> sb-sys:wait-until-fd-usable.

@avodonosov
Copy link
Contributor

avodonosov commented Jun 24, 2021

@casouri , the make-ssl-server-stream is not part of the stacktrace you provided.
So in this scenario it's irrelevant (maybe in another scenario it is relevant).

According to @stassats , hunchentoot set's timeout of ssl socket after it is created.
But in your stacktrace we see the timeout is absent on the :SSL-SERVER-STREAM object and the SB-SYS:WAIT-UNTIL-FD-USABLE is called without timeout.

@avodonosov
Copy link
Contributor

avodonosov commented Jun 24, 2021

After brief study of hunchentoot code, it seems hunchentoot does not set cl+ssl::ssl-stream-deadline on the cl+ssl:ssl-server-stream object.

It simply sets the timeouts on the underlying socket.

And later sb-sys:wait-until-fd-usable ignores that timeout, when called with the 3rd argument NIL. cl+ssl passes NIL because the deadline slot is NIL for cl+ssl:ssl-server-stream object.

@casouri
Copy link
Author

casouri commented Jun 26, 2021

@casouri , the make-ssl-server-stream is not part of the stacktrace you provided.
So in this scenario it's irrelevant (maybe in another scenario it is relevant).

Though make-ssl-server-stream creates the stream object, and the deadline slot being NIL in that object caused the problem.

After brief study of hunchentoot code, it seems hunchentoot does not set cl+ssl::ssl-stream-deadline on the cl+ssl:ssl-server-stream object.

Yes, because the stream is created by make-ssl-server-stream, and the socket passed to it has a timeout slot, I thought cl+ssl could set the deadline automatically. Maybe hunchentoot should set a deadline for the cl+ssl-server-steam like it did for the socket. @stassats WDYT?

@casouri casouri reopened this Jun 26, 2021
@stassats
Copy link
Member

Do you have enough to make a pull request?

@casouri
Copy link
Author

casouri commented Jul 3, 2021

Yes, more or less. I have patched it up like this:

(defmethod initialize-connection-stream :around
    ((acceptor ssl-acceptor) stream)
  (let ((ssl-stream (call-next-method acceptor stream)))
    (setf (cl+ssl::ssl-stream-deadline ssl-stream)
          (+ (get-internal-real-time)
             (* internal-time-units-per-second 20)))
    ssl-stream))

It sets :deadline according to :timeout. Maybe we can modify initialize-connection-stream as such?

@casouri
Copy link
Author

casouri commented Jul 3, 2021

Actually that's only a patch, it doesn't use the :timeout of the socket. I'll look into it further.

@casouri
Copy link
Author

casouri commented Jul 3, 2021

Ok, here:

(defmethod initialize-connection-stream ((acceptor ssl-acceptor) stream)
  ;; attach SSL to the stream if necessary
  (let ((ssl-stream (cl+ssl:make-ssl-server-stream stream
                                                   :certificate (acceptor-ssl-certificate-file acceptor)
                                                   :key (acceptor-ssl-privatekey-file acceptor)
                                                   :password (acceptor-ssl-privatekey-password acceptor))))
    (setf (cl+ssl::ssl-stream-deadline ssl-stream)
          (+ (get-internal-real-time)
             (* internal-time-units-per-second (acceptor-read-timeout ssl-acceptor))))
    (call-next-method acceptor ssl-stream)))

I don't know about the internals and overall structure of hunchentoot, so I'm not sure if it's the most appropriate place to set that :deadline.

avodonosov added a commit to avodonosov/hunchentoot that referenced this issue Jul 10, 2021
@avodonosov
Copy link
Contributor

@casouri , setting fixed deadline for connection stream when initializing it will lead to an error signaled when handling one of the next requests on this connection - not what we want.

@avodonosov
Copy link
Contributor

avodonosov commented Jul 10, 2021

@stassats, @casouri, I propose the following approach (draft for now): avodonosov@3a1297e

It's based on these two comments: cl-plus-ssl/cl-plus-ssl#69 (comment) and the reply
cl-plus-ssl/cl-plus-ssl#69 (comment)

CC @hanshuebner

avodonosov added a commit to avodonosov/hunchentoot that referenced this issue Jul 10, 2021
avodonosov added a commit to avodonosov/hunchentoot that referenced this issue Jul 10, 2021
@casouri
Copy link
Author

casouri commented Jul 10, 2021

Thanks for taking the time!

@avodonosov
Copy link
Contributor

avodonosov commented Jul 14, 2021

I am exploring this approach, calling close on the socket does not work, and incorrect if a system call is in progress, as it turns out.

The shutdown call is the proper way to close the socket and interrupt the waiting. I am testing with CCL, which uses poll for ccl::process-input-wait that is called by cl+ssl::input-wait

Several links on how and why close does not work:
https://stackoverflow.com/questions/49613058/tcp-socket-stays-in-established-state-after-close-by-server
https://stackoverflow.com/questions/14293254/tcp-fin-not-sent-on-doing-close-for-a-multi-threaded-tcp-client
https://stackoverflow.com/questions/10561602/closing-a-file-descriptor-that-is-being-polled
https://stackoverflow.com/questions/5039608/poll-cant-detect-event-when-socket-is-closed-locally
https://stackoverflow.com/questions/14212584/socket-descriptor-not-getting-released-on-doing-close-for-a-multi-threaded

Despite some of the comments in these links, the shutdown interrupts the waiting even if the other side does not send FIN packed, I tested that. So the solution does not depend on the client behavior.

@fjl
Copy link
Contributor

fjl commented Jul 29, 2021

Looking at your patch @avodonosov, it seems like a pretty ugly solution. Wouldn't it be possible to patch hunchentoot so it sets the cl+ssl deadline whenever it sets the socket timeout?

@avodonosov
Copy link
Contributor

avodonosov commented Jul 29, 2021

@fjl, I propose that approach because it does not depend on lisp implementation behavior in several layers of lisp libraries, unlike the socket timeouts / cl+ssl deadlines. Why do you think it's ugly?

Speaking of the deadlines, first, they are not really supported by cl+ssl. Also note, deadline and timeout have different meaning, so deadline can not be set at the same place where timeout is set. Deadline is an absolute time, it would need to be set for every request, maybe at several stages of request processing.

Essentially, the function I introduced in the draft is a deadline setting (I even considered naming it correspondingly) - after certain absolute time the IO is aborted (implemented as socket shutdown). It is used for 3 stages: stream setup (ssl handshake), request reading, request processing with response writing.

@fjl
Copy link
Contributor

fjl commented Jul 30, 2021

Why do you think it's ugly?

Hope you're not offended by the term. I just said ugly because the solution in your draft didn't match my intution for the problem.

Having looked into it more now, I can definitely see where you're coming from with this, but there are still a couple issues to consider before settling on your approach. Specifically, I'm thinking of these two things:

(1) In HTTP servers, generally, there are three distinct timeouts to consider (you also have them in your draft!): the client must provide the headers / request line within a short time after opening the connection. A fixed timeout is OK for this, and a deadline would also work (it can just be current time + fixed timeout). Once the headers are in, things get more complicated because the size of the data that will follow is not of fixed size. Two solutions possible here: put the application in charge of the timeout, or use some kind of 'idle timeout' that ensures at least some amount of data will flow within the given time. Either of these is necessary to prevent the slowloris attack. A fixed timeout (or deadline) is not great because it will not work with large file uploads, for example. Finally, there should be timeout on delivering the response data back to the client, because the client could otherwise tie up the connection. Here, a fixed timeout / deadline is also inappropriate because the response may be very large.

(2) Now, hunchentoots timeout handling isn't all that great. It does set timeouts, but if / how these are handled is kind of up to the lisp implementation. Also, since the client may get a wrapped stream from functions like RAW-POST-DATA, there is really no way to set the timeout on the application side. This is why I think the 'idle timeout' approach would work best for hunchentoot. It prevents the attack with a stuck socket, while also allowing the app not to care about timeouts at all while reading request body text.

IMHO this is a problem that should be solved by the usocket API, somehow. Ideally, usocket would provide methods on the stream returned by SOCKET-STREAM to set the read / write idle timeouts that apply to all future calls of READ-BYTE and READ-SEQUENCE on the stream. Working around the fact that there is no such API, hunchentoot has set-timeouts.lisp.

With CL+SSL, it gets even more complicated because it operates on a stream, not a socket. So not only is there no defined way for it to set the timeouts on the underlying socket, there is also no obvious way for library users to set the timeout on the SSL stream (except by calling custom CL+SSL methods). This, too, could be fixed by adding a stream timeout API to usocket. If such API existed, CL+SSL could: have a stream class specific to usocket, implement those timeout methods for usocket-based SSL streams (just pass them through), and thus allow callers to set the timeout generically for both SSL and non-SSL streams.

Mostly, what I'm getting at here is: hunchentoot tries to paper over a lot of library issues already, and while creating another hack to make it work will probably fix the issues for hunchentoot, setting timeouts will still be a nightmare for any other application. We don't even need to look very far beyond hunchentoot for this, just think of hunchensocket. It also needs timeouts, and they wouldn't work over SSL with the hack. Would be much nicer to find a solution that improves things for all applications and then use that to improve / simplify hunchentoot.

@casouri
Copy link
Author

casouri commented Jul 30, 2021

You should raise that over cl+ssl/usocket, there is nothing hunchentoot can do. In the mean time, hunchentoot need to fix this problem to be useable, regardless how "ugly" you think it is.

@fjl
Copy link
Contributor

fjl commented Jul 30, 2021

If you need a fix short-term, you should really consider running hunchentoot behind a proxy. I personally use caddy as the SSL frontend for hunchentoot. You can also set timeouts on the proxy and offload static file serving to it.

@avodonosov
Copy link
Contributor

avodonosov commented Oct 23, 2021

A little update.

The cl+ssl fix for cl-plus-ssl/cl-plus-ssl#133 (already in Quicklisp) restores Lisp BIO functionality, which means instead of giving OpenSSL a file descriptor of the lisp stream, we can wrap the lisp stream itself, thus inheriting all the timeout semantics. So the hunchentoot threads on inactive connections will be unblocked by IO timeout in the read-initial-request-line or any other hunchentoot function.

To utilize this functionality either globally (setq cl+ssl:*default-unwrap-stream-p* nil) or override the hunchentoot:initialize-connection-stream for your acceptor, and inside of this method add unwrap-stream-p nilparameter to the call to cl+ssl:make-ssl-server-stream:

(cl+ssl:make-ssl-server-stream stream

Maybe in the future timeout support can be added to cl+ssl in the default mode were we pass the file descriptor to OpenSSL. But I am not sure of interface for this. Maybe it's better to drop the current timeouts / deadlines support, which tries to signal a lisp implementation specific condition, and introduce an explicit timeout parameter and signal a new type of error condition.

Also I still think it may be good to have a kind of health-check for hunchentoot that detects stuck threads and logs them, or optionally tries to terminate them by shutting down the connection socket. It will be useful in case the timeout functionality degrades in the future, or for slow request attack, where 1 byte is sent every 15 seconds, for example, so that timeout does not happen, and the tread remains occupied by this connection.

I have a branch where such a stuck thread monitor is implemented. It logs stuck threads, and optionally can shutdown their sockets automatically in order to unblock them. https://github.com/edicl/hunchentoot/compare/master...avodonosov:ssl-thread-leak?expand=1

@stassats and other maintainers, do you think that's a useful feature for hunchentoot?

The implementation in my branch tries to touch hucnhentoot core as little as possible - only introduces a generic function track-progress that is invoked from appropriate places of process-connection. And a mixin class that acceptors can derive from, that
provides a method for the track-progress function and implements the monitoring described above.

The mixin is implemented in a separate package hunchentoot-stuck-connection-monitor, so in the source code it is easy to see what hunchentoot functionality it depends on by searching for the hunchentoot: package mentions in the hunchentoot-stuck-connection-monitor.lisp.

@stassats
Copy link
Member

I think I'd prefer working timeouts.

@avodonosov
Copy link
Contributor

avodonosov commented Nov 28, 2022

It would be good if hunchentoot at least exposed atrack-progress hook, as suggested in my branch:
https://github.com/edicl/hunchentoot/compare/master...avodonosov:ssl-thread-leak?expand=1#diff-93bc171f0cfbcb7ee4b05aef82f624cfab8be66630383c1d814f4d13557943bfR513
That would simplify creating 3rd party modules for health-check and monitoring.

I am packaging my solution as a separate module anyways: https://github.com/avodonosov/hunchentoot-stuck-connection-monitor

But it took copy-pasting the whole hunchentoot:process-connection method: https://github.com/avodonosov/hunchentoot-stuck-connection-monitor/blob/main/acceptor-hook.lisp#L45

@frodef
Copy link

frodef commented Mar 13, 2023

I'm bitten by this problem, and it's the only thing that occasionally stops service from my otherwise solid hunchetoot server. I might have a look, but maybe somebody with hunchentoot/cl+ssl knowledge could offer some updated insight into this?

@fjl
Copy link
Contributor

fjl commented Mar 14, 2023

This patch could help: cl-plus-ssl/cl-plus-ssl#157

@avodonosov
Copy link
Contributor

avodonosov commented Mar 15, 2023

@frodef

I might have a look

Look where? You mean investigate the issue?

This thead provides several solutions already.

@frodef
Copy link

frodef commented Mar 18, 2023

Look where? You mean investigate the issue?

Yes.

This thead provides several solutions already.

I don't quite understand. Are you saying this is not a flaw in Hunchentoot?

@avodonosov
Copy link
Contributor

avodonosov commented Mar 18, 2023

@frodef, it is a defect (in hunchentoot, or in cl+ssl, or in CL, or in the universe...)

This thread mentions 4 solutions, from very easy (one two lines) to little less easy. Do you mean none of them work for you? Or it's difficult to find them in the discussion?
I am afraid, if I ttry to summarize them, it will be another wordy comment, making the thread even longer and more difficult to read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants