
Sequel fork safety #691

Closed
eregon opened this issue Jul 30, 2013 · 27 comments

Comments

@eregon
Contributor

eregon commented Jul 30, 2013

Hello,

I experienced some trouble due to the connection socket being shared between two processes (caused by fork and "smart spawning" of Phusion Passenger).

I read a couple of issues/threads on the list about it and looked at the code of the default connection pool, ThreadedConnectionPool.

The commonly proposed solution to this problem is to disconnect just before/after fork with a server-dependent hook. I find it unsatisfying because it is brittle code depending on the environment, and it would seem logical for Sequel to be fork-safe as well as thread-safe.

Concretely, ThreadedConnectionPool uses Thread.current as the key for connections. Unfortunately, in MRI, Thread.current is identical for the main thread across fork:

> fork { p Thread.current }
=> #<Thread:0x000001008d8e88 run>
> Thread.current
=> #<Thread:0x000001008d8e88 run>

Meanwhile, from the OS's point of view they are of course very different (two different native threads). This creates the sharing mentioned above.
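A minimal self-contained illustration of that sharing, with a plain TCP socket standing in for the database connection (no Sequel involved): after fork, both processes write to the same inherited descriptor, and the server sees their bytes interleaved on a single stream.

```ruby
require 'socket'

server = TCPServer.new(0)                      # stands in for the database server
reader = Thread.new { server.accept.read(8) }  # collect whatever arrives
sock = TCPSocket.new('127.0.0.1', server.addr[1])

pid = fork do
  sock.write('chil')  # child writes on the inherited descriptor
  exit!               # skip at_exit/finalizers so the parent's socket survives
end
sock.write('prnt')    # parent writes on the very same connection
Process.wait(pid)

data = reader.value
puts "server saw #{data.inspect} on one connection from two processes"
```

The order of the two 4-byte writes is nondeterministic, which is exactly the kind of interleaving that corrupts a wire protocol when two processes share one database socket.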

I see two possible solutions:

  • use the tuple [pid, thread id] instead of just the thread id as the key in @allocated, and ensure no connection leaks after fork.
  • detect a fork by a change of PID (since there seems to be no hook in standard Ruby before/after fork, and decorating it might not be so nice). This could be done in #hold, performing the needed disconnect/reconnect once.
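The second option can be sketched roughly as follows. ForkAwarePool is purely illustrative, not a Sequel class: #hold compares Process.pid against the pid that created the pool and discards cached connections after a fork (a real implementation would also have to keep the discarded objects' finalizers from touching the shared sockets).

```ruby
class ForkAwarePool
  def initialize(&make_conn)
    @make_conn = make_conn
    @owner_pid = Process.pid
    @available = []
  end

  def hold
    if Process.pid != @owner_pid
      @available.clear          # stale, shared with the parent: discard
      @owner_pid = Process.pid
    end
    conn = @available.pop || @make_conn.call
    yield conn
  ensure
    @available.push(conn) if conn
  end
end
```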

What do you think?
I would really appreciate it if Sequel were fork-safe by default.

@jeremyevans
Owner

I don't see this as a problem that requires fixing. The existing solution is straightforward, and I don't think it is brittle. Adding a DB.disconnect call after your application is loaded but before it starts running/serving requests should never break code, so it is safe to use even in non-forking applications.
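For concreteness, a sketch of that placement in a Rack app (the file layout and the App constant are hypothetical; the point is only that the disconnect runs after all load-time database access and before any worker is forked):

```ruby
# config.ru (hypothetical layout)
require_relative 'app'  # loads Sequel models; may connect during load
DB.disconnect           # drop load-time connections; workers reconnect lazily
run App
```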

Both of your proposed solutions would cause a performance hit, and I don't think it's fair to penalize all Sequel users just to make things slightly easier for some of them.

Note that if you want to implement such a thing yourself, it's trivial to create a custom connection pool class that does what you want.

@eregon
Contributor Author

eregon commented Jul 30, 2013

Please see https://gist.github.com/eregon/89a0a7591721c3ce8c3c as an example of the problem.

I don't see this as a problem that requires fixing.

You do not see data corruption caused by a forking webserver and usage of a default Sequel thread-safe connection pool (which is pretty common) as a problem which requires fixing?

Adding a DB.disconnect call after your application is loaded but before it starts running/serving requests should never break code, so it is safe to use even in non-forking applications.

But it is not exactly intuitive; I wish to provide a good experience for the default environment and for newcomers.

I was thinking of the before_fork and on_event(:starting_worker_process) hooks when I said brittle.
This solution is nicer, but it still requires putting that DB.disconnect in exactly the right place, that is, after every DB access during loading is done and before any request-handling code. If this spans multiple files or there are multiple DB accesses, it soon becomes non-trivial.

For instance, in the gist, I first wrote DB.disconnect before conn_info (to be fair and allow both processes to create a new connection) and it still reproduced the problem. The mistake is kind of obvious in the script, but might be much less so when you are not calling fork yourself.

Obviously, not everyone knows the gory details of fd sharing and fork, and hopefully they don't always need to.
Sequel is not the only culprit of course, but it could help in this area.

I appreciate that this solution is documented, but I had never seen it before, as it is tucked away under "Misc." (http://sequel.rubyforge.org/rdoc/files/doc/code_order_rdoc.html). Maybe it could benefit from better visibility in some intro about connecting?

Both of your proposed solutions would cause a performance hit, and I don't think it's fair to penalize all Sequel users just to make things slightly easier for some of them.

I think the performance hit would not really be important or significant; do you have a good benchmark for this so I could test it myself?

Note that if you want to implement such a thing yourself, it's trivial to create a custom connection pool class that does what you want.

I'll have a try.

Finally, this is also very unexpected. You have well-separated connections per thread, but processes (which are groups of threads) do not always have them, so in a way we could say it is not thread-safe, since two (main) threads share the same connection without any particular configuration.

@jeremyevans
Owner

I don't see this as a problem that requires fixing.

You do not see data corruption caused by a forking webserver and usage of a default Sequel thread-safe connection pool (which is pretty common) as a problem which requires fixing?

To be more specific, I don't see this as a problem that requires fixing in Sequel. Obviously, data corruption is a problem in applications, but I think that it is an application issue.

Adding a DB.disconnect call after your application is loaded but before it starts running/serving requests should never break code, so it is safe to use even in non-forking applications.

But it is not exactly intuitive; I wish to provide a good experience for the default environment and for newcomers.

Well, I'll give you that it is not intuitive. However, not everything can be intuitive. If you are using a forking webserver and preloading code, you need to understand issues caused by file descriptor sharing.

I was thinking of the before_fork and on_event(:starting_worker_process) hooks when I said brittle.

If the forking webserver offers a before_fork hook, that should be used, as it is the best place to do something. I do not consider that brittle, since that's where all code that deals with shared file descriptors should go.
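For instance, Unicorn's configuration file exposes exactly such a hook; a minimal sketch of its use (assuming DB is the application's Sequel::Database):

```ruby
# unicorn.rb (config fragment)
before_fork do |server, worker|
  DB.disconnect  # close the master's connections; each worker reconnects lazily
end
```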

The on_event(:starting_worker_process) is a little different, since I believe that is called after forking. That indeed is brittle. But that's a problem with Passenger, in that it doesn't offer a before_fork hook, only an after_fork hook.

This solution is nicer, but it still requires putting that DB.disconnect in exactly the right place, that is, after every DB access during loading is done and before any request-handling code. If this spans multiple files or there are multiple DB accesses, it soon becomes non-trivial.

True, it does require you put the code in the correct place. However, I have yet to come across an app where locating this place is difficult.

Obviously, not everyone knows the gory details of fd sharing and fork, and hopefully they don't always need to.
Sequel is not the only culprit of course, but it could help in this area.

I disagree with the idea that every library that opens file descriptors should have code that always checks to see if a fork has happened. Do the ruby stdlib libraries that open file descriptors automatically handle this? Of course not, the onus is on the application user to handle file descriptors shared between processes.

I appreciate that this solution is documented, but I had never seen it before, as it is tucked away under "Misc." (http://sequel.rubyforge.org/rdoc/files/doc/code_order_rdoc.html). Maybe it could benefit from better visibility in some intro about connecting?

If you send a pull request adding additional documentation in this area, I will definitely consider it.

Both of your proposed solutions would cause a performance hit, and I don't think it's fair to penalize all Sequel users just to make things slightly easier for some of them.

I think the performance hit would not really be important or significant; do you have a good benchmark for this so I could test it myself?

If I had to guess, the performance hit would not be significant for most cases. Considering the correct way to handle this issue (before forking) doesn't cause any performance hit, I don't think taking any performance hit is acceptable.

Finally, this is also very unexpected. You have well-separated connections per thread, but processes (which are groups of threads) do not always have them, so in a way we could say it is not thread-safe, since two (main) threads share the same connection without any particular configuration.

Thread-safe doesn't mean fork-safe. Even when using the single connection pool, you can still run into issues with shared connections across processes.

To sum up: I think file descriptor sharing across fork is an application-level issue, not a library-level issue, and it should be dealt with in the application.

@blambeau

I'd like to share a comment here.

In my opinion, this is related to the fact that Sequel gives the OP the illusion that he can abstract away from connection handling, through implicit connection boundaries. While very appealing at first glance, I personally think that such a feature is not a nice gift to users, as it generally leads to subtle bugs. @eregon provides a very good example IMHO.

Just to be sure that my point is clear:

DB = Sequel.connect(...)   # does not actually connect
DB[:users].to_a            # will create a connection

instead of (explicit connection boundaries),

DB = Sequel.database(...)
DB.connect do |conn|
  conn[:users].to_a
end

Note that this is not directly related to connection pooling per se. It is more about the fact that connections are created and maintained automatically by Sequel, instead of being created and closed explicitly by users (and reused across threads/processes only after having been closed by them).

Now, Sequel made that choice a long time ago and that's probably fine. But if the illusion has to be maintained, I would be tempted to agree with @eregon that it would be more consistent for Sequel to be fork-safe as well as thread-safe. Otherwise the abstraction tends to leak in very unpleasant and unexpected cases.

@jeremyevans
Owner

Suppose we do have code that looks for pid changes to detect forking (all such code is buggy if fork is called more than once, due to pid reuse). Let's say we detect that a fork has occurred, and there are outstanding connections. What do we do? Do we ignore them? Do we attempt to close them? The former leaks connections. The latter isn't safe, since multiple child processes could be operating on the connections simultaneously. So any attempt to handle this automatically by detecting pid changes is broken.

The truth is, you cannot correctly handle things after forking, since at that point you are already sharing the file descriptors and connection objects. The only way to handle things correctly is to close the file descriptors before forking. As ruby does not offer a generic before_fork hook, I don't see that being possible automatically at the moment. Even if ruby did offer a generic before_fork hook, I don't think Sequel would use it, as fork can be called safely in many situations, and automatically disconnecting before fork could break existing code (e.g. fork{exec 'some command'} in the middle of your script).

FWIW, PostgreSQL specifically recommends against forking while having open connections: http://www.postgresql.org/docs/9.0/interactive/libpq-connect.html. Note that they don't recommend forking and then fixing the problems, they just tell you not to fork with an open connection.

Listen, I can understand that automatically handling this would make some things easier. But there are corner cases in doing so. The issue we are dealing with here is very narrow (a forking server that preloads code and doesn't use connections in the parent after forking). Even though such usage may be common, we only want to make this change if we know we are in that situation. The only one who knows that is the application author (or possibly the webserver). So I think this is always going to be an application/server-level decision. Since the only good way to handle it is to disconnect the connections before forking, I think the current recommendation of DB.disconnect sometime between application load and fork is the correct one, unless your webserver automatically handles it for you.

Is it common for other ruby libraries you use to specifically check for forking? I'd honestly be interested in how other libraries are handling this. AFAIK, ActiveRecord is similar to Sequel. Now, Passenger handles ActiveRecord disconnection/reconnection automatically when smart spawning, which is the reason it works automatically with ActiveRecord. But note how it is not ActiveRecord automatically reconnecting after fork, but Passenger doing so, which makes sense as Passenger is the one doing the forking. Maybe you want to ask the Passenger guys to automatically do a similar thing for Sequel that they do for ActiveRecord?

@cyclotron3k

Relative newbie to Ruby and Sequel. Here are my thoughts:

Would PID reuse really be an issue? The only time a PID can be reused is when the previous owner of that PID has died (and thus is no longer using the associated database connection).

I've just spent a frustrating few days trying to figure out some of the weirdest behaviour I've ever seen, and it's essentially this problem. I'm with @eregon on this one. I think [pid, thread_id] would be an elegant solution. I'm not sure how it would be any more of a performance hit than calling DB.disconnect every time.

Besides,
a) you're using Ruby in (probably) a web environment, so clearly that level of performance is not a primary concern. Personally, even if there were a performance cost, I would have paid it gladly to avoid the madness (and database corruption) of the last few days.
b) I've heard that DB.disconnect can fail silently. How can you even compensate for that? If this is true, we're back to square one (unless you implement @eregon's idea)?

ubuntu 13.10/nginx 1.4.3/phusion_passenger 4.0.21/ruby 2.0.0/sequel 4.8.0/postgres

@jeremyevans
Owner

The issue is not what key is used for the connection pool's @allocated. In almost all if not all cases where Sequel will be used in a preforking, preloading web application, connections will not be allocated during forking. The problem connections are going to be in @available_connections, which is an array, not a hash. So the solution if you wanted to do it automatically would be to check for pid changes, and clear @available_connections if the pid has changed.

It is true that pid reuse is not a significant issue. To be a problem it requires a grandparent->parent->child process model where the grandparent sets up the connection, forks the parent and exits. I only mentioned it in passing, it wasn't a central point to my argument.

I'm sorry if this took you a few days to track down. This type of issue is mentioned in the Sequel documentation, Passenger documentation, and the PostgreSQL documentation. If you can think of other appropriate places to add it in Sequel's documentation, I will add additional documentation to those places.

Detecting pid changes and clearing the available connections is certainly possible. However, it requires an additional system call on every single connection checkout (checkouts are very frequent, as they should be short-lived in Sequel), so it is not free from a performance standpoint. I'm sure you would have gladly paid the cost, but I don't think it's the correct decision to force the additional cost on all Sequel users.
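To put a rough number on that cost, a micro-benchmark along these lines could be used (results are platform- and Ruby-version-dependent; this is only a sketch of how to measure it, not a claim about the actual overhead):

```ruby
require 'benchmark'

# Measure the rough cost of one extra Process.pid call per checkout.
n = 1_000_000
base     = Benchmark.measure { n.times { } }.real
with_pid = Benchmark.measure { n.times { Process.pid } }.real
printf("extra cost per checkout: ~%.0f ns\n", (with_pid - base) / n * 1e9)
```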

I could easily add an extension that makes the connection pool do such a check. However, since it wouldn't be the default, nobody would use it unless they thought they needed it, and the only people who could figure out they needed it would be better off just disconnecting before fork. So I don't think it makes sense to implement it as an extension.

I'm not aware of cases where DB.disconnect can fail silently. Even if the disconnection of the underlying connection failed silently, @available_connections would still be cleared, so you would still get a new connection on the next checkout. The worst case scenario even if the underlying connection disconnection failed silently is leaking the underlying file descriptors.

@clord

clord commented Feb 16, 2016

@jeremyevans, I find that this statement of yours is false:

Adding a DB.disconnect call after your application is loaded but before it starts running/serving requests should never break code

We use postgres LISTEN/NOTIFY in a pretty standard way as a basic job dispatch mechanism. Specifically, we fork processes as the result of NOTIFY. Because of this issue (#691) and its suggested fix (disconnect prior to fork), our job server has occasionally been missing NOTIFY events in the window between the disconnect and the next iteration of LISTEN in the parent process. I would really prefer that the parent stays connected, and that the forked process reconnects to the server independently. This took a lot of time to debug. It sounds like we might be able to disconnect in the forked child right away instead of in the parent prior to the fork as suggested, but I'm not sure this is documented behaviour, so we've enabled a slow-and-safe path until we can figure out what the situation is.

@jeremyevans
Owner

@clord Without posting some example code, I can only guess what is happening. First, some background:

If you are using the standard threaded connection pool, DB.disconnect does not disconnect connections that are in use, only connections in the pool that are available. DB.listen checks out a connection from the pool for its whole runtime, so such a connection should not be available in the pool and therefore should not be disconnected while listen is running.

If you are using the single threaded pool, then it can delete in use connections, so if you are using that, you'd want to change to the default pool.

Note that in order not to miss notifications, you need to be using the :loop option in order to receive multiple notifications. Otherwise, Sequel stops listening on the channel (UNLISTEN) before returning from Database#listen, as mentioned in the documentation for the method. If you are not using the :loop option, but instead are looping manually doing something like:

while true
  notification = DB.listen("channel")
  fork{do_something(notification)}
end

it's expected that you will lose notifications; this is unrelated to disconnection. You would need to switch to:

DB.listen("channel", :loop=>true) do |*notification|
  fork{do_something(notification)}
end

Note that you need to be careful to use exit! instead of exit in the forked processes, so that the parent socket connection won't be disconnected.
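The exit!-vs-exit distinction matters because exit runs at_exit handlers and finalizers in the child, which may perform a protocol-level disconnect on the socket shared with the parent, while exit! skips them. A small Sequel-free demonstration, using an at_exit handler as a stand-in for a driver-level cleanup:

```ruby
r, w = IO.pipe
pid = fork do
  r.close
  at_exit { w.write('cleanup') }  # stands in for a driver-level disconnect
  exit!                           # bypasses at_exit, so nothing is written
end
w.close
Process.wait(pid)
leaked = r.read
puts leaked.empty? ? 'child skipped cleanup handlers' : 'child ran cleanup handlers'
```

Replacing exit! with exit in the child makes the handler fire, which is exactly the behavior that would tear down a parent's shared connection.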

If you still think this is a bug and you can come up with a self contained reproducible example that shows the problem, please post it on the sequel-talk Google Group or open a new issue and I'll definitely see if I can help.

FWIW, the statement I made is not falsifiable, I said "should never break code", I didn't say "can never break code". :) Not to mention that the statement was made about disconnection before running/serving requests, not about doing so at runtime in response to notifications.

@HoneyryderChuck
Contributor

I'm with @jeremyevans on this one, and admire his patience in educating about this caveat.

FWIW this is documented everywhere when it concerns fork-based app servers or background job frameworks (Resque, Unicorn and Puma, besides Passenger), and usually for Active Record. Check the Heroku documentation if you don't believe me. Everywhere there's a database involved and a fork, people warn against this sort of thing. This does not seem to be Sequel's problem to solve. If anything, Ruby should clearly distinguish between the main thread and the "forked" main thread. But then again, should it? Fork is a Unix system feature. Ruby is not a Unix-only environment (it also supports Windows). JRuby goes to extreme lengths to be able to fork, and I don't know of any production case mixing fork with db pools. Even PostgreSQL recommends against it, as previously linked.

The reason Passenger abstracts things away for Active Record is just popularity and the common case (the customer's main app is almost always Rails-based), and avoiding yet another fork/db-connection issue. They're not alone in that. Even Sidekiq takes care of checking connections back in after jobs are done, but only for Active Record (this is probably not documented in Sequel, and probably shouldn't be). This does seem to be out of scope for Sequel.

The only thing I'd argue is that Sequel stores are not fiber-safe, as they use Thread#[] as thread-local storage, but that's a completely different issue.

@clord

clord commented Feb 17, 2016

@jeremyevans thanks, that was a very useful comment. It looks like the problem I'm seeing is that unlisten is being sent as part of listen, which you are saying is by design.

The code I am porting from would look like this if possible:

begin
while running
  DB.listen("channel", timeout: 10)
  fork{do_something}
  update_state
  running = # Compute
end
ensure
  DB.unlisten "channel"
end

And the current ported version will unlisten via line 3 instead of in the ensure, probably because update_state is usually very fast, so it was deemed OK to run in this manner... the assumption behind which was the true cause of our bug anyway.

I suppose I could manually exec LISTEN "channel" to avoid the implicit UNLISTEN that Database#listen injects. It would be nice to have some API surface for raw listen/unlisten, but I at least understand the situation better, thanks.

Edit: I was worried after writing this that LISTEN "channel" might not like being executed multiple times in a row without a corresponding UNLISTEN. So I checked postgres docs and they say:

LISTEN registers the current session as a listener on the notification channel named channel. If the current session is already registered as a listener for this notification channel, nothing is done.

So my original code is correct in that regard. It's not necessary to unlisten after every listen.

@jeremyevans
Owner

Yes, unlistening on the channels before returning from listen is by design, and required if you want correct behavior with the following code:

n1 = DB.listen("chan1")
n2 = DB.listen("chan2")

Without the unlisten, the second call to listen here could return a notification for chan1 even though you are only supposed to be listening on chan2.

You should be able to change your code to:

running = true
DB.listen("channel", timeout: 10, :loop=>true) do |*notification|
  fork{do_something}
  update_state
  break unless running = # Compute
end

Note that if you want to just use the ruby-pg API, you can just use DB.synchronize to get access to the underlying PG::Connection object, and operate directly on that.
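A hedged sketch of that lower-level route, assuming the postgres adapter (where the checked-out connection object wraps ruby-pg's PG::Connection); do_something, update_state, and running are the placeholders from the code earlier in the thread, not real APIs:

```ruby
DB.synchronize do |conn|
  begin
    conn.async_exec('LISTEN "channel"')
    loop do
      # wait_for_notify yields only on a notification; on a timeout it
      # returns nil, so the code below runs on event *or* timeout.
      conn.wait_for_notify(10) do |_channel, _pid, _payload|
        fork { do_something; exit! }  # exit! so the child leaves the socket alone
      end
      update_state
      break unless running
    end
  ensure
    conn.async_exec('UNLISTEN "channel"')  # explicit, in the ensure this time
  end
end
```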

@clord

clord commented Feb 17, 2016

I tried this:

DB.listen("channel", timeout: 10, loop:true) do
   puts "called"
end

and "called" is only printed after an event on the channel, not in the case of a timeout. Looking at the code for listen:

            loop do
              t = timeout_block ? [timeout_block.call] : []
              conn.wait_for_notify(*t, &block)
              l.call(conn) if loop_call
            end

it's pretty clear that &block will only be called in wait_for_notify, which does not execute the block in the event of a timeout (which makes sense, it's used in the non-loop version too, where we would not expect it to run for a timeout). It does not seem like I have the API required to implement run-on-event-or-timeout in pure Sequel.

@jeremyevans
Owner

Well, you can probably do it in pure Sequel, but it would be uglier:

called = false
DB.listen("channel", timeout: proc{called = false;10}, loop: proc{puts "not called" unless called}) do
   called = true
   puts "called"
end

@eregon
Contributor Author

eregon commented Jul 19, 2018

@jeremyevans Hello!
What if Ruby added at_fork(:before/:after_parent/:after_child) hooks as in https://bugs.ruby-lang.org/issues/5446#note-27?
Would you consider automatically disconnecting Sequel connections in e.g., at_fork(:before) or at_fork(:after_child)?
That would prevent this issue from happening, make Sequel fork-safe, and have no performance cost on the fast path.

@jeremyevans
Owner

@eregon I wouldn't do it automatically, as it would break code. Let's say Sequel automatically uses at_fork(:before). Then if the user does something like: fork{exec('...')}, you end up disconnecting connections in the current process for no reason. You still don't end up with fork safety in the case where a connection is currently checked out of the connection pool, since checked out connections would still be shared by the parent and child processes.

at_fork(:after_child) has all of the problems that after_fork has instead of before_fork for preforking webservers, it's definitely the wrong way to handle file descriptor sharing issues. If you are concerned about preforking webservers, you should be disconnecting all connections before the fork, while no connections are checked out.

My reasoning hasn't changed in the last 5 years, it's still that trying to handle this automatically in Sequel is a bad idea. This should be handled in the application, as the application is the only place where it can be correctly handled. Yes, such handling may not be automatic. But I would rather no automatic handling than poor automatic handling, or automatic handling that works for a particular use case but deals poorly with other use cases.

@eregon
Contributor Author

eregon commented Jul 20, 2018

@jeremyevans I am not convinced yet this cannot be done automatically and sensibly for the vast majority of realistic use cases (maybe even all). Could you elaborate on a few points?

Let's say Sequel automatically uses at_fork(:before). Then if the user does something like: fork{exec('...')}, you end up disconnecting connections in the current process for no reason.

Yes, but who does that? Ruby has many utilities for spawning subprocesses, and I see very little good reason to fork+exec manually instead of using backticks/spawn/system/popen/etc. The latter options won't trigger at_fork hooks.
And if there is something potentially touching the DB between the fork and exec, fork-safety would turn up very useful!

You still don't end up with fork safety in the case where a connection is currently checked out of the connection pool, since checked out connections would still be shared by the parent and child processes.

Why not? If the parent closed (as in close(2)) the connection, there is no way the fork can ever access it, is there? So both will have to reestablish a new connection when they use the database, which seems perfectly safe.

If you mean forking while doing a DB transaction, then that sounds inherently buggy, as the transaction would then end twice, once in the parent and once in the child.

at_fork(:after_child) has all of the problems that after_fork has instead of before_fork for preforking webservers, it's definitely the wrong way to handle file descriptor sharing issues.

Could you link to some documentation about that?
I think at_fork(:after_child) { reconnect } would:

  1. be fork-safe
  2. avoid disconnecting the parent uselessly
  3. naturally deal with the issue by preventing the child from using the parent's connection

So intuitively this option sounds great to me. It's also similar to what Passenger uses to avoid file descriptor sharing automatically with ActiveRecord AFAIK. What are the drawbacks?

If you are concerned about preforking webservers, you should be disconnecting all connections before the fork [manually, at the application level], while no connections are checked out.

If I use Sequel and some forking Ruby webserver (most fork, and many by default), I'd like to have a sane situation by default and, e.g., avoid end users of the website seeing each other's database results (and thus leaking private information) due to unintentional connection sharing.
Yes, the problem is documented in a couple of places, but it's also easy to miss or forget about, and it is fairly easy to hit this issue with default settings.
More importantly, isn't good and safe behavior by default valuable?

But I would rather no automatic handling than poor automatic handling, or automatic handling that works for a particular use case but deals poorly with other use cases.

Could you show realistic use cases where automatic fork-safety would be harmful? (using at_fork(:after_child))

@jeremyevans
Owner

Yes, but who does that?

Not sure. However, I'm guessing the following is not that uncommon, since it is the recommended way to support fork without breaking things in the parent:

fork do
  something
  exit!
end

at_fork(:before) would break that use case and be unacceptable to me.

Why not? If the parent closed (as in close(2)) the connection, there is no way the fork can ever access it, is there? So both will have to reestablish a new connection when they use the database, which seems perfectly safe.

You are talking about close(2) on the connection, but that's not what Sequel needs to do. First close(2) would be called on a socket that Sequel generally doesn't deal with, that's a driver level issue. Sequel would be calling Database#disconnect, which does a lot more than close(2). Most disconnection code does not just close(2) a socket, it sends a proper protocol disconnect to the server over the connection first.

Could you link to some documentation about that?
I think at_fork(:after_child) { reconnect } would:

Sequel doesn't have reconnect, so I'm not sure what you mean. If you mean emptying the connection pool without disconnecting the connections in it, then you can run into problems when those objects are garbage collected and their finalizers are run, potentially by multiple processes at the same time, often doing a protocol-level disconnect on the socket before closing it. If you mean actually disconnecting the connections and then creating new connections, then you have the same issue; it just happens sooner.

Could you show realistic use cases where automatic fork-safety would be harmful?

The burden is not on me to show it can cause no harm. The burden is on the person who wants this added to prove doing so cannot cause harm. I've already described possible ways this can be harmful.

This problem has a simple, manual solution with one line of code that causes no problems if used correctly. That is preferable in my opinion to a more complex automatic case that will handle certain cases fine and break other cases.

As an analogy, there are many people using Sequel with multiple databases and model classes. If they create a model class, Sequel doesn't attempt to look at all databases to try to find which one is the best fit for the model class (i.e. has the related table). The burden is on the user to manually specify which database if the default database in use is not the desired one.

@eregon
Contributor Author

eregon commented Jul 20, 2018

Thank you for the reply.

at_fork(:before) would break that use case and be unacceptable to me.

To clarify, what would break exactly? The connection would be disconnected in the parent on fork, but I guess Sequel would automatically reconnect afterwards. So what's the harm?

It's not ideal that on every fork the parent needs to disconnect (and reconnect later if the DB is accessed), I understand that.

Sequel doesn't have reconnect, so I'm not sure what you mean. If you mean emptying the connection pool without disconnecting the connections in it, then you can run into problems when those objects are garbage collected and their finalizers are run, potentially by multiple processes at the same time, often doing a protocol-level disconnect on the socket before closing it. If you mean actually disconnecting the connections and then creating new connections, then you have the same issue; it just happens sooner.

This seems the meat of the issue, and the challenge is to do this correctly. It sounds fairly similar to IO.for_fd, on which autoclose must be disabled manually in Ruby (otherwise there is a double-close, or an early concurrent close() which is harmful).

So I think during the at_fork(:after_child) hook we'd need to empty/clear the connection pool, not send anything on those connections, but mark them (somewhat like IO#autoclose=) so indeed finalizers don't do anything, or alternatively just directly remove the finalizers on those objects.
Basically, we want to prevent these connections from communicating in any way with the database, since any such communication would happen concurrently with the parent process's use of the same sockets.
Does that sound right? Are there any problems with this approach?
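As an aside, the IO#autoclose= analogy can be demonstrated with a plain pipe (the pipe's read end standing in here for a shared DB socket): an IO marked autoclose = false no longer "owns" its file descriptor, so letting that IO object be garbage collected does not close(2) the fd underneath other users of it.

```ruby
# The pipe's read end stands in for a DB socket shared after fork.
r, w = IO.pipe
io = IO.for_fd(r.fileno)   # second IO object wrapping the same fd
io.autoclose = false       # disown the fd: GC of io won't close(2) it
io = nil
GC.start                   # even if io was collected, the fd stays open

w.write("still open")
w.close
payload = r.read           # succeeds: nothing closed the fd underneath r
```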

We should ideally also close the underlying file descriptors (and free native allocations for the connection pointer) as they won't be used. I'm not sure if e.g., pg has an API for that.
Reading a highly related thread of discussion shows I'm not alone to want that:
http://www.postgresql-archive.org/Detecting-libpq-connections-improperly-shared-via-fork-td5726553.html

Daniel Farina: However, I would like to re-iterate that this is a very common problem, so it may be worth pausing to think about solving it. Whenever someone writes in saying "what's up with these strange SSL errors", generally the first question in response is "are you using unicorn?" (for Ruby; 'gunicorn' for Python). The answer is almost invariably yes. The remainder have renegotiation issues.

One solution for Postgres is to close(2) the socket returned by PQsocket() and then call PQfinish() to release resources without sending the termination sequence (since the socket is already closed). This user request specifically asks for exposing a single function that does exactly that.

They also propose just not caring about the open socket and native PGconn allocation. This seems safe and is of course portable/independent of the specific database. After all, the parent process likely doesn't have thousands of open connections that could lead to file descriptor exhaustion or a significant memory leak; I would expect it often has just one open connection.


FWIW it seems ActiveRecord deals with this problem directly nowadays, by checking Process.pid (which is fairly cheap as it's not a syscall but just a function call at least on Linux): https://github.com/rails/rails/blob/v5.2.0/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L1041
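The Process.pid check is easy to sketch outside of any ORM. Below is a hypothetical stand-in pool (not Sequel's or ActiveRecord's real API): it remembers the pid it was created under, and on the first checkout in a forked child it silently drops the inherited connections, without closing them, and builds fresh ones.

```ruby
# Hypothetical stand-in pool showing the Process.pid fork check.
class ForkAwarePool
  def initialize(&connector)
    @connector = connector        # block that creates a new connection
    @owner_pid = Process.pid
    @available = []
  end

  def checkout
    if @owner_pid != Process.pid  # pid changed => we are in a forked child
      @available.clear            # drop, don't disconnect, inherited conns
      @owner_pid = Process.pid
    end
    @available.pop || @connector.call
  end

  def checkin(conn)
    @available.push(conn)
  end
end
```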
I also replied on #1431, where ActiveRecord seems to behave correctly.
Heroku seems to recommend to both disconnect in before_fork and reconnect in after_fork: https://devcenter.heroku.com/articles/forked-pg-connections
https://stackoverflow.com/questions/29918475/unicorn-do-i-need-to-disconnect-from-database-in-before-fork-block mentions there shouldn't be a need to disconnect in before_fork.

@jeremyevans
Owner

To clarify, what would break exactly? The connection would be disconnected in the parent on fork, but I guess Sequel would automatically reconnect after. So what's the harm?

Well...

It's not ideal that on every fork the parent needs to disconnect (and reconnect later if the DB is accessed), I understand that.

I think you've answered your own question. Correct use of fork followed by exec or exit! should not be penalized.

This seems the meat of the issue, and the challenge is to do this correctly. It sounds fairly similar to IO.for_fd, on which autoclose must be disabled manually in Ruby (otherwise there is a double-close, or an early concurrent close() which is harmful).
So I think during the at_fork(:after_child) hook we'd need to empty/clear the connection pool, not send anything on those connections, but mark them (somewhat like IO#autoclose=) so indeed finalizers don't do anything, or alternatively just directly remove the finalizers on those objects.
Basically, we want to prevent these connections to communicate in any way with the database, since that would be done concurrently with the parent process.
Does that sound right? Are there any problems with this approach?

Yes, it's way more complex than a single Database#disconnect call in the appropriate place in the application when using a pre-forking webserver and loading the application before forking. Plus, before Sequel could consider it, you would need to make sure all 13 database drivers that Sequel supports have a supported way of marking connections so that they only close(2) the socket during GC finalization. Even if that were done, I'm not sure the cost/benefit of making this change would be worth it.

I'm sorry, but you and I just have a different philosophy regarding this. You are willing to accept a large amount of complexity for an automated approach that benefits a particular case, without much consideration of other cases. I think this is best solved by a simple manual solution in the particular case where it is needed, where a similar manual solution would be needed for most other libraries that use sockets.

I've been maintaining Sequel for 10 years. Looking at the issues, there are only a handful that are related to this issue. It's simply not a major issue for most Sequel users. It's true that new users are occasionally bitten by it because they don't understand fork and didn't read the documentation for either Sequel or their webserver. However, I think the vast majority of Sequel users have no problems with the status quo in regards to this issue, and adding complexity to handle it automatically is not worth it.

@eregon
Contributor Author

eregon commented Jul 21, 2018

You are willing to accept a large amount of complexity for an automated approach that benefits a particular case.

I'm trying to explore the available options. Agreed, the approach of closing the socket without communicating on it is tricky and needs special handling for each DB driver, sometimes not even exposed directly by the native DB library.

Probably removing the finalizer or disabling it is easier: we just need to know which object the finalizer is defined on (and use ObjectSpace.undefine_finalizer), or a way to mark the object as "should not be finalized". When the forked process exits, we can just let the OS release resources (the socket and native allocations for the connection).
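As a rough illustration of that idea (a plain object stands in for a driver connection whose finalizer would normally send a protocol-level disconnect), ObjectSpace.undefine_finalizer removes a previously registered finalizer, so the "disconnect" cleanup never runs when the object is collected:

```ruby
# Stand-in for a connection object with a disconnect finalizer.
ran = []
conn = Object.new
ObjectSpace.define_finalizer(conn, proc { ran << :protocol_disconnect })
ObjectSpace.undefine_finalizer(conn)  # mark it "should not be finalized"
conn = nil
GC.start   # even if the object is collected, the finalizer never runs
```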

From #1431 (comment):

In any case, yes, I think if you are using fork, you should know to use exec or exit! if you want to avoid the issue.

Could preforking webservers use this? Specifically, since they fork, they would need to ensure all the forked processes end with exit! so finalizers & at_exit hooks are not run multiple times if some finalizers & at_exit hooks were defined in the parent process.

That would avoid the double disconnect, but of course something else would still be needed so the connection is not reused in multiple processes.
A fairly simple

at_fork(:after_child) {
  Sequel::KEEP_ALIVE = DB.pool.available_connections.dup
  DB.pool.available_connections.clear
}

should be enough for that (if we assume the webserver/forked process uses exit!).

Alternatively, "a new object_id-like identifier which changes across fork: Thread.current.thread_id" as mentioned by Eric Wong could help to make this slightly easier than using a [PID, Thread] tuple (which I guess is the most common approach currently for fork+thread safety).

@jeremyevans
Owner

Probably removing the finalizer or disabling it is easier: we just need to know which object the finalizer is defined on (and use ObjectSpace.undefine_finalizer), or a way to mark the object as "should not be finalized". When the forked process exits, we can just let the OS release resources (the socket and native allocations for the connection).

I would not even consider messing with the internal state of the connection objects to try to implement this. A prerequisite for support in Sequel would be support in the drivers. That's a necessary condition, but not a sufficient one.

Could preforking webservers use this?

I don't think they could do so automatically or by default. Calling exit! instead of exiting normally can break a whole number of things, and should only be used in specific circumstances. Consider the case where users are currently using preforking webservers correctly by calling Database#disconnect in the parent process before forking worker processes, but then using at_exit in each worker for some processing. If the preforking webservers started calling exit! instead of exiting normally, they break that case.
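The at_exit breakage is easy to demonstrate with a bare fork. (syswrite is used below because exit! also skips flushing Ruby's IO write buffers, not just at_exit hooks and finalizers.)

```ruby
# A worker that registers its own at_exit cleanup, then ends with exit!.
r, w = IO.pipe
pid = fork do
  r.close
  at_exit { w.syswrite(" at_exit ran") }  # worker-registered cleanup
  w.syswrite("worker done")
  exit!                                   # bypasses at_exit and finalizers
end
w.close
Process.wait(pid)
output = r.read   # the at_exit hook's output never appears
```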

Alternatively, "a new object_id-like identifier which changes across fork: Thread.current.thread_id" as mentioned by Eric Wong could help to make this slightly easier than using a [PID, Thread] tuple (which I guess is the most common approach currently for fork+thread safety).

That would only affect the case of currently checked out connections on the thread that forks, which would end up like currently checked out connections on threads other than the one that forked. It doesn't affect connections that are not currently checked out, which would still be available to be checked out in both the parent and child.

@boazsegev

@jeremyevans,

Thank you for your time and efforts.

I totally agree with your position... However, I find myself in a conundrum where the existing Sequel API doesn't allow me to provide an automated solution to this problem.

I want to offer Sequel the same support offered to ActiveRecord by the iodine web server (which I'm now in the process of updating/fixing)...

...However, unlike ActiveRecord, Sequel database connections are instance/driver specific, where DB is a convention rather than a promise.

Is there any way that the Sequel namespace could contain a method that closes all the connections in all the drivers / instances, similar to ActiveRecord::Base.connection.disconnect! and ActiveRecord::Base.establish_connection?

I think having a connection/database "registry" that allows Sequel connections to be globally re-established could settle this discussion by moving the responsibility to the framework/server using Sequel rather than to application developers.

For your consideration.

Kindly,
B.

@janko
Contributor

janko commented Nov 14, 2018

@boazsegev Sequel stores each Sequel::Database object created by Sequel.connect in the Sequel::DATABASES array. That's how it's able to infer the default database when using models: it just takes the first item from that array.

So you should be able to disconnect all Sequel database connections with:

Sequel.synchronize do
  Sequel::DATABASES.each { |database| database.disconnect }
end

@boazsegev

@janko-m , Thanks!

I committed a patch to automate server forking with iodine and Sequel. This will be part of the next iodine release 👍🏻🎉

... unless you have objections... I wouldn't want to introduce a default behavior that the Sequel team disapproves of...?

Kindly,
B.

P.S.

I think adding the code to the servers (or frameworks) will make it easier for developers... after all, by the time developers use Rack or lower level features, they should already know about the issues related to fork.

@jeremyevans
Owner

@boazsegev The patch looks good, except that you should drop the usage of Sequel.synchronize.

@boazsegev

@jeremyevans Thanks!
