Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rails 6 RC2 multi-db issue under multithreaded environment #36830

Closed
kiyot opened this issue Aug 1, 2019 · 22 comments · Fixed by #36843 or #36868
Closed

Rails 6 RC2 multi-db issue under multithreaded environment #36830

kiyot opened this issue Aug 1, 2019 · 22 comments · Fixed by #36843 or #36868
Assignees
Milestone

Comments

@kiyot
Copy link

kiyot commented Aug 1, 2019

When you use ActiveRecord::Middleware::DatabaseSelector with multithreaded puma, write access to db randomly fails with ActiveRecord::ReadOnlyError: Write query attempted while in readonly mode.

ActiveRecord::Middleware::DatabaseSelector has a mechanism to redirect read access to primary db for a moment. During that period, a flag named @prevent_writes on ActiveRecord::ConnectionAdapters::ConnectionHandler objects is set, and write access to db will raise ActiveRecord::ReadOnlyError.

However, ActiveRecord::ConnectionAdapters::ConnectionHandler objects are shared across threads. Changing @prevent_writes in one thread will cause unexpected effects on requests handled by the other threads.

Steps to reproduce

Clone this repo, and follow the instructions.
https://github.com/kiyot/rails6-multi-db-prevent-write-race

Expected behavior

Write access to db in writing role should succeed.

Actual behavior

Write access to db in writing role randomly fails with ActiveRecord::ReadOnlyError: Write query attempted while in readonly mode under multithreaded environment.

System configuration

Rails version:
6.0.0 RC2

Ruby version:
2.6.3

@eileencodes eileencodes self-assigned this Aug 1, 2019
@eileencodes eileencodes added this to the 6.0.0 milestone Aug 1, 2019
eileencodes added a commit to eileencodes/rails that referenced this issue Aug 2, 2019
Previously if an app attempts to do a write inside a read request it will be
impossilbe to switch back to writing to the primary. This PR adds an
argument to the `while_preventing_writes` so that we can make sure to
turn it off if we're doing a write on a primary.

Fixes rails#36830

Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
@kiyot
Copy link
Author

kiyot commented Aug 3, 2019

@eileencodes Thank you for the fix. But #36843 does not solve the problem completely, and there are still other error cases.

I have created a new repo for reproduction
https://github.com/kiyot/rails6-multi-db-prevent-write-race-2

Suggestion

I think the root problem is that @prevent_writes flag is shared across multiple threads and requests.
This flag should be local in each request, and should be moved to somewhere only one request can access. I guess ActiveRecord::Middleware::DatabaseSelector::Resolver would be a good candidate to have @prevent_writes flag.

@eileencodes
Copy link
Member

I guess ActiveRecord::Middleware::DatabaseSelector::Resolver would be a good candidate to have @prevent_writes flag.

That won't work since it's not meant to be used only in the middleware.

I'll work on a solution today.

@eileencodes eileencodes reopened this Aug 5, 2019
eileencodes added a commit to eileencodes/rails that referenced this issue Aug 6, 2019
As demonstrated in the test added and in rails#36830 the code that prevents
writes wasn't thread safe. If one thread does a read, then another does
a write, and then another does a read the second read will cause the
first write to be unwriteable.

This change removes the instance variable and instead uses a
getter/setter on Thread.current[:prevent_writes] for the connection
handler to set whether writes are allowed.

Fixes rails#36830
@jsierles
Copy link

We have been testing #36868 with Puma 4, and still are seeing writes going to the replica. In our case, we're using an around_filter in a specific controller - the only one that needs to use the replica. The contents of the filter:

ActiveRecord::Base.connected_to(database: :production_replica) do
      yield
    end

We'll dig in more to see what's happening. But I thought we should mention it here in case this leads to any clues.

@kiyot
Copy link
Author

kiyot commented Aug 20, 2019

@jsierles I encountered the same issue when I use connected_to directly.

According to f2de448, it looks like you should use ActiveRecord::Base.connection_handler.while_preventing_writes(false), too.

@eileencodes
Copy link
Member

We need more information from you @jsierles before I can determine there's an issue here. For one are you using the middleware? If you're using the middleware you shouldn't use connected_to an around filter to also send traffic to specific databases.

I'd need to see your code to be sure but it sounds like more is going on here. You shouldn't need to set while_preventing_writes directly when using the replica, but since you're using an around_filter more could be going on there.

To dig in more I need an app and reproduction steps.

@krames
Copy link

krames commented Aug 21, 2019

@eileencodes I investigated this issue with @jsierles. From my initial debugging it appears that the swap_connection_handler in connection_handling.rb is not swapping back to the original connection. This issue does not appear related to your thread safety write patch.

We ended up fixing the issue by using the connects_to in our ApplicationRecord class and accessing connections using role based access.

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
 
  connects_to database: { writing: :primary, reading: :primary_replica }
end

Unfortunately, I haven't been able to find time to hunt down the root cause, but hopefully this comment helps someone.

@eileencodes
Copy link
Member

The provided example is the right way - I'm curious how you were connecting/switching connections before that change?

@krames
Copy link

krames commented Aug 21, 2019

@eileencodes we had a controller with the following code

class ReporterController
  around_action :read_from_replica, unless: -> { ENV["SKIP_DB_REPLICA"] }
  
  def read_from_replica
    ActiveRecord::Base.connected_to(database: :production_replica) do
      yield
    end
  end

  def trip
    # controller logic here
  end
end	

In our database.yml we had the following:

  primary:
	    <<: *default
	    url: <%= (ENV['DATABASE_CONNECTION_POOL_URL'] || ENV['DATABASE_URL']).sub(/^postgres/, "postgis") %>
	  primary_replica:
	    <<: *default
	    url: <%= (ENV['REPLICA_CONNECTION_POOL_URL'] || ENV['REPLICA_DATABASE_URL'] || ENV['DATABASE_URL'] || "").sub(/^postgres/, "postgis") %>
	    replica: true

The only place we used the replica was in the above controller. Everything else was unchanged.

I was able to duplicate this issue on the console on my local machine by point the replica to my test database and executing the following in a console:

ActiveRecord::Base.connected_to(database: :production_replica) do
  User.count
end

That returned a count from the replica.

If I then did subsequently executed the following in the same console, I again got the same count as the replica:

User.count

I had assumed it would use the primary because this line was executed outside of the Base.connected_to block. Executing User.count on a fresh console would return count for the primary.

Is this a bug or a misunderstanding of how multiple databases are suppose to work in Rails 6?

@eileencodes
Copy link
Member

eileencodes commented Aug 21, 2019

You don't want to use the following except for one off connections (like in a script to connect to a slow replica or something like that). This will re-establish the connection every time it's called:

    ActiveRecord::Base.connected_to(database: :production_replica) do
      yield
    end

The other problem is since this is a one off and doesn't swap the role you're basically telling AR Base to connect to the production replica. If you need to swap back to the primary you'll need to do a new block.

So using the Role is the right way to use multiple databases in your controller. You also shouldn't need to do an around_filter since the middleware is supposed to choose the right database in your controller requests. Any custom code you need to push a request to a replica or a primary should be added in your app and then initialized with the middleware.

I think there may be some missing docs here that I can add to help make this easier.

@krames
Copy link

krames commented Aug 21, 2019

@eileencodes Thanks for the explanation!

@kiyot
Copy link
Author

kiyot commented Aug 22, 2019

@eileencodes

I am using the middleware and it works fine in most cases. But there are a few controllers that writes to a primary in GET requests.

I added connected_to(role: :writing) there, and it turned out to require this workaround. It seems to work, but maybe I'm on the wrong path.

Any custom code you need to push a request to a replica or a primary should be added in your app and then initialized with the middleware.

What is the supposed way to do this? Creating custom resolver class?

@noma4i
Copy link
Contributor

noma4i commented Aug 22, 2019

In this case connected_to doesn't need to be a block I guess? more likely good idea is to name it connect_to! as a method w/o yield.

@jorisvh
Copy link

jorisvh commented Aug 26, 2019

I have the same issue. We use the middleware but some of our controllers also use the impressionist gem, so some GET requests will try to write. This is what we do (at the end of our index or show methods) :

ActiveRecord::Base.connected_to(database: :master) do
      impressionist(...)
end

It will raise the following error maybe once in a thousand times :

Write query attempted while in readonly mode: INSERT INTO "impressions" ("impressionable_type", ...

@pinzonjulian
Copy link
Contributor

pinzonjulian commented Aug 26, 2019

My team has experienced the same issue and we've found something

We have a GET action that writes a log to the primary DB. When we do a plain old get this works just fine:

ActiveRecord::Base.connected_to(role: :writing) do
  # code that writes to the primary DB
end

However when the journey that leads to that action comes from a redirect from a POST or PATCH action, it breaks.

# ActiveRecord::ReadOnlyError: Write query attempted while in readonly mode

If we add @kiyot 's solution it works:

ActiveRecord::Base.connected_to(role: :writing) do
  ActiveRecord::Base.connection_handler.while_preventing_writes(false) do
    # code that writes to the primary DB
  end
end

@kiyot, @jorisvh , is this your case too?

We created this provisional method in ApplicationRecord that can be easily be found and replaced when there's a solution for this:

def self.provisional_connect_to(role: :writing, while_preventing_writes: false )
    ActiveRecord::Base.connected_to(role: role) do
      ActiveRecord::Base.connection_handler.while_preventing_writes(while_preventing_writes) do
        yield
      end
    end
  end

We'll try to recreate this today in a new app and let you know.

PS: Is this the correct issue to have this discussion in? I'm not sure if this relates to the main theme of this issue.

@eileencodes
Copy link
Member

eileencodes commented Aug 26, 2019

AFAICT this is not a bug. If you're using the middleware AND setting your own connected_to manually you'll need to make sure you're turning writes back on when you do the manual switch with connected_to.

eileencodes added a commit to eileencodes/rails that referenced this issue Aug 28, 2019
If a user is using the middleware for swapping database connections and
manually calling `connected_to` in a controller/model/etc without
calling `while_preventing_writes(false)` there is potential for a race
condition where writes will be blocked.

While the user could _just_ call `while_preventing_writes` in the same
place they call `connected_to` this would mean that all cases need to
call two methods.

This PR changes `connected_to` to call `while_preventing_writes`
directly. By default we'll assume you don't want to prevent writes, but
if called with `connected_to(role: :writing, prevent_writes: true)` or
from the middleware (which calls `connected_to` this way) the writes
will be blocked.

For replicas, apps should use readonly users to enforce not writing
rather than `while_preventing_writes` directly.

Should fix the remaining issues in
rails#36830
@vitobotta
Copy link

I am trying the read/write splitting and am having this problem with Devise (see this), because it makes changes to the database even with some GET requests. I tried @kiyot 's change in an around filter and the tests pass/things seem to work, but @eileencodes do we still need to do something to restore the default behaviour even when using a block like this?

ActiveRecord::Base.connected_to(role: :writing) do
  ActiveRecord::Base.connection_handler.while_preventing_writes(false) do
    # code that writes to the primary DB
  end
end

@eileencodes
Copy link
Member

@vitobotta I don't understand the question. Are you asking if you need to use while_preventing_writes? Or are you asking if there's anything in addition to while_preventing_writes?

I fixed Rails so you don't need to do that in #37065 but it's not in a released version yet.

@vitobotta
Copy link

Hi @eileencodes, basically I would like to know if it's safe to use that code snippet for now, until 6.0.1 is released with the fix you mentioned. Are there any negative consequences in using that code? Thanks!

@rept
Copy link

rept commented Nov 10, 2020

I'm still confused.

In this case connected_to doesn't need to be a block I guess? more likely good idea is to name it connect_to! as a method w/o yield.

This one got me as well. I have a controller method that lets users download a CSV from our datawarehouse.

This is wrapped in a block like this:

ActiveRecord::Base.connected_to(database: :datawarehouse) do
  result = ActiveRecord::Base.connection.execute(
      %q(
      select dc.device_id, pp.x, pp.y, pp.confidence, semantic_map, internal_name as corrected_person_position_type_id, case when vs.id is null then false else true end as validation_set
      ...
    ))
end

The file downloads fine, but subsequent actions are going to the datawarehouse DB instead of primary. I would expect that once out the block it would return to the primary, otherwise what's the use of the do block ?

@eileencodes it this fixed in Rails 6.1 (or am I using it wrong?)

BTW I'm using Puma.

@eileencodes
Copy link
Member

eileencodes commented Nov 10, 2020

I completely deprecated the database kwarg on connected_to in 6.1 because it's dangerous to use in requests and that's where everyone was using it. It's a feature that folks were trying to use for sharding and it does not work for that.

If you want to use sharding, you should use the shard key in connected_to on 6.1. This is all documented in the guides. The database kwarg will be removed in 6.2 with no replacement, I don't recommend using it at all.

@eileencodes
Copy link
Member

To explain more clearly: the problem with using the database kwarg is it's establishing connections on the fly which will replace the existing connections with the same connection_specification_name. In 6.0 there was no way to establish more than one connection per handler per class. That's fixed in 6.1 (if you're using sharding), but not for the database kwarg since it really doesn't do what anyone wants it to do. It was supposed to be for one off connections in a script outside your app (ie for connection to a replica that allows slow requests). If you use it in a request it will replace your connection for "ActiveRecord::Base" / "primary" connection_specification_name which is why you're not returning to the primary at the end of the block - it no longer exists in the pool.

@rept
Copy link

rept commented Nov 10, 2020

To explain more clearly: the problem with using the database kwarg is it's establishing connections on the fly which will replace the existing connections with the same connection_specification_name. In 6.0 there was no way to establish more than one connection per handler per class. That's fixed in 6.1 (if you're using sharding)

Ok thanks for the explanation. Good thing it was removed then. So if I get it correctly I would be able to do this like this in the controller (after I defined the shard in database.yml of course):

ActiveRecord::Base.connected_to(role: :reading, shard: :dwh) do
  result = ActiveRecord::Base.connection.execute('some complex query')
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants