Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add before_sideload hook #371

Merged
merged 3 commits into from
Mar 18, 2024
Merged

Conversation

jhnvz
Copy link
Contributor

@jhnvz jhnvz commented Aug 4, 2021

This PR adds a configurable before_sideload hook. This gives us the ability to restore the global state in sideload threads. We use a gem called ActsAsTenant (https://github.com/ErwinM/acts_as_tenant) to make sure we scope queries by a tenant, not having global state available breaks all our sideloads. This is how you can restore global state for existing application code or third-party dependencies:

Graphiti.configure do |c|
  c.before_sideload = proc do |context|
    ActsAsTenant.current_tenant = context[:company]
  end
end

I was not sure how to properly write a test for this new behavior (in the current setup) as I'm not seeing how to perform actual sideloads in a different thread.

@jhnvz
Copy link
Contributor Author

jhnvz commented Aug 5, 2021

I just stumbled upon another issue. We use the new multiple database feature of rails 6. In the following example all queries are pointed to the primary database (instead of the replica):

Graphiti.configure do |c|
  c.before_sideload = proc do |context|
    puts ActiveRecord::Base.current_role
  end
end

ActiveRecord::Base.connected_to(role: :reading) do
  SomeResource.all(include: 'other_resource')
end

# => :writing

We could solve this by implementing something that's being called around sideloading (middleware). An interface could look something like this:

connection_handler = proc do |context, &block|
  ActiveRecord::Base.connected_to(role: context[:database_role]) { block.call }
end

Graphiti.configure do |c|
  c.sideload_middleware do |chain|
    chain.add(connection_handler)
  end
end

Graphiti::Rails can add the middleware by default so we stay compatible with Rails out of the box. I'm still not sure how and when to store connection data in Graphiti::Context and I still need to look at the internals of ActiveRecord as they also support sharding and defining/nesting model-specific roles. Before doing so, I'd love to hear your guys' thoughts/take on this.

@richmolj
Copy link
Contributor

richmolj commented Aug 5, 2021

Thanks for this @jhnvz ❤️ ! I certainly agree with your middleware proposal. My biggest thought is this type of connection stuff is probably already happening in Rails (or maybe puma) middleware. Maybe take a look and we'll copy what they do? Would be a huge help!

@jhnvz
Copy link
Contributor Author

jhnvz commented Aug 6, 2021

@richmolj Thanks for the fast response ✌️ and advice! After doing some research I figured that adding middleware is overkill, and the huge downside of using a middleware strategy is that you have to implement middleware for application code and each third-party library that depends on state in threads (jhnvz@846b9b1 Yuck..).

This is what Puma does (single mode with multiple threads):

preload application code
\_ AR sets up connection pool (stored on class instead of thread)
   \_ thread 1 (clean locals)
      \_ We call `.connected_to(role: :reading)`, state gets stored in thread
         \_ Calling `connection.execute` connects with correct connection from handler
   \_ thread 2 (clean locals)
      \_ ...

Puma in clustered mode takes a fork from application code, reloading application code, thus setting up a new connection pool in the ActiveRecord class (instead of a thread).

This is what's happening in our case:

preload application code
\_ AR sets up connection pool (stored on class instead of thread)
   \_ thread 1 (clean locals)
      \_ We call `.connected_to(role: :reading)`, state gets stored in thread
         \_ Sideloading occurs \
     __________________________/
    /
    \_ Thread 3 (clean locals)
       \_ Calling `connection.execute` connects with incorrect connection (due to empty state)
    \_ Thread 4 (clean locals)
       \_ ...                       
   \_ thread 2 (clean locals)
      \_ ...

We can fix this by restoring thread local state of the parent thread. Implementation looks like this:

@query.sideloads.each_pair do |name, q|
  sideload = @resource.class.sideload(name)
  next if sideload.nil? || sideload.shared_remote?
  parent_resource = @resource
  graphiti_context = Graphiti.context
+ thread = {}.tap do |hash|
+   Thread.current.keys.each { |key| hash[key] = Thread.current[key] }
+ end
+ thread_variables = {}.tap do |hash|
+   Thread.current.thread_variables.each { |var| hash[var] = Thread.current.thread_variable_get(var) }
+ end  
  resolve_sideload = -> {
+   thread.each_pair { |key, value| Thread.current[key] = value }
+   thread_variables.each_pair { |key, value| Thread.current.thread_variable_set(key, value) }
    Graphiti.context = graphiti_context
    sideload.resolve(results, q, parent_resource)
    @resource.adapter.close if concurrent
  }
  if concurrent
    promises << Concurrent::Promise.execute(&resolve_sideload)
  else
    resolve_sideload.call
  end
end

This is what I was able to find in Thread.current.keys in our app:

[
    [ 0] :_rollbar_notifier,
    [ 1] :puma_server,
    [ 2] :with_force_shutdown,
    [ 3] :current_attributes_instances,
    [ 4] :"attr_ActionText::Content_renderer",
    [ 5] :"ActiveSupport::Cache::Strategy::LocalCache::LocalCacheRegistry",
    [ 6] :request_store_active,
    [ 7] :"activesupport_tagged_logging_tags:94600",
    [ 8] :"ActiveSupport::Notifications::InstrumentationRegistry",
    [ 9] :"ActiveSupport::SubscriberQueueRegistry",
    [10] :_timestack,
    [11] :"ActiveRecord::RuntimeRegistry",
    [12] :"ActiveRecord::ExplainRegistry",
    [13] :request_store,
    [14] :ar_prepared_statements_disabled_cache,
    [15] :"ActiveRecord::Scoping::ScopeRegistry",
    [27] :i18n_config,
    [28] :rescue_registry_context,
    [29] :searchkick_runtime,
    [30] :context,
    [31] :prevent_writes,
    [32] :time_zone
]

And this in Thread.current.thread_variables:

[
    [ 0] : ar_connection_handler
]

As you can see, currently there might be a lot more broken than active record connection handling (timezones, i18n, activesupport notifications). The great advantage of this solution is that we gain compatibility with all code that uses thread state out of the box. When we implement this solution, as far as I understand Graphiti.context, we can deprecate it or solely use it to make controller context available in resources.

I'm still figuring out a way to properly spec/test this new behavior. Any ideas are welcome. Shall I close this PR and start working on a new one?

@richmolj
Copy link
Contributor

richmolj commented Aug 7, 2021

Thanks so much @jhnvz ❤️ That was a great breakdown. I agree with your solution and something similar has come up before. I think Graphiti.context should still stick around (controller/action, backwards compat, .with_context, etc). But otherwise makes sense. I wonder if concurrent-ruby has pointers on testing?

@jkeen
Copy link
Collaborator

jkeen commented Feb 27, 2024

@jhnvz Thanks for your work on this! This will solve an issue I have in a multi-tenant app I have using graphiti.

I didn't quite follow the conversation re: Threads and middleware and wanted to make sure: as it stands now, is the code in this PR ready for merging?

@jkeen jkeen added the question label Feb 28, 2024
@jkeen jkeen merged commit f68b61f into graphiti-api:master Mar 18, 2024
35 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 18, 2024
# [1.5.0](v1.4.0...v1.5.0) (2024-03-18)

### Features

* add before_sideload hook ([#371](#371)) ([f68b61f](f68b61f))
Copy link

🎉 This PR is included in version 1.5.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

jkeen added a commit to jkeen/graphiti that referenced this pull request Mar 27, 2024
* Add before_sideload hook to allow restoration of global state in sideload threads, .e.g. setting current tenant for sideloads

Co-authored-by: Jeff Keen <jeff@keen.me>
jkeen pushed a commit to jkeen/graphiti that referenced this pull request Mar 27, 2024
@Japestrale
Copy link

@jhnvz / @richmolj I'm currently looking to solve the same issue you outlined above with regard to dealing with multiple databases I.e ensuring that sideloads are sent to the right database, rather than always being sent to the primary.

Before I go digging into writing trying to write tests I just wanted to check that no more work was done on the topic?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants