Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Marginalia to Rails, via QueryLogs #42240

Merged
merged 1 commit into from Aug 9, 2021

Conversation

keeran
Copy link
Member

@keeran keeran commented May 17, 2021

Porting Marginalia to Rails

This PR brings Marginalia SQL comments to Rails as a native feature. I have attempted to closely follow the Marginalia approach to solving the problem whilst introducing a new performance benefit for large applications.

The implementation (both in Marginalia and in this PR) can be broken down into 3 key areas:

Comment configuration & construction

Marginalia defines an array of components to construct an SQL comment. For each item in components, a method is called on Marginalia::Comment and the resulting key-value pair is concatenated into a comment string. This array of components can be modified by the application developer.

In Marginalia:

  self.components.each do |c|
    component_value = self.send(c)
    if component_value.present?
      ret << "#{c}:#{component_value},"
    end
  end

In this PR we have adjusted the method-calling approach to work with Procs instead. Default tags are defined as Procs in Railtie callback handlers (added to the taggings attribute), and users are able to pass custom tag definitions during application configuration.

  tags.flat_map { |i| [*i] }.filter_map do |tag|
    key, value_input = tag
    val = case value_input
          when nil then instance_exec(&taggings[key]) if taggings.has_key? key
          when Proc then instance_exec(&value_input)
          else value_input
    end
    "#{key}:#{val}" unless val.nil?
  end.join(",")

If a custom comment component is required, a Hash can be added to the components array representing the key/value pair for the tag. If a Proc is passed as a value, it will be called when the comment is constructed and can reference internal state via the context Hash.

    config.active_record.query_log_tags_enabled = true
    config.active_record.query_log_tags = [
      :application,
      { single_custom: -> { "static_in_proc" } },
      {
        multiple: "new",
        values: "added",
        at: -> { Time.now }
      }
    ]

  # /*application:MyApp,single_custom:static_in_proc,multiple:new,values:added,at:2021-07-27 12:57:40 +0100*/

Rails integration - ActionController & ActiveJob

The core benefit of Marginalia is being able to decorate SQL queries with details of the context of the source of the query. By default Marginalia will add comment components representing the controller or job context and this is achieved by making use of callback filters.

Marginalia updates the controller context with around_* callbacks:

  if respond_to?(:around_action)
    around_action :record_query_comment
  else
    around_filter :record_query_comment
  end
...
  def record_query_comment
    Marginalia::Comment.update!(self)
    yield
  ensure
    Marginalia::Comment.clear!
  end

A similar approach is followed in this PR:

  context_extension = ->(controller) do
    around_action :expose_controller_to_query_logs

    private
    def expose_controller_to_query_logs(&block)
      ActiveRecord::QueryLogs.set_context(controller: self, &block)
    end
  end

  ActionController::Base.class_eval(&context_extension)
  ActionController::API.class_eval(&context_extension)

In order for these context-specific components to function, Marginalia maintains references to the controller & job in Thread.current:

  def self.marginalia_controller=(controller)
    Thread.current[:marginalia_controller] = controller
  end

  def self.marginalia_controller
    Thread.current[:marginalia_controller]
  end

Rather than create a potentially unbounded set of keys in Thread.current, I have opted to create a hash of context information (to include the controller and job) and only store that context hash in Thread.current, accessible via ActiveRecord::QueryLogs#update_context & ActiveRecord::QueryLogs#set_context.

Adapter SQL execution interception

Finally, when the SQL comment is ready to be inserted into the query, Marginalia performs a series of checks before alias_method chaining into the appropriate adapter methods.

Some of the complexity in this approach results from supporting older versions of Rails. Given we won’t have the same backwards-compatibility requirements here, I have simplified the instrumentation to prepend ing the execution methods on any subclass of AbstractAdapter:

  ActiveSupport.on_load(:active_record) do
    ConnectionAdapters::AbstractAdapter.descendants.each do |klass|
      klass.prepend(QueryLogs::ExecutionMethods) if klass.descendants.empty?
    end
  end
...
  module ExecutionMethods
    def execute(sql, *args, **kwargs)
      super(self.class.add_query_log_tags_to_sql(sql), *args, **kwargs)
    end

    def exec_query(sql, *args, **kwargs)
      super(self.class.add_query_log_tags_to_sql(sql), *args, **kwargs)
    end
  end

Notable changes

Comment caching

Introduces the ability to cache the component elements of a query comment to avoid needing to rebuild the same string repeatedly during a request / job if the context and content does not change.

ActiveRecord::QueryLogs.cache_query_log_tags = true

The cached comment is stored per thread and is reset whenever the context is updated via #update_context, #set_context, or via an explicit operation:

ActiveRecord::QueryLogs.cached_comment = nil

Migration

Rails 7 upgrade guide content in progress: migrating from Marginalia. (to be rewritten / adjusted as necessary when this feature has been completed)

/cc @eileencodes & @arthurnn

@keeran
Copy link
Member Author

keeran commented May 17, 2021

Looking into the CI failures - seems some state has leaked between tests.

@keeran
Copy link
Member Author

keeran commented May 17, 2021

Pushed a change which only attempts to add the comment to the query if the feature is enabled. This is largely to check if anything else is having trouble in CI - currently investigating another approach to prepending the execution methods.

@keeran keeran marked this pull request as draft May 17, 2021 22:49
@keeran
Copy link
Member Author

keeran commented May 17, 2021

Back to draft while I look into problems with the railties and AR coverage.

@kaspth
Copy link
Contributor

kaspth commented May 17, 2021

Hey, thanks for working on this!

I'm finding the name QueryComments somewhat confusing. It sounds like we're enriching queries almost ala optimizer_hints? Second, a comment is conceptually closer to a sentence than a word, I think it gives the wrong impression of what this is used for (edit: hadn't spotted this came from Marginalia, but I still think we should revisit for this). I think of this as more annotations or tags — which leads me to:

This feels conceptually similar to config.log_tags = [ :request_id ] and TaggedLogging, is there a reason why we want to carve out new terminology versus doing something like QueryLogTags?

Copy link
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assorted stuff + some style notes. My earlier comment about log_tags is more important, but I left this in case it's useful too.

We are also making Active Job and Action Controller depend on Active Record through the their railtie requires. @rafaelfranca knows more about what to do about that.

activerecord/lib/active_record/railties/query_comments.rb Outdated Show resolved Hide resolved
Comment on lines 185 to 187
def escape_sql_comment(str)
return str unless str.include?("*")
while str.include?("/*") || str.include?("*/")
str = str.gsub("/*", "").gsub("*/", "")
end
str
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we own the string at this point? Could we do this?

Suggested change
def escape_sql_comment(str)
return str unless str.include?("*")
while str.include?("/*") || str.include?("*/")
str = str.gsub("/*", "").gsub("*/", "")
end
str
end
def escape_sql_comment(comment)
comment.gsub!("/*", "")
comment.gsub!("*/", "")
comment
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the same feature as sanitize_as_sql_comment. #35660

Copy link
Member Author

@keeran keeran May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copied over the regexp approach from sanitize_as_sql_comment - because it's operating on the connection it was getting into a recursive call when establishing the connection, I think.

This could be avoided if the method was on the class, instead of the connection instance - could we consider changing that?

@keeran
Copy link
Member Author

keeran commented May 18, 2021

Hey, thanks for working on this!

I'm finding the name QueryComments somewhat confusing.

Thank you for the feedback @kaspth ! 🙇‍♂️ I'm not particularly bound to the naming, happy to adjust to QueryLogTags - the origin was that we're adding SQL comments to the query so I ran with it - naming things is hard :)

I'll work on your other suggestions along with other CI-related issues I need to address and then swing back to the rename so I can do that in one hit without confusing myself.

@keeran
Copy link
Member Author

keeran commented May 18, 2021

Looking into the railtie integration test failure, as I suspect this is caused by these changes:


LoadingTest#test_frameworks_aren't_loaded_during_initialization:

RuntimeError: active_job loaded!

    test/application/loading_test.rb:415:in `block (2 levels) in <class:LoadingTest>'

    test/application/loading_test.rb:414:in `block in <class:LoadingTest>'

@keeran keeran force-pushed the keeran/query-comments branch 4 times, most recently from de3db76 to 0f82995 Compare May 19, 2021 13:04
@keeran keeran changed the title Add Marginalia to Rails, via QueryComments Add Marginalia to Rails, via QueryLogTags May 19, 2021
@keeran keeran force-pushed the keeran/query-comments branch 3 times, most recently from a8f0e3d to b35b9cc Compare May 19, 2021 16:08
@keeran keeran marked this pull request as ready for review May 19, 2021 16:23
@keeran
Copy link
Member Author

keeran commented May 19, 2021

I've updated the name of the feature to QueryLogTags - will need to make changes to the migration content. Have also changed the test behaviour to prevent LogTags from showing up in other tests where they're not expected.

I think all other comments have been addressed in some shape or form :)

Hopefully the decision to inline the SQL sanitisation regex is OK in this instance, otherwise we can look into an alternative approach.

@keeran
Copy link
Member Author

keeran commented May 20, 2021

Updated to tighten up some of the terminology around comment -> tag in the lib and in tests. Have left in instances where it's referring to an actual SQL comment string. ✅

@keeran
Copy link
Member Author

keeran commented May 25, 2021

Thanks for the review @eugeneius - some great points. I'll work on them and keep the commit history around for a bit longer (and clean it up when we're happy) - apologies to everyone for making it harder to track what I've updated.

Will think some more on how we can keep close to the original API but also make it easier to add static (for now) components via application configuration vs reopening the module.

@rails-bot rails-bot bot added the railties label Jun 4, 2021
Co-authored-by: Eileen M. Uchitelle <eileencodes@users.noreply.github.com>
Co-authored-by: Kasper Timm Hansen <kaspth@gmail.com>
@kaspth kaspth merged commit 5422751 into rails:main Aug 9, 2021
@kaspth
Copy link
Contributor

kaspth commented Aug 9, 2021

Woop woop, welcome aboard! 🙌🎯

@eileencodes
Copy link
Member

🎉 Really excited for this!

@hmcfletch
Copy link
Contributor

Not sure if this will still be an issue with the port, but with the original gem I ran into some performance issues using the :line option. The scanning and regexing each line of the call stack for every DB query has a pretty significant cost.

If might be worth adding a note about potential performance impact when using the :line option.

I had opened an issue to add a note to the docs, but wasn't able to get it merged.

Note: Some more rigorous testing might be worthwhile, so it isn't just my say so. :p

It's going to be awesome that this is builtin now!!!

@keeran
Copy link
Member Author

keeran commented Aug 13, 2021

Hey @hmcfletch ! We opted to lose :line as a default tag, as we already have LogSubscriber in Active Record.

Comment on lines +157 to +158
when nil then instance_exec(&taggings[key]) if taggings.has_key? key
when Proc then instance_exec(&value_input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm trying to update our app to replace Marginalia, and I'm having a bit of trouble with this instance_exec. My main issue it that sorbet really doesn't like it, and that's on us to fix, but ultimately I wonder if value_input.call(context) wouldn't be better.

First it would avoid exposing all private methods as an API, and also it would allow passing anything that responds to call, not just a proc.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll PR it.

Comment on lines +359 to +361
ConnectionAdapters::AbstractAdapter.descendants.each do |klass|
klass.prepend(QueryLogs::ExecutionMethods) if klass.descendants.empty?
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this might be a bit too brittle.

For context we use https://github.com/Shopify/activerecord-pedantmysql2-adapter, which is a subclass of MySQL2Adapter. This means MySQL2Adapter isn't tagged at all in our app.

We also used to have a fake adapter for test purpose that was causing trouble.

I also feel that having to patch both exec_query and execute is a bit of a smell, there must be a better way we can hook in there with a single entry point. @kaspth any ideas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting! I would have expected the class to be picked up:

irb(main):009:0> ActiveRecord::Base.connection.class
=> ActiveRecord::ConnectionAdapters::PedantMysql2Adapter
irb(main):010:0> ActiveRecord::ConnectionAdapters::AbstractAdapter.descendants
=> [ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter, ActiveRecord::ConnectionAdapters::Mysql2Adapter, ActiveRecord::ConnectionAdapters::PedantMysql2Adapter]
irb(main):011:0> ActiveRecord::ConnectionAdapters::PedantMysql2Adapter.descendants.empty?
=> true
irb(main):012:0> ActiveRecord::QueryLogs.tags = [ :application ]
irb(main):013:0> ActiveRecord::QueryLogs.add_query_log_tags_to_sql("")
=> " /*application:MysqlDevApp*/"

I can't remember exactly why these two entry points were required - I think both are the bare minimum used across multiple adapters for different types of query, but not 100% sure. Would love to bring that down to a single place though!

On holiday right now (hah:), can dig more if needed later tonight :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On holiday right now (hah:), can dig more if needed later tonight :)

Don't worry. I'll have a look. Enjoy your vacations!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my main problem happens because the in the MySQLAdapter exec_query calls execute_and_free which calls execute.

I assume that's also why there's these return if sql.include?(comment), that really sound like a big smell to me.

I'll try to work on refactoring that to eliminate that double call.

@casperisfine
Copy link
Contributor

@keeran 👋 I'm working around this class again, and I'd like to question the usefulness of the inline_tags. Is this something you use? It complexity the implementation quite a bit, but seems like it could just be boolean checks inside the context, no?

@keeran
Copy link
Member Author

keeran commented Oct 26, 2021

Hi hi 👋 yes we use them for tagging specific types of queries - I'm not sure what you mean about replacing them with boolean checks though. They're a distinct set of tags used only within a block, outside of the broader / default set.

Maybe a rough example would help?

@casperisfine
Copy link
Contributor

Sure. What I mean is:

QueryLog.push_tag("foo") do
  QueryLog.push_tag("bar") do
    ....
  end
end

could potentially just be:

QueryLog.set_context(foo: true) do
  QueryLog.set_contex(bar: true) do
    ....
  end
end

I suppose you'd get foo:true, instead of just foo, but is it a big deal? I'm ok to keep that, but with 7.0 approaching I just want to make sure we're not offering APIs we might later regret.

@keeran
Copy link
Member Author

keeran commented Oct 26, 2021

Hmm unless things have changed, that wouldn't necessarily result in new tags within that block unless there were custom tags defined which looked for those context entries.

So for any adhoc inline tags we use in an app, we would have to predefine custom tags which look for those context entries and only add the tag when they were present, I think?

@casperisfine
Copy link
Contributor

So for any adhoc inline tags we use in an app, we would have to predefine custom tags which look for those context entries and only add the tag when they were present, I think?

Yes that's what I was suggesting. But again I saw no need for these tags in our app, so I might be missing some of their usefulness.

@keeran
Copy link
Member Author

keeran commented Oct 26, 2021

OK great thanks for explaining. I'm not sure what the knock-on effects would be of changing that annotation structure as there are observability / tooling systems involved which are outside of my domain. I will report back asap :)

@keeran
Copy link
Member Author

keeran commented Oct 26, 2021

I will report back asap :)

Sorry for the delay here. Can confirm we would be able to safely adapt to the approach you have suggested without too much trouble.

It makes sense to remove this adhoc tagging from QueryLogs if it cleans up the API and implementation, especially given the differences in approach (key=value vs plain string), and that#annotate exists. 👍

If it's just a case of removing the feature, would you like me to open a PR doing so?

@casperisfine
Copy link
Contributor

If it's just a case of removing the feature, would you like me to open a PR doing so?

If you'd like to do it sure! I'll happily merge. And thanks a lot for looking into this :)

@keeran
Copy link
Member Author

keeran commented Oct 26, 2021

OK cool thanks - I'll prepare our app to use update_context and custom tags etc (or set_context when it's ready) and then open the PR here so that we can update to main without any problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants