Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add row_count field to sql.active_record notification #50887

Merged

Conversation

marvin-bitterlich
Copy link
Contributor

Motivation / Background

In Intercom we are working on detecting queries with large result sets, as Vitess based databases (like in our case Planetscale) block queries that return more than (by default) 100k result set rows.

We have an internal patch for this that works well for us, but thought to contribute this upstream, so others can benefit

Detail

This Pull Request changes {*}Adapter.log, which emits sql.active_record notifications.

Previously the instrument call only logs information that goes into a request, but allows the caller inside the block to modify the payload. We make use of this inside each adapter to add a new field to each sql.active_record notification with the number of result rows being returned in each query.

This allows consumers of this notification (example) to consume this metric.

There is previous work in this regard with the RecordFetchWarning module that can be included, which demonstrates the need to monitor queries with large result sets. In our case these log lines weren't easy to integrate into our observability stack, and having a numerical value attached to each sql trace allowed us to use the full power of our observability suite.

This pr also includes three tests in the instrumentation test suite that verify this working for model instanciation queries, direct value queries like pluck and raw ActiveRecord::Base.connection.execute(sql) queries.

Performance

The change asks the resultset how many elements it has. Expectation was that that should be a constant lookup, and benchmarks suggest this to be true:

I have tested this change with examples/performance on adapters sqlite3, mysql2 and postgresql with no measurable difference in performance (less than 0.1% when standard deviation was 1%).

Additional information

I originally approached this inside abstract_adapter but it turns out every adapters Result uses a different method, so I moved the call into each adapters respective usage of log

RecordFetchWarning works by patching exec_queries, so it aggregates the result set size of multiple queries into one, which is less helpful for this case, as the limits imposed by Vitess are per query. Additionally, this does not support the case of executing raw SQL on the connection, which is something that this patch captures.

As for the field name on the payload row_count seemed like a good name as it follows the same pattern as result_count does in https://guides.rubyonrails.org/active_support_instrumentation.html#instantiation-active-record

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@marvin-bitterlich marvin-bitterlich force-pushed the marvin.bitterlich/result-set branch 3 times, most recently from e077e76 to 4e57153 Compare January 26, 2024 14:18
activerecord/test/cases/instrumentation_test.rb Outdated Show resolved Hide resolved
activerecord/test/cases/instrumentation_test.rb Outdated Show resolved Hide resolved
activerecord/test/cases/instrumentation_test.rb Outdated Show resolved Hide resolved
guides/source/active_support_instrumentation.md Outdated Show resolved Hide resolved
This field returns the amount of rows returned by the query that emitted the notification.    
This metric is useful in cases where one wants to detect queries with big result sets.
Copy link
Member

@eugeneius eugeneius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I'll leave it open a bit longer in case others have feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants