Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executing raw SQL with comments can be very slow #45011

Closed
yassenb opened this issue May 3, 2022 · 2 comments · Fixed by #45012
Closed

Executing raw SQL with comments can be very slow #45011

yassenb opened this issue May 3, 2022 · 2 comments · Fixed by #45012

Comments

@yassenb
Copy link

yassenb commented May 3, 2022

I noticed the problem after upgrading to Rails 6.1 which starts considering comments starting with --. We have some ugly tests executing raw SQL fixture files that have some comments in them and those tests started running so slowly they practically hang. The issue is the Regexp here: https://github.com/rails/rails/blob/main/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L39 Unfortunately I'm not so good with regular expressions to explain why this is slow but intuitively it's something to do with multiline matching and a lot of *-s causing an exponential number of backtracks when scanning the string. I tested with the corresponding regex in Java so it's not Ruby's regex implementation to blame. A faster solution should be to strip the comments in advance and then run the rest of the regex on the remaining SQL. The problem should boil down to the following script:

# simplified version of https://github.com/rails/rails/blob/main/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L39
r = Regexp.new('\A(?m:(--.*\n)*)*SELECT')
s = "-- SELECT 1\n"
r.match?(s * 17) # takes several seconds on my laptop

Steps to reproduce

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails", branch: "6-1-stable"
  gem "sqlite3"
end

require "active_record"
require "minitest/autorun"

# This connection will do for database-independent bug reports.
# Have to set replica: true so that `write_query?` is executed
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:", replica: true)

class BugTest < Minitest::Test
  def test_ignoring_comments_in_raw_sql
    Timeout::timeout(1) do
      sql = "-- select 1\n"
      ActiveRecord::Base.connection.execute(sql * 20)
    end
  end
end

Expected behavior

The comments in the raw SQL are ignored and the query runs fast

Actual behavior

The query runs exponentially slow, based on the amount of comments in it

System configuration

Rails version: 7-0-stable

Ruby version: 3.1.0

@matthewbauer
Copy link

I think removing the extra * improves things, but it is still pretty slow due to the m option.

@ghiculescu
Copy link
Member

@matthewbauer you're right - it seems to have improved a little bit, but not as much as removing m would.

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "benchmark-ips"
end

prev    = %r{(?:--.*\n)*|/\*(?:[^*]|\*[^/])*\*/}m # before https://github.com/rails/rails/pull/45012
current = %r{(?:--.*\n)|/\*(?:[^*]|\*[^/])*\*/}m
no_m    = %r{(?:--.*\n)|/\*(?:[^*]|\*[^/])*\*/}

input = "-- select 1\n" * 30

Benchmark.ips do |x|
  x.report("prev") { prev.match?(input) }
  x.report("current") { current.match?(input) }
  x.report("no m") { no_m.match?(input) }
  x.compare!
end
Warming up --------------------------------------
                prev   166.299k i/100ms
             current   167.387k i/100ms
                no m   649.890k i/100ms
Calculating -------------------------------------
                prev      1.588M (± 5.3%) i/s -      7.982M in   5.041297s
             current      1.648M (± 3.6%) i/s -      8.369M in   5.084507s
                no m      6.280M (± 4.1%) i/s -     31.845M in   5.079175s

Comparison:
                no m:  6280379.3 i/s
             current:  1648121.9 i/s - 3.81x  (± 0.00) slower
                prev:  1588447.8 i/s - 3.95x  (± 0.00) slower

Do you want to make a PR? It would be good to see if this breaks any tests, or if there's any extra tests worth adding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants