Build a multi-statement query when inserting fixtures #31422

Merged
merged 3 commits into rails:master from Edouard-chin:multistatement-fixtures on Jan 24, 2018

Conversation

@Edouard-chin
Contributor

Edouard-chin commented Dec 12, 2017

Build a multi-statement query when inserting fixtures:

  • The insert_fixtures method can be optimized by making a single multi-statement query for all fixtures sharing the same connection, instead of issuing one query per table
    • The previous code bulk-inserted fixtures one table at a time, making X queries for X fixture files
    • This patch builds a single multi-statement query covering all tables. Given a set of 3 fixtures (authors, dogs, computers):
      # before
      %w(authors dogs computers).each do |table|
        sql = build_sql(table)
        connection.query(sql)
      end

      # after

      sql = build_sql(authors, dogs, computers)
      connection.query(sql)
  • insert_fixtures is now deprecated; insert_fixtures_set is the new way to go and carries the performance improvement
  • My tests were done with an app having more than 700 fixtures; inserting all of them used to take around 15s. Using a single multi-statement query, it took 8 seconds on average
  • In order for a multi-statement query to be executed, MySQL needs to be connected with the MULTI_STATEMENTS flag, which is done before inserting the fixtures by reconnecting to the database with the flag declared (see the sketch after this list). Reconnecting to the database creates some caveats:
    1. We lose all open transactions. In the original code, a transaction is opened when inserting fixtures, multiple DELETE statements are executed, and finally the fixtures are inserted. With this patch, the transaction has to be opened only after we reconnect to the DB, otherwise the reconnect drops the open transaction, the DELETE statements are never committed, and inserting the fixtures fails because they are duplicated (primary key duplicate exception)...
    • To fix this, the transaction is now opened directly inside the insert_fixtures method, right after we reconnect to the DB
    • As a consequence, since the transaction is opened inside the insert_fixtures method, the DELETE statements also need to be executed there
    2. The same problem happens for disable_referential_integrity: since we reconnect, FOREIGN_KEY_CHECKS is reset to its original value
    • Same solution as 1.: disable_referential_integrity can be called after we reconnect to the database
    3. When the multi-statement query is executed, no other queries can be performed until we paginate over the set of results, otherwise MySQL throws a "Commands out of sync" error. Ref
    • We iterate over the set of results until mysql_client.next_result is false. Ref
  • Removed the sql.active_record "Fixture Delete" notification; the DELETE statements are now part of the INSERT's notification
  • On MySQL the max_allowed_packet is looked up:
    1. Before executing the multi-statement query, we check the packet length of each statement; if a statement is bigger than the max_allowed_packet setting, an ActiveRecordError is raised
    2. Otherwise we concatenate the current SQL statement onto the previous one, and so on, as long as the packet stays < max_allowed_packet
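
For illustration, here is a minimal sketch of this flow using the mysql2 client directly; the connection options and fixture contents are made up, and in the actual patch the SQL is built by the adapter and ActiveRecord::FixtureSet:

```ruby
require "mysql2"

# Minimal sketch of the flow described above (illustrative only; the real work
# happens in the MySQL adapter and ActiveRecord::FixtureSet). Connect with the
# MULTI_STATEMENTS flag, send the DELETEs and the per-table INSERTs as a single
# string, then drain every result so the connection stays usable afterwards.
client = Mysql2::Client.new(
  host: "localhost", username: "root", database: "app_test",
  flags: Mysql2::Client::MULTI_STATEMENTS
)

fixture_inserts = {
  "authors"   => "INSERT INTO authors (id, name) VALUES (1, 'David'), (2, 'Jeremy')",
  "computers" => "INSERT INTO computers (id, `system`) VALUES (1, 'Linux')"
}

statements = fixture_inserts.keys.map { |table| "DELETE FROM #{table}" } +
             fixture_inserts.values

client.query("BEGIN")
client.query(statements.join(";\n"))
while client.next_result; end # flush pending results, or MySQL raises "Commands out of sync"
client.query("COMMIT")
```
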
@rails-bot

rails-bot commented Dec 12, 2017

r? @kamipo

(@rails-bot has picked a reviewer for you, use r? to override)

@sgrif

sgrif approved these changes Dec 12, 2017

I'd like to see the commit message and PR description updated to make it more clear that this is moving from one query per table to one query per table sent as a single string. This generally looks good to me, though.

Only other point of note is that it's a shame that this has so much git churn. Perhaps we can keep the existing insert_fixtures definition as a separate method, which we call in a loop so there's less rightward drift and git churn?

@Edouard-chin

Contributor

Edouard-chin commented Dec 12, 2017

Thanks for the quick feedback @sgrif ! I split my PR into 2 commits. The first one makes all the changes needed for the feature to work. The second is cosmetic and removes trailing spaces that are no longer needed, since they were part of an inner block that is now removed.

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
disable_referential_integrity do
transaction(requires_new: true) do
tables_to_delete.each { |table| delete "DELETE FROM #{quote_table_name(table)}", "Fixture Delete" }

@Edouard-chin

Edouard-chin Dec 12, 2017

Contributor

I don't know if people rely on the sql.active_record notification when fixtures are deleted, but if not, the DELETE statements could also be part of the same query as the fixture inserts, which would make things even faster

activerecord/lib/active_record/connection_adapters/abstract_mysql_adapter.rb Outdated
def insert_fixtures(fixture_set, tables_to_delete = [])
iterate_over_results = -> { while raw_connection.next_result; end; }
without_sql_mode("NO_AUTO_VALUE_ON_ZERO") do

@matthewd

matthewd Dec 12, 2017

Member

This changes a session variable...

activerecord/lib/active_record/connection_adapters/abstract_mysql_adapter.rb Outdated
def insert_fixtures(*)
without_sql_mode("NO_AUTO_VALUE_ON_ZERO") { super }
def insert_fixtures(fixture_set, tables_to_delete = [])
iterate_over_results = -> { while raw_connection.next_result; end; }

@matthewd

matthewd Dec 12, 2017

Member

Do no other adapters need something like this?

It doesn't feel like great layering for insert_fixtures to be touching raw_connection at all, really.

@sgrif

sgrif Dec 13, 2017

Member

No, MySQL is the only backend which forces you to care about what's in the network buffers, and manually flush it to avoid errors.
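
For context, this is roughly how mysql2 surfaces a multi-statement response (connection options here are placeholders); each result set has to be read before the connection will accept another query, which is the manual flushing being referred to:

```ruby
require "mysql2"

# Rough illustration of consuming a multi-statement response with mysql2
# (connection options are placeholders). Each result set must be read before
# the next query, otherwise MySQL raises "Commands out of sync".
client = Mysql2::Client.new(database: "app_test", flags: Mysql2::Client::MULTI_STATEMENTS)

results = []
results << client.query("SELECT 1 AS first; SELECT 2 AS second").to_a
results << client.store_result.to_a while client.next_result

results # => [[{"first" => 1}], [{"second" => 2}]]
```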

@matthewd

matthewd Dec 13, 2017

Member

Okay, cool. As I'm paranoid, I think I'd like to have a test that does a compound fixture load with an insert that succeeds followed by one that fails (inserting null into a not null column, say), and confirms we get the exception we're expecting. i.e., that all drivers [including out-of-tree] not only receive a compound result set without stuffing up the connection, but actually notice if a later result contains an error.

@sgrif

sgrif Dec 13, 2017

Member

For reference, the behavior that most other backends provide is that all queries are executed, but only the return value of the final query is provided. This is almost always handled at the client level, it's rare for the server to actually accept semicolon-delimited SQL strings (MySQL is the only one in this case AFAIK).

I agree we should have the test you've described though.
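
A rough shape of the requested test might be something like this; the table and column names, and the exact error class raised, are assumptions rather than the test that was eventually committed:

```ruby
# Rough shape of the requested test (table/column names and the expected error
# class are assumptions). The first insert is valid; the second violates a
# NOT NULL constraint, and the failure of a *later* statement in the combined
# query must still surface as an exception instead of being swallowed.
def test_bulk_insert_raises_when_a_later_statement_fails
  fixtures = {
    "aircraft" => [{ "name" => "F-15" }],
    "posts"    => [{ "title" => nil, "body" => nil }] # assumed NOT NULL columns
  }

  assert_raises(ActiveRecord::NotNullViolation) do
    ActiveRecord::Base.connection.insert_fixtures_set(fixtures, fixtures.keys)
  end
end
```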

@Edouard-chin

Edouard-chin Dec 13, 2017

Contributor

Ack, will take care of writing the test. Thanks for the explanations

@Edouard-chin

Edouard-chin Dec 14, 2017

Contributor

Done in 9173701

activerecord/lib/active_record/fixtures.rb Outdated
end
deleted_tables[conn] << table
end
fixture_sets_by_connection.each_value do |set|

@matthewd

matthewd Dec 12, 2017

Member

each would give us conn for free

@Edouard-chin

Edouard-chin Dec 13, 2017

Contributor

The fixture_sets_by_connection was originally grouped by connection_id (now that I think about it, the variable name was confusing).

My original reasoning was that I didn't know how expensive it was to group by the connection itself. I probably overthought this, though

activerecord/lib/active_record/fixtures.rb Outdated
conn.reset_pk_sequence!(fs.table_name)
table_rows.each do |table, rows|
unless deleted_tables[conn].include? table
deleted_tables[conn] << table

@matthewd

matthewd Dec 12, 2017

Member

Is deleted_tables[conn] just table_rows_for_connection.keys?

@Edouard-chin

Edouard-chin Dec 13, 2017

Contributor

Yes good catch !

activerecord/test/cases/fixtures_test.rb Outdated
@@ -64,7 +64,9 @@ def initialize
end
def call(_, _, _, _, values)
@events << values[:sql] if values[:sql] =~ /INSERT/
return unless values[:name] == "Fixtures Insert"

@matthewd

matthewd Dec 12, 2017

Member

I think the conditional insertion is a bit easier to read than the double-negative of an early exit on such a simple method.

(Why the change of condition?)

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
end
disable_referential_integrity do
transaction(requires_new: true) do

@matthewd

matthewd Dec 12, 2017

Member

Won't this break if there's a surrounding transaction?

I guess the reconnect would break such a transaction no matter what, actually. 😕

@Edouard-chin

Edouard-chin Dec 13, 2017

Contributor

I guess the reconnect would break such a transaction no matter what, actually.

Yes, the reconnect would break any open transaction. Is this something we should be concerned about?

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
disable_referential_integrity do
transaction(requires_new: true) do
tables_to_delete.each { |table| delete "DELETE FROM #{quote_table_name(table)}", "Fixture Delete" }

@matthewd

matthewd Dec 12, 2017

Member

Is it not just as slow to run lots of deletes? (and PK resets, for that matter)

activerecord/test/cases/fixtures_test.rb Outdated
assert_equal 1, subscriber.events.size, "It takes one INSERT query to insert two fixtures"
if current_adapter?(:Mysql2Adapter)

@matthewd

matthewd Dec 12, 2017

Member

Beyond being ugly, we can't do this because other adapters use our test suite, and have their own name-quoting mechanisms.

@Edouard-chin

Edouard-chin Dec 13, 2017

Contributor

TIL, I didn't know other adapters use the Rails test suite. For my own curiosity, could you point to which ones? I'm curious to see how they reuse the test suite

activerecord/test/cases/fixtures_test.rb Outdated
assert_equal 1, subscriber.events.size, "It takes one INSERT query to insert two fixtures"
if current_adapter?(:Mysql2Adapter)
assert_match(/INSERT INTO `bulbs`.*INSERT INTO `authors`.*INSERT INTO `computers`/m, subscriber.events.first)

@matthewd

matthewd Dec 12, 2017

Member

This loses [the assertion of] the previous bulk behaviour: that it all happens in a single INSERT per table, and not one per row -- this would pass even if we were sending a huge stream of single-row INSERTs.

@Edouard-chin

Edouard-chin Dec 13, 2017

Contributor

👍 , I decided to write a separate test instead

activerecord/test/cases/fixtures_test.rb Outdated
if current_adapter?(:Mysql2Adapter)
assert_match(/INSERT INTO `bulbs`.*INSERT INTO `authors`.*INSERT INTO `computers`/m, subscriber.events.first)
else
assert_match(/INSERT INTO "bulbs".*INSERT INTO "authors".*INSERT INTO "computers"/m, subscriber.events.first)

@matthewd

matthewd Dec 12, 2017

Member

If we're going to go to this much effort, maybe we should also consider other fast-insert mechanisms. PostgreSQL is much faster with COPY, for example.
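
For reference, a COPY-based load with the pg gem could look roughly like this; it is not part of this PR, and the connection settings and rows are placeholders:

```ruby
require "pg"

# Sketch of the COPY-based bulk load mentioned above (not part of this PR;
# connection settings and rows are placeholders). COPY streams rows over the
# wire instead of building one large INSERT statement.
conn = PG.connect(dbname: "app_test")
rows = [[1, "David"], [2, "Jeremy"]]

conn.transaction do
  conn.exec("DELETE FROM authors")
  conn.copy_data("COPY authors (id, name) FROM STDIN WITH (FORMAT csv)") do
    rows.each { |id, name| conn.put_copy_data("#{id},#{name}\n") }
  end
end
```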

@Edouard-chin

Contributor

Edouard-chin commented Dec 13, 2017

Thanks for the great review @matthewd and @sgrif !

Outlining this since it got hidden by GH after my push but it's not resolved.

If we're going to go to this much effort, maybe we should also consider other fast-insert mechanisms. PostgreSQL is much faster with COPY, for example.

Do we need to consider max_allowed_packet and/or similar settings for other backends?

@Edouard-chin

Contributor

Edouard-chin commented Jan 8, 2018

Hello friends and happy new year !

Trying to get back to this: there are two points that haven't yet been addressed in my latest commit.

If we're going to go to this much effort, maybe we should also consider other fast-insert mechanisms. PostgreSQL is much faster with COPY, for example.

Will be happy to work on this, I just need confirmation this is something we want for PostgreSQL. Do you think it needs to be done inside the same PR?

Do we need to consider max_allowed_packet and/or similar settings for other backends?

What do you think should be the appropriate behaviour if ER_NET_PACKET_TOO_LARGE is thrown? Shall we try to reconnect to the DB and try again with a smaller insert query, or just warn the user?

@rafaelfranca rafaelfranca added this to the 5.2.0 milestone Jan 8, 2018

@matthewd

Member

matthewd commented Jan 9, 2018

Do you think [PostgreSQL COPY] needs to be done inside the same PR ?

No, that was really just me thinking aloud, as a generalization of the "fixture insertion is worth strongly optimizing" thesis that this PR implies.

What do you think should be the appropriate behaviour if ER_NET_PACKET_TOO_LARGE is thrown? Shall we try to reconnect to the DB and try again with a smaller insert query, or just warn the user?

I'm not sure... it seems like a lot of bother for us to arrange a retry, but just telling the user that we can't insert their fixtures (while not really offering much they can do to solve the problem) doesn't seem terribly helpful, either. Are there known default values for that limit? (e.g. the base MySQL default, plus any distro-configured defaults across a handful of major Linuxes)

Might it be practical for us to proactively look up the limit, and then use it when combining the tables together? Outright giving up if a single table is too big feels perfectly fine to me, while applying a limit across the entire DB seems less so. And while it's more code than ignoring limits entirely, I suspect it'd be simpler/neater to always consider the limit than to change behaviour upon retry.

If it's reasonable to assume the limit is practically infinite, then I'm fine with just ignoring it, at least until we get reports of it causing problems for people. But this PR's premise is essentially that people have lots of tables; if (lots of tables) * (large-but-reasonable data set) is at risk of nudging the limit, it seems worth some consideration.

This changes a session variable... -- #31422 (comment)

... and won't that be immediately reset by the reconnect in with_multi_statements?
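
On proactively looking the limit up: reading it from the session is cheap, roughly like the following (a sketch; the PR later settled on the adapter's show_variable helper, as suggested further down in the review):

```ruby
# Read the session limit up front so statements can be combined within it.
max_allowed_packet = ActiveRecord::Base.connection
  .select_value("SELECT @@SESSION.max_allowed_packet").to_i
```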

@Edouard-chin

Contributor

Edouard-chin commented Jan 10, 2018

Thanks for the detailed comment, it helped me greatly understand what you had in mind!

This changes a session variable... -- #31422 (comment)
... and won't that be immediately reset by the reconnect in with_multi_statements?

Good catch! You are right, this was immediately reset by the reconnect. Since no tests were broken I have opened another PR #31663 to remove it.
For now, I have modified this PR to reconnect first and then modify the session variable.

Might it be practical for us to proactively look up the limit, and then use it when combining the tables together?

It's a good idea and I implemented it in 899ae72; it makes things way easier than implementing retry logic.

Last but not least, I created a new insert_fixtures_set method in order not to touch the insert_fixtures one, which was public API.

@Edouard-chin

Contributor

Edouard-chin commented Jan 10, 2018

Ah shoot, one test is failing on CI but not locally, due to privileges:

ActiveRecord::StatementInvalid: Mysql2::Error: Access denied; you need (at least one of) the SUPER privilege(s) for this operation: SET GLOBAL max_allowed_packet=16777216

@sgrif

Member

sgrif commented Jan 16, 2018

Easiest solution is to grant super privileges to the rails user here

@sgrif

Member

sgrif commented Jan 16, 2018

Also just to document this somewhere public, for the record there is a limit on PG. It's a limit at the protocol level, 2^31 bytes. Shouldn't be a problem for this use case unless someone has gigs of binary data in their fixtures.

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
transaction do
tables_to_delete.each { |table| delete "DELETE FROM #{quote_table_name(table)}", "Fixture Delete" }
packet_size = total_sql.bytesize
raise ActiveRecordError, "Fixtures set is too large (#{packet_size} bytes). Consider increasing the max_allowed_packet variable." if packet_too_large?(packet_size)

@matthewd

matthewd Jan 17, 2018

Member

I was hoping we could do this check a few lines above, and turn total_sql into an array of < max_allowed_packet-sized chunks.

activerecord/test/cases/fixtures_test.rb Outdated
error = assert_raises(ActiveRecord::ActiveRecordError) { create_fixtures(*fixtures) }
assert_match(/Fixtures set is too large/, error.message)
ensure
ActiveRecord::Base.connection.execute("SET GLOBAL max_allowed_packet=#{old_max_allowed_packet}")

@matthewd

matthewd Jan 17, 2018

Member

Does this take effect immediately, or only upon reconnection? I think I saw a suggestion it could be the latter somewhere. That's fine for the earlier one, because we already reconnect to do the insertion.. but might mean we need another reconnect here?

Alternatively, it may be simpler to create new fixture(s) bigger than the default limit (using erb; no need for big actual files)?

@matthewd

Member

matthewd commented Jan 17, 2018

Easiest solution is to grant super privileges to the rails user

I'd rather our test suite not require DB superpowers to run. If it's the best option, it'd be reasonable to do that for CI, but ensure the test just skips when running with less privileges... but all else being equal, it seems nice to have CI enforcing the "no privileged operations" rule.

PG [has] a limit at the protocol level, 2^31 bytes

👍 yep, that meets my definition of 'practically infinite' in this context. (I wouldn't be surprised if other DBs had very similar limitations at about that point... PG's is hardly the only protocol developed at a time when 32 bits was Big Enough.)

@Edouard-chin

Contributor

Edouard-chin commented Jan 17, 2018

I was hoping we could do this check a few lines above, and turn total_sql into an array of < max_allowed_packet-sized chunks.

💯 , done in my latest commit

Split the total_sql into chunks:

  • Proactively look up the max_allowed_packet and split the insert statements into smaller chunks to avoid MySQL throwing an ER_NET_PACKET_TOO_LARGE (see the sketch after this list)
  • If a single table is too big, raise a preventive ActiveRecordError since we won't be able to execute the query anyway
  • Decided to stub the max_allowed_packet inside tests in order not to have to deal with super privileges. Added memoization on the max_allowed_packet, my rationale being that it's unlikely this value changes, but maybe I'm too naive /shrug
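
As an illustrative sketch of the splitting described above (method and variable names are made up, not the exact code in the commit):

```ruby
require "active_record"

# Statements are packed into the current chunk while the combined size stays
# under max_allowed_packet; a single statement that is already too large is a
# hard error, since no amount of splitting can help.
def in_chunks(statements, max_allowed_packet)
  chunks = [+""]

  statements.each do |sql|
    if sql.bytesize > max_allowed_packet
      raise ActiveRecord::ActiveRecordError,
        "Fixtures set is too large (#{sql.bytesize} bytes). " \
        "Consider increasing the max_allowed_packet variable."
    end

    # Start a new chunk when appending this statement (plus a rough allowance
    # for the ";\n" separator) would push the current one over the limit.
    if !chunks.last.empty? && chunks.last.bytesize + sql.bytesize + 2 > max_allowed_packet
      chunks << +""
    end

    chunks.last << ";\n" unless chunks.last.empty?
    chunks.last << sql
  end

  chunks # each entry is then executed as its own multi-statement query
end
```
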
@sgrif

Member

sgrif commented Jan 17, 2018

I'd rather our test suite not require DB superpowers to run. If it's the best option, it'd be reasonable to do that for CI, but ensure the test just skips when running with less privileges...

The only other option is to leave the max_allowed_packet stuff on MySQL untested. We could perhaps have the test check whether it has super privileges and skip if it doesn't? (We should assert that we're not on Travis though)

@sgrif

sgrif approved these changes Jan 17, 2018

One comment. Looks fine to me once that's resolved, or made clear why it can't be resolved

def with_multi_statements
previous_flags = @config[:flags]
@config[:flags] = Mysql2::Client::MULTI_STATEMENTS
reconnect!

@sgrif

sgrif Jan 17, 2018

Member

Why does this require reconnecting? This should just require a call to mysql_set_server_option on the C side

@Edouard-chin

Edouard-chin Jan 18, 2018

Contributor

Nice, I wasn't aware of mysql_set_server_option. I can't find any reference to it in mysql2 though :(

@matthewd

matthewd Jan 18, 2018

Member

Yeah, looks like we might have to go with this for now, and then see about getting that method added to mysql2 in future.

(Maybe you can get it sneaked in to the upcoming release.)
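
Pieced together from the two diff hunks in this review, the reconnect-based helper in the MySQL adapter looks roughly like this:

```ruby
# Swap the connection flags, reconnect so MULTI_STATEMENTS is negotiated,
# run the block, then restore the original flags with another reconnect.
def with_multi_statements
  previous_flags = @config[:flags]
  @config[:flags] = Mysql2::Client::MULTI_STATEMENTS
  reconnect!

  yield
ensure
  @config[:flags] = previous_flags
  reconnect!
end
```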

@kamipo

kamipo Jan 18, 2018

Member

How about adding MULTI_STATEMENTS by default?
We already have FOUND_ROWS to make it the same as PostgreSQL's behavior. Related #25370 (comment).

def mysql2_connection(config)
config = config.symbolize_keys
config[:flags] ||= 0
if config[:flags].kind_of? Array
config[:flags].push "FOUND_ROWS".freeze
else
config[:flags] |= Mysql2::Client::FOUND_ROWS
end

@sgrif

sgrif Jan 18, 2018

Member

I'd prefer not to have MULTI_STATEMENTS by default. It has a lot of gotchas on MySQL which can cause really subtle bugs

yield
ensure
@config[:flags] = previous_flags
reconnect!

@sgrif

sgrif Jan 17, 2018

Member

Still shouldn't require reconnecting. I'll just leave this here (I assume mysql2 exposes that function somewhere)

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
values = fixtures.map do |fixture|
fixture = fixture.stringify_keys
fixture_set.each do |table_name, fixtures|

@matthewd

matthewd Jan 18, 2018

Member

How about something more like:

fixture_inserts = fixture_set.map { |table_name, fixtures| build_fixture_sql(fixtures, table_name) }
total_sql = combine_multi_statements(fixture_inserts)

Where combine_multi_statements in abstract is [statements.join], and all the details of why some statements may not be combinable (i.e., there are length limits) can be pushed down to the MySQL adapter.
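
In sketch form, the layering being suggested looks like this (the name of the chunk-splitting helper in the MySQL override is hypothetical):

```ruby
# The abstract adapter just joins everything into one string; the MySQL
# adapter overrides the same hook to respect max_allowed_packet. The caller
# wraps the result in Array.wrap, so returning an array of chunks is fine.

# abstract/database_statements.rb
def combine_multi_statements(total_sql)
  total_sql.join(";\n")
end

# abstract_mysql_adapter.rb
def combine_multi_statements(total_sql)
  chunks_within_max_allowed_packet(total_sql)
end
```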

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
Arel.sql("DEFAULT")
disable_referential_integrity do
transaction do
tables_to_delete.each { |table| delete "DELETE FROM #{quote_table_name(table)}", "Fixture Delete" }

@matthewd

matthewd Jan 18, 2018

Member

The above would also give us a handy helper to squish these together too

@@ -358,33 +358,33 @@ def insert_fixture(fixture, table_name)
def insert_fixtures(fixtures, table_name)

@matthewd

matthewd Jan 18, 2018

Member

We can't kill/change this because it's documented, but we can deprecate it. Especially as it is itself fairly new, I don't think we have any strong motivation to keep it long term.

@Edouard-chin

Edouard-chin Jan 18, 2018

Contributor

Done in my latest commit

end
end
conn.insert_fixtures_set(table_rows_for_connection, table_rows_for_connection.keys)

@matthewd

matthewd Jan 18, 2018

Member

Should we just leave the .keys up to that method to handle?

@Edouard-chin

Edouard-chin Jan 18, 2018

Contributor

conn.insert_fixtures_set(table_rows_for_connection, table_rows_for_connection)

felt weird to me, and I thought of just doing conn.insert_fixtures_set(table_rows_for_connection), but
since insert_fixtures_set is public API, if someone wants to insert fixtures without deleting any tables, they wouldn't be able to anymore. WDYT?

def with_multi_statements
previous_flags = @config[:flags]
@config[:flags] = Mysql2::Client::MULTI_STATEMENTS
reconnect!

@matthewd

matthewd Jan 18, 2018

Member

Yeah, looks like we might have to go with this for now, and then see about getting that method added to mysql2 in future.

(Maybe you can get it sneaked in to the upcoming release.)

@@ -373,6 +373,18 @@ def insert_fixtures(rows, table_name)
end
end
def insert_fixtures_set(fixture_set, tables_to_delete = [])
disable_referential_integrity do
transaction(requires_new: true) do

@matthewd

matthewd Jan 18, 2018

Member

Why is this requires_new but the other isn't?

@Edouard-chin

Edouard-chin Jan 18, 2018

Contributor

Good catch, it's my bad. The other should have it too. It was extracted from here

connection.transaction(requires_new: true) do

@dhh dhh assigned matthewd and unassigned kamipo Jan 18, 2018

@Edouard-chin

Contributor

Edouard-chin commented Jan 18, 2018

Addressed the suggestion/comments. Thanks again for the time spent reviewing this ❤️

activerecord/lib/active_record/connection_adapters/abstract_mysql_adapter.rb Outdated
end
def max_allowed_packet
@max_allowed_packet ||= execute("SELECT @@SESSION.max_allowed_packet").first[0].to_i

@kamipo

kamipo Jan 18, 2018

Member

How about using @max_allowed_packet ||= show_variable("max_allowed_packet")?

@Edouard-chin

Edouard-chin Jan 18, 2018

Contributor

very cool, wasn't aware of this method!

activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Outdated
with_yaml_fallback(bind.value_for_database)
else
Arel.sql("DEFAULT")
total_sql = Array.wrap(combine_multi_statements(fixture_inserts, tables_to_delete))

@matthewd

matthewd Jan 19, 2018

Member

Sorry, that part was unclear:

I meant insert_fixtures_set can call combine_multi_statements a second time... or actually, even just combine them:

table_deletes = tables_to_delete.map { |table| "DELETE FROM #{quote_table_name table}" }
total_sql = Array.wrap(combine_multi_statements(table_deletes + fixture_inserts))

@Edouard-chin

Edouard-chin Jan 19, 2018

Contributor

👍 it will make things even faster. I was just unsure whether people relied on the sql.active_record notifications when deleting fixtures (#31422 (comment)).

@matthewd

matthewd Jan 19, 2018

Member

Yeah, I think it's fine for those to change; arguably people are more likely to be interested in the inserts than the deletes, and we're already radically changing those -- and no-one complained last time.

(If it's going to be doing deletes and inserts, I guess we should change the label on the execute to "Fixtures Load" or something, too.)

elsif previous_packet.nil?
false
else
(current_packet.bytesize + previous_packet.bytesize) > max_allowed_packet

@matthewd

matthewd Jan 19, 2018

Member

Do we need to allow for any padding etc here? "Packet" sounds to me like it could be including a surrounding struct of some sort. I just tried to check this experimentally, and ended up just confusing myself: in the mysql CLI tool, after setting max_allowed_packet to 1024, I was able to SELECT 'xxxx' for 10 KB of 'x's 😕

@Edouard-chin

Edouard-chin Jan 19, 2018

Contributor

I got confused when creating the PR as well and should have explained beforehand.

The packet message buffer is initialized to net_buffer_length bytes, but can grow up to max_allowed_packet bytes when needed. This value by default is small, to catch large (possibly incorrect) packets.

Because net_buffer_length defaults to 16384 bytes, setting max_allowed_packet to 1024 actually has no effect. It only takes effect when max_allowed_packet is higher than net_buffer_length; otherwise the buffer doesn't need to shrink.

@matthewd

matthewd Jan 19, 2018

Member

Ah, thanks. After setting them both to 1024, I was able to run queries up to length 1022, plus a trailing semicolon and/or newline (omitting them didn't allow me to add another character, so I'm not counting them). I haven't dug in to work out whether that's a quirk of the mysql CLI (e.g., maybe it always adds a semicolon if one isn't present?)

That's certainly close enough that it clearly doesn't count any surrounding data... but can you perhaps experiment with this real code, and work out whether we need to give it a 1-2 byte buffer? I guess it's just a matter of starting with a pair of fixtures whose inserts sum to ~1030, confirming they get sent as two executes, then removing one character at a time down to ~1020 (which should be sent as a single multi), and ensuring that each one of those runs succeeds.

@matthewd

Member

matthewd commented Jan 19, 2018

This is looking great! Thank you for the time spent working on it! ❤️ ❤️

@sgrif

Member

sgrif commented Jan 22, 2018

@matthewd Is there anything else you think is absolutely blocking this? We can fix the 1-2 byte margin of error in 5.2.1 IMO. The CI failure looked unrelated, but it was on AR:mysql2, which is definitely the thing that would be affected by this so I restarted the job. Are you good with merging once CI is green and we can fix the minor stuff that's left later?

@Edouard-chin Are you planning on wrapping up the changes to this today/tomorrow?

@Edouard-chin

Contributor

Edouard-chin commented Jan 22, 2018

Are you planning on wrapping up the changes to this today/tomorrow?

@sgrif Absolutely ! I'll address the last comment this afternoon

Edouard-chin added some commits Dec 12, 2017

Build a multi-statement query when inserting fixtures:
- The `insert_fixtures` method can be optimized by making a single multi-statement query for all fixtures sharing the same connection, instead of issuing one query per table
  - The previous code bulk-inserted fixtures one table at a time, making X queries for X fixture files
  - This patch builds a single **multi-statement query** covering all tables. Given a set of 3 fixtures (authors, dogs, computers):
  ```ruby
    # before
    %w(authors dogs computers).each do |table|
      sql = build_sql(table)
      connection.query(sql)
    end

    # after

    sql = build_sql(authors, dogs, computers)
    connection.query(sql)
  ```
- `insert_fixtures` is now deprecated; `insert_fixtures_set` is the new way to go and carries the performance improvement
- My tests were done with an app having more than 700 fixtures; inserting all of them used to take around 15s. Using a single multi-statement query, it took 8 seconds on average
- In order for a multi-statement query to be executed, MySQL needs to be connected with the `MULTI_STATEMENTS` [flag](https://dev.mysql.com/doc/refman/5.7/en/c-api-multiple-queries.html), which is done before inserting the fixtures by reconnecting to the database with the flag declared. Reconnecting to the database creates some caveats:
  1. We lose all open transactions. In the original code, a transaction is opened when inserting fixtures, multiple DELETE statements are [executed](https://github.com/rails/rails/blob/a681eaf22955734c142609961a6d71746cfa0583/activerecord/lib/active_record/fixtures.rb#L566), and finally the fixtures are inserted. With this patch, the transaction has to be opened only after we reconnect to the DB, otherwise the reconnect drops the open transaction, the DELETE statements are never committed, and inserting the fixtures fails because they are duplicated (primary key duplicate exception)...
    - To fix this, the transaction is now opened directly inside the `insert_fixtures` method, right after we reconnect to the DB
    - As a consequence, since the transaction is opened inside the `insert_fixtures` method, the DELETE statements also need to be executed there
  2. The same problem happens for `disable_referential_integrity`: since we reconnect, `FOREIGN_KEY_CHECKS` is reset to its original value
    - Same solution as 1.: disable_referential_integrity can be called after we reconnect to the database
  3. When the multi-statement query is executed, no other queries can be performed until we paginate over the set of results, otherwise MySQL throws a "Commands out of sync" [Ref](https://dev.mysql.com/doc/refman/5.7/en/commands-out-of-sync.html)
    - We iterate over the set of results until `mysql_client.next_result` is false. [Ref](https://github.com/brianmario/mysql2#multiple-result-sets)
- Removed the `sql.active_record` "Fixture Delete" notification; the DELETE statements are now part of the INSERT's notification
- On MySQL the `max_allowed_packet` is looked up:
  1. Before executing the multi-statement query, we check the packet length of each statement; if a statement is bigger than the max_allowed_packet config, an `ActiveRecordError` is raised
  2. Otherwise we concatenate the current SQL statement onto the previous one, and so on, as long as the packet stays `< max_allowed_packet`
@Edouard-chin

Contributor

Edouard-chin commented Jan 22, 2018

I squashed all my commits except the last one, which addresses #31422 (comment)

Also modified the commit message to reflect all the changes since the PR was open

@matthewd

Member

matthewd commented Jan 23, 2018

@sgrif I'm worried that if we ship this with a yet-to-be-investigated possible 0.000006% chance of failure, we'll forget about it and it'll just [extremely occasionally] annoy people who get very unlucky. If it's hard to check, I'm fine with shipping as is.. but it seems like it should be pretty straight-forward to determine while we're here.

(Alternatively, we could just subtract two [or more] from the calculated length. That way the worst case scenario is that we possibly split something a few bytes too early; there's no real downside to underestimating the length available, compared to the possibility of an actual failure if we overestimate.)

@Edouard-chin

Contributor

Edouard-chin commented Jan 23, 2018

So you were right @matthewd: I played around with this real code, and we need a 2-byte margin. Taking care of it now

Allow a 2-byte margin:
- MySQL adds a 2-byte margin to the statement, so given a `max_allowed_packet` set to 1024 bytes, a 1024-byte fixture will not be inserted (MySQL will throw an error)
- Prevent this by decreasing the max_allowed_packet by 2 bytes when doing the comparison with the actual statement size (sketched below)
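
In sketch form, the margin just shrinks the limit used for every comparison (the helper name follows the review discussion; the exact placement in the adapter may differ):

```ruby
# Subtract the 2 bytes MySQL needs from the variable once, so every size
# comparison automatically honours the margin.
def max_allowed_packet
  bytes_margin = 2
  @max_allowed_packet ||= show_variable("max_allowed_packet") - bytes_margin
end
```
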
@matthewd

Member

matthewd commented Jan 24, 2018

Awesome, thanks so much for checking that!

:shipit: :shipit: 🚢 🇮🇹

@matthewd matthewd merged commit bc26f30 into rails:master Jan 24, 2018

2 checks passed

codeclimate: All good!
continuous-integration/travis-ci/pr: The Travis CI build passed
@Edouard-chin

Contributor

Edouard-chin commented Jan 24, 2018

Amazing! It was a fun feature to implement. Thanks all for your guidance

@Edouard-chin Edouard-chin deleted the Edouard-chin:multistatement-fixtures branch Jan 24, 2018

kamipo added a commit that referenced this pull request Jan 25, 2018

Add test case for deprecated `insert_fixtures`
Since #31422, `insert_fixtures` is deprecated.
with_yaml_fallback(bind.value_for_database)
else
Arel.sql("DEFAULT")
table_deletes = tables_to_delete.map { |table| "DELETE FROM #{quote_table_name table}".dup }

@pdobb

pdobb Jan 29, 2018

Why the .dup on the brand new string literal?

@kamipo

@Edouard-chin

Edouard-chin Jan 29, 2018

Contributor

Yeah, Kamipo pointed out the correct reason

Edouard-chin added a commit to Edouard-chin/mysql2 that referenced this pull request Mar 1, 2018

Expose the `mysql_set_server_option`:
- Use case: I'd like to be able to run multi-statement queries without having to reconnect to the db first. Without this feature, if I want to run a multi-statement query **after** the connection is established without the `MULTI_STATEMENTS` flag, I'd have to set the flag on the connection and reconnect
- One of the main motivations for this is that Rails is now inserting fixtures inside a multi-statement query. We used the workaround I described above, but it would be great if we could use the MySQL function mysql_set_server_option. For more context [Ref](rails/rails#31422 (comment))
- Ref https://dev.mysql.com/doc/refman/5.5/en/mysql-set-server-option.html

Edouard-chin added a commit to Edouard-chin/mysql2 that referenced this pull request Mar 1, 2018

Expose the `mysql_set_server_option`:
- Use case: I'd like to be able to run multi-statement queries without having to reconnect to the db first. Without this feature, if I want to run a multi-statement query **after** the connection is established without the `MULTI_STATEMENTS` flag, I'd have to set the flag on the connection and reconnect
- One of the main motivations for this is that Rails is now inserting fixtures inside a multi-statement query. We used the workaround I described above, but it would be great if we could use the MySQL function mysql_set_server_option. For more context [Ref](rails/rails#31422 (comment))
- Ref https://dev.mysql.com/doc/refman/5.5/en/mysql-set-server-option.html

sodabrew added a commit to brianmario/mysql2 that referenced this pull request Mar 1, 2018

Expose the `mysql_set_server_option`: (#943)
- Use case: I'd like to be able to run multi-statement queries without having to reconnect to the db first. Without this feature, if I want to run a multi-statement query **after** the connection is established without the `MULTI_STATEMENTS` flag, I'd have to set the flag on the connection and reconnect
- One of the main motivations for this is that Rails is now inserting fixtures inside a multi-statement query. We used the workaround I described above, but it would be great if we could use the MySQL function mysql_set_server_option. For more context [Ref](rails/rails#31422 (comment))
- Ref https://dev.mysql.com/doc/refman/5.5/en/mysql-set-server-option.html

y-yagi added a commit to y-yagi/database_rewinder that referenced this pull request Aug 9, 2018

Add support for multi-statement queries
Since Rails 5.2, fixtures use a multi-statement query when inserting.
rails/rails#31422

However, since `database_rewinder` does not support multi-statement
queries, tables inserted by fixtures are not cleaned.
This patch also adds support for retrieving the tables where inserts
were done even if a query is multi-statement.