Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change sqlite3 boolean serialization to use 1 and 0 #29699

Merged
merged 1 commit into from
Jul 12, 2017

Conversation

lugray
Copy link

@lugray lugray commented Jul 6, 2017

Summary

Abstract boolean serialization has been using 't' and 'f', with MySQL overriding that to use 1 and 0. This change is a first step towards unifying the behaviour in the abstract adapter to always use 1 and 0. Fixes #17062

Other Information

Beyond the goal of unifying the representations, this has the advantage that SQLite natively recognizes 1 and 0 as true and false, but does not natively recognize 't' and 'f'. PostgreSQL natively recognizes both representations (and others).

This change in serialization requires a migration of stored boolean data for SQLite/PostgreSQL databases, so it's implemented behind a configuration flag whose default false value is deprecated. The flag itself can be deprecated in a future version of Rails. While loaded models will give the correct result for boolean columns without migrating old data, where() clauses will interact incorrectly with old data.

r? @rafaelfranca

@rails-bot
Copy link

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @rafaelfranca (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

@lugray lugray force-pushed the represent_boolean_as_integer branch from a5bc765 to ebd461d Compare July 6, 2017 19:58
@matthewd
Copy link
Member

matthewd commented Jul 6, 2017

PostgreSQL has a real boolean type; it shouldn't need any configuration or migration (unless we're particularly concerned about people who've explicitly marked a varchar column as being a boolean attribute, which I don't think we are).

I'm generally dubious about going out of our way to unify the adapters on a particular non-standard serialization, though. Strictly speaking the most correct form for the abstract adapter would be the (unquoted) TRUE and FALSE literals from SQL:1999... but few backends support that.

I guess my claim is that while a lot of DBMSes choose to emulate booleans with some form of 1/0 pair, it's still a decision they're each making individually.

@lugray
Copy link
Author

lugray commented Jul 7, 2017

PostgreSQL has a real boolean type; it shouldn't need any configuration or migration

You're right. Testing on PostgreSQL

irb(main):001:0> Example.create!(boolean_column: true)
   (0.2ms)  BEGIN
  SQL (0.5ms)  INSERT INTO "examples" ("boolean_column", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["boolean_column", "t"], ["created_at", "2017-07-07 11:08:57.470308"], ["updated_at", "2017-07-07 11:08:57.470308"]]
   (2.1ms)  COMMIT
=> #<Example id: 1, boolean_column: true, created_at: "2017-07-07 11:08:57", updated_at: "2017-07-07 11:08:57">
irb(main):002:0> Example.where(boolean_column: 0)
  Example Load (0.2ms)  SELECT  "examples".* FROM "examples" WHERE "examples"."boolean_column" = $1 LIMIT $2  [["boolean_column", 0], ["LIMIT", 11]]
=> #<ActiveRecord::Relation []>
irb(main):003:0> Example.where(boolean_column: 1)
  Example Load (0.3ms)  SELECT  "examples".* FROM "examples" WHERE "examples"."boolean_column" = $1 LIMIT $2  [["boolean_column", 1], ["LIMIT", 11]]
=> #<ActiveRecord::Relation [#<Example id: 1, boolean_column: true, created_at: "2017-07-07 11:08:57", updated_at: "2017-07-07 11:08:57">]>

shows that the boolean type means that old data will work. I do think the change is worthwhile for SQLite for the reasons outlined in the original issue, and I think it's worthwhile to make the change at the abstract adapter level. I don't think it's going very far out of our way, and I contest the idea that 1 and 0 are a non-standard serialization of boolean values - they are the most standard serialization.

I'll write up a change to my documentation to make clear the differences between SQLite and PostgreSQL.

@matthewd
Copy link
Member

matthewd commented Jul 8, 2017

To be clear, I definitely agree we should make the change for SQLite -- I'm only questioning which adapter layer's responsibility it should be.

I contest the idea that 1 and 0 are a non-standard serialization of boolean values - they are the most standard serialization

They are the most common serialization; SQL:1999 (through SQL:2016) defines the standard representation.

While a conforming BOOLEAN type is rare amongst the major RDBMSes, some research suggests that MySQL does provide enough aliases to pretty effectively emulate it. Given that, I'm inclined to argue we should in fact be using TRUE and FALSE in the abstract adapter, and overriding them to 1 & 0 only for SQLite: we would thus be giving properly-conformant PostgreSQL what it wants, and be delegating the decision on how to emulate it to MySQL. Thus we only dirty ourselves with a manual approximation for SQLite.


To break out my thinking a little:

I'd prefer not to default to assuming abstract adapter subclasses want to use 1 & 0 to represent booleans (though a NumericBooleans module, or something, would seem like a fine idea).

I'd strongly prefer not to give Postgres ones and zeroes.. it just feels wrong to force it to accomodate the lowest common denominator, when it has a real actual type available -- and it seems too easy for the abstraction to leak anyway: anyone truly storing 0/1 will be happy with my_column = 1; PostgreSQL needs my_column = '1'. Switching to use proper values on PostgreSQL would improve type safety, which the current 't' & 'f' already leave wanting -- thus improving it at the same time as SQLite (and incidentally reducing risk in the change... though we could also meet that goal by just retaining 't' and 'f' down in PostgreSQLAdapter).

@lugray
Copy link
Author

lugray commented Jul 9, 2017

@matthewd Thanks for the extra explanation. My one concern with standardizing on TRUE/FALSE, is that while PostgreSQL's boolean will accept 'true'/'false' as quoted values, MySQL will support only TRUE/FALSE, 1/0, and '1'/'0': the only quotable value is the numeric version. This would leave us using TRUE/FALSE with quoted 'true'/'false' in the abstract, and specializing for both MySQL and SQLite in the same way, or with providing

def _quote(value)
  case value
  when true       then unquoted_true
  when false      then unquoted_false
  ...
end

which feels really wrong. What do you see as the best path forwards?

@matthewd
Copy link
Member

Ah, excellent point.

Hmm... I think it would be right for _quote(true) to return TRUE (without quotes, like it does for nil → NULL, and for integers) -- it calls it "quoted", but it actually means "string suitable for direct use in an SQL query as a value".

Interestingly, it looks like the MySQL adapter already does a variant on that: quoted_true returns QUOTED_TRUE, which is "1" -- a ruby string, but no SQL quotes.

As for _type_cast / unquoted_true etc, there we want a ruby value that the raw adapter will convert to the SQL value we need: for PostgreSQL, I believe that could/should be actual true and false. I'm not sure whether mysql2 knows how to convert those: if it does, great; if not, MySQL would still want them to be 1 and 0 integers, as they are now.

SQLite would indeed want what MySQL currently has: 1 and 0 for unquoted, and "1" and "0" for quoted -- pure integers as far as SQL is concerned, unlike the current t/f SQL char strings.

@lugray lugray force-pushed the represent_boolean_as_integer branch from a5ae466 to a93fffa Compare July 10, 2017 17:24
@lugray
Copy link
Author

lugray commented Jul 10, 2017

@matthewd thanks for all your help with this - I think it's in a far better state now.

@lugray lugray changed the title Change abstract boolean serialization to use 1 and 0 Change sqlite3 boolean serialization to use 1 and 0 Jul 10, 2017
@lugray lugray force-pushed the represent_boolean_as_integer branch 2 times, most recently from 46437ed to 033b1bb Compare July 10, 2017 17:58
@lugray
Copy link
Author

lugray commented Jul 10, 2017

Don't re-review yet. I'm working on changing the configuration to use ActiveRecord::ConnectionAdapters::SQLite3Adapter.represent_boolean_as_integer instead.

Copy link
Member

@matthewd matthewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this looks great!

Thank you for doing all the work -- the "sit on the side-lines and throw opinions" role is always much easier 😊

@@ -4,6 +4,8 @@
module ActiveRecord
module ConnectionAdapters # :nodoc:
module Quoting
QUOTED_TRUE, QUOTED_FALSE = "TRUE".freeze, "FALSE".freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's skip the constants.. with modern string freezing, it's actually (infinitesimally) slower than just doing it inline

# true. Conversion can be accomplshed by setting up a rake task which runs
#
# ExampleModel.where(boolean_column: true).update_all(boolean_column: 1)
# ExampleModel.where(boolean_column: false).update_all(boolean_column: 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth using "boolean_column = 't'" here, so it's valid regardless of the setting? (And actually, does this definitely work? Doesn't update_all do a typecast on the value?)

Either way, maybe we should plan to include a gist in the upgrade guide that'll generate this by scanning a schema. Especially as this only affects SQLite, it seems reasonable to assume their DB is small enough to obviate any more complex piecemeal migration strategy.

Perhaps a single update per table, in fact? UPDATE example_models SET boolean_one = CASE boolean_one WHEN 't' THEN 1 WHEN 'f' THEN 0 ELSE boolean_one END, boolean_two = CASE .. END;? It's uglier, but presumably more efficient... especially if no-one ever has to actually see it. 🤷‍♀️

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example code definitely works, I've used it against a SQLite db. (Including your suggested modification - I like that it removes the order dependency between running it and setting the flag.) My concern with providing a gist that generates the code by examining the schema is that it necessarily makes assumptions that I'm not comfortable making. By providing example code, the responsibility of writing the conversion is with the people who understand their particular configuration, but we give them a good starting point.

initializer "active_record.check_represent_sqlite3_boolean_as_integer" do
unless config.active_record.represent_sqlite3_boolean_as_integer
ActiveSupport::Deprecation.warn <<-MSG
Setting the flag `represent_sqlite3_boolean_as_integer` to false is deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like our usual phrasing, but it occurs to me it could be confusing for people: almost everyone seeing this will be seeing it because they're not setting the flag at all -- so their (well, my, at least) first-instinct response to such a message (grep) won't help.

Any thoughts on an alternate phrasing? "Leaving the flag .. set to false", perhaps?

Setting the flag `represent_sqlite3_boolean_as_integer` to false is deprecated.
SQLite databases have used 't' and 'f' to serialize boolean values and must have
old data converted to 1 and 0 (its native boolean serialization) before setting
this flag to true. Conversion can be accomplshed by setting up a rake task which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: accomplshed

assert_equal "f", @conn.type_cast(false)

ActiveRecord::Base.represent_sqlite3_boolean_as_integer = true
assert_equal 0, @conn.type_cast(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should make sure to leave the flag in the state they found it.. while it's likely nothing else cares now, it could cause an accidental test-ordering dependency later.

@lugray lugray force-pushed the represent_boolean_as_integer branch 3 times, most recently from 0d5bfe0 to cc8d96b Compare July 10, 2017 20:05
@lugray
Copy link
Author

lugray commented Jul 10, 2017

@matthewd ready for another look. 🙂

Copy link
Member

@matthewd matthewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync up the three descriptive blobs, and add a changelog entry, and I think this is good to go 👌🏻

@lugray lugray force-pushed the represent_boolean_as_integer branch from cc8d96b to 9ba5a33 Compare July 11, 2017 11:31
@lugray
Copy link
Author

lugray commented Jul 11, 2017

Updated.

@eugeneius
Copy link
Member

Do we need another changelog entry here describing how the default serialization for booleans provided by AbstractAdapter has changed from 't'/'f' to TRUE/FALSE?

I don't know how changes that only affect third-party database adapters are typically documented, so the answer to my question may well be "no".

@yahonda
Copy link
Member

yahonda commented Jul 11, 2017

As a one of third party adapter Oracle enhanced adapter maintainer, let me show one of 3rd party adapter implementation.
Oracle database itself does not have native boolean type, Oracle enhanced adapter handles :boolean type as NUMBER(1) sql type by default.

Just FYI.

@lugray
Copy link
Author

lugray commented Jul 11, 2017

@yahonda That probably means you already override quoted/unquoted true/false and will be unaffected?

@yahonda
Copy link
Member

yahonda commented Jul 11, 2017

Yes. Oracle enhanced adapter implements (overrides) these 4 methods.

https://github.com/rsim/oracle-enhanced/blob/master/lib/active_record/connection_adapters/oracle_enhanced/quoting.rb#L100-L118

@rafaelfranca
Copy link
Member

Do we need another changelog entry here describing how the default serialization for booleans provided by AbstractAdapter has changed from 't'/'f' to TRUE/FALSE?

I think in this case it is fine to leave it outside the CHANGELOG. If a 3rd-party does not support native boolean type they would be already overriding the abstract adapter implementation, so it would be surprising if changing the behavior of the abstract adapter affects them.


```
ActiveRecord::ConnectionAdapters::SQLite3Adapter.represent_boolean_as_integer = true
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think the indentation coupled with the backticks are creating odd markdown for this part. If you click "view" on this file it shows the backticks inside a codeblock. I think if you un-indent this the extra backticks will go away. If not it's a bug in github's markdown and I'll report it to the appropriate team.

screen shot 2017-07-11 at 2 35 42 pm

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that catch! The problem was the unindented paragraph above this block which broke out of the list item. The format of the code block was correct for inside a list item, but incorrect for outside of one.

Abstract boolean serialization has been using 't' and 'f', with MySQL
overriding that to use 1 and 0.

This has the advantage that SQLite natively recognizes 1 and 0 as true
and false, but does not natively recognize 't' and 'f'.

This change in serialization requires a migration of stored boolean data
for SQLite databases, so it's implemented behind a configuration flag
whose default false value is deprecated. The flag itself can be
deprecated in a future version of Rails.  While loaded models will give
the correct result for boolean columns without migrating old data,
where() clauses will interact incorrectly with old data.

While working in this area, also change the abstract adapter to use
`"TRUE"` and `"FALSE"` as quoted values and `true` and `false` for
unquoted.  These are supported by PostreSQL, and MySQL remains
overriden.
@lugray lugray force-pushed the represent_boolean_as_integer branch from 9ba5a33 to 52e050e Compare July 11, 2017 18:52
@matthewd matthewd merged commit 58f10a3 into rails:master Jul 12, 2017
@lugray lugray deleted the represent_boolean_as_integer branch July 12, 2017 13:53
@rafaelfranca
Copy link
Member

👏

@ioquatix
Copy link
Contributor

ioquatix commented Jul 12, 2017

Awesome, thanks for doing this. We no longer use SQLite3 for testing but this is a good change none-the-less and should fix a host of issues (anyone trying to do boolean queries in the DB).

kamipo added a commit to kamipo/rails that referenced this pull request Jul 20, 2017
Since rails#29699, abstract boolean serialization has been changed to use
`TRUE` and `FALSE` literals. MySQL also support the literals.
So we can use the abstract boolean serialization even for MySQL.
@thisismydesign
Copy link

Just migrated to 5.2.0.rc1 to make use of this fix. It seems that Rails translates booleans to integers in queries by default without having turned on the configuration (Rails.application.config.active_record.sqlite3.represent_boolean_as_integer). Explicitly setting it to false resolves the issue.

Looking at the PR it is only set to true in railties/lib/rails/application/configuration.rb. Maybe this helps anyone wanting to look into it.

@matthewd
Copy link
Member

matthewd commented Feb 1, 2018

@thisismydesign thanks for the note. That should only be set if you call load_defaults with "5.2". Did you change that value as part of your upgrade process? Maybe we need to make it clearer that you shouldn't.

@thisismydesign
Copy link

@matthewd Thanks. Yes I did.

I think rather the default value of the setting is confusing. Going through the docs again it doesn't explicitly state what's the current default but the wording ("migrate and then explicitly set to true") suggests that the default is false. Also I'm not quite sure about this line then which is probably what people would find looking for the default.

@matthewd
Copy link
Member

matthewd commented Feb 1, 2018

Yeah, the ambiguity comes from the fact that the default is intended to differ between people generating a new 5.2 application (load_defaults 5.2 -> true), and people upgrading an existing app (load_defaults 5.1 [or 5.0, or not present at all] -> false).

load_defaults is pretty new, so I think we just need to give it some discussion in the upgrading guide.

@ioquatix
Copy link
Contributor

ioquatix commented Feb 1, 2018

Thanks everyone for your effort to fix this issue. It looks like the final solution is a great one given the context.

# (its native boolean serialization) before setting this flag to true.
# Conversion can be accomplished by setting up a rake task which runs
#
# ExampleModel.where("boolean_column = 't'").update_all(boolean_column: 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for anyone else that might have run into this and have tried to migrate a databare from f/t to 0/1:

If your migration specified a default for your boolean column this will continue to produce f/t. The columns default value also needs to be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants