Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema cache in YAML #27042

Merged
merged 1 commit into from
Dec 13, 2016
Merged

Schema cache in YAML #27042

merged 1 commit into from
Dec 13, 2016

Conversation

kirs
Copy link
Member

@kirs kirs commented Nov 14, 2016

WTF is schema cache: Rails has support for storing the schema information in db/schema_cache.dump to avoid hitting database with SHOW FULL FIELDS. This feature was introduced in #5162.

The problem now is that we dump all many things into Marshal, including internal column classes.
When you try to read schema cache generated by Rails 4.2 in Rails 5.0:

uninitialized constant ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter::Column (NameError)

(because obviously, internal AR classes have been refactored).

Solution proposal

By serializing basic schema information into YAML instead of Marshal dump, we could make schema cache compatible between Rails versions and avoid exceptions like a described above.

Concerns

  1. I had to make connection.column_definitions method public
  2. Instead of messing with SchemaCache classes and its dynamic cache methods, I introduced PersistedSchemaCache that would be immutable.

review @rafaelfranca @sgrif

@rails-bot
Copy link

r? @eileencodes

(@rails-bot has picked a reviewer for you, use r? to override)

@kirs
Copy link
Member Author

kirs commented Nov 14, 2016

It would be great to get some feedback about the code while I'm fixing the CI failures.

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is a good thing to have this PersistedSchemaCache class. It is duplicating a lot of code that we already have in SchemaCache and there is no real gain for the users and neither in terms of code maintenance in the Rails codebase. Now instead of one place to change the schema cache you have to change two places.

def data_source_exists?(name)
@data_sources.include?(name)
end
alias table_exists? data_source_exists?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need those aliases here.

end

# Clears out internal caches
def clear!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not being called so why do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's a public method that someone may call, and since this class is immutable, we'd like to avoid that call.

@rafaelfranca
Copy link
Member

Another thing. Instead of custom methods we should implement init_with and encode_with to be able to call Yaml.load/dump

@kirs
Copy link
Member Author

kirs commented Nov 14, 2016

I don't know if it is a good thing to have this PersistedSchemaCache class. It is duplicating a lot of code that we already have in SchemaCache and there is no real gain for the users and neither in terms of code maintenance in the Rails codebase. Now instead of one place to change the schema cache you have to change two places.

That's a good point that I was going to discuss here.
SchemaCache has a very dynamic nature: it only knows about some tables that has been loaded, and when you request a table that wasn't loaded yet it will query that table. It means that the knowledge about tables that SchemaCache has is always incomplete.

That said, to make SchemaCache know about all tables we currently have to do con.data_sources.each { |table| con.schema_cache.add(table) } and that looks a bit like a hack.

In a contrast, PersistedSchemaCache provides the cache that always has the complete (finite) knowledge about tables. It will never do any additional queries.

Another reason that we can't simply implement SchemaCache#encode_with is because SchemaCache stores the columns that are already initialized. In the YAML cache that would work between Rails versions we'd like to store the result of SHOW FULL TABLE, without internal Column classes.

@rafaelfranca
Copy link
Member

rafaelfranca commented Nov 14, 2016

SchemaCache has a very dynamic nature: it only knows about some tables that has been loaded, and when you request a table that wasn't loaded yet it will query that table. It means that the knowledge about tables that SchemaCache has is always incomplete.

That is exactly the behavior we want. If the SchemaCache is loaded by a old dump we don't want the application to raise an exception, so it is better that it query that table. The schema cache should behave like a cache, not as the ultimate source of truth. The database should be the ultimate source of truth.

That said, to make SchemaCache know about all tables we currently have to do con.data_sources.each { |table| con.schema_cache.add(table) } and that looks a bit like a hack.

I'd expect that the SchemaCache already have all information about the tables of that connection when duping and what is exactly what was being done https://github.com/rails/rails/pull/27042/files#diff-28a5ae383b291583c513ad8eeed99a3aL274, you just moved this "hack" to inside of the new class.

Another reason that we can't simply implement SchemaCache#encode_with is because SchemaCache stores the columns that are already initialized. In the YAML cache that would work between Rails versions we'd like to store the result of SHOW FULL TABLE, without internal Column classes.

I don't get it. We can write any format we want with encond_with, so we don't need to write the internal Column classes.

@kirs
Copy link
Member Author

kirs commented Nov 15, 2016

Imagine we get rid of PersistedSchemaCache and move into SchemaCache#init_with and SchemaCache#encode_with.

      def encode_with(coder)
        data_sources = @data_sources.keys
        coder['columns'] = data_sources.each_with_object({}) do |table_name, acc|
          # we still have to call `column_definitions` from `encode_with` to get raw column information
          # because we don't want to encode built Column objects
          acc[table_name] = connection.column_definitions(table_name).to_a
        end

        coder['primary_keys'] = data_sources.each_with_object({}) do |table_name, acc|
          acc[table_name] = primary_keys(table_name)
        end

        coder['version'] = ActiveRecord::Migrator.current_version
      end

This one would work, beside the fact that we'll still need to query SHOW FULL FIELDS to get the raw data about columns.

But I don't see a way to use init_with here:

      def init_with(coder)
        columns_by_table = {}
        columns_hash = {}

        payload.fetch(:columns).each do |table_name, columns|
          columns_by_table[table_name] = columns.map do |column_meta|
            # here, we need to call `new_column_from_field` but we don't have access to connection
            # from init_with
            connection.new_column_from_field(table_name, column_meta)
          end

          columns_hash[table_name] = Hash[columns_by_table[table_name].map { |col|
            [col.name, col]
          }]
        end
      end

Because connection is not available there.

self.connection_pool.schema_cache = cache.dup
cache = YAML.load(File.read(filename))
if cache.fetch(:version) == ActiveRecord::Migrator.current_version
schema_cache = ActiveRecord::ConnectionAdapters::PersistedSchemaCache.build_from_yaml(connection, cache)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need this line anymore

self.connection.schema_cache = cache
self.connection_pool.schema_cache = cache.dup
cache = YAML.load(File.read(filename))
if cache.fetch(:version) == ActiveRecord::Migrator.current_version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is just cache.version again

Copy link
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code wise LGTM

Is there any backward compatibility concerns by replacing the dump file?

@@ -267,17 +267,17 @@ db_namespace = namespace :db do
namespace :cache do
desc "Creates a db/schema_cache.dump file."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_cache.dump -> schema_cache.yml

open(filename, "wb") { |f| f.write(Marshal.dump(con.schema_cache)) }
conn.schema_cache.clear!
conn.data_sources.each { |table| conn.schema_cache.add(table) }
open(filename, "wb") { |f| f.write(YAML.dump(conn.schema_cache)) }
end

desc "Clears a db/schema_cache.dump file."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_cache.dump -> schema_cache.yml

@kirs
Copy link
Member Author

kirs commented Nov 21, 2016

@eugeneius thanks!

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of rubocop rules are broken in this PR. autocorrect should fix them.

Is there any backward compatibility concerns by replacing the dump file?

No. This actually make it backwards compatible. The only non-backward compatible problem we have now is Rails 5.1 not being able to load the dump file using marshal and the first deploy will have to generate the yaml version.

private

def schema_dump_path
'test/assets/schema_dump_50.yml'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file should be 'test/assets/schema_dump_5_1.yml'. I don't think it is going to be safe to backport this.

@kaspth
Copy link
Contributor

kaspth commented Nov 27, 2016

@rafaelfranca got it 👍

@kirs
Copy link
Member Author

kirs commented Nov 28, 2016

@rafaelfranca thanks for review! I updated the PR.

@kirs
Copy link
Member Author

kirs commented Dec 13, 2016

ping @rafaelfranca

@rafaelfranca rafaelfranca merged commit ddf81c5 into rails:master Dec 13, 2016
rafaelfranca added a commit that referenced this pull request Dec 13, 2016
@kaspth
Copy link
Contributor

kaspth commented Dec 13, 2016

BAM 🖐

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants