-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ActiveSupport::Cache serialization bug, possible connection to ActiveStorage #36522
Comments
In the process of documenting this issue, I dug a little deeper into the underlying cause, which appears to materialize during Marshal.dump serialization
Until I dig further into the Rails codebase, I'm not exactly certain where the issue arises. (ActiveStorage attachments seem to be the trigger, but the root cause may be deeper, e.g. related to how ActiveRecord objects are marshalled.) If others feel this behavior represents a bug in the Rails codebase, I'd be excited to work on & submit a pull request that resolves it. |
@alipman88 I am still learning Ruby and this is my first attempt at making a contribution. I used your code and started a whole new Rails app and can duplicate your issue. I have tried a bunch of different things to get the code to work but still can't. I did find some issues people had in the older Rails versions as well as some workaround people have used: Hopefully with your expertise, these links will help you more than me. |
Thanks for taking the time to replicate, @ladybando, and for doing some research. One of the articles you shared (https://alisdair.mcdiarmid.org/fixing-rails-auto-loading-for-serialized-objects/) deals with a different serialization format, but it also makes me think that perhaps the problem I'm experiencing may be deeper than Rails, and might indicate an issue in the way Ruby serializes & deserializes objects via Marshal. (Although it feels kind of unlikely to discover a bug in the Ruby language itself.) I'm going to leave this issue open until I get to the bottom of it, but if someone on the Rails team feels this isn't a Rails issue please feel free to close it. |
@alipman88 I was able to reproduce the issue, and it looks like it has to do with how ActiveRecord objects are being marshalled. |
OK, I've isolated the issue into an executable test case. require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem "rails", "6.0.0.rc1"
gem "sqlite3"
end
require "active_record"
require "active_storage"
require "active_storage/reflection"
require "active_support"
require "minitest/autorun"
require "logger"
ActiveSupport.on_load(:active_record) do
include ActiveStorage::Reflection::ActiveRecordExtensions
ActiveRecord::Reflection.singleton_class.prepend(ActiveStorage::Reflection::ReflectionExtension)
require "active_storage/../../app/models/active_storage/attachment"
end
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Schema.define do
create_table "active_storage_attachments", force: :true do |t|
t.string "name", null: false
t.string "record_type", null: false
t.integer "record_id", null: false
t.integer "blob_id", null: false
t.datetime "created_at", null: false
t.index ["blob_id"], name: "index_active_storage_attachments_on_blob_id"
t.index ["record_type", "record_id", "name", "blob_id"], name: "index_active_storage_attachments_uniqueness", unique: true
end
create_table "active_storage_blobs", force: :true do |t|
t.string "key", null: false
t.string "filename", null: false
t.string "content_type"
t.text "metadata"
t.bigint "byte_size", null: false
t.string "checksum", null: false
t.datetime "created_at", null: false
t.index ["key"], name: "index_active_storage_blobs_on_key", unique: true
end
create_table "users", force: :true do |t|
t.string "name"
end
add_foreign_key "active_storage_attachments", "active_storage_blobs", column: "blob_id"
end
class User < ActiveRecord::Base
include ActiveStorage::Attached::Model
has_one_attached :avatar
end
ActiveRecord::Base.connection.execute("INSERT INTO users (name) VALUES ('aaron')")
########################################################################
# In both the following tests, deserialized_array[1] should match the
# original nil value of original_array[1] after marshaling &
# de-marshaling.
class BugTest < Minitest::Test
def test_marshal_1
user = User.first
original_array = [user, nil]
serialized_array = Marshal.dump(original_array)
deserialized_array = Marshal.load(serialized_array)
assert_nil deserialized_array[1]
end
def test_marshal_2
user = User.first
user.avatar
original_array = [user, nil]
serialized_array = Marshal.dump(original_array)
deserialized_array = Marshal.load(serialized_array)
assert_nil deserialized_array[1]
end
end I believe what we're looking at here is part of an array's first element overflowing into subsequent elements during marshalization/de-marshalization. |
OK, so examining at the first several characters of the Marshal.dump output from the above test case, I'm seeing what looks like a serialized instance of the user object, with 11 instance variables: user = User.last user.avatar Marshal.dump(user)[0...10] => "\x04\bo:\tUser\x10" In the above output, Let's take a look at what happens when we record the user object's instance variables, before and after marshallization: user = User.last ; user.avatar ; ivars_before_marshallization = user.instance_variables ; marshalled_user = Marshal.dump(user) ivars_after_marshallization = user.instance_variables ivars_before_marshallization => [:@new_record, :@attributes, :@association_cache, :@primary_key, :@readonly, :@destroyed, :@marked_for_destruction, :@destroyed_by_association, :@_start_transaction_state, :@transaction_state, :@active_storage_attached_avatar] ivars_after_marshallization => [:@new_record, :@attributes, :@association_cache, :@primary_key, :@readonly, :@destroyed, :@marked_for_destruction, :@destroyed_by_association, :@_start_transaction_state, :@transaction_state, :@active_storage_attached_avatar, :@attachment_changes] So, when we started marshallizing our user object, it only had eleven instance variables. But, a twelfth instance variable, :@attachment_changes, was set during the marshallization process. However, because our user object was initially recorded as only having eleven instance variables, in my test case the twelfth overflows into subsequent elements of the array upon demarshallization. We can actually isolate this behavior in plain Ruby: require 'test/unit'
class Foo
attr_accessor :bar, :baz
def initialize
self.bar = Bar.new(self)
end
end
class Bar
attr_accessor :foo
def initialize(foo)
self.foo = foo
end
def marshal_dump
self.foo.baz = :problem # uh oh!
{foo: self.foo}
end
def marshal_load(data)
self.foo = data[:foo]
end
end
class BugTest < Test::Unit::TestCase
def test_marshalization
array = [Foo.new, nil]
marshalled_array = Marshal.dump(array)
demarshalled_array = Marshal.load(marshalled_array)
assert_nil demarshalled_array[1]
end
end So, this could be viewed as unexpected behavior within Ruby itself, although I'd argue that it's not necessarily a Ruby bug: Ruby gives programmers a lot of power in allowing custom marshal_dump & marshal_load methods - but it's on us to implement them without unintended side-effects. Anyway, there's the root cause. I'd like to take a few days to think over the most elegant solution, and hope to submit a patch/pull request shortly. |
Hi @alipman88, thanks for sharing your process and discoveries :) I attempted to write an executable test case but failed to properly load ActiveStorage, thanks for sharing that. After reading what you shared, I noticed that the instance variable that is getting set on marshal_dump is def attachment_changes #:nodoc:
@attachment_changes ||= {}
end I'm guessing this method is being called from I attempted just that, and I moved the initialization to a Here's a PR with that fix: https://github.com/rails/rails/pull/36583/files I noticed there's a couple of tests that are breaking because of this change. I'm not sure if it's the best way to tackle this, but I'm sharing it in case it's a direction worth exploring PS. Any tips on trying to understand the rails booting/initialization process? Thanks! |
Thanks for the proposed fix, @rodrei. That would resolve my immediate issue, but it seems possible there may be other instance variables that could get added during marshallization of an ActiveRecord object's associations/attachments. I'd like to explore editing the custom marshal_dump methods within the Rails codebase to ensure that no instance variables get added or removed during marshallization. I'm afraid I don't have much insights into the Rails booting/initialization process (this is actually the first time I've taken a deep look into the Rails codebase) - In order to load ActiveStorage, I used issue 35200 as a starting point, and got my test case working with some trial and error. Anyway, after deliberating, I submitted a bug report to the Ruby team: https://bugs.ruby-lang.org/issues/15968 @nobu quickly added a fix to the ruby 2.7.0-dev branch, which raises an error if an instance variable is added or removed during marshallization. (Thanks!) This resolves the underlying Ruby issue, but I lean in favor of making a corresponding update to the Rails codebase - when running my above test case on the 2.7.0-dev branch of Ruby, it now raises a RuntimeError, as instance variables can no longer be added or removed during marshallization:
|
@alipman88 Nice! Thanks for the update! It was a good call to submit the issue to the Ruby team. On the idea to edit the custom I think something we could do is add some kind of safeguard to check if an instance variable gets added during a marshal_dump, and if it happens, then raise an exception. Pretty much implementing what Nobu did on ruby 2.7, but doing it on Rails (ActiveSupport maybe?). |
@alipman I'm so happy the Rails team added a fix. I know you added some code above that I am going to review but can you tell me exactly what steps you took to uncover the issue initially? How did you find out it was a marshallization issue? |
Hi @ladybando, I first started looking into a marshallization issue after examining the source code contained in activesupport/lib/active_support/cache I never previously considered what format Rails might use to store cached entries, but taking a look at the
This, combined with the unexpected behavior I was observing (the instance variables of objects within an array "leaking" into subsequent array elements upon demarshallization) led me to realize I was looking at an underlying bug within the Ruby programming language, rather than Ruby on Rails. |
I initially guessed this behavior might be triggered by custom Calling user.avatar sets the user's @active_storage_attached_avatar instance variable to an instance of ActiveStorage::Attached::One This instance of ActiveStorage::Attached::One delegates missing methods to its The So basically, any time a class uses I think there are a few paths towards resolving this: 1.) Don't bother changing anything - future versions of Ruby will raise a RuntimeError, which at least may point Rails developers encountering this issue in the direction of their own solution. 2.) Patch the classes using the 3.a.) Patch the 3.b.) Add an
I've submitted a pull request going with option 3.a., but I'd appreciate any input from others (especially @gsamokovarov, the author of the |
Exclude missing marshal_dump and _dump methods from being delegated to an object's delegation target via the delegate_missing_to extension. This avoids unintentionally adding instance variables to an object during marshallization, should the delegation target be a method which would otherwise add them. In current versions of Ruby, a bug exists in the way objects are marshalled, allowing for instance variables to be added or removed during marshallization (see https://bugs.ruby-lang.org/issues/15968). This results in a corrupted serialized byte stream, causing an object's instance variables to "leak" into subsequent serialized objects during demarshallization. In Rails, this behavior may be triggered when marshalling an object that uses the delegate_missing_to extension, if the delegation target is a method which adds or removes instance variables to an object being marshalled - when calling Marshal.dump(object), Ruby's built in behavior will check whether the object responds to :marshal_dump or :_dump, which in turn triggers the delegation target method in the responds_to_missing? function defined in activesupport/lib/active_support/core_ext/module/delegation.rb While future versions of Ruby will resolve this bug by raising a RuntimeError, the underlying cause of this error may not be readily apparent when encountered by Rails developers. By excluding marshal_dump and _dump from being delegated to an object's target, this commit eliminates a potential cause of unexpected behavior and/or RuntimeErrors. Fixes rails#36522
…legate_missing_to Exclude marshal_dump & _dump methods from being delegated via delegate_missing_to extension, fix #36522
Exclude missing marshal_dump and _dump methods from being delegated to an object's delegation target via the delegate_missing_to extension. This avoids unintentionally adding instance variables to an object during marshallization, should the delegation target be a method which would otherwise add them. In current versions of Ruby, a bug exists in the way objects are marshalled, allowing for instance variables to be added or removed during marshallization (see https://bugs.ruby-lang.org/issues/15968). This results in a corrupted serialized byte stream, causing an object's instance variables to "leak" into subsequent serialized objects during demarshallization. In Rails, this behavior may be triggered when marshalling an object that uses the delegate_missing_to extension, if the delegation target is a method which adds or removes instance variables to an object being marshalled - when calling Marshal.dump(object), Ruby's built in behavior will check whether the object responds to :marshal_dump or :_dump, which in turn triggers the delegation target method in the responds_to_missing? function defined in activesupport/lib/active_support/core_ext/module/delegation.rb While future versions of Ruby will resolve this bug by raising a RuntimeError, the underlying cause of this error may not be readily apparent when encountered by Rails developers. By excluding marshal_dump and _dump from being delegated to an object's target, this commit eliminates a potential cause of unexpected behavior and/or RuntimeErrors. Fixes #36522
Steps to reproduce
1.) Create a new Rails app
2.) Set
config.cache_store = :memory_store
inconfig/test.rb
(andconfig/development.rb
, to replicate via Rails console)3.) Install Active Storage
4.) Create a model with an Active Storage attachment - we'll use a User model with an attached avatar for this example:
5.) Add fixtures to populate testing DB:
Or, if reproducing via the Rails console, add seed data via
db/seeds.rb
:6.) The following integration tests demonstrate the unexpected behavior. In both cases, a block is passed to
Rails.cache.fetch
, the last line of which contains a two-element array: The first element of the array is an array of users, and the second element can be any object - we'll use the integer 0 for demonstration.However, in the second test, when calling
Rails.cache.read
to look up the previously created entry, the second element of the array is not equal to that passed toRails.cache.fetch
(0).This may be replicated via the Rails console, if the database has been seeded with multiple user records:
Here's a fresh Rails app containing the above code: https://github.com/alipman88/rails-cache-bug
Expected behavior
The evaluated value of last line of a block passed to Rails.cache.fetch should be that of the cache entry that gets created.
Actual behavior
The value of the cache entry created appears to be altered during serialization via Marshal.dump.
System configuration
Rails version: 6.0.0.rc1
Ruby version: 2.6.3
The text was updated successfully, but these errors were encountered: