Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AR nullifies one of the primary_key if we nullify the belongs_to association with composite_keys #49671

Open
prog-supdex opened this issue Oct 17, 2023 · 9 comments

Comments

@prog-supdex
Copy link

Steps to reproduce

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails", branch: "main"
  gem "sqlite3"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :organizations do
  end

  create_table :items, primary_key: %i[id organization_id] do |t|
    t.integer :id, null: false
    t.integer :organization_id, null: false
    t.integer :group_id
  end
end

class Organization < ActiveRecord::Base
end

class Item < ActiveRecord::Base
  self.primary_key = %i[id organization_id]

  belongs_to :organization
  belongs_to :group,
             class_name: 'Item',
             optional: true,
             query_constraints: %i[group_id organization_id]
end

class BugTest < Minitest::Test
  def test_nullify_association
    organization = Organization.create!
    first_item = Item.create!(id: [1, organization.id])

    second_item = Item.create!(id: [2, organization.id])
    second_item.group = first_item
    second_item.save!

    second_item.group = nil
    
    refute_nil(second_item.organization_id)
    assert_nil(second_item.group_id)
  end
  
  def test_adding_association
    organization = Organization.create!
    first_item = Item.create!(id: [1, organization.id])

    second_item = Item.new
    second_item.group = first_item

    assert_nil(second_item.organization_id)
    assert_equal(second_item.group_id, first_item.id.first)
  end
end

Expected behavior

The expected behaviour for the test_nullify_association case is not nullifying the composite primary_keys of the Item record.

#<Item id: 2, organization_id: 1, group_id: nil>

The expected behaviour for the test_adding_association case doesn't add an organization_id

#<Item id: nil, organization_id: nil, group_id: 1>

Actual behavior

In the case test_nullify_association we get one of the Item's composite primary_keys organization_id as nil

#<Item id: 2, organization_id: nil, group_id: nil>

In the case test_adding_association we get filled field organization_id

#<Item id: nil, organization_id: 1, group_id: 1>

System configuration

Rails version: 7.1.1 and also main

Ruby version: 3.2.2

@fatkodima
Copy link
Member

You need to change your configuration to this for things to start working

diff --git a/bug_report.rb b/bug_report.rb
index ea70bb0..335c529 100644
--- a/bug_report.rb
+++ b/bug_report.rb
@@ -27,6 +27,7 @@ ActiveRecord::Schema.define do
     t.integer :id, null: false
     t.integer :organization_id, null: false
     t.integer :group_id
+    t.integer :group_organization_id
   end
 end

@@ -40,7 +41,7 @@ class Item < ActiveRecord::Base
   belongs_to :group,
              class_name: 'Item',
              optional: true,
-             query_constraints: %i[group_id organization_id]
+             query_constraints: %i[group_id group_organization_id]
 end

 class BugTest < Minitest::Test

@nvasilevski
Copy link
Contributor

This is intended behavior of the current API. Since query_constraints: option treats all of the column names equally - it has to nullify/populate all of them when it comes to assigning associations. This is to support a case like:

class Car < ApplicationRecord
  self.primary_key = [:make, :model]
end

class CarBrochure < ApplicationRecord
  belongs_to :car, query_constraints: [:car_make, :car_model]
end

And when you do car_brochure.car = nil you expect both car_make and car_model to be nullified as well as both of these column be populated when you assign a record like car_brochure.car = Car.new(make: "Tesla", model: "Model 3")

The challenge here is to know that organization_id is special and conveys more info than just "one of the query constraints columns", in ruby code Rails sees both columns as [col1, col2] and there is no additional info to which one of these columns is organization_id and it needs to be preserved.

I guess one thing we could try is to make Rails to look into presence of an association where organization_id is a single-column foreign key and avoid nullifying it but then it may either be slow performance-wise or it will break expectation of a setup like:

class CarBrochure < ApplicationRecord
  belongs_to :car_make, foreign_key: :car_make
  belongs_to :car, query_constraints: [:car_make, :car_model]
end

where we still want both car_make and car_model columns to be nullified even though car_make is being used in another application.

So to summarize, this is an unfortunate limitation of the current API. I think what we'd want in this case is to have a special option that can tell that organization_id has a special meaning in our setup, for example:

  belongs_to :group,
             class_name: 'Item',
             optional: true,
             foreign_key: :group_id,
             tenant_key: :organization_id

which will let Rails know that organization_id has a special meaning and it should be treated differently but still included in the SQL. But I have to say it's unlikely we'll see such API in Rails anytime soon since for Rails it would mean supporting some kind of tenant-based sharding which is a big feature on its own. Unless we can find a better name for "something that is not a foreign key but still should be present in SQL queries"

@palkan
Copy link
Contributor

palkan commented Oct 17, 2023

This is intended behavior of the current API.

I think what we'd want in this case is to have a special option

Maybe, we can leverage existing options? More options => more complexity.

I think, the real limitation and the source of confusion with the current API is the naming: when I see "query_constraints" I expect it to be used for building queries only, not for setting foreign keys.

One option I'd like to explore is making the following work as we expect (by "we" I mean @prog-supdex and myself 🙂):

belongs_to :group,
             class_name: 'Item',
             optional: true,
             foreign_key: :group_id,
             query_constraints: %i[group_id organization_id]

So, if the foreign_key is defined explicitly, it's used in #replace_keys method. Otherwise, we infer the foreign key from the query constrains as it is now.

For that, we may want to refactor the #foreign_key method (or introduce another one for building queries).

@nvasilevski
Copy link
Contributor

I see "query_constraints" I expect it to be used for building queries only, not for setting foreign keys.

Fair, it just happened to do both since we deliberately didn't extend foreign_key: option.

So, if the foreign_key is defined explicitly, it's used in #replace_keys method. Otherwise, we infer the foreign key from the query constrains as it is now.

I actually like this idea. I think somewhere deep in the design thoughts this is how I imagined it to be. And maybe this is the answer for the future if Rails ever allows passing foreign_key: as an array. foreign_key will configure both attribute replacement and querying while query_constraints: will be the way to control queries without touching the attribute setters.

One of the challenges we need to figure out is how to properly map the foreign_key: :group_id to Item#id column and not Item#organization_id since now we will have the following setup:

foreign_key = :group_id
query_constraints = [:group_id, :organization_id]
primary_key = [:id, :organization_id]

pk_column_group_id_fk_points_to = ??? # should be somehow resolved to `:id` 

So a few options:

pk_column_group_id_fk_points_to = (primary_key - query_constraints).first # => :id

Relies on the fact that "the other" column will serve as a tenant key and will be called the same for both query_constraints & primary_key

Or perhaps a little bit more verbose option where we derive the "tenant key" concept first:

tenant_key = (query_constraints - foreign_key).first
pk_column_group_id_fk_points_to = (primary_key - tenant_key).first

This heavily relies on the fact that PK/query constraints setup consists of 2 columns where one of the columns behaves like a tenant key but I believe this is going to be a very common setup so might be worth teaching Rails about supporting this particular use-case first.

Also I don't think we should use "tenant key" name in the actual code but I don't have a better name at the moment 🙃

@matthewd
Copy link
Member

This is intended behavior of the current API

I think I might s/intended/expected/ there -- it's certainly desired in the case of a true two+ part foreign key, but it does seem bad (and tbh not something that had occurred to me) in the common-tenant-key case. There's also a related edge case of "assigning a non-nil object that has a different tenant key", where "raise that your data model makes that impossible" is probably a more appropriate response than "silently reparent the current record".

It doesn't get us there globally, but perhaps a first cut (which I'm leaning toward calling a bug fix) would be to exempt columns derived from the parent model's query constraints from overwriting assignments in consumer associations: those are by definition expected to be shared with other associations and the model itself, and so should be rewritten directly when intended?


I also find the "foreign_key as subset of query_constraints" approach interesting (and API-available, as they're currently mutually exclusive) as a possible way of affecting this at the single association level... but yeah reconciling that with the association's :primary_key option, in a way that works for worst-case situations of a multi-part tenant key + multi-part locally-unique key, will need some thought.

@palkan
Copy link
Contributor

palkan commented Oct 18, 2023

to exempt columns derived from the parent model's query constraints from overwriting assignments in consumer associations

Yeah, that's about how we temporary solved it.

be shared with other associations and the model itself

That's interesting: what if take other associations into account and perform replacement depending on that? For example:

ActiveRecord::Schema.define do
  create_table :organizations do
  end

  create_table :items, primary_key: %i[id organization_id] do |t|
    t.integer :id, null: false
    t.integer :organization_id, null: false
    t.integer :group_id
  end
end

class Organization < ActiveRecord::Base
end

class Item < ActiveRecord::Base
  belongs_to :group,
             class_name: 'Item',
             optional: true,
             query_constraints: %i[group_id organization_id]
end

class OrganizedItem < Item
  belongs_to :organization
end

class PrimaryItem < Item
    self.primary_key = %i[id organization_id]
end

organization = Organization.create!
first_item = Item.create!(id: [1, organization.id])

# Here, we set both group_id and organization_id, we consider them to be a foreign key for the association
base_item = Item.new(group: first_item)
base_item.organization_id #=> first_item.organization_id 
base_item.group_id #=> first_item.id.first


# Here, we know that there is an association using organization_id as a foreign_key
organized_item = OrganizedItem.new(organization:, group: first_item)
organized_item.organization_id #=> organization.id
organized_item.group_id #=> first_item.id.first

organized_item.group = nil
# Stays as is because its referred by another association
organized_item.organization_id #=> organization.id
organized_item.group_id #=> nil

# Another feature—consistency check!
organized_item = OrganizedItem.new(group: first_item) #=> raise "Foreign key doesn't match: nil vs <>"

# Similarly, when we have a composite primary key 
primary_item = PrimaryItem.new(group: first_item)

# We shouldn't set the primary key from the association
primary_item.organization_id #=> nil
primary_item.group_id #=> first_item.id.first

# Not sure how to enforce consistency in this case, though

Seems rather complicated but.. the whole feature is like that; so, another option worth exploring.

@erikbelusic
Copy link

We ran into this recently as well (using the CPK gem) which is what I believe the native rails CPK code is based on. I then noticed the issue persisted all the way through to the port of the code into rails mainline.

Here is an issue I opened and subsequently closed once I realized what was going on. For our rails and CPK version we fixed this behavior temporarily with a monkey patch:

  module ActiveRecord
    module AttributeMethods
      module Write
        # Patched from https://github.com/composite-primary-keys/composite_primary_keys/blob/13.0.7/lib/composite_primary_keys/attribute_methods/write.rb
        # Method identical in https://github.com/composite-primary-keys/composite_primary_keys/blob/14.0.7/lib/composite_primary_keys/attribute_methods/write.rb
        def _write_attribute(attr_name, value) # :nodoc:
          # CPK
          if attr_name.kind_of?(Array)
            attr_name.each_with_index do |attr_child_name, i|
              child_value = value ? value[i] : value
              # BEGIN PATCH
              # Skip changing attributes that are part of the primary key unless the whole PK is being changed explicitly
              # or the part of the PK we want to change is currently null.
              next if composite? && self.class.primary_key.include?(attr_child_name) && public_send(attr_child_name).present?
              # END PATCH

              @attributes.write_from_user(attr_child_name.to_s, child_value)
            end
          else
            @attributes.write_from_user(attr_name.to_s, value)
          end

          value
        end
      end
    end
  end

I think from an user perspective, if columns in the relationship are shared with columns in the primary key, it would NOT be expected that your primary key values change. I propose the correct behavior is to only allow changing of one/some/all of the primary key columns when the value is currently null (since I believe PK columns must all be NOT NULL) and its likely you are trying to set/change them in that case. Once the PK columns have values, they should not change IMO. And if someone wishes to change them, they could technically still be modified directly.

@pjambet
Copy link
Contributor

pjambet commented Jan 23, 2024

We recently ran into a similar but different issue because we are not using a composite primary key, but a "regular" single column primary key named id:

(In this example, we're assuming a multi-tenant blog app where multiple blogs are stored, and each post and comment belong to a blog, with a blog_id tenant key)

# typed: false
# frozen_string_literal: true

require "bundler/inline"
require 'debug'

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails"
  # If you want to test against edge Rails replace the previous line with this:
  # gem "rails", github: "rails/rails", branch: "main"

  gem "sqlite3"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.integer(:blog_id)
  end

  create_table :comments, force: true do |t|
    t.integer(:blog_id)
    t.integer(:post_id)
  end
end

class Post < ActiveRecord::Base
  self.primary_key = :id
  query_constraints :id, :blog_id

  has_many :comments
end

class Comment < ActiveRecord::Base
  self.primary_key = :id
  query_constraints :id, :blog_id

  belongs_to :post
end

class BugTest < Minitest::Test
  def test_association_nullifies_additional_key
    post = Post.create!(blog_id: 1)
    comment = Comment.create(blog_id: 1, post: post)

    # This works, only changes `post_id`
    comment.update!(post_id: nil)
    assert_equal(1, comment.blog_id)

    # Now reset the comment
    comment.update!(post_id: post.id)
    comment.reload

    # This also changes `blog_id` on `comments`
    comment.update!(post: nil)
    assert_equal(1, comment.blog_id) # Fails
  end
end

I do agree with the suggestion from @matthewd, seems like that would work for us.


For context, I also tried a few workarounds with extra association options:

Explicitly configuring the association with the following restores the expected behavior to not override blog_id:

belongs_to :post, inverse_of: :comments, foreign_key: :post_id, primary_key: :id

But now calling comment.post only includes the WHERE posts.id = ? clause, without the blog_id one.

Attempting to restore this with the following causes two more issues:

belongs_to :post, inverse_of: :comments, foreign_key: :post_id, primary_key: :id, query_constraints: [:post_id, :blog_id]
  • Calling Comment.create(blog_id: 1, post: post) doesn't work as previously, when setting the post association, it overrides the blog_id value to nil, resulting in this query: INSERT INTO "comments" ("post_id") VALUES (?) RETURNING "id". Explicitly setting the post_id value works with Comment.create(blog_id: 1, post_id: post.id) as a temporary workaround for the test
  • The original behavior is back, calling comment.update!(post: nil) also nullifies blog_id on the comments row

@nvasilevski
Copy link
Contributor

nvasilevski commented Mar 14, 2024

Hey folks, we allocated some time to work on this and I have a proposal on how we are going to address it.
The proposal I'm going to share focuses on the long-term solution which will not be backported but it doesn't mean we won't consider any intermediate changes. It's just I believe that the long-term solution will benefit both users and maintainers.

The idea

Decouple query_constraints and foreign_key making query_constraints to only impact what SQL queries are issued and foreign_key to control how association attributes are assigned/reset along with controlling SQL if query_constraints is not set.

How we are going to achieve it

  1. (optional) Allow primary_key: option to accept an array. This is not mandatory but very natural change we should do. primary_key value under the hood can be composite so we should allow explicitly setting it to an Array as well. belongs_to association doesn't work with composite primary_key option #50850
  2. Allow foreign_key: option to accept an Array. Many people shared their frustration that using query_constraints to configure a "composite foreign key" feels unusual and not very conventional. Especially because at this moment query_contraints and foreign_key are strongly coupled and basically represent the same thing.
  3. Deprecate using query_constraints and ask applications to use foreign_key: [:col1, :col2] instead since it will result in exactly the same behavior and for many setups will make more sense. The intention of the deprecation is not to remove the option but warn about changing behavior of the option. Some apps may decide to keep using query_constraints if the new behavior is something they prefer more.
  4. Introduce a :nodoc: _query_constraints: option (temp name) which will be a decoupled version of the current query_constraints and eventually become the original query_constraints: option. We can introduce the _query_constraints early, as an undocumented feature allowing applications to try it but with no promises that the API will stay exact until it's actually released.

The approximate timeline for that would be to introduce deprecations in Rails 7.2 potentially along with the prototype of _query_constraints and Rails 8 will be released with decoupled foreign_key: and query_constraints association options

Here is a commit that shows the basic idea, it's far from being complete but sufficient to get reproduction scripts in this issue working with a slight and meaningful configuration change.
Shopify@5019f91

As was already mentioned, belongs_to :group will need to explicitly specify foreign_key as

  belongs_to :group,
             class_name: 'Item',
             optional: true,
             foreign_key: :group_id,
             query_constraints: %i[group_id organization_id]

making group association to be queried / preloaded with organization_id while organization_id won't be changed during association assignments.

And Comment.belongs_to :post will have to have an explicit foreign_key: :post_id, _query_constraints: [:post_id, :blog_id] to disable foreign key being derived from query_constraints. This is something that may not be necessary after we go through a deprecation cycle since foreign_key: :post_id is the default convention and should not require explicit value.

Let me know if you see any immediate issues or have any concerns, but on paper this seems to be a correct decision and honestly something that we aimed for from the beginning so I'd like to pursue it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants