Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple child associations in changeset trigger "has already been taken" error #4091

Closed
davidarnold opened this issue Feb 1, 2023 · 10 comments
Labels

Comments

@davidarnold
Copy link

davidarnold commented Feb 1, 2023

Elixir version

1.14.1

Database and Version

PostgreSQL 11.18

Ecto Versions

ecto 3.9.4, ecto_sql 3.9.2

Database Adapter and Versions (postgrex, myxql, etc)

postgrex 0.16.5

Current behavior

Minimal example demonstrating behavior:

defmodule Parent do
  use Ecto.Schema

  import Ecto.Changeset

  embedded_schema do
    field :name, :string

    has_many :children, Child
  end

  def changeset(parent, attrs) do
    cast(parent, attrs, [:name])
    |> cast_assoc(:children)
  end
end

defmodule Child do
  use Ecto.Schema

  import Ecto.Changeset

  embedded_schema do
    belongs_to :parent, Parent

    field :name, :string
    field :age, :integer
  end

  def changeset(child, attrs) do
    cast(child, attrs, [:name, :age])
  end
end

defmodule Test do
  import ExUnit.Assertions

  def run do
    params_with_missing_ids = %{
      "id" => "",
      "name" => "Bob",
      "children" => [
        %{
          "name" => "Timmy",
          "age" => "5"
        },
        %{
          "name" => "Sally",
          "age" => "7"
        }
      ]
    }

    params_with_nil_ids = %{
      "id" => "",
      "name" => "Bob",
      "children" => [
        %{
          "id" => nil,
          "name" => "Timmy",
          "age" => "5"
        },
        %{
          "id" => nil,
          "name" => "Sally",
          "age" => "7"
        }
      ]
    }

    params_with_empty_ids = %{
      "id" => "",
      "name" => "Bob",
      "children" => [
        %{
          "id" => "",
          "name" => "Timmy",
          "age" => "5"
        },
        %{
          "id" => "",
          "name" => "Sally",
          "age" => "7"
        }
      ]
    }

    data = %Parent{children: []}

    # Passes
    assert %{valid?: true} = Parent.changeset(data, params_with_missing_ids)

    # Passes
    assert %{valid?: true} = Parent.changeset(data, params_with_nil_ids)

    # Fails with "has already been taken" on second child ID
    assert %{valid?: true} = Parent.changeset(data, params_with_empty_ids)
  end
end

Test.run()

Expected behavior

Ecto supports the notion that empty string is equivalent to nil in some contexts. Specifically, HTML forms cannot submit a true nil, so features like the :empty_values option to Ecto.Changeset.cast/4 ease the burden of converting empty strings to nil.

The problematic behavior shown above arises in the attempt to validate uniqueness of multiple nested child records processed with Ecto.Changeset.cast_assoc/3. Although the changeset of the child records properly understands that an empty string should be treated as nil (e.g. when inserting an auto-increment ID field), the uniqueness detection process does not.

When there are new records involved (i.e. the child changeset action is :insert), Ecto.Changeset.Relation peeks into the incoming params and collects the values into a map of "seen PKs." No attempt to convert the empty IDs occurs other than a basic Ecto.Type.cast/2 to the :id type, which fails and subsequently falls back to returning the original empty string. Since these empty strings are not nil, it decides that any records past the first child are duplicates.

This problem has been reported before on Elixir Forum, without a conclusive solution:

Various unsatisfactory workarounds are possible:

  1. Remove the hidden ID field from the HTML form for child records, but only for new children since existing children need to roundtrip their ID.
  2. "Scrub" the incoming data before passing to Ecto, which would involve recursively rewriting the incoming params tree.
  3. Replace the :id type on child records with a custom Ecto type that converts empty string to nil rather than returning :error

It seems to me that the correct solution is for the uniqueness detection in cast_assoc to respect the same logic as the actual casting of the child records themselves, which is to say that somehow the empty strings should be understood as nil during PK collection. Supporting the :empty_values option (with default [""], even) on cast_assoc seems like a good idea to me, but I am seeking your expert opinion.

Thank you!

@josevalim
Copy link
Member

Thanks for the great report @davidarnold!

To play devil's advocate: someone could have an ID of an empty string somewhere and we would now ignore it, especially because IDs can be integers. At the moment, I would prefer to not generate an ID and Phoenix.HTML form helpers are smart enough to not generate an ID in said cases when using inputs_for.

@josevalim
Copy link
Member

Another reason why the above is preferable: we perform validation on "" => nil because those are user inputs. The ID is always generated by the programmer so it is reasonable to ask the programmer to not generate bad input in the first place. For example, we don't handle "" => nil on the change API either. :)

@davidarnold
Copy link
Author

Thanks for the great report @davidarnold!

😊 Thanks for the super speedy reply!

To play devil's advocate: someone could have an ID of an empty string somewhere and we would now ignore it, especially because IDs can be integers. At the moment, I would prefer to not generate an ID and Phoenix.HTML form helpers are smart enough to not generate an ID in said cases when using inputs_for.

We're using a component based form instead of the deprecated input helper approach of inputs_for. Aside from the implementation detail, it seems like you're advocating for the workaround I labeled 1) above, "Remove the hidden ID field from the HTML form for child records, but only for new children since existing children need to roundtrip their ID."

However, given the example Elixir Forum posts, it's pretty unexpected that the hidden ID field needs to make itself "unexist" from the HTML if and only if the underlying record is "new." It also makes it harder to share a form component between a new and edit template. And for what it's worth, even the edit template would be confusing if it allows new children to be added dynamically in a LiveView because the nested form for the existing children would need to include the ID field and the nested form for the new children would need to omit it.

Another reason why the above is preferable: we perform validation on "" => nil because those are user inputs. The ID is always generated by the programmer so it is reasonable to ask the programmer to not generate bad input in the first place. For example, we don't handle "" => nil on the change API either. :)

That sounds like it would also be an argument against :empty_values in cast, but [""] is the default and it is very useful.

To summarize my case, it is very unexpected only this one place in Ecto doesn't agree that the empty string is actually a nil. The example above with params_with_empty_ids works if there is only one child. Ecto converts both the parent and child empty string IDs to nil before it inserts. Again, only this one aspect of PK detection in associations disagrees with the rest of Ecto's behavior, which is why this is such a weird gotcha to developers.

In particular, there is big difference between how Ecto.Changeset.Relation peeks into raw params in the new child scenario instead of casted data for the existing child scenario. If it looked at the data, it would see the proper nil that has been set by cast. Because it is doing its own determination, it disagrees with cast. To be honest, I don't see why it doesn't try to look at the casted data in both scenarios. The current code's understanding of the PKs at play in an association is objectively incorrect ("I think we are about to insert empty strings") when compared to the actual behavior of cast and the ultimate INSERT of those params ("we are about to insert nils").

@josevalim
Copy link
Member

josevalim commented Feb 1, 2023

It also makes it harder to share a form component between a new and edit template.

It shouldn't. What would be wrong with:

<input :if={data.id} name="id" value={data.id} type="hidden" />

To summarize my case, it is very unexpected only this one place in Ecto doesn't agree that the empty string is actually a nil.

The point I was making is precisely that it is not the only place. Ecto chagnesets are built around a distinction on what is user input and what is programmatic input. The difference is that you think "id" is user input and I consider it to be programmatic.

@josevalim
Copy link
Member

If it looked at the data, it would see the proper nil that has been set by cast. Because it is doing its own determination, it disagrees with cast.

IDs are never cast. If it looked at the data, it would find nil instead of "". But it would also find nil if the ID was "123".

@davidarnold
Copy link
Author

It also makes it harder to share a form component between a new and edit template.

It shouldn't. What would be wrong with:

<input :if={data.id} name="id" value={data.id} type="hidden" />

I mean harder conceptually, not harder to type :)

This isn't what the generators output and it is non-trivial to figure out that this kind of approach is necessary when the only feedback on a standard hidden input without an :if conditional is a "has already been taken" error once they have more than one new child.

To summarize my case, it is very unexpected only this one place in Ecto doesn't agree that the empty string is actually a nil.

[...] The difference is that you think "id" is user input and I consider it to be programmatic.

Well, I don't see it this way, but I don't think the concept of user input vs. programmatic is clear.

If a field that is accepted by cast, then in some form or the other it seems like "user input" to me. It certainly seems like user input when the hidden field roundtrips a value for an existing child's ID, since that can be manipulated on the client side.

If on the other hand, you simply mean by "user input" that the field is visible and editable and by "programmatic" that it is invisible or has no provided design affordance to change it, then I'm not sure why Ecto is forcing a validation of the uniqueness of the association at all, since I should be able to "programmatically" ensure that there are no uniqueness conflicts.

I actually wouldn't mind being able to disable the checking by putting unique: false on the has_many relation, but that option is currently forbidden by @valid_has_options.

If it looked at the data, it would see the proper nil that has been set by cast. Because it is doing its own determination, it disagrees with cast.

IDs are never cast. If it looked at the data, it would find nil instead of "". But it would also find nil if the ID was "123".

Never cast...? Maybe there is a distinction I am missing in your usage, but this updated example seems fairly clear:

defmodule Parent do
  use Ecto.Schema

  import Ecto.Changeset

  @primary_key {:id, :id, autogenerate: true}

  embedded_schema do
    field :name, :string

    has_many :children, Child, on_replace: :delete
  end

  def changeset(parent, attrs) do
    cast(parent, attrs, [:name])
    |> cast_assoc(:children)
  end
end

defmodule Child do
  use Ecto.Schema

  import Ecto.Changeset

  @primary_key {:id, :id, autogenerate: true}

  embedded_schema do
    belongs_to :parent, Parent

    field :name, :string
    field :age, :integer
  end

  def changeset(child, attrs) do
    cast(child, attrs, [:id, :name, :age])
  end
end

defmodule Test do
  import ExUnit.Assertions

  def run do
    data = %Parent{
      name: "Bob",
      children: [
        %Child{id: 1, name: "Timothy", age: 5}
      ]
    }

    params = %{
      "name" => "Bob",
      "children" => [
        # Edited name on existing child
        %{
          "id" => "1",
          "name" => "Timmy",
          "age" => "5"
        },
        # New child, explicit new ID for the sake of example
        %{
          "id" => "123",
          "name" => "Sally",
          "age" => "7"
        },
        # New child, unspecified ID
        %{
          "name" => "Becky",
          "age" => "1"
        }
      ]
    }

    # Passes
    assert [
             %{id: 1, name: "Timmy"},
             %{id: 123, name: "Sally"},
             %{id: nil, name: "Becky"}
           ] =
             Parent.changeset(data, params)
             |> Ecto.Changeset.fetch_field!(:children)
  end
end

Test.run()

My point here is that if Ecto.Changeset.Relation were looking at the changeset that comes out of the cast instead of fishing around in the params, it would see [1, 123, nil] instead of ["1", "123", ""]

@josevalim
Copy link
Member

My point here is that if Ecto.Changeset.Relation were looking at the changeset that comes out of the cast instead of fishing around in the params, it would see [1, 123, nil] instead of ["1", "123", ""]

It doesn't work the way you describe. Ecto needs to find which data to give to Child.changeset/2 BEFORE the function is called. Therefore we cannot on the data after it because we need to match the IDs before. And, if for some reason we called it first, the ID would always be nil, as it is never cast.

@davidarnold
Copy link
Author

@josevalim The "never cast" statement was very confusing to me, but now I think I see the essence of the point you're trying to make.

Thanks for spending the time to explain things to me and for the update to the Ecto.Changeset.cast_assoc/3 documentation!

@LostKobrakai
Copy link
Contributor

@davidarnold Consider taking a look at phoenixframework/phoenix_live_view#2411. There are ways for the form to tell you which hidden inputs to place instead of adding them all the time and then needing to add conditionals to not render them in places they shouldn't show up.

@davidarnold
Copy link
Author

@LostKobrakai Thank you for the suggestion!

I copied your draft implementation into our project, removed our explicit hidden ID fields and everything works great. I also read through the relevant code and finally figured out that the for comprehension in form_for_hidden in the Ecto.Changeset implementation of Phoenix.HTML.FormData implicitly drops any nil valued primary keys. Very hard to spot!

I hope your new component makes it into a release soon 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants