Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(upsert): add conflictFields option #13723

Merged
merged 7 commits into from Dec 11, 2021

Conversation

wbourne0
Copy link
Member

@wbourne0 wbourne0 commented Nov 29, 2021

Adds support for the conflictFields option to Model.upsert.
This is used for options.upsertKeys in QueryInterface.prototype.upsert if provided for specifying the fields
used at ON CONFLICT({fields}) rather than relying on the default logic.

Pull Request Checklist

Please make sure to review and check all of these items:

  • Have you added new tests to prevent regressions?
  • Does npm run test or npm run test-DIALECT pass with this change (including linting)?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Did you update the typescript typings accordingly (if applicable)?
  • Does the description below contain a link to an existing issue (Closes #[issue]) or a description of the issue you are solving?
  • Did you follow the commit message conventions explained in CONTRIBUTING.md?

Description Of Change

Allows for the override of the upsertKeys value passed to the query generator via options.conflictFields (for Model.upsert).

Note that the different name is intentional - upsertKeys isn't as clear as conflictFields IMO.

Our current logic tries to guess which fields are used, but there's some cases where it isn't accurate (and with #13411, this becomes more of an issue) especially partial indexes. Though it'd be nice to have the logic which infers upsertKeys smarter, I think there's some edge cases where it wouldn't be able to (safely) infer which column to use (e.g. in cases where multiple could apply).

Example use case (using a many:many joiner table)

const Memberships: typeof Model = sequelize.define(
  'my_joiner_table',
  {
    id: {
      primaryKey: true,
      autoIncrement: true,
      type: DataTypes.INTEGER,
      allowNull: false,
    },
    group_id: {
      type: DataTypes.INTEGER,
      primaryKey: true,
      allowNull: false,
      unique: 'my_constraint',
    },
    user_id: {
      type: DataTypes.INTEGER,
      primaryKey: true,
      allowNull: false,
      unique: 'my_constraint',
    },
  },
  { timestamps: true }
);

// In usage
await Memberships.upsert(
  {
    group_id: 5,
    team_id: 3,
  },
  {
    conflictFields: ['group_id', 'user_id'],
  }
); // `ON CONFLICT ("group_id", "user_id")` instead of `ON CONFLICT ("group_id")` 

Adds support for the `conflictFields` option to `Model.upsert`.
This is used for `options.upsertKeys` in `QueryInterface.prototype.upsert` if provided for specifying the fields
used at `ON CONFLICT({fields})` rather than relying on the default logic.
@wbourne0 wbourne0 self-assigned this Nov 29, 2021
@fzn0x
Copy link
Member

fzn0x commented Nov 30, 2021

How about the performance from the old ones, is it better?

@wbourne0
Copy link
Member Author

wbourne0 commented Dec 2, 2021

How about the performance from the old ones, is it better?

As in vs v5? This should be considerably faster, the previous logic would:

Try an insert query, then an update query if the insert query threw a unique index error.

This makes it possible to specify which fields to use in ON CONFLICT({HERE}) which can give significant performance benefits if you have the right indexes if sequelize's default logic doesn't pickup the right fields.

(though in most cases, seqeuelize's default logic will work perfectly fine, this is mostly an issue when partial indexes come into play)

Copy link
Member

@sdepold sdepold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL and MSSQL doesn't support it?

@sdepold
Copy link
Member

sdepold commented Dec 3, 2021

Overall I don't see a particular issue with the code change. I'm just wondering if the default logic should be smarter somehow. In your tests you have multi column indexes and seem to use the same columns in the conflictFields afterwards. Should we somehow prefer the multi column indexes?! Or is that too brittle and the developer should decide for him/herself?

@sdepold sdepold self-assigned this Dec 3, 2021
@wbourne0
Copy link
Member Author

wbourne0 commented Dec 5, 2021

Overall I don't see a particular issue with the code change. I'm just wondering if the default logic should be smarter somehow. In your tests you have multi column indexes and seem to use the same columns in the conflictFields afterwards. Should we somehow prefer the multi column indexes?! Or is that too brittle and the developer should decide for him/herself?

I think we should do both, this is just the easiest one to do and works until we can take care of the other one.

I think the primary case for this (where we'll never be able to determine which index to use) would be when there are two indexes which could be used in an index but we only want to use one.

For example:

const Users: typeof Model = sequelize.define(
  'my_joiner_table',
  {
    username: {
      type: DataTypes.STRING,
      allowNull: false,
    },
    email: {
      type: DataTypes.STRING,
      allowNull: false,
    },
  },
  {
    timestamps: true,
    indexes: [
      {
        unique: true,
        fields: ['username'],
      },
      {
        unique: true,
        fields: ['email'],
      },
    ],
  }
);

// In this instance, the conflicting key is ambiguous - we could either be 
// changing the email of the user with username `myUsername` or
// changing the username of the username with email `myEmail@domain.tld`.
// This is fine for inserts, but if we're updating a user it could have unintended side effects.
await Users.upsert({
  username: 'myUsername',
  email: 'myEmail@domain.tld',
});

// In this case, if a user exists with username `myUsername`, we'll update their email
// to be `myEmail@domain.tld`; otherwise we'll create a new user.
await Users.upsert(
  {
    username: 'myUsername',
    email: 'myEmail@domain.tld',
  },
  {
    conflictFields: ['username'],
  }
);

// In this case, if a user exists with email `myEmail@domain.tld`, we'll update their username
// to be `myUsername`; otherwise we'll create a new user.
await Users.upsert(
  {
    username: 'myUsername',
    email: 'myEmail@domain.tld',
  },
  {
    conflictFields: ['email'],
  }
);

However where this issue is more relevant is when / if #13412 is merged (was approved but I accidentally dismissed that, oops).

When working with partial indexes, the where clause must be specified in ON CONFLICT for postgres / sqlite to use that partial index.

So if we don't specify { conflictWhere: conditionForIndex } the index won't apply, meaning we can't infer that the index is to be used (unless we add logic which determines if a where clause implements an index's clause but that sounds very complicated).

I definitely think that we should and can have smarter logic here, but I think there's a lot of edge cases which make having this an option very useful.

@@ -39,7 +39,8 @@ AbstractDialect.prototype.supports = {
inserts: {
ignoreDuplicates: '', /* dialect specific words for INSERT IGNORE or DO NOTHING */
updateOnDuplicate: false, /* whether dialect supports ON DUPLICATE KEY UPDATE */
onConflictDoNothing: '' /* dialect specific words for ON CONFLICT DO NOTHING */
onConflictDoNothing: '', /* dialect specific words for ON CONFLICT DO NOTHING */
conflictFields: false /* whether the dialect supports specifying conflict fields or not */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not supported for mysql and mssql?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope; same for snowflake iirc. From what I gathered whilst looking at their respective docs, they don't let / make you explicitly specify the fields which could run into a UNIQUE constraint issues. Instead, they have an ON DUPLICATE clause which is triggered when any unique constraint would error.

Copy link
Member

@sdepold sdepold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify the question above.

@sdepold sdepold merged commit 496bede into sequelize:main Dec 11, 2021
@github-actions
Copy link
Contributor

🎉 This PR is included in version 6.12.0-beta.3 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Contributor

🎉 This PR is included in version 6.12.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

aliatsis pushed a commit to creditiq/sequelize that referenced this pull request Jun 2, 2022
* feat(upsert): add conflictFields option

Adds support for the `conflictFields` option to `Model.upsert`.
This is used for `options.upsertKeys` in `QueryInterface.prototype.upsert` if provided for specifying the fields
used at `ON CONFLICT({fields})` rather than relying on the default logic.

* add conflictFields to the right type

Co-authored-by: Sascha Depold <sdepold@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants