Postgres upsert behavior inconsistent with "do nothing" #4242

noahw3 · 2023-04-20T01:56:09Z

Describe the bug
This is somewhere between a bug report and a feature request. When performing an upsert call using only the unique key(s) of the entity, the underlying postgres call gets created with an "on conflict do nothing" clause. The problem with this is that, if there is a conflict, postgres will not return anything in the "returning" clause.

This causes a few issues:

Confusion on what the expected return type of upsert is. If the expectation is that it should always return a value, that expectation is broken.
Inconsistency in the upsert API. If used with entity properties in addition to the unique key(s), it will always perform an update and always return a result. If only used with the unique key(s), it may not return the full result.
This poisons the identity map. If an entity is upserted and nothing is returned, that value is cached in the identity map. If you then try to populate a relationship later (without doing any other explicit lookup to refresh the value), it will reuse the partially hydrated value without doing a full lookup.

To Reproduce
Steps to reproduce the behavior:

Create a basic entity with a primary key and another optional/default column
Add a row to the table
Attempt to upsert a value with the same primary key (and only the primary key)
Inspect the result, and see that the return value only includes the primary key
Create another entity with a ManyToOne relationship to the first entity
Attempt to query for a value of the second entity and populate the first entity. Note that if this is done without first attempting to upsert the first entity, it will include all information. If done after attempting to upsert the first entity, it will only include the provided primary key.

Expected behavior
Upsert should always return a full value.

Additional context
I imagine the simplest way to achieve this would be to always perform the update (just using the provided unique key(s)). I suppose this could potentially introduce more writes in situations where it wasn't previously, and it could trigger automatic column updates, but that could already be happening if users were specifying non-unique keys in the query.

Versions

Dependency	Version
node	18.15.0
typescript	4.9
mikro-orm	5.6.15
your-driver	postgres@5.6.15

The text was updated successfully, but these errors were encountered:

B4nan · 2023-04-21T11:53:16Z

Can you share the reproduction as code, please?

noahw3 · 2023-04-21T22:17:41Z

Should look something like this.

Entities:

@Entity()
export class B extends BaseEntity<B, "id"> {
  @PrimaryKey({ type: "uuid", defaultRaw: "uuid_generate_v4()" })
  id!: string;

  @ManyToOne(() => D, { onDelete: "cascade", ref: true })
  d!: Ref<D>;
}

@Entity()
@Unique({
  properties: ["tenantWorkflowId"],
})
export class D extends BaseEntity<D, "id"> {
  @PrimaryKey({ type: "uuid", defaultRaw: "uuid_generate_v4()" })
  id!: string;

  @Property()
  tenantWorkflowId: number;

  @Property({ length: 6, defaultRaw: "now()", onUpdate: () => new Date() })
  updatedAt: Date = new Date();
}

Application code:

const ds = [1];
const loadedDs = await this.em.upsertMany(
  D,
  ds.map(({ tenantWorkflowId }) => ({
    tenantWorkflowId,
  }))
);
await this.em.flush();
// loadedDs should have an id, tenantWorkflowId, and updatedAt. With debug statements on, the query issued was
`insert into "d" ("tenant_workflow_id") values (1) on conflict ("tenant_workflow_id") do nothing returning "id", "updated_at"`

const loadedDs2 = await this.em.upsertMany(
  D,
  ds.map(({ tenantWorkflowId }) => ({
    tenantWorkflowId,
  }))
);
await this.em.flush();
// loadedDs2 only has tenantWorkflowId set. With debug statements on, the query issued was
`insert into "d" ("tenant_workflow_id") values (1) on conflict ("tenant_workflow_id") do nothing returning "id", "updated_at"`
`select "w0"."id" from "d" as "w0" where ("w0"."tenant_workflow_id" = 1)`

const loadedDs3 = await this.em.upsertMany(
  D,
  ds.map(({ tenantWorkflowId }) => ({
    tenantWorkflowId,
    updatedAt: new Date(),
  }))
);
await this.em.flush();
// loadedDs3 has id, tenantWorkflowId, and updatedAt set. With debug statements on, the query issued was
`insert into "d" ("tenant_workflow_id", "updated_at") values (1, '2023-04-21T22:01:07.995Z') on conflict ("tenant_workflow_id") do update set "updated_at" = excluded."updated_at" returning "id", "updated_at"`

Populating the relation:

// Assuming that tenantWorkflowId 1 already exists in the database
const ds = [1];
const loadedDs = await this.em.upsertMany(
  D,
  ds.map(({ tenantWorkflowId }) => ({
    tenantWorkflowId,
  }))
);
await this.em.flush();

await const b = await this.em.upsert(B, {
  d: loadedDs[0],
  order: 0,
  updatedAt: new Date(),
});
await this.em.flush();

// await b.d.load() This only includes the id and tenantWorkflowId, it's missing updatedAt (or any other additional columns on the entity)

B4nan · 2023-05-14T11:48:00Z

Confusion on what the expected return type of upsert is. If the expectation is that it should always return a value, that expectation is broken.

By "a value" you are referring to what exactly? The updatedAt? The return value is an array of managed entities, and that works as expected - they don't have to be fully initialized, the only invariant is that they will be managed, therefore they will have a primary key.

Maybe we can have a flag to enforce reloading of the entity fully? Maybe it would be even a good default? I really don't know, I was hesitant to add this API because I never needed it myself, so I don't have any first-class experience with the problems it should be solving. From what I understood, people often provide all the values explicitly, and for that it

Inconsistency in the upsert API. If used with entity properties in addition to the unique key(s), it will always perform an update and always return a result. If only used with the unique key(s), it may not return the full result.

This method is dynamic by definition, I don't see this as a problem, e.g. when you use upsert based on PK and the entity is already part of the identity map, you just get an assign on it, without any queries being made. This method is there to ensure some entity is created in the database, it is not meant as a replacement for find. But as said above, maybe that is something to improve.

This poisons the identity map. If an entity is upserted and nothing is returned, that value is cached in the identity map. If you then try to populate a relationship later (without doing any other explicit lookup to refresh the value), it will reuse the partially hydrated value without doing a full lookup.

If this part is what your "Populating the relation:" code snippet should be reproducing, it seems to be working fine on my end (with latest version).

I wasn't really sure what you mean by the snippets, is that supposed to be a sequence, so you literally call upsert + flush 3 times or is that supposed to be a comparison of 3 different versions? I'd appreciate a test case so its clear. Here is what I compiled from your snippets so far, note that I replaced the flush calls with clear as that feels more appropriate if you want to compare how things differ. Note that this is also dependent on the database state, e.g. if I wipe the database between the calls, it all behaves the same.

import { Entity, ManyToOne, PrimaryKey, Property, Ref, Unique } from '@mikro-orm/core';
import { MikroORM } from '@mikro-orm/postgresql';

@Entity()
class B {

  @PrimaryKey({ type: 'uuid', defaultRaw: 'uuid_generate_v4()' })
  id!: string;

  @ManyToOne(() => D, { onDelete: 'cascade', ref: true })
  d!: Ref<D>;

  @Property({ unique: true })
  order!: number;

  @Property({ length: 6, defaultRaw: 'now()', onUpdate: () => new Date() })
  updatedAt: Date = new Date();

}

@Entity()
@Unique({ properties: ['tenantWorkflowId'] })
class D {

  @PrimaryKey({ type: 'uuid', defaultRaw: 'uuid_generate_v4()' })
  id!: string;

  @Property()
  tenantWorkflowId!: number;

  @Property({ length: 6, defaultRaw: 'now()', onUpdate: () => new Date() })
  updatedAt: Date = new Date();

}

let orm: MikroORM;

beforeAll(async () => {
  orm = await MikroORM.init({
    entities: [B, D],
    dbName: `4242`,
  });

  await orm.schema.ensureDatabase();
  await orm.schema.execute('CREATE EXTENSION IF NOT EXISTS "uuid-ossp"');
  await orm.schema.refreshDatabase();
});

afterAll(async () => {
  await orm.close(true);
});

beforeEach(async () => {
  await orm.schema.clearDatabase();
});

test('4242 1/2', async () => {
  const loadedDs = await orm.em.upsertMany(D, [
    { tenantWorkflowId: 1 },
  ]);
  console.log(loadedDs);
  orm.em.clear();
  // loadedDs should have an id, tenantWorkflowId, and updatedAt. With debug statements on, the query issued was
  //   `insert into "d" ("tenant_workflow_id") values (1) on conflict ("tenant_workflow_id") do nothing returning "id", "updated_at"`

  const loadedDs2 = await orm.em.upsertMany(D, [
    { tenantWorkflowId: 1 },
  ]);
  console.log(loadedDs2);
  orm.em.clear();
  // loadedDs2 only has tenantWorkflowId set. With debug statements on, the query issued was
  //   `insert into "d" ("tenant_workflow_id") values (1) on conflict ("tenant_workflow_id") do nothing returning "id", "updated_at"`
  //     `select "w0"."id" from "d" as "w0" where ("w0"."tenant_workflow_id" = 1)`

  const loadedDs3 = await orm.em.upsertMany(D, [
    { tenantWorkflowId: 1, updatedAt: new Date() },
  ]);
  console.log(loadedDs3);
  orm.em.clear();
  // loadedDs3 has id, tenantWorkflowId, and updatedAt set. With debug statements on, the query issued was
  //   `insert into "d" ("tenant_workflow_id", "updated_at") values (1, '2023-04-21T22:01:07.995Z') on conflict ("tenant_workflow_id") do update set "updated_at" = excluded."updated_at" returning "id", "updated_at"`
});

test('4242 2/2', async () => {
  // Assuming that tenantWorkflowId 1 already exists in the database
  const loadedDs4 = await orm.em.upsertMany(D, [
    { tenantWorkflowId: 1 },
  ]);
  console.log(loadedDs4);
  await orm.em.flush();

  const b = await orm.em.upsert(B, {
    d: loadedDs4[0],
    order: 0,
    updatedAt: new Date(),
  });
  await orm.em.flush();
  console.log(b);

  await b.d.load(); // this corrertly populates the relation, it is not marked as initialized before the load() call
  console.log(b.d);
});

noahw3 · 2023-05-15T23:14:42Z

At a high level, the way that I'd want to use upsert is as a "findAndUpdateOrCreate" method. Because the return type is the entity, my assumption is that it should always return a fully initialized entity - otherwise if it isn't a fully loaded entity I'd expect it to be a reference. However, given that the entire point of using upsert is to reduce the number of database calls under the hood, I don't think it makes a ton of sense to return a partial result that then needs to hit the database again to fully initialize.

The implication of this then is that:

All columns necessary to fully initialize the entity should be included in the "returning" clause.
Upon failure to update, instead of only querying to populate the primary key it should query on all columns. Alternatively, instead of specifying "do nothing" in the conflict clause, it could always perform a no-op update to force a return value in the original query.

I think what I was trying to show in the example code was the differing behavior across subsequent requests. On the first call, because nothing exists yet it does a full create and returns the full entity. On the second call the same code returned a partial result and fired an extra query because the entity had already been created. The third call demonstrates how always forcing an update (for example, by always updating updatedAt) also forces a return value.

The other snippet shows how calling load on the relation doesn't return fully initialized values, because the prior upsert call put partially initialized values into the identity map.

B4nan · 2023-05-20T13:11:47Z

Thanks for the details, it was very helpful! I completely agree with the points, already working on a PoC.

Some interesting reading about the topic can be found here. So I guess it will be better to keep the separate query for reloading the values, in the end, it will be always there for MySQL that does not have returning statements.

Already got some promising results (#4370):

[query] insert into "d" ("tenant_workflow_id") values (1) on conflict ("tenant_workflow_id") do nothing returning * [took 3 ms]
[
  D {
    tenantWorkflowId: 1,
    updatedAt: 2023-05-20T13:14:15.783Z,
    id: 'ef103b8a-2143-43f4-8736-7e0b4582ff2c'
  }
]

[query] insert into "d" ("tenant_workflow_id") values (1) on conflict ("tenant_workflow_id") do nothing returning * [took 2 ms]
[query] select "d0"."id", "d0"."tenant_workflow_id", "d0"."updated_at" from "d" as "d0" where "d0"."tenant_workflow_id" = 1 [took 1 ms, 1 result]
[
  D {
    tenantWorkflowId: 1,
    updatedAt: 2023-05-20T13:14:15.783Z,
    id: 'ef103b8a-2143-43f4-8736-7e0b4582ff2c'
  }
]

[query] insert into "d" ("tenant_workflow_id", "updated_at") values (1, '2023-05-20T13:14:15.806Z') on conflict ("tenant_workflow_id") do update set "updated_at" = excluded."updated_at" returning * [took 1 ms]
[
  D {
    tenantWorkflowId: 1,
    updatedAt: 2023-05-20T13:14:15.806Z,
    id: 'ef103b8a-2143-43f4-8736-7e0b4582ff2c'
  }
]

Not sure if it makes more sense to whitelist the colums in returning clause, I guess I will do that in the end, so its in line with the select clause.

It will need more tests and polishing, let's see if I can find some time for that tomorrow, need to go AFK now.

Closes #4242

B4nan · 2023-05-21T15:43:15Z

Released as 5.7.8, let me know how it works, maybe there are more cases that will need improvements.

B4nan · 2023-05-21T16:47:56Z

What do you think about how the upsert works for already managed entities (loaded in context)? Now it does just the assign call and you need to flush explicitly for that, but maybe we could fire update query instead? Also if the entity is just a reference, it will still return a reference (with the data being assigned on it, after flushing it will become initialized but still might be missing optional data that were not part of the upsert call).

B4nan · 2023-08-29T22:40:14Z

FYI next version will improve the behavior of "no nothing" in postgres, and make things explicitly configurable, see #4669

B4nan added the bug Something isn't working label May 20, 2023

B4nan added a commit that referenced this issue May 20, 2023

fix(core): ensure em.upsert returns initialized entity

7a61b20

Closes #4242

B4nan mentioned this issue May 20, 2023

fix(core): ensure em.upsert returns initialized entity #4370

Merged

B4nan added a commit that referenced this issue May 21, 2023

fix(core): ensure em.upsert returns initialized entity

bf74dce

Closes #4242

B4nan added a commit that referenced this issue May 21, 2023

fix(core): ensure em.upsert returns initialized entity

8c1d4ec

Closes #4242

B4nan added a commit that referenced this issue May 21, 2023

fix(core): ensure em.upsert returns initialized entity

0634920

Closes #4242

B4nan added a commit that referenced this issue May 21, 2023

fix(core): ensure em.upsert returns initialized entity

60cbcae

Closes #4242

B4nan closed this as completed in #4370 May 21, 2023

B4nan added a commit that referenced this issue May 21, 2023

fix(core): ensure em.upsert returns initialized entity (#4370)

bad0b37

Closes #4242

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgres upsert behavior inconsistent with "do nothing" #4242

Postgres upsert behavior inconsistent with "do nothing" #4242

noahw3 commented Apr 20, 2023

B4nan commented Apr 21, 2023

noahw3 commented Apr 21, 2023 •

edited

Loading

B4nan commented May 14, 2023

noahw3 commented May 15, 2023

B4nan commented May 20, 2023 •

edited

Loading

B4nan commented May 21, 2023

B4nan commented May 21, 2023 •

edited

Loading

B4nan commented Aug 29, 2023

Postgres upsert behavior inconsistent with "do nothing" #4242

Postgres upsert behavior inconsistent with "do nothing" #4242

Comments

noahw3 commented Apr 20, 2023

B4nan commented Apr 21, 2023

noahw3 commented Apr 21, 2023 • edited Loading

B4nan commented May 14, 2023

noahw3 commented May 15, 2023

B4nan commented May 20, 2023 • edited Loading

B4nan commented May 21, 2023

B4nan commented May 21, 2023 • edited Loading

B4nan commented Aug 29, 2023

noahw3 commented Apr 21, 2023 •

edited

Loading

B4nan commented May 20, 2023 •

edited

Loading

B4nan commented May 21, 2023 •

edited

Loading