Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient Bulk Create/Insert With One Database Call #3357

Closed
shadyanwar opened this issue Jul 15, 2019 · 14 comments
Closed

Efficient Bulk Create/Insert With One Database Call #3357

shadyanwar opened this issue Jul 15, 2019 · 14 comments
Assignees
Labels

Comments

@shadyanwar
Copy link

Suggestion

The createAll() method provides the ability to bulk create records, however, it is translated into multiple insert statements since the dao layer inserts records one by one. I suggested that, in MySQL for example, LB4 uses the multiple-row INSERT syntax which aims at reducing the communication overhead between the client and the server.

Use Cases

Less server overhead and better performance. In case of inserting around 10k records, the performance impact due to the current implementation is huge. A query might take something like 10 seconds or more and raise the server utilization extensively. Whereas with the multiple-row INSERT syntax the same query could take something like 1 second.

Examples

Optimally, instead of:

INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(1,2,3,4);
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(10,20,30,40);
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(100,200,300,400);

We would have:
INSERT INTO `tbl`(`fld1`,`fld2`,`fld3`,`fld4`) VALUES(1,2,3,4),(10,20,30,40),(100,200,300,400);

Acceptance criteria

TBD - will be filled by the team.

@pankajcheema
Copy link

I am agree with the point of @shadyanwar . I am also looking for the same .

@pankajcheema
Copy link

any update on the same ?

@dhmlau
Copy link
Member

dhmlau commented Jun 10, 2020

Talked to @jannyHou, this will be at the connector level.
@shadyanwar @pankajcheema, which connector are you using?

@shadyanwar
Copy link
Author

shadyanwar commented Jun 10, 2020

Talked to @jannyHou, this will be at the connector level.
@shadyanwar @pankajcheema, which connector are you using?

@dhmlau
MySQL

@stale stale bot added the stale label Dec 25, 2020
@shadyanwar
Copy link
Author

@dhmlau I’m happy to help

@stale stale bot removed the stale label Dec 25, 2020
@dishantr16
Copy link

I'm using Mongodb facing the same issue.... have any one found the solution?

@stale stale bot added the stale label Aug 22, 2021
@achrinza achrinza removed the stale label Aug 22, 2021
@loopbackio loopbackio deleted a comment from stale bot Aug 22, 2021
@loopbackio loopbackio deleted a comment from stale bot Aug 22, 2021
@samarpanB
Copy link
Contributor

@dhmlau @achrinza This is a very significant issue. We fixed it by overriding the createAll method in repository. But that's not right, because I have to do this at multiple places. I can create a mixin and do this too, but since it might be a performance bug for most of the databases, should we not have it in LB4 itself ?

Having said that, I tried to look for a solution to this in connector level. I checked postgres connector, for instance, but there was no reference of createAll here. Then I checked its base class in loopback-connector. There also I can only see create method. I checked juggler-bridge then. There I saw the createAll method making an internal call to create method. The create method invokes modelclass.create, which eventually invokes the connector create method. See below.

async createAll(entities: DataObject<T>[], options?: Options): Promise<T[]> {
    // perform persist hook
    const data = await Promise.all(
      entities.map(e => this.entityToData(e, options)),
    );
    const models = await ensurePromise(this.modelClass.create(data, options));
    return this.toEntities(models);
  }

I am basically trying to raise a PR to have this capability of either using create method multiple times or using insert query for multiple records. This definitely should go to connectors actually. Can you confirm, I have to modify the juggler bridge function first and then move to connectors, or am I getting it wrong ?

@achrinza
Copy link
Member

A summary of the code flow is:

  1. LB4 Repository calls Juggler Model Class .create
  2. Juggler Model Class gets the above method (and others) from being mixin-ed Juggler DataAccessObject by default
  3. Juggler DataAccessObject .create calls the Juggler SQL Connectors' .create
  4. The Juggler SQL Connectors inherit from loopback-connector's SQLConnector

So:

  1. "crux" of the issue is that the connector needs to implement .createAll.
  2. Since the SQL Connectors inherit from a common SQLConnector, updating that should be sufficient (assuming the SQL dialect isn't too different).
  3. The Juggler DataAccessObject also needs to be updated to implement .createAll, which would be mixin-ed into the Juggler Model Class during runtime
  4. Finally, LB4 DefaultCrudRepository's .createAll needs to be updated in a backwards-compatible way.

@samarpan-b
Copy link

Thanks @achrinza . However, I think mongodb connector also uses the same sql connector. Mongo DB has different query for insert multiple. I'll check and confirm.

@achrinza
Copy link
Member

achrinza commented Jun 27, 2022

@samarpan-b The MongoDB connector inherits from Connector instead, hence it should not be affected by updates to SQLConnector:

https://github.com/loopbackio/loopback-connector-mongodb/blob/51d132a48c0b559978a3d379a9695ff6bc1d888a/lib/mongodb.js#L211

Though it does implement the necessary CRUD functions to be compatible with Juggler DataAccessObject.

@samarpanB
Copy link
Contributor

samarpanB commented Jul 3, 2022

@achrinza I checked if the multiple insert in one query dialect is available in different databases or not and also if its available, is it same or not ?
I observed that only postgresql, mysql, sql server have a common dialect. Oracle supports multiple insert in one query but dialect is different and cassandra doesn't even support it.

So here is what I am thinking

  1. Add createAll method to connector
  2. Within createAll, I will invoke buildInsertAll method but I'll keep its implementation abstract (or empty, like buildReturning method), leaving the implementation to child connectors.
  3. Put a check if buildInsertAll is not implemented, then invoke create in loop
  4. Implement buildInsertAll in postgres, mysql, sql server, oracle connectors
  5. Update Juggler DataAccessObject implement createAll, which would be mixin-ed into the Juggler Model Class during runtime.
  6. Update LB4 DefaultCrudRepository's createAll

@samarpanB
Copy link
Contributor

Point 1 to 5 are done and PRs are raised. @achrinza please do take a look.

samarpanB added a commit to samarpanB/loopback-next that referenced this issue Sep 25, 2022
…uery

fix loopbackio#3357

Signed-off-by: Samarpan Bhattacharya <this.is.samy@gmail.com>
@samarpanB
Copy link
Contributor

Point 6 is also done. But its in draft. We need connector and juggler to release first. Then only I can update dependency in @loopback/repository.

@samarpanB
Copy link
Contributor

This is now released for postgresql in version 6.0.0 of loopback-connector-postgresql. See here - https://github.com/loopbackio/loopback-connector-postgresql/blob/master/CHANGES.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants