Performance issue when auditing a large amount of inserts in Jpa. #1213

Closed
natami opened this issue Aug 8, 2022 · 3 comments

natami commented Aug 8, 2022

Is your feature request related to a problem? Please describe.
I'm using Javers in a project that has a data import feature, which needs to insert a large number (ca. 20,000-50,000) of entities. These entities are persisted with Hibernate/JPA and audited with Javers (as NewObject changes).

Looking at the Hibernate SQL queries, I can see that Javers issues two SELECTs while auditing a single JPA entity. Both use the same parameters, and both return no rows.

An example:

```
org.javers.core.Javers : Commit(id:7623678931677205483.00, snapshots:1, author:735b3f06-62d5-4d5f-8a54-68ff2ec40cfa, changes - NewObject:1), done in 22 millis (diff:3, persist:19)
n.t.d.l.l.SLF4JQueryLoggingListener : Name:, Connection:23, Time:5, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["SELECT global_id_pk FROM core_service.jv_global_id WHERE 1 = 1 AND local_id = ? AND type_name = ?"], Params:[("83a1d6af-84f2-4a0d-9799-81cd23c13628",com.xyz.core.service.identifiable.Identifiable)]
n.t.d.l.l.SLF4JQueryLoggingListener : Name:, Connection:23, Time:5, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["INSERT INTO core_service.jv_commit ( author, commit_date, commit_date_instant, commit_id, commit_pk ) VALUES ( ?,?,?,?,? )"], Params:[(735b3f06-62d5-4d5f-8a54-68ff2ec40cfa,2022-08-08 08:14:35.232,2022-08-08T08:14:35.232600Z,6746473815086611359.00,311)]
n.t.d.l.l.SLF4JQueryLoggingListener : Name:, Connection:23, Time:5, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["INSERT INTO core_service.jv_commit_property ( commit_fk, property_name, property_value ) VALUES ( ?,?,? )"], Params:[(311,organizationId,b254e19b-49e2-473c-9e08-bb65eaa0b924)]
n.t.d.l.l.SLF4JQueryLoggingListener : Name:, Connection:23, Time:3, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["SELECT global_id_pk FROM core_service.jv_global_id WHERE 1 = 1 AND local_id = ? AND type_name = ?"], Params:[("83a1d6af-84f2-4a0d-9799-81cd23c13628",com.xyz.core.service.identifiable.Identifiable)]
n.t.d.l.l.SLF4JQueryLoggingListener : Name:, Connection:23, Time:4, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["INSERT INTO core_service.jv_global_id ( type_name, local_id, global_id_pk ) VALUES ( ?,?,? )"], Params:[(com.xyz.core.service.identifiable.Identifiable,"83a1d6af-84f2-4a0d-9799-81cd23c13628",324)]
n.t.d.l.l.SLF4JQueryLoggingListener : Name:, Connection:23, Time:3, Success:True, Type:Prepared, Batch:False, QuerySize:1, BatchSize:0, Query:["INSERT INTO core_service.jv_snapshot ( type, global_id_fk, commit_fk, version, state, changed_properties, managed_type, snapshot_pk ) VALUES ( ?,?,?,?,?,?,?,nextval('core_service.jv_snapshot_pk_seq') * 100 )"], Params:[(INITIAL,324,311,1,{
```

Are these SELECTs needed for a NewObject commit? They are killing performance.


Describe the solution you'd like
Somehow avoid the `SELECT global_id_pk FROM core_service.jv_global_id WHERE 1 = 1 AND local_id = ? AND type_name = ?` query in the NewObject scenario.



NicklasWallgren commented Mar 27, 2023

I've also encountered this issue while importing large sets of data. 78% of the time is spent selecting global_id_pk from jv_global_id.

@bartoszwalacik
Is it possible to define a custom global_id_pk generator, so that we can allocate IDs in batches (1000+ at a time), similar to how `@SequenceGenerator` (jakarta.persistence.SequenceGenerator) works in JPA?

SELECT global_id_pk FROM jv_global_id WHERE 1 = 1 AND local_id = ? AND type_name = ?
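
To make the batch-allocation idea concrete, here is a minimal sketch in plain Java. It is not part of the Javers API: `BlockIdAllocator` and the `reserveBlock` callback are hypothetical names, and the callback stands in for whatever single database round trip would reserve a range of IDs (for example, advancing a sequence by the block size). It mirrors what `allocationSize` on JPA's `@SequenceGenerator` does, and it addresses ID generation only, not the lookup SELECT quoted above.

```java
import java.util.function.LongUnaryOperator;

/**
 * Hypothetical block allocator: reserves blockSize ids per round trip
 * and hands them out from memory. Illustration only, not Javers code.
 */
final class BlockIdAllocator {
    private final int blockSize;
    // Given a block size, reserves that many ids and returns the first one.
    private final LongUnaryOperator reserveBlock;
    private long next;       // next id to hand out
    private long limit;      // exclusive upper bound of the current block
    private int roundTrips;  // how many times reserveBlock was invoked

    BlockIdAllocator(int blockSize, LongUnaryOperator reserveBlock) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
        this.reserveBlock = reserveBlock;
    }

    synchronized long nextId() {
        if (next >= limit) {
            // Current block is exhausted (or none reserved yet): one round trip.
            next = reserveBlock.applyAsLong(blockSize);
            limit = next + blockSize;
            roundTrips++;
        }
        return next++;
    }

    synchronized int roundTrips() {
        return roundTrips;
    }
}
```

With a block size of 1000, handing out 2500 IDs calls `reserveBlock` three times instead of 2500 times. The trade-off is the usual one for block allocation: IDs left over in a block are lost when the process stops, so gaps appear in the key space.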


bartoszwalacik (Member) commented

I will take a look at it.

bartoszwalacik added a commit that referenced this issue Jul 9, 2023
* #1213 reducing number of calls to GlobalIdRepository.findGlobalIdPk()

* build trigger

* fix - added cache evict for a freshly inserted GlobalId

* putGlobalIdPkInCache

* comment

* showStandardStreams = true

* cache put removed

bartoszwalacik (Member) commented

I did some optimization: the number of calls to GlobalIdRepository.findGlobalIdPk() is significantly reduced.
This fix will be released in 7.0.2.
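
The thread does not show the actual change, so the following is only a rough sketch of one way repeated lookups for the same key can be avoided: memoize the result, including "not found", and evict the entry once the row has been inserted. `MemoizedPkLookup` and its delegate function are hypothetical names for illustration and are not taken from the Javers code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical memoizing wrapper around a primary-key lookup.
 * Illustration only, not the Javers implementation.
 */
final class MemoizedPkLookup {
    // Stands in for the query that selects global_id_pk for a given key.
    private final Function<String, Optional<Long>> delegate;
    // Caches hits and misses alike, so a repeated miss costs no query.
    private final Map<String, Optional<Long>> cache = new HashMap<>();
    private int delegateCalls;

    MemoizedPkLookup(Function<String, Optional<Long>> delegate) {
        this.delegate = delegate;
    }

    Optional<Long> find(String key) {
        Optional<Long> cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        delegateCalls++;
        Optional<Long> result = delegate.apply(key);
        cache.put(key, result);
        return result;
    }

    /** Call after inserting a row for key, so a cached miss is not served stale. */
    void evict(String key) {
        cache.remove(key);
    }

    int delegateCalls() {
        return delegateCalls;
    }
}
```

In the log from the original report, the same lookup runs twice for one new entity and misses both times. With a wrapper like this, the second call would be answered from memory. The eviction step matters because a cached miss becomes wrong as soon as the row is inserted.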

bartoszwalacik added further commits with the same message that referenced this issue on Jul 10, Jul 16, and Jul 27, 2023.