Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to the pooled-lo optimizer in Hibernate ORM #31899

Closed
yrodiere opened this issue Mar 16, 2023 · 8 comments · Fixed by #31947
Closed

Default to the pooled-lo optimizer in Hibernate ORM #31899

yrodiere opened this issue Mar 16, 2023 · 8 comments · Fixed by #31947

Comments

@yrodiere
Copy link
Member

yrodiere commented Mar 16, 2023

Description

To be considered for Quarkus 3 as a way to address the infamous backwards-incompatible sequence changes in Hibernate ORM (see #31521)

Hibernate ORM defaults to the pooled optimizer, which assumes the pool of IDs it can use is between <sequence value >-<increment size>+1 and <sequence value>.

It works, but has a significant downside: every time you restart a sequence manually (native SQL, import script) you have to take care to set the value to max(id) + <increment size>, otherwise Hibernate ORM will use incorrect IDs. E.g. if you your max id is 2 and you restart your sequence to 3, next time Hibernate ORM retrieves the sequence value it will assign IDs between 3-<increment size>+1 and 3, which, with the default increment size of 50, leads to Hibernate ORM starting with negative IDs (-46, -47, ...) and eventually coming all the way back to already assigned IDs, i.e. 1, 2.

So, that can be surprising, and hence it's not a great default.

Hibernate ORM provides another optimizer, pooled-lo. That optimizer will assume the pool of IDs it can use is between <sequence value> and <sequence value>+<increment size>-1... which removes the problem mentioned above: if you restart your sequence to 3, Hibernate ORM will use IDs 3 to 52, which is exactly what you'd naively assume.

Interestingly, switching from the pooled optimizer to the pooled-lo optimizer... is a backwards-compatible change! If the pooled optimizer leaves the sequence with e.g. value 151 (intending to use values 101 to 151 in the next pool), then upon restarting after changing the optimizer, the pooled-lo optimizer will simply skip those values and use values 151 to 200 instead.

So, we could consider making the pooled-lo optimizer the default in Quarkus. The only problem would be if other actors hit the database and expect the pooled optimizer to be used; they might end up using IDs that the pooled-lo optimizer already used. We might consider this acceptable, since:

  1. Those collisions will lead to immediate error messages, assuming the DB has integrity constraints on primary keys (all DBs do, right? right?)
  2. People in this situation are probably advanced users who can debug this.

See https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#identifiers-generators-optimizer for documentation about Hibernate ORM optimizers.

Implementation ideas

No response

@quarkus-bot
Copy link

quarkus-bot bot commented Mar 16, 2023

/cc @Sanne (hibernate-orm), @gsmet (hibernate-orm)

@yrodiere
Copy link
Member Author

@Sanne , @gsmet , see above. Any opinion on this change? It's not completely safe but I think it would be a net gain for new users and demos, and not particularly worse for existing users and complex applications.

@sebersole stated some time ago that pooled-lo could have been a better default for Hibernate ORM, and I agree with him, but I don't think we can reasonably change that in ORM right now. In Quarkus, though... it may be more reasonable.

@gsmet
Copy link
Member

gsmet commented Mar 16, 2023

My question about pooled-lo is that anytime you start your app, you skip 50 items, right?

@Sanne
Copy link
Member

Sanne commented Mar 16, 2023

I think it's a reasonable idea. Also, the whole point of using "generated ids" is that the business logic shouldn't really care which IDs are actually being generated; so if there's artificial some additional gaps of numbers that go unused that's totally fine, especially as we're not wasting significant ranges of numbers from the available space of ids.

As an alternative, if this plan wouldn't work for some reason, also bear in mind that the increment of the sequence is set to the range of the optimizer ORM is using and this is validated by the schema tools; so even just recommending users to do add a single "nextvalue" instruction in their scripts also avoids the users needing to do any math - the unsolved problem is awareness about this being necessary.

Speaking of awareness - wondering if ORM couldn't validate ranges on boot, as we know which tables are sourcing their ids from each generator; but that's not trivial and not something I'd recommend looking into right now.

@yrodiere
Copy link
Member Author

My question about pooled-lo is that anytime you start your app, you skip 50 items, right?

Only on the first start after switching from pooled to pooled-lo. I don't see any reason it would skip items on the next starts.

That is, except the fact that you essentially "lose" the current content of the pool when you stop an application, but that's true for all optimizers. As Sanne said, gaps in entity IDs are something people are expected to live with when using optimizers. Not something frequent, but something that happens.

the unsolved problem is awareness about this being necessary.

Right that's the whole point of this proposal. Wrong increment sizes are pretty much dealt with; the remaining problem is correctly (re-)initializing sequence values, and that's not just a migration problem, it will potentially affect anyone starting a new application.

wondering if ORM couldn't validate ranges on boot, as we know which tables are sourcing their ids from each generator; but that's not trivial and not something I'd recommend looking into right now.

Agreed on both points: would be nice, but now's not a good time. I suspect that would opens up a whole new range of problems since I don't think the schema tools look at entity data at all at the moment.

@sebersole
Copy link
Contributor

I have indeed said that - pooled-lo should really be the default, in hindsight.

I did want to chime in about the "schema validation" bit. So far, validation has been limited to checking the actual schema. What you are suggesting here goes beyond that into validating some of the data contained in that schema. Different thing. It is on par with, e.g., validating that columns mapped to enums contain only valid values.

Not saying that does not have value or merit, but its a whole new thing entirely

@gsmet
Copy link
Member

gsmet commented Mar 16, 2023

OK, then I think using pooled-lo as the default for Quarkus makes sense.

@Sanne
Copy link
Member

Sanne commented Mar 16, 2023

I did want to chime in about the "schema validation" bit. So far, validation has been limited to checking the actual schema. What you are suggesting here goes beyond that into validating some of the data contained in that schema. Different thing. It is on par with, e.g., validating that columns mapped to enums contain only valid values.

Yes, absolutely: I might have been unclear:

  • ORM does validate the schema, including it to define exactly the expected increment
  • We could leverage the knowledge sourced from schema validation to additionally validate used ranges

It's definitely two separate items, the first one an existing feature of the schema validator, the second being wishful thinking I'm just pointing out to consider in the future. Sorry for the tangent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants