EntitySqlDaoWrapperInvocationHandler optimizations #1093

Merged
merged 6 commits from invoice-performance-v4 into work-for-release-0.21.x Feb 12, 2019

Conversation

2 participants
@pierre (Member)

pierre commented Feb 8, 2019

⚠️ This should be looked at after #1092 is merged (but no rush).

  • Revisit the code to limit the number of queries in various codepaths: by looking at the ChangeType and ObjectType (i.e. with or without history), we can avoid unnecessary queries in some cases. This makes the code a bit harder to follow, but I've added a few Preconditions to ensure my assumptions are correct.
  • Switch to bulk queries (e.g. getByIds).

To be enjoyed with ☕️... 😼
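The branching on ChangeType and the presence of a history table can be sketched roughly as follows. This is a hypothetical simplification (QueryPlan, needsPreFetch and needsPostFetch are illustrative names, not the actual Kill Bill API), but it shows the kind of matrix being enumerated to skip unnecessary queries:

```java
// Hypothetical sketch of the ChangeType/history decision matrix; the real
// EntitySqlDaoWrapperInvocationHandler logic is more involved.
enum ChangeType { INSERT, UPDATE, DELETE }

final class QueryPlan {

    // For a DELETE with history, the entity must be read *before* it
    // disappears, so the old state can be recorded in the history table.
    static boolean needsPreFetch(final ChangeType changeType, final boolean hasHistoryTable) {
        return changeType == ChangeType.DELETE && hasHistoryTable;
    }

    // For an INSERT or UPDATE with a history table, the fully rehydrated
    // (post-write) entity is needed to populate the history row; without
    // history, the generated record_id alone is enough and the extra get
    // can be skipped entirely.
    static boolean needsPostFetch(final ChangeType changeType, final boolean hasHistoryTable) {
        return hasHistoryTable && changeType != ChangeType.DELETE;
    }
}
```

Enumerating the cases this way is more verbose than a generic "always re-fetch" path, but it makes each avoided query explicit.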

FYI, after this, there are two main items left on my TODO list (but we should already start advertising the changes and measuring the performance impact):

  • Investigate if we can insert history records in bulk (this might require some jdbi changes unfortunately, I'm currently looking into it)
  • Fix InvoiceTrackingSqlDao auditing (found a bug while refactoring this code)

pierre added some commits Feb 5, 2019

util: remove EntityDao#getRecordId API
This wasn't used.

Signed-off-by: Pierre-Alexandre Meyer <pierre@mouraf.org>
util: remove snowflake code in EntitySqlDaoWrapperInvocationHandler
EntitySqlDaoWrapperInvocationHandler#invokeWithCaching was only used for
EntitySqlDao#getRecordId. Simplify the code by using NonEntityDao directly
from EntityDaoBase, instead of relying on annotations.

The @Cachable annotation isn't really used anymore.

Signed-off-by: Pierre-Alexandre Meyer <pierre@mouraf.org>
util: remove audit caches
These were actually never plugged properly.

This makes #247 obsolete.

Signed-off-by: Pierre-Alexandre Meyer <pierre@mouraf.org>
util: limit the number of queries in EntitySqlDaoWrapperInvocationHandler

Revisit the code to limit the number of queries in various codepaths.

Signed-off-by: Pierre-Alexandre Meyer <pierre@mouraf.org>
util: switch to getByIds queries in EntitySqlDaoWrapperInvocationHandler
Signed-off-by: Pierre-Alexandre Meyer <pierre@mouraf.org>
util: rewrite updateHistoryAndAudit to make it clear when queries are issued

Signed-off-by: Pierre-Alexandre Meyer <pierre@mouraf.org>

@pierre pierre requested a review from sbrossie Feb 8, 2019

@sbrossie (Member)

sbrossie left a comment

A couple of comments:

  • It's a lot of changes in some delicate parts of the code - although I tried to pay attention, a CR is unlikely to catch subtle issues... so take it for what it is...
  • What I like:
    • Removal of dead code
    • Lots of Preconditions about the state we expect to be in
  • What I don't know yet if I like:
    • Can we measure the improvements -- I know it's unfair to ask this question at this stage :-) ?
  • What I like less:
    • The two additional paths for bulk vs. non-bulk -- if we could optimize bulk and treat everything as bulk, that would be ideal. I realize this may be too much to ask for...

@KillBillSqlDaoStringTemplate
public interface EntitySqlDao<M extends EntityModelDao<E>, E extends Entity> extends AuditSqlDao, HistorySqlDao<M, E>, Transactional<EntitySqlDao<M, E>>, CloseMe {

    @SqlUpdate
    @GetGeneratedKeys(value = LongMapper.class)

sbrossie (Member), Feb 8, 2019:

Slightly tricky though: "If the columns which represent the auto-generated keys were not specified, the JDBC driver implementation will determine the columns which best represent the auto-generated keys."

Also unrelated to this specific PR, but if this is the direction to go, we should then think of fixing our queue code.

pierre (Author, Member), Feb 11, 2019:

Slightly tricky though: "If the columns which represent the auto-generated keys were not specified, the JDBC driver implementation will determine the columns which best represent the auto-generated keys."

Should be fairly safe though, as it works fine with H2/PostgreSQL/MySQL. Note that I could specify it in the annotation (@GetGeneratedKeys(columnName = ...)), but casing is a bit annoying (H2 would want RECORD_ID instead of record_id).

That being said, it would be nice to completely get rid of record_id and only rely on id (somewhat related: #35). This would also unblock the Spanner work (TODO mostly aspirational at this point...).

Also unrelated to this specific PR, but if this is the direction to go, we should then think of fixing our queue code.

We could potentially - one thing to keep in mind though is that LAST_INSERT_ID() is scoped to a connection, while RETURN_GENERATED_KEYS is scoped to a Statement (I would have to refresh my memory on the queue code to see whether it matters in that case).

    @Audited(ChangeType.INSERT)
    public Object create(@SmartBindBean final M entity,
                         @SmartBindBean final InternalCallContext context);

    @SqlBatch
    // We don't @GetGeneratedKeys here, as it's unclear if the ordering through the batches is respected by all JDBC drivers

sbrossie (Member), Feb 8, 2019:

The JDBI 3 documentation (but 2 is probably no different) seems to indicate we can use it.

I suppose in the current scheme, the code follows with a get to retrieve all objects and access the auto-generated fields?

pierre (Author, Member), Feb 11, 2019:

The JDBI 3 documentation (but 2 is probably no different) seems to indicate we can use it.

Our version of JDBI doesn't support @GetGeneratedKeys on batch statements, but I'm currently fixing it.

But the remark here was more about the order of keys returned: it is driver specific whether the ordering returned matches the ordering of inserts.

I suppose in the current scheme, the code follows with a get to retrieve all objects and access the auto-generated fields?

Yes. Note also that for INSERT with a history table, we need the full rehydrated object anyways.
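Since the ordering of keys returned by a batch insert is driver-specific, one ordering-independent pattern is to re-select the rows by their application-supplied ids afterwards and read the generated record_id off each row. A minimal sketch, where Row and RecordIdLookup are hypothetical stand-ins for the real DAO types:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a row with an application-supplied id (e.g. a UUID) and a
// database-generated record_id.
final class Row {
    final String id;      // id supplied by the application at INSERT time
    final long recordId;  // auto-generated key, assigned by the database

    Row(final String id, final long recordId) {
        this.id = id;
        this.recordId = recordId;
    }
}

final class RecordIdLookup {

    // Instead of trusting the order of keys returned by the batch insert,
    // re-fetch the rows by id and build the id -> record_id mapping from
    // the rows themselves: the fetch order no longer matters.
    static Map<String, Long> recordIdsByEntityId(final List<Row> fetchedRows) {
        final Map<String, Long> result = new HashMap<>();
        for (final Row row : fetchedRows) {
            result.put(row.id, row.recordId);
        }
        return result;
    }
}
```

This also matches the point above: when a history table is involved, the fully rehydrated rows are needed anyway, so the re-fetch is not an extra cost.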

if (changeType == ChangeType.DELETE) {
    for (final String entityId : entityIds) {
        deletedEntities.put(entityId, sqlDao.getById(entityId, context));
        // TODO Switch to getByIds

sbrossie (Member), Feb 8, 2019:

This seems reasonable, but do we even have a use case for delete operations in batch?

pierre (Author, Member), Feb 11, 2019:

We don't indeed - just making the code forward-compatible 😆

InternalCallContext context = null;
// Retrieve record_id(s) for audit and history tables
final List<Long> entityRecordIds = new LinkedList<Long>();
if (changeType == ChangeType.DELETE) {

sbrossie (Member), Feb 8, 2019:

Am I understanding correctly that all this new code is because we want to optimize the rehydration process by using getByRecordId instead of getById?

pierre (Author, Member), Feb 11, 2019:

Not quite: the gist of it is that we have a matrix of cases tableName x changeType x with or without history. In some cases, we only need the new record_id, in others we need the old entity and the new entity, etc.

The previous code was a bit more generic and wasn't explicitly enumerating these matrix cases, causing extra unnecessary get queries to be issued (IIRC, I saw 2 unnecessary extra getInvoiceItem queries for each createInvoiceItem). I now handle each of these cases to optimize as much as possible. The code is a bit more verbose but more obvious IMHO (and I've added Preconditions to validate some of my assumptions).
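The Preconditions mentioned here can be illustrated with a small sketch. DeletePathAssumptions is a hypothetical example, not the actual code (which uses Guava's Preconditions inside EntitySqlDaoWrapperInvocationHandler); a local checkState keeps the sketch self-contained:

```java
// Minimal stand-in for Guava's Preconditions.checkState; the real code
// would use com.google.common.base.Preconditions directly.
final class Check {
    static void checkState(final boolean expression, final String message) {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }
}

// Hypothetical illustration: validate an assumption of the DELETE codepath
// before any history/audit queries are issued, so a broken assumption fails
// loudly instead of silently writing bad history rows.
final class DeletePathAssumptions {
    static void validate(final int entityIdsSize, final int fetchedEntitiesSize) {
        // A DELETE must have pre-fetched exactly one entity per deleted id.
        Check.checkState(entityIdsSize == fetchedEntitiesSize,
                         "Expected one pre-fetched entity per deleted id");
    }
}
```

The point of such checks is exactly what the comment describes: the case-by-case code is harder to follow, so each branch asserts the state it assumes.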

pierre (Member, Author) commented Feb 11, 2019

  • What I don't know yet if I like:

    • Can we measure the improvements -- I know it's unfair to ask this question at this stage :-) ?

There are really two distinct parts in this work:

  • Bulk queries stuff (e.g. an invoice with lots of invoice items): I don't think we should expect a performance boost in all cases. It will depend on a lot of factors, including the dataset, the type of database engine and the JDBC configuration parameters.
  • Reducing the number of queries in some codepaths: hard to measure, but we're for sure saving a few round trips 😼
  • What I like less:

    • The two additional paths for bulk vs. non-bulk -- if we could optimize bulk and treat everything as bulk, that would be ideal. I realize this may be too much to ask for...

Yup, that's being worked on in the invoice-performance-v5 branch, but it requires some JDBI changes. PRs are coming...

@pierre pierre referenced this pull request Feb 12, 2019

Merged

Invoice performance v5 #1096

@sbrossie sbrossie merged commit 733c62f into work-for-release-0.21.x Feb 12, 2019

5 checks passed on CircleCI: build, integration-tests, test-h2, test-mysql, test-postgresql

@pierre pierre deleted the invoice-performance-v4 branch Feb 13, 2019
