storage: optimize multiple objects write #1059

Merged
merged 4 commits into from
Jul 18, 2023

Contributor

@redbaron redbaron commented Jul 16, 2023

Previously, writing multiple objects required 2 round trips
to the DB per object: one to fetch the object's state and a second
to update the object. Each query is indexed and fast, but
when latency to the DB is comparable to query execution time, the
extra round trips add significant overhead.

This change optimizes multi-object writes in 2 steps:

  1. Instead of reading the DB state for each object and then deciding
    on a possible write operation, perform the write unconditionally,
    with the correct predicates (version and permissions) defined where
    applicable. The write might therefore not succeed if the row doesn't
    match the predicates. The write query is structured in such a way that
    the final state of the row in the database is returned, regardless
    of whether the write succeeded or not. By inspecting the returned row
    we can infer whether it was a success, a version conflict or a
    permission error.

  2. Now that each object is written to the DB in a single query, there
    are no dependencies between queries and all of them can be sent
    to the DB in a batch without waiting for the result of each. The whole
    batch continues to be executed in a single transaction, so the outcome
    is the same, but batching negates the latency penalty.
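
The idea in step 1 can be illustrated with a small in-memory simulation. This is not the SQL from this PR: the `row` type, `conditionalWrite`, `classify` and the convention that an empty expected version means "no version check" are all hypothetical, and the sketch assumes a new version never equals a version already stored.

```go
package main

import "fmt"

// row is a hypothetical, simplified view of a stored object.
type row struct {
	Version  string
	Writable bool // whether the caller may overwrite this row
}

type outcome string

const (
	written          outcome = "written"
	versionConflict  outcome = "version conflict"
	permissionDenied outcome = "permission denied"
)

// conditionalWrite stands in for a single write query with predicates:
// the row changes only if the predicates hold, and the final state of
// the row is returned either way.
func conditionalWrite(db map[string]row, key, expectVersion, newVersion string) (row, bool) {
	cur, exists := db[key]
	// Assumption: an empty expectVersion means no version check.
	versionOK := expectVersion == "" || (exists && cur.Version == expectVersion)
	permOK := !exists || cur.Writable
	if versionOK && permOK {
		db[key] = row{Version: newVersion, Writable: true}
	}
	final, found := db[key]
	return final, found
}

// classify infers the outcome from the returned row alone, without a
// separate read before the write.
func classify(final row, found bool, newVersion string) outcome {
	switch {
	case found && !final.Writable:
		return permissionDenied
	case found && final.Version == newVersion:
		return written
	default:
		return versionConflict
	}
}

func main() {
	db := map[string]row{
		"a": {Version: "v1", Writable: true},
		"b": {Version: "v1", Writable: false},
	}
	writes := []struct{ key, expect, next string }{
		{"a", "v1", "v2"}, // version matches
		{"a", "v1", "v3"}, // stale version
		{"b", "", "v2"},   // row is not writable
		{"c", "", "v1"},   // new row
	}
	for _, w := range writes {
		final, found := conditionalWrite(db, w.key, w.expect, w.next)
		fmt.Println(w.key, classify(final, found, w.next))
	}
}
```

Because each write is decided from its own returned row, no write depends on the result of a previous query, which is what makes the batching in step 2 possible.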

To support batching, access to the native pgx connection is required. For that, an ExecuteInTxPgx variant was added and all call sites of StorageWrite were converted to it.

@redbaron redbaron merged commit ca07adf into master Jul 18, 2023
1 check passed
@redbaron redbaron deleted the storage-bulk-update branch July 18, 2023 16:26