Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): use enum for object store impl #3427

Merged
merged 7 commits into from
Jul 1, 2022

Conversation

wenym1
Copy link
Contributor

@wenym1 wenym1 commented Jun 23, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Currently in our code, we will add a prefix in the object store path to indicate whether we are operating on local or remote object store, and therefore all our object store implementation (S3ObjectStore, InMemObjectStore, DiskObjectStore ...) has to internally be aware that whether they are local object store or not, which makes the logic wired and messy.

In this PR, we use an enum to represent ObjectStoreImpl and explicitly restrict which object store can be used as remote or local object store. The path check and routing is done in this higher level enum so that the real object store implementation doesn't have to be aware that whether they are local or not. The enums look like

pub enum ObjectStoreImpl {
    InMem(MonitoredObjectStore<InMemObjectStore>),
    Disk(MonitoredObjectStore<DiskObjectStore>),
    S3(MonitoredObjectStore<S3ObjectStore>),
    Hybrid {
        local: Box<ObjectStoreImpl>,
        remote: Box<ObjectStoreImpl>,
    },
}

Some macros are used to reduce duplicated code.

Another benefit of introducing the enum is that in the code where use ObjectStoreImpl, we can be aware which type of object store we are using, and we may call some non-object-store-trait methods of specific object stores to get better performance in some scenario.

After the enum is introduced, we can also avoid using async_trait in object store, though the original cost of using async_trait can be ignored compared to external IO.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

@Little-Wallace
Copy link
Contributor

It's not necessary to use enum for object store. Because the time cost on IO and RPC will be much more than which cost on dynamic virtual-table.

@codecov
Copy link

codecov bot commented Jun 23, 2022

Codecov Report

Merging #3427 (d2c4ab1) into main (d2ec610) will decrease coverage by 0.01%.
The diff coverage is 37.50%.

@@            Coverage Diff             @@
##             main    #3427      +/-   ##
==========================================
- Coverage   74.41%   74.39%   -0.02%     
==========================================
  Files         771      771              
  Lines      108925   108933       +8     
==========================================
- Hits        81052    81039      -13     
- Misses      27873    27894      +21     
Flag Coverage Δ
rust 74.39% <37.50%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ctl/src/cmd_impl/hummock/sst_dump.rs 0.00% <ø> (ø)
src/object_store/src/lib.rs 100.00% <ø> (ø)
src/object_store/src/object/s3.rs 0.00% <ø> (ø)
src/storage/compactor/src/server.rs 0.00% <0.00%> (ø)
src/storage/src/hummock/sstable_store.rs 80.99% <ø> (ø)
src/storage/src/hummock/vacuum.rs 89.47% <ø> (ø)
src/storage/src/store_impl.rs 8.00% <0.00%> (ø)
src/object_store/src/object/mod.rs 40.00% <25.37%> (-5.46%) ⬇️
src/object_store/src/object/disk.rs 94.32% <100.00%> (-0.10%) ⬇️
src/object_store/src/object/mem.rs 86.02% <100.00%> (+0.16%) ⬆️
... and 5 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@wenym1 wenym1 marked this pull request as ready for review June 28, 2022 10:14
@wenym1
Copy link
Contributor Author

wenym1 commented Jun 28, 2022

It's not necessary to use enum for object store. Because the time cost on IO and RPC will be much more than which cost on dynamic virtual-table.

Performance is not the main reason for introducing enum for object store in this PR. The reason is described in the PR description.

@skyzh
Copy link
Contributor

skyzh commented Jun 28, 2022

Generally looks good. For @Little-Wallace 's concern, we can still add async_trait over the traits and impls, so that they still use dynamic dispatch even if they're in an enum, and thus avoid codegen explosion.

@wenym1 wenym1 requested a review from hzxa21 June 29, 2022 07:18
@Little-Wallace
Copy link
Contributor

Little-Wallace commented Jun 29, 2022

Generally looks good. For @Little-Wallace 's concern, we can still add async_trait over the traits and impls, so that they still use dynamic dispatch even if they're in an enum, and thus avoid codegen explosion.

I just think it is not necessary to maintain a complex code to optimize performance of object-storage acquirement. Because every times we send a request to object storage we will switch thread and kernel-state and wait several milliseconds or several hundred milliseconds (because s3 is much slower than MinIO based on NVMe). And all of them will cost much more resource than a dynamic vtable.

Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logics are greatly simplified. Thanks for the PR. LGTM!

Looks like the spot instance limit is hit in CI. You need to push an empty commit to rerun it.

@wenym1 wenym1 changed the title refactor(storage): use enum for object store impl and avoid using async trait in ObjectStore refactor(storage): use enum for object store impl Jun 30, 2022
@wenym1 wenym1 added the mergify/can-merge Indicates that the PR can be added to the merge queue label Jul 1, 2022
@mergify mergify bot merged commit 4a75d24 into main Jul 1, 2022
@mergify mergify bot deleted the yiming/refactor-object-store branch July 1, 2022 03:29
huangjw806 pushed a commit that referenced this pull request Jul 5, 2022
* refactor(storage): use enum for object store impl

* avoid using async_trait in object store trait

* add async_trait back for object store and remove the local and remote enum

* empty commit to trigger CI rerun

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants