feat: read the whole data object in compaction #3276

Merged
merged 25 commits into from
Jun 27, 2022

Little-Wallace
Contributor

@Little-Wallace Little-Wallace commented Jun 16, 2022

What's changed and what's your intention?

close #3245

Reduce IOPS during compaction to save cost on S3.

Main Changes

  • Remove the cache fill after flush in the compute node. If we put a slice of a file's Bytes into the block cache, the memory of the whole file cannot be freed until every one of its blocks has been evicted. If we want to refill some data into the cache to avoid cache misses after flushing the shared buffer, we need a better strategy, and I think that should be done in a separate PR, not this one.
  • Do not use the block cache in the compactor node, because it is not necessary.
  • Always read the whole object for SST files in the compactor node's iterator.
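
The trade-off behind these changes can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `MockObjectStore`, `BlockView`, `block_ranges`, `requests_per_block` and `read_whole` are hypothetical names, and `Arc<[u8]>` stands in for a reference-counted `Bytes` buffer. The sketch counts requests for per-block reads versus a single whole-object read, then shows that one retained block view keeps the entire buffer alive.

```rust
use std::cell::Cell;
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical in-memory stand-in for an object store such as S3.
struct MockObjectStore {
    object: Vec<u8>,
    requests: Cell<usize>,
}

impl MockObjectStore {
    fn new(object: Vec<u8>) -> Self {
        Self { object, requests: Cell::new(0) }
    }

    /// Ranged read: every call counts as one request.
    fn read_range(&self, range: Range<usize>) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        self.object[range].to_vec()
    }

    /// Whole-object read: one request, returned as a shared buffer.
    fn read_all(&self) -> Arc<[u8]> {
        self.requests.set(self.requests.get() + 1);
        Arc::from(self.object.as_slice())
    }
}

/// A block that is only a view into a shared, reference-counted buffer.
struct BlockView {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl BlockView {
    fn data(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Split `len` bytes into consecutive block ranges (`block_size` must be > 0).
fn block_ranges(len: usize, block_size: usize) -> Vec<Range<usize>> {
    (0..len)
        .step_by(block_size)
        .map(|start| start..(start + block_size).min(len))
        .collect()
}

/// Read the object block by block and return the number of requests issued.
fn requests_per_block(store: &MockObjectStore, block_size: usize) -> usize {
    let before = store.requests.get();
    for range in block_ranges(store.object.len(), block_size) {
        let _block = store.read_range(range);
    }
    store.requests.get() - before
}

/// Read the object once and slice block views out of the shared buffer.
fn read_whole(store: &MockObjectStore, block_size: usize) -> (usize, Vec<BlockView>) {
    let before = store.requests.get();
    let buf = store.read_all();
    let blocks = block_ranges(buf.len(), block_size)
        .into_iter()
        .map(|range| BlockView { buf: Arc::clone(&buf), range })
        .collect();
    (store.requests.get() - before, blocks)
}

fn main() {
    let store = MockObjectStore::new(vec![7u8; 1024]);
    let per_block = requests_per_block(&store, 64);
    let (whole, mut blocks) = read_whole(&store, 64);
    println!("per-block requests: {}, whole-object requests: {}", per_block, whole);

    // Keep a single view, as a cache holding one block would.
    blocks.truncate(1);
    println!(
        "retained view: {} bytes, buffer still pinned: {} bytes",
        blocks[0].data().len(),
        blocks[0].buf.len()
    );
}
```

In this sketch the whole-object read trades many small requests for one large one, which suits a compactor that scans every block anyway. The same sharing is why caching a single slice is costly: the full buffer is released only when the last view is dropped.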

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
This reverts commit 2deb294.
Contributor

@github-actions github-actions bot left a comment

license-eye has totally checked 855 files.

Valid Invalid Ignored Fixed
851 3 1 0
Invalid file list:
  • src/common/src/config_size.rs
  • src/storage/src/hummock/iterator/in_memory_iterator.rs
  • src/storage/src/hummock/table_acessor.rs

src/common/src/config_size.rs (outdated, resolved)
src/storage/src/hummock/iterator/in_memory_iterator.rs (outdated, resolved)
src/storage/src/hummock/table_acessor.rs (outdated, resolved)
Contributor

@github-actions github-actions bot left a comment

license-eye has totally checked 854 files.

Valid Invalid Ignored Fixed
849 4 1 0
Invalid file list:
  • src/common/src/sys/cgroup.rs
  • src/common/src/config_size.rs
  • src/storage/src/hummock/iterator/in_memory_iterator.rs
  • src/storage/src/hummock/table_acessor.rs

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Contributor

@github-actions github-actions bot left a comment

license-eye has totally checked 854 files.

Valid Invalid Ignored Fixed
851 2 1 0
Invalid file list:
  • src/common/src/sys/cgroup.rs
  • src/common/src/config_size.rs

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@hzxa21 hzxa21 self-requested a review June 16, 2022 11:39
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@codecov

codecov bot commented Jun 16, 2022

Codecov Report

Merging #3276 (74072d2) into main (e251132) will increase coverage by 0.00%.
The diff coverage is 86.26%.

@@           Coverage Diff           @@
##             main    #3276   +/-   ##
=======================================
  Coverage   74.43%   74.43%           
=======================================
  Files         769      769           
  Lines      107507   107554   +47     
=======================================
+ Hits        80021    80061   +40     
- Misses      27486    27493    +7     
Flag Coverage Δ
rust 74.43% <86.26%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/storage/compactor/src/server.rs 0.00% <0.00%> (ø)
src/storage/src/hummock/iterator/concat_inner.rs 96.38% <57.14%> (-2.37%) ⬇️
src/storage/src/hummock/sstable/mod.rs 95.55% <75.00%> (-1.30%) ⬇️
src/storage/src/hummock/test_utils.rs 85.71% <80.00%> (-0.42%) ⬇️
src/storage/src/hummock/compactor.rs 78.18% <83.33%> (+0.51%) ⬆️
src/storage/src/hummock/sstable_store.rs 80.99% <84.40%> (-2.05%) ⬇️
src/storage/src/hummock/sstable/multi_builder.rs 90.64% <88.00%> (-0.46%) ⬇️
...ge/src/hummock/sstable/forward_sstable_iterator.rs 95.83% <96.55%> (+0.71%) ⬆️
...a/src/hummock/compaction/tier_compaction_picker.rs 96.71% <100.00%> (ø)
src/storage/src/hummock/block_cache.rs 72.16% <100.00%> (-16.85%) ⬇️
... and 18 more

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@hzxa21 hzxa21 requested a review from wenym1 June 20, 2022 04:17
src/storage/src/hummock/sstable_store.rs (outdated, resolved)
src/storage/src/hummock/sstable/mod.rs (resolved)
src/storage/src/hummock/iterator/concat_inner.rs (outdated, resolved)
src/storage/src/hummock/table_accessor.rs (outdated, resolved)
src/storage/src/hummock/state_store.rs (outdated, resolved)
src/storage/src/hummock/sstable/mod.rs (outdated, resolved)
src/storage/src/hummock/block_cache.rs (outdated, resolved)
src/storage/src/hummock/compactor.rs (outdated, resolved)
src/storage/src/hummock/sstable/in_memory_iterator.rs (outdated, resolved)
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
commit 309ce36
Author: Bowen <36908971+BowenXiao1999@users.noreply.github.com>
Date:   Tue Jun 21 13:19:44 2022 +0800

    refactor(agg): clean up unused fields & refactor (#3339)

    * refactor(agg): clean up unused fields

    * delete file

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit e48dced
Author: Shmiwy <wyf000219@126.com>
Date:   Tue Jun 21 12:55:21 2022 +0800

    feat(storage): support compression setting per level (#3362)

    Signed-off-by: Shmiwy <wyf000219@126.com>

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 961936f
Author: Alex Chi <iskyzh@gmail.com>
Date:   Tue Jun 21 12:42:45 2022 +0800

    feat(test): parallelize sqlsmith test (#3360)

    * feat(test): parallelize sqlsmith test

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

    * more tests

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit fa90541
Author: TennyZhuang <zty0826@gmail.com>
Date:   Tue Jun 21 12:25:33 2022 +0800

    chore(github): feature-request template should use the feature label (#3359)

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 02405e1
Author: Bowen <36908971+BowenXiao1999@users.noreply.github.com>
Date:   Tue Jun 21 12:13:07 2022 +0800

    style: add more comments & refactor on pg-wire  (#3358)

    style: add more comments on pg-wire code

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 1c09432
Author: Li0k <yuli@singularity-data.com>
Date:   Tue Jun 21 12:00:37 2022 +0800

    fix(storage): fix slow unit-test in compactor_test (#3357)

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 19036a6
Author: xxchan <37948597+xxchan@users.noreply.github.com>
Date:   Tue Jun 21 05:48:08 2022 +0200

    fix(binder): do not allow correlated subquery in join tables (#3352)

    * fix(binder): do not allow correlated subquery in join tables

    * clippy

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit c37c9c4
Author: TennyZhuang <zty0826@gmail.com>
Date:   Tue Jun 21 11:24:40 2022 +0800

    refactor: remove unnecessary lazy_static (#3353)

    Signed-off-by: TennyZhuang <zty0826@gmail.com>

    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 68596ab
Author: Croxx <mrcroxx@outlook.com>
Date:   Tue Jun 21 11:11:53 2022 +0800

    feat(cache): introduce LruCacheEventListener to subscribe erasure and eviction (#3334)

commit 5627f25
Author: Steven Chua <stevengkc714@protonmail.com>
Date:   Tue Jun 21 10:54:12 2022 +0800

    feat(ctl): Display SstableIdInfo and Block Metadata in sst-dump (#3338)

    * feat(ctl): Add sst-dump command to risectl

    * feat(ctl): Fix risectl compatibility and remove VNode info

    * feat(ctl): Add checksum and compression algo for each block

    * feat(ctl): Add SstableIdInfo data to sst-dump

    * feat(ctl): Fix compilation errors

    * feat(ctl): Fix compilation errors and bugs

commit 16ffd98
Author: Name1e5s <name1e5s@qq.com>
Date:   Tue Jun 21 10:50:33 2022 +0800

    fix(expr): cast int16/int32/int64/float32 to float64 in floor/ceil/round (#3319)

    * fix(expr): cast int16/int32/int64/float32 to float64 in floor/ceil/round

    * fix plan

    Co-authored-by: TennyZhuang <zty0826@gmail.com>

commit 964bb92
Author: TennyZhuang <zty0826@gmail.com>
Date:   Tue Jun 21 10:23:56 2022 +0800

    ci(Mergify): configuration update (#3355)

    Signed-off-by: null <zty0826@gmail.com>

commit dac904e
Author: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
Date:   Tue Jun 21 09:57:33 2022 +0800

    feat(executor): streaming hyperloglog improvements (#3315)

    * minor

    * rename tests

    * minor

    * remove option, const eval of param, better comments, succint tests

commit cd4f302
Author: TennyZhuang <zty0826@gmail.com>
Date:   Tue Jun 21 09:34:54 2022 +0800

    ci(Mergify): configuration update (#3252)

    * ci(Mergify): configuration update

    Signed-off-by: null <zty0826@gmail.com>

    * Update .mergify.yml

    * Update .mergify.yml

    * Update .mergify.yml

    Co-authored-by: xxchan <37948597+xxchan@users.noreply.github.com>

commit 2aa7e8e
Author: Xinpeng Wei <windowsxp@sjtu.edu.cn>
Date:   Mon Jun 20 22:11:51 2022 +0800

    feat(frontend): add InternalStateTable Catalog (#3139)

    * use TableMessage for internal table

    * fix risedev check

    * update planner test

    * fix unit test

    * fix misc check

    * fix ci

    * fix issues in PR comments

    * fix clippy

    * fix ci

    * update planner test

commit 1e190dd
Author: xxchan <37948597+xxchan@users.noreply.github.com>
Date:   Mon Jun 20 14:16:12 2022 +0200

    fix(binder): do not allow correlated input ref in order by (#3346)

commit 263d770
Author: Tao Wu <wutao@singularity-data.com>
Date:   Mon Jun 20 18:54:25 2022 +0800

    fix: build failure caused by OptimzierContext::new (#3340)

commit 9f18401
Author: Tao Wu <wutao@singularity-data.com>
Date:   Mon Jun 20 17:47:22 2022 +0800

    feat: introduce the framework of sqlsmith (#3305)

commit 096a991
Author: Alex Chi <iskyzh@gmail.com>
Date:   Mon Jun 20 17:40:05 2022 +0800

    feat(ctl): add bench command (#3337)

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

commit 075d596
Author: TennyZhuang <zty0826@gmail.com>
Date:   Mon Jun 20 17:05:50 2022 +0800

    build: bump toolchain to 20220620 (#3324)

    * build: bump toolchain to 20220620

    Signed-off-by: TennyZhuang <zty0826@gmail.com>

    * also update docker-compose

    Signed-off-by: TennyZhuang <zty0826@gmail.com>

commit d864f30
Author: Wenzhuo Liu <lwzbill@foxmail.com>
Date:   Mon Jun 20 16:38:21 2022 +0800

    feat: add output_indices to join executors (#3047)

commit 13b9d58
Author: StrikeW <wangsiyuanse@gmail.com>
Date:   Mon Jun 20 16:29:56 2022 +0800

    feat(stream): enable append-only mv plan for kafka source (#3333)

commit ea386d3
Author: Liang <44948473+soundOfDestiny@users.noreply.github.com>
Date:   Mon Jun 20 15:50:55 2022 +0800

    refactor(compaction): deprecate the HashStrategy for OverlapStrategy (#3331)

commit a9fba38
Author: Liang <44948473+soundOfDestiny@users.noreply.github.com>
Date:   Mon Jun 20 15:34:25 2022 +0800

    fix(picker): fetch info from table_id field in sstableinfo (#3332)

commit 5d2bb42
Author: Bohan Zhang <tabvision@bupt.icu>
Date:   Mon Jun 20 14:55:59 2022 +0800

    test(stream): add ci for split change mutation in source (#3039)

    * stage

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * stage

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * add test

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * change e2e to datagen

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * stage

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * some bug to fix

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * fix async issue

    Signed-off-by: tabVersion <tabvision@bupt.icu>

    * add assert

    Signed-off-by: tabVersion <tabvision@bupt.icu>

commit 04fe6d6
Author: Liang <44948473+soundOfDestiny@users.noreply.github.com>
Date:   Mon Jun 20 14:47:56 2022 +0800

    refactor(vnode bitmap): remove vnode bitmap in sst info (#3329)

commit c6d1288
Author: Li0k <yuli@singularity-data.com>
Date:   Mon Jun 20 14:29:56 2022 +0800

    feat(storage): add manual compaction picker for targeted compaction (#3288)

    * feat(storage): add ManualCompactionPicker

    * feat(storage): distinguish get_compaction_task for manual

    * feat(storage): meta client support more parameters for manual_compaction

    * chore(storage): add tracing and some notes

    * chore(storage): split manual_compaction_picker to independent file

    * feat(storage): fix target_input check pending and support manual_pick for dynamic_level_selector

    * fix(storage): internal_table_id include mv_id

    * fix(storage): fix picker check target_input_ssts pending

    * fix(storage): fix picker with total_file_size

commit d04954f
Author: Renjie Liu <liurenjie2008@gmail.com>
Date:   Mon Jun 20 14:27:01 2022 +0800

    fix(ci): Reduce log (#3330)

commit eca9239
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Mon Jun 20 13:59:56 2022 +0800

    refactor(storage): remove `Option` on pk serializer of cell-based table (#3328)

    * minor refactor

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * remove option of pk serializer

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * remove pk serializer in state table

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * extract vnode compute

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * remove into order types

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

commit 9169436
Author: Bowen <36908971+BowenXiao1999@users.noreply.github.com>
Date:   Mon Jun 20 13:44:10 2022 +0800

    feat: apply relational refactor for hash agg (max, min) (#2999)

    * feat: two closure can not get mut ref of same variable

    * use Arc::Mutex to wrap the state table

    * roll back string agg

    * add StateTable to get_output

    * finish basic coding (unit test failed)

    * finish basic coding

    * fix bug

    * show case

    * use empty Row for scan

    * tweak

commit 35bb16a
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Mon Jun 20 13:43:11 2022 +0800

    refactor: use packed bitmap struct for vnode bitmap (#3310)

    * use bitmap in streaming

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * use bitmap in storage

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * minor fix

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

    * make bitmap optional

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

commit be50f93
Author: Liang <44948473+soundOfDestiny@users.noreply.github.com>
Date:   Mon Jun 20 13:27:37 2022 +0800

    feat(compaction): let compactor be unaware of vnode mapping (#3321)

commit 88258a6
Author: lmatz <lmatz823@gmail.com>
Date:   Sun Jun 19 21:31:01 2022 -0700

    doc: no need to manually check in PR from forks (#3325)

commit c590e18
Author: zwang28 <70626450+zwang28@users.noreply.github.com>
Date:   Mon Jun 20 12:13:50 2022 +0800

    refactor(storage): split HummockVersion's levels by compaction group. (#3206)

commit f1c3298
Author: zwang28 <70626450+zwang28@users.noreply.github.com>
Date:   Mon Jun 20 11:35:18 2022 +0800

    feat(meta): register source to compaction group manager (#3300)

commit bf08b54
Author: Zack <52342064+nhzaci@users.noreply.github.com>
Date:   Mon Jun 20 11:25:39 2022 +0800

    feat(frontend): Add sql string into context for debugging (#3312)

    * feat(frontend): Add sql string into context for debugging

    * Remove renaming

    * Refactor to use str

commit f784ba3
Author: zwang28 <70626450+zwang28@users.noreply.github.com>
Date:   Mon Jun 20 11:22:12 2022 +0800

    feat(storage): shared buffer flush L0 by compaction group (#3200)

commit bd48bba
Author: Kexiang Wang <kx.wang@hotmail.com>
Date:   Sun Jun 19 07:48:45 2022 -0400

    feat: modify interfaces to support specifying parallelism for each fr… (#3283)

    feat: modify interfaces to support specifying parallelism for each fragment

commit 8f0e0b2
Author: Steven Chua <stevengkc714@protonmail.com>
Date:   Sun Jun 19 12:56:51 2022 +0800

    feat(ctl): Support basic sst dump in risectl (#3309)

    * feat(ctl): Add sst-dump command to risectl

    * feat(ctl): Fix risectl compatibility and remove VNode info

commit daf9222
Author: Alex Chi <iskyzh@gmail.com>
Date:   Sat Jun 18 21:41:44 2022 +0800

    feat(risedev): generate risectl config (#3318)

    * feat(risedev): generate risectl config

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

    * fix

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

commit 86ff992
Author: Alex Chi <iskyzh@gmail.com>
Date:   Sat Jun 18 20:52:51 2022 +0800

    feat(ctl): support table scan (#3317)

    * feat(ctl): support table scan

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

    * license header

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

    * add docs

    Signed-off-by: Alex Chi <iskyzh@gmail.com>

commit 5ac5637
Author: Yikun Chen <36026213+cykbls01@users.noreply.github.com>
Date:   Sat Jun 18 08:03:06 2022 -0400

    feat: support interval comparison (#3222)

    1. fix timestamp substract timestamp.
    2. support interval comparison. From pgsql, 1 month equal to 30 days and 1 day equal to 86400000 ms.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace
Contributor Author

Merge this PR after #3375 so that the compactor does not use too much memory.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace
Contributor Author

@wenym1 Any suggestions?

Contributor

@wenym1 wenym1 left a comment

Rest LGTM

src/storage/src/hummock/sstable_store.rs (resolved)
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace Little-Wallace enabled auto-merge (squash) June 27, 2022 07:50
@Little-Wallace Little-Wallace merged commit 744f71c into main Jun 27, 2022
@Little-Wallace Little-Wallace deleted the wallace/refactor branch June 27, 2022 08:55
Development

Successfully merging this pull request may close these issues.

feat: refactor prefetch for compaction to reduce read cost
3 participants