engine: persist data to kernel #143
Conversation
It seems we have multiple usage modes; the two most obvious are "as-storage-lib" and "as-storage-service". That makes us think about a question:
Yes, we need to persist the kernel options once a kernel is created. It is very dangerous to use a kernel in one way and then switch to another. So we should provide some validation when a kernel is opened.
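A minimal sketch of what that validation could look like, using hypothetical KernelOptions and MetaStore types rather than the actual engula API: persist the options when the kernel is created, and refuse to reopen it with incompatible options.

use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct KernelOptions {
    // Hypothetical option: "storage-lib" or "storage-service".
    pub mode: String,
}

#[derive(Default)]
pub struct MetaStore {
    meta: HashMap<String, String>,
}

impl MetaStore {
    /// Called once when the kernel is created.
    pub fn persist_options(&mut self, opts: &KernelOptions) {
        self.meta.insert("mode".to_owned(), opts.mode.clone());
    }

    /// Called every time the kernel is opened.
    pub fn validate_options(&self, opts: &KernelOptions) -> Result<(), String> {
        match self.meta.get("mode") {
            Some(saved) if *saved == opts.mode => Ok(()),
            Some(saved) => Err(format!(
                "kernel was created in {saved:?} mode, refusing to open it in {:?} mode",
                opts.mode
            )),
            None => Err("kernel options were never persisted".to_owned()),
        }
    }
}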
pub async fn set(&self, ts: Timestamp, key: Vec<u8>, value: Vec<u8>) -> Result<()> {
    let current = self.current_version().await;
    current.set(ts, key, value).await?;
    if let Some((imm, version)) = current.should_flush().await {
Maybe there is a concurrency problem here? Two threads calling set at the same time might generate two immutable tables and versions... 🧐
Well, that's why I acquire a lock at https://github.com/engula/engula/pull/143/files#diff-4ee1308253a5d8885bc1d6a84ecf80e15233c086174740a737302e56654ac6a2R58.
We still need to prevent flushing multiple memory tables at the same time, though. There is still a lot of work to do before this engine can work decently 🤣
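A rough sketch of the concern and the fix, with hypothetical Memtable and Version types rather than the PR's actual ones: the check-and-rotate step has to happen under a single lock so that two concurrent writers cannot both decide to flush and both install a new version.

use std::sync::Arc;
use tokio::sync::Mutex;

#[derive(Default)]
struct Memtable;

#[derive(Default)]
struct Version {
    mem: Arc<Memtable>,
}

struct Engine {
    current: Arc<Mutex<Arc<Version>>>,
}

impl Engine {
    /// Returns the immutable memtable to flush, or None if the current
    /// memtable is not full (or another writer already rotated it).
    async fn rotate_if_full(&self, is_full: impl Fn(&Memtable) -> bool) -> Option<Arc<Memtable>> {
        // Check and rotate atomically: without the lock, two writers could
        // both see a full memtable and both install a new version.
        let mut current = self.current.lock().await;
        if !is_full(&current.mem) {
            return None;
        }
        let imm = current.mem.clone();
        *current = Arc::new(Version::default());
        Some(imm) // exactly one caller gets this immutable memtable to flush
    }
}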
@huachaohuang thanks for the reminder! I'm reviewing this PR now.
Thanks for preparing this PR! Comments inline.
I'll take a closer look at engine.rs later.
for object in &update.add_objects {
    // We assume that objects are flushed from the oldest immtable to the newest.
    let reader = version.bucket.new_sequential_reader(object).await?;
    let table_reader = TableReader::new(reader).await?;
Marking a possible improvement~ maybe we can build all the readers first and then consume them :)
Logic like:
for {
    // build readers
}
for {
    // build table readers
}
Then for a gRPC implementation, it can pipeline the requests to the remote.
Sounds like a good idea.
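A sketch of the suggested two-pass shape, with stub types standing in for the PR's Bucket and TableReader (not the actual engula code): open all sequential readers first, then build the table readers, so an RPC-backed bucket gets a chance to pipeline the open requests instead of fully consuming one object before opening the next.

struct SequentialReader;

struct Bucket;

impl Bucket {
    async fn new_sequential_reader(&self, _object: &str) -> std::io::Result<SequentialReader> {
        Ok(SequentialReader)
    }
}

struct TableReader;

impl TableReader {
    async fn new(_reader: SequentialReader) -> std::io::Result<TableReader> {
        Ok(TableReader)
    }
}

async fn open_tables(bucket: &Bucket, objects: &[String]) -> std::io::Result<Vec<TableReader>> {
    // Pass 1: issue all reader-open requests.
    let mut readers = Vec::with_capacity(objects.len());
    for object in objects {
        readers.push(bucket.new_sequential_reader(object).await?);
    }
    // Pass 2: consume the readers and build table readers.
    let mut tables = Vec::with_capacity(readers.len());
    for reader in readers {
        tables.push(TableReader::new(reader).await?);
    }
    Ok(tables)
}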
Co-authored-by: tison <wander4096@gmail.com>
for event in events {
    ts += 1;
    let (key, value) = codec::decode_record(&event.data)?;
    current.put(ts, key.to_owned(), value.to_owned()).await;
Do we need to check MEMTABLE_SIZE here after recovery~?
It seems some RocksDB-like engines do that (they also use a smaller memtable size threshold than the normal flush operation, so the restart process may produce more SST files). I'm not sure whether it's a practice we need to follow 😕
Yes, I think so. But maybe leave it as a future improvement. This PR has done enough.
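A hedged sketch of that future improvement, with hypothetical names (MEMTABLE_SIZE and the recovery hook below are illustrative, not the PR's code): after replaying the stream, compare the recovered memtable's size against a threshold and trigger a flush right away instead of waiting for the next write.

/// Hypothetical flush threshold in bytes; not the engine's actual constant.
const MEMTABLE_SIZE: usize = 4 << 20;

struct RecoveredMemtable {
    approximate_size: usize,
}

impl RecoveredMemtable {
    /// Some RocksDB-like engines use a smaller threshold after recovery, at
    /// the cost of producing a few extra small table files on restart.
    fn should_flush_after_recovery(&self) -> bool {
        self.approximate_size >= MEMTABLE_SIZE / 2
    }
}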
@@ -40,8 +37,8 @@ pub struct Engine<K: Kernel> {
    stream: K::Stream,
    bucket: K::Bucket,
    current: Arc<Mutex<Arc<EngineVersion<K>>>>,
    last_ts: Arc<Mutex<Timestamp>>,
    last_number: Arc<AtomicU64>,
    last_timestamp: Arc<Mutex<Timestamp>>,
Why do you rename the field to last_timestamp?
Hmm, I can't give a strong reason here. Just thought that if I have multiple last_xxx fields here, maybe I'd better make them more verbose.
@tisonkun Any more comments? If not, I am going to merge it then.
LGTM
LGTM.
Thanks for your review. I believe this is a big step toward completing the bridge between Engine and Kernel.
let mut update = KernelUpdate::default();
let last_ts = encode_u64_meta(imm.last_update_timestamp().await);
update.set_meta(LAST_TIMESTAMP, last_ts);
After some learning, it seems we'd better add a check that last_ts > [current_last_ts] before doing the update.
Someone ingesting an SST might bring a higher ts value than imm.last_update_timestamp.
But it's OK for now, because we don't support ingestion 😄
Yeah, the engine is quite buggy for now.
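A minimal sketch of the suggested guard, using a hypothetical helper rather than anything in the PR: only advance the persisted LAST_TIMESTAMP when the flushed memtable's timestamp is strictly greater, so a value recorded earlier (for example by a future SST ingestion carrying higher timestamps) can never be moved backwards.

/// Hypothetical helper, not part of the PR: decide whether the flushed
/// memtable's last timestamp may replace the persisted LAST_TIMESTAMP.
fn next_last_timestamp(persisted: Option<u64>, imm_last_ts: u64) -> Option<u64> {
    match persisted {
        // Never move LAST_TIMESTAMP backwards.
        Some(current) if imm_last_ts <= current => None,
        _ => Some(imm_last_ts),
    }
}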
This PR enables the hash engine to persist data in a kernel. It demonstrates the interaction between Engine and Kernel. Note that our kernel is still buggy as described here, so the engine recovery doesn't work if we test with more records here. I will fix that on the kernel side later.
Closes #59