Add split check test #52

Connor1996 · 2020-02-09T08:51:40Z

simplify split check logic
add split check test
fix ExceedEndKey
move kv/rowcodec to kv/coprocessor/rowcodec
extract EncodeKey and DecodeUserKey to kv/util/codec
move engine_util to kv/util/engine_util

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Connor1996 · 2020-02-10T05:39:42Z

nrc

Nice reorganisation! There are a few comments inline, but they are mostly minor and most of the PR looks good.

I don't like to have util packages/modules - I think they are a code smell and that nearly always these packages can be better organised. In this case it seems that codec and engine_utils would be fine in kv. I think engine_utils could be somewhere better, but not sure where.

kv/tikv/raftstore/runner/runner_test.go

kv/util/codec/codec.go

nrc · 2020-02-11T01:27:42Z

kv/util/codec/codec.go

+
+	encGroupSize = 8
+	encMarker    = byte(0xFF)
+	encPad       = byte(0x0)


I think using these consts makes the code harder to read, especially since the comments often refer to the fixed numbers.

I think it's better to use a const number with accurate meaning instead of a magic number

kv/util/codec/codec.go

nrc · 2020-02-11T03:45:09Z

Thinking about it, I'm a bit surprised that the functions for encoding/decoding keys are used outside of storage/kvstore. I guess we use EncodeBytes in other places? Maybe we can leave the functions for operating on keys and timestamps in transaction.go, but refactor them to use codec/EncodeBytes? My reasoning is that how keys are encoded in the local storage should be an implementation detail of the storage layer.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Connor1996 · 2020-02-11T13:01:02Z

Thinking about it, I'm a bit surprised that the functions for encoding/decoding keys are used outside of storage/kvstore. I guess we use EncodeBytes in other places?

No, EncodeBytes is only used by transaction package. But region split should make a clean cut for mvcc key that the different versions of same user key should locate in a same region. That's why split check use DecodeBytes to check if the key is encoded.

Connor1996 · 2020-02-11T13:07:10Z

Maybe we can leave the functions for operating on keys and timestamps in transaction.go, but refactor them to use codec/EncodeBytes? My reasoning is that how keys are encoded in the local storage should be an implementation detail of the storage layer.

Okay. But the split check test which would be deleted in the final (because unrelated with the lessons) needs EncodeKey. I will duplicate EncodeKey in the test now.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Connor1996 · 2020-02-11T13:14:29Z

I don't like to have util packages/modules - I think they are a code smell and that nearly always these packages can be better organised. In this case it seems that codec and engine_utils would be fine in kv. I think engine_utils could be somewhere better, but not sure where.

let's keep it now. The next step is combining the kv/config and kv/tikv/config, and the level of directory tikv can be removed. We'll find a better place then.

Connor1996 · 2020-02-12T13:35:08Z

PTAL again @nrc

nrc · 2020-02-13T03:24:35Z

No, EncodeBytes is only used by transaction package. But region split should make a clean cut for mvcc key that the different versions of same user key should locate in a same region. That's why split check use DecodeBytes to check if the key is encoded.

OK, thanks for the explanation. Longer term I think we should use some kind of reader to check this rather than use DecodeBytes, but that doesn't need to block this PR

nrc

Thanks for the changes, looks good. I have two more minor suggestions, but you can land the PR with or without them.

nrc · 2020-02-13T03:29:13Z

kv/tikv/storage/kvstore/transaction.go

-	binary.BigEndian.PutUint64(newKey[newLen-8:], ^ts)
+	encodedKey := codec.EncodeBytes(key)
+	newKey := append(encodedKey, make([]byte, 8)...)
+	binary.BigEndian.PutUint64(newKey[len(newKey)-8:], ^ts)


Could use len(encodedKey) instead of len(newKey)-8, I think it's clearer.

nrc · 2020-02-13T03:30:41Z

kv/util/codec/codec.go

+
+// DecodeBytes decodes bytes which is encoded by EncodeBytes before,
+// returns the leftover bytes and decoded value if no error.
+func DecodeBytes(b []byte) ([]byte, []byte, error) {


Can we panic here instead of returning the error? I think we do not need proper error handling for TinyKV.

already panic in DecodeKey, but we need to know whether the key is encoded or raw by checking the error != nil

Yeah, I mean it would be nicer to panic in DecodeBytes than DecodeKey. We can return nil, nil if there is an error

https://github.com/pingcap-incubator/tinykv/pull/52/files#diff-ee22b14637a12c92c212438b82fb4600R46
See, if we panic in DecodeBytes, splitting on a raw key will cause panic cause it doesn't follow the format!

It is a bit sad that we cannot always know whether a key is raw or encoded. I suppose the problem is that the client can use the raw API to write raw keys to the DB? But this makes me wonder that one could write a raw key which can be validly decoded, but isn't in fact an encoded key and then things would break. Maybe we should not offer the raw API in TinyKV at all? Does TinySQL use it?

If it's a raw key which can be validly decoded, decode and then encode on the key doesn't make any difference, so it won't break the correctness for split checker.

Also, TinyKV use raw API for LAB1-3 which is to build a kvserver without transaction support

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

support basic functionality Prewrite/Commit/Rollback/Get/Scan Fill more implementations support coprocessor request. It's runnable now. Most of the code is copied from mocktikv. fix build support concurrent writer support more commands fix bugs support multiple regions and auto split. only resolve lock in region range remove tableScan and indexScan limit Avoid creating iterator too many times. support near seek optimize codec fix check prewrite conflict Use ValidForPrefix avoids conflict. And add message 'try again later' will make the error retryable in TiDB. fix delete range assert fix lock bug and support GC 1. lock key should be raw key rather than mvcc key, or resolve lock may send to another region. 2. support GC. fix scan lock is ignored. Should check pair.Err before check pair.Key. fix rollback command and update options change storage model After this change, lock and latest version is stored together with the same key, old version has a different prefix. During scan, we don't need to traverse every old version. GC is not implement yet. fix and optimize resolve lock fix prewrite keys support GC fix GC bug fix panic rename package name to fork fix build The NewChunk has been removed in TiDB. serialize write operation always use file-io for value log use a wake up channel to save CPU usage. fix reverse scan and limit scan count. rename lock/unlock. Delete instead of SetEntry if UserMeta is maixedDelFlag fix gc panic do not trace already committed error fix panic reduce prewrite conflict possibility and fix resolve lock panic extract acquire lock function and log latency refactor for efficient Resolve request. address comment user defer unlock write requests in batch to avoid exceed max transaction size. And provide an option to set the max transaction size. fix lock bug and implement efficient ScanLock 4k entries in a batch makes prewrite idempotent fix table scan and index scan bugs. makes commit request idempotent fix reverse scan use getter to avoid creating iterator in BatchGet close the DB when received signal. If we set sync write to false, the data may be corrupted if we do not close DB. fix GC bug (#25) * fix GC bug fix the lock unresolvable bug. (#26) * fix lock unresolvable remove key encoding (#27) * remove mv keys, always use raw key fix split bug after removing key encoding (#29) This bug will cause a single region as big as the whole database. Should use region.startKey instead of region.meta.StartKey in split check. return ErrLocked instead of ErrRetryable in Prewrite (#30) introduce lockstore (#31) * introduce lockstore A in-memory lock-free data-structure to store lock. - allows single writer and multiple readers without lock. - memory can be reused after all the entries in a block are deleted. * manual inline getNext add an command line flog to set value log path. (#32) kv: tiny clean up RUnlock and RLock might cost more then a memory allocation, let's keep things simple lock less scope avoid holding txnKeysMu for long time (#35) * avoid holding txnKeysMu for long time move lock out of LSM tree (#36) * move lock out of LSM tree. fix panic introduced by DBReader refactoring (#37) tiny fix Make code more clear Update mvcc.go move acquireLatches method from MVCCStore to regionCtx clear magic number (#41) move write batch out of latch scope (#42) move write batch construction out of latch scope (#43) Tiny refactor (#44) check `badger.ErrKeyNotFound` error returned by `txn.Get`. (#45) fix duplicate prewrite request caused panic. (#47) The lock store only supports insert and delete, if there is a same key, insert would fail and panic. When we meet a duplicate prewrite, we should not write the lock store. avoid `txn.Get` in `Commit` if there is no old version. (#46) unify names (#48) unify iterator style (#49) fix timestamp comparation in KvGet (#50) rename LockStore to MemStore (#51) typo (#52) fix bugs found in high conflicting concurrent transaction. (#53) 1. Fix `Cleanup` mistakenly write the batch to DB instead of lock store and use separate write batch type for Lock and DB to prevent error like this. 2. `Rollback` function should read all keys in lock store before read DB. 3. Fix delete key do actual delete instead of write tombstone. make conflict errors retryable (#54) fix unused import library (#55) fix WaitGroup (#56) After `WaitGroup.Wait` is called, another `Add` call will cause panic, so call `Add` in the Lock to prevent this. * call done after epoch not match split rollbackGC batch (#57) In TPCC benchmark, the number of rollback keys in one minutes are very large, up to hundreds of thousands. It increases latency a lot. Split the rollbackGC batch into small batches to avoid high latency. Make map with capacity (#58) preallocate slices (#59) make source more readable (#60) make it compile with latest tidb (#61) config: add max-procs (#63) support chunk execution (#64) * support chunk execution inline errors.Trace call explicitly (#65) unify Scan function to use closure and clean up code (#66) specify startKey and endKey in iterator options (#67) use Value instead of ValueCopy (#69) clear db reader (#70) optimize count(*) (#71) * optimize count(*) fix compile issue (#72) rename buf to cols (#73) fix build with latest tipb (#74) fix build for removing types.DatumRow (#75) implement closure executor (#76) * implement closure executor handle end key checking implicitly and remove the checking out of hot… (#77) support more executors in closure executor (#78) support the new row format (#79) clean up logs (#80) optimize and refactor encoding and add benchmark (#81) add old value to lock (#82) Avoid one DB.Get in Commit operation. continue handle request if necessary (#84) Rename and cleanup (#85) * use multiple badger DB * pre-split more regions and use 8 DBs * rename project name fix build error and use dep to manage dependencies (#86) * fix build error and use dep to manage dependencies add a flag to control shard-key feature (#88) update dep (#90) add topn closure executor (#89) * add topn closure executor update badger version (#93) use go mod (#94) fix memory leak in lock store (#95) move startTS and commitTS from mvccValue to UserMeta. (#96) fix crash when setting shardKey to false (#97) implement GCCompactionFilter (#98) Disable sharding by default (#99) fix delete panic (#100) upgrade mod (#101)

* add split check test Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Connor1996 added 2 commits February 9, 2020 14:19

fix start handler

5593540

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

add split check test

8b63277

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Connor1996 requested a review from nrc February 9, 2020 08:51

nrc reviewed Feb 11, 2020

View reviewed changes

Connor1996 added 3 commits February 11, 2020 20:23

Merge remote-tracking branch 'origin/master' into zbk/cleanup

4d5e50e

clean comment

12ae66e

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

address comment

e418ffa

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

fix test

4b78167

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

nrc approved these changes Feb 13, 2020

View reviewed changes

address comment

4fda8c0

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Connor1996 merged commit 9dc002c into master Feb 13, 2020

Connor1996 added a commit that referenced this pull request Apr 30, 2020

Add split check test (#52)

087e54e

* add split check test Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add split check test #52

Add split check test #52

Connor1996 commented Feb 9, 2020

Connor1996 commented Feb 10, 2020

nrc left a comment

nrc Feb 11, 2020

Connor1996 Feb 11, 2020

nrc commented Feb 11, 2020

Connor1996 commented Feb 11, 2020 •

edited

Connor1996 commented Feb 11, 2020

Connor1996 commented Feb 11, 2020

Connor1996 commented Feb 12, 2020

nrc commented Feb 13, 2020

nrc left a comment

nrc Feb 13, 2020

nrc Feb 13, 2020

Connor1996 Feb 13, 2020

nrc Feb 13, 2020

Connor1996 Feb 13, 2020

nrc Feb 13, 2020

Connor1996 Feb 13, 2020

Add split check test #52

Add split check test #52

Conversation

Connor1996 commented Feb 9, 2020

Connor1996 commented Feb 10, 2020

nrc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nrc commented Feb 11, 2020

Connor1996 commented Feb 11, 2020 • edited

Connor1996 commented Feb 11, 2020

Connor1996 commented Feb 11, 2020

Connor1996 commented Feb 12, 2020

nrc commented Feb 13, 2020

nrc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Connor1996 commented Feb 11, 2020 •

edited