New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add split check test #52
Conversation
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
PTAL @nrc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice reorganisation! There are a few comments inline, but they are mostly minor and most of the PR looks good.
I don't like to have util packages/modules - I think they are a code smell and that nearly always these packages can be better organised. In this case it seems that codec and engine_utils would be fine in kv. I think engine_utils could be somewhere better, but not sure where.
|
||
encGroupSize = 8 | ||
encMarker = byte(0xFF) | ||
encPad = byte(0x0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using these consts makes the code harder to read, especially since the comments often refer to the fixed numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use a const number with accurate meaning instead of a magic number
Thinking about it, I'm a bit surprised that the functions for encoding/decoding keys are used outside of storage/kvstore. I guess we use |
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
No, |
Okay. But the split check test which would be deleted in the final (because unrelated with the lessons) needs |
let's keep it now. The next step is combining the |
PTAL again @nrc |
OK, thanks for the explanation. Longer term I think we should use some kind of reader to check this rather than use DecodeBytes, but that doesn't need to block this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, looks good. I have two more minor suggestions, but you can land the PR with or without them.
binary.BigEndian.PutUint64(newKey[newLen-8:], ^ts) | ||
encodedKey := codec.EncodeBytes(key) | ||
newKey := append(encodedKey, make([]byte, 8)...) | ||
binary.BigEndian.PutUint64(newKey[len(newKey)-8:], ^ts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use len(encodedKey)
instead of len(newKey)-8
, I think it's clearer.
|
||
// DecodeBytes decodes bytes which is encoded by EncodeBytes before, | ||
// returns the leftover bytes and decoded value if no error. | ||
func DecodeBytes(b []byte) ([]byte, []byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we panic here instead of returning the error? I think we do not need proper error handling for TinyKV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already panic in DecodeKey
, but we need to know whether the key is encoded or raw by checking the error != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I mean it would be nicer to panic in DecodeBytes than DecodeKey. We can return nil, nil if there is an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/pingcap-incubator/tinykv/pull/52/files#diff-ee22b14637a12c92c212438b82fb4600R46
See, if we panic in DecodeBytes
, splitting on a raw key will cause panic cause it doesn't follow the format!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit sad that we cannot always know whether a key is raw or encoded. I suppose the problem is that the client can use the raw API to write raw keys to the DB? But this makes me wonder that one could write a raw key which can be validly decoded, but isn't in fact an encoded key and then things would break. Maybe we should not offer the raw API in TinyKV at all? Does TinySQL use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a raw key which can be validly decoded, decode and then encode on the key doesn't make any difference, so it won't break the correctness for split checker.
Also, TinyKV use raw API for LAB1-3 which is to build a kvserver without transaction support
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
support basic functionality Prewrite/Commit/Rollback/Get/Scan Fill more implementations support coprocessor request. It's runnable now. Most of the code is copied from mocktikv. fix build support concurrent writer support more commands fix bugs support multiple regions and auto split. only resolve lock in region range remove tableScan and indexScan limit Avoid creating iterator too many times. support near seek optimize codec fix check prewrite conflict Use ValidForPrefix avoids conflict. And add message 'try again later' will make the error retryable in TiDB. fix delete range assert fix lock bug and support GC 1. lock key should be raw key rather than mvcc key, or resolve lock may send to another region. 2. support GC. fix scan lock is ignored. Should check pair.Err before check pair.Key. fix rollback command and update options change storage model After this change, lock and latest version is stored together with the same key, old version has a different prefix. During scan, we don't need to traverse every old version. GC is not implement yet. fix and optimize resolve lock fix prewrite keys support GC fix GC bug fix panic rename package name to fork fix build The NewChunk has been removed in TiDB. serialize write operation always use file-io for value log use a wake up channel to save CPU usage. fix reverse scan and limit scan count. rename lock/unlock. Delete instead of SetEntry if UserMeta is maixedDelFlag fix gc panic do not trace already committed error fix panic reduce prewrite conflict possibility and fix resolve lock panic extract acquire lock function and log latency refactor for efficient Resolve request. address comment user defer unlock write requests in batch to avoid exceed max transaction size. And provide an option to set the max transaction size. fix lock bug and implement efficient ScanLock 4k entries in a batch makes prewrite idempotent fix table scan and index scan bugs. makes commit request idempotent fix reverse scan use getter to avoid creating iterator in BatchGet close the DB when received signal. If we set sync write to false, the data may be corrupted if we do not close DB. fix GC bug (#25) * fix GC bug fix the lock unresolvable bug. (#26) * fix lock unresolvable remove key encoding (#27) * remove mv keys, always use raw key fix split bug after removing key encoding (#29) This bug will cause a single region as big as the whole database. Should use region.startKey instead of region.meta.StartKey in split check. return ErrLocked instead of ErrRetryable in Prewrite (#30) introduce lockstore (#31) * introduce lockstore A in-memory lock-free data-structure to store lock. - allows single writer and multiple readers without lock. - memory can be reused after all the entries in a block are deleted. * manual inline getNext add an command line flog to set value log path. (#32) kv: tiny clean up RUnlock and RLock might cost more then a memory allocation, let's keep things simple lock less scope avoid holding txnKeysMu for long time (#35) * avoid holding txnKeysMu for long time move lock out of LSM tree (#36) * move lock out of LSM tree. fix panic introduced by DBReader refactoring (#37) tiny fix Make code more clear Update mvcc.go move acquireLatches method from MVCCStore to regionCtx clear magic number (#41) move write batch out of latch scope (#42) move write batch construction out of latch scope (#43) Tiny refactor (#44) check `badger.ErrKeyNotFound` error returned by `txn.Get`. (#45) fix duplicate prewrite request caused panic. (#47) The lock store only supports insert and delete, if there is a same key, insert would fail and panic. When we meet a duplicate prewrite, we should not write the lock store. avoid `txn.Get` in `Commit` if there is no old version. (#46) unify names (#48) unify iterator style (#49) fix timestamp comparation in KvGet (#50) rename LockStore to MemStore (#51) typo (#52) fix bugs found in high conflicting concurrent transaction. (#53) 1. Fix `Cleanup` mistakenly write the batch to DB instead of lock store and use separate write batch type for Lock and DB to prevent error like this. 2. `Rollback` function should read all keys in lock store before read DB. 3. Fix delete key do actual delete instead of write tombstone. make conflict errors retryable (#54) fix unused import library (#55) fix WaitGroup (#56) After `WaitGroup.Wait` is called, another `Add` call will cause panic, so call `Add` in the Lock to prevent this. * call done after epoch not match split rollbackGC batch (#57) In TPCC benchmark, the number of rollback keys in one minutes are very large, up to hundreds of thousands. It increases latency a lot. Split the rollbackGC batch into small batches to avoid high latency. Make map with capacity (#58) preallocate slices (#59) make source more readable (#60) make it compile with latest tidb (#61) config: add max-procs (#63) support chunk execution (#64) * support chunk execution inline errors.Trace call explicitly (#65) unify Scan function to use closure and clean up code (#66) specify startKey and endKey in iterator options (#67) use Value instead of ValueCopy (#69) clear db reader (#70) optimize count(*) (#71) * optimize count(*) fix compile issue (#72) rename buf to cols (#73) fix build with latest tipb (#74) fix build for removing types.DatumRow (#75) implement closure executor (#76) * implement closure executor handle end key checking implicitly and remove the checking out of hot… (#77) support more executors in closure executor (#78) support the new row format (#79) clean up logs (#80) optimize and refactor encoding and add benchmark (#81) add old value to lock (#82) Avoid one DB.Get in Commit operation. continue handle request if necessary (#84) Rename and cleanup (#85) * use multiple badger DB * pre-split more regions and use 8 DBs * rename project name fix build error and use dep to manage dependencies (#86) * fix build error and use dep to manage dependencies add a flag to control shard-key feature (#88) update dep (#90) add topn closure executor (#89) * add topn closure executor update badger version (#93) use go mod (#94) fix memory leak in lock store (#95) move startTS and commitTS from mvccValue to UserMeta. (#96) fix crash when setting shardKey to false (#97) implement GCCompactionFilter (#98) Disable sharding by default (#99) fix delete panic (#100) upgrade mod (#101)
* add split check test Signed-off-by: Connor1996 <zbk602423539@gmail.com>
ExceedEndKey
kv/rowcodec
tokv/coprocessor/rowcodec
EncodeKey
andDecodeUserKey
tokv/util/codec
engine_util
tokv/util/engine_util