Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add split check test #52

Merged
merged 7 commits into from Feb 13, 2020
Merged

Add split check test #52

merged 7 commits into from Feb 13, 2020

Conversation

Connor1996
Copy link
Collaborator

  • simplify split check logic
  • add split check test
  • fix ExceedEndKey
  • move kv/rowcodec to kv/coprocessor/rowcodec
  • extract EncodeKey and DecodeUserKey to kv/util/codec
  • move engine_util to kv/util/engine_util

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996 Connor1996 requested a review from nrc February 9, 2020 08:51
@Connor1996
Copy link
Collaborator Author

PTAL @nrc

Copy link
Contributor

@nrc nrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice reorganisation! There are a few comments inline, but they are mostly minor and most of the PR looks good.

I don't like to have util packages/modules - I think they are a code smell and that nearly always these packages can be better organised. In this case it seems that codec and engine_utils would be fine in kv. I think engine_utils could be somewhere better, but not sure where.

kv/tikv/raftstore/runner/runner_test.go Outdated Show resolved Hide resolved
kv/util/codec/codec.go Outdated Show resolved Hide resolved
kv/util/codec/codec.go Outdated Show resolved Hide resolved

encGroupSize = 8
encMarker = byte(0xFF)
encPad = byte(0x0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using these consts makes the code harder to read, especially since the comments often refer to the fixed numbers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use a const number with accurate meaning instead of a magic number

kv/util/codec/codec.go Outdated Show resolved Hide resolved
@nrc
Copy link
Contributor

nrc commented Feb 11, 2020

Thinking about it, I'm a bit surprised that the functions for encoding/decoding keys are used outside of storage/kvstore. I guess we use EncodeBytes in other places? Maybe we can leave the functions for operating on keys and timestamps in transaction.go, but refactor them to use codec/EncodeBytes? My reasoning is that how keys are encoded in the local storage should be an implementation detail of the storage layer.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996
Copy link
Collaborator Author

Connor1996 commented Feb 11, 2020

Thinking about it, I'm a bit surprised that the functions for encoding/decoding keys are used outside of storage/kvstore. I guess we use EncodeBytes in other places?

No, EncodeBytes is only used by transaction package. But region split should make a clean cut for mvcc key that the different versions of same user key should locate in a same region. That's why split check use DecodeBytes to check if the key is encoded.

@Connor1996
Copy link
Collaborator Author

Maybe we can leave the functions for operating on keys and timestamps in transaction.go, but refactor them to use codec/EncodeBytes? My reasoning is that how keys are encoded in the local storage should be an implementation detail of the storage layer.

Okay. But the split check test which would be deleted in the final (because unrelated with the lessons) needs EncodeKey. I will duplicate EncodeKey in the test now.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996
Copy link
Collaborator Author

I don't like to have util packages/modules - I think they are a code smell and that nearly always these packages can be better organised. In this case it seems that codec and engine_utils would be fine in kv. I think engine_utils could be somewhere better, but not sure where.

let's keep it now. The next step is combining the kv/config and kv/tikv/config, and the level of directory tikv can be removed. We'll find a better place then.

@Connor1996
Copy link
Collaborator Author

PTAL again @nrc

@nrc
Copy link
Contributor

nrc commented Feb 13, 2020

No, EncodeBytes is only used by transaction package. But region split should make a clean cut for mvcc key that the different versions of same user key should locate in a same region. That's why split check use DecodeBytes to check if the key is encoded.

OK, thanks for the explanation. Longer term I think we should use some kind of reader to check this rather than use DecodeBytes, but that doesn't need to block this PR

Copy link
Contributor

@nrc nrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, looks good. I have two more minor suggestions, but you can land the PR with or without them.

binary.BigEndian.PutUint64(newKey[newLen-8:], ^ts)
encodedKey := codec.EncodeBytes(key)
newKey := append(encodedKey, make([]byte, 8)...)
binary.BigEndian.PutUint64(newKey[len(newKey)-8:], ^ts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use len(encodedKey) instead of len(newKey)-8, I think it's clearer.


// DecodeBytes decodes bytes which is encoded by EncodeBytes before,
// returns the leftover bytes and decoded value if no error.
func DecodeBytes(b []byte) ([]byte, []byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we panic here instead of returning the error? I think we do not need proper error handling for TinyKV.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already panic in DecodeKey, but we need to know whether the key is encoded or raw by checking the error != nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean it would be nicer to panic in DecodeBytes than DecodeKey. We can return nil, nil if there is an error

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pingcap-incubator/tinykv/pull/52/files#diff-ee22b14637a12c92c212438b82fb4600R46
See, if we panic in DecodeBytes, splitting on a raw key will cause panic cause it doesn't follow the format!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit sad that we cannot always know whether a key is raw or encoded. I suppose the problem is that the client can use the raw API to write raw keys to the DB? But this makes me wonder that one could write a raw key which can be validly decoded, but isn't in fact an encoded key and then things would break. Maybe we should not offer the raw API in TinyKV at all? Does TinySQL use it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a raw key which can be validly decoded, decode and then encode on the key doesn't make any difference, so it won't break the correctness for split checker.

Also, TinyKV use raw API for LAB1-3 which is to build a kvserver without transaction support

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
@Connor1996 Connor1996 merged commit 9dc002c into master Feb 13, 2020
Connor1996 pushed a commit that referenced this pull request Apr 30, 2020
support basic functionality

Prewrite/Commit/Rollback/Get/Scan

Fill more implementations

support coprocessor request.

It's runnable now.
Most of the code is copied from mocktikv.

fix build

support concurrent writer

support more commands

fix bugs

support multiple regions and auto split.

only resolve lock in region range

remove tableScan and indexScan limit

Avoid creating iterator too many times.

support near seek

optimize codec

fix check prewrite conflict

Use ValidForPrefix avoids conflict.
And add message 'try again later' will make the error retryable in TiDB.

fix delete range assert

fix lock bug and support GC

1. lock key should be raw key rather than mvcc key, or resolve lock may send to another region.

2. support GC.

fix scan lock is ignored.

Should check pair.Err before check pair.Key.

fix rollback command and update options

change storage model

After this change, lock and latest version is stored together with the same key, old version has a different prefix.
During scan, we don't need to traverse every old version.

GC is not implement yet.

fix and optimize resolve lock

fix prewrite keys

support GC

fix GC bug

fix panic

rename package name to fork

fix build

The NewChunk has been removed in TiDB.

serialize write operation

always use file-io for value log

use a wake up channel to save CPU usage.

fix reverse scan and limit scan count.

rename lock/unlock.

Delete instead of SetEntry if UserMeta is maixedDelFlag

fix gc panic

do not trace already committed error

fix panic

reduce prewrite conflict possibility and fix resolve lock panic

extract acquire lock function and log latency

refactor for efficient Resolve request.

address comment

user defer unlock

write requests in batch to avoid exceed max transaction size.

And provide an option to set the max transaction size.

fix lock bug and implement efficient ScanLock

4k entries in a batch

makes prewrite idempotent

fix table scan and index scan bugs.

makes commit request idempotent

fix reverse scan

use getter to avoid creating iterator in BatchGet

close the DB when received signal.

If we set sync write to false, the data may be corrupted if we do not close DB.

fix GC bug (#25)

* fix GC bug

fix the lock unresolvable bug. (#26)

* fix lock unresolvable

remove key encoding (#27)

* remove mv keys, always use raw key

fix split bug after removing key encoding (#29)

This bug will cause a single region as big as the whole database.
Should use region.startKey instead of region.meta.StartKey in split check.

return ErrLocked instead of ErrRetryable in Prewrite (#30)

introduce lockstore (#31)

* introduce lockstore

A in-memory lock-free data-structure to store lock.
- allows single writer and multiple readers without lock.
- memory can be reused after all the entries in a block are deleted.

* manual inline getNext

add an command line flog to set value log path. (#32)

kv: tiny clean up

RUnlock and RLock might cost more then a memory allocation, let's keep
things simple

lock less scope

avoid holding txnKeysMu for long time (#35)

* avoid holding txnKeysMu for long time

move lock out of LSM tree (#36)

* move lock out of LSM tree.

fix panic introduced by DBReader refactoring (#37)

tiny fix

Make code more clear

Update mvcc.go

move acquireLatches method from MVCCStore to regionCtx

clear magic number (#41)

move write batch out of latch scope (#42)

move write batch construction out of latch scope (#43)

Tiny refactor (#44)

check `badger.ErrKeyNotFound` error returned by `txn.Get`. (#45)

fix duplicate prewrite request caused panic. (#47)

The lock store only supports insert and delete, if there is a same key, insert would fail and panic.
When we meet a duplicate prewrite, we should not write the lock store.

avoid `txn.Get` in `Commit` if there is no old version. (#46)

unify names (#48)

unify iterator style (#49)

fix timestamp comparation in KvGet (#50)

rename LockStore to MemStore (#51)

typo (#52)

fix bugs found in high conflicting concurrent transaction. (#53)

1. Fix `Cleanup` mistakenly write the batch to DB instead of lock store and use separate write batch type for Lock and DB to prevent error like this.

2. `Rollback` function should read all keys in lock store before read DB.

3. Fix delete key do actual delete instead of write tombstone.

make conflict errors retryable (#54)

fix unused import library (#55)

fix WaitGroup (#56)

After `WaitGroup.Wait` is called, another `Add` call will cause panic, so call `Add` in the Lock to prevent this.

* call done after epoch not match

split rollbackGC batch (#57)

In TPCC benchmark, the number of rollback keys in one minutes are very large, up to hundreds of thousands.
It increases latency a lot.
Split the rollbackGC batch into small batches to avoid high latency.

Make map with capacity (#58)

preallocate slices (#59)

make source more readable (#60)

make it compile with latest tidb (#61)

config: add max-procs (#63)

support chunk execution (#64)

* support chunk execution

inline errors.Trace call explicitly (#65)

unify Scan function to use closure and clean up code (#66)

specify startKey and endKey in iterator options (#67)

use Value instead of ValueCopy (#69)

clear db reader (#70)

optimize count(*)  (#71)

* optimize count(*)

fix compile issue (#72)

rename buf to cols (#73)

fix build with latest tipb (#74)

fix build for removing types.DatumRow (#75)

implement closure executor (#76)

* implement closure executor

handle end key checking implicitly and remove the checking out of hot… (#77)

support more executors in closure executor (#78)

support the new row format (#79)

clean up logs (#80)

optimize and refactor encoding and add benchmark (#81)

add old value to lock (#82)

Avoid one DB.Get in Commit operation.

continue handle request if necessary (#84)

Rename and cleanup (#85)

* use multiple badger DB

* pre-split more regions and use 8 DBs

* rename project name

fix build error and use dep to manage dependencies (#86)

* fix build error and use dep to manage dependencies

add a flag to control shard-key feature (#88)

update dep (#90)

add topn closure executor (#89)

* add topn closure executor

update badger version (#93)

use go mod (#94)

fix memory leak in lock store (#95)

move startTS and commitTS from mvccValue to UserMeta. (#96)

fix crash when setting shardKey to false (#97)

implement GCCompactionFilter (#98)

Disable sharding by default (#99)

fix delete panic (#100)

upgrade mod (#101)
Connor1996 added a commit that referenced this pull request Apr 30, 2020
* add split check test

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants