store-tikv: drop invalid cached region #4506

Merged
atmzhou merged 15 commits into master from ningnanzhou/dropinvalidregioncache on Sep 25, 2017

4 participants
@atmzhou
Contributor

atmzhou commented Sep 12, 2017

Some cached regions may be out of date because they have not been used for a long time.
To avoid using these regions, we assign each cached region a TTL, which records when the region may become out of date.
When an out-of-date region is identified, we remove it from the region cache.

Fix #2511
Fix #4498
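
The TTL scheme in the description above could be sketched roughly as follows. This is a minimal, single-goroutine illustration and not the code merged in this PR: the names `cachedRegion`, `regionTTL`, `isValid`, and `lookup`, as well as the 600-second TTL, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// regionTTL is a hypothetical TTL in seconds; the value used by the PR is not shown here.
const regionTTL int64 = 600

// cachedRegion pairs a region ID with the Unix time of its last access.
type cachedRegion struct {
	regionID   uint64
	lastAccess int64
}

// isValid reports whether the entry was accessed within the TTL window.
func (c *cachedRegion) isValid(now int64) bool {
	return now-c.lastAccess < regionTTL
}

// lookup returns the cached entry for id. An expired entry is removed
// from the cache and nil is returned, so the caller reloads the region.
func lookup(cache map[uint64]*cachedRegion, id uint64, now int64) *cachedRegion {
	c, ok := cache[id]
	if !ok {
		return nil
	}
	if !c.isValid(now) {
		delete(cache, id)
		return nil
	}
	c.lastAccess = now // refresh the TTL on every successful access
	return c
}

func main() {
	now := time.Now().Unix()
	cache := map[uint64]*cachedRegion{
		1: {regionID: 1, lastAccess: now},
		2: {regionID: 2, lastAccess: now - 2*regionTTL},
	}
	fmt.Println(lookup(cache, 1, now) != nil) // fresh entry is returned
	fmt.Println(lookup(cache, 2, now) != nil) // stale entry is dropped
	fmt.Println(len(cache))                   // only the fresh entry remains
}
```

This sketch ignores concurrency entirely; the review discussion below is about making the access-time update safe under concurrent readers.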

@atmzhou atmzhou requested review from AndreMouche, tiancaiamao and disksing Sep 12, 2017

Member

coocood commented Sep 12, 2017

@atmzhou
I think adding an accessTime uint64 field to Region is simpler.
Multiple threads may update the access time concurrently,
so we can update it with atomic operations to avoid a race.
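
A minimal sketch of this suggestion, assuming a stripped-down `Region` with only the `accessTime` field (the real type has more fields, and the method names `touch` and `lastAccess` are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Region is a stand-in for the real type; only the field under discussion is shown.
type Region struct {
	accessTime uint64 // Unix seconds; read and written only through sync/atomic
}

// touch records an access without taking any lock.
func (r *Region) touch(now uint64) { atomic.StoreUint64(&r.accessTime, now) }

// lastAccess returns the most recently stored access time.
func (r *Region) lastAccess() uint64 { return atomic.LoadUint64(&r.accessTime) }

func main() {
	r := &Region{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.touch(42) // concurrent writers, no data race
		}()
	}
	wg.Wait()
	fmt.Println(r.lastAccess()) // prints 42
}
```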

Member

coocood commented Sep 12, 2017

@atmzhou
Please update the title of the PR.

@atmzhou atmzhou changed the title from Ningnanzhou/dropinvalidregioncache to Ningnanzhou/DropInvalidCachedRegion Sep 12, 2017

@atmzhou atmzhou changed the title from Ningnanzhou/DropInvalidCachedRegion to store-tikv/DropInvalidCachedRegion Sep 12, 2017

@atmzhou atmzhou changed the title from store-tikv/DropInvalidCachedRegion to store-tikv: drop invalid cached region Sep 12, 2017

@atmzhou atmzhou closed this Sep 12, 2017

@atmzhou atmzhou reopened this Sep 12, 2017

Contributor

atmzhou commented Sep 12, 2017

@coocood
Do you think it is necessary to replace the mutex with atomic operations?

Contributor

atmzhou commented Sep 12, 2017

@disksing @AndreMouche @tiancaiamao
Could you please review my code at your convenience?
Thanks~

Member

coocood commented Sep 12, 2017

@atmzhou
We can't replace the mutex with atomic operations.
RegionCache uses an RWLock, which allows concurrent reads.
But reading updates the access time, which is a write operation, so we need an atomic operation to avoid a race.

Contributor

atmzhou commented Sep 12, 2017

@coocood
So, what is your conclusion? Should we use the RWLock or atomic operations?
By the way, I think the data race is rare, and a small data race does not matter here.

Member

coocood commented Sep 12, 2017

@atmzhou
We use both: an RWLock for the RegionCache and atomic operations for the Region.
Data races are strictly not allowed in TiDB. The CI fails if any data race is detected.

Contributor

atmzhou commented Sep 13, 2017

@coocood
OK, I will eliminate any potential data race here.
By the way, this data race cannot actually be detected because it does not affect correctness. It just falsely removes a region from the cache or accesses an out-of-date region, which is handled as well.

Member

coocood commented Sep 13, 2017

@atmzhou
It will be detected when two goroutines access the same variable concurrently and at least one of the accesses is a write.

See https://golang.org/doc/articles/race_detector.html

atmzhou added some commits Sep 18, 2017

store/tikv/region_cache.go
// CachedRegion encapsulates {Region, TTL}
type CachedRegion struct {
region *Region

@tiancaiamao

tiancaiamao Sep 20, 2017

Contributor

How about region Region ?

// CachedRegion encapsulates {Region, TTL}
type CachedRegion struct {
region *Region
lastAccess int64

@tiancaiamao

tiancaiamao Sep 20, 2017

Contributor

lastAccess time.Time ?

@atmzhou

atmzhou Sep 20, 2017

Contributor

We use int64 to enable atomic operations.

Member

disksing commented Sep 21, 2017

LGTM.

@@ -138,26 +168,24 @@ func (c *RegionCache) LocateKey(bo *Backoffer, key []byte) (*KeyLocation, error)
// LocateRegionByID searches for the region with ID
func (c *RegionCache) LocateRegionByID(bo *Backoffer, regionID uint64) (*KeyLocation, error) {
c.mu.RLock()
if r := c.getRegionByIDFromCache(regionID); r != nil {
r := c.getRegionByIDFromCache(regionID)

@coocood

coocood Sep 21, 2017

Member

@disksing Should we use GetCachedRegion here?
This method is only used by HTTP status server.

@disksing

disksing Sep 22, 2017

Member

They have different parameters. BTW, I think we need a lock here. @atmzhou

@atmzhou

atmzhou Sep 22, 2017

Contributor

@disksing I remember you said this function is only used for tests, so I did not add a lock in getRegionByIDFromCache.
Do you think we should add a lock? If so, I will add one.

@disksing

disksing Sep 22, 2017

Member

Yes. We don't want the debug API to have data races either, especially since it may affect the online service.

Member

disksing commented Sep 25, 2017

PTAL @coocood

@coocood

LGTM

@atmzhou atmzhou merged commit b9c3bc4 into master Sep 25, 2017

5 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: First build on master at 72.687%
jenkins-ci-tidb/build: Jenkins job succeeded.
license/cla: Contributor License Agreement is signed.

@atmzhou atmzhou deleted the ningnanzhou/dropinvalidregioncache branch Sep 25, 2017