
Failing TestCache_MaxCacheSizeParallel/LruCache #28

Closed
paskal opened this issue Jun 4, 2020 · 2 comments · Fixed by #30
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

@paskal
Collaborator

paskal commented Jun 4, 2020

With probability higher than 20% locally and higher than 50% on GitHub Actions, TestCache_MaxCacheSizeParallel/LruCache fails with a race condition. It shows up as wrong values of LruCache.currentSize / concurrent changes, despite the fact that all changes are made using atomic.AddInt64 and atomic.LoadInt64.


Example in GitHub Actions

It would be great if somebody could fix the code so that the test no longer fails.

@paskal paskal added bug Something isn't working help wanted Extra attention is needed good first issue Good for newcomers labels Jun 4, 2020
@chychkan

chychkan commented Nov 22, 2020

Hey team!
The test failures are caused by the lack of synchronization when updating the cache. That is not necessarily a bad thing: without that synchronization the cache is more performant in a multithreaded environment, and it is still eventually consistent.

I mean the LruCache.Get() function, and in particular this part:

c.backend.Add(key, data)

if s, ok := data.(Sizer); ok {
    atomic.AddInt64(&c.currentSize, int64(s.Size()))
    if c.maxCacheSize > 0 && atomic.LoadInt64(&c.currentSize) > c.maxCacheSize {
        for atomic.LoadInt64(&c.currentSize) > c.maxCacheSize {
            c.backend.RemoveOldest()
        }
    }
}

If the requirement is that no client goroutine should ever observe c.currentSize greater than c.maxCacheSize, then this entire code block would have to execute as a single atomic operation, which would be much slower for the clients. Since it is not atomic, fluctuations of currentSize above maxCacheSize are expected, especially with a high number of goroutines doing updates in parallel.

A possible scenario:

  1. Goroutine #N adds a value to the cache; the cache is still below maxCacheSize, and execution switches to goroutine #N+1 just before goroutine #N reaches the size := c.size() statement in the test (line #228).
  2. Goroutine #N+1 and multiple other goroutines add values to the cache and update currentSize, but execution switches away from each of them before it reaches c.backend.RemoveOldest().
  3. Execution switches back to goroutine #N, which observes c.currentSize way above c.maxCacheSize, and the test fails at line #229.

Eventually, once all the goroutines complete, the cache converges to the correct state, but during execution the size can fluctuate significantly.

I ran the tests locally, and in all cases the test fails at line #229, which is inside the goroutine. Since the goroutines can observe the fluctuations, you probably should not assume any upper limit there (the size < 200 check).

If you are OK with fluctuations of the actual cache size, then I would suggest removing the require.True(t, size < 200 ...) assertion inside the goroutine and keeping only the one after the loop (the assert.True(t, c.size() < 123 && c.size() >= 0)).

What do you think?

@paskal
Collaborator Author

paskal commented Nov 29, 2020

@chychkan, thanks a lot for the investigation!

@umputun, what do you think about removing the problematic test line and acknowledging that the size limit for the cache is a loose guarantee? I don't see a reason to sacrifice speed to enforce the limit and would vote for altering the test, as proposed by Oleksandr.

paskal added a commit that referenced this issue Dec 6, 2020
Fixes flaps in TestCache_MaxCacheSizeParallel, resolves #28