
🔥 Feature: Add max size to cache #1892

Merged
merged 5 commits into from May 10, 2022

Conversation

dranikpg
Contributor

@dranikpg dranikpg commented May 3, 2022

This PR is a proposed solution to #1136 and allows limiting the cache size.

As stated in the issue, the cache should expire old entries to make room for new entries. To find old entries, it has to keep track of their expirations in a separate data structure.

Finding a solution

In general, the data structure has to support three main operations:

  • adding entries
  • removing arbitrary entries (in case of self-expiration)
  • removing the oldest entry (the one with the lowest expiration time)

At first, it looks like a binary search tree would be the perfect fit. However, Go's standard library doesn't offer much for implementing one, and I didn't want to add new dependencies or piles of code. An easier and equally performant solution exists: let's use a binary heap and track node movements for arbitrary removals. This is what indexedHeap is for.
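The idea can be sketched roughly as follows (a minimal illustration with hypothetical field names; the PR's actual indexedHeap differs in detail): each entry records its own position in the heap slice, so a self-expiring entry can be removed from the middle in O(log n), while the root always holds the entry with the lowest expiration.

```go
package main

import "fmt"

// entry is an illustrative cache index record, not the PR's actual type.
type entry struct {
	key string
	exp uint64 // expiration timestamp
	idx int    // current position inside the heap slice
}

// indexedHeap is a min-heap ordered by exp that tracks node positions,
// so any entry can be removed by index in O(log n).
type indexedHeap struct {
	entries []*entry
}

func (h *indexedHeap) swap(i, j int) {
	h.entries[i], h.entries[j] = h.entries[j], h.entries[i]
	h.entries[i].idx = i
	h.entries[j].idx = j
}

func (h *indexedHeap) up(i int) {
	for i > 0 {
		p := (i - 1) / 2
		if h.entries[p].exp <= h.entries[i].exp {
			break
		}
		h.swap(i, p)
		i = p
	}
}

func (h *indexedHeap) down(i int) {
	n := len(h.entries)
	for {
		l, small := 2*i+1, i
		if l < n && h.entries[l].exp < h.entries[small].exp {
			small = l
		}
		if r := l + 1; r < n && h.entries[r].exp < h.entries[small].exp {
			small = r
		}
		if small == i {
			return
		}
		h.swap(i, small)
		i = small
	}
}

// push adds an entry and returns it so callers can later remove it by idx.
func (h *indexedHeap) push(key string, exp uint64) *entry {
	e := &entry{key: key, exp: exp, idx: len(h.entries)}
	h.entries = append(h.entries, e)
	h.up(e.idx)
	return e
}

// removeAt deletes the entry at position i (i == 0 pops the oldest).
func (h *indexedHeap) removeAt(i int) *entry {
	e := h.entries[i]
	last := len(h.entries) - 1
	h.swap(i, last)
	h.entries = h.entries[:last]
	if i < len(h.entries) {
		h.down(i)
		h.up(i)
	}
	return e
}

func main() {
	var h indexedHeap
	a := h.push("a", 30)
	h.push("b", 10)
	h.push("c", 20)
	h.removeAt(a.idx)              // arbitrary removal (self-expiration)
	fmt.Println(h.removeAt(0).key) // prints "b", the oldest remaining entry
}
```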

Benchmarks

Max size does not affect read performance in any way

Benchmark_Cache_MaxSize is a benchmark for three cases:

  • No max size specified (= no performance penalty)
  • A very large max size (= only insertions)
  • A small max size (= constant removal & insertion)

My results are as follows:

Benchmark_Cache_MaxSize/Disabled-16              1626 ns/op             349 B/op          7 allocs/op
Benchmark_Cache_MaxSize/Disabled-16              1588 ns/op             392 B/op          7 allocs/op
Benchmark_Cache_MaxSize/Unlim-16                 1812 ns/op             665 B/op          8 allocs/op
Benchmark_Cache_MaxSize/Unlim-16                 1678 ns/op             629 B/op          8 allocs/op
Benchmark_Cache_MaxSize/LowBounded-16            1178 ns/op             256 B/op          8 allocs/op
Benchmark_Cache_MaxSize/LowBounded-16            1190 ns/op             256 B/op          8 allocs/op

We're up one allocation from the first case to the second and about 10% slower per insertion. The third case is the fastest because our cache always stays small.

This is a benchmark where the handler takes zero time to execute. 10% might sound like much, but the difference is less than one microsecond in my case (that's actually suspiciously small 🤔). I doubt any handler call that has to be cached would care about such a time period.

Measuring real memory consumption

Instead of counting entries, we could count their body size to measure real memory consumption.

Other possible solutions

  • We could ignore expiration altogether and remove just the oldest entry by time of insertion. This would be the easiest and most performant solution. I'm not sure whether this would be correct behaviour (what if we pre-cache a bunch of big pages on startup - should we drop them all?)
  • Implement some garbage collection goroutine. This sounds more complex and is full of bad worst cases

@welcome

welcome bot commented May 3, 2022

Thanks for opening this pull request! 🎉 Please check out our contributing guidelines. If you need help or want to chat with us, join us on Discord https://gofiber.io/discord

@ReneWerner87
Member

ReneWerner87 commented May 4, 2022

Thanks for this feature

will review it this weekend and if everything is ok will merge it

Review thread on middleware/cache/config.go (outdated, resolved)
@dranikpg
Contributor Author

dranikpg commented May 4, 2022

I've just tracked down that one allocation 🥳
It comes from promoting a heapEntry to an interface{} when pushing onto the heap.

Pushing manually (append + heap.Fix) avoids it and performance is now almost on par. Looks kind of tricky though...

@ReneWerner87
Member

ReneWerner87 commented May 4, 2022

I just remembered that we should use mutex, sync map or atomic due to the concurrency processes to prevent race conditions in the access

@ReneWerner87
Member

ReneWerner87 commented May 4, 2022

  • Pls update the documentation for the middleware in our docs repository
  • update the config documentation in our README.md for the middleware

@dranikpg
Contributor Author

dranikpg commented May 4, 2022

> I just remembered that we should use mutex, sync map or atomic due to the concurrency processes to prevent race conditions in the access

Sync is already in place for storage writes, heap modifications happen around it. There shouldn't be any issues

@dranikpg
Contributor Author

dranikpg commented May 4, 2022

> • Pls update the documentation for the middleware in our docs repository
> • update the config documentation in our README.md for the middleware

Before touching docs: are we sure we'll count entries and not bytes?

The problem with bytes is that this is only an estimate. We would somehow have to account for internal values. For example, if the user stores entries in an external storage or the values are just very small, then the internal structures blow up in relative size. That value would be an "upper bound on everything" no matter how and where it is stored.

@ReneWerner87
Member

ReneWerner87 commented May 4, 2022

> I just remembered that we should use mutex, sync map or atomic due to the concurrency processes to prevent race conditions in the access
>
> Sync is already in place for storage writes, heap modifications happen around it. There shouldn't be any issues

Perfect

@ReneWerner87
Member

ReneWerner87 commented May 4, 2022

> • Pls update the documentation for the middleware in our docs repository
> • update the config documentation in our README.md for the middleware
>
> Before touching docs: are we sure we'll count entries and not bytes?

Will look again more closely tomorrow and answer

@ReneWerner87
Member

ReneWerner87 commented May 5, 2022

@dranikpg
Ok then we should name the configuration flag differently

e.g. maxEntries

@efectn @hi019 what do you think about the count of bytes vs count of entries

@efectn
Member

efectn commented May 5, 2022

> @dranikpg Ok then we should name the configuration flag differently
>
> e.g. maxEntries
>
> @efectn @hi019 what do you think about the count of bytes vs count of entries

I think counting bytes is the better option

```go
type heapEntry struct {
	key   string
	exp   uint64
	bytes uint
}
```
Contributor Author

@dranikpg dranikpg May 6, 2022


Entry size is stored in the heap to update the total size without reading entries on delete

efectn approved these changes May 9, 2022
@ReneWerner87
Member

ReneWerner87 commented May 10, 2022

@dranikpg thx
#1892 (comment)
don't forget the doc repository

@ReneWerner87 ReneWerner87 merged commit aa22928 into gofiber:master May 10, 2022
16 checks passed
@welcome

welcome bot commented May 10, 2022

Congrats on merging your first pull request! 🎉 We here at Fiber are proud of you! If you need help or want to chat with us, join us on Discord https://gofiber.io/discord

@ReneWerner87 ReneWerner87 linked an issue May 10, 2022 that may be closed by this pull request
dranikpg added a commit to dranikpg/docs that referenced this issue May 10, 2022
Add new MaxBytes param to cache docs from gofiber/fiber#1892
trim21 pushed a commit to trim21/fiber that referenced this issue Aug 15, 2022
* Cache middleware size limit

* Replace MaxInt with MaxInt32. Add comments to benchmark

* Avoid allocation in heap push. Small fixes

* Count body sizes instead of entries

* Update cache/readme