
db: Use mutex around cache #12580

Closed
wants to merge 3 commits into from

Conversation

ryanslade
Contributor

This ensures that when the cache has expired, only one goroutine will refresh it while others block until it has been refreshed.

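For reference, a minimal Go sketch of the pattern described above: a mutex serializes refreshes, so when the cache expires only one goroutine runs the expensive query while the others block on the lock and then read the freshly stored value. The type and field names here are illustrative, not the actual diff.

```go
package cache

import (
	"sync"
	"time"
)

// Repo is a stand-in for the cached row type (only ID and name are fetched).
type Repo struct {
	ID   int32
	Name string
}

// cachedRepos guards its contents with a mutex so only one goroutine
// refreshes the cache after it expires.
type cachedRepos struct {
	mu        sync.Mutex
	repos     []*Repo
	fetchedAt time.Time
	ttl       time.Duration
	fetch     func() ([]*Repo, error) // e.g. the DB query that lists default repos
}

func (c *cachedRepos) List() ([]*Repo, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Another goroutine may have refreshed the cache while we were blocked.
	if c.repos != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.repos, nil
	}

	repos, err := c.fetch()
	if err != nil {
		return nil, err
	}
	c.repos = repos
	c.fetchedAt = time.Now()
	return repos, nil
}
```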
@ryanslade
Contributor Author

@keegancsmith hah, I think using a mutex may actually be better 😄

@ryanslade ryanslade marked this pull request as ready for review July 31, 2020 08:21
@ryanslade ryanslade requested a review from slimsag as a code owner July 31, 2020 08:21
@ryanslade ryanslade requested a review from a team July 31, 2020 08:21
@keegancsmith
Member

This ensures that when the cache has expired, only one goroutine will refresh it while others block until it has been refreshed.

Yeah, I was aware of this drawback and mentioned it on our call. But a few points:

  1. You now have to worry about several contexts being cancelled while starved waiting on the lock. Not too unlikely, given that everything has to wait.
  2. You need to wait on the mutex in a ctx-aware way (see the sketch after this list).
  3. Once a minute, every concurrent request doing the listing is not that bad. It's still a lot better than before, and it's simple.
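On point 2: a plain sync.Mutex can't be interrupted, so a context-aware lock is usually built from a 1-buffered channel instead. A minimal sketch, with a hypothetical type name not taken from the PR:

```go
package cache

import "context"

// ctxMutex is a lock you can wait on in a context-aware way: acquiring it is
// a select between the semaphore channel and ctx.Done().
type ctxMutex chan struct{}

func newCtxMutex() ctxMutex { return make(chan struct{}, 1) }

// Lock blocks until the lock is acquired or ctx is cancelled.
func (m ctxMutex) Lock(ctx context.Context) error {
	select {
	case m <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Unlock releases the lock. It must only be called after a successful Lock.
func (m ctxMutex) Unlock() { <-m }
```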

@ryanslade
Contributor Author

Ok, fair enough.

Another option I considered was keeping the atomic.Value, returning stale cache data, and kicking off a background process to update the cache, but that's even more complicated 😄
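A rough sketch of that stale-while-revalidate alternative, assuming the cache keeps the last-loaded slice in an atomic.Value and a CAS flag prevents duplicate background refreshes (all names here are illustrative, not from the actual code):

```go
package cache

import (
	"sync/atomic"
	"time"
)

type Repo struct { // as in the earlier sketch
	ID   int32
	Name string
}

type cacheEntry struct {
	repos     []*Repo
	fetchedAt time.Time
}

// staleCache returns whatever is currently stored (possibly stale) and lets
// the first reader that notices expiry trigger a single background refresh.
type staleCache struct {
	val        atomic.Value // holds cacheEntry
	refreshing int32        // 0 or 1, guards against duplicate refreshes
	ttl        time.Duration
	fetch      func() ([]*Repo, error)
}

func (c *staleCache) List() []*Repo {
	entry, _ := c.val.Load().(cacheEntry)
	if time.Since(entry.fetchedAt) >= c.ttl && atomic.CompareAndSwapInt32(&c.refreshing, 0, 1) {
		go func() {
			defer atomic.StoreInt32(&c.refreshing, 0)
			if repos, err := c.fetch(); err == nil {
				c.val.Store(cacheEntry{repos: repos, fetchedAt: time.Now()})
			}
		}()
	}
	return entry.repos // possibly stale, nil until the first refresh completes
}
```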

@ryanslade ryanslade closed this Jul 31, 2020
@ryanslade ryanslade deleted the cache-mutex branch July 31, 2020 12:34
@tsenart
Contributor

tsenart commented Jul 31, 2020

Related to the work you both did here: I played around with the query we are using to load this data. It's a pretty optimal plan. It takes 300ms because it's moving 3.75GB of memory around (all rows were already cached in memory, no disk I/O), so we're actually limited by memory bus bandwidth (12.5 GB / s, extrapolated).

Two remarks:

  1. This is another datapoint that we should be using IDs everywhere rather than names. Loading only the 100k IDs from the default_repos table takes only 15ms (~5MB of data).

  2. Have we considered the additional memory requirements for the frontend? Caching this in memory takes a significant chunk of the available memory.

@keegancsmith
Member

Related to the work you both did here: I played around with the query we are using to load this data. It's a pretty optimal plan. It takes 300ms because it's moving 3.75GB of memory around (all rows were already cached in memory, no disk I/O), so we're actually limited by memory bus bandwidth (12.5 GB / s, extrapolated).

Thanks for looking into it. Yeah, I suspected it had to do with IO. Is it really 3.75GB of memory? The default_repos table has just over 100k repos; that is ~37KB per repo, which is way too high. I'd expect at most 1KB per repo (and much less in practice). Note: we are only fetching the ID and name.

  1. This is another datapoint that we should be using IDs everywhere rather than names. Loading only the 100k IDs from the default_repos table takes only 15ms (~5MB of data).

Agreed.

  2. Have we considered the additional memory requirements for the frontend? Caching this in memory takes a significant chunk of the available memory.

I think the estimate on the size above is likely an order of magnitude off, so it should be fine. The current implementation is a clear improvement in the following ways:

  1. We were already fetching this anyway, so it introduces no real additional load.
  2. We only copy the slice, not the values, so each request allocates roughly 100k * sizeof(pointer) rather than 100k * sizeof(Repo) (see the sketch after this list).
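Continuing the earlier sketch, point 2 amounts to handing each request a copy of the pointer slice rather than a deep copy of the values, something like this (illustrative, not the actual code):

```go
// listCopy returns a per-request copy of the cached slice so callers can't
// mutate the shared backing array. Only the pointers are copied (~8 bytes
// each), not the Repo structs they point to.
func (c *cachedRepos) listCopy() []*Repo {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]*Repo, len(c.repos))
	copy(out, c.repos) // 100k pointer copies, not 100k struct copies
	return out
}
```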

I was thinking this morning about how we can take this optimization and apply it for customers. The big reason this is so much faster is that it avoids network IO, so implementing an in-memory version of our List function may be very effective.

@tsenart
Contributor

tsenart commented Jul 31, 2020

Is it really 3.75GB of memory?

Good call. I think this might be the data size for the full rows, which Postgres needs to move around (since it's not a column store) before filtering down to just the name and id that are returned to the frontend.
