
db: Use mutex around cache #12580

Closed
wants to merge 3 commits into from

Conversation

ryanslade
Contributor

This ensures that when the cache has expired, only one goroutine will refresh it while others block until it has been refreshed.

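For reference, a minimal Go sketch of the pattern described above: a mutex serializes refreshes, so when the cache expires only one goroutine runs the expensive query while the others block on the lock and then read the freshly stored value. The type and field names here are illustrative, not the actual diff.

```go
package cache

import (
	"sync"
	"time"
)

// Repo is a stand-in for the cached row type (only ID and name are fetched).
type Repo struct {
	ID   int32
	Name string
}

// cachedRepos guards its contents with a mutex so only one goroutine
// refreshes the cache after it expires.
type cachedRepos struct {
	mu        sync.Mutex
	repos     []*Repo
	fetchedAt time.Time
	ttl       time.Duration
	fetch     func() ([]*Repo, error) // e.g. the DB query that lists default repos
}

func (c *cachedRepos) List() ([]*Repo, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Another goroutine may have refreshed the cache while we were blocked.
	if c.repos != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.repos, nil
	}

	repos, err := c.fetch()
	if err != nil {
		return nil, err
	}
	c.repos = repos
	c.fetchedAt = time.Now()
	return repos, nil
}
```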
@ryanslade
Contributor Author

@keegancsmith hah, I think using a mutex may actually be better 😄

@ryanslade ryanslade marked this pull request as ready for review July 31, 2020 08:21
@ryanslade ryanslade requested a review from slimsag as a code owner July 31, 2020 08:21
@ryanslade ryanslade requested a review from a team July 31, 2020 08:21
@keegancsmith
Member

This ensures that when the cache has expired, only one goroutine will refresh it while others block until it has been refreshed.

Yeah, I was aware of this drawback and mentioned it on our call. But a few points:

  1. You now have to worry about several contexts being cancelled while starved waiting on the lock. Not too unlikely, given that everything has to wait.
  2. You need to wait on the mutex in a ctx-aware way (see the sketch after this list).
  3. Once a minute, every concurrent request doing the listing is not that bad. It's still a lot better than before, and it's simple.
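On point 2: a plain sync.Mutex can't be interrupted, so a context-aware lock is usually built from a 1-buffered channel instead. A minimal sketch, with a hypothetical type name not taken from the PR:

```go
package cache

import "context"

// ctxMutex is a lock you can wait on in a context-aware way: acquiring it is
// a select between the semaphore channel and ctx.Done().
type ctxMutex chan struct{}

func newCtxMutex() ctxMutex { return make(chan struct{}, 1) }

// Lock blocks until the lock is acquired or ctx is cancelled.
func (m ctxMutex) Lock(ctx context.Context) error {
	select {
	case m <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Unlock releases the lock. It must only be called after a successful Lock.
func (m ctxMutex) Unlock() { <-m }
```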

@ryanslade
Contributor Author

Ok, fair enough.

Another option I considered was keeping the atomic.Value, returning stale cache data, and kicking off a background process to update the cache, but that's even more complicated 😄
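A rough sketch of that stale-while-revalidate alternative, assuming the cache keeps the last-loaded slice in an atomic.Value and a CAS flag prevents duplicate background refreshes (all names here are illustrative, not from the actual code):

```go
package cache

import (
	"sync/atomic"
	"time"
)

type Repo struct { // as in the earlier sketch
	ID   int32
	Name string
}

type cacheEntry struct {
	repos     []*Repo
	fetchedAt time.Time
}

// staleCache returns whatever is currently stored (possibly stale) and lets
// the first reader that notices expiry trigger a single background refresh.
type staleCache struct {
	val        atomic.Value // holds cacheEntry
	refreshing int32        // 0 or 1, guards against duplicate refreshes
	ttl        time.Duration
	fetch      func() ([]*Repo, error)
}

func (c *staleCache) List() []*Repo {
	entry, _ := c.val.Load().(cacheEntry)
	if time.Since(entry.fetchedAt) >= c.ttl && atomic.CompareAndSwapInt32(&c.refreshing, 0, 1) {
		go func() {
			defer atomic.StoreInt32(&c.refreshing, 0)
			if repos, err := c.fetch(); err == nil {
				c.val.Store(cacheEntry{repos: repos, fetchedAt: time.Now()})
			}
		}()
	}
	return entry.repos // possibly stale, nil until the first refresh completes
}
```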

@ryanslade ryanslade closed this Jul 31, 2020
@ryanslade ryanslade deleted the cache-mutex branch July 31, 2020 12:34
@tsenart
Contributor

tsenart commented Jul 31, 2020

Related to the work you both did here: I played around with the query we are using to load this data. It's a pretty optimal plan. It takes 300ms because it's moving 3.75GB of memory around (all rows were already cached in memory, no disk I/O), so we're actually limited by memory bus bandwidth (12.5 GB / s, extrapolated).

Two remarks:

  1. This is another datapoint that we should be using IDs everywhere rather than names. Loading only the 100k IDs from the default_repos table takes only 15ms (~5MB of data).

  2. Have we considered the additional memory requirements for the frontend? Caching this in memory takes a significant chunk of the available memory.

@keegancsmith
Member

Related to the work you both did here: I played around with the query we are using to load this data. It's a pretty optimal plan. It takes 300ms because it's moving 3.75GB of memory around (all rows were already cached in memory, no disk I/O), so we're actually limited by memory bus bandwidth (12.5 GB / s, extrapolated).

Thanks for looking into it. Yeah, I suspected it had to do with IO. Is it really 3.75GB of memory? The default_repos table has just over 100k repos; that is ~37KB per repo, which is way too high. I'd expect at most 1KB per repo (and much less in practice). Note: we are only fetching the ID and name.

  1. This is another datapoint that we should be using IDs everywhere rather than names. Loading only the 100k IDs from the default_repos table takes only 15ms (~5MB of data).

Agreed.

  2. Have we considered the additional memory requirements for the frontend? Caching this in memory takes a significant chunk of the available memory.

I think the estimate on the size above is likely an order of magnitude off, so it should be fine. The current implementation is a clear improvement in the following ways:

  1. We were already fetching this anyway, so it introduces no real additional load.
  2. We only copy the slice, not the values, so each request allocates roughly 100k * sizeof(pointer) rather than 100k * sizeof(Repo) (see the sketch after this list).
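Continuing the earlier sketch, point 2 amounts to handing each request a copy of the pointer slice rather than a deep copy of the values, something like this (illustrative, not the actual code):

```go
// listCopy returns a per-request copy of the cached slice so callers can't
// mutate the shared backing array. Only the pointers are copied (~8 bytes
// each), not the Repo structs they point to.
func (c *cachedRepos) listCopy() []*Repo {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]*Repo, len(c.repos))
	copy(out, c.repos) // 100k pointer copies, not 100k struct copies
	return out
}
```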

I was thinking this morning about how we can take this optimization and apply it for customers. The big reason this is so much faster is that it avoids network IO, so implementing an in-memory version of our List function may be very effective.

@tsenart
Contributor

tsenart commented Jul 31, 2020

Is it really 3.75GB of memory?

Good call. I think this might be the data size for the full rows, which Postgres needs to move around (since it's not a column store) before filtering down to just the name and id that are returned to the frontend.
