Slot no more: overhauled internal algorithm #296

Merged
merged 16 commits into salsa-rs:master from the slot-no-more branch on Feb 7, 2022

Conversation

nikomatsakis
Member

This is the overhauled implementation that avoids slots, is more parallel friendly, and paves the way to fixed point and more expressive cycle handling.

We just spent 90 minutes going over it. Some rough notes are available here, and a video will be posted soon.

You may find the flowgraph useful.

@netlify

netlify bot commented Jan 28, 2022

❌ Deploy Preview for salsa-rs failed.

🔨 Explore the source changes: d4a6b24

🔍 Inspect the deploy log: https://app.netlify.com/sites/salsa-rs/deploys/61f428116eb8450008d21f49

@vlthr

vlthr commented Jan 29, 2022

I finally got around to running our benchmarks. I ran our integration test benchmarks on salsa 0.16.0 (our current version in production), the master branch, and the slot-no-more branch from this PR. Each integration test runs a fresh (non-incremental) analysis of some contract text, either completely single-threaded or "preheated", where we spin up a bunch of threads to query common data.

Here are the initial results:

  • going from 0.16.0 to master left timings essentially unchanged across the board, with only one document about 10% slower, so no meaningful change
  • going from master to slot-no-more was roughly equal in the single-threaded case (possibly a <5% average slowdown, but no significant pattern)
  • in the preheated tests, however, slot-no-more caused regressions of between 7% and 64%. Looking at the flamegraph, the culprit is a single query spending a lot of time contending on the SyncMap via QueryStorageOps::fetch -> SyncMap::claim -> DashMap::entry -> dashmap::lock::compare_exchange

In our case, I suspect part of the problem is that the majority of the preheating is routed through the same query (which looks up and executes dynamically registered entity finders by ID). Could this possibly be fixed by dynamically generating separate queries for each entity finder?

@nikomatsakis
Member Author

@vlthr interesting. I don't think you should have to refactor your tests -- it's surprising that the problem is contention on the sync map. Do you have any way to measure the overall hit rate?

Are your tests available on GitHub?

@vlthr

vlthr commented Jan 31, 2022

@nikomatsakis the code is unfortunately not publicly accessible, but if it helps I can DM more info and/or profiling/benchmark results.

I wasn't able to measure the hit rate, but I had another look at the preheating code to figure out why it's causing so much contention. The implementation was pretty naive: we have ~500 dynamically registered entity labellers accessible via a common query endpoint, db.labels_from(labeller_id), but only partial static knowledge of their dependencies (i.e. certain labeller implementations may call salsa queries which aggregate some other group of labellers). Since the preheater only has access to a partial dependency graph, it naively spun up enough threads to run an entire cluster of possibly dependent queries, each on its own thread, relying on salsa to park the dependent ones.
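
For concreteness, here is a rough sketch of that shape, with purely hypothetical names and a salsa 0.16-style query group (the real code isn't public, so this is only meant to illustrate the pattern):

```rust
use std::sync::Arc;

// All ~500 labellers funnel through the single keyed query `labels_from`.
#[salsa::query_group(LabelStorage)]
trait LabelDatabase: salsa::Database {
    fn labels_from(&self, labeller_id: u32) -> Arc<Vec<String>>;
}

fn labels_from(db: &dyn LabelDatabase, labeller_id: u32) -> Arc<Vec<String>> {
    // Look up the dynamically registered labeller for `labeller_id` and run it.
    // Some labellers call other salsa queries that aggregate further labellers,
    // which is why the static dependency graph is only partial.
    let _ = db;
    Arc::new(vec![format!("labels for labeller {labeller_id}")])
}

// The naive preheater: one thread per labeller, every thread immediately
// hitting the same `labels_from` query and relying on salsa to park the
// threads that end up waiting on a value another thread is computing.
fn preheat_naive<DB>(db: &DB, labeller_ids: Vec<u32>)
where
    DB: LabelDatabase + salsa::ParallelDatabase,
    salsa::Snapshot<DB>: Send + 'static,
{
    let handles: Vec<_> = labeller_ids
        .into_iter()
        .map(|id| {
            let snapshot = db.snapshot();
            std::thread::spawn(move || {
                snapshot.labels_from(id);
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("preheat thread panicked");
    }
}
```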

I'm guessing that having 50+ (or possibly more) threads hammering the same query might be close to the worst case for the compute_value loop. After adding some heuristics to the preheater I was able to slim down the number of threads, and that seems to have fixed it. With the new preheater all of the benchmark time deltas between master and slot-no-more are within measurement noise, and there's no contention visible anywhere on the flamegraph.
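
A minimal sketch of what such a thread cap could look like (illustrative only, not the actual heuristics): partition the labeller IDs over a bounded number of workers instead of spawning one thread per labeller.

```rust
use std::thread;

// `run_query` stands in for something like `|id| snapshot.labels_from(id)`.
fn preheat_capped(ids: &[u32], max_threads: usize, run_query: impl Fn(u32) + Sync) {
    // Split the IDs into at most `max_threads` chunks, one worker per chunk,
    // so far fewer threads ever race to claim the same query at once.
    let chunk_size = ids.len().div_ceil(max_threads.max(1)).max(1);
    let run_query = &run_query;
    thread::scope(|s| {
        for chunk in ids.chunks(chunk_size) {
            s.spawn(move || {
                for &id in chunk {
                    run_query(id);
                }
            });
        }
    });
}
```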

@nikomatsakis
Member Author

@vlthr ok, so let me see here. The idea is that you have a lot of threads all trying to do the same query at once. Under this system, that probably does require an extra lock compared to the old one. Each thread has to:

  • Check for a completed result (fetch-hot) -- finds nothing
  • Acquire sync-map -- finds it is already in use
  • Acquire a lock on the runtime to acquire condvar

I believe the first lock is not used in the old system. Hard to imagine that's such a source of contention, but then, it might be! (Makes me want to experiment with alternatives to dashmap.)
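
A toy model of that three-step sequence, just to make the contention point concrete -- plain mutexes and a condvar standing in for salsa's lock-free reads, dashmap-backed sync map, and runtime parking, so this is not the actual implementation:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Condvar, Mutex};

struct Store {
    results: Mutex<HashMap<u32, String>>, // completed values: the "fetch-hot" probe
    claims: Mutex<HashSet<u32>>,          // keys some thread is currently computing
    ready: Condvar,                       // signalled when a new value is published
}

impl Store {
    fn fetch(&self, key: u32, compute: impl Fn(u32) -> String) -> String {
        // Step 1: check for a completed result.
        if let Some(value) = self.results.lock().unwrap().get(&key) {
            return value.clone();
        }
        // Step 2: try to claim the key in the "sync map".
        if self.claims.lock().unwrap().insert(key) {
            let value = compute(key);
            self.results.lock().unwrap().insert(key, value.clone());
            self.claims.lock().unwrap().remove(&key);
            self.ready.notify_all();
            return value;
        }
        // Step 3: another thread holds the claim; block on the condvar until
        // the value shows up -- roughly what parking a dependent query means.
        let mut results = self.results.lock().unwrap();
        loop {
            if let Some(value) = results.get(&key) {
                return value.clone();
            }
            results = self.ready.wait(results).unwrap();
        }
    }
}
```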

Still, seems like we should consider landing this.

@nikomatsakis
Member Author

bors r+

@bors
Contributor

bors bot commented Feb 7, 2022

Build succeeded:

@bors bors bot merged commit 0f9971a into salsa-rs:master Feb 7, 2022