Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: rewrite sequences of mutex calls #41253

Open
CAFxX opened this issue Sep 7, 2020 · 4 comments
Open

cmd/compile: rewrite sequences of mutex calls #41253

CAFxX opened this issue Sep 7, 2020 · 4 comments

Comments

@CAFxX
Copy link
Contributor

@CAFxX CAFxX commented Sep 7, 2020

Just a random idea that may be worth considering. It should be safe (at least on amd64) for the compiler to rewrite sequences of

// var m sync.Mutex
m.Lock()
// no code between the two
m.Unlock()

to

if atomic.LoadUint32(&m.state) != mutexUnlocked {
	m.Lock()
	m.Unlock()
}

and, similarly,

// var m sync.Mutex
m.Unlock()
// no code between the two
m.Lock()

to

if atomic.LoadUint32(&m.state) != mutexLocked {
	m.Unlock()
	m.Lock()
}

Especially the latter is used in low-priority long-running operations holding a mutex to yield to higher-priority operations, like:

m.Lock()
for ... {
	// ...
	m.Unlock() // yield
	m.Lock()
}
m.Unlock()

(that ideally should instead be written as follows, but that would require a much smarter compiler)

for ... {
	m.Lock()
	// ...
	m.Unlock()
}

The benefit being that, in the no-contention case, we would avoid two RMW atomic ops and replace it with a single atomic load from a location that is almost certainly already in cache, plus a predictable jump (in the contention case the extra load+jump would be overshadowed by the rest of the mutex machinery, so it shouldn't matter anyway). I haven't benchmarked this yet: I can do it if we're interested in adding this.

A (imperfect) search on the the GH dataset on BQ shows the following number of occurrences:

  • Lock/Unlock: 365
  • Unlock/Lock: 80
  • RLock/RUnlock: 133
  • RUnlock/RLock: 16
query
DECLARE first, second STRING;
SET (first, second) = ("Lock", "Unlock");

WITH matching_files AS (
  SELECT ANY_VALUE(files.repo_name) AS repo_name, ANY_VALUE(files.path) AS path, files.id
  FROM `bigquery-public-data.github_repos.files` AS files
  WHERE ENDS_WITH(files.path, ".go")
  AND NOT ENDS_WITH(files.path, "_test.go")
  AND NOT REGEXP_CONTAINS(files.path, r"(^|/)(vendor|Godeps|deps|_third_party)/")
  GROUP BY files.id
),
matching_contents AS (
  SELECT matching_files.repo_name, matching_files.path, contents.content
  FROM matching_files
  LEFT JOIN `bigquery-public-data.github_repos.contents` AS contents ON matching_files.id = contents.id
  WHERE REGEXP_CONTAINS( contents.content, CONCAT(r"[\r\n]\s*\S+[.]", first, r"[(][)](\s*[\r\n]{1,2})+\s*\S+[.]", second, r"[(][)]\s*[\r\n]") )
),
matches AS (
  SELECT repo_name, path, content, REGEXP_EXTRACT_ALL( content, CONCAT(r"[\r\n]\s*(\S+[.]", first, r"[(][)](?:\s*[\r\n]{1,2})+\s*\S+[.]", second, r"[(][)])\s*[\r\n]") ) AS match
  FROM matching_contents
),
flatmatches AS (
  SELECT repo_name, path, content, flatmatches AS match FROM matches, matches.match AS flatmatches
),
cleanmatches AS (
  SELECT repo_name, path, content, SPLIT(REGEXP_REPLACE(match, r"\s+", "|"), "|") AS match FROM flatmatches
),
rootmatches AS (
  SELECT repo_name, path, content, match[OFFSET(0)] AS match0, match[OFFSET(1)] AS match1 , REGEXP_REPLACE(match[OFFSET(0)], CONCAT(r"[.]", first, r"[(][)]"), "") AS root0, REGEXP_REPLACE(match[OFFSET(1)], CONCAT(r"[.]", second, r"[(][)]"), "") AS root1 FROM cleanmatches
)

SELECT repo_name, path, match0, match1, content FROM rootmatches WHERE root0 = root1
@CAFxX CAFxX changed the title cmd/compile: rewrite sequence of mutex calls cmd/compile: rewrite sequences of mutex calls Sep 7, 2020
@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 7, 2020

m.Lock()
// no code between the two
m.Unlock()

Does that ever happen? Is it worth optimizing?

// var m sync.Mutex
m.Unlock()
// no code between the two
m.Lock()

if atomic.LoadUint32(&m.state) != mutexLocked {
	m.Unlock()
	m.Lock()
}

That's a neat trick to detect starvation. Not sure it is worth baking into the compiler, though. Maybe we can just detect m.Unlock/m.Lock pairs and rewrite to the unexported m.unlockThenLock, and put the starvation detection in that function.

@CAFxX
Copy link
Contributor Author

@CAFxX CAFxX commented Sep 7, 2020

m.Lock()
// no code between the two
m.Unlock()

Does that ever happen?

Surprisingly, as I would have expected Unlock/Lock instead, Lock/Unlock is the pair that happens most frequently in the wild (at least according to the BQ GitHub dataset; see first post for the query used):

  • Lock/Unlock: 365
  • Unlock/Lock: 80
  • RLock/RUnlock: 133
  • RUnlock/RLock: 16

Is it worth optimizing?

Will try some experiments to see how much things improve. It is true that all of these are pretty niche use cases though. Before running those BQ queries my main argument here was that if we add the Unlock/Lock machinery it would be trivial (and less surprising?) to add also the Lock/Unlock one.

Not sure it is worth baking into the compiler, though. Maybe we can just detect m.Unlock/m.Lock pairs and rewrite to the unexported m.unlockThenLock, and put the starvation detection in that function.

I think it's ok to proceed incrementally, even though we'd be leaving some performance wins on the plate. Hopefully for things like these the potential inlining benefit, once the register-based calling convention lands, should be pretty low.

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 7, 2020

at least according to the BQ GitHub dataset; see first post for the query used

Hm, I checked the first three results from that query.

The first is just testing performance of Lock/Unlock itself.

The second says this:

	// Check for global ratelimits
	r.global.Lock()
	r.global.Unlock()

Not sure at all what that means.

The third is just an assert that locks aren't already held:

	if false {
		// For debugging, this function must not be executed with locks held.
		corpusMu.Lock()
		corpusMu.Unlock()

Not sure any of these are worth optimizing.

@dgryski
Copy link
Contributor

@dgryski dgryski commented Sep 8, 2020

I see this pattern generally (correctly or not) as a way to indicate that a number of concurrent goroutines have finished processing their work.

For example, https://github.com/twmb/kafka-go/blob/master/pkg/kgo/broker.go#L145

(In this particular case this is an RWMutex, so I the goroutines with their RLocks finish, then the writer's Lock() is allowed to happen. Maybe there's no good reason if it's a regular Mutex and not a RWMutex?)

staticcheck detects and complains about empty critical sections.

@andybons andybons added this to the Unplanned milestone Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.