Performance test occasionally fails with fatal error: concurrent map writes #3670

Closed
markmandel opened this issue Feb 22, 2024 · 4 comments · Fixed by #3678
Labels
  • area/performance: Anything to do with Agones being slow, or making it go faster.
  • kind/bug: These are bugs.

Comments

@markmandel
Member

What happened:

Short version:

fatal error: concurrent map writes

goroutine 144634 [running]:
main.main.func1.1()
	/go/src/agones.dev/agones/test/load/allocation/runscenario/runscenario.go:126 +0xcf
created by main.main.func1 in goroutine 11
	/go/src/agones.dev/agones/test/load/allocation/runscenario/runscenario.go:123 +0x66d

goroutine 1 [semacquire, 48 minutes]:
sync.runtime_Semacquire(0xc000012db0?)
	/usr/local/go/src/runtime/sema.go:62 +0x25
sync.(*WaitGroup).Wait(0xc0001d8d50?)
	/usr/local/go/src/sync/waitgroup.go:116 +0x48
main.main()
	/go/src/agones.dev/agones/test/load/allocation/runscenario/runscenario.go:137 +0x6f2

goroutine 6 [sleep]:
time.Sleep(0x3b9aca00)
	/usr/local/go/src/runtime/time.go:195 +0x125
main.main.func1(0x0)
	/go/src/agones.dev/agones/test/load/allocation/runscenario/runscenario.go:131 +0x549
created by main.main in goroutine 1
	/go/src/agones.dev/agones/test/load/allocation/runscenario/runscenario.go:109 +0x10e9

See attached log for full details
log-275a3e64-934b-4512-80a4-28ff3ad733c8.txt

What you expected to happen:

The performance test should always pass

How to reproduce it (as minimally and precisely as possible):

Check the logs:
https://console.cloud.google.com/cloud-build/builds;region=global/275a3e64-934b-4512-80a4-28ff3ad733c8;step=6?e=13803378&mods=logs_tg_prod&project=agones-images

Anything else we need to know?:

Environment:

  • Agones version: dev
  • Kubernetes version (use kubectl version): whatever the perf cluster is.
  • Cloud provider or hardware configuration: GKE
  • Install method (yaml/helm):
  • Troubleshooting guide log(s):
  • Others:
@markmandel added the kind/bug and area/performance labels on Feb 22, 2024
@ashutosji
Contributor

@roberthbailey
Member

The error is coming from https://github.com/googleforgames/agones/blob/main/test/load/allocation/runscenario/runscenario.go#L126

Looking at the code, I'm curious how it ever works, since we have a goroutine per client and they all write to the same shared map.

@markmandel
Member Author

Check the logs:
https://console.cloud.google.com/cloud-build/builds;region=global/275a3e64-934b-4512-80a4-28ff3ad733c8;step=6?e=13803378&mods=logs_tg_prod&project=agones-images

Error: Getting Permission denied for all resources.

Sorry about that, probably not public, here's a copy
log-275a3e64-934b-4512-80a4-28ff3ad733c8.txt

Looking at the code, I'm curious how it ever works, since we have a goroutine per client and they all write to the same shared map.

I guess the goroutines don't manage to write to the map at the same time too often.

Some quick ideas, not sure which ones are best:

  • Some mutexes in the right places
  • There are a few thread-safe sync maps out there; we could go pick one (one with generics support?)
  • Move stuff to channels?
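
For illustration, the channel idea could look roughly like the sketch below. It assumes the shared map counts occurrences of an error message, which is a guess about what runscenario.go stores; `countErrors` and the `error-N` strings are hypothetical stand-ins, not code from the Agones repository. Workers send results on a channel and a single collector goroutine is the only one that touches the map.

```go
package main

import (
	"fmt"
	"sync"
)

// countErrors is a hypothetical helper. Each worker goroutine sends the
// error text it observed on a channel; one collector goroutine owns the map,
// so no two goroutines ever write to it concurrently.
func countErrors(workers, perWorker int) map[string]int {
	results := make(chan string)
	done := make(chan struct{})
	counts := make(map[string]int)

	// Collector: the only goroutine that reads or writes counts.
	go func() {
		defer close(done)
		for msg := range results {
			counts[msg]++
		}
	}()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				// Stand-in for the result of a failed allocation call.
				results <- fmt.Sprintf("error-%d", i%2)
			}
		}()
	}

	wg.Wait()      // all workers have finished sending
	close(results) // lets the collector's range loop end
	<-done         // collector has drained the channel; counts is safe to read
	return counts
}

func main() {
	counts := countErrors(50, 100)
	fmt.Println(counts["error-0"], counts["error-1"]) // prints "2500 2500"
}
```

The trade-off is one extra goroutine and a channel send per result, in exchange for not needing any lock around the map.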

@ashutosji
Contributor

I guess the goroutines don't manage to write to the map at the same time too often.

Some quick ideas, not sure which ones are best:

  • Some mutexes in the right places
  • There are a few thread-safe sync maps out there; we could go pick one (one with generics support?)
  • Move stuff to channels?

I think placing mutexes in the right places should work. I have created a small example for this use case: https://goplay.tools/snippet/N84X6jIZmIq
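
In case the playground link is unavailable, here is a minimal sketch of the mutex approach. It is not the linked snippet and not the merged fix; `errorCounter`, `runClients`, and the "allocation failed" key are hypothetical names, and it assumes the shared map counts failures by message.

```go
package main

import (
	"fmt"
	"sync"
)

// errorCounter is a hypothetical type: a plain map guarded by a mutex.
type errorCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newErrorCounter() *errorCounter {
	return &errorCounter{counts: make(map[string]int)}
}

// Inc records one occurrence of msg while holding the lock.
func (c *errorCounter) Inc(msg string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[msg]++
}

// Snapshot returns a copy of the map so callers can read it without the lock.
func (c *errorCounter) Snapshot() map[string]int {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[string]int, len(c.counts))
	for k, v := range c.counts {
		out[k] = v
	}
	return out
}

// runClients starts one goroutine per simulated client; each one records
// callsPerClient failures through the shared counter.
func runClients(clients, callsPerClient int) map[string]int {
	counter := newErrorCounter()
	var wg sync.WaitGroup
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < callsPerClient; j++ {
				counter.Inc("allocation failed")
			}
		}()
	}
	wg.Wait()
	return counter.Snapshot()
}

func main() {
	fmt.Println(runClients(100, 1000)["allocation failed"]) // prints 100000
}
```

Running the load test with `go run -race` or `go test -race` should flag any map access that is still unguarded.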

We could use sync.Map instead, but it is reportedly slower than a plain map guarded by a mutex. I haven't benchmarked it myself; this is only what I have read online.
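
For comparison, a counter built on the standard library's sync.Map might look like the sketch below. The names `inc` and `countWithSyncMap` are hypothetical, and it assumes Go 1.19 or later for `atomic.Int64`. The sync.Map documentation says the type is optimized for keys that are written once and read many times, or for goroutines working on disjoint key sets, and suggests a plain map with a Mutex or RWMutex otherwise, so the performance concern is plausible for a few hot keys incremented by many goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// inc adds one to the counter stored under key, creating it on first use.
// Values are *atomic.Int64 so the increment itself needs no extra lock.
func inc(m *sync.Map, key string) {
	v, _ := m.LoadOrStore(key, new(atomic.Int64))
	v.(*atomic.Int64).Add(1)
}

// countWithSyncMap is a hypothetical driver: clients goroutines each record
// callsPerClient failures under the same key, then the total is read back.
func countWithSyncMap(clients, callsPerClient int) int64 {
	var counts sync.Map // effectively map[string]*atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < callsPerClient; j++ {
				inc(&counts, "allocation failed")
			}
		}()
	}
	wg.Wait()

	v, ok := counts.Load("allocation failed")
	if !ok {
		return 0
	}
	return v.(*atomic.Int64).Load()
}

func main() {
	fmt.Println(countWithSyncMap(100, 1000)) // prints 100000
}
```

Note that sync.Map stores values as `any`, so every read needs a type assertion; that loss of type safety is one reason the mutex version is often preferred for small, hot maps.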


3 participants