Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues caused by expensive exceptions thrown in RunConnectivityWatcherAsync #13227

Closed
soiferj opened this issue Nov 1, 2017 · 7 comments
Assignees
Labels

Comments

@soiferj
Copy link

soiferj commented Nov 1, 2017

Please answer these questions before submitting your issue.

Should this be an issue in the gRPC issue tracker?

Create new issues for bugs and feature requests. An issue needs to be actionable. General gRPC discussions and usage questions belong to:

Please don't double post your questions in more locations; we are monitoring both channels, and the time spent de-duplicating questions is better spent answering more user questions.

What version of gRPC and what language are you using?

Version: 1.4.1
Language: C#

What operating system (Linux, Windows, …) and version?

Windows Server 2016

What runtime / compiler are you using (e.g. python version or version of gcc)

.NET 4.5

Issue

I recently switched to Grpc for routing requests among many machines. Immediately after the switch, I noticed a significant performance degradation. After investigating, I noticed that there was an enormous spike in garbage collection activity in .NET. This seemed to be causing the slowdown. Seeing as the only change in the system was the use of Grpc, I investigated further.

This led to the discovery that there are lots of exceptions being thrown in RunConnectivityWatcherAsync in Channel.cs. On line 312 of this file, RunConnectivityWatcherAsync loops forever, waiting for a state change in the connection. It throws an exception if there is a timeout. This means that even if state is not changed, there will still be an exception thrown. In my case, each machine has persistent connections to many other machines. Since their states are not changing, each machine throws many exceptions per second. Exceptions are expensive, and are leading to the increase in garbage collection.

Are connection states expected to change frequently? If so, then throwing an exception makes sense. Otherwise, this seems like an improper use of an exception that's leading to a large performance penalty. Any guidance here would be greatly appreciated!

Thanks a lot!
Jon

@soiferj soiferj changed the title Performance issues cause by expensive exceptions thrown in RunConnectivityWatcherAsync Performance issues caused by expensive exceptions thrown in RunConnectivityWatcherAsync Nov 1, 2017
@soiferj
Copy link
Author

soiferj commented Nov 7, 2017

Has anyone else experienced this issue?

@jtattermusch jtattermusch self-assigned this Nov 7, 2017
@jtattermusch
Copy link
Contributor

The line you are mentioning should throw 1 exception per second per channel.

await WaitForStateChangedAsync(lastState, DateTime.UtcNow.AddSeconds(1)).ConfigureAwait(false);

That doesn't really sound something that should result in a significant performance penalty.

Btw, the RunConnectivityWatcherAsync() method is mean as a temporary workaround for googleapis/google-cloud-dotnet#822 and it should eventually go away.

@soiferj
Copy link
Author

soiferj commented Nov 7, 2017

In our case, this means ~1000 exceptions / second per server. In addition to everything else going on, it seems like that can cause a pretty significant hit.

Is there an ETA for removing RunConnectivityWatcherAsync?

@jtattermusch
Copy link
Contributor

Currently there's no clear ETA for removing RunConnectivityWatcherAsync. While I agree this is less than ideal, I still don't think 1000exceptions/s that need to be garbage collected represent a significant overhead.
What I think would help: can you try disabling RunConnectivityWatcherAsync for channels and see if you see significant perf. improvements in your setup? (I suspect you won't).

I currently know about several places in gRPC C# where we could save on allocations in a more significant way than this issue would (a good example would be #13236) - so unless we establish a clear link between RunConnectivytWatcherAsync and the performance penalty you are seeing, I think it makes more sense to focus our energy there.

@jtattermusch
Copy link
Contributor

Fixed by #13957

@manigandham
Copy link

manigandham commented Jan 29, 2018

Just to confirm, we're seeing this too. It's both the number of exceptions and the fact that they cause an entire stack trace to be dumped which is expensive. If it was just garbage collection then it wouldn't really be an issue, especially with empty catch blocks.

One of our apps is showing about 400 exceptions/sec while only using the datastore API (which is only used once on startup):

image

@jtattermusch
Copy link
Contributor

@manigandham this has been fixed and there are 1.9.0-pre2 prerelease nugets that contain the fix. Would you mind verifying that the problems you were seeing are gone?

@lock lock bot locked as resolved and limited conversation to collaborators Sep 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants