Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return instead of ignore error #1287

Merged
merged 2 commits into from
May 23, 2022
Merged

return instead of ignore error #1287

merged 2 commits into from
May 23, 2022

Conversation

BlinkyStitt
Copy link
Contributor

@BlinkyStitt BlinkyStitt commented May 20, 2022

Motivation

I noticed that occasionally my program would get stuck with one thread spinning at near 100% CPU. After a bunch of investigation with rust-gdb and tokio-console and stripping it down to a much smaller program, I narrowed it down to a websocket task being very active doing nothing.

Here you can see the spawned future stuck running:

IMAGE 2022-05-19 18:30:24

It is being woken up ~253k times per second, so that explains the 100% CPU:

IMAGE 2022-05-19 18:29:44

After some digging, I found this related code:

resp = self.ws.next() => match resp {
Some(Ok(resp)) => self.handle(resp).await?,
// TODO: Log the error?
Some(Err(_)) => {},
None => {
return Err(ClientError::UnexpectedClose);
},

Dropping that error is bad because it might be an Io(Kind(UnexpectedEof)). If that's the case, the websocket needs to reconnect.

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Solution

Quick fix is to just return an error when the socket gives an error.

I think a more robust fix might be to use something like https://docs.rs/stubborn-io/latest/stubborn_io/.

PR Checklist

I'm not sure how to write a test for this.

  • Added Tests
  • Added Documentation
  • Updated the changelog

@gakonst
Copy link
Owner

gakonst commented May 20, 2022

Thanks for the in-depth investigation!

Quick fix is to just return an error when the socket gives an error.

How does returning the error fix the CPU issue? Is this error a temporary error, which means that it made sense to ignore it? If we're terminating on a temporary error that doesn't seem right?

@BlinkyStitt
Copy link
Contributor Author

BlinkyStitt commented May 20, 2022

The error I'm seeing (and I think all of them), is not a temporary error. Once you get it, checking that future again always gives back that same error (in my case, an Eof) immediately. The CPU issue is because it gets stuck in tick just getting an Eof back immediately every time it polls. Changing it to return an error causes the websocket thread to panic higher up in the code. I think there is another issue (or at least some discussion in telegram) to make that give a nicer error instead of panic.

Panic works fine for me though because I detect that my subscription exited and reconnect (code here).

// TODO: Log the error?
Some(Err(_)) => {},
Some(Err(_)) => {
return Err(ClientError::UnexpectedClose);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe instead of dropping the error we can add a new field to enum UnknownError(Err) that wraps the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I'll try to get to that tomorrow.

@BlinkyStitt
Copy link
Contributor Author

with this PR, my proxy saw an error and reconnected:

image

Having the full error would be even better though.

@@ -401,8 +401,9 @@ where
// Handle ws messages
resp = self.ws.next() => match resp {
Some(Ok(resp)) => self.handle(resp).await?,
// TODO: Log the error?
Some(Err(_)) => {},
Some(Err(_)) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, but let's also log here the error

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 let's log the error and otherwise looks good

@gakonst gakonst merged commit 9ed9395 into gakonst:master May 23, 2022
mattsse pushed a commit to mattsse/ethers-rs that referenced this pull request May 28, 2022
* return error instead of swallow error

* changelog

Co-authored-by: Bryan Stitt <bryan@satoshiandkin.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants