
Unhandled promise rejections en masse #1208

Closed
Jannis opened this issue Dec 19, 2020 · 18 comments

Labels
bug Verified to be an issue. fixed/complete This Bug is fixed or Enhancement is complete and published.

Comments

@Jannis

Jannis commented Dec 19, 2020

We're seeing a ton of unhandled promise rejections using ethers 5.0.12:

Error: timeout (requestBody="{\"method\":\"eth_blockNumber\",\"params\":[],\"id\":52,\"jsonrpc\":\"2.0\"}", requestMethod="POST", timeout=120000, url="...snip...", code=TIMEOUT, version=web/5.0.11)
    at Logger.makeError (/opt/indexer/node_modules/@ethersproject/logger/lib/index.js:179:21)
    at Timeout._onTimeout (/opt/indexer/node_modules/@ethersproject/web/lib/index.js:148:35)
    at listOnTimeout (internal/timers.js:549:17)
    at processTimers (internal/timers.js:492:7)

The setup is as follows. After creating the Ethereum provider, we use it to make a number of contract calls and create transactions (as you would).

    const ethereum = new providers.StaticJsonRpcProvider(
      {
        url: ...,
        user: ...,
        password: ...,
        allowInsecureAuthentication: true,
      },
      "mainnet"
    )
    ethereum.pollingInterval = 4000

    ethereum.on('debug', info => { ... })

    ethereum.on('network', (newNetwork, oldNetwork) => {
      logger.trace('Ethereum network change', {
        oldNetwork: oldNetwork,
        newNetwork: newNetwork,
      })
    })

    logger.info(`Connected to Ethereum`, {
      pollingInterval: ethereum.pollingInterval,
      network: await ethereum.detectNetwork(),
    })

We have tested custom Ethereum nodes, Alchemy, Infura — the behavior is the same across all of them. I'm not entirely sure what leads to the eth_blockNumber calls, why they happen so frequently and why they all time out.

I'd be happy to provide more details if needed though.

@ricmoo
Member

ricmoo commented Dec 19, 2020

I don’t think this should affect anything, but can you try upgrading to the latest version of ethers?

Does this just show up in the browser as unhandled? When using Infura, Alchemy, etc., are you including your own API key?

@ricmoo ricmoo added the discussion Questions, feedback and general information. label Dec 19, 2020
@Jannis
Author

Jannis commented Dec 19, 2020

The software runs on Node.js (not in the browser). And yes, the API keys being used are individuals' keys. We have a handful of people reporting that this happens for them, but it seems inconsistent insofar as others (including myself) do not run into the problem, even when running the same Docker image (from which the Node.js-based software is run).

Upgrading to the latest ethers version was my next step.

@alexandruast

In my particular case, directly hitting the node (geth v1.9.25) from the container running the Node.js application, using autocannon, there are no timeouts whatsoever:

root@ea4c57232ee4:/opt/indexer/packages/indexer-agent# autocannon -c 10 -d 30 -H "Content-Type: application/json" -b '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":69}' http://seth.stakesystems.io:8545
Running 30s test @ http://seth.stakesystems.io:8545
10 connections

┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬────────┐
│ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%  │ Avg     │ Stdev   │ Max    │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼────────┤
│ Latency │ 0 ms │ 0 ms │ 0 ms  │ 1 ms │ 0.05 ms │ 2.48 ms │ 212 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┐
│ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg      │ Stdev   │ Min     │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤
│ Req/Sec   │ 18303   │ 18303   │ 22975   │ 26031   │ 22578.67 │ 2159.27 │ 18294   │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤
│ Bytes/Sec │ 2.82 MB │ 2.82 MB │ 3.54 MB │ 4.01 MB │ 3.48 MB  │ 332 kB  │ 2.82 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┴─────────┘

Req/Bytes counts sampled once per second.
677k requests in 30.01s, 104 MB read

Before, during, and after the test, the Node.js application threw lots of unhandled promise rejections.

@Jannis
Author

Jannis commented Dec 19, 2020

We did some more experimentation and it looks like increasing the polling interval is helping:

~75 errors per 5 minutes at the default (4000ms)
~20 errors per 5 minutes at 15000ms
0 errors at 30000ms

Given that the block time is ~15s on mainnet, I wonder if eth_blockNumber requests somehow time out when the block hasn't changed since the last poll? Just a wild guess, I admit.
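
For reference, here is a minimal timing sketch (not part of the original setup; the RPC URL and the 4000ms delay are placeholders) that measures raw eth_blockNumber calls outside ethers' internal polling loop, to check whether the node itself is slow or whether the timeouts only appear through the provider's polling:

    // Rough sketch: time eth_blockNumber directly via send(), bypassing the
    // provider's cached internal block number.
    import { providers } from "ethers";

    const RPC_URL = "http://localhost:8545"; // placeholder

    async function timeBlockNumberCalls(count = 20): Promise<void> {
      const provider = new providers.StaticJsonRpcProvider(RPC_URL, "mainnet");

      for (let i = 0; i < count; i++) {
        const started = Date.now();
        try {
          // send() issues the JSON-RPC request directly and returns a hex string.
          const hex = await provider.send("eth_blockNumber", []);
          console.log(`#${i} block ${parseInt(hex, 16)} in ${Date.now() - started}ms`);
        } catch (error) {
          console.error(`#${i} failed after ${Date.now() - started}ms`, error);
        }
        // Wait roughly one polling interval between calls.
        await new Promise((resolve) => setTimeout(resolve, 4000));
      }
    }

    timeBlockNumberCalls().catch(console.error);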

@Jannis
Author

Jannis commented Jan 4, 2021

I can reliably reproduce it with a script that does no more than:

import { providers } from 'ethers'

async function run() {
  const provider = new providers.InfuraProvider('mainnet', ...)
  await provider.ready

  for (const item of [ ...large array... ]) {
    // create a transaction using `provider`
  }
}

run()

It always runs into the unhandled promise rejections, be it after 2, 5, or 10 minutes.
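
To see where these come from while debugging, one option (a generic Node.js sketch, not something the script above already does) is a process-level handler that logs each rejection with its reason:

    // Debugging sketch: log every unhandled rejection so the failing call is visible.
    process.on("unhandledRejection", (reason) => {
      console.error("Unhandled rejection:", reason);
    });

    // Alternatively, running with `node --unhandled-rejections=strict` turns these
    // into hard crashes, which makes them easier to catch while debugging.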

@andrevmatos

I'm also getting this error, and I think #1221 is a duplicate of this one.
From what I could gather by reading the source, I think I've found the cause:

    const internalBlockNumber = this._internalBlockNumber;

    if (maxAge > 0 && this._internalBlockNumber) {
        const result = await internalBlockNumber;
        if ((getTime() - result.respTime) <= maxAge) {
            return result.blockNumber;
        }
    }

    const reqTime = getTime();

    const checkInternalBlockNumber = resolveProperties({
        blockNumber: this.perform("getBlockNumber", { }),
        networkError: this.getNetwork().then((network) => (null), (error) => (error))
    }).then(({ blockNumber, networkError }) => {
        if (networkError) {
            // Unremember this bad internal block number
            if (this._internalBlockNumber === checkInternalBlockNumber) {
                this._internalBlockNumber = null;
            }
            throw networkError;
        }

        const respTime = getTime();

        blockNumber = BigNumber.from(blockNumber).toNumber();
        if (blockNumber < this._maxInternalBlockNumber) { blockNumber = this._maxInternalBlockNumber; }

        this._maxInternalBlockNumber = blockNumber;
        this._setFastBlockNumber(blockNumber); // @TODO: Still need this?

        return { blockNumber, reqTime, respTime };
    });

    this._internalBlockNumber = checkInternalBlockNumber;

In the code above, if this.perform("getBlockNumber", { }) rejects, because of a timeout or whatever, it poisons this._internalBlockNumber, which will then always reject at line 591 on all calls where maxAge is positive (most impactfully, the one at L632 of the async poll() method).
I'll prepare a PR to extend the error handling for networkError so it also covers blockNumber.
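
To illustrate the failure mode in isolation (a simplified sketch with made-up names, not ethers' actual code): a rejected promise that stays cached keeps rejecting every later call that reuses the cache, and clearing it in a catch handler lets the next poll retry.

    // Simplified sketch of the "poisoned cache" problem and the fix pattern.
    let cached: Promise<number> | null = null;

    // Broken variant: if fetchBlockNumber() rejects, the rejected promise stays
    // in `cached` and every later call rejects too.
    function getBlockNumberPoisoned(fetchBlockNumber: () => Promise<number>): Promise<number> {
      if (cached) { return cached; }
      cached = fetchBlockNumber();
      return cached;
    }

    // Fixed variant: drop the failed attempt from the cache so the next call retries.
    function getBlockNumberFixed(fetchBlockNumber: () => Promise<number>): Promise<number> {
      if (cached) { return cached; }
      const attempt = fetchBlockNumber().catch((error) => {
        if (cached === attempt) { cached = null; }
        throw error;
      });
      cached = attempt;
      return attempt;
    }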

@ricmoo
Member

ricmoo commented Jan 12, 2021

Thanks! I will try to reproduce this today. Worst case I can add some code to force a poison (great word) with 50% probability to simulate the issue. :)

@ricmoo ricmoo added investigate Under investigation and may be a bug. on-deck This Enhancement or Bug is currently being worked on. and removed discussion Questions, feedback and general information. labels Jan 12, 2021
@andrevmatos

Looks like #1084 is also this same bug.
I think a simple change like this should work:

        const checkInternalBlockNumber = resolveProperties({
            blockNumber: this.perform("getBlockNumber", { }),
            networkError: this.getNetwork(),
        }).then(({ blockNumber }) => {
            const respTime = getTime();

            blockNumber = BigNumber.from(blockNumber).toNumber();
            if (blockNumber < this._maxInternalBlockNumber) { blockNumber = this._maxInternalBlockNumber; }

            this._maxInternalBlockNumber = blockNumber;
            this._setFastBlockNumber(blockNumber); // @TODO: Still need this?
            return { blockNumber, reqTime, respTime };
        }).catch((error) => {
            // Unremember this bad internal block number
            if (this._internalBlockNumber === checkInternalBlockNumber) {
                this._internalBlockNumber = null;
            }
            throw error;
        });

@ricmoo ricmoo added bug Verified to be an issue. and removed investigate Under investigation and may be a bug. labels Jan 13, 2021
@ricmoo
Member

ricmoo commented Jan 13, 2021

Thanks for your hard work! I believe you are absolutely correct.

I have made some local changes to reproduce this reliably (I added a new method that will reject 50% of the time and succeed the other 50% of the time) and have a fix in place that I think works.

I'll keep this ticket updated.
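
One way to simulate that kind of intermittent failure (just a sketch, not the actual change used for the fix) is to subclass the provider and make perform() reject for getBlockNumber roughly half the time:

    // Sketch of a flaky provider for testing: getBlockNumber rejects ~50% of the time.
    import { providers } from "ethers";

    class FlakyProvider extends providers.StaticJsonRpcProvider {
      perform(method: string, params: any): Promise<any> {
        if (method === "getBlockNumber" && Math.random() < 0.5) {
          return Promise.reject(new Error("simulated timeout"));
        }
        return super.perform(method, params);
      }
    }

    // Usage (URL is a placeholder):
    // const provider = new FlakyProvider("http://localhost:8545", "mainnet");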

@ricmoo
Member

ricmoo commented Jan 14, 2021

This should be addressed in 5.0.26. Please try it out and let me know. :)

@ricmoo ricmoo added fixed/complete This Bug is fixed or Enhancement is complete and published. and removed on-deck This Enhancement or Bug is currently being worked on. labels Jan 15, 2021
@mrwillis

It looks better with 5.0.26, but I'm actually still getting these errors. I can confirm 5.0.26 is the only 5.x version installed in my project (there are some deps that use 4.x).

[screenshot of the errors]

@andrevmatos

@mrwillis The implemented fix doesn't prevent such an error from happening at all: if your network is unstable or the eth node is overloaded, it can still occur, and the best that can be done is to propagate and log it, as is being done now. What the fix does prevent is one such error permanently blocking further results from being fetched, which was the case before.
In short, you may still see an error now and then, but as soon as the node starts replying properly, polling will be healthy again.

@mrwillis

Thanks @andrevmatos. What can be done on our end to avoid it bubbling up to the main scope, i.e., handling it somehow?
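
One option (a sketch, assuming an ethers v5 provider; the URL is a placeholder): if I'm reading the provider code right, polling errors are emitted as "error" events, so attaching a listener there, plus a process-level fallback, at least keeps them out of the main scope.

    import { providers } from "ethers";

    const provider = new providers.StaticJsonRpcProvider("http://localhost:8545", "mainnet"); // placeholder URL

    // Polling errors appear to be emitted as "error" events on the provider;
    // handling them here keeps the poll loop's failures from going unobserved.
    provider.on("error", (error) => {
      console.warn("provider polling error (polling will retry):", error);
    });

    // Fallback for anything that still escapes as an unhandled rejection.
    process.on("unhandledRejection", (reason) => {
      console.error("unhandled promise rejection:", reason);
    });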

Jannis added a commit to graphprotocol/indexer that referenced this issue Jan 15, 2021
ethers 5.0.26 includes a fix for ethers-io/ethers.js#1208,
which should remove the unresolved promise rejections we've seen in #183.
@ricmoo
Member

ricmoo commented Jan 19, 2021

@mrwillis This seems to be a separate issue from the one 5.0.26 was intended to fix.

I think I've found the problem though, and will fix it.

@ricmoo
Member

ricmoo commented Jan 19, 2021

Actually, I take that back. The promise I thought wasn't handled is.

Are you possibly calling detectNetwork somewhere in your code and not handling error conditions?
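
For anyone in that situation, a small sketch of guarding such a call (the wrapper name here is illustrative only):

    import { providers } from "ethers";

    // Sketch: wrap detectNetwork() so a rejection is handled instead of
    // escaping as an unhandled promise rejection.
    async function logNetwork(provider: providers.JsonRpcProvider): Promise<void> {
      try {
        const network = await provider.detectNetwork();
        console.log("connected to", network.name, network.chainId);
      } catch (error) {
        console.warn("network detection failed:", error);
      }
    }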

@mrwillis

mrwillis commented Jan 19, 2021

@ricmoo No, not calling that. Could it be that we are not awaiting provider.ready? Other than that, we are not doing anything special.

@ricmoo
Member

ricmoo commented Feb 25, 2021

@mrwillis Are you still having issues with this?

@mrwillis

@ricmoo All good now thanks

fordN pushed a commit to graphprotocol/indexer that referenced this issue Mar 15, 2021
ethers 5.0.26 includes a fix for ethers-io/ethers.js#1208,
which should remove the unresolved promise rejections we've seen in #183.
@ricmoo ricmoo closed this as completed Mar 30, 2021
K-Chang32 added a commit to K-Chang32/indexer that referenced this issue Apr 4, 2024
ethers 5.0.26 includes a fix for ethers-io/ethers.js#1208,
which should remove the unresolved promise rejections we've seen in #183.