
Bound size of TransactionPool in memory #3284

Closed · Tracked by #8878
birchmd opened this issue Sep 3, 2020 · 12 comments
Assignees
Labels
A-chain Area: Chain, client & related T-core Team: issues relevant to the core team

Comments

@birchmd
Contributor

birchmd commented Sep 3, 2020

Presently, transactions are simply stored in a BTreeMap in memory. However, if we receive too many transactions too quickly, the pool could grow without bound over time, especially if the transactions are large.
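A minimal sketch of the concern, using illustrative stand-in names rather than nearcore's actual types: nothing stops the map (and its memory footprint) from growing as transactions arrive.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of the unbounded-growth concern; NaivePool and its
// key/value types are assumptions, not nearcore's actual code.
struct NaivePool {
    // Keyed by (signer account, nonce); values are serialized transactions.
    txs: BTreeMap<(String, u64), Vec<u8>>,
}

impl NaivePool {
    fn insert(&mut self, signer: String, nonce: u64, tx: Vec<u8>) {
        // No size check: memory use grows with every accepted transaction.
        self.txs.insert((signer, nonce), tx);
    }

    /// Approximate bytes held by the pooled transaction payloads.
    fn payload_bytes(&self) -> usize {
        self.txs.values().map(|tx| tx.len()).sum()
    }
}
```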

@birchmd birchmd added the A-chain Area: Chain, client & related label Sep 3, 2020
@birchmd birchmd self-assigned this Sep 3, 2020
@stale

stale bot commented Jul 1, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Jul 1, 2021
@bowenwang1996 bowenwang1996 added T-core Team: issues relevant to the core team and removed S-stale labels Jul 1, 2021
@bowenwang1996 bowenwang1996 assigned mzhangmzz and unassigned birchmd Jul 6, 2021
@bowenwang1996 bowenwang1996 added this to New Issues in Chain + Transaction Runtime Q3 via automation Jul 6, 2021
@mzhangmzz mzhangmzz moved this from New Issues to Backlog in Chain + Transaction Runtime Q3 Jul 6, 2021
@stale

stale bot commented Oct 24, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@nikurt
Contributor

nikurt commented Jan 13, 2022

Not sure what to do if a pool reaches its maximum size but transactions keep coming in.
Is it possible that rejecting incoming transactions will also reject the one transaction that needs to be processed next, i.e. the transaction with the lowest nonce?
@mzhangmzz do you have suggestions?

@nikurt
Contributor

nikurt commented Jan 13, 2022

To get an idea of a reasonable tx pool size bound, I will add a metric.

near-bulldozer bot pushed a commit that referenced this issue Jan 17, 2022
@bowenwang1996
Collaborator

> Not sure what to do if a pool reaches its maximum size but transactions keep coming in.

We might need to stop accepting transactions at that point. cc @mm-near

@stale

stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had any activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Apr 27, 2022
near-bulldozer bot pushed a commit that referenced this issue May 2, 2022
This fixes an issue where the transaction pool grows indefinitely on non-RPC nodes.

Issue #3284
mina86 added a commit to mina86/nearcore that referenced this issue May 18, 2022
…ling

Commit a361b25 (‘Don’t add transactions to a pool in non-validator
nodes’) broke a few nightly tests.  This is a known issue at the
moment.  Disable the tests while a fix is being worked on.

Issue: near#3284
@akashin akashin self-assigned this Apr 25, 2023
@akashin
Collaborator

akashin commented Apr 25, 2023

I'll pick this issue in the context of congestion work (https://github.com/near/nearcore/milestone/26, #8878).

As a first step, I plan to investigate two naive approaches:

  • Introducing a hard capacity limit for the transaction pool (e.g. 1000 transactions)
  • Introducing a TTL for transactions in the pool (e.g. 1 minute)
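The two naive bounds above could be sketched as follows; the constants and names here are illustrative assumptions, not nearcore's actual values or API.

```rust
use std::time::{Duration, Instant};

// Assumed example bounds matching the numbers discussed above.
const MAX_POOL_TXS: usize = 1000;
const TX_TTL: Duration = Duration::from_secs(60);

struct PooledTx {
    inserted_at: Instant,
}

/// Hard capacity check: reject a new transaction once the pool is full.
fn over_capacity(pool_len: usize) -> bool {
    pool_len >= MAX_POOL_TXS
}

/// TTL check: a transaction older than TX_TTL is eligible for eviction.
fn expired(tx: &PooledTx, now: Instant) -> bool {
    now.duration_since(tx.inserted_at) > TX_TTL
}
```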

I highly suspect that both approaches will break some tests/assumptions that the clients make. My first goal will be to understand:

@akashin
Collaborator

akashin commented Apr 26, 2023

We do indeed have tests that rely on putting many transactions into the pool. One example failure is the test `benchmark_large_chunk_production_time`, which puts 20 transactions into the pool before starting to process them.
I'm adding explicit checks for the returned transaction status (#8976), which would make it much easier to catch tests broken by introducing transaction pool size limits.

Retrying the failure in all those tests would likely be too burdensome, so we'll have to use a high-enough limit that covers the majority of tests, and only introduce retries in tests that actually exercise the congestion scenario.

@akashin
Copy link
Collaborator

akashin commented May 4, 2023

I will proceed with implementing a hard capacity limit measured in bytes. The limit will be separate for each per-shard transaction pool within the `ShardedTransactionPool`.
I confirmed that none of the tests break with a limit > 1000 transactions (yet to measure how many bytes that would be).

We will also start returning a new error type to the clients when we reach the transaction pool size limit, so that they can retry this error intelligently with a back-off.
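A hedged sketch of this design, a separate byte budget per shard plus an error the client can retry with back-off. All names (`ShardedPool`, `InsertError`, ...) are illustrative, not nearcore's API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum InsertError {
    /// Returned when a shard's pool is at its byte limit; the client is
    /// expected to retry this error intelligently with a back-off.
    PoolFull,
}

struct ShardedPool {
    /// Bytes currently used, per shard id.
    used: HashMap<u64, usize>,
    /// Separate byte limit applied to each per-shard pool.
    limit_per_shard: usize,
}

impl ShardedPool {
    fn new(limit_per_shard: usize) -> Self {
        Self { used: HashMap::new(), limit_per_shard }
    }

    fn try_insert(&mut self, shard_id: u64, tx_bytes: usize) -> Result<(), InsertError> {
        let used = self.used.entry(shard_id).or_insert(0);
        if *used + tx_bytes > self.limit_per_shard {
            return Err(InsertError::PoolFull);
        }
        *used += tx_bytes;
        Ok(())
    }
}
```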

near-bulldozer bot pushed a commit that referenced this issue May 11, 2023
The PR introduces a limit to the size of the transaction pool for each shard, and logic that rejects transactions that go over this limit. The reasoning for this work is described in #3284.

To start, the limit will be disabled (effectively set to infinity), so this PR shouldn't have any effect. Node operators can override it with a config option. In the future, we will come up with a safe default value (probably between 10 MiB and 1 GiB).

We also start with a simple option where the RPC client will not know whether its transaction ran into this limit. We will need to rework this part in the future, but that will touch the transaction forwarding code in a non-trivial way, and I would prefer to work on it when we have a congestion test ready (#8920).

Lastly, this PR adds some nuance to reintroducing transactions into the pool after a reorg or after producing a chunk (the `reintroduce_transactions` method), by acknowledging that not all transactions might have been included and logging the number of dropped transactions.
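The reintroduction accounting might look roughly like this; the structure and names are stand-ins for illustration, not nearcore's actual code.

```rust
// Sketch of reintroducing not-included transactions after a reorg or
// chunk production while respecting the byte limit.
struct Pool {
    bytes_used: usize,
    limit_bytes: usize,
}

impl Pool {
    /// Tries to put transactions (given here by their size in bytes)
    /// back into the pool. Returns the number of transactions dropped
    /// because the limit was reached, so the caller can log it instead
    /// of losing them silently.
    fn reintroduce_transactions(&mut self, txs: &[usize]) -> usize {
        let mut dropped = 0;
        for &tx_bytes in txs {
            if self.bytes_used + tx_bytes > self.limit_bytes {
                dropped += 1;
            } else {
                self.bytes_used += tx_bytes;
            }
        }
        dropped
    }
}
```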
@akashin
Collaborator

akashin commented May 11, 2023

I've also explored the option of garbage collecting transactions based on their age. While doable, the existing clients (e.g. jsonrpc) do not work well with this approach: they raise the error to a higher level, likely all the way to the user (e.g. the Wallet interface). It also introduces an attack vector: users can spam the system with many transactions, knowing that only a few of them will be included within the time limit, effectively blocking transaction pool space at a lower cost.

The remaining steps to finish this work:

  • Expose the size of the transaction pool and the number of dropped transactions as Prometheus metrics
  • Estimate a reasonable default limit and set it on testnet nodes
  • Suggest the limit to mainnet nodes
  • Write documentation about this feature

akashin added a commit to akashin/nearcore that referenced this issue May 12, 2023
akashin added a commit to akashin/nearcore that referenced this issue May 15, 2023
akashin added a commit to akashin/nearcore that referenced this issue May 15, 2023
@akashin
Collaborator

akashin commented May 15, 2023

I've filed a bug for some follow-up work to simplify transaction pool code: #9060

akashin added a commit to akashin/nearcore that referenced this issue May 16, 2023
near-bulldozer bot pushed a commit that referenced this issue May 16, 2023
This will allow us to understand how big the transaction pools get in practice, and what a realistic limit to set for them would be.

The logic within the pool iterator is a bit complex due to the need to return transactions to the pool. I'm working on simplifying it in a separate PR, but for now this accounting should do the job.

This is one of the steps for #3284
near-bulldozer bot pushed a commit that referenced this issue May 26, 2023
So that it will be read from the `config.json` that each node provides

This is a part of #3284
nikurt pushed commits with the same message that referenced this issue May 31 and Jun 8, 2023
near-bulldozer bot pushed a commit that referenced this issue Jun 12, 2023
Right now the metric has noticeable blips due to rapid changes when transactions are drawn from the pool: https://nearinc.grafana.net/goto/YZwyDllVR?orgId=1 To avoid this, we only decrease the metric after the pool iterator is dropped.

This is a part of #3284
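The "update the gauge only when the iterator is dropped" idea could be sketched with a `Drop` implementation; the gauge and iterator below are stand-ins, not nearcore's actual Prometheus metric code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for a Prometheus gauge tracking pool size in bytes.
static POOL_SIZE_GAUGE: AtomicUsize = AtomicUsize::new(0);

struct PoolIterator {
    /// Bytes of transactions consumed while iterating.
    bytes_consumed: usize,
}

impl Drop for PoolIterator {
    fn drop(&mut self) {
        // Deferring the decrement to Drop avoids transient dips in the
        // metric while transactions are temporarily drawn out of the
        // pool and possibly returned.
        POOL_SIZE_GAUGE.fetch_sub(self.bytes_consumed, Ordering::Relaxed);
    }
}
```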
near-bulldozer bot pushed a commit that referenced this issue Jun 12, 2023
This PR enables the limit discussed in #3284.

I've [considered](https://near.zulipchat.com/#narrow/stream/297873-pagoda.2Fnode/topic/Adding.20a.20new.20field.20to.20config.2Ejson/near/358955785) another approach to rolling this out by changing the value in `config.json` distributed through S3, but that would require more work both on our side and on validators without adding much benefit. 
Specifically, I've checked that we have a good safety margin here, as over the last month on the testnet, the max size of the transaction pool on the validators was < 40 KB: https://nearinc.grafana.net/goto/AhZFN__4R?orgId=1.

@nikurt What would be a good place to document this field for validators? I saw https://near-nodes.io/, but couldn't find the appropriate section there.
nikurt pushed three commits that referenced this issue Jun 13, 2023, with the same messages as the commits above
@akashin
Collaborator

akashin commented Jun 13, 2023

The limit of 100 MB on the per-shard transaction pool of each node will be active in the next release, which @marcelo-gonzalez will be shepherding.
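For illustration, such a limit would be overridden in each node's `config.json`. The field name and placement below are assumptions based on the discussion above, not verified documentation; check your node's config reference before using them.

```json
{
  "transaction_pool_size_limit": 100000000
}
```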

@akashin akashin closed this as completed Jun 13, 2023
Chain + Transaction Runtime Q3 automation moved this from Backlog to Done Jun 13, 2023