
Define scalable "typical" traffic outside Social DB and integrate it into the load generator / test runner #9097

Open
2 of 5 tasks
Tracked by #8999
jakmeier opened this issue May 23, 2023 · 4 comments

Comments

@jakmeier (Contributor) commented on May 23, 2023:

For the TPS benchmark, we know that we want some SocialDB workload (see #9095), but we want to combine it with some other "typical" workload.

For this, we probably want to mimic the workload observed on mainnet. Ideally it should include traffic from our largest users at the moment (sweat, aurora, ...) but also non-smart-contract workload (creating accounts, NEAR transfers, etc.).

All of this should be defined in the locust setup.
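Since this will eventually live in the locust setup, here is a minimal sketch of what a non-contract "typical" user could look like in Locust. The class name, the `build_signed_tx` helper, and the 90/10 ratio are assumptions for illustration only, not the actual code in nearcore:

```python
# Hypothetical sketch, not the actual nearcore locust setup: a Locust user that
# mixes plain NEAR transfers with account creation via the node's JSON-RPC.
import base64

from locust import HttpUser, constant, task


def build_signed_tx(kind: str) -> bytes:
    # Placeholder: the real setup would construct and sign a Transfer or
    # CreateAccount transaction with the simulated user's key.
    raise NotImplementedError(f"signing of {kind} txs depends on the locust setup")


class TypicalTrafficUser(HttpUser):
    wait_time = constant(1)  # roughly one transaction per second per user

    @task(9)  # assumed ~90% plain NEAR transfers
    def near_transfer(self):
        self._send_tx(build_signed_tx("transfer"))

    @task(1)  # assumed ~10% account creation
    def create_account(self):
        self._send_tx(build_signed_tx("create_account"))

    def _send_tx(self, signed_tx: bytes):
        # broadcast_tx_commit takes the base64-encoded signed transaction; the
        # RPC node URL is supplied via Locust's --host flag.
        self.client.post("/", json={
            "jsonrpc": "2.0",
            "id": "loadtest",
            "method": "broadcast_tx_commit",
            "params": [base64.b64encode(signed_tx).decode()],
        })
```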

Tasks

@jakmeier (Contributor, Author) commented:

Traffic Today (Analysis)

I've looked at what mainnet traffic currently looks like. Very roughly, the summary is:

  • 60% of receipts involve token.sweat
    • record_batch calls by oracle.sweat make up about 20% of that but are more gas-heavy than the average tx
    • account funding by sweat_welcome.near makes up about 15%
    • another 15% are claims in tge-lockup
    • the remaining 50% are probably ft_transfer receipts, triggered by claims or other sources
  • 20% aurora
  • 13% nearcrowd
  • 4% spin.fi
  • 3% ref-finance

Proposed Benchmark Workload

Based on that rough analysis, I want to create a benchmark with 20M daily transactions distributed like this:

  • 5M SocialDB (all on a single contract)
  • 5M Sweat (90% simple calls, 10% record batch which will test our storage, all on a single contract with large state)
  • 5M non-Sweat FT (spread across multiple contracts)
  • 3M aurora (single contract, only workload on aurora shard)
  • 2M nearcrowd (multiple contracts)

This tries to project how traffic could grow from today (300k daily tx) to 20M daily tx.
One assumption we established at the start of the quarter was that 5M of that should come from SocialDB.
The remainder I have now defined.

I allocate 10M to FT use cases because Sweat today already takes >50% of TPS.
But instead of forcing it all onto a single account, I want to spread the load across shards, so I split it evenly between Sweat and "other" FTs. That could also simulate more loyalty-program-like use cases coming to NEAR, which isn't unlikely given the success of Sweat.

Then I thought aurora and nearcrowd shouldn't be missing either, given they have >10% of the tx volume each. Aurora is slightly larger today, so I think 3M aurora & 2M nearcrowd makes sense. Note that aurora transactions also consume much more gas than nearcrowd transactions.

Notably, DeFi is missing completely. We could add it later, but for now I want to focus on the more common and more performance-critical use cases.
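To make the split concrete, here is a minimal sketch of how the 5/5/5/3/2 ratio could map onto Locust user-class weights (class names are hypothetical; Locust spawns simulated users in proportion to each class's `weight` attribute, so only the ratio matters, not the absolute daily numbers):

```python
# Hypothetical sketch of the proposed 20M-tx/day split as Locust user weights.
from locust import HttpUser, constant, task


class BaseWorkloadUser(HttpUser):
    abstract = True          # base class, never spawned directly
    wait_time = constant(1)

    @task
    def send_tx(self):
        raise NotImplementedError  # each workload defines its own transactions


class SocialDbUser(BaseWorkloadUser):
    weight = 5  # 5M/day SocialDB, all on a single contract


class SweatUser(BaseWorkloadUser):
    weight = 5  # 5M/day Sweat: 90% simple calls, 10% record_batch


class OtherFtUser(BaseWorkloadUser):
    weight = 5  # 5M/day non-Sweat FTs, spread across multiple contracts


class AuroraUser(BaseWorkloadUser):
    weight = 3  # 3M/day aurora, single contract on its own shard


class NearcrowdUser(BaseWorkloadUser):
    weight = 2  # 2M/day nearcrowd, multiple contracts
```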

cc @akhi3030 @bowenwang1996 please let me know if you have concerns regarding my proposed benchmark workload

@akhi3030 (Collaborator) commented:

This seems like a good approach to me.

@jakmeier (Contributor, Author) commented:

Currently the plan is to:

  • Integrate the SWEAT workload (only a couple more PRs, code is mostly ready)
  • Not integrate aurora / nearcrowd for now. If we want to extend the benchmark in the future, this could be done, but it's currently not the highest priority.

@jakmeier (Contributor, Author) commented:

I have been running with Sweatcoin batches for a while now and I managed to make it work. However, there is one remaining annoying issue: record_batch can only be called by a registered oracle account, and registering an oracle can only be done by the owner of the contract account.

Currently I create one oracle for each worker. The problem then is that all users of the worker share one oracle, which results in nonce conflicts. Sometimes this just means a simple retry and things work fine after that. But the most annoying case is when the RPC node accepts the transaction, since the nonce is still valid at that time, but the validator filters it out of its tx pool. Then the user ends up waiting forever (the cutoff is currently at 30 min) and never sees a useful error.

I think I can resolve it by using multiple keys per oracle but I'm not sure if I'm willing to spend the extra effort on this just yet.
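For the record, a minimal sketch of the "multiple keys per oracle" idea (the class is hypothetical, not part of the current locust code): NEAR tracks nonces per (account_id, public_key) pair, so if every simulated user signs record_batch with its own full-access key on the shared oracle account, the nonce conflicts described above go away.

```python
# Hypothetical sketch: hand every user on a worker its own dedicated key for the
# shared oracle account. The keys would be added to the oracle account via
# AddKey actions at setup time (provisioning is outside this sketch).


class OracleKeyPool:
    def __init__(self, keys):
        self._available = list(keys)

    def acquire(self):
        if not self._available:
            raise RuntimeError("more users than provisioned oracle keys")
        # Each user gets a dedicated key, so no two users share a nonce
        # sequence and the RPC/tx-pool nonce races cannot occur.
        return self._available.pop()
```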
