Skip to content
This repository has been archived by the owner on Apr 4, 2024. It is now read-only.

Problem: contract call which does lots of message calls is slow to execute #710

Closed
yihuang opened this issue Nov 1, 2021 · 8 comments · Fixed by #729
Closed

Problem: contract call which does lots of message calls is slow to execute #710

yihuang opened this issue Nov 1, 2021 · 8 comments · Fixed by #729

Comments

@yihuang
Copy link
Contributor

yihuang commented Nov 1, 2021

System info: ethermint main

Test Contract

pragma solidity ^0.6.6;

contract Inner {
	event TestEvent(uint256);
    function test() public returns (uint256) {
        emit TestEvent(42);
        return 42;
    }
}

// An contract that do lots of message calls
contract TestMessageCall {
    Inner _inner;
	constructor() public {
        _inner = new Inner();
	}

    function test(uint iterations) public returns (uint256) {
        uint256 n = 0;
        for (uint i=0; i < iterations; i++) {
            n += _inner.test();
        }
        return n;
    }
}

EstimateGas Slow

When call eth_estimateGas for the test(40000) contract method call, the web3 HTTP client timeout, the node log shows it takes 17 seconds to finish.
A single eth_call can be finished in 584ms, and the binary search in estimateGas did ceil(log2(25000000)) = 25 number of calls, it sums up to 25 * 584ms = 14 seconds, close to what we see in the eth_estimateGas call.

Solution:

  • improve the execution speed of each call
  • lower the internal gas cap (so out of gas tx will fail earlier, and estimateGas will do less iterations in the worst case).

Tx Execution Slow

Further investigation shows that the actual execution of the tx is even slower, a single run of the tx takes 30 seconds, the profile shows the hotspot is in the commit of a deep context stack.

Solution:

  • might have to refactor StateDB to use journal log approach like go-ethereum to implement the snapshot/revert.

Expected behavior: RPC API and transactions should return fast

Actual behavior: blocked for dozens of seconds

Additional info:

Compiled contract:
TestMessageCall.json.zip

@yihuang
Copy link
Contributor Author

yihuang commented Nov 1, 2021

I tried same script with geth and same gas limit, it's much faster, the request returns normally:

ValueError: {'code': -32000, 'message': 'gas required exceeds allowance (24926833)'}

@leejw51crypto
Copy link
Contributor

leejw51crypto commented Nov 1, 2021

how about logging the loop count which calls evm evaluation in binary searching?
time for each call
total call count

maybe too many calls?

starting gas allowance maybe too high?
possible solution is lowering default gas allowances?

@yihuang
Copy link
Contributor Author

yihuang commented Nov 1, 2021

how about logging the loop count which calls evm evaluation in binary searching? time for each call total call count

maybe too many calls?

it's the worst case of binary search, so it should be around log2(25000000) = 24.57

starting gas allowance maybe too high? possible solution is lowering default gas allowances?

yeah, possibly, or make the performance closer to go-ethereum.

@yihuang
Copy link
Contributor Author

yihuang commented Nov 1, 2021

I did a single eth_call, the logged execution time is 584.998496ms, and 584.998496 * 25 = 14624.9624ms, close to the 17s of estimateGas.

@leejw51crypto
Copy link
Contributor

leejw51crypto commented Nov 1, 2021

25 times of search count is not bad,
can we reduce evm execution time ?
584 ms == 0.5 seconds

  1. possibly , too many logs from evm calls?
  2. some cost for each state commit? (in running evm)

@yihuang
Copy link
Contributor Author

yihuang commented Nov 2, 2021

Screen Shot 2021-11-02 at 12 54 14 PM

I did a profile, most of the time are spent on the commit of context stack.

I think we have an O(N*M) (N: depth of context stack, M: number of written items) complexity here, because, for an item set at the top of the stack, when commit, it's not committed directly to initial context, but committed to every context in the stack one by one.

@yihuang
Copy link
Contributor Author

yihuang commented Nov 2, 2021

The benchmark is added here: https://github.com/yihuang/ethermint/tree/bench-tx
Profile: go test -v ./x/evm/keeper -v -run="^$" -bench="BenchmarkMessageCall" -cpuprofile profile.out
Benchmark result on my mac:

goos: darwin
goarch: amd64
pkg: github.com/tharsis/ethermint/x/evm/keeper
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkMessageCall
BenchmarkMessageCall-16    	       1	30278823164 ns/op

@yihuang yihuang changed the title Problem: An estimateGas request is very slow to execute Problem: contract with lots of message call is slow Nov 2, 2021
@yihuang yihuang changed the title Problem: contract with lots of message call is slow Problem: contract call which does lots of message calls is slow to execute Nov 2, 2021
@yihuang yihuang mentioned this issue Nov 2, 2021
11 tasks
odeke-em added a commit to cosmos/cosmos-sdk that referenced this issue Nov 8, 2021
…st (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710
blewater pushed a commit to e-money/cosmos-sdk that referenced this issue Dec 8, 2021
…st (cosmos#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes cosmos#10487
Updates evmos/ethermint#710
@github-actions
Copy link

This issue is stale because it has been open 45 days with no activity. Remove Status: Stale label or comment or this will be closed in 7 days.

mergify bot pushed a commit to cosmos/cosmos-sdk that referenced this issue Dec 28, 2021
…st (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md
amaury1093 pushed a commit to cosmos/cosmos-sdk that referenced this issue Jan 3, 2022
…st, avoid keys sort (backport #10486) (#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
daeMOn63 pushed a commit to fetchai/cosmos-sdk that referenced this issue Mar 1, 2022
…st, avoid keys sort (backport #10486) (#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
JimLarson pushed a commit to agoric-labs/cosmos-sdk that referenced this issue Jul 7, 2022
…st, avoid keys sort (backport cosmos#10486) (cosmos#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (cosmos#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes cosmos#10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
Eengineer1 pushed a commit to cheqd/cosmos-sdk that referenced this issue Aug 26, 2022
…st, avoid keys sort (backport cosmos#10486) (cosmos#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (cosmos#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per evmos/ethermint#710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes cosmos#10487
Updates evmos/ethermint#710

(cherry picked from commit 5399e72)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants