Conversation
Force-pushed f8ca4e2 to 04f6520 — "Added cache layer. Remove redundant calls." (intermediate: e55ac11)
cmd/bee/cmd/deploy.go (Outdated)
    blocktime,
    true,
    c.config.GetUint64(optionNameMinimumGasTipCap),
    blocktime-2,
Just to note that the current block time on our CI is 1s, so this way caching is not used there... How could we achieve using it there?
In deploy.go, blockTime is hardcoded by itself. But if we are talking about CI: normally cacheTTL is about 85% of the block time, so even if it is 1 second, we would get 850ms.
This is very strange, as the block time is defined as const blocktime = 15 on line 16, which, used as a time.Duration, is actually 15 nanoseconds. As that is passed as pollingInterval, it looks to me like a very short time; I suppose the block time should be in minutes, not nanoseconds. Luckily, this is only for the deploy command; Bee has its own block time passed from options. I believe that line 16 in this file should read const blocktime = 15 * time.Minute, which is not related to this PR.
Also, what happens if the blocktime value is 1 (if someone changes the constant) and we end up with a negative duration? And if the constant is changed to a much longer value like 15*time.Minute, the literal 2 here would subtract only two nanoseconds from that large number.
    t.Helper()

-   var expSegments [][]byte
+   expSegments := make([][]byte, 0, len(exp))
These are fine but feel like they belong in a separate PR.
Yes, but the linter was failing so I decided to fix everything. Next time I will create a separate PR.
pkg/node/chain.go (Outdated)
    pollingInterval time.Duration,
    chainEnabled bool,
    minimumGasTipCap uint64,
    blockCacheTTl time.Duration,
Typo blockCacheTTl -> blockCacheTTL
    }
    }

    func NewMetrics() Metrics {
Seems not used (dead code).
    c.metrics.LoadErrors.Inc()
    return val, err
    }
    c.Set(val, time.Now())
Should we use the now from the argument here?
What could happen if the loader takes a while?
I do not think so, because there is a difference between the time when we request the value and the time when we set the new value if we had to load it.
Suggestion: Instead of a fixed TTL ( How it works:
    @@ -0,0 +1,107 @@
    // Copyright 2025 The Swarm Authors. All rights reserved.
    @@ -0,0 +1,189 @@
    // Copyright 2025 The Swarm Authors. All rights reserved.
    @@ -0,0 +1,10 @@
    // Copyright 2025 The Swarm Authors. All rights reserved.
    @@ -0,0 +1,84 @@
    // Copyright 2025 The Swarm Authors. All rights reserved.
    expiresAt time.Time

    group singleflight.Group
    key Key
The key here is only used in metrics and as the key in c.group.Do, but that key is always the same, since the singleflight group and the key are fixed for a given instance of the ExpiringSingleFlightCache type. I would suggest removing the Key type and just providing the metrics prefix string in the constructor NewExpiringSingleFlightCache.
    }(i)
    }

    time.Sleep(50 * time.Millisecond)
Could synctest be used here to avoid sleeping and a potentially flaky test, in case synchronization does not happen within 50ms?
    c.metrics.Misses.Inc()

    result, err, shared := c.group.Do(string(c.key), func() (any, error) {
This group always calls with the same key, so it could even be a static string.
pkg/transaction/wrapped/wrapped.go (Outdated)
    b.metrics.TotalRPCCalls.Inc()
    b.metrics.BlockNumberCalls.Inc()

    blockNumber, err := b.backend.BlockNumber(ctx)
The problem with golang.org/x/sync/singleflight is that it is not context.Context aware. The context passed by the first caller influences the end result for all other callers of this function. Meaning, if the first caller cancels or times out on its context, all other callers will receive that error even if they did not cancel and their own timeouts did not fire. This is why I've created resenje.org/singleflight, which is context aware and will not terminate the execution of the function until all callers terminate their contexts. There are a few places where resenje.org/singleflight is used in bee; if you like, you can look at it and consider using it if you see it as important for this case. Basically, GetOrLoad would need to accept a context, pass it to singleflight.Do, and the context from the callback would be used in b.backend.BlockNumber.
    c.value = value
    c.valid = true
    c.expiresAt = now.Add(c.ttl)
Ideally, for block number caching, the expiration time should be the time when the next block is expected, which is around the block time interval. This keeps the window in which the cache reports an older block number, compared to the current one, as small as possible.
In this generalized caching implementation, it is hard to generalize such a requirement, as it is very specific to how the block number increases (in some ~period, which is configurable). So the ideal approach would be to calculate the time of the next block and set expiresAt to that time, or a bit later, and to watch whether the block number has actually increased.
In the current implementation, the worst case is that the cache keeps the old block number for up to the full block time (15 minutes, for example, on some networks) after the actual block number has increased, if the value is set in the cache a moment before the new block. On average, statistically over a very large sample, that time would be half of the block time (7.5 minutes).
I am not sure such precision is needed, or whether a delay of up to a whole block time for the new block number is acceptable. This is just my observation.
I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s, hardly enough to span multiple independent caller cycles. If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here). The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping. @janos wdyt?
Yes, given that block numbers change very frequently, every 5 to 12s depending on the network (gnosis and sepolia), but not in exactly the same period every time. It is a very good suggestion to calculate the block number as you described, to reduce the frequency of the RPC call for the block number even further, and to estimate the block time via HeaderByNumber. In that case even singleflight is not needed, as internally, block numbers will always be returned from their specific cache. I would even go further and use the block time value calculated from HeaderByNumber instead of specifying it statically via options. The consequence could be that it becomes necessary to get the block number at node startup, as both the block number and the block time are needed as known values. Maybe that is even good to do, and to exit the application in case there are problems with the RPC endpoint when getting the block number on start. Just thinking.
To me this sounds a bit like overengineering on one side, and a less flexible implementation on the other (maybe I'm missing something?). I also thought about what happens if we would like to cache another type of value. Then
I have a couple of concerns:
I think there can be other corner cases which would be more difficult to catch, since debugging will become more complex.
Checklist
Description
Removed redundant RPC call patterns in the transaction flow. Added a cache layer for the BlockNumber RPC call.
Fixed lint issues in the new cache package and cleaned up minor lint findings in adjacent test/helper code.
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
#5388
Screenshots (if appropriate):
Highly likely the hit ratio is low because of the high rate of load errors (EOF), which is fixed by PR