Skip to content

use metrics.RPCClientMetrics in multinode#97

Merged
guandali merged 9 commits intomainfrom
lli/integrate-rpc-metric
Apr 8, 2026
Merged

use metrics.RPCClientMetrics in multinode#97
guandali merged 9 commits intomainfrom
lli/integrate-rpc-metric

Conversation

@guandali
Copy link
Copy Markdown
Contributor

@guandali guandali commented Mar 26, 2026

Description

This PR integrates RPCClientMetrics into multinode so RPC head polling calls emit observability data directly from RPCClientBase. It records latency for LatestBlock and LatestFinalizedBlock with labels for chainFamily, chainID, rpcUrl, isSendOnly, success, and rpcCallName, allowing error rate to be derived from rpc_call_latency_count{success="false"} instead of maintaining a separate error counter.

It also adds test coverage for successful requests, failed RPC calls, and invalid-head responses

Instrument RPC client with Beholder metrics for latency and error rates (v1)

Requires Dependencies

@guandali guandali requested a review from a team as a code owner March 26, 2026 23:59
@github-actions
Copy link
Copy Markdown

👋 guandali, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

Copy link
Copy Markdown
Contributor

@dhaidashenko dhaidashenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to inject it on the framework level?
All the latency metrics are captured on rpc client level here.

Is there a reason to use a different approach?

vlfig
vlfig previously requested changes Mar 27, 2026
Copy link
Copy Markdown
Contributor

@vlfig vlfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can improve the ergonomics for callers.

Comment thread multinode/rpc_client_base.go Outdated
FinalizedBlockPollInterval() time.Duration
}

type RPCClientBaseMetricsConfig struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense IMO to create a config that contains a RPCClientMetrics which the caller needs to construct with a RPCClientMetricsConfig that itself contains the chainFamily and ID.

I think it is a worthy change to have the caller call an interface instead of the metric directly so we can take care of things like moving from promauto to beholder and other details, but we should have the rpc_client care about it the least possible amount, and that'd be simply providing only the necessary config once.

So, if the URL and SendOnly are constant throughout the lifetime of the client, they can be part of RPCClientMetricsConfig, and that's it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

Comment thread multinode/rpc_client_base.go Outdated
@guandali
Copy link
Copy Markdown
Contributor Author

guandali commented Apr 1, 2026

Why do we need to inject it on the framework level? All the latency metrics are captured on rpc client level here.

Is there a reason to use a different approach?

Hey @dhaidashenko yes, there is already latency instrumentation at the client layer. The client-level latency metrics capture raw RPC method timings, but they do not capture the multinode semantic operation (latest_block / latest_finalized_block) or post-call validation failures like invalid heads. We inject metrics at the framework level which gives us a consistent, chain-agnostic selection metric that reflects whether an RPC can return a valid, usable head for multinode, rather than only whether some underlying RPC method responded quickly.

@dhaidashenko
Copy link
Copy Markdown
Contributor

Why do we need to inject it on the framework level? All the latency metrics are captured on rpc client level here.
Is there a reason to use a different approach?

Hey @dhaidashenko yes, there is already latency instrumentation at the client layer. The client-level latency metrics capture raw RPC method timings, but they do not capture the multinode semantic operation (latest_block / latest_finalized_block) or post-call validation failures like invalid heads. We inject metrics at the framework level which gives us a consistent, chain-agnostic selection metric that reflects whether an RPC can return a valid, usable head for multinode, rather than only whether some underlying RPC method responded quickly.

Makes sense. Will approve once Vasco's comments are resolved.
Is it possible for you to include enabling beholder metrics on the EVM RPC client level into the scope of this work? It'll greatly improve observability for the NOPs deployments.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 3, 2026

📊 API Diff Results

No changes detected for module github.com/smartcontractkit/chainlink-framework/metrics

View full report

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 3, 2026

⚠️ API Diff Results - github.com/smartcontractkit/chainlink-framework/multinode

⚠️ Breaking Changes (1)

./ (1)
  • NewRPCClientBase — Type changed:
func(
  RPCClientBaseConfig, 
  time.Duration, 
  github.com/smartcontractkit/chainlink-common/pkg/logger.Logger, 
  func(ctx context.Context) (HEAD, 
  error), 
  func(ctx context.Context) (HEAD, 
  error), 
  + string, 
  + bool, 
  + github.com/smartcontractkit/chainlink-framework/metrics.RPCClientMetrics
)
*RPCClientBase[HEAD]

📄 View full apidiff report

Comment thread multinode/rpc_client_base.go Outdated
return
}

m.rpcMetrics.RecordRequest(ctx, rpcMetricsDefaultURL, rpcMetricsDefaultIsSendOnly, callName, time.Since(startedAt), err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we always pass the rpcMetricsDefaultURL? This way we won't be able to differentiate the metrics for different rpc instances

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

@guandali guandali requested a review from dhaidashenko April 7, 2026 18:19
@guandali guandali enabled auto-merge (squash) April 8, 2026 16:19
@guandali guandali dismissed vlfig’s stale review April 8, 2026 16:19

Vasco is out

@guandali guandali merged commit b2dbdd9 into main Apr 8, 2026
26 checks passed
@guandali guandali deleted the lli/integrate-rpc-metric branch April 8, 2026 16:26
fernandezlautaro added a commit to smartcontractkit/chainlink-evm that referenced this pull request Apr 10, 2026
ilija42 pushed a commit to smartcontractkit/chainlink-evm that referenced this pull request Apr 10, 2026
* adding jovay's nonce too low regex

This reverts commit d5cad0f.

* bump chainlink-framework

* fixing API breaking change from smartcontractkit/chainlink-framework#97
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants