-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RFC for OpenTelemetry integration #42
Conversation
text/0000-opentelemetry.md
Outdated
The OpenTelemetry project is a CNCF project that aims to standardize observability, especially in a cloud-native environment. | ||
OpenTelemetry is working to create a standard around representing metrics, traces and logs. | ||
We aim to bring full observability of Hyperledger Fabric to allow tracing of transactions all the way from the client | ||
to each of the chaincode invocations. We also want to capture chaincode interactions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also want to capture chaincode interactions.
Why is this information helpful?
We want to be able to trace and correlate actions from chaincode events.
A chaincode event is simply an output of its execution. It is recorded on the blockchain. You can't trace it as it doesn't go to any node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information is helpful as it helps understand the path of execution of code - the equivalent of logging entering methods and exiting methods, but with a simple, lightweight object that can track metadata and tags for the purpose of powering operations.
Chaincode events are typically triggers for oracles that then spawn code execution. Using the trace information associated with the event allows to correlate the execution of arbitrary code back to the event that triggered it and the transaction that originated the event.
It seems to me the proposal is focused mainly on shims and SDKs. I'm not sure why tracing these two components is useful. An SDK is simply a software component that has a private key and can orchestrate transaction creation and notify the application about its success. A chaincode shim is nothing but a static process that receives as input a proposal, executes arbitrary logic and communicates with its peer (excluding the constellation where you deploy it as server which no one uses). If anything, the main thing that seems worthwhile to me to trace, is transaction identifiers. A transaction identifier flows from a client to peers and then back to the client to the ordering service, there it flows among nodes until it lands in a block and then it flows inside the peer in its stages of validation and commit, and tracing this end to end can be useful for analysis of bottlenecks and troubleshooting. I think that the proposal lacks examples of what you would want to trace, and also how this works with the fact that there is already a metric framework in Fabric that integrates with statsd and prometheus. |
The trace information is typically rich, with some trace providers arguing to replace logging and add as much metadata in the form of tags for traces. A transaction identifier would definitely be something to add to the trace. It would then be possible to search by transaction identifier for traces. The existing metrics framework of Hyperledger Fabric is not discussed here indeed - OpenTelemetry can collect Prometheus metrics, and we have been successful in collecting those metrics with the OpenTelemetry Collector or Fabric Logger. I have been looking at offering OpenTelemetry as a framework for metrics for Fabric as well, but I believe that would be a separate enhancement. OpenTelemetry supports exporting metrics in Prometheus format or using the OTLP protocol, so it would be a shoe-in for the existing metrics framework. In this enhancement, we propose that developers interested in shoring up metrics as part of their chaincode may do so, and allow for those metrics to be sent to a remote OTLP endpoint. This is a useful capability for chaincode developers. I will work on providing more samples. Thank you! |
It seems like this RFC is a good fit for some telemetry that was just optionally getting logged out in the Java chaincode implementation. Good to see that gRPC metadata works instead of making changes to the protobuf messages. |
I agree with Yacov that tracing a transaction end-to-end seems like the most valuable use case for Fabric with OpenTelemetry. I can see the need to instrument each of the SDKs for the end-to-end transaction view, but it seems it would be more efficient to instrument the peer and orderer for the transaction, rather than instrumenting each of the language-specific chaincode shims as is described in this RFC. Why implement in each chaincode language shim when it could be implemented once in the peer? |
Signed-off-by: Antoine Toulme <antoine@lunar-ocean.com>
…racing info Signed-off-by: Antoine Toulme <antoine@lunar-ocean.com>
224992d
to
146a4e4
Compare
I have amended the RFC based on the conversation on the maintainers call this morning. In particular, I have removed the notion of chaincode tracing. I will create a separate RFC specifically for this particular aspect, as well as adding trace information to chaincode events. |
…ng the maintainers call Signed-off-by: Antoine Toulme <antoine@lunar-ocean.com>
146a4e4
to
8706a98
Compare
As we continue the discussion and reviews of this RFC, may I request that a number be associated with this RFC and we may move it along the process? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm late to the party but I agree with most of the comments that have been made by the other maintainers.
I also want to point out that the current implementation uses gRPC streams to communicate between the peer and chaincode. This pattern complicates things as messages from many different transactions will share the same stream context. I also want to note that Fabric does not consistently propagate contexts from the transport layer so there is some work to do beyond wiring in the interceptors.
Signed-off-by: Antoine Toulme <antoine@lunar-ocean.com>
No, the streaming and context propagation might just be fine as the interceptor works at the level of a gRPC service and message exchange over the stream. Don't take my word for it - let's try it out! |
I'm trying to point to an area that you may want to spend a bit of time looking at. Obviously the PRs to Fabric will demonstrate if and how things work. |
No worries. I built an integration test that shows we can collect traces with the chaincode stream here: Maybe it can be enhanced to push a bit further, but chaincode is out of scope anyway. I guess I'll get started on a RFC just for chaincode soon. |
This request was approved a while back, should it be merged? What is the next step here? |
@denyeart - Can you have a look at this as well? I think we are almost ready for final review on this one |
Would it be possible to receive an update on this RFC please? |
@atoulme Sorry this got held up, I think we are close. The last thing that I think could be clarified in the RFC is some discussion of tracing versus metrics. This RFC mentions tracing the most, but also mentions metrics, so it is unclear if the focus is the former or both. Would it make sense as a first pass to focus on tracing only in this RFC? Or do tracing and metrics typically go hand-in-hand? Do you envision the metrics support complementing or replacing the existing Fabric metrics? Would you expect users to want to use both existing Fabric metrics and OpenTelemetry metrics or would that be overkill? Can the existing Prometheus service in the peer and orderer be leveraged for the OpenTelemetry metrics? I see you replied to Yacov with some metrics thoughts, but I think those thoughts either need to be expanded within the RFC text itself (as well as addressing my questions), or removed from the RFC to make it clear that the initial focus is on tracing. |
I am happy to slice and dice it however makes most sense, but OpenTelemetry provides support for metrics and traces. Therefore, if it is applied across the board as a standard, you should expect to see metric integration too. Since I introduced this RFC, OpenTelemetry also issued a standard for logs. |
@denyeart please let me know what you think? |
@atoulme I think we are very close... could you respond to each of my questions from May 11th and update the RFC accordingly? Thanks! |
Understood, working on it. Thank you for your patience. |
Signed-off-by: Antoine Toulme <antoine@lunar-ocean.com>
3990a1a
to
66e6bf3
Compare
I have amended the scope of this RFC. @denyeart, please see my answers inline:
Sorry about that. I dropped all reference to metrics.
You can just do tracing for now. That's plenty to get started.
The OpenTelemetry SDK is really great to use for metric collection. It also allows to push metrics through the current Prometheus set up with no changes - you just integrate OpenTelemetry with a Prometheus collector. You can combine sending metrics with OpenTelemetry gRPC push and Prometheus. The current Prometheus exports are fine and can be used for now. OpenTelemetry Collector can scrape Prometheus metrics. We don't need to do this at this time or in this RFC.
I have removed metrics from the RFC so the scope is tighter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atoulme thanks for addressing the comments. Given three approvals we can enter final comment period. I'll announce on the Fabric mailing list.
this has three approvals and it's been a week - should this be merged? @denyeart |
@ryjones Yes, this one is ready to merge! |
Signed-off-by: Antoine Toulme antoine@lunar-ocean.com
This RFC offers a path to integrate OpenTelemetry, and specifically adding tracing
and metricsreporting capabilities, to client SDKs and chaincode shim frameworks. This RFC relies and references on the work done in a pull request made to the Java shim chaincode repository.Feedback and comments are welcome and encouraged. Thanks!