Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async/callback based interface between Span/Log Processor and Span/Log Exporter #1239

Closed
lalitb opened this issue Mar 1, 2022 · 28 comments
Closed
Assignees
Labels

Comments

@lalitb
Copy link
Member

lalitb commented Mar 1, 2022

As of now, the flow of spans from Span Processor to Exporter is synchronous, which means the processor has to wait for the upload/export status of the ongoing request before invoking a new request.

Application -> Span::OnEnd() -> SimpleProcessor::OnEnd -> Mutex Lock -> Exporter::Export()

The application has to wait for the span to be uploaded to continue further with business logic.

The Batch processor simplifies this by batching/caching the spans in the processor and uploading them in a separate thread.

Application -> Span::OnEnd() -> BatchProcessor::OnEnd() -> CircularBuffer:Add(span)
            -> Span::OnEnd() -> BatchProcessor::OnEnd() -> CircularBuffer:Add(span)
            -> Span::OnEnd() -> BatchProcessor::OnEnd() -> CircularBuffer:Add(span)   
                                        ProcessorThread-> CircularBuffer::Drain() -> Export::Export()
Application -> Span::OnEnd() -> BatchProcessor::OnEnd() -> CircularBuffer:Add(span)  

This allows the Application to continue processing business logic without waiting for spans to export, and also Span::OnEnd() will not block even when export is ongoing in a separate thread ( there is no lock due to the use of CircularBuffer to store Spans).

With BatchProcessor, while the interface between application and processor is fast (no wait), the interface between processor and exporter is still synchronous. As the processor needs to wait for the ongoing span(s) export to finish before invoking new export. This causes a bottleneck if the application is creating spans faster than the time taken by the exporter to export the spans, and eventually, the CircularBuffer gets full. Once the CircularBuffer gets full, Span::OnEnd() needs to wait/block on the buffer to be available for writing.

The current design for the interface between processor and exporter is driven by specification as below

  • Export() will never be called concurrently for the same exporter instance. Export() can be called again only after the current call returns.
  • Success - The batch has been successfully exported. For protocol exporters this typically means that the data is sent over the wire and delivered to the destination server.
  • Note: this result may be returned via an async mechanism or a callback, if that is idiomatic for the language implementation.

So, the solution could be to return the exporter::Export() result as a callback or async mechanism,

  • Async mechanism: C++ provide async mechanism using std::async/std::promise. It would be good to do more research if its is suitable for more complex scenarios, and whether all the compiler provide a stable implementation for it before deciding on it ( Good read - https://www.cppstories.com/2014/01/tasks-with-stdfuture-and-stdasync/ )
  • Callback mechanism: would be much simpler something like:
     bool Exporter:Export(const nostd::span<std::unique_ptr<Recordable>> &spans,
          nostd::function_ref<bool(ExportResult)> result_callback);

This allows the exporter to invoke the result_callback when the export status is available, and the processor to invoke the export without waiting for the ongoing export to complete.

Also, the interface change should be backward compatible, and shouldn't break the existing/external exporters. One option would be to have the processor decide which export mechanism to use based on the configurable parameter. The default should be as existing ( i.e, synchronous export).

@lalitb lalitb added area:exporter enhancement New feature or request good first issue Good for newcomers help wanted Good for taking. Extra help will be provided by maintainers and removed help wanted Good for taking. Extra help will be provided by maintainers labels Mar 1, 2022
@lalitb
Copy link
Member Author

lalitb commented Mar 1, 2022

To start with, this new interface can be quickly integrated with Zipkin, and OTLP HTTP exporter, as the existing ext::http::client support callback mechanism.

@owent
Copy link
Member

owent commented Mar 2, 2022

This is also for Log Processor and Log Exporter.

@lalitb lalitb changed the title Async/callback based interface between Span Processor and Span Exporter Async/callback based interface between Span/Log Processor and Span/Log Exporter Mar 2, 2022
@DebajitDas
Copy link
Member

Hi @lalitb, I would like to explore on this, obviously with your guidance if this is ok.

@lalitb
Copy link
Member Author

lalitb commented Mar 3, 2022

Thanks, @DebajitDas . Have assigned it to you now. Feel free to ask any questions.

@lalitb
Copy link
Member Author

lalitb commented Mar 3, 2022

@DebajitDas Also, you may want to look into a related PR #63, which was an attempt to add async support using libevent. The PR adds support for files i/o and timers and networking was to follow. The PR was never merged for various reasons.

JFYI, Feel free to decide on the best approach on adding the async support :)

@DebajitDas
Copy link
Member

Thanks @lalitb . I am looking at C++ async mechanism and see if that can be used to achieve asynchronous Export Call.
I looked at http::client callback mechanism, but that seems to be blocking call, even though it was implemented with async mechanism. Am I missing something?

@owent
Copy link
Member

owent commented Mar 4, 2022

I create another issue #1243 to implement async send for opentelemetry::ext::http::client::curl::HttpClient. But it's just for curl and http client.
To my understanding, there is no standard async mechanism in C++, select() can be used on all platforms we support but it has performence problem and it can not handle more than 1024 file descriptors(#1220 ) . Many libraries and system use libevent, libuv or [boost.asio] as their event poller. (libcurl support use both libevent and libuv or use internal select() as fallback.). hiredis support more event management libraries or frameworks.

Some framework implement their own event and timer management(e.g. redis). It need to implement different poller for different system, IOCP for Windows, epoll for linux(maybe io_uring for modern linux kernel), kqueue for BSD, evport for Solaris and etc.

I'm thinking to use curl_multi_poll as the async mechanism in opentelemetry::ext::http::client::curl::HttpClient. Of course, It's better if we have a our own event system and all modules can use the same async mechanism.

@lalitb
Copy link
Member Author

lalitb commented Mar 4, 2022

I looked at http::client callback mechanism, but that seems to be blocking call, even though it was implemented with async mechanism. Am I missing something?

@DebajitDas You are right. The abstract method http::client::Session::SendRequest takes a callback function as an argument, and it's curl implementation http::client::curl::Session::SendRequest is synchronous i.e, this method returns only after request is completed and callback function is called. This should probably change once @owent implements #1243. But we can take full advantage of this change only once the Exporter::export() also takes a callback as an argument. Also once we support multiple HTTP clients along with curl (#1145 and #1146), these clients may be supporting request handling through polling and it would be good if the SDK is already prepared to handle that.

To my understanding, there is no standard async mechanism in C++

@owent agree on this. Any async mechanism requires an event loop to run, and this is missing in C++. std::async potentially uses a separate thread to run the operation which is not a true async mechanism. That is the reason PR #63 to use libevent should be something we can think about. It would be also useful to use this for metrics implementation, where we need to poll for metrics from multiple instruments simultaneously. What are your thoughts on using libevent?

@owent
Copy link
Member

owent commented Mar 5, 2022

What are your thoughts on using libevent?

In my opnion, we should let the user decide which event loop to use. Should we also implement a adaptor just like hiredis? And we can use libevent by default.

@owent
Copy link
Member

owent commented Mar 7, 2022

I also use libwebsockets in our project, it alse support to use libuv , libevent, libev, libubox/uloop and etc. as event loop. I think libuv and libevent is widely used now.

@lalitb
Copy link
Member Author

lalitb commented Mar 7, 2022

In my opnion, we should let the user decide which event loop to use. Should we also implement a adaptor just like hiredis? And we can use libevent by default.

This is also somewhat what #63 is trying to achieve. Provides an abstract interface, with libevent implementation.

@DebajitDas
Copy link
Member

From the interface perspective of Processor ---> Exporter, callback mechanism could be achieved by the following:

  1. Using std::async and std::shared_future: On every call to Export, Exporter would use std::async (std::launch::async) to launch a separate thread and perform the export work there. And when the result is ready, invoke the callback.
    std::shared_future can be stored in queue and a separate worker thread would run in exporter to fetch the shared_future to wait on the task to complete.
    Caveat: Every call to export creates a new thread, hence there might be performance penalty and system thread limit might be reached.
    Also, according to this there is difference in implementation of GCC/LLVM and MSVC compilers. MSVC compiler uses threads from windows thread pool, therefore should carefully use thread_local variable as they might not be initialised.

  2. The other option would be to launch thread_pool in the exporter, where worker threads would pick up its work and perform export and would invoke the callback on export completion.
    And the number of threads could be kept configurable. In case libevent is used, I think one worker thread is sufficient.

I might be wrong in my understanding and not at par with the context. Please correct me if I am missing something.

@lalitb
Copy link
Member Author

lalitb commented Mar 7, 2022

I would prefer to offload most of the complexity of thread/concurrency management to Span/Log Processor and let the exporter logic be focused on the serialization and transport. This will also ease the work for exporter developers.

If we can scope this issue for only interface change, and think about concurrency separately, the rough changes for ElasticsearchLogExporter would be (not tested):

class ResponseHandler : public http_client::EventHandler
{
public:
  /**
   * Creates a response handler, that by default doesn't display to console
   */
  ResponseHandler(bool console_debug = false, 
        nostd::function_ref<bool(ExportResult)> result_callback, 
        std::shared_ptr<Session> &&session ) : 
	console_debug_{console_debug}, session_{std::move(session)}
  {
  }

  /**
   * Automatically called when the response is received, store the body into a string and notify any
   * threads blocked on this result
   */
  void OnResponse(http_client::Response &response) noexcept override
  {
	// Log error if response body has error
	session->FinishSession();
      result_callback(sdk::common::ExportResult::kSuccess);
	
  }

  void OnEvent(http_client::SessionState state, nostd::string_view reason) noexcept override
 {
    //Log reason / state
     session->FinishSession();
     result_callback(sdk::common::ExportResult::kFailure)
 }
};


void ElasticsearchLogExporter::Export(
    const nostd::span<std::unique_ptr<sdklogs::Recordable>> &records,
nostd::function_ref<bool(ExportResult)> result_callback) noexcept
{

  // Create a connection to the ElasticSearch instance
  auto session = http_client_->CreateSession(options_.host_ + std::to_string(options_.port_));
  auto request = session->CreateRequest();
  request->SetUri(options_.index_ + "/_bulk?pretty");
  request->SetMethod(http_client::Method::Post);
  request->AddHeader("Content-Type", "application/json");
  request->SetTimeoutMs(std::chrono::milliseconds(1000 * options_.response_timeout_));

   body_vec = serialise(records); 
  request->SetBody(body_vec);
  std::unique_ptr<ResponseHandler> handler(new ResponseHandler(options_.console_debug_, result_callback, session));
  session->SendRequest(*handler);

}

Again, would be good to have more thoughts before finalizing, and also can discuss in the community meeting.

@owent
Copy link
Member

owent commented Mar 7, 2022

I would prefer to offload most of the complexity of thread/concurrency management to Span/Log Processor and let the exporter logic be focused on the serialization and transport.

Agree. Can we also provide a internal event loop and let the exporters decide whether to use it or spawn a new thread to finish
IO?

@lalitb
Copy link
Member Author

lalitb commented Mar 7, 2022

Agree. Can we also provide a internal event loop and let the exporters decide whether to use it or spawn a new thread to finish
IO?

Yes, we definitely need to have an internal event loop. In the case of HTTP exporter, the decision of using the event loop for IO should be with ext::http::client implementation. So if the ext::http::curl want to use curl_multi_poll it should use the event loop if available. Do you think it would be beneficial for non-HTTP exporters?

@owent
Copy link
Member

owent commented Mar 7, 2022

Yes, we definitely need to have an internal event loop. In the case of HTTP exporter, the decision of using the event loop for IO should be with ext::http::client implementation. So if the ext::http::curl want to use curl_multi_poll it should use the event loop if available. Do you think it would be beneficial for non-HTTP exporters?

Yes. And if other exporters use any SDK which allow users to get the raw socket fd. We can also use our internal event loop to handle it.

@DebajitDas
Copy link
Member

DebajitDas commented Mar 7, 2022

std::unique_ptr<ResponseHandler>

@lalitb Thanks for clarifying the scope of this issue. With the above, we need to maintain such that RequestHandler object remains valid even when callback is called in a separate thread.

I have these follow up questions now:

  1. Should we maintain two different export versions - both with and without callback?
  2. Processor decides which version to call depending on configurable parameters.

@lalitb
Copy link
Member Author

lalitb commented Mar 7, 2022

With the above, we need to maintain such that RequestHandler object remains valid even when callback is called in a separate thread.

It's a good point. Probably we need to pass the ownership of ResultHandler from exporter to HttpClient, and let it clean up after calling. This would need changes in HTTP client library

Should we maintain two different export versions - both with and without callback?

Yes, we need to do that for backward compatibility. Though maintaining ABI compatibility would be tricky, as any external exporter (built against the old SDK) would no longer work with the new SDK, and would need recompilation.

Processor decides which version to call depending on configurable parameters.

True.

@DebajitDas
Copy link
Member

Thanks @lalitb for clarifying the doubts.
Now from the implementation perspective, we need to provide implementation for those exporters which uses http_client_curl internally, such as elastic_log_exporter, otlp_http_exporter, otlp_http_log_exporter and zipkin exporter. All other log/span exporter would have empty implementation in the callback version of export call?

@owent
Copy link
Member

owent commented Mar 8, 2022

After the new Export interface is done. I can modify #1209 to make otlp_http_exporter and otlp_http_log_exporter to use the new callback version Export call then.

@lalitb
Copy link
Member Author

lalitb commented Mar 8, 2022

Now from the implementation perspective, we need to provide implementation for those exporters which uses http_client_curl internally, such as elastic_log_exporter, otlp_http_exporter, otlp_http_log_exporter and zipkin exporter. All other log/span exporter would have empty implementation in the callback version of export call?

If it is possible to start with one of the exporters say Zipkin that would be good. Zipkin uses HTTP sync client as of now, so good use-case to convert that to use HTTP Async client, and see how it works. But if we are modifying the HTTPClient library ( say to transfer ownership of EventHandler, and manage callbacks), this may require changes in other exporters.

@lalitb
Copy link
Member Author

lalitb commented Mar 8, 2022

Agree. Can we also provide a internal event loop and let the exporters decide whether to use it or spawn a new thread to finish

Have created #1250 for supporting the event loop framework. Feel free to modify the issue, or add more data there.

@DebajitDas
Copy link
Member

As pointed out in [https://github.com/open-telemetry/opentelemetry-specification/issues/2434#issuecomment-1076997385]
I wanted to have the discussion around "max_async_export" which would be configurable parameter.

  1. What should be the default value of this variable?
  2. What should be the behaviour if the value is set as Zero (0) ? Since this holds no meaning, should this is re-set to default.
  3. What should be the max value allowed for this parameter?

@owent
Copy link
Member

owent commented Mar 30, 2022

As pointed out in [https://github.com/open-telemetry/opentelemetry-specification/issues/2434#issuecomment-1076997385] I wanted to have the discussion around "max_async_export" which would be configurable parameter.

  1. What should be the default value of this variable?
  2. What should be the behaviour if the value is set as Zero (0) ? Since this holds no meaning, should this is re-set to default.
  3. What should be the max value allowed for this parameter?

I use the variable name max_concurrent_requests which is mentioned in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlpgrpc-concurrent-requests in OtlpHttpExporter and OtlpHttpLogExporter now. But I didn't find the specification about the default value. Current default value ( 8 ) in OtlpHttpExporter and OtlpHttpLogExporter is just based on my local benchmark and may cost about 70% CPU time at most without droping data.It may take less cost after we use multi_handle to implement curl::HttpClient in the future.

What's your suggestion about the default value?

@DebajitDas
Copy link
Member

What's your suggestion about the default value?

Yes, I had that in mind about the default value used in max_concurrent_requests and to start with we can go with 8 as default value.

@github-actions
Copy link

This issue was marked as stale due to lack of activity. It will be closed in 7 days if no furthur activity occurs.

@owent
Copy link
Member

owent commented Jul 9, 2022

According to the latest https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md , Export() will never be called concurrently.
Could this issue be closed as #1413 is already merged?

@lalitb
Copy link
Member Author

lalitb commented Jul 11, 2022

#1413 fixes this.

@lalitb lalitb closed this as completed Jul 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants