Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

EMQX-10534 integrate otel log handler #11921

Conversation

SergeTupchiy
Copy link
Contributor

@SergeTupchiy SergeTupchiy commented Nov 9, 2023

Closes: https://emqx.atlassian.net/browse/EMQX-10534

Documentation follow-up ticket: https://emqx.atlassian.net/browse/EMQX-11385
Tests follow-up ticket: https://emqx.atlassian.net/browse/EMQX-11432

Summary

馃 Generated by Copilot at 0cfb424

This pull request adds support for external log handlers using the OpenTelemetry protocol in EMQ X. It updates the dependencies, the configuration schema, and the application version in mix.exs and emqx_opentelemetry.app.src. It adds a new behaviour emqx_external_log_handler and a new module emqx_otel_log_handler that implements it. It modifies emqx_config_logger to enable the external log handlers. It also adds a new schema otel_log_handler and otel_exporter to define the configuration options for the OpenTelemetry log handler and exporter.

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

  • Added tests for the changes
  • Added property-based tests for code which performs user input validation
  • Changed lines covered in coverage report
  • Change log has been added to changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.md files
  • For internal contributor: there is a jira ticket to track this change
  • Created PR to emqx-docs if documentation update is required, or link to a follow-up jira ticket
  • Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

  • If changed package build workflow, pass this action (manual trigger)
  • Change log has been added to changes/ dir for user-facing artifacts update

@SergeTupchiy SergeTupchiy changed the base branch from master to release-54 November 9, 2023 18:54
@SergeTupchiy SergeTupchiy changed the title Emqx 10534 integrate otel log handler EMQX-10534 integrate otel log handler Nov 9, 2023
@SergeTupchiy SergeTupchiy force-pushed the EMQX-10534-integrate-otel-log-handler branch 2 times, most recently from 8f33482 to 24a327a Compare November 13, 2023 15:28
lists:flatmap(fun(M) -> M:tr_handlers(Conf) end, tr_modules()).

tr_modules() ->
[emqx_otel_log_handler].
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should be better decoupled from the main emqx app, but I tried to keep it simple for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we install the handler at runtime? i.e. in emqx_opentelemetry application

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmstone, yes, but it would mean that before emqx_opentelemetry app is started, logs won't be sent to the external otel collector/backend. Unless we start emqx_opentelemetry earlier than any other app.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, current otel_log_handler implementation anyway doesn't allow to start a handler at early stage (when kernel logger starts). I think we have two options for now:

  • change otel_log_handler to start under kernel's logger_sup, start accepting log event writes to the table but postpone 'full init' before kernel is fully started.
    Advantages: the handler can work similarly to native log handlers with no log event losses.
    Drawbacks: - have to rely on some kernel internals (logger_sup),
    - more complicated log_handler design is required
  • keep otel_log_handler as is and accept the fact that it can't be initialized at early boot stage, meaning some logs will always be missing (probably up to the point when emqx_conf app initiates).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: my previous comment is not absolutely correct. There was actually a deadlock trying to start ssl app (opentelemetry_exporter dependency) in adding_handler/1 logger cb. I'm fixing that now in opentelemetry lib, so starting a handler under kernel logger_sup shouldn't be required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's so critical to have the boot logs sent to otel ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved everything to emqx_opentelemetry app

@SergeTupchiy SergeTupchiy force-pushed the EMQX-10534-integrate-otel-log-handler branch from 24a327a to 4b8e9ab Compare November 13, 2023 16:23
@SergeTupchiy SergeTupchiy marked this pull request as ready for review November 13, 2023 17:17
@SergeTupchiy SergeTupchiy requested review from lafirest and a team as code owners November 13, 2023 17:17
@@ -29,18 +29,18 @@ roots() -> ["opentelemetry"].

fields("opentelemetry") ->
[
{exporter,
{"exporter",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep using atoms for filed names

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why is it better to use atoms for these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed back to atoms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for missing this reply.

I think we should try to use atoms for field names everywhere so we are one step closer to avoid generating atoms.

@SergeTupchiy SergeTupchiy force-pushed the EMQX-10534-integrate-otel-log-handler branch from 4b8e9ab to f3d687c Compare November 17, 2023 14:31
@SergeTupchiy
Copy link
Contributor Author

NOTE:
Thanks to upgrade_raw_conf/1, the new feature is backward compatible with cluster.hocon/emqx.conf that define metrics config under opentelmetry root key.
However, it requires Dashboard UI update to conform with the new schema.

exporter.label: "Exporter"

max_queue_size.desc:
"""The maximum queue size. After the size is reached Open Telemetry signals are dropped."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: What is the strategy for dropping the signals? Does it drop the new signals until the queue is drained, or discards the queued data as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It discards new signals. No need to discard the queued data, since olp mechanism generally ensures that the 'queue' stops growing once the limit is reached. The 'queue' is actually not a real queue data structure, but an ETS table.
We keep using 'max_queue_size' because it's a term used in OpenTelemetry specification and our ETS -based implementation doesn't violate the spec.
Each insert to ETS table is done after decrementing and getting the value of the atomic counter. Once counter is decremented to 0 (it starts with max_queue_size), the new events are drooped and the processor (gen_statem process) is notified to start a next exporting cycle.
The actual implementation is slightly more complex since there are two ETS tables and they are reversed periodically, so that when the processor starts exporting, another table becomes available for inserts.
https://github.com/emqx/opentelemetry-erlang/blob/main/apps/opentelemetry/src/otel_batch_olp.erl#L178

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Perhaps it's worth mentioning in the documentation?

rel/i18n/emqx_otel_schema.hocon Outdated Show resolved Hide resolved
apps/emqx_opentelemetry/src/emqx_otel_schema.erl Outdated Show resolved Hide resolved
?HOCON(
?R_REF(emqx_schema, "ssl_client_opts"),
#{
default => #{<<"enable">> => false},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems ssl enable/disable is derived from the endpoint url scheme.
maybe create a new type here and call emqx_schema:common_ssl_opts_schema (export it) for the fields.

i.e. no need for enable field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed slightly differently, following similar cases in other schemas, e.g.: https://github.com/emqx/emqx/blob/master/apps/emqx_ldap/src/emqx_ldap.erl#L100

@SergeTupchiy SergeTupchiy force-pushed the EMQX-10534-integrate-otel-log-handler branch from f3d687c to d9f95cd Compare November 22, 2023 11:59
@SergeTupchiy SergeTupchiy merged commit 275cf05 into emqx:release-54 Nov 22, 2023
162 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants