Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(logs): make logger shutdown &self #1643

Merged
merged 11 commits into from
Apr 29, 2024

Conversation

TommyCpp
Copy link
Contributor

@TommyCpp TommyCpp commented Mar 26, 2024

This PR aim to address the issue brought up in #1625 (comment)

In summary we need to:

  • Ensure logger provider shut down will block creating new (functional) logger.
  • Ensure no log can be emitted from logger once the logger provider shutdown.

We discussed the solution of keeping Weak reference from LoggerProvider to Loggers but as I implemented this solution it seems to complicated. I revisited why we need mutable reference during LoggerProvider and found that LogProcessor shutdown doesn't have to take a mutable reference.

Changes

  • make LogProcessor shutdown taking &self instead of &mut self.

    • This decouple the shutdown from drop. If one LogProcessor is shared across multiple thread, any thread can call shutdown to stop the LogProcessor from emitting more logs. But this doesn't mean the LogProcessor will drop.
    • However, drop will call shutdown
    • This implementation stops the log emitting after shutdown in BatchLogProcessor. Emitting new logs after shutdown will result in a LogError saying the receiver on the worker task has already closed.
    • for SimpleLogProcessor we need a field to mark if the processor has been shutdown we also need to check it everytime before emitting the logs.
  • Add a field in LoggerProvider to mark if the logger provider has shutdown. If it has, return a noop logger

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

Copy link

codecov bot commented Mar 26, 2024

Codecov Report

Attention: Patch coverage is 85.96491% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 70.0%. Comparing base (f203b03) to head (4e2eacc).
Report is 15 commits behind head on main.

❗ Current head 4e2eacc differs from pull request most recent head 5bb6471. Consider uploading reports for the commit 5bb6471 to get more accurate results

Files Patch % Lines
opentelemetry-sdk/src/logs/log_emitter.rs 80.2% 16 Missing ⚠️
opentelemetry/src/logs/record.rs 0.0% 5 Missing ⚠️
opentelemetry/src/logs/noop.rs 0.0% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1643     +/-   ##
=======================================
+ Coverage   69.3%   70.0%   +0.7%     
=======================================
  Files        136     136             
  Lines      19637   20029    +392     
=======================================
+ Hits       13610   14028    +418     
+ Misses      6027    6001     -26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lalitb
Copy link
Member

lalitb commented Mar 27, 2024

This seems like a much more streamlined approach. In particular, the LoggerProvider (and so the LogProcessors) can be shut down without needing to wait for all loggers to be dropped.

[TODO] for SimpleLogProcessor we need a field to mark if the processor has been shutdown we also need to check it everytime before emitting the logs.

I believe it should be acceptable if the check is performed atomically. This isn't related to the current PR, but just to reiterate so I don't forget, this check would also need to be added in ReentrantLogProcessor for ETW and user_events exporter.

@lalitb
Copy link
Member

lalitb commented Mar 28, 2024

@TommyCpp Do you plan to make it ready for review? The changes look good to me.

@TommyCpp
Copy link
Contributor Author

Sorry busy week. Will get it done this weekend

@TommyCpp
Copy link
Contributor Author

this check would also need to be added in ReentrantLogProcessor for ETW and user_events exporter.

I think we should also make it clear that LogProcessor needs to make sure no new logs get processed after shutdown

@TommyCpp TommyCpp marked this pull request as ready for review April 1, 2024 00:46
@TommyCpp TommyCpp requested a review from a team as a code owner April 1, 2024 00:46
}

/// Attempts to shutdown this `LoggerProvider`, succeeding only when
/// all cloned `LoggerProvider` values have been dropped.
// todo: remove this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - If we are keeping try_shutdown() for backward compatibility, good to update the existing comments, as now it doesn't just attempt, but really shutdown the LoggerProvider.

opentelemetry-sdk/src/logs/log_processor.rs Show resolved Hide resolved
@TommyCpp TommyCpp force-pushed the shutdown_channel branch 2 times, most recently from 1b45663 to 52757f0 Compare April 7, 2024 21:27
@TommyCpp
Copy link
Contributor Author

TommyCpp commented Apr 7, 2024

Simple log processor took some hit as expected but it's acceptable IMHO

simple-log/no-context   time:   [109.01 ns 109.09 ns 109.20 ns]
                        change: [-2.4832% -2.3350% -2.1759%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
simple-log/with-context time:   [109.56 ns 109.62 ns 109.69 ns]
                        change: [+2.0309% +2.1353% +2.2391%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

simple-log-with-int/no-context
                        time:   [146.61 ns 146.71 ns 146.81 ns]
                        change: [-3.8572% -3.6060% -3.4182%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
simple-log-with-int/with-context
                        time:   [146.41 ns 146.55 ns 146.70 ns]
                        change: [+2.7780% +2.9471% +3.1086%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

simple-log-with-double/no-context
                        time:   [144.09 ns 144.30 ns 144.49 ns]
                        change: [-2.8585% -2.7094% -2.5562%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
simple-log-with-double/with-context
                        time:   [143.65 ns 143.75 ns 143.85 ns]
                        change: [-0.4290% -0.2448% -0.0662%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

simple-log-with-string/no-context
                        time:   [138.76 ns 138.87 ns 139.00 ns]
                        change: [-7.9410% -7.7945% -7.6528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild
simple-log-with-string/with-context
                        time:   [147.51 ns 147.61 ns 147.74 ns]
                        change: [+2.4718% +2.6915% +2.8987%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

simple-log-with-bool/no-context
                        time:   [144.49 ns 144.57 ns 144.64 ns]
                        change: [-1.3301% -1.2371% -1.1458%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
simple-log-with-bool/with-context
                        time:   [147.11 ns 147.26 ns 147.41 ns]
                        change: [+2.9351% +3.1008% +3.2648%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

simple-log-with-bytes/no-context
                        time:   [167.25 ns 167.39 ns 167.53 ns]
                        change: [+3.4868% +3.5855% +3.6920%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
simple-log-with-bytes/with-context
                        time:   [164.25 ns 164.52 ns 164.77 ns]
                        change: [-3.4532% -3.2357% -3.0189%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

simple-log-with-a-lot-of-bytes/no-context
                        time:   [158.78 ns 158.94 ns 159.09 ns]
                        change: [-6.1546% -5.9827% -5.7952%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
simple-log-with-a-lot-of-bytes/with-context
                        time:   [170.54 ns 171.53 ns 173.46 ns]
                        change: [+3.6391% +4.1090% +5.1313%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

simple-log-with-vec-any-value/no-context
                        time:   [203.13 ns 203.24 ns 203.36 ns]
                        change: [-6.0749% -5.9717% -5.8644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
simple-log-with-vec-any-value/with-context
                        time:   [211.36 ns 211.71 ns 212.08 ns]
                        change: [-1.1547% -0.9804% -0.8161%] (p = 0.00 < 0.05)
                        Change within noise threshold.

simple-log-with-inner-vec-any-value/no-context
                        time:   [276.62 ns 276.81 ns 277.00 ns]
                        change: [+0.8487% +0.9492% +1.0494%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
simple-log-with-inner-vec-any-value/with-context
                        time:   [274.47 ns 274.97 ns 275.65 ns]
                        change: [-1.1943% -0.8690% -0.5281%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

simple-log-with-map-any-value/no-context
                        time:   [233.37 ns 233.47 ns 233.59 ns]
                        change: [-2.3170% -2.1937% -2.0710%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
simple-log-with-map-any-value/with-context
                        time:   [233.39 ns 233.55 ns 233.71 ns]
                        change: [+0.4716% +0.7618% +1.0012%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

simple-log-with-inner-map-any-value/no-context
                        time:   [345.49 ns 345.65 ns 345.80 ns]
                        change: [+4.0620% +4.1413% +4.2130%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
simple-log-with-inner-map-any-value/with-context
                        time:   [348.78 ns 349.29 ns 350.15 ns]
                        change: [+5.3044% +5.5604% +5.8275%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

long-log/no-context     time:   [107.13 ns 107.18 ns 107.24 ns]
                        change: [+0.4446% +0.5142% +0.5924%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
long-log/with-context   time:   [109.07 ns 109.29 ns 109.69 ns]
                        change: [+1.3800% +1.5142% +1.6832%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

full-log/no-context     time:   [109.70 ns 109.79 ns 109.89 ns]
                        change: [+3.4539% +3.5922% +3.7297%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
full-log/with-context   time:   [111.41 ns 111.47 ns 111.55 ns]
                        change: [+0.1869% +0.2796% +0.3707%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

full-log-with-4-attributes/no-context
                        time:   [246.83 ns 246.95 ns 247.08 ns]
                        change: [-0.3164% -0.0559% +0.1822%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
full-log-with-4-attributes/with-context
                        time:   [249.19 ns 249.38 ns 249.59 ns]
                        change: [-4.3881% -4.1521% -3.9539%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

full-log-with-9-attributes/no-context
                        time:   [442.82 ns 443.27 ns 443.83 ns]
                        change: [-3.3624% -3.0116% -2.6846%] (p = 0.00 < 0.05)
                        Performance has improved.
full-log-with-9-attributes/with-context
                        time:   [454.05 ns 454.27 ns 454.51 ns]
                        change: [-1.8147% -1.7175% -1.6242%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

full-log-with-attributes/no-context
                        time:   [282.22 ns 282.36 ns 282.52 ns]
                        change: [-5.0554% -4.1746% -3.4991%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
full-log-with-attributes/with-context
                        time:   [292.55 ns 292.78 ns 293.09 ns]
                        change: [-2.2768% -1.9404% -1.6585%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@TommyCpp
Copy link
Contributor Author

@open-telemetry/rust-approvers can I get another review?

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not required for logging signal...
See https://github.com/open-telemetry/opentelemetry-rust/pull/1643/files#r1579840322

@@ -13,6 +13,7 @@
`ProcessResourceDetector` resource detectors, use the
[`opentelemetry-resource-detector`](https://crates.io/crates/opentelemetry-resource-detectors) instead.
- Baggage propagation error will be reported to global error handler [#1640](https://github.com/open-telemetry/opentelemetry-rust/pull/1640)
- Make `shutdown` method in `LoggerProvider` and `LogProcessor` taking immutable reference [#1643](https://github.com/open-telemetry/opentelemetry-rust/pull/1643)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets describe all the changes this PR is making:

  1. Loggers obtained after shutdown is now no-ops.
  2. Shutdown requires immutable ref for provider/processor.
  3. Simple and batch processors will ignore new logrecord after shutdown.

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have a suggestion to make changelog better to reflect the actuals.
Also please update the PR description to reflect the current state, for easy paper-trail in the future.

@cijothomas cijothomas requested a review from lalitb April 26, 2024 15:09
Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@cijothomas cijothomas merged commit 0ba4cbd into open-telemetry:main Apr 29, 2024
15 checks passed
@Expyron Expyron mentioned this pull request May 30, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants