Performance: Context holds SynchronizedSpan directly, not via HashMap #1268

Merged (1 commit) on Sep 21, 2023

Contributor

@shaun-cox commented Sep 13, 2023

Changes

Add the span field below:

pub struct Context {
    pub(super) span: Option<Arc<SynchronizedSpan>>,
    entries: HashMap<TypeId, Arc<dyn Any + Sync + Send>, BuildHasherDefault<IdHasher>>,
}

so that reading or replacing the current span no longer has to go through a comparatively expensive HashMap lookup and then be dispatched through virtual function calls.

Further, this keeps the HashMap empty for the common case of just having a span in context and no other types (like baggage).
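To illustrate the difference between the two lookup paths, here is a minimal, self-contained sketch. It is not the crate's actual code: `SynchronizedSpan` is reduced to a stand-in with a single field, the custom `IdHasher` is replaced by the default hasher, and `MapContext`, `DirectContext`, and `has_other_entries` are hypothetical names used only for this comparison.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the real `SynchronizedSpan` (assumption: reduced to one flag).
#[derive(Debug)]
pub struct SynchronizedSpan {
    pub sampled: bool,
}

/// "Before" layout (hypothetical name): the span is one more type-erased map entry.
#[derive(Default)]
pub struct MapContext {
    entries: HashMap<TypeId, Arc<dyn Any + Sync + Send>>,
}

impl MapContext {
    pub fn with_span(span: SynchronizedSpan) -> Self {
        let mut entries: HashMap<TypeId, Arc<dyn Any + Sync + Send>> = HashMap::new();
        entries.insert(TypeId::of::<SynchronizedSpan>(), Arc::new(span));
        MapContext { entries }
    }

    /// Hash the TypeId, probe the map, then downcast the `dyn Any`.
    pub fn span(&self) -> Option<&SynchronizedSpan> {
        self.entries
            .get(&TypeId::of::<SynchronizedSpan>())
            .and_then(|v| v.downcast_ref::<SynchronizedSpan>())
    }
}

/// "After" layout (hypothetical name): the span has its own typed field.
#[derive(Default)]
pub struct DirectContext {
    span: Option<Arc<SynchronizedSpan>>,
    entries: HashMap<TypeId, Arc<dyn Any + Sync + Send>>,
}

impl DirectContext {
    pub fn with_span(span: SynchronizedSpan) -> Self {
        DirectContext {
            span: Some(Arc::new(span)),
            entries: HashMap::new(),
        }
    }

    /// A plain field read: no hashing and no downcast.
    pub fn span(&self) -> Option<&SynchronizedSpan> {
        self.span.as_deref()
    }

    /// The map stays empty when the context carries only a span.
    pub fn has_other_entries(&self) -> bool {
        !self.entries.is_empty()
    }
}
```

Both layouts return the same `Option<&SynchronizedSpan>`; only the path taken to reach it differs, which is the cost the benchmarks below measure.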

I get that the OpenTelemetry specification is trying to be general with its view that Context is extensible and can hold anything, but there is a real performance penalty for handling everything through that abstraction.

Results of cargo bench -p opentelemetry_sdk --bench context:

context/has_active_span/in-cx/alt
                        time:   [3.5772 ns 3.5806 ns 3.5842 ns]
                        thrpt:  [279.01 Melem/s 279.28 Melem/s 279.55 Melem/s]
                 change:
                        time:   [-47.017% -46.850% -46.708%] (p = 0.00 < 0.05)
                        thrpt:  [+87.646% +88.145% +88.739%]
                        Performance has improved.
context/is_sampled/in-cx/alt
                        time:   [5.0980 ns 5.1075 ns 5.1170 ns]
                        thrpt:  [195.43 Melem/s 195.79 Melem/s 196.15 Melem/s]
                 change:
                        time:   [-41.439% -41.320% -41.208%] (p = 0.00 < 0.05)
                        thrpt:  [+70.091% +70.414% +70.762%]
                        Performance has improved.
context/is_recording/in-cx/alt
                        time:   [5.3604 ns 5.3951 ns 5.4407 ns]
                        thrpt:  [183.80 Melem/s 185.35 Melem/s 186.55 Melem/s]
                 change:
                        time:   [-41.657% -41.417% -41.156%] (p = 0.00 < 0.05)
                        thrpt:  [+69.940% +70.698% +71.399%]
                        Performance has improved.
context/has_active_span/in-cx/spec
                        time:   [12.958 ns 12.988 ns 13.022 ns]
                        thrpt:  [76.791 Melem/s 76.995 Melem/s 77.174 Melem/s]
                 change:
                        time:   [-63.146% -63.063% -62.980%] (p = 0.00 < 0.05)
                        thrpt:  [+170.12% +170.73% +171.34%]
                        Performance has improved.
context/is_sampled/in-cx/spec
                        time:   [12.826 ns 12.843 ns 12.859 ns]
                        thrpt:  [77.764 Melem/s 77.865 Melem/s 77.967 Melem/s]
                 change:
                        time:   [-64.729% -64.672% -64.614%] (p = 0.00 < 0.05)
                        thrpt:  [+182.60% +183.06% +183.52%]
                        Performance has improved.
context/is_recording/in-cx/spec
                        time:   [13.718 ns 13.738 ns 13.756 ns]
                        thrpt:  [72.693 Melem/s 72.791 Melem/s 72.895 Melem/s]
                 change:
                        time:   [-62.407% -62.337% -62.271%] (p = 0.00 < 0.05)
                        thrpt:  [+165.05% +165.51% +166.01%]
                        Performance has improved.

Results of taskset -c 2,4 cargo bench -p opentelemetry-contrib --features="api" --bench new_span:

new_span/if_parent_sampled/in-cx/alt
                        time:   [433.95 ns 434.37 ns 434.78 ns]
                        thrpt:  [2.3000 Melem/s 2.3022 Melem/s 2.3044 Melem/s]
                 change:
                        time:   [-14.018% -13.813% -13.610%] (p = 0.00 < 0.05)
                        thrpt:  [+15.754% +16.026% +16.303%]
                        Performance has improved.
new_span/if_recording/in-cx/alt
                        time:   [9.5467 ns 9.5592 ns 9.5732 ns]
                        thrpt:  [104.46 Melem/s 104.61 Melem/s 104.75 Melem/s]
                 change:
                        time:   [-35.364% -35.131% -34.908%] (p = 0.00 < 0.05)
                        thrpt:  [+53.629% +54.157% +54.712%]
                        Performance has improved.
new_span/if_parent_sampled/in-cx/spec
                        time:   [414.49 ns 415.72 ns 417.35 ns]
                        thrpt:  [2.3961 Melem/s 2.4055 Melem/s 2.4126 Melem/s]
                 change:
                        time:   [-28.958% -25.772% -22.597%] (p = 0.00 < 0.05)
                        thrpt:  [+29.194% +34.720% +40.762%]
                        Performance has improved.
new_span/if_recording/in-cx/spec
                        time:   [15.837 ns 15.855 ns 15.877 ns]
                        thrpt:  [62.986 Melem/s 63.071 Melem/s 63.144 Melem/s]
                 change:
                        time:   [-58.590% -58.485% -58.383%] (p = 0.00 < 0.05)
                        thrpt:  [+140.29% +140.88% +141.49%]
                        Performance has improved.
new_span/if_parent_sampled/no-cx/alt
                        time:   [8.9579 ns 8.9766 ns 8.9966 ns]
                        thrpt:  [111.15 Melem/s 111.40 Melem/s 111.63 Melem/s]
                 change:
                        time:   [-13.891% -13.651% -13.425%] (p = 0.00 < 0.05)
                        thrpt:  [+15.507% +15.809% +16.131%]
                        Performance has improved.
new_span/if_recording/no-cx/alt
                        time:   [9.3771 ns 9.3938 ns 9.4134 ns]
                        thrpt:  [106.23 Melem/s 106.45 Melem/s 106.64 Melem/s]
                 change:
                        time:   [-15.247% -14.984% -14.692%] (p = 0.00 < 0.05)
                        thrpt:  [+17.222% +17.625% +17.990%]
                        Performance has improved.
new_span/if_parent_sampled/no-cx/spec
                        time:   [183.31 ns 183.63 ns 183.97 ns]
                        thrpt:  [5.4358 Melem/s 5.4458 Melem/s 5.4553 Melem/s]
                 change:
                        time:   [-9.4343% -9.0523% -8.7467%] (p = 0.00 < 0.05)
                        thrpt:  [+9.5851% +9.9533% +10.417%]
                        Performance has improved.
new_span/if_recording/no-cx/spec
                        time:   [11.159 ns 11.173 ns 11.189 ns]
                        thrpt:  [89.371 Melem/s 89.499 Melem/s 89.613 Melem/s]
                 change:
                        time:   [+8.9201% +9.2451% +9.5566%] (p = 0.00 < 0.05)
                        thrpt:  [-8.7230% -8.4627% -8.1896%]
                        Performance has regressed.
new_span/if_parent_sampled/no-sdk/alt
                        time:   [9.0222 ns 9.0478 ns 9.0743 ns]
                        thrpt:  [110.20 Melem/s 110.52 Melem/s 110.84 Melem/s]
                 change:
                        time:   [-12.113% -11.890% -11.642%] (p = 0.00 < 0.05)
                        thrpt:  [+13.175% +13.495% +13.783%]
                        Performance has improved.
new_span/if_recording/no-sdk/alt
                        time:   [9.3513 ns 9.3604 ns 9.3705 ns]
                        thrpt:  [106.72 Melem/s 106.83 Melem/s 106.94 Melem/s]
                 change:
                        time:   [-14.874% -14.721% -14.550%] (p = 0.00 < 0.05)
                        thrpt:  [+17.028% +17.262% +17.473%]
                        Performance has improved.
new_span/if_parent_sampled/no-sdk/spec
                        time:   [91.612 ns 91.722 ns 91.841 ns]
                        thrpt:  [10.888 Melem/s 10.902 Melem/s 10.916 Melem/s]
                 change:
                        time:   [-19.732% -19.562% -19.389%] (p = 0.00 < 0.05)
                        thrpt:  [+24.052% +24.319% +24.583%]
                        Performance has improved.
new_span/if_recording/no-sdk/spec
                        time:   [11.179 ns 11.195 ns 11.214 ns]
                        thrpt:  [89.175 Melem/s 89.324 Melem/s 89.456 Melem/s]
                 change:
                        time:   [+8.9043% +9.0850% +9.2879%] (p = 0.00 < 0.05)
                        thrpt:  [-8.4985% -8.3284% -8.1762%]
                        Performance has regressed.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes

@jtescher
Member

Looks good, always nice to be able to do perf improvements without breaking api changes too 👍 what numbers are you seeing for the diff?

@cijothomas
Member

TraceContext, Baggage, and some to fix this, would be common use-case for context. If we can make these 3 top-level fields instead of putting in map/dictionary, that'd be awesome.

@shaun-cox
Contributor Author

> Looks good, always nice to be able to do perf improvements without breaking api changes too 👍 what numbers are you seeing for the diff?

Throughput is 70% to 170% higher when checking the context for an active span, or whether that span is sampled or recording.
Conditional span creation (creating a child span only if currently recording) takes up to 58% less time, which is a 140% speedup. Pasted results in the PR comment above.
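The "58% less time" and "140% speedup" figures describe the same measurement: throughput is the inverse of time per iteration, so a relative time change c corresponds to a relative throughput change of 1 / (1 + c) - 1. The following is a small sketch of that conversion; `throughput_change` is a hypothetical helper name, not part of criterion or OpenTelemetry.

```rust
/// Converts a relative change in time per iteration (e.g. -0.58 for "58% less time")
/// into the matching relative change in throughput.
/// Hypothetical helper for illustration only.
pub fn throughput_change(time_change: f64) -> f64 {
    // throughput = 1 / time, so new_throughput / old_throughput = 1 / (1 + time_change)
    1.0 / (1.0 + time_change) - 1.0
}
```

Applied to the criterion midpoints quoted above, a time change of -58.485% gives roughly +140.9% throughput, and a time change of +9.2451% (one of the regressed cases) gives roughly -8.46%.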

@shaun-cox changed the title from "Context holds SynchronizedSpan directly, not via HashMap" to "Performance: Context holds SynchronizedSpan directly, not via HashMap" Sep 13, 2023
@codecov

codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage is 100.0% of modified lines.

Files Changed Coverage
opentelemetry/src/trace/mod.rs ø
opentelemetry/src/context.rs 100.0%
opentelemetry/src/trace/context.rs 100.0%

Contributor

@djc left a comment

Note, this would be a bit easier to review if you squashed the changes for the span field into a single commit and had a separate commit that takes care of the Span imports.

What's the rationale for moving from per-module to per-crate imports? Is that converging on a common style or diverging from it?

Review thread on opentelemetry/src/context.rs (outdated, resolved)
@shaun-cox
Contributor Author

> What's the rationale for moving from per-module to per-crate imports? Is that converging on a common style or diverging from it?

I thought I was moving to the common style, as it's what's offered by auto-complete/rust-analyzer when adding new imports. I can revert if I didn't understand the current style, though...

@shaun-cox force-pushed the context_field_for_span branch 2 times, most recently from f6b3031 to e381565 September 14, 2023 12:41
@shaun-cox
Contributor Author

@djc, should be easier to review now. (Sorry, I thought I had squashed my two commits previously, but didn't.)

@shaun-cox requested a review from djc September 14, 2023 15:13
- lookup on TypeId and downcasting isn't necessary
- Context::with_value and current_with_value are more efficient
  as they no longer clone and overwrite the entry in the map
  which represents the current span
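A rough sketch of why the second point can follow from the new layout is shown below. This is an assumption about the shape of the code, not the crate's actual implementation: `SynchronizedSpan` and `Baggage` are stand-in types, the default hasher replaces `IdHasher`, and `with_span`, `get`, `span`, and `entry_count` are hypothetical helpers. With the span in its own field, deriving a new context with an extra value copies a map that never contains the span entry.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the real span wrapper (assumption).
pub struct SynchronizedSpan {
    pub name: &'static str,
}

/// Stand-in for some other value stored in the context (assumption).
pub struct Baggage(pub u32);

#[derive(Clone, Default)]
pub struct Context {
    span: Option<Arc<SynchronizedSpan>>,
    entries: HashMap<TypeId, Arc<dyn Any + Sync + Send>>,
}

impl Context {
    /// Returns a copy of this context with `value` added.
    /// The span is carried over by bumping an `Arc` refcount; the cloned map
    /// holds only non-span values, so it is empty in the span-only case.
    pub fn with_value<T: 'static + Send + Sync>(&self, value: T) -> Self {
        let mut next = self.clone();
        next.entries.insert(TypeId::of::<T>(), Arc::new(value));
        next
    }

    /// Returns a copy of this context with a new current span.
    /// No map entry is inserted or overwritten (hypothetical helper).
    pub fn with_span(&self, span: SynchronizedSpan) -> Self {
        Context {
            span: Some(Arc::new(span)),
            entries: self.entries.clone(),
        }
    }

    /// Generic lookup still goes through TypeId and a downcast.
    pub fn get<T: 'static>(&self) -> Option<&T> {
        self.entries
            .get(&TypeId::of::<T>())
            .and_then(|v| v.downcast_ref::<T>())
    }

    pub fn span(&self) -> Option<&SynchronizedSpan> {
        self.span.as_deref()
    }

    pub fn entry_count(&self) -> usize {
        self.entries.len()
    }
}
```

In this sketch, `Context::default().with_span(..)` leaves `entry_count()` at 0, and a later `with_value(Baggage(7))` produces a map with exactly one entry while the span is shared rather than re-inserted.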
@TommyCpp merged commit ab9972f into open-telemetry:main Sep 21, 2023
12 checks passed
@shaun-cox deleted the context_field_for_span branch September 21, 2023 11:35