New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jaeger-v2] Add remotesampling
extension
#5389
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5389 +/- ##
===========================================
- Coverage 94.54% 41.12% -53.42%
===========================================
Files 346 171 -175
Lines 16947 9091 -7856
===========================================
- Hits 16022 3739 -12283
- Misses 724 5018 +4294
- Partials 201 334 +133
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
I am thinking of this approach.:
That's how it works in the current Jaeger collector, but we are serving this API in the collector server. Here, we have to create a new one. suggestions |
One more question: How can I run hotrod with jaeger-v2 binary? |
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
We need a
We also need an
All of this should require very minimal implementation, primarily just wiring up the existing components already implemented in |
The existing component (from jaeger v1) are in blue: flowchart LR
Receiver --> AdaptiveSamplingProcessor --> BatchProcessor --> Exporter
Exporter -->|"(1) get storage"| JaegerStorageExension
Exporter -->|"(2) write trace"| TraceStorage
AdaptiveSamplingProcessor -->|"getStorage()"| StorageConfig
OTEL_SDK[OTEL
SDK]
OTEL_SDK -->|"(1) GET /sampling"| HTTP_endpoint
HTTP_endpoint -->|"(2) getStrategy()"| StrategiesProvider
style HTTP_endpoint fill:blue,color:white
subgraph Jaeger Collector
Receiver
BatchProcessor[Batch
Processor]
Exporter
TraceStorage[(Trace
Storage)]
AdaptiveSamplingProcessor[Adaptive
Sampling
Processor]
AdaptiveSamplingProcessorV1[Adaptive
Sampling
Processor_v1]
style AdaptiveSamplingProcessorV1 fill:blue,color:white
AdaptiveSamplingProcessor -->|"[]*model.Span"| AdaptiveSamplingProcessorV1
AdaptiveSamplingProcessorV1 ---|use| SamplingStorage
subgraph JaegerStorageExension[Jaeger Storage Exension]
Storage[[Storage
Config]]
end
subgraph RemoteSamplingExtension[Remote Sampling Extension]
StrategiesProvider -->|"(3b) getStrategy()"| AdaptiveProvider
StrategiesProvider -->|"(3a) getStrategy()"| FileProvider
FileProvider --> FileConfig
AdaptiveProvider --> StorageConfig
HTTP_endpoint[HTTP
endpoint]
StrategiesProvider[Strategies
Provider]
FileProvider[File
Provider]
AdaptiveProvider[Adaptive
Provider]
style StrategiesProvider fill:blue,color:white
style FileProvider fill:blue,color:white
style AdaptiveProvider fill:blue,color:white
subgraph Config
FileConfig[[File Config]]
StorageConfig[[Storage Config]]
end
StorageConfig --- SamplingStorage
SamplingStorage[(Sampling
Storage)]
style SamplingStorage fill:blue,color:white
end
end
|
remotesampling
extension
…acing#5382) This PR started as a dependabot upgrade of OTEL Collector dependencies, but the [tests were failing](https://github.com/jaegertracing/jaeger/actions/runs/8854720626/job/24318352528?pr=5382). The main change here is adding a ping from `e2eInitialize` to the jaeger-v2 binary to make sure that it started before proceeding with tests (ideally OTEL Collector should have a healthcheck endpoint that signals when all components have been started successfully). Also added some better logging. --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Yuri Shkuro <github@ysh.us> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Yuri Shkuro <github@ysh.us>
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.15.10 to 0.15.11. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/anchore/sbom-action/releases">anchore/sbom-action's releases</a>.</em></p> <blockquote> <h2>v0.15.11</h2> <h2>Changes in v0.15.11</h2> <ul> <li>chore(deps): update Syft to v1.3.0 (<a href="https://redirect.github.com/anchore/sbom-action/issues/456">#456</a>) [<a href="https://github.com/anchore-actions-token-generator">anchore-actions-token-generator</a>]</li> <li>chore: remove outdated snapshot workflow (<a href="https://redirect.github.com/anchore/sbom-action/issues/457">#457</a>) [<a href="https://github.com/spiffcs">spiffcs</a>]</li> <li>fix: don't pass in a separate env. This makes it impossible to pass env vars via the action context to syft. (<a href="https://redirect.github.com/anchore/sbom-action/issues/455">#455</a>) [<a href="https://github.com/iNoahNothing">iNoahNothing</a>]</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/anchore/sbom-action/commit/7ccf588e3cf3cc2611714c2eeae48550fbc17552"><code>7ccf588</code></a> chore(deps): update Syft to v1.3.0 (<a href="https://redirect.github.com/anchore/sbom-action/issues/456">#456</a>)</li> <li><a href="https://github.com/anchore/sbom-action/commit/7f33cf5b409c25dd63ce12b34f585feaed60b5bf"><code>7f33cf5</code></a> chore: remove outdated snapshot workflow (<a href="https://redirect.github.com/anchore/sbom-action/issues/457">#457</a>)</li> <li><a href="https://github.com/anchore/sbom-action/commit/04a486a88617c017d26e11a4d30b3cd198f44824"><code>04a486a</code></a> fix: extend existing environment when invoking syft instead of creating a new...</li> <li>See full diff in <a href="https://github.com/anchore/sbom-action/compare/ab5d7b5f48981941c4c5d6bf33aeb98fe3bae38c...7ccf588e3cf3cc2611714c2eeae48550fbc17552">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=anchore/sbom-action&package-manager=github_actions&previous-version=0.15.10&new-version=0.15.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description of the changes - Use consistent naming for config files ## How was this change tested? - CI Signed-off-by: Yuri Shkuro <github@ysh.us>
…gertracing#5399) ## Which problem is this PR solving? - Part of jaegertracing#5334 ## Description of the changes - An API design of Storage V2 spanstore and its proto file. - For the detailed discussion and how we arrived to this design, please take a look at https://docs.google.com/document/d/1IvUcDsdRxMPK-xTUE32w3NSAGTkUcmnDQttN6ijUnWs/edit?usp=sharing ## How was this change tested? - This PR hasn't been tested yet since it only contains interfaces and no actual implementation to be tested. ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [ ] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: James Ryans <james.ryans2012@gmail.com>
…5401) Bumps google.golang.org/protobuf from 1.33.0 to 1.34.0. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=google.golang.org/protobuf&package-manager=go_modules&previous-version=1.33.0&new-version=1.34.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description of the changes - This is the first jaeger release where Jaeger UI changes are included in the release notes and where we have version parity between the two repos. Signed-off-by: Albert Teoh <albert@packsmith.io> Co-authored-by: Albert Teoh <albert@packsmith.io>
## Which problem is this PR solving? - part of jaegertracing#5345 ## Description of the changes - Added Purge method for ES/OS - optimized integration test for es/os storage ## How was this change tested? - `STORAGE=elasticsearch make storage-integration-test` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
## Which problem is this PR solving? - The cache was initialized with incorrect parameter, and when I looked around it wasn't even used anywhere ## Description of the changes - delete indexCache and indexCacheTTL argument ## How was this change tested? ``` $ go test ./plugin/storage/es/spanstore ok github.com/jaegertracing/jaeger/plugin/storage/es/spanstore 0.471s ``` Signed-off-by: Yuri Shkuro <github@ysh.us>
…racing#5410) ## Which problem is this PR solving? - Resolves jaegertracing#5350 ## Description of the changes - Added the logic same in the docker-image CI to build linux/amd64 architecture only ## How was this change tested? - Not yet ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` Signed-off-by: Vamshi Maskuri <gwcchintu@gmail.com>
## Which problem is this PR solving? - Objective is to cut down on the number of CI failures that are often due to codecov uploads failing - Resolves jaegertracing#5173 - Supersedes and closes jaegertracing#5184 ## Description of the changes - Add a new helper action which encapsulates defining a CODECOV token and uploading with retries ## How was this change tested? - CI --------- Signed-off-by: Yuri Shkuro <github@ysh.us>
…ertracing#5345) ## Which problem is this PR solving? - part of jaegertracing#5254 ## Description of the changes - Utilizing existing `StorageIntegration` to test the jaeger-v2 OTel Collector and gRPC storage backend with the provided config file at `cmd/jaeger/config-elasticsearch.yaml`. ## How was this change tested? - Start a elasticsearch or opensearch docker instance. - Run `STORAGE=elasticsearch SPAN_STORAGE_TYPE=elasticsearch make jaeger-v2-storage-integration-test` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
## Which problem is this PR solving? - jaegertracing#5398 (comment) ## Description of the changes - added purge for method for cassandra ## How was this change tested? - via integration tests ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com> Signed-off-by: Yuri Shkuro <github@ysh.us> Co-authored-by: Yuri Shkuro <github@ysh.us>
## Which problem is this PR solving? - Previous PR did not include markup and the diagram is not rendering correctly ## Description of the changes - Add markup Signed-off-by: Yuri Shkuro <github@ysh.us>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
i haven;t pushed latest changes yet. |
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, in the right direction
cmd/jaeger/config-badger.yaml
Outdated
static: | ||
file: ./cmd/jaeger/sampling-strategies.json | ||
reload_interval: 1s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static: | |
file: ./cmd/jaeger/sampling-strategies.json | |
reload_interval: 1s | |
file: | |
path: ./cmd/jaeger/sampling-strategies.json | |
reload_interval: 1s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need reload_interval: 1s
? The current implementation reloads the file on changes, not on timer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With its reload_interval: 1s
, it does both; it reloads after every 1s
+ on file change.
You are right; reloading only when the file changes is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think current implementation also supports reload_interval
: see plugin/sampling/strategystore/static/strategy_store.go
: L90
cmd/jaeger/config-badger.yaml
Outdated
static: | ||
file: ./cmd/jaeger/sampling-strategies.json | ||
reload_interval: 1s | ||
adaptive: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File & adaptive sections do not make sense simultaneously, it's either one or another. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I already added a Conditional Check, which throws an error if both are provided.
Since these are production configs, I've included both options to make sure that the user is aware of their availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a comment in the configuration to make it more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping them both means the config file is invalid. Config files are supposed to work without changes.
"github.com/jaegertracing/jaeger/pkg/recoveryhandler" | ||
) | ||
|
||
func StartHttpServer(ss strategystore.StrategyStore, logger *zap.Logger, port string) (*http.Server, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure OTEL Collector does not already have a helper function for this? They do start various HTTP servers. I would check OTLP/HTTP receiver, z-pages extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not able to find any helper function.
In both OTLP/HTTP receiver, z-pages extension, a new http is initiated using http.NewServeMux()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, well if OTEL does not have a helper function, we certainly do have it in jaeger-v1, so let's not duplicate (refactor into public helper if needed)
also, they do have helper that creates HTTP server from standard HTTP config, which presumably deals with TLS and things like that: https://github.com/open-telemetry/opentelemetry-collector/blob/c8f9563f1b12470cb74f2f98c226faaffbcb38ac/receiver/otlpreceiver/otlp.go#L148
cmd/jaeger/internal/extension/remotesampling/testdata/strategy.json
Outdated
Show resolved
Hide resolved
ReloadInterval time.Duration `mapstructure:"reload_interval"` | ||
} | ||
|
||
type AdaptiveConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the parameters here are actually only needed for the adaptive sampling processor, which should run inside span processor, not inside the extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing it, but it makes the code more complicated. We call NewStrategyStore
in the extension, which requires all options. However, in the adaptive sampling processor, we only need these options: AggregationBuckets
and CalculationInterval
to create a new aggregator that calculates probability and stores it.
But if we still need to pass this configuration to the processor, we must find a way to transport the configuration from processor to the remote sampling extension internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I cannot find any way to transport the configuration internally from the processor to the extension.
cc @james-ryans
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong, most params are indeed used by adaptive.Processor
, which is confusing since it's very different in purpose from the new adaptivesampling
processor that you added. I think it's actually a poor design choice in the old code, because what that Processor
does is it periodically loads the throughput numbers, calculates new probabilities and stores them. Then the "strategy store" part, which is really needed for the HTTP endpoint, only reads the probabilities and constructs sampling strategies. We can see the evidence here - starting two goroutines at once is a smell, breaking single responsibility principle:
jaeger/plugin/sampling/strategystore/adaptive/processor.go
Lines 174 to 175 in 5ddcaa9
p.runBackground(p.runCalculationLoop) | |
p.runBackground(p.runUpdateProbabilitiesLoop) |
Arguably a better design would've been for the Processor part to be co-located with the Aggregator (and run by adaptivesamping
processor), and only retrieval/serving logic would've been in the remotesampling
extension.
Think about it. Maybe we can do a bit of preemptive refactoring of the adaptive.Processor
class:
- keep the part that runs in the background and updates storage with probabilities
- move the loadProbabilities/runUpdateProbabilitiesLoop part into strategy_store.go (as a separate class).
But one thing to be careful about is if we do this we want to ensure that these config options are cleanly split into two disjoined sets. Otherwise users would have to match the options across two OTEL components, something I would prefer to avoid. Right now they only share the storage name (if you follow my suggestion in another comment about maxBuckets
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm creating a new PR to refactor v1
sampling.
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
return fmt.Errorf("failed to create the local file strategy store: %w", err) | ||
} | ||
|
||
server, err := internal.StartHttpServer(ss, j.telemetry.Logger, j.cfg.Port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
starting http server is not part of file/adaptive choice, it's always there
return nil | ||
} | ||
|
||
if j.cfg.Adaptive.StrategyStore != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these if()s are getting very busy, I recommend extracting their content into helper functions
|
||
const timeDuration = time.Minute | ||
|
||
// hard-coded : maxBuckets:int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add these parameters to configs where they must come from, it will make it easier to see how the configs are related
} | ||
|
||
func (tp *traceProcessor) start(_ context.Context, host component.Host) error { | ||
ss, err := remotesampling.GetSamplingStorageFactory(tp.config.StrategyStore, host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're only using ss
to call ss.CreateSamplingStore(maxBuckets)
. Do you actually need remotesampling.GetSamplingStorageFactory
, or perhaps only remotesampling.GetSamplingStorage
(without maxBuckets argument)? This way you would not need to repeat maxBuckets
config in the processor, only in the remotesampling extension
} | ||
|
||
func (tp *traceProcessor) close(context.Context) error { | ||
// Close Aggregator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Close Aggregator |
samplerType, samplerParam := span.GetSamplerParams(tp.logger) | ||
tp.aggregator.RecordThroughput(span.Process.ServiceName, span.OperationName, samplerType, samplerParam) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're missing a bunch of critical business logic
func handleRootSpan(aggregator strategystore.Aggregator, logger *zap.Logger) ProcessSpan { |
I would suggest refactoring that logic into a helper method in aggregator.go like RecordThroughput(agg, span)
return err | ||
} | ||
|
||
// hard-coded : timeDuration:time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the accurate name is aggregationInterval
- it has to be a parameter of the processor/config
"github.com/jaegertracing/jaeger/pkg/recoveryhandler" | ||
) | ||
|
||
func StartHttpServer(ss strategystore.StrategyStore, logger *zap.Logger, port string) (*http.Server, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, well if OTEL does not have a helper function, we certainly do have it in jaeger-v1, so let's not duplicate (refactor into public helper if needed)
also, they do have helper that creates HTTP server from standard HTTP config, which presumably deals with TLS and things like that: https://github.com/open-telemetry/opentelemetry-collector/blob/c8f9563f1b12470cb74f2f98c226faaffbcb38ac/receiver/otlpreceiver/otlp.go#L148
ReloadInterval time.Duration `mapstructure:"reload_interval"` | ||
} | ||
|
||
type AdaptiveConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong, most params are indeed used by adaptive.Processor
, which is confusing since it's very different in purpose from the new adaptivesampling
processor that you added. I think it's actually a poor design choice in the old code, because what that Processor
does is it periodically loads the throughput numbers, calculates new probabilities and stores them. Then the "strategy store" part, which is really needed for the HTTP endpoint, only reads the probabilities and constructs sampling strategies. We can see the evidence here - starting two goroutines at once is a smell, breaking single responsibility principle:
jaeger/plugin/sampling/strategystore/adaptive/processor.go
Lines 174 to 175 in 5ddcaa9
p.runBackground(p.runCalculationLoop) | |
p.runBackground(p.runUpdateProbabilitiesLoop) |
Arguably a better design would've been for the Processor part to be co-located with the Aggregator (and run by adaptivesamping
processor), and only retrieval/serving logic would've been in the remotesampling
extension.
Think about it. Maybe we can do a bit of preemptive refactoring of the adaptive.Processor
class:
- keep the part that runs in the background and updates storage with probabilities
- move the loadProbabilities/runUpdateProbabilitiesLoop part into strategy_store.go (as a separate class).
But one thing to be careful about is if we do this we want to ensure that these config options are cleanly split into two disjoined sets. Otherwise users would have to match the options across two OTEL components, something I would prefer to avoid. Right now they only share the storage name (if you follow my suggestion in another comment about maxBuckets
).
type Config struct { | ||
File FileConfig `mapstructure:"file"` | ||
Adaptive AdaptiveConfig `mapstructure:"adaptive"` | ||
Port string `mapstructure:"port"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporary? It needs to be full blown server config, e.g. go.opentelemetry.io/collector/config/confighttp.ServerConfig
Description of the changes
How was this change tested?
go run -tags=ui ./cmd/jaeger --config ./cmd/jaeger/config-badger.yaml
make test
Checklist
jaeger
:make lint test
jaeger-ui
:yarn lint
andyarn test