Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jaeger-v2] Add remotesampling extension #5389

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

Pushkarm029
Copy link
Member

@Pushkarm029 Pushkarm029 commented Apr 25, 2024

Description of the changes

  • adds static sampling support for otel-based jaeger-v2.
  • supports hot reload
  • added unit tests

How was this change tested?

  • go run -tags=ui ./cmd/jaeger --config ./cmd/jaeger/config-badger.yaml
  • make test

Checklist

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
@Pushkarm029 Pushkarm029 requested a review from a team as a code owner April 25, 2024 17:11
Copy link

codecov bot commented Apr 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 41.12%. Comparing base (13cbaed) to head (076eb58).
Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #5389       +/-   ##
===========================================
- Coverage   94.54%   41.12%   -53.42%     
===========================================
  Files         346      171      -175     
  Lines       16947     9091     -7856     
===========================================
- Hits        16022     3739    -12283     
- Misses        724     5018     +4294     
- Partials      201      334      +133     
Flag Coverage Δ
badger_v1 10.28% <ø> (-0.01%) ⬇️
badger_v2 ?
cassandra-3.x ?
cassandra-3.x-v1 18.16% <ø> (?)
cassandra-3.x-v2 6.11% <ø> (?)
cassandra-4.x ?
cassandra-4.x-v1 18.16% <ø> (?)
cassandra-4.x-v2 6.14% <ø> (?)
elasticsearch-5.x 5.73% <ø> (-0.18%) ⬇️
elasticsearch-6.x 5.73% <ø> (-0.17%) ⬇️
elasticsearch-7.x 5.74% <ø> (-0.16%) ⬇️
elasticsearch-8.x 5.74% <ø> (-0.16%) ⬇️
grpc_v1 12.59% <ø> (-0.04%) ⬇️
grpc_v2 ?
kafka 9.99% <ø> (+0.02%) ⬆️
opensearch-1.x 5.73% <ø> (?)
opensearch-2.x 5.74% <ø> (-0.16%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
@Pushkarm029
Copy link
Member Author

I am thinking of this approach.:

  • create a gRPC sampling server
  • then, register api_v2.RegisterSamplingManagerServer(server, sampling.NewGRPCHandler(SamplingStore))
  • similarly, create an HTTP server and register a sampling API.

That's how it works in the current Jaeger collector, but we are serving this API in the collector server. Here, we have to create a new one.

suggestions

@Pushkarm029
Copy link
Member Author

One more question:

How can I run hotrod with jaeger-v2 binary?
I tried this, but Jaeger UI was not able to detect Hotrod service.
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 go run ./main.go all
go run -tags=ui ./cmd/jaeger --config ./cmd/jaeger/config-elasticsearch.yaml

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
@yurishkuro
Copy link
Member

We need a remotesampling extension which performs the following duties:

  • creates HTTP server that returns sampling strategies as JSON
  • allows user a choice of a sampling strategy store
    • file: a static file with auto-reload
    • adaptive: a backend storage that implements interfaces in sampling/strategystore package

We also need an adaptivesampling processor that

  • uses remotesampling to obtain storage backend (similar how jaegerexporter uses jaegerextension to get storage)
  • process spans and updates strategies in that storage (which then will be read by the extension's server)

All of this should require very minimal implementation, primarily just wiring up the existing components already implemented in app/collector/main

@yurishkuro
Copy link
Member

The existing component (from jaeger v1) are in blue:

flowchart LR
    Receiver --> AdaptiveSamplingProcessor --> BatchProcessor --> Exporter
    Exporter -->|"(1) get storage"| JaegerStorageExension
    Exporter -->|"(2) write trace"| TraceStorage
    AdaptiveSamplingProcessor -->|"getStorage()"| StorageConfig

    OTEL_SDK[OTEL
             SDK]
    OTEL_SDK -->|"(1) GET /sampling"| HTTP_endpoint
    HTTP_endpoint -->|"(2) getStrategy()"| StrategiesProvider
    style HTTP_endpoint fill:blue,color:white

    subgraph Jaeger Collector
        Receiver
        BatchProcessor[Batch
                       Processor]
        Exporter        
        TraceStorage[(Trace
                      Storage)]
        AdaptiveSamplingProcessor[Adaptive
                                  Sampling
                                  Processor]
        AdaptiveSamplingProcessorV1[Adaptive
                                    Sampling
                                    Processor_v1]
        style AdaptiveSamplingProcessorV1 fill:blue,color:white
        AdaptiveSamplingProcessor -->|"[]*model.Span"| AdaptiveSamplingProcessorV1
        AdaptiveSamplingProcessorV1 ---|use| SamplingStorage

        subgraph JaegerStorageExension[Jaeger Storage Exension]
            Storage[[Storage
                     Config]]
        end
        subgraph RemoteSamplingExtension[Remote Sampling Extension]
            StrategiesProvider -->|"(3b) getStrategy()"| AdaptiveProvider
            StrategiesProvider -->|"(3a) getStrategy()"| FileProvider
            FileProvider --> FileConfig
            AdaptiveProvider --> StorageConfig
    
            HTTP_endpoint[HTTP
                          endpoint]
            StrategiesProvider[Strategies
                               Provider]
            FileProvider[File
                         Provider]
            AdaptiveProvider[Adaptive
                             Provider]
            style StrategiesProvider fill:blue,color:white
            style FileProvider fill:blue,color:white
            style AdaptiveProvider fill:blue,color:white
            subgraph Config
                FileConfig[[File Config]]
                StorageConfig[[Storage Config]]
            end
            StorageConfig --- SamplingStorage
            SamplingStorage[(Sampling
                             Storage)]
            style SamplingStorage fill:blue,color:white
        end
    end

@Pushkarm029 Pushkarm029 changed the title [jaeger-v2] Add static sampling extension [jaeger-v2] Add remotesampling extension May 4, 2024
@Pushkarm029 Pushkarm029 marked this pull request as draft May 4, 2024 08:11
dependabot bot and others added 11 commits May 4, 2024 15:28
…acing#5382)

This PR started as a dependabot upgrade of OTEL Collector dependencies,
but the [tests were
failing](https://github.com/jaegertracing/jaeger/actions/runs/8854720626/job/24318352528?pr=5382).

The main change here is adding a ping from `e2eInitialize` to the
jaeger-v2 binary to make sure that it started before proceeding with
tests (ideally OTEL Collector should have a healthcheck endpoint that
signals when all components have been started successfully).

Also added some better logging.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from
0.15.10 to 0.15.11.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/anchore/sbom-action/releases">anchore/sbom-action's
releases</a>.</em></p>
<blockquote>
<h2>v0.15.11</h2>
<h2>Changes in v0.15.11</h2>
<ul>
<li>chore(deps): update Syft to v1.3.0 (<a
href="https://redirect.github.com/anchore/sbom-action/issues/456">#456</a>)
[<a
href="https://github.com/anchore-actions-token-generator">anchore-actions-token-generator</a>]</li>
<li>chore: remove outdated snapshot workflow (<a
href="https://redirect.github.com/anchore/sbom-action/issues/457">#457</a>)
[<a href="https://github.com/spiffcs">spiffcs</a>]</li>
<li>fix: don't pass in a separate env. This makes it impossible to pass
env vars via the action context to syft. (<a
href="https://redirect.github.com/anchore/sbom-action/issues/455">#455</a>)
[<a href="https://github.com/iNoahNothing">iNoahNothing</a>]</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/anchore/sbom-action/commit/7ccf588e3cf3cc2611714c2eeae48550fbc17552"><code>7ccf588</code></a>
chore(deps): update Syft to v1.3.0 (<a
href="https://redirect.github.com/anchore/sbom-action/issues/456">#456</a>)</li>
<li><a
href="https://github.com/anchore/sbom-action/commit/7f33cf5b409c25dd63ce12b34f585feaed60b5bf"><code>7f33cf5</code></a>
chore: remove outdated snapshot workflow (<a
href="https://redirect.github.com/anchore/sbom-action/issues/457">#457</a>)</li>
<li><a
href="https://github.com/anchore/sbom-action/commit/04a486a88617c017d26e11a4d30b3cd198f44824"><code>04a486a</code></a>
fix: extend existing environment when invoking syft instead of creating
a new...</li>
<li>See full diff in <a
href="https://github.com/anchore/sbom-action/compare/ab5d7b5f48981941c4c5d6bf33aeb98fe3bae38c...7ccf588e3cf3cc2611714c2eeae48550fbc17552">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=anchore/sbom-action&package-manager=github_actions&previous-version=0.15.10&new-version=0.15.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description of the changes
- Use consistent naming for config files

## How was this change tested?
- CI

Signed-off-by: Yuri Shkuro <github@ysh.us>
…gertracing#5399)

## Which problem is this PR solving?
- Part of jaegertracing#5334

## Description of the changes
- An API design of Storage V2 spanstore and its proto file.
- For the detailed discussion and how we arrived to this design, please
take a look at
https://docs.google.com/document/d/1IvUcDsdRxMPK-xTUE32w3NSAGTkUcmnDQttN6ijUnWs/edit?usp=sharing

## How was this change tested?
- This PR hasn't been tested yet since it only contains interfaces and
no actual implementation to be tested.

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: James Ryans <james.ryans2012@gmail.com>
…5401)

Bumps google.golang.org/protobuf from 1.33.0 to 1.34.0.


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=google.golang.org/protobuf&package-manager=go_modules&previous-version=1.33.0&new-version=1.34.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description of the changes
- This is the first jaeger release where Jaeger UI changes are included
in the release notes and where we have version parity between the two
repos.

Signed-off-by: Albert Teoh <albert@packsmith.io>
Co-authored-by: Albert Teoh <albert@packsmith.io>
## Which problem is this PR solving?
- part of jaegertracing#5345

## Description of the changes
- Added Purge method for ES/OS
- optimized integration test for es/os storage

## How was this change tested?
- `STORAGE=elasticsearch make storage-integration-test`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
## Which problem is this PR solving?
- The cache was initialized with incorrect parameter, and when I looked
around it wasn't even used anywhere

## Description of the changes
- delete indexCache and indexCacheTTL argument

## How was this change tested?
```
$ go test ./plugin/storage/es/spanstore
ok      github.com/jaegertracing/jaeger/plugin/storage/es/spanstore     0.471s
```

Signed-off-by: Yuri Shkuro <github@ysh.us>
yurishkuro and others added 8 commits May 4, 2024 15:28
…racing#5410)

## Which problem is this PR solving?

- Resolves jaegertracing#5350 


## Description of the changes
- Added the logic same in the docker-image CI to build linux/amd64
architecture only

## How was this change tested?
- Not yet

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

Signed-off-by: Vamshi Maskuri <gwcchintu@gmail.com>
## Which problem is this PR solving?
- Objective is to cut down on the number of CI failures that are often
due to codecov uploads failing
- Resolves jaegertracing#5173
- Supersedes and closes jaegertracing#5184

## Description of the changes
- Add a new helper action which encapsulates defining a CODECOV token
and uploading with retries

## How was this change tested?
- CI

---------

Signed-off-by: Yuri Shkuro <github@ysh.us>
…ertracing#5345)

## Which problem is this PR solving?
- part of jaegertracing#5254 

## Description of the changes
- Utilizing existing `StorageIntegration` to test the jaeger-v2 OTel
Collector and gRPC storage backend with the provided config file at
`cmd/jaeger/config-elasticsearch.yaml`.

## How was this change tested?
- Start a elasticsearch or opensearch docker instance.
- Run `STORAGE=elasticsearch SPAN_STORAGE_TYPE=elasticsearch make
jaeger-v2-storage-integration-test`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
## Which problem is this PR solving?
-
jaegertracing#5398 (comment)

## Description of the changes
- added purge for method for cassandra

## How was this change tested?
- via integration tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
## Which problem is this PR solving?
- Previous PR did not include markup and the diagram is not rendering
correctly

## Description of the changes
- Add markup

Signed-off-by: Yuri Shkuro <github@ysh.us>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
@Pushkarm029
Copy link
Member Author

i haven;t pushed latest changes yet.

Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
@yurishkuro yurishkuro added the changelog:new-feature Change that should be called out as new feature in CHANGELOG label May 7, 2024
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, in the right direction

Comment on lines 11 to 13
static:
file: ./cmd/jaeger/sampling-strategies.json
reload_interval: 1s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static:
file: ./cmd/jaeger/sampling-strategies.json
reload_interval: 1s
file:
path: ./cmd/jaeger/sampling-strategies.json
reload_interval: 1s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need reload_interval: 1s? The current implementation reloads the file on changes, not on timer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With its reload_interval: 1s, it does both; it reloads after every 1s + on file change.

You are right; reloading only when the file changes is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think current implementation also supports reload_interval: see plugin/sampling/strategystore/static/strategy_store.go : L90

static:
file: ./cmd/jaeger/sampling-strategies.json
reload_interval: 1s
adaptive:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File & adaptive sections do not make sense simultaneously, it's either one or another. Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I already added a Conditional Check, which throws an error if both are provided.

Since these are production configs, I've included both options to make sure that the user is aware of their availability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment in the configuration to make it more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping them both means the config file is invalid. Config files are supposed to work without changes.

cmd/jaeger/internal/extension/remotesampling/config.go Outdated Show resolved Hide resolved
cmd/jaeger/internal/extension/remotesampling/factory.go Outdated Show resolved Hide resolved
"github.com/jaegertracing/jaeger/pkg/recoveryhandler"
)

func StartHttpServer(ss strategystore.StrategyStore, logger *zap.Logger, port string) (*http.Server, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure OTEL Collector does not already have a helper function for this? They do start various HTTP servers. I would check OTLP/HTTP receiver, z-pages extension.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not able to find any helper function.

In both OTLP/HTTP receiver, z-pages extension, a new http is initiated using http.NewServeMux().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, well if OTEL does not have a helper function, we certainly do have it in jaeger-v1, so let's not duplicate (refactor into public helper if needed)

also, they do have helper that creates HTTP server from standard HTTP config, which presumably deals with TLS and things like that: https://github.com/open-telemetry/opentelemetry-collector/blob/c8f9563f1b12470cb74f2f98c226faaffbcb38ac/receiver/otlpreceiver/otlp.go#L148

ReloadInterval time.Duration `mapstructure:"reload_interval"`
}

type AdaptiveConfig struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the parameters here are actually only needed for the adaptive sampling processor, which should run inside span processor, not inside the extension.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing it, but it makes the code more complicated. We call NewStrategyStore in the extension, which requires all options. However, in the adaptive sampling processor, we only need these options: AggregationBuckets and CalculationInterval to create a new aggregator that calculates probability and stores it.

But if we still need to pass this configuration to the processor, we must find a way to transport the configuration from processor to the remote sampling extension internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I cannot find any way to transport the configuration internally from the processor to the extension.

cc @james-ryans

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong, most params are indeed used by adaptive.Processor, which is confusing since it's very different in purpose from the new adaptivesampling processor that you added. I think it's actually a poor design choice in the old code, because what that Processor does is it periodically loads the throughput numbers, calculates new probabilities and stores them. Then the "strategy store" part, which is really needed for the HTTP endpoint, only reads the probabilities and constructs sampling strategies. We can see the evidence here - starting two goroutines at once is a smell, breaking single responsibility principle:

p.runBackground(p.runCalculationLoop)
p.runBackground(p.runUpdateProbabilitiesLoop)

Arguably a better design would've been for the Processor part to be co-located with the Aggregator (and run by adaptivesamping processor), and only retrieval/serving logic would've been in the remotesampling extension.

Think about it. Maybe we can do a bit of preemptive refactoring of the adaptive.Processor class:

  • keep the part that runs in the background and updates storage with probabilities
  • move the loadProbabilities/runUpdateProbabilitiesLoop part into strategy_store.go (as a separate class).

But one thing to be careful about is if we do this we want to ensure that these config options are cleanly split into two disjoined sets. Otherwise users would have to match the options across two OTEL components, something I would prefer to avoid. Right now they only share the storage name (if you follow my suggestion in another comment about maxBuckets).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm creating a new PR to refactor v1 sampling.

pkg/clientcfg/clientcfghttp/handler.go Outdated Show resolved Hide resolved
pkg/clientcfg/clientcfghttp/handler.go Outdated Show resolved Hide resolved
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
Signed-off-by: Pushkar Mishra <pushkarmishra029@gmail.com>
return fmt.Errorf("failed to create the local file strategy store: %w", err)
}

server, err := internal.StartHttpServer(ss, j.telemetry.Logger, j.cfg.Port)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starting http server is not part of file/adaptive choice, it's always there

return nil
}

if j.cfg.Adaptive.StrategyStore != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these if()s are getting very busy, I recommend extracting their content into helper functions


const timeDuration = time.Minute

// hard-coded : maxBuckets:int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add these parameters to configs where they must come from, it will make it easier to see how the configs are related

}

func (tp *traceProcessor) start(_ context.Context, host component.Host) error {
ss, err := remotesampling.GetSamplingStorageFactory(tp.config.StrategyStore, host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're only using ss to call ss.CreateSamplingStore(maxBuckets). Do you actually need remotesampling.GetSamplingStorageFactory, or perhaps only remotesampling.GetSamplingStorage (without maxBuckets argument)? This way you would not need to repeat maxBuckets config in the processor, only in the remotesampling extension

}

func (tp *traceProcessor) close(context.Context) error {
// Close Aggregator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Close Aggregator

Comment on lines +79 to +80
samplerType, samplerParam := span.GetSamplerParams(tp.logger)
tp.aggregator.RecordThroughput(span.Process.ServiceName, span.OperationName, samplerType, samplerParam)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're missing a bunch of critical business logic

func handleRootSpan(aggregator strategystore.Aggregator, logger *zap.Logger) ProcessSpan {

I would suggest refactoring that logic into a helper method in aggregator.go like RecordThroughput(agg, span)

return err
}

// hard-coded : timeDuration:time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the accurate name is aggregationInterval - it has to be a parameter of the processor/config

"github.com/jaegertracing/jaeger/pkg/recoveryhandler"
)

func StartHttpServer(ss strategystore.StrategyStore, logger *zap.Logger, port string) (*http.Server, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, well if OTEL does not have a helper function, we certainly do have it in jaeger-v1, so let's not duplicate (refactor into public helper if needed)

also, they do have helper that creates HTTP server from standard HTTP config, which presumably deals with TLS and things like that: https://github.com/open-telemetry/opentelemetry-collector/blob/c8f9563f1b12470cb74f2f98c226faaffbcb38ac/receiver/otlpreceiver/otlp.go#L148

ReloadInterval time.Duration `mapstructure:"reload_interval"`
}

type AdaptiveConfig struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong, most params are indeed used by adaptive.Processor, which is confusing since it's very different in purpose from the new adaptivesampling processor that you added. I think it's actually a poor design choice in the old code, because what that Processor does is it periodically loads the throughput numbers, calculates new probabilities and stores them. Then the "strategy store" part, which is really needed for the HTTP endpoint, only reads the probabilities and constructs sampling strategies. We can see the evidence here - starting two goroutines at once is a smell, breaking single responsibility principle:

p.runBackground(p.runCalculationLoop)
p.runBackground(p.runUpdateProbabilitiesLoop)

Arguably a better design would've been for the Processor part to be co-located with the Aggregator (and run by adaptivesamping processor), and only retrieval/serving logic would've been in the remotesampling extension.

Think about it. Maybe we can do a bit of preemptive refactoring of the adaptive.Processor class:

  • keep the part that runs in the background and updates storage with probabilities
  • move the loadProbabilities/runUpdateProbabilitiesLoop part into strategy_store.go (as a separate class).

But one thing to be careful about is if we do this we want to ensure that these config options are cleanly split into two disjoined sets. Otherwise users would have to match the options across two OTEL components, something I would prefer to avoid. Right now they only share the storage name (if you follow my suggestion in another comment about maxBuckets).

type Config struct {
File FileConfig `mapstructure:"file"`
Adaptive AdaptiveConfig `mapstructure:"adaptive"`
Port string `mapstructure:"port"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporary? It needs to be full blown server config, e.g. go.opentelemetry.io/collector/config/confighttp.ServerConfig

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:new-feature Change that should be called out as new feature in CHANGELOG
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants