[DO NOT MERGE] Port of jaeger remote sampler support #3827

@sconover commented Apr 2, 2024

  • Introduce a dependency on tornado, in order to use IOLoop in RemoteControlledSampler
  • Introduce a variety of new samplers, which support the jaeger remote sampling protocol

DRAFT - DO NOT MERGE

There are a variety of challenging TODOs (marked TODO(sconover)) and, no doubt, changes to be made based on feedback from project maintainers.

The goal of this initial set of changes is a fairly "straight" port of the corresponding code in jaeger-client-python, to get a discussion going on how opentelemetry-python should ultimately support Jaeger remote sampling.

Please see the jaeger-client-python project, especially sampler.py and test_sampler.py, which are the source of much of this port.
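Some of the ported samplers, such as the GuaranteedThroughputProbabilisticSampler exercised in the tests, pair probabilistic sampling with a guaranteed minimum number of traces per second. A minimal stdlib-only sketch of that idea follows; the class names, parameters, the full-bucket start, and the trace-ID ratio check are illustrative assumptions, not the ported code:

```python
import time
from typing import Callable

_TRACE_ID_MASK = (1 << 64) - 1  # compare only the low 64 bits of the trace ID


class RateLimiter:
    """Token bucket: earns credits_per_second, capped at max_balance."""

    def __init__(self, credits_per_second: float, max_balance: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.credits_per_second = credits_per_second
        self.max_balance = max_balance
        self.balance = max_balance  # assumption: start with a full bucket
        self._clock = clock
        self._last_tick = clock()

    def check_credit(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last_tick
        self._last_tick = now
        # Refill for the elapsed time, never beyond the cap.
        self.balance = min(self.max_balance,
                           self.balance + elapsed * self.credits_per_second)
        if self.balance >= cost:
            self.balance -= cost
            return True
        return False


class GuaranteedThroughputSampler:
    """Hypothetical: sample by trace-ID ratio, with a per-second lower bound."""

    def __init__(self, rate: float, lower_bound: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.bound = round(rate * (1 << 64))
        # Hold at least one credit so a fractional lower bound (e.g. 0.5/s)
        # can still admit a trace once enough time has passed.
        self.limiter = RateLimiter(lower_bound, max(lower_bound, 1.0), clock)

    def should_sample(self, trace_id: int) -> bool:
        if (trace_id & _TRACE_ID_MASK) < self.bound:
            # Probabilistic hit: still spend a credit so these traces count
            # toward the lower bound instead of adding to it.
            self.limiter.check_credit()
            return True
        # Probabilistic miss: fall back to the rate limiter.
        return self.limiter.check_credit()
```

Injecting `clock` keeps the limiter deterministic in unit tests, without mocking `time`.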

New dependencies:

  • threadloop>=1.0.2
  • tornado>=6.4
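tornado is added so that RemoteControlledSampler can use an IOLoop, presumably to schedule its periodic strategy refresh. For comparison, the same polling could be done with only the standard library. This is a swapped-in technique, not what the PR does; `PeriodicRefresher` is a hypothetical name, and `refresh` stands in for whatever fetches the current strategy from the agent and applies it:

```python
import logging
import threading
from typing import Callable

_logger = logging.getLogger(__name__)


class PeriodicRefresher:
    """Hypothetical: call `refresh` every `interval` seconds on a daemon thread."""

    def __init__(self, refresh: Callable[[], None], interval: float) -> None:
        self._refresh = refresh
        self._interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns True as soon as stop() sets the event, so
        # shutdown does not have to sit out a full interval.
        while not self._stopped.wait(self._interval):
            try:
                self._refresh()
            except Exception:
                # Keep polling; a transient agent outage should not end refreshes.
                _logger.exception("sampling strategy refresh failed")

    def stop(self, timeout: float = 5.0) -> None:
        self._stopped.set()
        if self._thread.is_alive():
            self._thread.join(timeout)
```

A daemon thread avoids an event-loop dependency in the SDK, but, as with the IOLoop version, something still has to call `stop()` at shutdown.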

I expect the OpenTelemetry Python docs will very likely need updates before any eventual merge, and would appreciate advice on which areas those might be.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

  • Ported unit tests from jaeger-client-python
  • In lieu of end-to-end tests, I put together a sample Flask app and, running it alongside a Jaeger agent, manually verified that remote configuration changes do sync over to the client and change its sampling decisions.

Does This PR Require a Contrib Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

I'm leaving all of the above unchecked. I've made some style changes toward this project's conventions and Python 3; however, I'm hesitant to "compromise" the relatively straight port at this point, so that initial reviewers can more easily compare and contrast it with the original.

A caveat even on unit tests: while this code is unit tested, the tests are ported and stylistically quite different from the tests in this project. For example, they make extensive use of mocks, among other differences.

Sample flask server and explanation:

# A jaeger agent may be started using a command like:
#
# docker run \
# -v /dev/scratch/config:/config \
# -e SAMPLING_CONFIG_TYPE=file \
# -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
# -p 5775:5775/udp \
# -p 6831:6831/udp \
# -p 6832:6832/udp \
# -p 5778:5778 \
# -p 16686:16686 \
# -p 14268:14268 \
# -p 14250:14250 \
# -p 9411:9411 \
# jaegertracing/all-in-one:1.22 \
# --sampling.strategies-file=/config/strategies.json \
# --sampling.strategies-reload-interval=5s
#
# where the sampling strategies config file is available locally at
#
# /dev/scratch/config/strategies.json
#
# Run this Python script and make a request using:
#   curl localhost:5000/
# If the sampling decision is 'sample', the ConsoleSpanExporter
# will print the emitted span to the console.
#
# If the agent's strategies.json file is changed to something like
#
# {
#   "service_strategies": [
#     {
#       "service": "foo",
#       "type": "probabilistic",
#       "param": 0.333
#     }
#   ],
#   "default_strategy": {
#     "type": "probabilistic",
#     "param": 1
#   }
# }
#
# ...once these settings sync over via RemoteControlledSampler,
# it will take several curl invocations to see an emitted span,
# reflecting the roughly 1-in-3 chance (param 0.333) that a trace
# is sampled.

import threading

from flask import Flask, jsonify
from tornado.ioloop import IOLoop

from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# RemoteControlledSampler and LocalAgentSender come from this PR's branch;
# they are not part of a released opentelemetry-sdk.
from opentelemetry.sdk.trace.sampling import RemoteControlledSampler
from opentelemetry.sdk.trace.local_agent_net import LocalAgentSender

# The sampler's periodic strategy refresh is expected to be scheduled on
# this IOLoop, which is started on the main thread at the bottom of the script.
main_loop = IOLoop.current()

sampler = RemoteControlledSampler(
    # Jaeger agent on localhost; port 5778 serves sampling strategies over HTTP.
    channel=LocalAgentSender('localhost', 5778, 5778, io_loop=main_loop),
    service_name='foo',
    sampling_refresh_interval=5,
)

resource = Resource(attributes={SERVICE_NAME: "foo"})
tracer_provider = TracerProvider(resource=resource, sampler=sampler)
tracer_provider.add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter(service_name="foo"))
)
trace.set_tracer_provider(tracer_provider)

tracer = trace.get_tracer("foo")


def create_app():
    app = Flask(__name__)

    @app.route('/')
    def hello_world():
        # The sampler decides whether this span is recorded and exported.
        with tracer.start_as_current_span("foo") as span:
            span.set_attribute("hello.value", "world")
            return jsonify({
                "status": "success",
                "message": "Hello World!",
            })

    return app


app = create_app()

if __name__ == '__main__':
    # Flask runs on a daemon thread so the main thread is free for the
    # IOLoop; the reloader is off because it requires the main thread.
    threading.Thread(
        target=lambda: app.run(debug=True, use_reloader=False),
        daemon=True,
    ).start()
    main_loop.start()
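The expected behavior described in the script's comments follows from how the strategies file is keyed: a matching service_strategies entry overrides default_strategy. A minimal sketch of that lookup, covering only the fields shown in the example file (the real Jaeger file format supports more options); `resolve_strategy` is a hypothetical helper, not part of Jaeger or this PR:

```python
import json
from typing import Any, Dict

# The example strategies.json from the comments above.
STRATEGIES_JSON = """
{
  "service_strategies": [
    {"service": "foo", "type": "probabilistic", "param": 0.333}
  ],
  "default_strategy": {"type": "probabilistic", "param": 1}
}
"""


def resolve_strategy(strategies: Dict[str, Any], service: str) -> Dict[str, Any]:
    """Return the service's own strategy, else the default one."""
    for entry in strategies.get("service_strategies", []):
        if entry.get("service") == service:
            return entry
    return strategies["default_strategy"]


strategies = json.loads(STRATEGIES_JSON)
foo = resolve_strategy(strategies, "foo")
# With probabilistic rate p, the expected number of requests until the
# first sampled trace is 1 / p: about 3 for p = 0.333.
expected_requests = 1 / foo["param"]
```

Any service other than foo falls through to the default strategy and is always sampled.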

@sconover requested a review from a team as a code owner April 2, 2024 01:59
@sconover marked this pull request as draft April 2, 2024 02:01
@srikanthccv (Member)

FYI, contrib (https://github.com/open-telemetry/opentelemetry-python-contrib) is the appropriate place for components like this; it can't be merged into the SDK.

@sconover force-pushed the steve/jaeger-remote-sampler branch from 0897f5a to 1f5b19b on April 4, 2024 20:21
@sconover (Author)

@srikanthccv Thanks for the initial feedback. I completely agree that it seems a little strange, even wrong, for this not to be in contrib. However, unless I'm missing something, there are a couple of key places where the code as currently structured technically has to live in this repo. I'll point out the areas I see as issues, and I'd appreciate guidance on what to do about them. I'd be happy to put together a minimal PR to this repo that "preps" the library to accommodate the larger portion (the whole Jaeger remote sampling implementation), which I'd put in the contrib repo.

# for RemoteControlledSampler
# Q: Where should sampler.close() be called? I believe it might be
# TracerProvider#shutdown (but not entirely sure)
def close(self) -> None:
@sconover (Author)

@srikanthccv Existing samplers in this repo are simple and don't need an explicit lifecycle event like this. Are we OK with me proposing a close method like this, and then calling it from something like TracerProvider#shutdown?
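One way the close() question could be answered without touching existing samplers: treat close() as an opt-in capability and have shutdown call it only when present. A minimal sketch, assuming the provider exposes its sampler through a `sampler` attribute; `ClosableSampler` and `shutdown_with_sampler` are hypothetical names, not existing SDK APIs:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ClosableSampler(Protocol):
    """Hypothetical opt-in interface for samplers that hold resources."""

    def close(self) -> None:
        ...


def shutdown_with_sampler(provider: Any) -> None:
    """Shut down span processing, then release the sampler's resources.

    Assumes `provider` has shutdown() and a `sampler` attribute.
    """
    provider.shutdown()
    sampler = getattr(provider, "sampler", None)
    # runtime_checkable isinstance() only checks that close() exists,
    # so simple samplers without it are skipped.
    if isinstance(sampler, ClosableSampler):
        sampler.close()
```

Closing the sampler last is a judgment call; the intent is that span processing has already stopped before the sampler's background refresh is torn down.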

rate_limit_strategy = strategy.get(RATE_LIMITING_SAMPLING_STR)
if not rate_limit_strategy:
    return DEFAULT_LOWER_BOUND
return rate_limit_strategy.get(MAX_TRACES_PER_SECOND_STR, DEFAULT_LOWER_BOUND)

_KNOWN_SAMPLERS = {
@sconover (Author)

@srikanthccv This is the full, hard-coded list of known samplers, checked in _get_from_env_or_default. That works against the idea of samplers living outside this repo. What do you suggest I do about this?
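One hypothetical direction: replace the closed dict with a registry that external packages (for example, a contrib module) can add to. A minimal sketch; `register_sampler`, `sampler_from_name`, and the `jaeger_remote` name are illustrative assumptions, and a real implementation might discover factories through package entry points rather than explicit registration calls:

```python
from typing import Callable, Dict, Optional

# Hypothetical: a factory receives the optional sampler argument string
# (e.g. the value of OTEL_TRACES_SAMPLER_ARG) and returns a sampler.
SamplerFactory = Callable[[Optional[str]], object]

_SAMPLER_FACTORIES: Dict[str, SamplerFactory] = {}


def register_sampler(name: str, factory: SamplerFactory) -> None:
    """Let an external package add a named sampler."""
    key = name.lower()
    if key in _SAMPLER_FACTORIES:
        raise ValueError(f"sampler {name!r} is already registered")
    _SAMPLER_FACTORIES[key] = factory


def sampler_from_name(name: str, arg: Optional[str] = None,
                      default: str = "parentbased_always_on") -> object:
    """Resolve a sampler by name, falling back to `default` if unknown."""
    factory = _SAMPLER_FACTORIES.get(name.lower())
    if factory is None:
        factory = _SAMPLER_FACTORIES[default]
    return factory(arg)
```

Rejecting duplicate names keeps two packages from silently overriding each other's samplers.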

    self.assertEqual(sampler.get_description(), 'GuaranteedThroughputProbabilisticSampler{op, 0.51, 3}')

def test_sampler_equality(self):
    const1 = sampling.StaticSampler(True)
@sconover (Author)

@srikanthccv For testing purposes, the ported code expects these core samplers to have equality implementations. Are we OK with me submitting equality implementations and tests to this repo?
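For context, the ported tests compare samplers by value (e.g. with assertEqual), which requires __eq__ on each sampler. A minimal sketch of such an implementation, using a hypothetical stand-in class rather than an existing SDK sampler:

```python
class RateLimitingSampler:
    """Hypothetical stand-in for a configurable sampler with value equality."""

    def __init__(self, max_traces_per_second: float) -> None:
        self.max_traces_per_second = max_traces_per_second

    def __eq__(self, other: object) -> bool:
        # Let Python fall back to its default handling for other types.
        if not isinstance(other, RateLimitingSampler):
            return NotImplemented
        return self.max_traces_per_second == other.max_traces_per_second

    def __hash__(self) -> int:
        # Hash exactly the state that __eq__ compares.
        return hash((type(self), self.max_traces_per_second))

    def __repr__(self) -> str:
        # Readable output in assertEqual failure messages.
        return f"RateLimitingSampler({self.max_traces_per_second})"
```

Defining __eq__ without __hash__ makes instances unhashable in Python 3, which is why __hash__ is defined over the same field.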

@sconover (Author)

Closing in favor of splitting this into this core PR and this contrib PR, per @srikanthccv's advice.

@sconover closed this Apr 16, 2024