Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default configurators do more for distros #1937

Conversation

NathanielRN
Copy link
Contributor

@NathanielRN NathanielRN commented Jul 8, 2021

Description

In order to make it easier for users to write distros, we remove the useful code in the opentelemetry-distro package and make it available to all distros in the opentelemetry-sdk package.

Important points

  • Every distro must provide a Configurator and a Distro.

  • The Configurator can simply subclass _OTelSDKConfigurator which provides a default initalization of TracerProvider among other things. However, this class should never be instantiated itself. Configurators should belong to distros.

  • No dependencies changed because

    • opentelemetry-distro depends on opentelemetry-sdk
    • opentelemetry-sdk depends on opentelemetry-instrumentation
    • opentelemetry-distro depends on opentelemetry-instrumentation
    • Therefore, you still only need to pip install opentelemetry-distro to get started with Auto Instrumentation
  • I tried to do something clever to preserve git history when moving opentelemetry-distro -> opentelemetry-sdk but I'm worried it will be lost in the Squash & Merge :]

Fixes open-telemetry/opentelemetry-python-contrib#551

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

I used editable installs (e.g. pip install -e ~/my/repo/path/opentelemetry-python/optelemetry-distro) to verify that I can still use the Auto Instrumentation executable opentelemetry-instrument python3 my_app.py and the distro still gets asked to configure OTel Python.

Does This PR Require a Contrib Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
    - [ ] Unit tests have been added (We didn't ADD any tests but we did move the related Configurator tests to the test-core-sdk test suite!)
  • Documentation has been updated

@NathanielRN NathanielRN requested review from a team, codeboten and lzchen and removed request for a team July 8, 2021 22:48
@NathanielRN
Copy link
Contributor Author

@owais Would appreciate your input on this PR! Specifically what you think about moving it under the opentelemtry.sdk.instrumentation import path (although only distros will ever import this path).

Hopefully this makes it easier to create more distros :) Thank you!

@NathanielRN
Copy link
Contributor Author

Converting to draft while I fix the unit tests.

@NathanielRN NathanielRN marked this pull request as draft July 8, 2021 22:53
Comment on lines 90 to 93
"opentelemetry.sdk.instrumentation.TracerProvider", Provider
)
self.get_processor_patcher = patch(
"opentelemetry.distro.BatchSpanProcessor", Processor
"opentelemetry.sdk.instrumentation.BatchSpanProcessor", Processor
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests only really check these helpful Configurator methods so I thought they made sense to move completely to the opentelemetry-sdk package. There's nothing "opentelemetry-distro" related about them really...

@owais
Copy link
Contributor

owais commented Jul 9, 2021

Since distro already depends on opentelemetry-instrumentation which is still not stable, can we move the resuable code to that package instead of SDK? We can let it mature there and move to SDK later if we want. It'd be impossible or very hard to change these interfaces later if we add them to SDK.

@@ -50,9 +50,9 @@ where = src

[options.entry_points]
opentelemetry_distro =
distro = opentelemetry.distro:OpenTelemetryDistro
otel-distro = opentelemetry.distro:OpenTelemetryDistro
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we really need OpenTelemetry in the name if it is already in the path. opentelemetry.distro.Distro sounds nicer to me but may be that's just me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing about the plugin name. Why opentelemetry_distro.otel-distro instead of opentelemetry_distro.distro. I know neither is perfect but is there a reason to repeat otel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, I actually wouldn't have changed except for what I learned in this Contrib PR to Enforce 1 Distro/Configurator: open-telemetry/opentelemetry-python-contrib#571

In that PR, I want to log a useful error message if the user has multiple distros installed, but when it was just distro I didn't know where that distro was combing from. I thought have a message like Multiple distros were found: (otel-distro,aws-distro,splunk-distro) would be better because you know where they are coming from?

I guess the user knows what they installed though so we can keep just distro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I misunderstand, but is "plugin name" the same as "entry_point"? Actually you gave me a good idea :]

I print out the attributes available on the entry_point and say this:

otel-distro = opentelemetry.distro:OpenTelemetryDistro
{'attrs': ('OpenTelemetryDistro',),
 'dist': opentelemetry-distro 0.23.dev0 (/usr/local/lib/python3.9/site-packages),
 'extras': (),
 'module_name': 'opentelemetry.distro',
 'name': 'otel-distro'}

So I can use the module_name for my purpose in that Contrib repo. Will change this back!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but in that case we don't need to include distro in it probably. It could just be otel, aws, lightstep etc but doesn't really matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'll just use module_name because it's easiest and we can always change it later (try to delete the "distro" substring) if we want.

@owais
Copy link
Contributor

owais commented Jul 9, 2021

Since distro already depends on opentelemetry-instrumentation which is still not stable, can we move the resuable code to that package instead of SDK? We can let it mature there and move to SDK later if we want. It'd be impossible or very hard to change these interfaces later if we add them to SDK.

I see now that most of the things are private. I think in that case adding to SDK is much more simpler. I'm not sure about opentelemtry.sdk.instrumentation path though. Distro is very useful when auto-instrumenting apps but instrumentation is not it's sole purpose. Plus we already have a dedicated package for instrumentations. How do we decide what does into opentelemetry.sdk.instrumentation vs opentelemetry.instrumentation? I think it'll cause confusion. May be name it somethign like opentelemetry.sdk.configuration or opentelemetry.sdk.setup? We can also make the whole package private by naming it something like opentelemetry.sdk._configuration. This will buy us time to come up with a better name later.

@ocelotl
Copy link
Contributor

ocelotl commented Jul 12, 2021

Just as a follow up to our conversation in here a couple of comments (which I believe everyone involved in this conversation already know but just so that they are written somewhere):

  1. This adds to the SDK things that are not defined in the spec, which is inconvenient in my opinion.
  2. This approach tries to minimize the problem of not being able to reuse distro code in other distros but does not solve it completely because making it impossible to install more than one distro makes it also impossible to reuse code from other distro in a distro.

I understand that not bothering the user with setting an environment variable to use a distro is desirable, and I agree with that. I also understand that it is expected that most situations will require only one distro being installed. Considering that, I suggest we do the following:

  1. Allow multiple distros to be installed.
  2. Provide a default distro.
  3. Make the default distro be the active one if there is no other distro installed.
  4. If there is only one other installed distro, (besides the default one, of course) make this other distro the active one without it being necessary to set the environment variable.
  5. Make the distro pointed out by the environment variable the active one if there is more than one distro installed.

What are your thoughts?

@lzchen, @owais, @NathanielRN, @lonewolf3739, @aabmass, @codeboten

@srikanthccv
Copy link
Member

Since all of the configuration code is private I am fine with it and it even makes sense that there is some mechanism to configure sdk without installing another package. Maybe we can propose an abstract concept of SDK configuration to spec. I am still kind of confused between only install one distro-allow multiple distros install but use one-allow multiple distros usage see my comment from other thread open-telemetry/opentelemetry-python-contrib#571 (comment).

@owais
Copy link
Contributor

owais commented Jul 13, 2021

IMO allowing multiple distros no matter how detailed the heuristics we come up with will end up causing pain to end-users. I think this will be especially true if we go out of our way to make it possible to allow using distros as libraries for code reuse. That'll just encourage people to extend other distros even more resulting in the problem getting even worse.

I think it's fine if distros are not reusable as libraries. If we can provide the most useful code as a util/base library then each distro should only contain vendor-specific configuration and it wouldn't make any sense for anyone to extend such distros.

To me, distros are more like CLI commands meant to be consumed by the end-user. Sure, with a dynamic language like Python, it is possible to import CLI command (or distro) code and use it as a library but that doesn't mean the command author intended to ship a library or that one should use it as one. In other words, just because the Python import system allows to import of a piece of code does not mean the author intended the code to be a library. IMO we should explicitly call out that distros are not libraries but end-user consumable/runnable products.

While I agree with the sentiment that code should be reusable, in practice I don't see why anyone would want to reuse code from an AWS, Google or Lightstep distro to build their own. The only valid use case that comes to mind is if a distro contains a bug and a user temporarily fixes the bug by subclassing the distro. This is the only practical situation I can think of where allowing multiple distros could help. That said, I think it's fine to expect users to use a fork in temporary situations like these.

@owais
Copy link
Contributor

owais commented Jul 13, 2021

If we really wanted to support multiple distros, I'd prefer a method that works deterministically and reliably without any intervention required by the user. For example,

  1. Distros are responsible to work well with existing pipelines/other distros: instrument command just iterates over all distros and initializes each one in sequence. Maybe it can sort them beforehand by some key to create a more deterministic behaviour. Each distro would have to check if tracing/metrics pipelines have already been set up and would either set up a new one or enhance the existing one. Distros could also raise exceptions if they want to force users to have only one distro installed. This adds a lot of complexity to distros but this allows distro authors to ship a very specific behaviour without requiring manual intervention from end-users.

  2. Add a priority level: Each distro would ship with a priority number and the distro with the highest priority would always win. For example, default otel distro would ship with priority level 0 while vendor-specific distro would have priority level 10. Anyone building a distro on top of another one would have to bump the priority level of their distro. So if I wanted to build on top of the AWS distro (with priority 10), I'd add AWS distro as a dependency, import and use it as a library but set my distros priority to 20. This would ensure that the opentelemetry-instrument command would always prefer my distro. This also gives authors the ability to reuse other distros as libraries but at the same time ship a deterministic behaviour without requiring end-users to have any knowledge about distros and make any decisions. This does not allow users to manually install multiple distros but allows distro authors to build on other distros which is what I think @NathanielRN and @ocelotl both want to be able to do.

If we had to allow multiple ones, I'd go with 2 but I still strongly prefer to just never allow more than one distros. I think it simplifies the overall design and UX and the price to pay (fork vs import) is a very small one IMO especially if we make useful functions available elsewhere (sdk/instrumentation).

Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks like a step in the right direction to me.

@NathanielRN
Copy link
Contributor Author

@owais

I don't see why anyone would want to reuse code from an AWS, Google or Lightstep distro to build their own

Agreed. More specifically, I doubt people will distribute a opentelemetry-distro-aws-modified. Like you mentioned, bug fixes will be fixed by the distro owners.

If we had to allow multiple ones, I'd go with 2 but I still strongly prefer to just never allow more than one distros.

Agreed. I don't think we should over complicate this, besides this seems out of scope for this PR.

@ocelotl I understand your concerns, but at the very least the discussion on this PR is running away from its original purpose. Merging this PR does not mean multiple distros can't be installed. It just moves useful methods to the SDK that all distros can use. Design opinions may differ because some may want distros to inherit from others, but like @owais pointed out, distros should really just be setting environment variables to configure the SDK.

This PR just makes these useful functions available so it enables developers. The current setup is blocking developers.

Would appreciate your reviews @lzchen @owais @ocelotl @lonewolf3739 of this PR independently without thinking about the multiple distros issue which should be apart of open-telemetry/opentelemetry-python-contrib#571 (and now that the instrumentation package is moved this will need a new PR).

@NathanielRN NathanielRN force-pushed the default-configurators-do-more-for-distros branch from 664701f to df33ede Compare July 16, 2021 20:45
@ocelotl
Copy link
Contributor

ocelotl commented Jul 16, 2021

@NathanielRN ok, I understand your point. I'll look into this ASAP

Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments after reviewing the public api check. I think the change from Configurator -> OpenTelemetryConfigurator is ok, probably best to hide or remove the logger though.

CHANGELOG.md Outdated Show resolved Hide resolved
from opentelemetry.sdk.trace.export import BatchSpanProcessor, SpanExporter
from opentelemetry.sdk.trace.id_generator import IdGenerator

logger = getLogger(__file__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is logger ever used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used, you're right! It was actually already there from the distro file previously, but since it's not used I'll remove it and anything else redundant.

@codeboten codeboten added Skip Changelog PRs that do not require a CHANGELOG.md entry Approve Public API check This label shows that the public symbols added or changed in a PR are strictly necessary and removed Skip Changelog PRs that do not require a CHANGELOG.md entry labels Jul 16, 2021
@ocelotl
Copy link
Contributor

ocelotl commented Jul 18, 2021

Ok, I am ok with this change being merged, because the code being moved into the SDK is private. I actually disagree with having anything distro-related in the SDK because it is outside of the SDK spec.

@owais
Copy link
Contributor

owais commented Jul 18, 2021

@ocelotl If the contention is adding this to SDK, why don't we add it to instrumentation package instead? This should be a non-issue given that the package already hosts distro related code (THE base distro actually)

https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-instrumentation/src/opentelemetry/instrumentation/distro.py

@ocelotl
Copy link
Contributor

ocelotl commented Jul 18, 2021

@ocelotl If the contention is adding this to SDK, why don't we add it to instrumentation package instead? This should be a non-issue given that the package already hosts distro related code (THE base distro actually)

https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-instrumentation/src/opentelemetry/instrumentation/distro.py

That sounds good, but @NathanielRN has mentioned that this PR is blocking developers and I don't want to hold it longer, specially now that it has been approved already. I say we merge it as it is and consider the next step later if necessary.

@NathanielRN
Copy link
Contributor Author

@owais

why don't we add it to instrumentation package instead?

We probably wouldn't want to do this, since these methods configure opentelemetry-sdk and that would make the instrumentation package opentelemetry-instrumentation take a dependency on the SDK package, which we have previously avoided.

@ocelotl

Ok, I am ok with this change being merged, because the code being moved into the SDK is private. I actually disagree with having anything distro-related in the SDK because it is outside of the SDK spec.

If that's the concern, one solution I can think of is to make a totally new package called opentelemetry-default-configurator and put these methods there. It's nice because it can be developed independently, but I also think it would be confusing to the typical user. The current private class solution need only be used by distro developers (vendors) to simplify usage in a way that the user doesn't need to be aware about.

CHANGELOG.md Outdated
@@ -5,6 +5,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased](https://github.com/open-telemetry/opentelemetry-python/compare/v1.3.0-0.22b0...HEAD)
- `opentelemetry-distro` & `opentelemetry-sdk` Moved Auto Instrumentation Configurator code to SDK
to let distros use its default implementation
([#1934](https://github.com/open-telemetry/opentelemetry-python/pull/1934))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
([#1934](https://github.com/open-telemetry/opentelemetry-python/pull/1934))
([#1937](https://github.com/open-telemetry/opentelemetry-python/pull/1937))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -41,7 +41,7 @@ packages=find_namespace:
zip_safe = False
include_package_data = True
install_requires =
opentelemetry-api ~= 1.3
opentelemetry-api == 1.4.0.dev0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize we were pinning version specifically as of #1933 ... I thought this was just missed in a release 😅 . Will revert this change!

Configurators can subclass and slightly alter this initialization.

NOTE: This class should not be instantiated nor should it become an entry
point on the `opentelemetry-sdk` package. Instead, distros should subclass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should distros be subclassing this class or OpenTelemetryConfigurator?

@codeboten
Copy link
Contributor

@NathanielRN please resolve conflicts, @lzchen can you approve if all your comments have been addressed. Thanks!

@NathanielRN NathanielRN force-pushed the default-configurators-do-more-for-distros branch from ea9e20a to 47c0755 Compare July 22, 2021 21:48
@NathanielRN NathanielRN requested a review from lzchen July 22, 2021 21:49
@NathanielRN
Copy link
Contributor Author

@codeboten Conflicts resolved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Approve Public API check This label shows that the public symbols added or changed in a PR are strictly necessary
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Making it easier to create custom distros
6 participants