Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DataType a separate type within Component #10222

Open
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

ankitpatel96
Copy link
Contributor

@ankitpatel96 ankitpatel96 commented May 25, 2024

Description

Followup from #10069

The end goal of this PR is to separate DataType from Type. DataType becomes an enum for Traces, Logs, and Metrics.

Context (to quote @mx-psi):

A component.ID currently is a pair of component.Type and a (possibly empty) name for the component (e.g. otlp/name). We use the fact that component.DataType is also a component.Type to also use it for pipeline names (e.g. traces/sampled).

To solve this, I introduce a component.PipelineID as suggested by @TylerHelmuth. This PipelineID is very similar to component.ID - it just only lets you use a DataType as the Type.

The PR is large but hundreds of these changes are just changing test files to use a DataType instead of a Type - I swear its not as bad as it looks. Most of the actual changes are in component, connector and service/internal/graph/

Followup Work:
I wanted to move PipelineID to service/pipelines - but that would create a circular dependency with InstanceID, which uses PipelineID but lives in component. I want to fix that.

I also want to rename DataType to Signal or something like that, and move it out to its own package.

I didn't do either of those things in this PR since it's already huge.

Link to tracking issue

Fixes #9429

Testing

Unit tests

Copy link

codecov bot commented May 25, 2024

Codecov Report

Attention: Patch coverage is 97.89474% with 2 lines in your changes missing coverage. Please review.

Project coverage is 92.53%. Comparing base (effe267) to head (304d0b0).
Report is 54 commits behind head on main.

Current head 304d0b0 differs from pull request most recent head c1252e4

Please upload reports for the commit c1252e4 to get more accurate results.

Files Patch % Lines
component/pipeline.go 93.93% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10222      +/-   ##
==========================================
+ Coverage   92.52%   92.53%   +0.01%     
==========================================
  Files         388      389       +1     
  Lines       18310    18364      +54     
==========================================
+ Hits        16942    16994      +52     
- Misses       1022     1023       +1     
- Partials      346      347       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ankitpatel96 ankitpatel96 marked this pull request as ready for review May 29, 2024 16:47
@ankitpatel96 ankitpatel96 requested a review from a team as a code owner May 29, 2024 16:47
@ankitpatel96 ankitpatel96 requested a review from atoulme May 29, 2024 16:47
@ankitpatel96 ankitpatel96 changed the title [DRAFT] [NOT READY] Make DataType a separate type within Component Make DataType a separate type within Component May 29, 2024
@ankitpatel96
Copy link
Contributor Author

This is changing an API - and it seems like I will have to change contrib in lockstep with this change (bunch of tests are failing). Any advice on how to do that would be welcome!

@mx-psi
Copy link
Member

mx-psi commented Jun 5, 2024

Requesting review from the reviewers on #10069

component/config.go Outdated Show resolved Hide resolved
component/pipeline.go Outdated Show resolved Hide resolved
component/pipeline.go Outdated Show resolved Hide resolved
Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the dependency graph, I think InstanceID should not be part of the service module. Either we make service/pipelines into a separate module or we move this to a separate pipelines module altogether.

@@ -189,10 +189,3 @@ type CreateDefaultConfigFunc func() Config
func (f CreateDefaultConfigFunc) CreateDefaultConfig() Config {
return f()
}

// InstanceID uniquely identifies a component instance
type InstanceID struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a

// Deprecated use [<ref to new type>]
type InstanceID = <ref to new type>

here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I? That would make component dependent on pipelines... which is most definitely circular

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use an internal module and make this and the actual type aliases of a symbol in that internal module until we remove it from component:

// package internal

type InstanceID struct { /* actual definition goes here *? } 
// package component

// Deprecated: [v0.103.0] use pipelines.InstanceID
type InstanceID = internal.InstanceID
// package pipelines

type InstanceID = internal.InstanceID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is going to work -

Pipeline depends on internal, which depends on pipeline since InstanceID depends on pipeline.ID

If I move pipeline.ID into this internal package, we solve that problem but then the component.InstanceID shim doesn't work - that means component imports internal and then internal imports component (InstanceID contains component.ID and component.Kind).

Open to any ideas to fixing this..

Side note - this internal package has to be collector/internal/ since it needs to be accessible by collector/component and collector/pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A solution here would be to also move component.ID and component.Kind to internal and make shims also for these two on component. It sounds like a pain, so let's first discuss the rest of the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right that would probably work... but I do question whether its worth it for stability concerns. Moving component.ID to internal/ would make component dependent on the root collector module, thus adding that dependency to almost every other module.

I don't think the blast radius of this change is especially large for contrib and I think, in this case, it makes sense to keep this change as is and not try to provide backwards compatible import shims.

@ankitpatel96
Copy link
Contributor Author

I moved InstanceID and PipelineID into a new pipeline module!

@ankitpatel96
Copy link
Contributor Author

Bump - would really appreciate a review here before I do another merge to fix all these conflicts

Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR makes sense to merge with this public API. There are a bunch of breaking changes which are hard to work around in some instances. IMO we should

  1. document the list of breaking changes on the changelog
  2. evaluate the impact of breaking changes looking at the contrib tests and think about adding two-step deprecation process for the more impactful ones

component/config.go Outdated Show resolved Hide resolved
// typeAndNameSeparator is the separator that is used between type and name in type/name composite keys.
const typeAndNameSeparator = "/"
// TypeAndNameSeparator is the separator that is used between type and name in type/name composite keys.
const TypeAndNameSeparator = "/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be renamed since we also use it for separating DataType and pipeline name. Alternatively, move it to internal and make it private

Comment on lines -19 to +20
Consumer(...component.ID) (consumer.Logs, error)
PipelineIDs() []component.ID
Consumer(...pipeline.ID) (consumer.Logs, error)
PipelineIDs() []pipeline.ID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the ones on metrics and traces router) are breaking changes, it seems hard to work around them so we can just add an API changelog entry

@@ -21,7 +22,7 @@ func NewStatusWatcherExtensionCreateSettings() extension.Settings {

// NewStatusWatcherExtensionFactory returns a component.ExtensionFactory to construct a status watcher extension.
func NewStatusWatcherExtensionFactory(
onStatusChanged func(source *component.InstanceID, event *component.StatusEvent),
onStatusChanged func(source *pipeline.InstanceID, event *component.StatusEvent),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a breaking change, but if we end up not having a type alias for component.InstanceID, users of this API are going to have a build time error regardless so they will be 'notified' that way. We can list it in the API changelog as a breaking change as well.

}

// UnmarshalText implements the encoding.TextUnmarshaler interface.
func (id *ID) UnmarshalText(text []byte) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have an internal package that has this implementation and allows it to be reused by both component.ID and pipeline.ID. One way of doing this would be having a function that returns two strings and we can reuse the parsing logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is actually desirable - the root module (which internal is a part of) already imports component, so this would create a dependency loop. It shouldn't create an import loop but I'm still not sure if this is desirable.

service/extensions/extensions.go Show resolved Hide resolved
Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got a chance to take a look at this, and now I can see that my original suggestion did not factor in InstanceID, which even further complicates this change.

Being able to see this change now in its entirety makes me rethink if this change is valuable. While conceptually I think it is nice, to an end-user this changes nothing. I am concerned about the amount of time we'll spend dealing with this breaking change for what essentially ends up being a rename (a nice rename, but a rename non-the-less.

If, after 1.0, we decide this is valuable, it could be something that is considered for 2.0, but at this point we've been using component.InstanceID, component.Type, and component.DataType successfully without end-users (or component authors) complaining.

At this point I suggest we abandon this change and call #9429 completed as "we decided this wasn't worth it". @ankitpatel96 thank you for your effort on this PR, seeing it all laid out really helped.

@mx-psi
Copy link
Member

mx-psi commented Jul 8, 2024

. While conceptually I think it is nice, to an end-user this changes nothing. I am concerned about the amount of time we'll spend dealing with this breaking change for what essentially ends up being a rename (a nice rename, but a rename non-the-less.

This may end up being the case, but thinking about this a bit further, before completely abandoning this I personally want to:

  • Define for how long we want to support the Go API on different Go modules
  • Think about how costly different Go API changes will be after 1.0

The first PR related to these two goals is #10539, which will help us then define the support policy for Go API modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Does DataType belong to component? Can we make it a type itself?
4 participants