Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade open-telemetry/opentelemetry-collector-contrib to v0.89.0 #3148

Merged
merged 3 commits into from
Dec 11, 2023

Conversation

gebn
Copy link
Contributor

@gebn gebn commented Nov 16, 2023

make vendor-check fmt passing

What this PR does: Upgrades the open-telemetry/opentelemetry-collector-contrib dependency to v0.89.0, including transitive dependencies as necessary. This version includes open-telemetry/opentelemetry-collector-contrib#26022, which we would like to benefit from in Tempo. N.B. one code change in cmd/tempo/main.go, due to NewTracer() being removed in go.opentelemetry.io/otel/bridge/opencensus@v0.44.0.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@CLAassistant
Copy link

CLAassistant commented Nov 16, 2023

CLA assistant check
All committers have signed the CLA.

@gebn
Copy link
Contributor Author

gebn commented Nov 17, 2023

Seeking approval for the CLA, then I can rebase with my work email. Please let me know of any thoughts in the meantime.

@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch from dd19707 to 2069f0b Compare November 17, 2023 00:08
@joe-elliott
Copy link
Member

Thanks for the PR. If you can get the CLA signed then we can merge.

@joe-elliott
Copy link
Member

Linter issue seems like a simple fix. Tempodb test failure is likely transient. I will re-run tests. However, there does appear to be a panic on startup in integration tests:

13:23:29 tempo: panic: runtime error: invalid memory address or nil pointer dereference
13:23:29 tempo: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x207cf8b]
13:23:29 tempo: goroutine 336 [running]:
13:23:29 tempo: go.opentelemetry.io/collector/internal/sharedcomponent.(*SharedComponent[...]).Start.func1()
13:23:29 tempo: /home/runner/work/tempo/tempo/vendor/go.opentelemetry.io/collector/internal/sharedcomponent/sharedcomponent.go:92 +0xcb
13:23:29 tempo: sync.(*Once).doSlow(0xc00093d340?, 0x0?)
13:23:29 tempo: /opt/hostedtoolcache/go/1.21.3/x64/src/sync/once.go:74 +0xbf
13:23:29 tempo: sync.(*Once).Do(...)
13:23:29 tempo: /opt/hostedtoolcache/go/1.21.3/x64/src/sync/once.go:65
13:23:29 tempo: go.opentelemetry.io/collector/internal/sharedcomponent.(*SharedComponent[...]).Start(0x1da[264](https://github.com/grafana/tempo/actions/runs/6898014680/job/18785339327?pr=3148#step:4:265)7?, {0x2f28fe8?, 0xc000a11090?}, {0x2f296b0?, 0xc000268770?}
13:23:29 tempo: )
13:23:29 tempo: /home/runner/work/tempo/tempo/vendor/go.opentelemetry.io/collector/internal/sharedcomponent/sharedcomponent.go:87 +0x9b
13:23:29 tempo: github.com/grafana/tempo/modules/distributor/receiver.(*receiversShim).starting(0xc000[268](https://github.com/grafana/tempo/actions/runs/6898014680/job/18785339327?pr=3148#step:4:269)770, {0x2f28fe8, 0xc000a11090})
13:23:29 tempo: /home/runner/work/tempo/tempo/modules/distributor/receiver/shim.go:293 +0x83
13:23:29 tempo: github.com/grafana/dskit/services.(*BasicService).main(0xc0004bf360)
13:23:29 tempo: /home/runner/work/tempo/tempo/vendor/github.com/grafana/dskit/services/basic_service.go:157 +0x5a
13:23:29 tempo: created by github.com/grafana/dskit/services.(*BasicService).StartAsync.func1 in goroutine 331
13:23:29 tempo: /home/runner/work/tempo/tempo/vendor/github.com/grafana/dskit/services/basic_service.go:119 +0x105
    api_test.go:26: 
        	Error Trace:	/home/runner/work/tempo/tempo/integration/e2e/api_test.go:26
        	Error:      	Received unexpected error:
        	            	docker container tempo failed to start: exit status 1
        	Test:       	TestSearchTagValuesV2

Take a look and see what you can find. Let me know if you have any questions and I can help dig some.

@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch from 2069f0b to 721a39d Compare November 17, 2023 17:50
Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the changelog

@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch from 721a39d to 6795c92 Compare November 23, 2023 21:34
@gebn
Copy link
Contributor Author

gebn commented Nov 23, 2023

I've fixed the low-hanging fruit. There's still a SIGSEGV panic in vendor/go.opentelemetry.io/collector/internal/sharedcomponent/sharedcomponent.go:92 when running the e2e tests, however I'm not sure where it's coming from - an if _ == nil before each dereference is not entered.

@joe-elliott
Copy link
Member

The nil pointer dereference is b/c ReportComponentStatus is nil on the line referenced above (vendor/go.opentelemetry.io/collector/internal/sharedcomponent/sharedcomponent.go:92).

I can find two places this field is set:

https://github.com/open-telemetry/opentelemetry-collector/blob/main/service/service.go#L125

and

https://github.com/open-telemetry/opentelemetry-collector/blob/0ae738f0c8eebf669967cd1bdda5072929df4b9d/service/internal/servicetelemetry/telemetry_settings.go#L18

Maybe we need to create our receivers through one of these code paths to make sure the field is set correctly?

Running the following will show the issue. Integration tests not required:

go run ./cmd/tempo --storage.trace.backend=local --storage.trace.local.path=/tmp/tempo/traces --storage.trace.wal.path=/tmp/tempo/wal --server.http-listen-port=8080

@gebn
Copy link
Contributor Author

gebn commented Nov 30, 2023

Thanks @joe-elliott, I didn't realise it was a nil-able field as opposed to a method. I've dug through, and the simple fix is:

diff --git a/modules/distributor/receiver/shim.go b/modules/distributor/receiver/shim.go
index cf9ecd41d..11fc47c8d 100644
--- a/modules/distributor/receiver/shim.go
+++ b/modules/distributor/receiver/shim.go
@@ -228,11 +228,16 @@ func New(receiverCfg map[string]interface{}, pusher TracesPusher, middleware Mid
 
        // todo: propagate a real context?  translate our log configuration into zap?
        ctx := context.Background()
-       params := receiver.CreateSettings{TelemetrySettings: component.TelemetrySettings{
-               Logger:         zapLogger,
-               TracerProvider: tracenoop.NewTracerProvider(),
-               MeterProvider:  metricnoop.NewMeterProvider(),
-       }}
+       params := receiver.CreateSettings{
+               TelemetrySettings: component.TelemetrySettings{
+                       Logger:         zapLogger,
+                       TracerProvider: tracenoop.NewTracerProvider(),
+                       MeterProvider:  metricnoop.NewMeterProvider(),
+                       ReportComponentStatus: func(*component.StatusEvent) error {
+                               return nil
+                       },
+               },
+       }
 
        for componentID, cfg := range conf.Receivers {
                factoryBase := receiverFactories[componentID.Type()]

As you mention, it looks like the OTel Collector authors assumed Service would be used to indirectly create the TelemetrySettings struct. There are no exported abstractions for doing this elsewhere. This struct appears to effectively represent the entire collector - it houses build info, creates a logger from scratch, and uses status changes to signal shutdown. Indeed the whole feature seems to have been introduced purely to ease implementation of the official collector, which is presumably not what Tempo is interested in. It may be desirable to hook Tempo's existing lifecycle management into this callback, however given Tempo currently handles errors correctly, perhaps that can come later.

I've pushed the no-op implementation. Please let me know your thoughts.

@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch 2 times, most recently from 9da48c6 to e552879 Compare November 30, 2023 01:54
@gebn gebn changed the title Upgrade open-telemetry/opentelemetry-collector-contrib to v0.89.0 Upgrade open-telemetry/opentelemetry-collector-contrib to v0.90.0 Nov 30, 2023
@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch from e552879 to 337d923 Compare November 30, 2023 02:20
@gebn gebn changed the title Upgrade open-telemetry/opentelemetry-collector-contrib to v0.90.0 Upgrade open-telemetry/opentelemetry-collector-contrib to v0.89.0 Nov 30, 2023
@gebn
Copy link
Contributor Author

gebn commented Nov 30, 2023

Please hold off on merging, I'm still going through approvals for AGPL contribution. I'll take a look at the tempodb tests

@joe-elliott
Copy link
Member

Please hold off on merging, I'm still going through approvals for AGPL contribution.

Will do.

I'll take a look at the tempodb tests

@mapno was having this issue yesterday. He solved it by rebasing onto main (vs. merging main into his branch). It's a really weird test failure and he was unable to track down the root cause.

@mapno
Copy link
Member

mapno commented Nov 30, 2023

$ git pull --rebase origin main fixed it for me

@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch from 337d923 to c7d70f7 Compare December 3, 2023 22:00
@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch 2 times, most recently from a80e149 to a4bd230 Compare December 11, 2023 00:11
@gebn
Copy link
Contributor Author

gebn commented Dec 11, 2023

Rebased off main. I now have approval for contributing, so also switched commit author to my work email 👍

@gebn gebn force-pushed the otel-collector-contrib@v0.89.0 branch from a4bd230 to ad2ad43 Compare December 11, 2023 00:57
@joe-elliott
Copy link
Member

Great update! Thanks.

@joe-elliott joe-elliott merged commit 12f5af8 into grafana:main Dec 11, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants