Panic due to index out of bounds #3063

Closed
gsaraf opened this issue Aug 4, 2022 · 10 comments

@gsaraf

gsaraf commented Aug 4, 2022

Description

I'm running buildkitd (v0.10.3) in a container, and have used the JAEGER_TRACE env var to enable OpenTelemetry tracing pointed at a Jaeger all-in-one instance. Occasionally, the buildkitd process panics with a stack trace like the two examples below. This is bad because any running builds are aborted and clients get an ugly error.
Stack trace 1:

panic: runtime error: slice bounds out of range [3483:3471]

goroutine 247642 [running]:
bytes.(*Buffer).Write(0xc003cf9a88, {0xc000c8423d, 0xeda7da3fd, 0x40323e})
    /usr/local/go/src/bytes/buffer.go:174 +0xd6
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).writeVarint32(0xe28c8a, 0x158640)
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:691 +0x6b
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).WriteI16(0xc0004d8280, {0xc000a74c30, 0x8}, 0xd340)
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:307 +0x26
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).writeFieldBeginInternal(0xc000c841e0, {0x154cb20, 0xc007b02500}, {0xc003cf87d8, 0x0}, 0x8, 0x1, 0x0)
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:234 +0xed
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).WriteFieldBegin(0xc000c841e0, {0x154cb20, 0xc007b02500}, {0x137455b, 0x40323e}, 0xc8, 0x3f5)
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:202 +0x73
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Tag).writeField1(0xc005265c70, {0x154cb20, 0xc007b02500}, {0x1582028, 0xc000c841e0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:456 +0x64
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Tag).Write(0xc005265c70, {0x154cb20, 0xc007b02500}, {0x1582028, 0xc000c841e0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:424 +0x90
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Log).writeField2(0xc0008be280, {0x154cb20, 0xc007b02500}, {0x1582028, 0xc000c841e0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:763 +0x1e5
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Log).Write(0xc0008be280, {0x154cb20, 0xc007b02500}, {0x1582028, 0xc000c841e0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:729 +0xb3
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Span).writeField11(0xc006bde2d0, {0x154cb20, 0xc007b02500}, {0x1582028, 0xc000c841e0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:1679 +0x205
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Span).Write(0xc006bde2d0, {0x154cb20, 0xc007b02500}, {0x1582028, 0xc000c841e0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:1507 +0x227
go.opentelemetry.io/otel/exporters/jaeger.(*agentClientUDP).calcSizeOfSerializedThrift(0xc000700c60, {0x154cb20, 0xc007b02500}, {0x1532d68, 0xc006bde2d0})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/agent.go:205 +0x5c
go.opentelemetry.io/otel/exporters/jaeger.(*agentClientUDP).EmitBatch(0xc000700c60, {0x154cb20, 0xc007b02500}, 0xc009a162a0)
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/agent.go:137 +0x188
go.opentelemetry.io/otel/exporters/jaeger.(*agentUploader).upload(0xc0063cc420, {0x154cb20, 0xc007b02500}, 0xc000114340)
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/uploader.go:269 +0x26
go.opentelemetry.io/otel/exporters/jaeger.(*Exporter).ExportSpans(0xc00040ee00, {0x154cb20, 0xc007b022c0}, {0xc0063cc420, 0x4, 0x6})
    /src/vendor/go.opentelemetry.io/otel/exporters/jaeger/jaeger.go:105 +0x29c
github.com/moby/buildkit/control.(*Controller).Export(0xc000c4af00, {0x154cb20, 0xc007b022c0}, 0x75e363b52d889d26)
    /src/control/control.go:201 +0x97
go.opentelemetry.io/proto/otlp/collector/trace/v1._TraceService_Export_Handler.func1({0x154cb20, 0xc007b022c0}, {0x125c620, 0xc000e755c0})
    /src/vendor/go.opentelemetry.io/proto/otlp/collector/trace/v1/trace_service_grpc.pb.go:85 +0x78
github.com/moby/buildkit/util/grpcerrors.UnaryServerInterceptor({0x154cb20, 0xc007b022c0}, {0x125c620, 0xc000e755c0}, 0xc0005a8680, 0x109a6c8)
    /src/util/grpcerrors/intercept.go:14 +0x3d
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x154cb20, 0xc007b022c0}, {0x125c620, 0xc000e755c0})
    /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
main.unaryInterceptor.func1({0x154cbc8, 0xc009a16120}, {0x125c620, 0xc000e755c0}, 0xc000c15b40, 0xc000c15b60)
    /src/cmd/buildkitd/main.go:563 +0x31f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x154cbc8, 0xc009a16120}, {0x125c620, 0xc000e755c0})
    /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x154cbc8, 0xc009a16120}, {0x125c620, 0xc000e755c0}, 0xc00275abd0, 0x11c54a0)
    /src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0xbf
go.opentelemetry.io/proto/otlp/collector/trace/v1._TraceService_Export_Handler({0x12dab20, 0xc000c4af00}, {0x154cbc8, 0xc009a16120}, 0xc008dd6180, 0xc000c3aea0)
    /src/vendor/go.opentelemetry.io/proto/otlp/collector/trace/v1/trace_service_grpc.pb.go:87 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000c50540, {0x15651f0, 0xc001215ba0}, 0xc000e3e240, 0xc000918b70, 0x1e76410, 0x0)
    /src/vendor/google.golang.org/grpc/server.go:1282 +0xccf
google.golang.org/grpc.(*Server).handleStream(0xc000c50540, {0x15651f0, 0xc001215ba0}, 0xc000e3e240, 0x0)
    /src/vendor/google.golang.org/grpc/server.go:1619 +0xa2a
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    /src/vendor/google.golang.org/grpc/server.go:921 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    /src/vendor/google.golang.org/grpc/server.go:919 +0x294	

Stack trace 2:

panic: runtime error: index out of range [1241] with length 0

goroutine 10949 [running]:
bytes.(*Buffer).WriteByte(0xc000740640, 0x27)
	/usr/local/go/src/bytes/buffer.go:269 +0x85
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).writeByteDirect(...)
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:736
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).writeFieldBeginInternal(0xc000bda500, {0x154cb20, 0xc001edcd40}, {0x1375a58, 0x0}, 0x5, 0x2, 0x0)
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:224 +0x9f
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).WriteFieldBegin(0xc000bda500, {0x154cb20, 0xc001edcd40}, {0x137616f, 0x40323e}, 0x3e, 0x40)
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:202 +0x73
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Tag).writeField2(0xc002052aa0, {0x154cb20, 0xc001edcd40}, {0x1582028, 0xc000bda500})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:469 +0x64
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Tag).Write(0xc002052aa0, {0x154cb20, 0xc001edcd40}, {0x1582028, 0xc000bda500})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:427 +0xb7
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Log).writeField2(0xc001848ce0, {0x154cb20, 0xc001edcd40}, {0x1582028, 0xc000bda500})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:763 +0x1e5
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Log).Write(0xc001848ce0, {0x154cb20, 0xc001edcd40}, {0x1582028, 0xc000bda500})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:729 +0xb3
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Span).writeField11(0xc0006bf680, {0x154cb20, 0xc001edcd40}, {0x1582028, 0xc000bda500})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:1679 +0x205
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Span).Write(0xc0006bf680, {0x154cb20, 0xc001edcd40}, {0x1582028, 0xc000bda500})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger/jaeger.go:1507 +0x227
go.opentelemetry.io/otel/exporters/jaeger.(*agentClientUDP).calcSizeOfSerializedThrift(0xc000596240, {0x154cb20, 0xc001edcd40}, {0x1532d68, 0xc0006bf680})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/agent.go:205 +0x5c
go.opentelemetry.io/otel/exporters/jaeger.(*agentClientUDP).EmitBatch(0xc000596240, {0x154cb20, 0xc001edcd40}, 0xc00192bc80)
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/agent.go:137 +0x188
go.opentelemetry.io/otel/exporters/jaeger.(*agentUploader).upload(0xc0022cf360, {0x154cb20, 0xc001edcd40}, 0xc000422a80)
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/uploader.go:269 +0x26
go.opentelemetry.io/otel/exporters/jaeger.(*Exporter).ExportSpans(0xc0000c1380, {0x154cb20, 0xc001edc9c0}, {0xc0022cf360, 0x5, 0x5})
	/src/vendor/go.opentelemetry.io/otel/exporters/jaeger/jaeger.go:105 +0x29c
github.com/moby/buildkit/control.(*Controller).Export(0xc000bdaaa0, {0x154cb20, 0xc001edc9c0}, 0x432d2fa0dc942e73)
	/src/control/control.go:201 +0x97
go.opentelemetry.io/proto/otlp/collector/trace/v1._TraceService_Export_Handler.func1({0x154cb20, 0xc001edc9c0}, {0x125c620, 0xc00175e2c0})
	/src/vendor/go.opentelemetry.io/proto/otlp/collector/trace/v1/trace_service_grpc.pb.go:85 +0x78
github.com/moby/buildkit/util/grpcerrors.UnaryServerInterceptor({0x154cb20, 0xc001edc9c0}, {0x125c620, 0xc00175e2c0}, 0xc000e09ba0, 0x109a6c8)
	/src/util/grpcerrors/intercept.go:14 +0x3d
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x154cb20, 0xc001edc9c0}, {0x125c620, 0xc00175e2c0})
	/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
main.unaryInterceptor.func1({0x154cbc8, 0xc00192b320}, {0x125c620, 0xc00175e2c0}, 0xc001848800, 0xc001848840)
	/src/cmd/buildkitd/main.go:563 +0x31f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x154cbc8, 0xc00192b320}, {0x125c620, 0xc00175e2c0})
	/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x154cbc8, 0xc00192b320}, {0x125c620, 0xc00175e2c0}, 0xc0010edbd0, 0x11c54a0)
	/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0xbf
go.opentelemetry.io/proto/otlp/collector/trace/v1._TraceService_Export_Handler({0x12dab20, 0xc000bdaaa0}, {0x154cbc8, 0xc00192b320}, 0xc00185f1a0, 0xc000119680)
	/src/vendor/go.opentelemetry.io/proto/otlp/collector/trace/v1/trace_service_grpc.pb.go:87 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000a6c000, {0x15651f0, 0xc000e76820}, 0xc000646120, 0xc000591d70, 0x1e76410, 0x0)
	/src/vendor/google.golang.org/grpc/server.go:1282 +0xccf
google.golang.org/grpc.(*Server).handleStream(0xc000a6c000, {0x15651f0, 0xc000e76820}, 0xc000646120, 0x0)
	/src/vendor/google.golang.org/grpc/server.go:1619 +0xa2a
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/src/vendor/google.golang.org/grpc/server.go:921 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/src/vendor/google.golang.org/grpc/server.go:919 +0x294

Environment

  • OS: Linux
  • Architecture: x86_64
  • Go Version: 1.17
  • opentelemetry-go version: go.opentelemetry.io/otel v1.4.1 (from buildkit's go.mod)

Steps To Reproduce

Unsure, unfortunately, beyond running buildkitd with JAEGER_TRACE set.
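For context, a minimal standalone setup that exercises the same Jaeger-agent UDP export path as the panicking frames (agentClientUDP.EmitBatch over compact thrift) might look like the sketch below. This is an approximation for reproduction purposes, not buildkitd's actual wiring, which routes spans through its own controller as the stack traces show; localhost:6831 is the Jaeger agent's default compact-thrift port.

package main

import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/jaeger"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Build a Jaeger exporter that sends batches to a local agent over
	// UDP; this is the code path that panics in the traces above.
	exp, err := jaeger.New(jaeger.WithAgentEndpoint(
		jaeger.WithAgentHost("localhost"),
		jaeger.WithAgentPort("6831"),
	))
	if err != nil {
		panic(err)
	}
	// Register a tracer provider that batches spans into the exporter.
	otel.SetTracerProvider(sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp)))
}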

Expected behavior

No panic? :)

I'm not entirely sure this is the right project to report this panic. Please let me know if there is any other info I can supply that will help.

@gsaraf added the bug label Aug 4, 2022
@MrAlias
Contributor

MrAlias commented Aug 4, 2022

What code are you running with buildkit?

It is unclear whether buildkit is generating unexpected telemetry, whether we are generating unexpected telemetry, or whether the thrift library is failing to handle valid telemetry.

@MrAlias added this to Needs triage in Bugs via automation Aug 4, 2022
@MrAlias added the response needed label Aug 4, 2022
@Aneurysm9
Member

Have you raised this issue with buildkit? It looks like they are using a custom controller to handle interaction with the exporter. That controller has a mutex on it, implying it might be used concurrently, but that mutex is not locked around interactions with the exporter, which is not safe for concurrent use.
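A minimal sketch of the pattern being described, with hypothetical names rather than buildkit's actual code; the point is that ExportSpans must not be reached by two goroutines at once:

package export

import (
	"context"
	"sync"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// controller is illustrative only. It owns a mutex, but in the pattern
// described above the exporter was invoked without holding it, so
// concurrent callers could enter ExportSpans simultaneously.
type controller struct {
	mu       sync.Mutex
	exporter sdktrace.SpanExporter
}

// Export holds the lock across the exporter call, serializing access to
// an exporter that is not safe for concurrent use.
func (c *controller) Export(ctx context.Context, spans []sdktrace.ReadOnlySpan) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.exporter.ExportSpans(ctx, spans)
}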

@gsaraf
Author

gsaraf commented Aug 4, 2022

Thanks for the quick responses!

@MrAlias - I'm not sure what code you are referring to. We use buildctl as the client for a buildkitd server running in a container.

@Aneurysm9 - I haven't yet. Thanks for the pointer! I'll take a look and bring it up with them.

@tritueviet

Same issue

goroutine 207703 [running]:
runtime/debug.Stack()
    GOROOT/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
    GOROOT/src/runtime/debug/stack.go:16 +0x19
.../.../internal/middleware.HandleErrorInterceptor.func1.1()
    .../internal/middleware/interceptor.go:67 +0xdc
panic({0x1d33c60, 0xc00265e0a8})
    GOROOT/src/runtime/panic.go:838 +0x207
bytes.(*Buffer).WriteByte(0x1b809e0?, 0x10?)
    GOROOT/src/bytes/buffer.go:269 +0x85
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).writeByteDirect(...)
    external/io_opentelemetry_go_otel_exporters_jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:736
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).writeFieldBeginInternal(0xc00013d680, {0x21c3ff0, 0xc00066c1c0}, {0x0?, 0x8?}, 0x30?, 0x1, 0xc0?)
    external/io_opentelemetry_go_otel_exporters_jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:224 +0x9f
go.opentelemetry.io/otel/exporters/jaeger/internal/third_party/thrift/lib/go/thrift.(*TCompactProtocol).WriteFieldBegin(0xc0017ac5a8?, {0x21c3ff0?, 0xc00066c1c0?}, {0x1e9dad1?, 0xc0017ac5a8?}, 0x78?, 0x17a?)
    external/io_opentelemetry_go_otel_exporters_jaeger/internal/third_party/thrift/lib/go/thrift/compact_protocol.go:202 +0x73
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Process).writeField1(0xc00263b590, {0x21c3ff0, 0xc00066c1c0}, {0x21dabc8, 0xc00013d680})
    external/io_opentelemetry_go_otel_exporters_jaeger/internal/gen-go/jaeger/jaeger.go:1891 +0x64
go.opentelemetry.io/otel/exporters/jaeger/internal/gen-go/jaeger.(*Process).Write(0xc00263b590, {0x21c3ff0, 0xc00066c1c0}, {0x21dabc8?, 0xc00013d680?})
    external/io_opentelemetry_go_otel_exporters_jaeger/internal/gen-go/jaeger/jaeger.go:1874 +0x8c
go.opentelemetry.io/otel/exporters/jaeger.(*agentClientUDP).calcSizeOfSerializedThrift(0xc0000dec60, {0x21c3ff0?, 0xc00066c1c0?}, {0x21b2c88?, 0xc00263b590?})
    external/io_opentelemetry_go_otel_exporters_jaeger/agent.go:196 +0x5c
go.opentelemetry.io/otel/exporters/jaeger.(*agentClientUDP).EmitBatch(0xc0000dec60, {0x21c3ff0, 0xc00066c1c0}, 0xc00263b5c0)
    external/io_opentelemetry_go_otel_exporters_jaeger/agent.go:120 +0x58
go.opentelemetry.io/otel/exporters/jaeger.(*agentUploader).upload(0xc001d03c20?, {0x21c3ff0?, 0xc00066c1c0?}, 0xc0001524c8?)
    external/io_opentelemetry_go_otel_exporters_jaeger/uploader.go:260 +0x26
go.opentelemetry.io/otel/exporters/jaeger.(*Exporter).ExportSpans(0xc000180940, {0x21c4028, 0xc00005e040}, {0xc001d03c20, 0x1, 0x1})
    external/io_opentelemetry_go_otel_exporters_jaeger/jaeger.go:105 +0x299
go.opentelemetry.io/otel/sdk/trace.(*simpleSpanProcessor).OnEnd(0xc000180c40, {0x21d24e8?, 0xc002615800})
    external/io_opentelemetry_go_otel_sdk/trace/simple_span_processor.go:58 +0x1e6
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End(0xc001796000, {0x0, 0x0, 0x800?})
    external/io_opentelemetry_go_otel_sdk/trace/span.go:280 +0x8dc
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*wrappedBody).Read(0xc00263b500, {0xc002678600?, 0x7f860ca86b38?, 0x1000?})
    external/io_opentelemetry_go_contrib_instrumentation_net_http_otelhttp/transport.go:128 +0xf0
.../library/otel/http.(*tracker).Read(0xc00248a380, {0xc002678600?, 0x1b07780?, 0xc002618101?})
    library/otel/http/stats.go:229 +0x33
encoding/json.(*Decoder).refill(0xc002618140)
    GOROOT/src/encoding/json/stream.go:165 +0x17f
encoding/json.(*Decoder).readValue(0xc002618140)
    GOROOT/src/encoding/json/stream.go:140 +0xbb
encoding/json.(*Decoder).Decode(0xc002618140, {0x1aba5c0, 0xc0013161b0})
    GOROOT/src/encoding/json/stream.go:63 +0x78
.../library/http_caller.jsonDecoder.Decode(...)
    library/http_caller/decoder.go:24
.../library/http_caller.decodeResponse(0xc00199b710, 0x1daeb80?, {0x21afd40, 0x3666390}, {0x1aba5c0?, 0xc0013161b0}, {0x1abf0c0?, 0xc00108ff80})
    library/http_caller/http_caller.go:478 +0xfb
.../library/http_caller.(*http_caller).Do(0xc0017ad140, 0x0?, {0x1aba5c0, 0xc0013161b0}, {0x1abf0c0?, 0xc00108ff80})
    library/http_caller/http_caller.go:458 +0x258
.../library/http_caller.(*http_caller).Receive(0x1b57780?, {0x1aba5c0, 0xc0013161b0}, {0x1abf0c0, 0xc00108ff80})
    library/http_caller/http_caller.go:428 +0x65

@gsaraf
Author

gsaraf commented Aug 9, 2022

@tritueviet - What service is this from? Is this buildkit? I can't tell from the stack trace.

@Aneurysm9 - From preliminary testing, it looks like adding a mutex around the ExportSpans call does indeed solve the problem. I'll follow up with them :).

@gsaraf
Author

gsaraf commented Aug 24, 2022

Hi! I've opened a PR to the buildkit repo, and they had some questions. @Aneurysm9 - would you mind participating in that discussion as well? I'm not sure I know how to answer. moby/buildkit#3058

I've also noticed the following comment in the SpanExporter interface:

	// This function is called synchronously, so there is no concurrency
	// safety requirement. However, due to the synchronous calling pattern,
	// it is critical that all timeouts and cancellations contained in the
	// passed context must be honored.

I'm not sure how the function being called synchronously removes the concurrency-safety requirement, or whether the comment is correct at all, given that there does in fact seem to be such a requirement. Should it be changed? Or am I just misreading the comment?

@Aneurysm9
Member

That comment is valid from the perspective of the consumers of the interface (or, at least, those that live in this repository). i.e., the batch span processor will not make concurrent invocations of its associated exporter. If there are other components that may invoke that method then the exporter will need to ensure that it is safe to call that method concurrently.
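For reference, the interface under discussion looks roughly like this in go.opentelemetry.io/otel/sdk/trace (doc comments abbreviated; the ExportSpans comment is the one quoted above):

type SpanExporter interface {
	// ExportSpans exports a batch of spans.
	//
	// This function is called synchronously, so there is no concurrency
	// safety requirement. However, due to the synchronous calling pattern,
	// it is critical that all timeouts and cancellations contained in the
	// passed context must be honored.
	ExportSpans(ctx context.Context, spans []ReadOnlySpan) error

	// Shutdown notifies the exporter of a pending halt to operations.
	Shutdown(ctx context.Context) error
}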

@MrAlias
Contributor

MrAlias commented Oct 20, 2022

It seems like that comment needs to be rephrased to communicate to the user of the interface (not the implementer). E.g.:

	This method is not guaranteed to be concurrent safe.

	The timeout and cancellation of the passed context halt the export when they expire. Data loss is possible in this situation.

@gsaraf
Author

gsaraf commented Dec 4, 2022

Since this is confusing, I'd like to change the wording - can I add what @MrAlias wrote?

@MrAlias added the help wanted label and removed the response needed label Feb 17, 2023
@pellared self-assigned this Nov 2, 2023
@pellared removed the help wanted label Nov 2, 2023
@pellared
Member

pellared commented Nov 2, 2023

	It seems like that comment needs to be rephrased to communicate to the user of the interface (not the implementer).

I disagree. The interface should communicate the contract for the implementer. The implementation should have its own documentation and can communicate its concurrent safety.

I've decided to close this issue, as it is stale and there is no consensus on how to resolve it.

@pellared closed this as not planned Nov 2, 2023