From c5485bf4ddef9c0acce85bff1fed60f669aabf6a Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 13 May 2024 00:47:46 -0700 Subject: [PATCH 1/2] OpenTelemetry Protocol with Apache Arrow Exporter component (#31996) **Description:** This is the same code as OTel-Arrow at https://github.com/open-telemetry/otel-arrow/releases/tag/v0.23.0 (plus [backported lint and test fixes](https://github.com/open-telemetry/otel-arrow/commit/0910113d46454c80881db840e21f25485dce2499)). Only import statements change here, to match the host repository. **Link to tracking Issue:** #26491 **Testing:** Test coverage is approximately 90%. **Documentation:** I double-checked and the existing README had only a few updates needed. --- .chloggen/otelarrowexporter.yaml | 27 + exporter/otelarrowexporter/README.md | 15 +- exporter/otelarrowexporter/config.go | 36 +- exporter/otelarrowexporter/config_test.go | 28 +- exporter/otelarrowexporter/doc.go | 4 - exporter/otelarrowexporter/factory.go | 46 +- exporter/otelarrowexporter/factory_test.go | 13 +- .../generated_component_test.go | 132 ++ .../generated_package_test.go | 4 +- exporter/otelarrowexporter/go.mod | 28 +- exporter/otelarrowexporter/go.sum | 50 +- .../internal/arrow/bestofn.go | 152 +++ .../internal/arrow/common_test.go | 413 ++++++ .../internal/arrow/exporter.go | 345 ++++- .../internal/arrow/exporter_test.go | 890 ++++++++++++ .../internal/arrow/grpcmock/credentials.go | 74 + .../internal/arrow/prioritizer.go | 107 ++ .../internal/arrow/stream.go | 477 +++++++ .../internal/arrow/stream_test.go | 349 +++++ exporter/otelarrowexporter/metadata.yaml | 7 +- exporter/otelarrowexporter/otelarrow.go | 311 ++++- exporter/otelarrowexporter/otelarrow_test.go | 1189 +++++++++++++++++ .../otelarrowexporter/testdata/config.yaml | 1 + 23 files changed, 4568 insertions(+), 130 deletions(-) create mode 100644 .chloggen/otelarrowexporter.yaml create mode 100644 exporter/otelarrowexporter/internal/arrow/bestofn.go create mode 100644 exporter/otelarrowexporter/internal/arrow/common_test.go create mode 100644 exporter/otelarrowexporter/internal/arrow/exporter_test.go create mode 100644 exporter/otelarrowexporter/internal/arrow/grpcmock/credentials.go create mode 100644 exporter/otelarrowexporter/internal/arrow/prioritizer.go create mode 100644 exporter/otelarrowexporter/internal/arrow/stream.go create mode 100644 exporter/otelarrowexporter/internal/arrow/stream_test.go create mode 100644 exporter/otelarrowexporter/otelarrow_test.go diff --git a/.chloggen/otelarrowexporter.yaml b/.chloggen/otelarrowexporter.yaml new file mode 100644 index 000000000000..c1d7f0c8f147 --- /dev/null +++ b/.chloggen/otelarrowexporter.yaml @@ -0,0 +1,27 @@ +# Use this changelog template to create an entry for release notes. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: new_component + +# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver) +component: OpenTelemetry Protocol with Apache Arrow Exporter + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: Implementation copied from opentelemetry/otel-arrow repository @v0.20.0. + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +issues: [26491] + +# (Optional) One or more lines of additional information to render under the primary note. 
+# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. +subtext: + +# If your change doesn't affect end users or the exported elements of any package, +# you should instead start your pull request title with [chore] or use the "Skip Changelog" label. +# Optional: The change log or logs in which this entry should be included. +# e.g. '[user]' or '[user, api]' +# Include 'user' if the change is relevant to end users. +# Include 'api' if there is a change to a library API. +# Default: '[user]' +change_logs: [user] diff --git a/exporter/otelarrowexporter/README.md b/exporter/otelarrowexporter/README.md index 4fdc86c1896f..f712ca42c125 100644 --- a/exporter/otelarrowexporter/README.md +++ b/exporter/otelarrowexporter/README.md @@ -31,11 +31,8 @@ Apache Arrow. OpenTelemetry Protocol with Apache Arrow supports column-oriented data transport using the Apache Arrow data format. This component converts OTLP data into an optimized representation and then sends batches of -data using Apache Arrow to encode the stream. The OpenTelemetry -Protocol with Apache Arrow receiver component contains logic to reverse the process used in this +data using Apache Arrow to encode the stream. The [OpenTelemetry +Protocol with Apache Arrow receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/otelarrowreceiver) component contains logic to reverse the process used in this component. The use of an OpenTelemetry Protocol with Apache Arrow @@ -51,7 +48,7 @@ exporter component. This is as simple as replacing "otlp" with To enable the OpenTelemetry Protocol with Apache Arrow exporter, include it in the list of exporters for a pipeline. The `endpoint` -setting is required. The `tls` setting is requirede for insecure +setting is required. The `tls` setting is required for insecure transport. - `endpoint` (no default): host:port to which the exporter is going to send OTLP trace data, @@ -143,13 +140,9 @@ exporters: When this is configured, the stream will terminate cleanly without causing retries, with `OK` gRPC status. -The corresponding `otelarrowreceiver` keepalive setting, that is +The [corresponding `otelarrowreceiver` keepalive setting](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/otelarrowreceiver#keepalive-configuration), that is compatible with the one above, reads: - - ``` receivers: otelarrow: diff --git a/exporter/otelarrowexporter/config.go b/exporter/otelarrowexporter/config.go index 96f5cb7d7c06..68837d818b16 100644 --- a/exporter/otelarrowexporter/config.go +++ b/exporter/otelarrowexporter/config.go @@ -15,6 +15,8 @@ import ( "go.opentelemetry.io/collector/config/configretry" "go.opentelemetry.io/collector/exporter/exporterhelper" "google.golang.org/grpc" + + "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow" ) // Config defines configuration for OTLP exporter. @@ -26,12 +28,12 @@ type Config struct { exporterhelper.TimeoutSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct. exporterhelper.QueueSettings `mapstructure:"sending_queue"` - RetrySettings configretry.BackOffConfig `mapstructure:"retry_on_failure"` + RetryConfig configretry.BackOffConfig `mapstructure:"retry_on_failure"` configgrpc.ClientConfig `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct. // Arrow includes settings specific to OTel Arrow. 
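+	// An illustrative sketch of this block in collector YAML,
+	// using the mapstructure field names defined on ArrowConfig
+	// below (values are examples, not defaults):
+	//
+	//   arrow:
+	//     num_streams: 4
+	//     max_stream_lifetime: 1h
+	//     payload_compression: zstd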
- Arrow ArrowSettings `mapstructure:"arrow"` + Arrow ArrowConfig `mapstructure:"arrow"` // UserDialOptions cannot be configured via `mapstructure` // schemes. This is useful for custom purposes where the @@ -40,9 +42,9 @@ type Config struct { UserDialOptions []grpc.DialOption `mapstructure:"-"` } -// ArrowSettings includes whether Arrow is enabled and the number of +// ArrowConfig includes whether Arrow is enabled and the number of // concurrent Arrow streams. -type ArrowSettings struct { +type ArrowConfig struct { // NumStreams determines the number of OTel Arrow streams. NumStreams int `mapstructure:"num_streams"` @@ -65,7 +67,7 @@ type ArrowSettings struct { // Note that `Zstd` applies to gRPC, not Arrow compression. PayloadCompression configcompression.Type `mapstructure:"payload_compression"` - // Disabled prevents using OTel Arrow streams. The exporter + // Disabled prevents using OTel-Arrow streams. The exporter // falls back to standard OTLP. Disabled bool `mapstructure:"disabled"` @@ -73,24 +75,18 @@ type ArrowSettings struct { // to standard OTLP. If the Arrow service is unavailable, it // will retry and/or fail. DisableDowngrade bool `mapstructure:"disable_downgrade"` + + // Prioritizer is a policy name for how load is distributed + // across streams. + Prioritizer arrow.PrioritizerName `mapstructure:"prioritizer"` } var _ component.Config = (*Config)(nil) -// Validate checks if the exporter configuration is valid -func (cfg *Config) Validate() error { - if err := cfg.QueueSettings.Validate(); err != nil { - return fmt.Errorf("queue settings has invalid configuration: %w", err) - } - if err := cfg.Arrow.Validate(); err != nil { - return fmt.Errorf("arrow settings has invalid configuration: %w", err) - } - - return nil -} +var _ component.ConfigValidator = (*ArrowConfig)(nil) // Validate returns an error when the number of streams is less than 1. -func (cfg *ArrowSettings) Validate() error { +func (cfg *ArrowConfig) Validate() error { if cfg.NumStreams < 1 { return fmt.Errorf("stream count must be > 0: %d", cfg.NumStreams) } @@ -103,6 +99,10 @@ func (cfg *ArrowSettings) Validate() error { return fmt.Errorf("zstd encoder: invalid configuration: %w", err) } + if err := cfg.Prioritizer.Validate(); err != nil { + return fmt.Errorf("invalid prioritizer: %w", err) + } + // The cfg.PayloadCompression field is validated by the underlying library, // but we only support Zstd or none. 
switch cfg.PayloadCompression { @@ -113,7 +113,7 @@ func (cfg *ArrowSettings) Validate() error { return nil } -func (cfg *ArrowSettings) toArrowProducerOptions() (arrowOpts []config.Option) { +func (cfg *ArrowConfig) toArrowProducerOptions() (arrowOpts []config.Option) { switch cfg.PayloadCompression { case configcompression.TypeZstd: arrowOpts = append(arrowOpts, config.WithZstd()) diff --git a/exporter/otelarrowexporter/config_test.go b/exporter/otelarrowexporter/config_test.go index e855fa078d57..b1a6253837df 100644 --- a/exporter/otelarrowexporter/config_test.go +++ b/exporter/otelarrowexporter/config_test.go @@ -22,6 +22,8 @@ import ( "go.opentelemetry.io/collector/config/configtls" "go.opentelemetry.io/collector/confmap/confmaptest" "go.opentelemetry.io/collector/exporter/exporterhelper" + + "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow" ) func TestUnmarshalDefaultConfig(t *testing.T) { @@ -32,6 +34,7 @@ func TestUnmarshalDefaultConfig(t *testing.T) { assert.NoError(t, component.UnmarshalConfig(cm, cfg)) assert.Equal(t, factory.CreateDefaultConfig(), cfg) assert.Equal(t, "round_robin", cfg.(*Config).ClientConfig.BalancerName) + assert.Equal(t, arrow.DefaultPrioritizer, cfg.(*Config).Arrow.Prioritizer) } func TestUnmarshalConfig(t *testing.T) { @@ -45,7 +48,7 @@ func TestUnmarshalConfig(t *testing.T) { TimeoutSettings: exporterhelper.TimeoutSettings{ Timeout: 10 * time.Second, }, - RetrySettings: configretry.BackOffConfig{ + RetryConfig: configretry.BackOffConfig{ Enabled: true, InitialInterval: 10 * time.Second, RandomizationFactor: 0.7, @@ -79,20 +82,21 @@ func TestUnmarshalConfig(t *testing.T) { }, WriteBufferSize: 512 * 1024, BalancerName: "experimental", - Auth: &configauth.Authentication{AuthenticatorID: component.MustNewID("nop")}, + Auth: &configauth.Authentication{AuthenticatorID: component.NewID(component.MustNewType("nop"))}, }, - Arrow: ArrowSettings{ + Arrow: ArrowConfig{ NumStreams: 2, MaxStreamLifetime: 2 * time.Hour, PayloadCompression: configcompression.TypeZstd, Zstd: zstd.DefaultEncoderConfig(), + Prioritizer: "leastloaded8", }, }, cfg) } -func TestArrowSettingsValidate(t *testing.T) { - settings := func(enabled bool, numStreams int, maxStreamLifetime time.Duration, level zstd.Level) *ArrowSettings { - return &ArrowSettings{ +func TestArrowConfigValidate(t *testing.T) { + settings := func(enabled bool, numStreams int, maxStreamLifetime time.Duration, level zstd.Level) *ArrowConfig { + return &ArrowConfig{ Disabled: !enabled, NumStreams: numStreams, MaxStreamLifetime: maxStreamLifetime, @@ -118,16 +122,16 @@ func TestArrowSettingsValidate(t *testing.T) { require.Error(t, settings(true, math.MaxInt, 10*time.Second, zstd.MaxLevel+1).Validate()) } -func TestDefaultSettingsValid(t *testing.T) { +func TestDefaultConfigValid(t *testing.T) { cfg := createDefaultConfig() // this must be set by the user and config // validation always checks that a value is set. 
cfg.(*Config).Arrow.MaxStreamLifetime = 2 * time.Second - require.NoError(t, cfg.(*Config).Validate()) + require.NoError(t, component.ValidateConfig(cfg)) } -func TestArrowSettingsPayloadCompressionZstd(t *testing.T) { - settings := ArrowSettings{ +func TestArrowConfigPayloadCompressionZstd(t *testing.T) { + settings := ArrowConfig{ PayloadCompression: configcompression.TypeZstd, } var config config.Config @@ -137,9 +141,9 @@ func TestArrowSettingsPayloadCompressionZstd(t *testing.T) { require.True(t, config.Zstd) } -func TestArrowSettingsPayloadCompressionNone(t *testing.T) { +func TestArrowConfigPayloadCompressionNone(t *testing.T) { for _, value := range []string{"", "none"} { - settings := ArrowSettings{ + settings := ArrowConfig{ PayloadCompression: configcompression.Type(value), } var config config.Config diff --git a/exporter/otelarrowexporter/doc.go b/exporter/otelarrowexporter/doc.go index e76c5e35612b..bb6fcbefc7ad 100644 --- a/exporter/otelarrowexporter/doc.go +++ b/exporter/otelarrowexporter/doc.go @@ -3,8 +3,4 @@ //go:generate mdatagen metadata.yaml -// Package otelarrowexporter exports telemetry using OpenTelemetry -// Protocol with Apache Arrow and/or standard OpenTelemetry Protocol -// data using configuration structures similar to the core OTLP -// exporter. package otelarrowexporter // import "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter" diff --git a/exporter/otelarrowexporter/factory.go b/exporter/otelarrowexporter/factory.go index edc1c5f2c3fc..9a459f14e8dc 100644 --- a/exporter/otelarrowexporter/factory.go +++ b/exporter/otelarrowexporter/factory.go @@ -25,7 +25,7 @@ import ( "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/metadata" ) -// NewFactory creates a factory for OTel-Arrow exporter. +// NewFactory creates a factory for OTLP exporter. func NewFactory() exporter.Factory { return exporter.NewFactory( metadata.Type, @@ -39,9 +39,8 @@ func NewFactory() exporter.Factory { func createDefaultConfig() component.Config { return &Config{ TimeoutSettings: exporterhelper.NewDefaultTimeoutSettings(), - RetrySettings: configretry.NewDefaultBackOffConfig(), + RetryConfig: configretry.NewDefaultBackOffConfig(), QueueSettings: exporterhelper.NewDefaultQueueSettings(), - ClientConfig: configgrpc.ClientConfig{ Headers: map[string]configopaque.String{}, // Default to zstd compression @@ -54,11 +53,12 @@ func createDefaultConfig() component.Config { // destination. BalancerName: "round_robin", }, - Arrow: ArrowSettings{ + Arrow: ArrowConfig{ NumStreams: runtime.NumCPU(), MaxStreamLifetime: time.Hour, - Zstd: zstd.DefaultEncoderConfig(), + Zstd: zstd.DefaultEncoderConfig(), + Prioritizer: arrow.DefaultPrioritizer, // PayloadCompression is off by default because gRPC // compression is on by default, above. 
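+			// Taken together, the defaults above correspond to an
+			// exporter entry roughly like the following sketch
+			// (the endpoint value is illustrative; `endpoint` has
+			// no default and must be set):
+			//
+			//   exporters:
+			//     otelarrow:
+			//       endpoint: backend:4317
+			//       compression: zstd
+			//       arrow:
+			//         num_streams: <runtime.NumCPU()>
+			//         max_stream_lifetime: 1h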
@@ -67,14 +67,14 @@ func createDefaultConfig() component.Config { } } -func (e *baseExporter) helperOptions() []exporterhelper.Option { +func (oce *baseExporter) helperOptions() []exporterhelper.Option { return []exporterhelper.Option{ exporterhelper.WithCapabilities(consumer.Capabilities{MutatesData: false}), - exporterhelper.WithTimeout(e.config.TimeoutSettings), - exporterhelper.WithRetry(e.config.RetrySettings), - exporterhelper.WithQueue(e.config.QueueSettings), - exporterhelper.WithStart(e.start), - exporterhelper.WithShutdown(e.shutdown), + exporterhelper.WithTimeout(oce.config.TimeoutSettings), + exporterhelper.WithRetry(oce.config.RetryConfig), + exporterhelper.WithQueue(oce.config.QueueSettings), + exporterhelper.WithStart(oce.start), + exporterhelper.WithShutdown(oce.shutdown), } } @@ -97,13 +97,13 @@ func createTracesExporter( set exporter.CreateSettings, cfg component.Config, ) (exporter.Traces, error) { - exp, err := newExporter(cfg, set, createArrowTracesStream) + oce, err := newExporter(cfg, set, createArrowTracesStream) if err != nil { return nil, err } - return exporterhelper.NewTracesExporter(ctx, exp.settings, exp.config, - exp.pushTraces, - exp.helperOptions()..., + return exporterhelper.NewTracesExporter(ctx, oce.settings, oce.config, + oce.pushTraces, + oce.helperOptions()..., ) } @@ -116,13 +116,13 @@ func createMetricsExporter( set exporter.CreateSettings, cfg component.Config, ) (exporter.Metrics, error) { - exp, err := newExporter(cfg, set, createArrowMetricsStream) + oce, err := newExporter(cfg, set, createArrowMetricsStream) if err != nil { return nil, err } - return exporterhelper.NewMetricsExporter(ctx, exp.settings, exp.config, - exp.pushMetrics, - exp.helperOptions()..., + return exporterhelper.NewMetricsExporter(ctx, oce.settings, oce.config, + oce.pushMetrics, + oce.helperOptions()..., ) } @@ -135,12 +135,12 @@ func createLogsExporter( set exporter.CreateSettings, cfg component.Config, ) (exporter.Logs, error) { - exp, err := newExporter(cfg, set, createArrowLogsStream) + oce, err := newExporter(cfg, set, createArrowLogsStream) if err != nil { return nil, err } - return exporterhelper.NewLogsExporter(ctx, exp.settings, exp.config, - exp.pushLogs, - exp.helperOptions()..., + return exporterhelper.NewLogsExporter(ctx, oce.settings, oce.config, + oce.pushLogs, + oce.helperOptions()..., ) } diff --git a/exporter/otelarrowexporter/factory_test.go b/exporter/otelarrowexporter/factory_test.go index 45d20e553288..75dfcc736a4b 100644 --- a/exporter/otelarrowexporter/factory_test.go +++ b/exporter/otelarrowexporter/factory_test.go @@ -22,6 +22,8 @@ import ( "go.opentelemetry.io/collector/config/configtls" "go.opentelemetry.io/collector/exporter/exporterhelper" "go.opentelemetry.io/collector/exporter/exportertest" + + "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow" ) func TestCreateDefaultConfig(t *testing.T) { @@ -31,16 +33,17 @@ func TestCreateDefaultConfig(t *testing.T) { assert.NoError(t, componenttest.CheckConfigStruct(cfg)) ocfg, ok := factory.CreateDefaultConfig().(*Config) assert.True(t, ok) - assert.Equal(t, ocfg.RetrySettings, configretry.NewDefaultBackOffConfig()) + assert.Equal(t, ocfg.RetryConfig, configretry.NewDefaultBackOffConfig()) assert.Equal(t, ocfg.QueueSettings, exporterhelper.NewDefaultQueueSettings()) assert.Equal(t, ocfg.TimeoutSettings, exporterhelper.NewDefaultTimeoutSettings()) assert.Equal(t, ocfg.Compression, configcompression.TypeZstd) - assert.Equal(t, ocfg.Arrow, ArrowSettings{ + 
assert.Equal(t, ocfg.Arrow, ArrowConfig{ Disabled: false, NumStreams: runtime.NumCPU(), MaxStreamLifetime: time.Hour, PayloadCompression: "", Zstd: zstd.DefaultEncoderConfig(), + Prioritizer: arrow.DefaultPrioritizer, }) } @@ -185,8 +188,8 @@ func TestCreateTracesExporter(t *testing.T) { t.Run(tt.name, func(t *testing.T) { factory := NewFactory() set := exportertest.NewNopCreateSettings() - config := tt.config - consumer, err := factory.CreateTracesExporter(context.Background(), set, &config) + cfg := tt.config + consumer, err := factory.CreateTracesExporter(context.Background(), set, &cfg) if tt.mustFailOnCreate { assert.NotNil(t, err) return @@ -225,7 +228,7 @@ func TestCreateArrowTracesExporter(t *testing.T) { factory := NewFactory() cfg := factory.CreateDefaultConfig().(*Config) cfg.ClientConfig.Endpoint = testutil.GetAvailableLocalAddress(t) - cfg.Arrow = ArrowSettings{ + cfg.Arrow = ArrowConfig{ NumStreams: 1, } set := exportertest.NewNopCreateSettings() diff --git a/exporter/otelarrowexporter/generated_component_test.go b/exporter/otelarrowexporter/generated_component_test.go index 0b323ce6937d..aa7ea0c9a47f 100644 --- a/exporter/otelarrowexporter/generated_component_test.go +++ b/exporter/otelarrowexporter/generated_component_test.go @@ -3,10 +3,20 @@ package otelarrowexporter import ( + "context" "testing" + "time" "github.com/stretchr/testify/require" + "go.opentelemetry.io/collector/component" "go.opentelemetry.io/collector/component/componenttest" + "go.opentelemetry.io/collector/confmap/confmaptest" + "go.opentelemetry.io/collector/exporter" + "go.opentelemetry.io/collector/exporter/exportertest" + "go.opentelemetry.io/collector/pdata/pcommon" + "go.opentelemetry.io/collector/pdata/plog" + "go.opentelemetry.io/collector/pdata/pmetric" + "go.opentelemetry.io/collector/pdata/ptrace" ) func TestComponentFactoryType(t *testing.T) { @@ -16,3 +26,125 @@ func TestComponentFactoryType(t *testing.T) { func TestComponentConfigStruct(t *testing.T) { require.NoError(t, componenttest.CheckConfigStruct(NewFactory().CreateDefaultConfig())) } + +func TestComponentLifecycle(t *testing.T) { + factory := NewFactory() + + tests := []struct { + name string + createFn func(ctx context.Context, set exporter.CreateSettings, cfg component.Config) (component.Component, error) + }{ + + { + name: "logs", + createFn: func(ctx context.Context, set exporter.CreateSettings, cfg component.Config) (component.Component, error) { + return factory.CreateLogsExporter(ctx, set, cfg) + }, + }, + + { + name: "metrics", + createFn: func(ctx context.Context, set exporter.CreateSettings, cfg component.Config) (component.Component, error) { + return factory.CreateMetricsExporter(ctx, set, cfg) + }, + }, + + { + name: "traces", + createFn: func(ctx context.Context, set exporter.CreateSettings, cfg component.Config) (component.Component, error) { + return factory.CreateTracesExporter(ctx, set, cfg) + }, + }, + } + + cm, err := confmaptest.LoadConf("metadata.yaml") + require.NoError(t, err) + cfg := factory.CreateDefaultConfig() + sub, err := cm.Sub("tests::config") + require.NoError(t, err) + require.NoError(t, component.UnmarshalConfig(sub, cfg)) + + for _, test := range tests { + t.Run(test.name+"-shutdown", func(t *testing.T) { + c, err := test.createFn(context.Background(), exportertest.NewNopCreateSettings(), cfg) + require.NoError(t, err) + err = c.Shutdown(context.Background()) + require.NoError(t, err) + }) + t.Run(test.name+"-lifecycle", func(t *testing.T) { + c, err := test.createFn(context.Background(), 
exportertest.NewNopCreateSettings(), cfg) + require.NoError(t, err) + host := componenttest.NewNopHost() + err = c.Start(context.Background(), host) + require.NoError(t, err) + require.NotPanics(t, func() { + switch test.name { + case "logs": + e, ok := c.(exporter.Logs) + require.True(t, ok) + logs := generateLifecycleTestLogs() + if !e.Capabilities().MutatesData { + logs.MarkReadOnly() + } + err = e.ConsumeLogs(context.Background(), logs) + case "metrics": + e, ok := c.(exporter.Metrics) + require.True(t, ok) + metrics := generateLifecycleTestMetrics() + if !e.Capabilities().MutatesData { + metrics.MarkReadOnly() + } + err = e.ConsumeMetrics(context.Background(), metrics) + case "traces": + e, ok := c.(exporter.Traces) + require.True(t, ok) + traces := generateLifecycleTestTraces() + if !e.Capabilities().MutatesData { + traces.MarkReadOnly() + } + err = e.ConsumeTraces(context.Background(), traces) + } + }) + + require.NoError(t, err) + + err = c.Shutdown(context.Background()) + require.NoError(t, err) + }) + } +} + +func generateLifecycleTestLogs() plog.Logs { + logs := plog.NewLogs() + rl := logs.ResourceLogs().AppendEmpty() + rl.Resource().Attributes().PutStr("resource", "R1") + l := rl.ScopeLogs().AppendEmpty().LogRecords().AppendEmpty() + l.Body().SetStr("test log message") + l.SetTimestamp(pcommon.NewTimestampFromTime(time.Now())) + return logs +} + +func generateLifecycleTestMetrics() pmetric.Metrics { + metrics := pmetric.NewMetrics() + rm := metrics.ResourceMetrics().AppendEmpty() + rm.Resource().Attributes().PutStr("resource", "R1") + m := rm.ScopeMetrics().AppendEmpty().Metrics().AppendEmpty() + m.SetName("test_metric") + dp := m.SetEmptyGauge().DataPoints().AppendEmpty() + dp.Attributes().PutStr("test_attr", "value_1") + dp.SetIntValue(123) + dp.SetTimestamp(pcommon.NewTimestampFromTime(time.Now())) + return metrics +} + +func generateLifecycleTestTraces() ptrace.Traces { + traces := ptrace.NewTraces() + rs := traces.ResourceSpans().AppendEmpty() + rs.Resource().Attributes().PutStr("resource", "R1") + span := rs.ScopeSpans().AppendEmpty().Spans().AppendEmpty() + span.Attributes().PutStr("test_attr", "value_1") + span.SetName("test_span") + span.SetStartTimestamp(pcommon.NewTimestampFromTime(time.Now().Add(-1 * time.Second))) + span.SetEndTimestamp(pcommon.NewTimestampFromTime(time.Now())) + return traces +} diff --git a/exporter/otelarrowexporter/generated_package_test.go b/exporter/otelarrowexporter/generated_package_test.go index eca1471d7dfd..c19cf02cbd7f 100644 --- a/exporter/otelarrowexporter/generated_package_test.go +++ b/exporter/otelarrowexporter/generated_package_test.go @@ -4,8 +4,10 @@ package otelarrowexporter import ( "testing" + + "go.uber.org/goleak" ) func TestMain(m *testing.M) { - // skipping goleak test as per metadata.yml configuration + goleak.VerifyTestMain(m) } diff --git a/exporter/otelarrowexporter/go.mod b/exporter/otelarrowexporter/go.mod index ddac000a2253..52d7279728ec 100644 --- a/exporter/otelarrowexporter/go.mod +++ b/exporter/otelarrowexporter/go.mod @@ -3,9 +3,11 @@ module github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelar go 1.21.0 require ( + github.com/apache/arrow/go/v14 v14.0.2 github.com/open-telemetry/otel-arrow v0.22.0 github.com/open-telemetry/otel-arrow/collector v0.22.0 github.com/stretchr/testify v1.9.0 + go.opentelemetry.io/collector v0.100.0 go.opentelemetry.io/collector/component v0.100.0 go.opentelemetry.io/collector/config/configauth v0.100.0 go.opentelemetry.io/collector/config/configcompression 
v1.7.0 @@ -16,19 +18,32 @@ require ( go.opentelemetry.io/collector/confmap v0.100.0 go.opentelemetry.io/collector/consumer v0.100.0 go.opentelemetry.io/collector/exporter v0.100.0 + go.opentelemetry.io/collector/extension v0.100.0 + go.opentelemetry.io/collector/extension/auth v0.100.0 go.opentelemetry.io/collector/pdata v1.7.0 + go.opentelemetry.io/otel v1.26.0 go.opentelemetry.io/otel/metric v1.26.0 go.opentelemetry.io/otel/trace v1.26.0 + go.uber.org/goleak v1.3.0 + go.uber.org/mock v0.4.0 + go.uber.org/multierr v1.11.0 + go.uber.org/zap v1.27.0 + golang.org/x/net v0.24.0 + google.golang.org/genproto/googleapis/rpc v0.0.0-20240401170217-c3f982113cda google.golang.org/grpc v1.63.2 + google.golang.org/protobuf v1.34.0 ) require ( - github.com/apache/arrow/go/v14 v14.0.2 // indirect + github.com/HdrHistogram/hdrhistogram-go v1.1.2 // indirect + github.com/axiomhq/hyperloglog v0.0.0-20230201085229-3ddf4bad03dc // indirect github.com/beorn7/perks v1.0.1 // indirect github.com/cenkalti/backoff/v4 v4.3.0 // indirect github.com/cespare/xxhash/v2 v2.2.0 // indirect github.com/davecgh/go-spew v1.1.1 // indirect + github.com/dgryski/go-metro v0.0.0-20180109044635-280f6062b5bc // indirect github.com/fsnotify/fsnotify v1.7.0 // indirect + github.com/fxamacker/cbor/v2 v2.4.0 // indirect github.com/go-logr/logr v1.4.1 // indirect github.com/go-logr/stdr v1.2.2 // indirect github.com/go-viper/mapstructure/v2 v2.0.0-alpha.1 // indirect @@ -49,34 +64,27 @@ require ( github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect github.com/modern-go/reflect2 v1.0.2 // indirect github.com/mostynb/go-grpc-compression v1.2.2 // indirect + github.com/pierrec/lz4/v4 v4.1.18 // indirect github.com/pmezard/go-difflib v1.0.0 // indirect github.com/prometheus/client_golang v1.19.0 // indirect github.com/prometheus/client_model v0.6.1 // indirect github.com/prometheus/common v0.53.0 // indirect github.com/prometheus/procfs v0.12.0 // indirect + github.com/x448/float16 v0.8.4 // indirect github.com/zeebo/xxh3 v1.0.2 // indirect - go.opentelemetry.io/collector v0.100.0 // indirect go.opentelemetry.io/collector/config/confignet v0.100.0 // indirect go.opentelemetry.io/collector/config/configtelemetry v0.100.0 // indirect go.opentelemetry.io/collector/config/internal v0.100.0 // indirect - go.opentelemetry.io/collector/extension v0.100.0 // indirect - go.opentelemetry.io/collector/extension/auth v0.100.0 // indirect go.opentelemetry.io/collector/featuregate v1.7.0 // indirect go.opentelemetry.io/collector/receiver v0.100.0 // indirect go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.51.0 // indirect - go.opentelemetry.io/otel v1.26.0 // indirect go.opentelemetry.io/otel/exporters/prometheus v0.48.0 // indirect go.opentelemetry.io/otel/sdk v1.26.0 // indirect go.opentelemetry.io/otel/sdk/metric v1.26.0 // indirect - go.uber.org/multierr v1.11.0 // indirect - go.uber.org/zap v1.27.0 // indirect golang.org/x/mod v0.14.0 // indirect - golang.org/x/net v0.24.0 // indirect golang.org/x/sys v0.19.0 // indirect golang.org/x/text v0.14.0 // indirect golang.org/x/tools v0.15.0 // indirect golang.org/x/xerrors v0.0.0-20231012003039-104605ab7028 // indirect - google.golang.org/genproto/googleapis/rpc v0.0.0-20240401170217-c3f982113cda // indirect - google.golang.org/protobuf v1.34.0 // indirect gopkg.in/yaml.v3 v3.0.1 // indirect ) diff --git a/exporter/otelarrowexporter/go.sum b/exporter/otelarrowexporter/go.sum index db5468f4d691..f2ab85fc618f 100644 --- 
a/exporter/otelarrowexporter/go.sum +++ b/exporter/otelarrowexporter/go.sum @@ -1,16 +1,32 @@ +dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU= +github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo= +github.com/HdrHistogram/hdrhistogram-go v1.1.2 h1:5IcZpTvzydCQeHzK4Ef/D5rrSqwxob0t8PQPMybUNFM= +github.com/HdrHistogram/hdrhistogram-go v1.1.2/go.mod h1:yDgFjdqOqDEKOvasDdhWNXYg9BVp4O+o5f6V/ehm6Oo= +github.com/ajstarks/svgo v0.0.0-20180226025133-644b8db467af/go.mod h1:K08gAheRH3/J6wwsYMMT4xOr94bZjxIelGM0+d/wbFw= github.com/apache/arrow/go/v14 v14.0.2 h1:N8OkaJEOfI3mEZt07BIkvo4sC6XDbL+48MBPWO5IONw= github.com/apache/arrow/go/v14 v14.0.2/go.mod h1:u3fgh3EdgN/YQ8cVQRguVW3R+seMybFg8QBQ5LU+eBY= +github.com/axiomhq/hyperloglog v0.0.0-20230201085229-3ddf4bad03dc h1:Keo7wQ7UODUaHcEi7ltENhbAK2VgZjfat6mLy03tQzo= +github.com/axiomhq/hyperloglog v0.0.0-20230201085229-3ddf4bad03dc/go.mod h1:k08r+Yj1PRAmuayFiRK6MYuR5Ve4IuZtTfxErMIh0+c= github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw= +github.com/brianvoe/gofakeit/v6 v6.17.0 h1:obbQTJeHfktJtiZzq0Q1bEpsNUs+yHrYlPVWt7BtmJ4= +github.com/brianvoe/gofakeit/v6 v6.17.0/go.mod h1:Ow6qC71xtwm79anlwKRlWZW6zVq9D2XHE4QSSMP/rU8= github.com/cenkalti/backoff/v4 v4.3.0 h1:MyRJ/UdXutAwSAT+s3wNd7MfTIcy71VQueUuFK343L8= github.com/cenkalti/backoff/v4 v4.3.0/go.mod h1:Y3VNntkOUPxTVeUxJ/G5vcM//AlwfmyYozVcomhLiZE= github.com/cespare/xxhash/v2 v2.2.0 h1:DC2CZ1Ep5Y4k3ZQ899DldepgrayRUGE6BBZ/cd9Cj44= github.com/cespare/xxhash/v2 v2.2.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= +github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/dgryski/go-metro v0.0.0-20180109044635-280f6062b5bc h1:8WFBn63wegobsYAX0YjD+8suexZDga5CctH4CCTx2+8= +github.com/dgryski/go-metro v0.0.0-20180109044635-280f6062b5bc/go.mod h1:c9O8+fpSOX1DM8cPNSkX/qsBWdkD4yd2dpciOWQjpBw= +github.com/fogleman/gg v1.2.1-0.20190220221249-0403632d5b90/go.mod h1:R/bRT+9gY/C5z7JzPU0zXsXHKM4/ayA+zqcVNZzPa1k= github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nosvA= github.com/fsnotify/fsnotify v1.7.0/go.mod h1:40Bi/Hjc2AVfZrqy+aj+yEI+/bRxZnMJyTJwOpGvigM= +github.com/fxamacker/cbor/v2 v2.4.0 h1:ri0ArlOR+5XunOP8CRUowT0pSJOwhW098ZCUyskZD88= +github.com/fxamacker/cbor/v2 v2.4.0/go.mod h1:TA1xS00nchWmaBnEIxPSE5oHLuJBAVvqrtAnWBwBCVo= +github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1/go.mod h1:vR7hzQXu2zJy9AVAgeJqvqgH9Q5CA+iKCZ2gyEVpxRU= github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A= github.com/go-logr/logr v1.4.1 h1:pKouT5E8xu9zeFC39JXRDukb6JFQPXM5p5I91188VAQ= github.com/go-logr/logr v1.4.1/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= @@ -22,10 +38,12 @@ github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU= github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I= github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q= github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q= 
+github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0/go.mod h1:E/TSTwGwJL78qG/PmXZO1EjYhfJinVAhrmmHX6Z8B9k= github.com/golang/snappy v0.0.5-0.20220116011046-fa5810519dcb h1:PBC98N2aIaM3XXiurYmW7fx4GZkL8feAMVq7nEjURHk= github.com/golang/snappy v0.0.5-0.20220116011046-fa5810519dcb/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q= github.com/google/flatbuffers v23.5.26+incompatible h1:M9dgRyhJemaM4Sw8+66GHBu8ioaQmyPLg1b8VwK5WJg= github.com/google/flatbuffers v23.5.26+incompatible/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8= +github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= @@ -35,6 +53,7 @@ github.com/hashicorp/go-version v1.6.0 h1:feTTfFNnjP967rlCxM/I9g701jU+RN74YKx2mO github.com/hashicorp/go-version v1.6.0/go.mod h1:fltr4n8CU8Ke44wwGCBoEymUuxUHl09ZGVZPK5anwXA= github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM= github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo= +github.com/jung-kurt/gofpdf v1.0.3-0.20190309125859-24315acbbda5/go.mod h1:7Id9E/uU8ce6rXgefFLlgrJj/GYY22cpxn+r32jIOes= github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8= github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck= github.com/klauspost/compress v1.17.8 h1:YcnTYrq7MikUT7k0Yb5eceMmALQPYBW/Xltxn0NAMnU= @@ -49,6 +68,8 @@ github.com/knadh/koanf/v2 v2.1.1 h1:/R8eXqasSTsmDCsAyYj+81Wteg8AqrV9CP6gvsTsOmM= github.com/knadh/koanf/v2 v2.1.1/go.mod h1:4mnTRbZCK+ALuBXHZMjDfG9y714L7TykVnZkXbMU3Es= github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= +github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= +github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw= @@ -62,11 +83,11 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk= github.com/mostynb/go-grpc-compression v1.2.2 h1:XaDbnRvt2+1vgr0b/l0qh4mJAfIxE0bKXtz2Znl3GGI= github.com/mostynb/go-grpc-compression v1.2.2/go.mod h1:GOCr2KBxXcblCuczg3YdLQlcin1/NfyDA348ckuCH6w= +github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e/go.mod h1:zD1mROLANZcx1PVRCS0qkT7pwLkGfwJo4zjcN/Tysno= github.com/open-telemetry/otel-arrow v0.22.0 h1:G1jgtqAM2ho5pyKQ4tyrDzk9Y0VcJ+GZQRJgN26vRlI= github.com/open-telemetry/otel-arrow v0.22.0/go.mod h1:F50XFaiNfkfB0MYftZIUKFULm6pxfGqjbgQzevi+65M= github.com/open-telemetry/otel-arrow/collector v0.22.0 h1:lHFjzkh5PbsiW8B63SRntnP9W7bLCXV9lslO4zI0s/Y= github.com/open-telemetry/otel-arrow/collector v0.22.0/go.mod h1:R7hRwuGDxoGLB27dkJUFKDK7mGG7Yb02ODnLHx8Whis= -github.com/pierrec/lz4 v2.0.5+incompatible h1:2xWsjqPFWcplujydGg4WmhC/6fZqK42wMM8aXeqhl0I= github.com/pierrec/lz4/v4 v4.1.18 h1:xaKrnTkyoqfh1YItXl56+6KJNVYWlEEPuAQW9xsplYQ= github.com/pierrec/lz4/v4 v4.1.18/go.mod 
h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4= github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= @@ -83,8 +104,11 @@ github.com/rogpeppe/go-internal v1.11.0 h1:cWPaGQEPrBb5/AsnsZesgZZ9yb1OQ+GOISoDN github.com/rogpeppe/go-internal v1.11.0/go.mod h1:ddIwULY96R17DhadqLgMfk9H9tvdUzkipdSkR5nkCZA= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= +github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg= github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= +github.com/x448/float16 v0.8.4 h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM= +github.com/x448/float16 v0.8.4/go.mod h1:14CWIYCyZA/cWjXOioeEpHeN/83MdbZDRQHoFcYsOfg= github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= github.com/zeebo/assert v1.3.0 h1:g7C04CbJuIDKNPFHmsk4hwZDO5O+kntRxzaUoNXj+IQ= @@ -147,15 +171,28 @@ go.opentelemetry.io/otel/trace v1.26.0 h1:1ieeAUb4y0TE26jUFrCIXKpTuVK7uJGN9/Z/2L go.opentelemetry.io/otel/trace v1.26.0/go.mod h1:4iDxvGDQuUkHve82hJJ8UqrwswHYsZuWCBllGV2U2y0= go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto= go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE= +go.uber.org/mock v0.4.0 h1:VcM4ZOtdbR4f6VXfiOpwpVJDL6lCReaZ6mw31wqh7KU= +go.uber.org/mock v0.4.0/go.mod h1:a6FSlNadKUHUa9IP5Vyt1zh4fC7uAwxMutEAscFbkZc= go.uber.org/multierr v1.11.0 h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0= go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y= go.uber.org/zap v1.27.0 h1:aJMhYGrd5QSmlpLMr2MftRKl7t8J8PTZPA732ud/XR8= go.uber.org/zap v1.27.0/go.mod h1:GB2qFLM7cTU87MWRP2mPIjqfIDnGu+VIO4V/SdhGo2E= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= +golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= +golang.org/x/exp v0.0.0-20180321215751-8460e604b9de/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= +golang.org/x/exp v0.0.0-20180807140117-3d87b88a115f/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= +golang.org/x/exp v0.0.0-20190125153040-c74c464bbbf2/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= +golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= +golang.org/x/exp v0.0.0-20191030013958-a1ab85dbe136/go.mod h1:JXzH8nQsPlswgeRAPE3MuO9GYsAcnJvJ4vnMwN/5qkY= golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa h1:FRnLl4eNAQl8hwxVVC17teOw8kdjVDVAiFMtgUdTSRQ= golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa/go.mod h1:zk2irFbV9DP96SEBUUAy67IdHUaZuSnrz1n472HUCLE= +golang.org/x/image v0.0.0-20180708004352-c73c2afc3b81/go.mod h1:ux5Hcp/YLpHSI86hEcLt0YII63i6oz57MZXIpbrjZUs= +golang.org/x/image v0.0.0-20190227222117-0694c2d4d067/go.mod h1:kZ7UVZpmo3dzQBMxlp+ypCbDeSB+sBbTgSJuh5dn5js= +golang.org/x/image v0.0.0-20190802002840-cff245a6509b/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0= +golang.org/x/mobile 
v0.0.0-20190719004257-d2bd2a29d028/go.mod h1:E/iHnbuqvinMTCcRqshq8CkpyQDoeVncDDYHnLhea+o= +golang.org/x/mod v0.1.0/go.mod h1:0QHyrYULN0/3qlju5TqG8bIK38QM8yzMo5ekMj3DlcY= golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.14.0 h1:dGoOF9QVLYng8IHTm7BAyWqCqSheQ5pYWGhzW00YJr0= @@ -172,6 +209,7 @@ golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJ golang.org/x/sync v0.6.0 h1:5BMeUDZ7vkXGfEr1x9B4bRcTH4lpkTkpdh0T/J+qjbQ= golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20190312061237-fead79001313/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= @@ -181,7 +219,10 @@ golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/text v0.14.0 h1:ScX5w1eTa3QqT8oi6+ziP7dTV1S2+ALU0bI+0zXKWiQ= golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU= +golang.org/x/tools v0.0.0-20180525024113-a5b4c53f6e8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= +golang.org/x/tools v0.0.0-20190206041539-40960b6deb8e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= +golang.org/x/tools v0.0.0-20191012152004-8de300cfc20a/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= @@ -193,8 +234,12 @@ golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8T golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20231012003039-104605ab7028 h1:+cNy6SZtPcJQH3LJVLOSmiC7MMxXNOb3PU/VUEz+EhU= golang.org/x/xerrors v0.0.0-20231012003039-104605ab7028/go.mod h1:NDW/Ps6MPRej6fsCIbMTohpP40sJ/P/vI1MoTEGwX90= +gonum.org/v1/gonum v0.0.0-20180816165407-929014505bf4/go.mod h1:Y+Yx5eoAFn32cQvJDxZx5Dpnq+c3wtXuadVZAcxbbBo= +gonum.org/v1/gonum v0.8.2/go.mod h1:oe/vMfY3deqTw+1EZJhuvEW2iwGF1bW9wwu7XCu0+v0= gonum.org/v1/gonum v0.15.0 h1:2lYxjRbTYyxkJxlhC+LvJIx3SsANPdRybu1tGj9/OrQ= gonum.org/v1/gonum v0.15.0/go.mod h1:xzZVBJBtS+Mz4q0Yl2LJTk+OxOg4jiXZ7qBoM0uISGo= +gonum.org/v1/netlib v0.0.0-20190313105609-8cb42192e0e0/go.mod h1:wa6Ws7BG/ESfp6dHfk7C6KdzKA7wR7u/rKwOGE66zvw= +gonum.org/v1/plot v0.0.0-20190515093506-e2840ee46a6b/go.mod h1:Wt8AAjI+ypCyYX3nZBvf6cAIx93T+c/OS2HFAYskSZc= google.golang.org/genproto/googleapis/rpc v0.0.0-20240401170217-c3f982113cda h1:LI5DOvAxUPMv/50agcLLoo+AdWc1irS9Rzz4vPuD1V4= google.golang.org/genproto/googleapis/rpc v0.0.0-20240401170217-c3f982113cda/go.mod h1:WtryC6hu0hhx87FDGxWCDptyssuo68sk10vYjF+T9fY= google.golang.org/grpc v1.63.2 
h1:MUeiw1B2maTVZthpU5xvASfTh3LDbxHd6IJ6QQVU+xM=
@@ -202,7 +247,10 @@ google.golang.org/grpc v1.63.2/go.mod h1:WAX/8DgncnokcFUldAxq7GeB5DXHDbMF+lLvDom
 google.golang.org/protobuf v1.34.0 h1:Qo/qEd2RZPCf2nKuorzksSknv0d3ERwp1vFG38gSmH4=
 google.golang.org/protobuf v1.34.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
 gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
+gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
 gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
+rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
diff --git a/exporter/otelarrowexporter/internal/arrow/bestofn.go b/exporter/otelarrowexporter/internal/arrow/bestofn.go
new file mode 100644
index 000000000000..ae4bce633643
--- /dev/null
+++ b/exporter/otelarrowexporter/internal/arrow/bestofn.go
@@ -0,0 +1,152 @@
+// Copyright The OpenTelemetry Authors
+// SPDX-License-Identifier: Apache-2.0
+
+package arrow // import "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow"
+
+import (
+	"context"
+	"math/rand"
+	"runtime"
+	"sort"
+)
+
+// bestOfNPrioritizer is a prioritizer that selects a less-loaded stream to write.
+// https://smallrye.io/smallrye-stork/1.1.1/load-balancer/power-of-two-choices/
+type bestOfNPrioritizer struct {
+	doneCancel
+
+	// input from the pipeline, as processed data with headers and
+	// a return channel for the result. This channel is never
+	// closed and is buffered. At shutdown, items of telemetry can
+	// be left in this channel, but users are expected to complete
+	// their requests before calling shutdown (and the collector's
+	// graph package ensures this).
+	input chan writeItem
+
+	// state tracks the work being handled by all streams.
+	state []*streamWorkState
+
+	// numChoices is the number of streams to consider in each decision.
+	numChoices int
+
+	// loadFunc is the load function.
+	loadFunc loadFunc
+}
+
+type loadFunc func(*streamWorkState) float64
+
+type streamSorter struct {
+	work *streamWorkState
+	load float64
+}
+
+var _ streamPrioritizer = &bestOfNPrioritizer{}
+
+func newBestOfNPrioritizer(dc doneCancel, numChoices, numStreams int, lf loadFunc) (*bestOfNPrioritizer, []*streamWorkState) {
+	var state []*streamWorkState
+
+	// Limit numChoices to the number of streams.
+	numChoices = min(numStreams, numChoices)
+
+	for i := 0; i < numStreams; i++ {
+		ws := &streamWorkState{
+			waiters: map[int64]chan<- error{},
+			toWrite: make(chan writeItem, 1),
+		}
+
+		state = append(state, ws)
+	}
+
+	lp := &bestOfNPrioritizer{
+		doneCancel: dc,
+		input:      make(chan writeItem, runtime.NumCPU()),
+		state:      state,
+		numChoices: numChoices,
+		loadFunc:   lf,
+	}
+
+	for i := 0; i < numStreams; i++ {
+		// TODO It's not clear if/when the prioritizer can
+		// become a bottleneck.
+		
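// Each run() goroutine services the shared input
+		// channel, forwarding each item to the least-loaded
+		// of numChoices randomly sampled streams; see
+		// sendOne and streamFor below.
+		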
+ go lp.run() + } + + return lp, state +} + +func (lp *bestOfNPrioritizer) downgrade(ctx context.Context) { + for _, ws := range lp.state { + go drain(ws.toWrite, ctx.Done()) + } +} + +func (lp *bestOfNPrioritizer) sendOne(item writeItem, rnd *rand.Rand, tmp []streamSorter) { + stream := lp.streamFor(item, rnd, tmp) + writeCh := stream.toWrite + select { + case writeCh <- item: + return + + case <-lp.done: + // All other cases: signal restart. + } + item.errCh <- ErrStreamRestarting +} + +func (lp *bestOfNPrioritizer) run() { + tmp := make([]streamSorter, len(lp.state)) + rnd := rand.New(rand.NewSource(rand.Int63())) + for { + select { + case <-lp.done: + return + case item := <-lp.input: + lp.sendOne(item, rnd, tmp) + } + } +} + +// sendAndWait implements streamWriter +func (lp *bestOfNPrioritizer) sendAndWait(ctx context.Context, errCh <-chan error, wri writeItem) error { + select { + case <-lp.done: + return ErrStreamRestarting + case <-ctx.Done(): + return context.Canceled + case lp.input <- wri: + return waitForWrite(ctx, errCh, lp.done) + } +} + +func (lp *bestOfNPrioritizer) nextWriter() streamWriter { + select { + case <-lp.done: + // In case of downgrade, return nil to return into a + // non-Arrow code path. + return nil + default: + // Fall through to sendAndWait(). + return lp + } +} + +func (lp *bestOfNPrioritizer) streamFor(_ writeItem, rnd *rand.Rand, tmp []streamSorter) *streamWorkState { + // Place all streams into the temporary slice. + for idx, item := range lp.state { + tmp[idx].work = item + } + // Select numChoices at random by shifting the selection into the start + // of the temporary slice. + for i := 0; i < lp.numChoices; i++ { + pick := rnd.Intn(lp.numChoices - i) + tmp[i], tmp[i+pick] = tmp[i+pick], tmp[i] + } + for i := 0; i < lp.numChoices; i++ { + // TODO: skip channels w/ a pending item (maybe) + tmp[i].load = lp.loadFunc(tmp[i].work) + } + sort.Slice(tmp[0:lp.numChoices], func(i, j int) bool { + return tmp[i].load < tmp[j].load + }) + return tmp[0].work +} diff --git a/exporter/otelarrowexporter/internal/arrow/common_test.go b/exporter/otelarrowexporter/internal/arrow/common_test.go new file mode 100644 index 000000000000..f0f6f3823c09 --- /dev/null +++ b/exporter/otelarrowexporter/internal/arrow/common_test.go @@ -0,0 +1,413 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +package arrow + +import ( + "context" + "fmt" + "io" + + arrowpb "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1" + arrowCollectorMock "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1/mock" + "github.com/open-telemetry/otel-arrow/collector/testdata" + "go.opentelemetry.io/collector/component" + "go.opentelemetry.io/collector/component/componenttest" + "go.uber.org/mock/gomock" + "go.uber.org/zap" + "go.uber.org/zap/zapcore" + "go.uber.org/zap/zaptest" + "go.uber.org/zap/zaptest/observer" + "google.golang.org/grpc" + "google.golang.org/grpc/codes" + "google.golang.org/grpc/credentials" + "google.golang.org/grpc/status" + + "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow/grpcmock" +) + +var ( + twoTraces = testdata.GenerateTraces(2) + twoMetrics = testdata.GenerateMetrics(2) + twoLogs = testdata.GenerateLogs(2) +) + +type testChannel interface { + onRecv(context.Context) func() (*arrowpb.BatchStatus, error) + onSend(context.Context) func(*arrowpb.BatchArrowRecords) error + onConnect(context.Context) error + onCloseSend() func() error +} + +type commonTestCase struct { + 
ctrl *gomock.Controller + telset component.TelemetrySettings + observedLogs *observer.ObservedLogs + traceClient StreamClientFunc + traceCall *gomock.Call + perRPCCredentials credentials.PerRPCCredentials + requestMetadataCall *gomock.Call +} + +type noisyTest bool + +const Noisy noisyTest = true +const NotNoisy noisyTest = false + +func newTestTelemetry(t zaptest.TestingT, noisy noisyTest) (component.TelemetrySettings, *observer.ObservedLogs) { + telset := componenttest.NewNopTelemetrySettings() + if noisy { + return telset, nil + } + core, obslogs := observer.New(zapcore.InfoLevel) + telset.Logger = zap.New(zapcore.NewTee(core, zaptest.NewLogger(t).Core())) + return telset, obslogs +} + +type z2m struct { + zaptest.TestingT +} + +var _ gomock.TestReporter = z2m{} + +func (t z2m) Fatalf(format string, args ...any) { + t.Errorf(format, args...) + t.Fail() +} + +func newCommonTestCase(t zaptest.TestingT, noisy noisyTest) *commonTestCase { + ctrl := gomock.NewController(z2m{t}) + telset, obslogs := newTestTelemetry(t, noisy) + + creds := grpcmock.NewMockPerRPCCredentials(ctrl) + creds.EXPECT().RequireTransportSecurity().Times(0) // unused interface method + requestMetadataCall := creds.EXPECT().GetRequestMetadata( + gomock.Any(), // context.Context + gomock.Any(), // ...string (unused `uri` parameter) + ).Times(0) + + traceClient := arrowCollectorMock.NewMockArrowTracesServiceClient(ctrl) + + traceCall := traceClient.EXPECT().ArrowTraces( + gomock.Any(), // context.Context + gomock.Any(), // ...grpc.CallOption + ).Times(0) + return &commonTestCase{ + ctrl: ctrl, + telset: telset, + observedLogs: obslogs, + traceClient: MakeAnyStreamClient("ArrowTraces", traceClient.ArrowTraces), + traceCall: traceCall, + perRPCCredentials: creds, + requestMetadataCall: requestMetadataCall, + } +} + +type commonTestStream struct { + anyStreamClient AnyStreamClient + ctxCall *gomock.Call + sendCall *gomock.Call + recvCall *gomock.Call + closeSendCall *gomock.Call +} + +func (ctc *commonTestCase) newMockStream(ctx context.Context) *commonTestStream { + client := arrowCollectorMock.NewMockArrowTracesService_ArrowTracesClient(ctc.ctrl) + + testStream := &commonTestStream{ + anyStreamClient: client, + ctxCall: client.EXPECT().Context().AnyTimes().Return(ctx), + sendCall: client.EXPECT().Send( + gomock.Any(), // *arrowpb.BatchArrowRecords + ).Times(0), + recvCall: client.EXPECT().Recv().Times(0), + closeSendCall: client.EXPECT().CloseSend().Times(0), + } + return testStream +} + +// returnNewStream applies the list of test channels in order to +// construct new streams. The final entry is re-used for new streams +// when it is reached. +func (ctc *commonTestCase) returnNewStream(hs ...testChannel) func(context.Context, ...grpc.CallOption) ( + arrowpb.ArrowTracesService_ArrowTracesClient, + error, +) { + var pos int + return func(ctx context.Context, _ ...grpc.CallOption) ( + arrowpb.ArrowTracesService_ArrowTracesClient, + error, + ) { + h := hs[pos] + if pos < len(hs) { + pos++ + } + if err := h.onConnect(ctx); err != nil { + return nil, err + } + str := ctc.newMockStream(ctx) + str.sendCall.AnyTimes().DoAndReturn(h.onSend(ctx)) + str.recvCall.AnyTimes().DoAndReturn(h.onRecv(ctx)) + str.closeSendCall.AnyTimes().DoAndReturn(h.onCloseSend()) + return str.anyStreamClient, nil + } +} + +// repeatedNewStream returns a stream configured with a new test +// channel on every ArrowStream() request. 
+func (ctc *commonTestCase) repeatedNewStream(nc func() testChannel) func(context.Context, ...grpc.CallOption) ( + arrowpb.ArrowTracesService_ArrowTracesClient, + error, +) { + return func(ctx context.Context, _ ...grpc.CallOption) ( + arrowpb.ArrowTracesService_ArrowTracesClient, + error, + ) { + h := nc() + if err := h.onConnect(ctx); err != nil { + return nil, err + } + str := ctc.newMockStream(ctx) + str.sendCall.AnyTimes().DoAndReturn(h.onSend(ctx)) + str.recvCall.AnyTimes().DoAndReturn(h.onRecv(ctx)) + str.closeSendCall.AnyTimes().DoAndReturn(h.onCloseSend()) + return str.anyStreamClient, nil + } +} + +// healthyTestChannel accepts the connection and returns an OK status immediately. +type healthyTestChannel struct { + sent chan *arrowpb.BatchArrowRecords + recv chan *arrowpb.BatchStatus +} + +func newHealthyTestChannel() *healthyTestChannel { + return &healthyTestChannel{ + sent: make(chan *arrowpb.BatchArrowRecords), + recv: make(chan *arrowpb.BatchStatus), + } +} + +func (tc *healthyTestChannel) sendChannel() chan *arrowpb.BatchArrowRecords { + return tc.sent +} + +func (tc *healthyTestChannel) onConnect(_ context.Context) error { + return nil +} + +func (tc *healthyTestChannel) onCloseSend() func() error { + return func() error { + close(tc.sent) + return nil + } +} + +func (tc *healthyTestChannel) onSend(ctx context.Context) func(*arrowpb.BatchArrowRecords) error { + return func(req *arrowpb.BatchArrowRecords) error { + select { + case tc.sendChannel() <- req: + return nil + case <-ctx.Done(): + return ctx.Err() + } + } +} + +func (tc *healthyTestChannel) onRecv(ctx context.Context) func() (*arrowpb.BatchStatus, error) { + return func() (*arrowpb.BatchStatus, error) { + select { + case recv, ok := <-tc.recv: + if !ok { + return nil, io.EOF + } + + return recv, nil + case <-ctx.Done(): + return &arrowpb.BatchStatus{}, ctx.Err() + } + } +} + +// unresponsiveTestChannel accepts the connection and receives data, +// but never responds with status OK. +type unresponsiveTestChannel struct { + ch chan struct{} +} + +func newUnresponsiveTestChannel() *unresponsiveTestChannel { + return &unresponsiveTestChannel{ + ch: make(chan struct{}), + } +} + +func (tc *unresponsiveTestChannel) onConnect(_ context.Context) error { + return nil +} + +func (tc *unresponsiveTestChannel) onCloseSend() func() error { + return func() error { + return nil + } +} + +func (tc *unresponsiveTestChannel) onSend(ctx context.Context) func(*arrowpb.BatchArrowRecords) error { + return func(_ *arrowpb.BatchArrowRecords) error { + select { + case <-ctx.Done(): + return ctx.Err() + default: + return nil + } + } +} + +func (tc *unresponsiveTestChannel) onRecv(ctx context.Context) func() (*arrowpb.BatchStatus, error) { + return func() (*arrowpb.BatchStatus, error) { + select { + case <-tc.ch: + return nil, io.EOF + case <-ctx.Done(): + return &arrowpb.BatchStatus{}, ctx.Err() + } + } +} + +func (tc *unresponsiveTestChannel) unblock() { + close(tc.ch) +} + +// unsupportedTestChannel mimics gRPC's behavior when there is no +// arrow stream service registered with the server. +type arrowUnsupportedTestChannel struct { +} + +func newArrowUnsupportedTestChannel() *arrowUnsupportedTestChannel { + return &arrowUnsupportedTestChannel{} +} + +func (tc *arrowUnsupportedTestChannel) onConnect(_ context.Context) error { + // Note: this matches gRPC's apparent behavior. the stream + // connection succeeds and the unsupported code is returned to + // the Recv() call. 
+ return nil +} + +func (tc *arrowUnsupportedTestChannel) onCloseSend() func() error { + return func() error { + return nil + } +} + +func (tc *arrowUnsupportedTestChannel) onSend(ctx context.Context) func(*arrowpb.BatchArrowRecords) error { + return func(_ *arrowpb.BatchArrowRecords) error { + <-ctx.Done() + return ctx.Err() + } +} + +func (tc *arrowUnsupportedTestChannel) onRecv(_ context.Context) func() (*arrowpb.BatchStatus, error) { + return func() (*arrowpb.BatchStatus, error) { + err := status.Error(codes.Unimplemented, "arrow will not be served") + return &arrowpb.BatchStatus{}, err + } +} + +// disconnectedTestChannel allows the connection to time out. +type disconnectedTestChannel struct { +} + +func newDisconnectedTestChannel() *disconnectedTestChannel { + return &disconnectedTestChannel{} +} + +func (tc *disconnectedTestChannel) onConnect(ctx context.Context) error { + <-ctx.Done() + return ctx.Err() +} + +func (tc *disconnectedTestChannel) onCloseSend() func() error { + return func() error { + panic("unreachable") + } +} + +func (tc *disconnectedTestChannel) onSend(_ context.Context) func(*arrowpb.BatchArrowRecords) error { + return func(_ *arrowpb.BatchArrowRecords) error { + panic("unreachable") + } +} + +func (tc *disconnectedTestChannel) onRecv(_ context.Context) func() (*arrowpb.BatchStatus, error) { + return func() (*arrowpb.BatchStatus, error) { + panic("unreachable") + } +} + +// sendErrorTestChannel returns an error in Send() +type sendErrorTestChannel struct { + release chan struct{} +} + +func newSendErrorTestChannel() *sendErrorTestChannel { + return &sendErrorTestChannel{ + release: make(chan struct{}), + } +} + +func (tc *sendErrorTestChannel) onConnect(_ context.Context) error { + return nil +} + +func (tc *sendErrorTestChannel) onCloseSend() func() error { + return func() error { + return nil + } +} + +func (tc *sendErrorTestChannel) onSend(_ context.Context) func(*arrowpb.BatchArrowRecords) error { + return func(*arrowpb.BatchArrowRecords) error { + return io.EOF + } +} + +func (tc *sendErrorTestChannel) unblock() { + close(tc.release) +} + +func (tc *sendErrorTestChannel) onRecv(_ context.Context) func() (*arrowpb.BatchStatus, error) { + return func() (*arrowpb.BatchStatus, error) { + <-tc.release + return &arrowpb.BatchStatus{}, io.EOF + } +} + +// connectErrorTestChannel returns an error from the ArrowTraces() call +type connectErrorTestChannel struct { +} + +func newConnectErrorTestChannel() *connectErrorTestChannel { + return &connectErrorTestChannel{} +} + +func (tc *connectErrorTestChannel) onConnect(_ context.Context) error { + return fmt.Errorf("test connect error") +} + +func (tc *connectErrorTestChannel) onCloseSend() func() error { + return func() error { + panic("unreachable") + } +} + +func (tc *connectErrorTestChannel) onSend(_ context.Context) func(*arrowpb.BatchArrowRecords) error { + return func(*arrowpb.BatchArrowRecords) error { + panic("not reached") + } +} + +func (tc *connectErrorTestChannel) onRecv(_ context.Context) func() (*arrowpb.BatchStatus, error) { + return func() (*arrowpb.BatchStatus, error) { + panic("not reached") + } +} diff --git a/exporter/otelarrowexporter/internal/arrow/exporter.go b/exporter/otelarrowexporter/internal/arrow/exporter.go index e4b2d766511b..18b3259d3b4a 100644 --- a/exporter/otelarrowexporter/internal/arrow/exporter.go +++ b/exporter/otelarrowexporter/internal/arrow/exporter.go @@ -5,40 +5,359 @@ package arrow // import "github.com/open-telemetry/opentelemetry-collector-contr import ( "context" + "errors" 
+	"math/rand"
+	"strconv"
+	"sync"
+	"time"

	arrowpb "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1"
+	"github.com/open-telemetry/otel-arrow/collector/netstats"
+	arrowRecord "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record"
+	"go.opentelemetry.io/collector/component"
+	"go.opentelemetry.io/collector/pdata/plog"
+	"go.opentelemetry.io/collector/pdata/pmetric"
+	"go.opentelemetry.io/collector/pdata/ptrace"
+	"go.uber.org/zap"
	"google.golang.org/grpc"
+	"google.golang.org/grpc/credentials"
)

-// Exporter exports OpenTelemetry Protocol with Apache Arrow protocol
-// data for a specific signal. One of these structs is created per
-// baseExporter, in the top-level module, when Arrow is enabled.
+// Exporter is 1:1 with exporter, isolates arrow-specific
+// functionality.
type Exporter struct {
-	// TODO: Implementation
+	// numStreams is the number of streams that will be used.
+	numStreams int
+
+	// prioritizerName is the name of a balancer policy.
+	prioritizerName PrioritizerName
+
+	// maxStreamLifetime is a limit on duration for streams. A
+	// slight "jitter" is applied relative to this value on a
+	// per-stream basis.
+	maxStreamLifetime time.Duration
+
+	// disableDowngrade prevents downgrade from occurring, supports
+	// forcing Arrow transport.
+	disableDowngrade bool
+
+	// telemetry includes logger, tracer, meter.
+	telemetry component.TelemetrySettings
+
+	// grpcOptions includes options used by the unary RPC methods,
+	// e.g., WaitForReady.
+	grpcOptions []grpc.CallOption
+
+	// newProducer returns a real (or mock) Producer.
+	newProducer func() arrowRecord.ProducerAPI
+
+	// streamClient constructs a stream corresponding with the signal's
+	// payload type. It uses the exporter's gRPC ClientConn (or is a
+	// mock, in tests).
+	streamClient StreamClientFunc
+
+	// perRPCCredentials is derived from the exporter's gRPC auth settings.
+	perRPCCredentials credentials.PerRPCCredentials
+
+	// returning is used to pass broken, gracefully terminated, and
+	// otherwise finished streams back to the stream controller.
+	returning chan *Stream
+
+	// ready prioritizes streams that are ready to send
+	ready streamPrioritizer
+
+	// doneCancel refers to and cancels the background context of
+	// this exporter.
+	doneCancel
+
+	// wg counts one per active goroutine belonging to all streams
+	// of this exporter. The wait group has Add(1) called before
+	// starting goroutines so that they can be properly waited for
+	// in shutdown(), so the pattern is:
+	//
+	//	wg.Add(1)
+	//	go func() {
+	//		defer wg.Done()
+	//		...
+	//	}()
+	wg sync.WaitGroup
+
+	// netReporter measures network traffic.
+	netReporter netstats.Interface
}

-// AnyStreamClient is the interface supported by all Arrow streams,
-// i.e., any of the Arrow-supported signals having a single method w/
-// the appropriate per-signal name.
+// doneCancel is used to store the done signal and cancelation
+// function for a context returned by context.WithCancel.
+type doneCancel struct {
+	done   <-chan struct{}
+	cancel context.CancelFunc
+}
+
+// AnyStreamClient is the interface supported by all Arrow streams.
type AnyStreamClient interface {
	Send(*arrowpb.BatchArrowRecords) error
	Recv() (*arrowpb.BatchStatus, error)
	grpc.ClientStream
}

// StreamClientFunc is a constructor for AnyStreamClients. These return
// the method name to assist with instrumentation, since the gRPC stats
// handler isn't able to see the correct uncompressed size.
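+//
+// A construction sketch (assuming the generated gRPC traces client; the
+// other signals follow the same shape, and the tests below use the same
+// helper with a mock client):
+//
+//	client := arrowpb.NewArrowTracesServiceClient(conn)
+//	scf := MakeAnyStreamClient("ArrowTraces", client.ArrowTraces)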
type StreamClientFunc func(context.Context, ...grpc.CallOption) (AnyStreamClient, string, error)

-// MakeAnyStreamClient accepts any Arrow-like stream, which is one of
-// the Arrow-supported signals having a single method w/ the
-// appropriate name, and turns it into an AnyStreamClient. The method
-// name is carried through because once constructed, gRPC clients will
-// not reveal their service and method names.
+// MakeAnyStreamClient accepts any Arrow-like stream and turns it into
+// an AnyStreamClient. The method name is carried through because
+// once constructed, gRPC clients will not reveal their service and
+// method names.
func MakeAnyStreamClient[T AnyStreamClient](method string, clientFunc func(ctx context.Context, opts ...grpc.CallOption) (T, error)) StreamClientFunc {
	return func(ctx context.Context, opts ...grpc.CallOption) (AnyStreamClient, string, error) {
		client, err := clientFunc(ctx, opts...)
		return client, method, err
	}
}
+
+// NewExporter configures a new Exporter.
+func NewExporter(
+	maxStreamLifetime time.Duration,
+	numStreams int,
+	prioritizerName PrioritizerName,
+	disableDowngrade bool,
+	telemetry component.TelemetrySettings,
+	grpcOptions []grpc.CallOption,
+	newProducer func() arrowRecord.ProducerAPI,
+	streamClient StreamClientFunc,
+	perRPCCredentials credentials.PerRPCCredentials,
+	netReporter netstats.Interface,
+) *Exporter {
+	return &Exporter{
+		maxStreamLifetime: maxStreamLifetime,
+		numStreams: numStreams,
+		prioritizerName: prioritizerName,
+		disableDowngrade: disableDowngrade,
+		telemetry: telemetry,
+		grpcOptions: grpcOptions,
+		newProducer: newProducer,
+		streamClient: streamClient,
+		perRPCCredentials: perRPCCredentials,
+		returning: make(chan *Stream, numStreams),
+		netReporter: netReporter,
+	}
+}
+
+// Start creates the background context used by all streams and starts
+// a stream controller, which initializes the initial set of streams.
+func (e *Exporter) Start(ctx context.Context) error {
+	// this is the background context
+	ctx, e.doneCancel = newDoneCancel(ctx)
+
+	// Starting N+1 goroutines
+	e.wg.Add(1)
+
+	// this is the downgradeable context
+	downCtx, downDc := newDoneCancel(ctx)
+
+	var sws []*streamWorkState
+	e.ready, sws = newStreamPrioritizer(downDc, e.prioritizerName, e.numStreams)
+
+	for _, ws := range sws {
+		e.startArrowStream(downCtx, ws)
+	}
+
+	go e.runStreamController(ctx, downCtx, downDc)
+
+	return nil
+}
+
+func (e *Exporter) startArrowStream(ctx context.Context, ws *streamWorkState) {
+	// this is the new stream context
+	ctx, dc := newDoneCancel(ctx)
+
+	e.wg.Add(1)
+
+	go e.runArrowStream(ctx, dc, ws)
+}
+
+// runStreamController starts the initial set of streams, then waits for streams to
+// terminate one at a time and restarts them. If a stream comes back with a nil
+// client (meaning that OTel-Arrow was not supported by the endpoint), it will
+// not be restarted.
+func (e *Exporter) runStreamController(exportCtx, downCtx context.Context, downDc doneCancel) {
+	defer e.cancel()
+	defer e.wg.Done()
+
+	running := e.numStreams
+
+	for {
+		select {
+		case stream := <-e.returning:
+			if stream.client != nil || e.disableDowngrade {
+				// The stream closed or is broken. Restart it.
+				e.startArrowStream(downCtx, stream.workState)
+				continue
+			}
+			// Otherwise, the stream never got started. It was
+			// downgraded and senders will use the standard OTLP path.
+			running--
+
+			// None of the streams were able to connect to
+			// an Arrow endpoint.
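+			// (Mechanically: the downgradeable context is
+			// canceled and the prioritizer is drained, so
+			// nextWriter() returns nil to senders, which then
+			// return (false, nil) from SendAndWait().)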
+			if running == 0 {
+				e.telemetry.Logger.Info("could not establish arrow streams, downgrading to standard OTLP export")
+				downDc.cancel()
+				// this call is allowed to block indefinitely,
+				// so as to call drain().
+				e.ready.downgrade(exportCtx)
+				return
+			}
+
+		case <-exportCtx.Done():
+			// We are shutting down.
+			return
+		}
+	}
+}
+
+// addJitter is used to subtract 0-5% from max_stream_lifetime. Since
+// the max_stream_lifetime value is expected to be close to the
+// receiver's max_connection_age_grace setting, we do not add jitter,
+// only subtract.
+func addJitter(v time.Duration) time.Duration {
+	if v == 0 {
+		return 0
+	}
+	return v - time.Duration(rand.Int63n(int64(v/20)))
+}
+
+// runArrowStream begins one gRPC stream using a child of the background context.
+// If the stream connection is successful, this goroutine starts another goroutine
+// to call write() and performs read() itself. When the stream shuts
+// down this call synchronously waits for and unblocks the consumers.
+func (e *Exporter) runArrowStream(ctx context.Context, dc doneCancel, state *streamWorkState) {
+	defer dc.cancel()
+	producer := e.newProducer()
+
+	stream := newStream(producer, e.ready, e.telemetry, e.netReporter, state)
+	stream.maxStreamLifetime = addJitter(e.maxStreamLifetime)
+
+	defer func() {
+		if err := producer.Close(); err != nil {
+			e.telemetry.Logger.Error("arrow producer close:", zap.Error(err))
+		}
+		e.wg.Done()
+		e.returning <- stream
+	}()
+
+	stream.run(ctx, dc, e.streamClient, e.grpcOptions)
+}
+
+// SendAndWait tries to send using an Arrow stream. The results are:
+//
+//	(true, nil):      Arrow send: success at consumer
+//	(false, nil):     Arrow is not supported by the server, caller expected to fall back.
+//	(true, non-nil):  Arrow send: server response may be permanent or allow retry.
+//	(false, non-nil): Context timeout prevents retry.
+//
+// In the (false, nil) case, the caller should fall back to standard OTLP.
+func (e *Exporter) SendAndWait(ctx context.Context, data any) (bool, error) {
+	errCh := make(chan error, 1)
+
+	// Note that if the OTLP exporter's gRPC Headers field was
+	// set, those (static) headers were used to establish the
+	// stream. The caller's context, returned by
+	// baseExporter.enhanceContext(), includes the static headers
+	// plus optional client metadata. Here, get whatever
+	// headers that gRPC would have transmitted for a unary RPC
+	// and convey them via the Arrow batch.
+
+	// Note that the "uri" parameter to GetRequestMetadata is
+	// not used by the headersetter extension and is not well
+	// documented. Since it's an optional list, we omit it.
+	var md map[string]string
+	if e.perRPCCredentials != nil {
+		var err error
+		md, err = e.perRPCCredentials.GetRequestMetadata(ctx)
+		if err != nil {
+			return false, err
+		}
+	}
+
+	// Note that the uncompressed size as measured by the receiver
+	// will be different than uncompressed size as measured by the
+	// exporter, because of the optimization phase performed in the
+	// conversion to Arrow.
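+	// (For illustration: the receiving side can recover the size
+	// conveyed below with strconv.Atoi(md["otlp-pdata-size"]).)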
+ var uncompSize int + switch data := data.(type) { + case ptrace.Traces: + var sizer ptrace.ProtoMarshaler + uncompSize = sizer.TracesSize(data) + case plog.Logs: + var sizer plog.ProtoMarshaler + uncompSize = sizer.LogsSize(data) + case pmetric.Metrics: + var sizer pmetric.ProtoMarshaler + uncompSize = sizer.MetricsSize(data) + } + + if md == nil { + md = make(map[string]string) + } + md["otlp-pdata-size"] = strconv.Itoa(uncompSize) + + wri := writeItem{ + records: data, + md: md, + uncompSize: uncompSize, + errCh: errCh, + producerCtx: ctx, + } + + for { + writer := e.ready.nextWriter() + + if writer == nil { + return false, nil // a downgraded connection + } + + err := writer.sendAndWait(ctx, errCh, wri) + if err != nil && errors.Is(err, ErrStreamRestarting) { + continue // an internal retry + + } + // result from arrow server (may be nil, may be + // permanent, etc.) + return true, err + } +} + +// Shutdown returns when all Arrow-associated goroutines have returned. +func (e *Exporter) Shutdown(_ context.Context) error { + e.cancel() + e.wg.Wait() + return nil +} + +// waitForWrite waits for the first of the following: +// 1. This context timeout +// 2. Completion with err == nil or err != nil +// 3. Downgrade +func waitForWrite(ctx context.Context, errCh <-chan error, down <-chan struct{}) error { + select { + case <-ctx.Done(): + // This caller's context timed out. + return ctx.Err() + case <-down: + return ErrStreamRestarting + case err := <-errCh: + // Note: includes err == nil and err != nil cases. + return err + } +} + +// newDoneCancel returns a doneCancel, which is a new context with +// type that carries its done and cancel function. +func newDoneCancel(ctx context.Context) (context.Context, doneCancel) { + ctx, cancel := context.WithCancel(ctx) + return ctx, doneCancel{ + done: ctx.Done(), + cancel: cancel, + } +} diff --git a/exporter/otelarrowexporter/internal/arrow/exporter_test.go b/exporter/otelarrowexporter/internal/arrow/exporter_test.go new file mode 100644 index 000000000000..276e5f3fa437 --- /dev/null +++ b/exporter/otelarrowexporter/internal/arrow/exporter_test.go @@ -0,0 +1,890 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +package arrow + +import ( + "context" + "encoding/json" + "errors" + "fmt" + "sync" + "sync/atomic" + "testing" + "time" + + arrowpb "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1" + "github.com/open-telemetry/otel-arrow/collector/netstats" + "github.com/open-telemetry/otel-arrow/collector/testdata" + arrowRecord "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record" + arrowRecordMock "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record/mock" + otelAssert "github.com/open-telemetry/otel-arrow/pkg/otel/assert" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.opentelemetry.io/collector/pdata/plog" + "go.opentelemetry.io/collector/pdata/pmetric" + "go.opentelemetry.io/collector/pdata/ptrace" + "go.opentelemetry.io/otel" + "go.opentelemetry.io/otel/propagation" + "go.opentelemetry.io/otel/trace" + "go.uber.org/mock/gomock" + "go.uber.org/zap/zaptest" + "golang.org/x/net/http2/hpack" + "google.golang.org/grpc" + "google.golang.org/grpc/metadata" +) + +var AllPrioritizers = []PrioritizerName{LeastLoadedPrioritizer, LeastLoadedTwoPrioritizer} + +const defaultMaxStreamLifetime = 11 * time.Second + +type compareJSONTraces struct{ ptrace.Traces } +type compareJSONMetrics struct{ pmetric.Metrics } +type compareJSONLogs struct{ plog.Logs } + +func (c 
compareJSONTraces) MarshalJSON() ([]byte, error) { + var m ptrace.JSONMarshaler + return m.MarshalTraces(c.Traces) +} + +func (c compareJSONMetrics) MarshalJSON() ([]byte, error) { + var m pmetric.JSONMarshaler + return m.MarshalMetrics(c.Metrics) +} + +func (c compareJSONLogs) MarshalJSON() ([]byte, error) { + var m plog.JSONMarshaler + return m.MarshalLogs(c.Logs) +} + +type exporterTestCase struct { + *commonTestCase + exporter *Exporter +} + +func newSingleStreamTestCase(t *testing.T, pname PrioritizerName) *exporterTestCase { + return newExporterTestCaseCommon(t, pname, NotNoisy, defaultMaxStreamLifetime, 1, false, nil) +} + +func newShortLifetimeStreamTestCase(t *testing.T, pname PrioritizerName, numStreams int) *exporterTestCase { + return newExporterTestCaseCommon(t, pname, NotNoisy, time.Second/2, numStreams, false, nil) +} + +func newSingleStreamDowngradeDisabledTestCase(t *testing.T, pname PrioritizerName) *exporterTestCase { + return newExporterTestCaseCommon(t, pname, NotNoisy, defaultMaxStreamLifetime, 1, true, nil) +} + +func newSingleStreamMetadataTestCase(t *testing.T) *exporterTestCase { + var count int + return newExporterTestCaseCommon(t, DefaultPrioritizer, NotNoisy, defaultMaxStreamLifetime, 1, false, func(_ context.Context) (map[string]string, error) { + defer func() { count++ }() + if count%2 == 0 { + return nil, nil + } + return map[string]string{ + "expected1": "metadata1", + "expected2": fmt.Sprint(count), + }, nil + }) +} + +func newExporterNoisyTestCase(t *testing.T, numStreams int) *exporterTestCase { + return newExporterTestCaseCommon(t, DefaultPrioritizer, Noisy, defaultMaxStreamLifetime, numStreams, false, nil) +} + +func copyBatch[T any](recordFunc func(T) (*arrowpb.BatchArrowRecords, error)) func(T) (*arrowpb.BatchArrowRecords, error) { + // Because Arrow-IPC uses zero copy, we have to copy inside the test + // instead of sharing pointers to BatchArrowRecords. + return func(data T) (*arrowpb.BatchArrowRecords, error) { + in, err := recordFunc(data) + if err != nil { + return nil, err + } + + hcpy := make([]byte, len(in.Headers)) + copy(hcpy, in.Headers) + + pays := make([]*arrowpb.ArrowPayload, len(in.ArrowPayloads)) + + for i, inp := range in.ArrowPayloads { + rcpy := make([]byte, len(inp.Record)) + copy(rcpy, inp.Record) + pays[i] = &arrowpb.ArrowPayload{ + SchemaId: inp.SchemaId, + Type: inp.Type, + Record: rcpy, + } + } + + return &arrowpb.BatchArrowRecords{ + BatchId: in.BatchId, + Headers: hcpy, + ArrowPayloads: pays, + }, nil + } +} + +func mockArrowProducer(ctc *commonTestCase) func() arrowRecord.ProducerAPI { + return func() arrowRecord.ProducerAPI { + // Mock the close function, use a real producer for testing dataflow. 
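+		// (The copyBatch wrappers below are what make sharing the
+		// real producer safe here: Arrow-IPC is zero-copy, so the
+		// test stores copies rather than pointers into reused buffers.)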
+ mock := arrowRecordMock.NewMockProducerAPI(ctc.ctrl) + prod := arrowRecord.NewProducer() + + mock.EXPECT().BatchArrowRecordsFromTraces(gomock.Any()).AnyTimes().DoAndReturn( + copyBatch(prod.BatchArrowRecordsFromTraces)) + mock.EXPECT().BatchArrowRecordsFromLogs(gomock.Any()).AnyTimes().DoAndReturn( + copyBatch(prod.BatchArrowRecordsFromLogs)) + mock.EXPECT().BatchArrowRecordsFromMetrics(gomock.Any()).AnyTimes().DoAndReturn( + copyBatch(prod.BatchArrowRecordsFromMetrics)) + mock.EXPECT().Close().Times(1).Return(nil) + return mock + } +} + +func newExporterTestCaseCommon(t zaptest.TestingT, pname PrioritizerName, noisy noisyTest, maxLifetime time.Duration, numStreams int, disableDowngrade bool, metadataFunc func(ctx context.Context) (map[string]string, error)) *exporterTestCase { + ctc := newCommonTestCase(t, noisy) + + if metadataFunc == nil { + ctc.requestMetadataCall.AnyTimes().Return(nil, nil) + } else { + ctc.requestMetadataCall.AnyTimes().DoAndReturn(func(ctx context.Context, _ ...string) (map[string]string, error) { + return metadataFunc(ctx) + }) + } + + exp := NewExporter(maxLifetime, numStreams, pname, disableDowngrade, ctc.telset, nil, mockArrowProducer(ctc), ctc.traceClient, ctc.perRPCCredentials, netstats.Noop{}) + + return &exporterTestCase{ + commonTestCase: ctc, + exporter: exp, + } +} + +func statusOKFor(id int64) *arrowpb.BatchStatus { + return &arrowpb.BatchStatus{ + BatchId: id, + StatusCode: arrowpb.StatusCode_OK, + } +} + +func statusUnavailableFor(id int64) *arrowpb.BatchStatus { + return &arrowpb.BatchStatus{ + BatchId: id, + StatusCode: arrowpb.StatusCode_UNAVAILABLE, + StatusMessage: "test unavailable", + } +} + +func statusInvalidFor(id int64) *arrowpb.BatchStatus { + return &arrowpb.BatchStatus{ + BatchId: id, + StatusCode: arrowpb.StatusCode_INVALID_ARGUMENT, + StatusMessage: "test invalid", + } +} + +func statusUnrecognizedFor(id int64) *arrowpb.BatchStatus { + return &arrowpb.BatchStatus{ + BatchId: id, + StatusCode: 1 << 20, + StatusMessage: "test unrecognized", + } +} + +// TestArrowExporterSuccess tests a single Send through a healthy channel. 
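+// The flow, in brief: a test goroutine receives the encoded batch from
+// the healthy channel, replies with statusOKFor(BatchId), and
+// SendAndWait() returns (true, nil); the batch is then decoded and
+// compared against the input.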
+func TestArrowExporterSuccess(t *testing.T) {
+	stdTesting := otelAssert.NewStdUnitTest(t)
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			for _, inputData := range []any{twoTraces, twoMetrics, twoLogs} {
+				t.Run(fmt.Sprintf("%T", inputData), func(t *testing.T) {
+					tc := newSingleStreamTestCase(t, pname)
+					channel := newHealthyTestChannel()
+
+					tc.traceCall.Times(1).DoAndReturn(tc.returnNewStream(channel))
+
+					ctx := context.Background()
+					require.NoError(t, tc.exporter.Start(ctx))
+
+					var wg sync.WaitGroup
+					var outputData *arrowpb.BatchArrowRecords
+					wg.Add(1)
+					go func() {
+						defer wg.Done()
+						outputData = <-channel.sendChannel()
+						channel.recv <- statusOKFor(outputData.BatchId)
+					}()
+
+					sent, err := tc.exporter.SendAndWait(ctx, inputData)
+					require.NoError(t, err)
+					require.True(t, sent)
+
+					wg.Wait()
+
+					testCon := arrowRecord.NewConsumer()
+					switch testData := inputData.(type) {
+					case ptrace.Traces:
+						traces, err := testCon.TracesFrom(outputData)
+						require.NoError(t, err)
+						require.Equal(t, 1, len(traces))
+						otelAssert.Equiv(stdTesting, []json.Marshaler{
+							compareJSONTraces{testData},
+						}, []json.Marshaler{
+							compareJSONTraces{traces[0]},
+						})
+					case plog.Logs:
+						logs, err := testCon.LogsFrom(outputData)
+						require.NoError(t, err)
+						require.Equal(t, 1, len(logs))
+						otelAssert.Equiv(stdTesting, []json.Marshaler{
+							compareJSONLogs{testData},
+						}, []json.Marshaler{
+							compareJSONLogs{logs[0]},
+						})
+					case pmetric.Metrics:
+						metrics, err := testCon.MetricsFrom(outputData)
+						require.NoError(t, err)
+						require.Equal(t, 1, len(metrics))
+						otelAssert.Equiv(stdTesting, []json.Marshaler{
+							compareJSONMetrics{testData},
+						}, []json.Marshaler{
+							compareJSONMetrics{metrics[0]},
+						})
+					}
+
+					require.NoError(t, tc.exporter.Shutdown(ctx))
+				})
+			}
+		})
+	}
+}
+
+// TestArrowExporterTimeout tests that a single slow Send leads to context canceled.
+func TestArrowExporterTimeout(t *testing.T) {
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			tc := newSingleStreamTestCase(t, pname)
+			channel := newUnresponsiveTestChannel()
+
+			tc.traceCall.Times(1).DoAndReturn(tc.returnNewStream(channel))
+
+			ctx, cancel := context.WithCancel(context.Background())
+			require.NoError(t, tc.exporter.Start(ctx))
+
+			go func() {
+				time.Sleep(200 * time.Millisecond)
+				cancel()
+			}()
+			sent, err := tc.exporter.SendAndWait(ctx, twoTraces)
+			require.True(t, sent)
+			require.Error(t, err)
+			require.True(t, errors.Is(err, context.Canceled))
+
+			require.NoError(t, tc.exporter.Shutdown(ctx))
+		})
+	}
+}
+
+// TestArrowExporterStreamConnectError tests that if the connection fails
+// fast, the stream client is never set. This causes downgrade.
+func TestArrowExporterStreamConnectError(t *testing.T) {
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			tc := newSingleStreamTestCase(t, pname)
+			channel := newConnectErrorTestChannel()
+
+			tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel))
+
+			bg := context.Background()
+			require.NoError(t, tc.exporter.Start(bg))
+
+			sent, err := tc.exporter.SendAndWait(bg, twoTraces)
+			require.False(t, sent)
+			require.NoError(t, err)
+
+			require.NoError(t, tc.exporter.Shutdown(bg))
+
+			require.Less(t, 0, len(tc.observedLogs.All()), "should have at least one log: %v", tc.observedLogs.All())
+			require.Equal(t, tc.observedLogs.All()[0].Message, "cannot start arrow stream")
+		})
+	}
+}
+
+// TestArrowExporterDowngrade tests that if Recv() returns an
+// Unimplemented code (as gRPC does), the connection is downgraded
+// without error.
+func TestArrowExporterDowngrade(t *testing.T) {
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			tc := newSingleStreamTestCase(t, pname)
+			channel := newArrowUnsupportedTestChannel()
+
+			tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel))
+
+			bg := context.Background()
+			require.NoError(t, tc.exporter.Start(bg))
+
+			sent, err := tc.exporter.SendAndWait(bg, twoTraces)
+			require.False(t, sent)
+			require.NoError(t, err)
+
+			require.NoError(t, tc.exporter.Shutdown(bg))
+
+			require.Less(t, 1, len(tc.observedLogs.All()), "should have at least two logs: %v", tc.observedLogs.All())
+			require.Equal(t, tc.observedLogs.All()[0].Message, "arrow is not supported")
+			require.Contains(t, tc.observedLogs.All()[1].Message, "downgrading")
+		})
+	}
+}
+
+// TestArrowExporterDisableDowngrade tests that if Recv() returns
+// any error, downgrade still does not occur and the connection is
+// retried without error.
+func TestArrowExporterDisableDowngrade(t *testing.T) {
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			tc := newSingleStreamDowngradeDisabledTestCase(t, pname)
+			badChannel := newArrowUnsupportedTestChannel()
+			goodChannel := newHealthyTestChannel()
+
+			fails := 0
+			tc.traceCall.AnyTimes().DoAndReturn(func(ctx context.Context, opts ...grpc.CallOption) (
+				arrowpb.ArrowTracesService_ArrowTracesClient,
+				error,
+			) {
+				defer func() { fails++ }()
+
+				if fails < 3 {
+					return tc.returnNewStream(badChannel)(ctx, opts...)
+				}
+				return tc.returnNewStream(goodChannel)(ctx, opts...)
+			})
+
+			var wg sync.WaitGroup
+			wg.Add(1)
+			go func() {
+				defer wg.Done()
+				outputData := <-goodChannel.sendChannel()
+				goodChannel.recv <- statusOKFor(outputData.BatchId)
+			}()
+
+			bg := context.Background()
+			require.NoError(t, tc.exporter.Start(bg))
+
+			sent, err := tc.exporter.SendAndWait(bg, twoTraces)
+			require.True(t, sent)
+			require.NoError(t, err)
+
+			wg.Wait()
+
+			require.NoError(t, tc.exporter.Shutdown(bg))
+
+			require.Less(t, 1, len(tc.observedLogs.All()), "should have at least two logs: %v", tc.observedLogs.All())
+			require.Equal(t, tc.observedLogs.All()[0].Message, "arrow is not supported")
+			require.NotContains(t, tc.observedLogs.All()[1].Message, "downgrading")
+		})
+	}
+}
+
+// TestArrowExporterConnectTimeout tests that an error is returned to
+// the caller if the response does not arrive in time.
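+// The "disconnected" test channel never completes the connect call, so
+// the caller's cancel() is the only exit; SendAndWait() is expected to
+// surface context.Canceled.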
+func TestArrowExporterConnectTimeout(t *testing.T) {
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			tc := newSingleStreamTestCase(t, pname)
+			channel := newDisconnectedTestChannel()
+
+			tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel))
+
+			bg := context.Background()
+			ctx, cancel := context.WithCancel(bg)
+			require.NoError(t, tc.exporter.Start(bg))
+
+			go func() {
+				time.Sleep(200 * time.Millisecond)
+				cancel()
+			}()
+			_, err := tc.exporter.SendAndWait(ctx, twoTraces)
+			require.Error(t, err)
+			require.True(t, errors.Is(err, context.Canceled))
+
+			require.NoError(t, tc.exporter.Shutdown(bg))
+		})
+	}
+}
+
+// TestArrowExporterStreamFailure tests a single stream failure
+// followed by a healthy stream.
+func TestArrowExporterStreamFailure(t *testing.T) {
+	for _, pname := range AllPrioritizers {
+		t.Run(string(pname), func(t *testing.T) {
+			tc := newSingleStreamTestCase(t, pname)
+			channel0 := newUnresponsiveTestChannel()
+			channel1 := newHealthyTestChannel()
+
+			tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel0, channel1))
+
+			bg := context.Background()
+			require.NoError(t, tc.exporter.Start(bg))
+
+			go func() {
+				time.Sleep(200 * time.Millisecond)
+				channel0.unblock()
+			}()
+
+			var wg sync.WaitGroup
+			var outputData *arrowpb.BatchArrowRecords
+			wg.Add(1)
+			go func() {
+				defer wg.Done()
+				outputData = <-channel1.sendChannel()
+				channel1.recv <- statusOKFor(outputData.BatchId)
+			}()
+
+			sent, err := tc.exporter.SendAndWait(bg, twoTraces)
+			require.NoError(t, err)
+			require.True(t, sent)
+
+			wg.Wait()
+
+			require.NoError(t, tc.exporter.Shutdown(bg))
+		})
+	}
+}
+
+// TestArrowExporterStreamRace reproduces the situation needed for a
+// race between stream send and stream cancel, causing it to fully
+// exercise the removeReady() code path.
+func TestArrowExporterStreamRace(t *testing.T) {
+	// This creates the conditions likely to produce a
+	// stream race in prioritizer.go.
+	tc := newExporterNoisyTestCase(t, 20)
+
+	var tries atomic.Int32
+
+	tc.traceCall.AnyTimes().DoAndReturn(tc.repeatedNewStream(func() testChannel {
+		noResponse := newUnresponsiveTestChannel()
+		// Immediately unblock to return the EOF to the stream
+		// receiver and shut down the stream.
+		go noResponse.unblock()
+		tries.Add(1)
+		return noResponse
+	}))
+
+	var wg sync.WaitGroup
+
+	bg := context.Background()
+	require.NoError(t, tc.exporter.Start(bg))
+
+	callctx, cancel := context.WithCancel(bg)
+
+	// These goroutines will repeatedly try for an available
+	// stream, but none will become available. Eventually the
+	// context will be canceled and cause these goroutines to
+	// return.
+	for i := 0; i < 5; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			// This blocks until the cancelation.
+			_, err := tc.exporter.SendAndWait(callctx, twoTraces)
+			require.Error(t, err)
+			require.True(t, errors.Is(err, context.Canceled))
+		}()
+	}
+
+	// Wait until 1000 streams have started.
+	assert.Eventually(t, func() bool {
+		return tries.Load() >= 1000
+	}, 10*time.Second, 5*time.Millisecond)
+
+	cancel()
+	wg.Wait()
+	require.NoError(t, tc.exporter.Shutdown(bg))
+}
+
+// TestArrowExporterStreaming tests 10 sends in a row.
+func TestArrowExporterStreaming(t *testing.T) { + for _, pname := range AllPrioritizers { + t.Run(string(pname), func(t *testing.T) { + tc := newSingleStreamTestCase(t, pname) + channel := newHealthyTestChannel() + + tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel)) + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + require.NoError(t, tc.exporter.Start(ctx)) + + var expectOutput []ptrace.Traces + var actualOutput []ptrace.Traces + testCon := arrowRecord.NewConsumer() + + var wg sync.WaitGroup + wg.Add(1) + go func() { + defer wg.Done() + for data := range channel.sendChannel() { + traces, err := testCon.TracesFrom(data) + require.NoError(t, err) + require.Equal(t, 1, len(traces)) + actualOutput = append(actualOutput, traces[0]) + channel.recv <- statusOKFor(data.BatchId) + } + }() + + for times := 0; times < 10; times++ { + input := testdata.GenerateTraces(2) + + sent, err := tc.exporter.SendAndWait(context.Background(), input) + require.NoError(t, err) + require.True(t, sent) + + expectOutput = append(expectOutput, input) + } + // Stop the test conduit started above. + cancel() + wg.Wait() + + // As this equality check doesn't support out of order slices, + // we sort the slices directly in the GenerateTraces function. + require.Equal(t, expectOutput, actualOutput) + require.NoError(t, tc.exporter.Shutdown(ctx)) + }) + } +} + +// TestArrowExporterHeaders tests a mix of outgoing context headers. +func TestArrowExporterHeaders(t *testing.T) { + tc := newSingleStreamMetadataTestCase(t) + channel := newHealthyTestChannel() + + tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel)) + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + require.NoError(t, tc.exporter.Start(ctx)) + + var expectOutput []metadata.MD + var actualOutput []metadata.MD + + var wg sync.WaitGroup + wg.Add(1) + go func() { + defer wg.Done() + md := metadata.MD{} + hpd := hpack.NewDecoder(4096, func(f hpack.HeaderField) { + md[f.Name] = append(md[f.Name], f.Value) + }) + for data := range channel.sendChannel() { + if len(data.Headers) == 0 { + actualOutput = append(actualOutput, nil) + } else { + _, err := hpd.Write(data.Headers) + require.NoError(t, err) + actualOutput = append(actualOutput, md) + md = metadata.MD{} + } + channel.recv <- statusOKFor(data.BatchId) + } + }() + + for times := 0; times < 10; times++ { + input := testdata.GenerateTraces(2) + + if times%2 == 1 { + md := metadata.MD{ + "expected1": []string{"metadata1"}, + "expected2": []string{fmt.Sprint(times)}, + "otlp-pdata-size": []string{"329"}, + } + expectOutput = append(expectOutput, md) + } else { + expectOutput = append(expectOutput, metadata.MD{ + "otlp-pdata-size": []string{"329"}, + }) + } + + sent, err := tc.exporter.SendAndWait(context.Background(), input) + require.NoError(t, err) + require.True(t, sent) + } + // Stop the test conduit started above. + cancel() + wg.Wait() + + require.Equal(t, expectOutput, actualOutput) + require.NoError(t, tc.exporter.Shutdown(ctx)) +} + +// TestArrowExporterIsTraced tests whether trace and span ID are +// propagated. 
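+// When the caller's context carries a sampled span, the global
+// propagator injects a W3C trace context entry into the batch headers,
+// shaped like (illustration):
+//
+//	traceparent: 00-<trace-id>-<span-id>-<trace-flags>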
+func TestArrowExporterIsTraced(t *testing.T) { + otel.SetTextMapPropagator(propagation.TraceContext{}) + + for _, pname := range AllPrioritizers { + t.Run(string(pname), func(t *testing.T) { + tc := newSingleStreamTestCase(t, pname) + channel := newHealthyTestChannel() + + tc.traceCall.AnyTimes().DoAndReturn(tc.returnNewStream(channel)) + + ctx, cancel := context.WithCancel(context.Background()) + require.NoError(t, tc.exporter.Start(ctx)) + + var expectOutput []metadata.MD + var actualOutput []metadata.MD + + var wg sync.WaitGroup + wg.Add(1) + go func() { + defer wg.Done() + md := metadata.MD{} + hpd := hpack.NewDecoder(4096, func(f hpack.HeaderField) { + md[f.Name] = append(md[f.Name], f.Value) + }) + for data := range channel.sendChannel() { + if len(data.Headers) == 0 { + actualOutput = append(actualOutput, nil) + } else { + _, err := hpd.Write(data.Headers) + require.NoError(t, err) + actualOutput = append(actualOutput, md) + md = metadata.MD{} + } + channel.recv <- statusOKFor(data.BatchId) + } + }() + + for times := 0; times < 10; times++ { + input := testdata.GenerateTraces(2) + callCtx := context.Background() + + if times%2 == 1 { + callCtx = trace.ContextWithSpanContext(callCtx, + trace.NewSpanContext(trace.SpanContextConfig{ + TraceID: [16]byte{byte(times), 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf}, + SpanID: [8]byte{byte(times), 1, 2, 3, 4, 5, 6, 7}, + }), + ) + expectMap := map[string]string{} + propagation.TraceContext{}.Inject(callCtx, propagation.MapCarrier(expectMap)) + + md := metadata.MD{ + "traceparent": []string{expectMap["traceparent"]}, + "otlp-pdata-size": []string{"329"}, + } + expectOutput = append(expectOutput, md) + } else { + expectOutput = append(expectOutput, metadata.MD{ + "otlp-pdata-size": []string{"329"}, + }) + } + + sent, err := tc.exporter.SendAndWait(callCtx, input) + require.NoError(t, err) + require.True(t, sent) + } + // Stop the test conduit started above. + cancel() + wg.Wait() + + require.Equal(t, expectOutput, actualOutput) + require.NoError(t, tc.exporter.Shutdown(ctx)) + }) + } +} + +func TestAddJitter(t *testing.T) { + require.Equal(t, time.Duration(0), addJitter(0)) + + // Expect no more than 5% less in each trial. + for i := 0; i < 100; i++ { + x := addJitter(20 * time.Minute) + require.LessOrEqual(t, 19*time.Minute, x) + require.Less(t, x, 20*time.Minute) + } +} + +// TestArrowExporterStreamLifetimeAndShutdown exercises multiple +// stream lifetimes and then shuts down, inspects the logs for +// legibility. 
+func TestArrowExporterStreamLifetimeAndShutdown(t *testing.T) { + for _, pname := range AllPrioritizers { + t.Run(string(pname), func(t *testing.T) { + for _, numStreams := range []int{1, 2, 8} { + t.Run(fmt.Sprint(numStreams), func(t *testing.T) { + tc := newShortLifetimeStreamTestCase(t, pname, numStreams) + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + var wg sync.WaitGroup + + var expectCount uint64 + var actualCount uint64 + + tc.traceCall.AnyTimes().DoAndReturn(func(ctx context.Context, opts ...grpc.CallOption) ( + arrowpb.ArrowTracesService_ArrowTracesClient, + error, + ) { + wg.Add(1) + channel := newHealthyTestChannel() + + go func() { + defer wg.Done() + testCon := arrowRecord.NewConsumer() + + for data := range channel.sendChannel() { + traces, err := testCon.TracesFrom(data) + require.NoError(t, err) + require.Equal(t, 1, len(traces)) + atomic.AddUint64(&actualCount, 1) + channel.recv <- statusOKFor(data.BatchId) + } + + // Closing the recv channel causes the exporter to see EOF. + close(channel.recv) + }() + + return tc.returnNewStream(channel)(ctx, opts...) + }) + + require.NoError(t, tc.exporter.Start(ctx)) + + start := time.Now() + // This is 10 stream lifetimes using the "ShortLifetime" test. + for time.Since(start) < 5*time.Second { + input := testdata.GenerateTraces(2) + + sent, err := tc.exporter.SendAndWait(ctx, input) + require.NoError(t, err) + require.True(t, sent) + + expectCount++ + } + + require.NoError(t, tc.exporter.Shutdown(ctx)) + + require.Equal(t, expectCount, actualCount) + + cancel() + wg.Wait() + + require.Empty(t, tc.observedLogs.All()) + }) + } + }) + } +} + +func BenchmarkLeastLoadedTwo4(b *testing.B) { + benchmarkPrioritizer(b, 4, LeastLoadedTwoPrioritizer) +} + +func BenchmarkLeastLoadedTwo8(b *testing.B) { + benchmarkPrioritizer(b, 8, LeastLoadedTwoPrioritizer) +} + +func BenchmarkLeastLoadedTwo16(b *testing.B) { + benchmarkPrioritizer(b, 16, LeastLoadedTwoPrioritizer) +} + +func BenchmarkLeastLoadedTwo32(b *testing.B) { + benchmarkPrioritizer(b, 32, LeastLoadedTwoPrioritizer) +} + +func BenchmarkLeastLoadedTwo64(b *testing.B) { + benchmarkPrioritizer(b, 64, LeastLoadedTwoPrioritizer) +} + +func BenchmarkLeastLoadedTwo128(b *testing.B) { + benchmarkPrioritizer(b, 128, LeastLoadedTwoPrioritizer) +} + +func BenchmarkLeastLoadedFour4(b *testing.B) { + benchmarkPrioritizer(b, 4, LeastLoadedFourPrioritizer) +} + +func BenchmarkLeastLoadedFour8(b *testing.B) { + benchmarkPrioritizer(b, 8, LeastLoadedFourPrioritizer) +} + +func BenchmarkLeastLoadedFour16(b *testing.B) { + benchmarkPrioritizer(b, 16, LeastLoadedFourPrioritizer) +} + +func BenchmarkLeastLoadedFour32(b *testing.B) { + benchmarkPrioritizer(b, 32, LeastLoadedFourPrioritizer) +} + +func BenchmarkLeastLoadedFour64(b *testing.B) { + benchmarkPrioritizer(b, 64, LeastLoadedFourPrioritizer) +} + +func BenchmarkLeastLoadedFour128(b *testing.B) { + benchmarkPrioritizer(b, 128, LeastLoadedFourPrioritizer) +} + +func benchmarkPrioritizer(b *testing.B, numStreams int, pname PrioritizerName) { + tc := newExporterTestCaseCommon(z2m{b}, pname, Noisy, defaultMaxStreamLifetime, numStreams, true, nil) + + var wg sync.WaitGroup + var cnt atomic.Int32 + + tc.traceCall.AnyTimes().DoAndReturn(func(ctx context.Context, opts ...grpc.CallOption) ( + arrowpb.ArrowTracesService_ArrowTracesClient, + error, + ) { + wg.Add(1) + num := cnt.Add(1) + channel := newHealthyTestChannel() + + delay := time.Duration(num) * time.Millisecond + + go func() { + defer wg.Done() + var mine sync.WaitGroup 
+ for data := range channel.sendChannel() { + mine.Add(1) + go func(<-chan time.Time) { + defer mine.Done() + channel.recv <- statusOKFor(data.BatchId) + }(time.After(delay)) + } + + mine.Wait() + + close(channel.recv) + }() + + return tc.returnNewStream(channel)(ctx, opts...) + }) + + bg, cancel := context.WithCancel(context.Background()) + defer cancel() + if err := tc.exporter.Start(bg); err != nil { + b.Errorf("start failed: %v", err) + return + } + + input := testdata.GenerateTraces(2) + + wg.Add(1) + defer func() { + if err := tc.exporter.Shutdown(bg); err != nil { + b.Errorf("shutdown failed: %v", err) + } + wg.Done() + wg.Wait() + }() + + b.ResetTimer() + + for i := 0; i < b.N; i++ { + sent, err := tc.exporter.SendAndWait(bg, input) + if err != nil || !sent { + b.Errorf("send failed: %v: %v", sent, err) + } + } +} diff --git a/exporter/otelarrowexporter/internal/arrow/grpcmock/credentials.go b/exporter/otelarrowexporter/internal/arrow/grpcmock/credentials.go new file mode 100644 index 000000000000..c9ddc953e5b1 --- /dev/null +++ b/exporter/otelarrowexporter/internal/arrow/grpcmock/credentials.go @@ -0,0 +1,74 @@ +// Code generated by MockGen. DO NOT EDIT. +// Source: google.golang.org/grpc/credentials (interfaces: PerRPCCredentials) +// +// Generated by this command: +// +// mockgen -package grpcmock google.golang.org/grpc/credentials PerRPCCredentials +// + +// Package grpcmock is a generated GoMock package. +package grpcmock + +import ( + context "context" + reflect "reflect" + + gomock "go.uber.org/mock/gomock" +) + +// MockPerRPCCredentials is a mock of PerRPCCredentials interface. +type MockPerRPCCredentials struct { + ctrl *gomock.Controller + recorder *MockPerRPCCredentialsMockRecorder +} + +// MockPerRPCCredentialsMockRecorder is the mock recorder for MockPerRPCCredentials. +type MockPerRPCCredentialsMockRecorder struct { + mock *MockPerRPCCredentials +} + +// NewMockPerRPCCredentials creates a new mock instance. +func NewMockPerRPCCredentials(ctrl *gomock.Controller) *MockPerRPCCredentials { + mock := &MockPerRPCCredentials{ctrl: ctrl} + mock.recorder = &MockPerRPCCredentialsMockRecorder{mock} + return mock +} + +// EXPECT returns an object that allows the caller to indicate expected use. +func (m *MockPerRPCCredentials) EXPECT() *MockPerRPCCredentialsMockRecorder { + return m.recorder +} + +// GetRequestMetadata mocks base method. +func (m *MockPerRPCCredentials) GetRequestMetadata(arg0 context.Context, arg1 ...string) (map[string]string, error) { + m.ctrl.T.Helper() + varargs := []any{arg0} + for _, a := range arg1 { + varargs = append(varargs, a) + } + ret := m.ctrl.Call(m, "GetRequestMetadata", varargs...) + ret0, _ := ret[0].(map[string]string) + ret1, _ := ret[1].(error) + return ret0, ret1 +} + +// GetRequestMetadata indicates an expected call of GetRequestMetadata. +func (mr *MockPerRPCCredentialsMockRecorder) GetRequestMetadata(arg0 any, arg1 ...any) *gomock.Call { + mr.mock.ctrl.T.Helper() + varargs := append([]any{arg0}, arg1...) + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "GetRequestMetadata", reflect.TypeOf((*MockPerRPCCredentials)(nil).GetRequestMetadata), varargs...) +} + +// RequireTransportSecurity mocks base method. +func (m *MockPerRPCCredentials) RequireTransportSecurity() bool { + m.ctrl.T.Helper() + ret := m.ctrl.Call(m, "RequireTransportSecurity") + ret0, _ := ret[0].(bool) + return ret0 +} + +// RequireTransportSecurity indicates an expected call of RequireTransportSecurity. 
+func (mr *MockPerRPCCredentialsMockRecorder) RequireTransportSecurity() *gomock.Call { + mr.mock.ctrl.T.Helper() + return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "RequireTransportSecurity", reflect.TypeOf((*MockPerRPCCredentials)(nil).RequireTransportSecurity)) +} diff --git a/exporter/otelarrowexporter/internal/arrow/prioritizer.go b/exporter/otelarrowexporter/internal/arrow/prioritizer.go new file mode 100644 index 000000000000..84220338348f --- /dev/null +++ b/exporter/otelarrowexporter/internal/arrow/prioritizer.go @@ -0,0 +1,107 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +package arrow // import "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow" + +import ( + "context" + "fmt" + "strconv" + "strings" + + "go.opentelemetry.io/collector/component" + "google.golang.org/grpc/codes" + "google.golang.org/grpc/status" +) + +var ErrStreamRestarting = status.Error(codes.Aborted, "stream is restarting") + +type PrioritizerName string + +var _ component.ConfigValidator = PrioritizerName("") + +const ( + DefaultPrioritizer PrioritizerName = LeastLoadedPrioritizer + LeastLoadedPrioritizer PrioritizerName = llPrefix + LeastLoadedTwoPrioritizer PrioritizerName = llPrefix + "2" + LeastLoadedFourPrioritizer PrioritizerName = llPrefix + "4" + unsetPrioritizer PrioritizerName = "" + + llPrefix = "leastloaded" +) + +// streamPrioritizer is an interface for prioritizing multiple +// streams. +type streamPrioritizer interface { + // nextWriter gets the next stream writer. In case the exporter + // was downgraded, returns nil. + nextWriter() streamWriter + + // downgrade is called with the root context of the exporter, + // and may block indefinitely. this allows the prioritizer to + // drain its channel(s) until the exporter shuts down. + downgrade(context.Context) +} + +// streamWriter is the caller's interface to a stream. +type streamWriter interface { + // sendAndWait is called to begin a write. After completing + // the call, wait on writeItem.errCh for the response. + sendAndWait(context.Context, <-chan error, writeItem) error +} + +func newStreamPrioritizer(dc doneCancel, name PrioritizerName, numStreams int) (streamPrioritizer, []*streamWorkState) { + if name == unsetPrioritizer { + name = DefaultPrioritizer + } + if strings.HasPrefix(string(name), llPrefix) { + // error was checked and reported in Validate + n, err := strconv.Atoi(string(name[len(llPrefix):])) + if err == nil { + return newBestOfNPrioritizer(dc, n, numStreams, pendingRequests) + } + } + return newBestOfNPrioritizer(dc, numStreams, numStreams, pendingRequests) +} + +// pendingRequests is the load function used by leastloadedN. 
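+//
+// For example (an illustration of the naming scheme), the name
+// "leastloaded4" selects a best-of-N policy with N=4: four candidate
+// streams are compared and the one with the smallest value of this
+// load function (waiters plus queued writes) wins.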
+func pendingRequests(sws *streamWorkState) float64 { + sws.lock.Lock() + defer sws.lock.Unlock() + return float64(len(sws.waiters) + len(sws.toWrite)) +} + +// Validate implements component.ConfigValidator +func (p PrioritizerName) Validate() error { + switch p { + // Exact match cases + case LeastLoadedPrioritizer, unsetPrioritizer: + return nil + } + // "leastloadedN" cases + if !strings.HasPrefix(string(p), llPrefix) { + return fmt.Errorf("unrecognized prioritizer: %q", string(p)) + } + _, err := strconv.Atoi(string(p[len(llPrefix):])) + if err != nil { + return fmt.Errorf("invalid prioritizer: %q", string(p)) + } + return nil +} + +// drain helps avoid a race condition when downgrade happens, it ensures that +// any late-arriving work will immediately see ErrStreamRestarting, and this +// continues until the exporter shuts down. +// +// Note: the downgrade function is a major source of complexity and it is +// probably best removed, instead of having this level of complexity. +func drain(ch <-chan writeItem, done <-chan struct{}) { + for { + select { + case <-done: + return + case item := <-ch: + item.errCh <- ErrStreamRestarting + } + } +} diff --git a/exporter/otelarrowexporter/internal/arrow/stream.go b/exporter/otelarrowexporter/internal/arrow/stream.go new file mode 100644 index 000000000000..7070d8c6ea42 --- /dev/null +++ b/exporter/otelarrowexporter/internal/arrow/stream.go @@ -0,0 +1,477 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +package arrow // import "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow" + +import ( + "bytes" + "context" + "errors" + "fmt" + "io" + "sync" + "time" + + arrowpb "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1" + "github.com/open-telemetry/otel-arrow/collector/netstats" + arrowRecord "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record" + "go.opentelemetry.io/collector/component" + "go.opentelemetry.io/collector/consumer/consumererror" + "go.opentelemetry.io/collector/pdata/plog" + "go.opentelemetry.io/collector/pdata/pmetric" + "go.opentelemetry.io/collector/pdata/ptrace" + "go.opentelemetry.io/otel" + otelcodes "go.opentelemetry.io/otel/codes" + "go.opentelemetry.io/otel/propagation" + "go.opentelemetry.io/otel/trace" + "go.uber.org/multierr" + "go.uber.org/zap" + "golang.org/x/net/http2/hpack" + "google.golang.org/grpc" + "google.golang.org/grpc/codes" + "google.golang.org/grpc/status" +) + +// Stream is 1:1 with gRPC stream. +type Stream struct { + // maxStreamLifetime is the max timeout before stream + // should be closed on the client side. This ensures a + // graceful shutdown before max_connection_age is reached + // on the server side. + maxStreamLifetime time.Duration + + // producer is exclusive to the holder of the stream. + producer arrowRecord.ProducerAPI + + // prioritizer has a reference to the stream, this allows it to be severed. + prioritizer streamPrioritizer + + // telemetry are a copy of the exporter's telemetry settings + telemetry component.TelemetrySettings + + // tracer is used to create a span describing the export. + tracer trace.Tracer + + // client uses the exporter's grpc.ClientConn. this is + // initially nil only set when ArrowStream() calls meaning the + // endpoint recognizes OTel-Arrow. + client AnyStreamClient + + // method the gRPC method name, used for additional instrumentation. + method string + + // netReporter provides network-level metrics. 
+ netReporter netstats.Interface + + // streamWorkState is the interface to prioritizer/balancer, contains + // outstanding request (by batch ID) and the write channel used by + // the stream. All of this state will be inherited by the successor + // stream. + workState *streamWorkState +} + +// streamWorkState contains the state assigned to an Arrow stream. When +// a stream shuts down, the work state is handed to the replacement stream. +type streamWorkState struct { + // toWrite is used to pass pending data between a caller, the + // prioritizer and a stream. + toWrite chan writeItem + + // lock protects waiters + lock sync.Mutex + + // waiters is the response channel for each active batch. + waiters map[int64]chan<- error +} + +// writeItem is passed from the sender (a pipeline consumer) to the +// stream writer, which is not bound by the sender's context. +type writeItem struct { + // records is a ptrace.Traces, plog.Logs, or pmetric.Metrics + records any + // md is the caller's metadata, derived from its context. + md map[string]string + // errCh is used by the stream reader to unblock the sender + // to the stream side, this is a `chan<-`. to the send side, + // this is a `<-chan`. + errCh chan<- error + // uncompSize is computed by the appropriate sizer (in the + // caller's goroutine) + uncompSize int + // producerCtx is used for tracing purposes. + producerCtx context.Context +} + +// newStream constructs a stream +func newStream( + producer arrowRecord.ProducerAPI, + prioritizer streamPrioritizer, + telemetry component.TelemetrySettings, + netReporter netstats.Interface, + workState *streamWorkState, +) *Stream { + tracer := telemetry.TracerProvider.Tracer("otel-arrow-exporter") + return &Stream{ + producer: producer, + prioritizer: prioritizer, + telemetry: telemetry, + tracer: tracer, + netReporter: netReporter, + workState: workState, + } +} + +// setBatchChannel places a waiting consumer's batchID into the waiters map, where +// the stream reader may find it. +func (s *Stream) setBatchChannel(batchID int64, errCh chan<- error) { + s.workState.lock.Lock() + defer s.workState.lock.Unlock() + + s.workState.waiters[batchID] = errCh +} + +// logStreamError decides how to log an error. `which` indicates the +// stream direction, will be "reader" or "writer". +func (s *Stream) logStreamError(which string, err error) { + var code codes.Code + var msg string + // gRPC tends to supply status-wrapped errors, so we always + // unpack them. A wrapped Canceled code indicates intentional + // shutdown, which can be due to normal causes (EOF, e.g., + // max-stream-lifetime reached) or unusual causes (Canceled, + // e.g., because the other stream direction reached an error). + if status, ok := status.FromError(err); ok { + code = status.Code() + msg = status.Message() + } else if errors.Is(err, io.EOF) || errors.Is(err, context.Canceled) { + code = codes.Canceled + msg = err.Error() + } else { + code = codes.Internal + msg = err.Error() + } + if code == codes.Canceled { + s.telemetry.Logger.Debug("arrow stream shutdown", zap.String("which", which), zap.String("message", msg)) + } else { + s.telemetry.Logger.Error("arrow stream error", zap.String("which", which), zap.String("message", msg), zap.Int("code", int(code))) + } +} + +// run blocks the calling goroutine while executing stream logic. run +// will return when the reader and writer are finished. errors will be logged. 
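+//
+// Roughly, the division of labor inside run():
+//
+//	writer goroutine: write() -> encodeAndSend() -> Send()
+//	this goroutine:   read()  -> Recv() -> processBatchStatus()
+//
+// An error on either side cancels the other via dc.cancel().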
+func (s *Stream) run(ctx context.Context, dc doneCancel, streamClient StreamClientFunc, grpcOptions []grpc.CallOption) {
+	sc, method, err := streamClient(ctx, grpcOptions...)
+	if err != nil {
+		// Returning with stream.client == nil signals the
+		// lack of an Arrow stream endpoint. When all the
+		// streams return with .client == nil, the ready
+		// channel will be closed, which causes downgrade.
+		//
+		// Note: These are gRPC server internal errors and
+		// will cause downgrade to standard OTLP. These
+		// cannot be simulated by connecting to a gRPC server
+		// that does not support the ArrowStream service, with
+		// or without the WaitForReady flag set. In a real
+		// gRPC server the first Unimplemented code is
+		// generally delivered to the Recv() call below, so
+		// this code path is not taken for an ordinary downgrade.
+		s.telemetry.Logger.Error("cannot start arrow stream", zap.Error(err))
+		return
+	}
+	// Setting .client != nil indicates that the endpoint was valid,
+	// streaming may start. When this stream finishes, it will be
+	// restarted.
+	s.method = method
+	s.client = sc
+
+	// ww is used to wait for the writer. Since we wait for the writer,
+	// the writer's goroutine is not added to the exporter waitgroup (e.wg).
+	var ww sync.WaitGroup
+
+	var writeErr error
+	ww.Add(1)
+	go func() {
+		defer ww.Done()
+		writeErr = s.write(ctx)
+		if writeErr != nil {
+			dc.cancel()
+		}
+	}()
+
+	// the result from read() is processed after cancel and wait,
+	// so we can set s.client = nil in case of a delayed Unimplemented.
+	err = s.read(ctx)
+
+	// Wait for the writer to ensure that all waiters are known.
+	dc.cancel()
+	ww.Wait()
+
+	if err != nil {
+		// This branch is reached with an unimplemented status
+		// with or without the WaitForReady flag.
+		if status, ok := status.FromError(err); ok && status.Code() == codes.Unimplemented {
+			// This (client == nil) signals the controller to
+			// downgrade when all streams have returned in that
+			// status.
+			//
+			// This is a special case because we reset s.client,
+			// which sets up a downgrade after the streams return.
+			s.client = nil
+			s.telemetry.Logger.Info("arrow is not supported",
+				zap.String("message", status.Message()),
+			)
+		} else {
+			// All other cases, use the standard log handler.
+			s.logStreamError("reader", err)
+		}
+	}
+	if writeErr != nil {
+		s.logStreamError("writer", writeErr)
+	}
+
+	s.workState.lock.Lock()
+	defer s.workState.lock.Unlock()
+
+	// The reader and writer have both finished; respond to any
+	// outstanding waiters.
+	for _, ch := range s.workState.waiters {
+		// Note: the top-level OTLP exporter will retry.
+		ch <- ErrStreamRestarting
+	}
+
+	s.workState.waiters = map[int64]chan<- error{}
+}
+
+// write repeatedly places this stream into the next-available queue, then
+// performs a blocking send(). This returns when the data is in the write buffer,
+// leaving the caller waiting on its error channel.
+func (s *Stream) write(ctx context.Context) (retErr error) {
+	// always close the send channel when this function returns.
+	defer func() { _ = s.client.CloseSend() }()
+
+	// headers are encoded using hpack, reusing a buffer on each call.
+	var hdrsBuf bytes.Buffer
+	hdrsEnc := hpack.NewEncoder(&hdrsBuf)
+
+	var timerCh <-chan time.Time
+	if s.maxStreamLifetime != 0 {
+		timer := time.NewTimer(s.maxStreamLifetime)
+		timerCh = timer.C
+		defer timer.Stop()
+	}
+
+	for {
+		// this can block, and if the context is canceled we
+		// wait for the reader to find this stream.
+ var wri writeItem
+ select {
+ case <-timerCh:
+ return nil
+ case wri = <-s.workState.toWrite:
+ case <-ctx.Done():
+ return ctx.Err()
+ }
+
+ err := s.encodeAndSend(wri, &hdrsBuf, hdrsEnc)
+ if err != nil {
+ // Note: For the return statement below, there is no potential
+ // sender race because the stream is not available, as indicated by
+ // the successful <-stream.toWrite above.
+ return err
+ }
+ }
+}
+
+func (s *Stream) encodeAndSend(wri writeItem, hdrsBuf *bytes.Buffer, hdrsEnc *hpack.Encoder) (retErr error) {
+ ctx, span := s.tracer.Start(wri.producerCtx, "otel_arrow_stream_send")
+ defer span.End()
+
+ defer func() {
+ // Set span status if an error is returned.
+ if retErr != nil {
+ span := trace.SpanFromContext(ctx)
+ span.SetStatus(otelcodes.Error, retErr.Error())
+ }
+ }()
+ // Get the global propagator, to inject context. When there
+ // are no fields, it's a no-op propagator implementation and
+ // we can skip the allocations inside this block.
+ prop := otel.GetTextMapPropagator()
+
+ if len(prop.Fields()) > 0 {
+ // When the incoming context carries nothing, the map
+ // will be nil. Allocate, if necessary.
+ if wri.md == nil {
+ wri.md = map[string]string{}
+ }
+ // Use the global propagator to inject trace context. Note that
+ // the OpenTelemetry Collector will set a global propagator from the
+ // service::telemetry::traces configuration.
+ otel.GetTextMapPropagator().Inject(ctx, propagation.MapCarrier(wri.md))
+ }
+
+ batch, err := s.encode(wri.records)
+ if err != nil {
+ // This is some kind of internal error. We will restart the
+ // stream and mark this record as a permanent one.
+ err = fmt.Errorf("encode: %w", err)
+ wri.errCh <- consumererror.NewPermanent(err)
+ return err
+ }
+
+ // Optionally include outgoing metadata, if present.
+ if len(wri.md) != 0 {
+ hdrsBuf.Reset()
+ for key, val := range wri.md {
+ err := hdrsEnc.WriteField(hpack.HeaderField{
+ Name: key,
+ Value: val,
+ })
+ if err != nil {
+ // This case is like the encode-failure case
+ // above; we will restart the stream but consider
+ // this a permanent error.
+ err = fmt.Errorf("hpack: %w", err)
+ wri.errCh <- consumererror.NewPermanent(err)
+ return err
+ }
+ }
+ batch.Headers = hdrsBuf.Bytes()
+ }
+
+ // Let the receiver know what to look for.
+ s.setBatchChannel(batch.BatchId, wri.errCh)
+
+ // The netstats code knows that uncompressed size is
+ // unreliable for arrow transport, so we instrument it
+ // directly here. Only the primary direction of transport
+ // is instrumented this way.
+ if wri.uncompSize != 0 {
+ var sized netstats.SizesStruct
+ sized.Method = s.method
+ sized.Length = int64(wri.uncompSize)
+ s.netReporter.CountSend(ctx, sized)
+ s.netReporter.SetSpanSizeAttributes(ctx, sized)
+ }
+
+ if err := s.client.Send(batch); err != nil {
+ // The error will be sent to errCh during cleanup for this stream.
+ // Note: do not wrap this error, it may contain a Status.
+ return err
+ }
+
+ return nil
+}
+
+// read repeatedly reads a batch status and releases the consumers waiting for
+// a response.
+func (s *Stream) read(_ context.Context) error {
+ // Note we do not use the context; the stream context might
+ // cancel a call to Recv(), but the call to processBatchStatus
+ // is non-blocking.
+ for {
+ // Note: if the client has called CloseSend() and is waiting for a response from the server,
+ // and the server fails for some reason, we will wait until some other condition occurs, such as
+ // a context timeout.
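+ // (Illustrative note, not part of the upstream code: in that
+ // situation the exporter's shutdown or the stream context's
+ // cancellation is what ultimately unblocks this Recv() loop.)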
+ // TODO: possibly, improve to wait for no outstanding requests and then stop reading.
+ resp, err := s.client.Recv()
+ if err != nil {
+ // Note: do not wrap, contains a Status.
+ return err
+ }
+
+ if err = s.processBatchStatus(resp); err != nil {
+ return fmt.Errorf("process: %w", err)
+ }
+ }
+}
+
+// getSenderChannel takes the stream lock and removes the corresponding
+// sender channel.
+func (sws *streamWorkState) getSenderChannel(status *arrowpb.BatchStatus) (chan<- error, error) {
+ sws.lock.Lock()
+ defer sws.lock.Unlock()
+
+ ch, ok := sws.waiters[status.BatchId]
+ if !ok {
+ // Will break the stream.
+ return nil, fmt.Errorf("unrecognized batch ID: %d", status.BatchId)
+ }
+
+ delete(sws.waiters, status.BatchId)
+ return ch, nil
+}
+
+// processBatchStatus processes a single response from the server and unblocks the
+// associated sender.
+func (s *Stream) processBatchStatus(ss *arrowpb.BatchStatus) error {
+ ch, ret := s.workState.getSenderChannel(ss)
+
+ if ch == nil {
+ // In case getSenderChannel encounters a problem, the
+ // channel is nil.
+ return ret
+ }
+
+ if ss.StatusCode == arrowpb.StatusCode_OK {
+ ch <- nil
+ return nil
+ }
+ // See ../../otelarrow.go's `shouldRetry()` method; the retry
+ // behavior described here is achieved there by setting these
+ // recognized codes.
+ var err error
+ switch ss.StatusCode {
+ case arrowpb.StatusCode_UNAVAILABLE:
+ // Retryable
+ err = status.Errorf(codes.Unavailable, "destination unavailable: %d: %s", ss.BatchId, ss.StatusMessage)
+ case arrowpb.StatusCode_INVALID_ARGUMENT:
+ // Not retryable
+ err = status.Errorf(codes.InvalidArgument, "invalid argument: %d: %s", ss.BatchId, ss.StatusMessage)
+ case arrowpb.StatusCode_RESOURCE_EXHAUSTED:
+ // Retry behavior is configurable
+ err = status.Errorf(codes.ResourceExhausted, "resource exhausted: %d: %s", ss.BatchId, ss.StatusMessage)
+ default:
+ // Note: a Canceled StatusCode was once returned by receivers following
+ // a CloseSend() from the exporter. This is now handled using error
+ // status codes. If an exporter is upgraded before a receiver, the exporter
+ // will log this error when the receiver closes streams.
+
+ // Unrecognized status code.
+ err = status.Errorf(codes.Internal, "unexpected stream response: %d: %s", ss.BatchId, ss.StatusMessage)
+
+ // Will break the stream.
+ ret = multierr.Append(ret, err)
+ }
+ ch <- err
+ return ret
+}
+
+// encode produces the next batch of Arrow records.
+func (s *Stream) encode(records any) (_ *arrowpb.BatchArrowRecords, retErr error) {
+ // Defensively, protect against panics in the Arrow producer function.
+ defer func() {
+ if err := recover(); err != nil {
+ // When this happens, the stacktrace is
+ // important and lost if we don't capture it
+ // here.
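+ //
+ // (Illustrative note, not part of the upstream code:
+ // recover() yields the panic value, and zap.Stack captures
+ // this goroutine's stack at log time, which still includes
+ // the frames that panicked below this deferred call.)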
+ s.telemetry.Logger.Debug("panic detail in otel-arrow-adapter",
+ zap.Reflect("recovered", err),
+ zap.Stack("stacktrace"),
+ )
+ retErr = fmt.Errorf("panic in otel-arrow-adapter: %v", err)
+ }
+ }()
+ var batch *arrowpb.BatchArrowRecords
+ var err error
+ switch data := records.(type) {
+ case ptrace.Traces:
+ batch, err = s.producer.BatchArrowRecordsFromTraces(data)
+ case plog.Logs:
+ batch, err = s.producer.BatchArrowRecordsFromLogs(data)
+ case pmetric.Metrics:
+ batch, err = s.producer.BatchArrowRecordsFromMetrics(data)
+ default:
+ return nil, fmt.Errorf("unsupported OTLP type: %T", records)
+ }
+ return batch, err
+}
diff --git a/exporter/otelarrowexporter/internal/arrow/stream_test.go b/exporter/otelarrowexporter/internal/arrow/stream_test.go
new file mode 100644
index 000000000000..e916667c455c
--- /dev/null
+++ b/exporter/otelarrowexporter/internal/arrow/stream_test.go
@@ -0,0 +1,349 @@
+// Copyright The OpenTelemetry Authors
+// SPDX-License-Identifier: Apache-2.0
+
+package arrow
+
+import (
+ "context"
+ "errors"
+ "fmt"
+ "sync"
+ "testing"
+ "time"
+
+ arrowpb "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1"
+ "github.com/open-telemetry/otel-arrow/collector/netstats"
+ arrowRecordMock "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record/mock"
+ "github.com/stretchr/testify/require"
+ "go.opentelemetry.io/collector/consumer/consumererror"
+ "go.uber.org/mock/gomock"
+ "google.golang.org/grpc"
+)
+
+var oneBatch = &arrowpb.BatchArrowRecords{
+ BatchId: 1,
+}
+
+type streamTestCase struct {
+ *commonTestCase
+ *commonTestStream
+
+ producer *arrowRecordMock.MockProducerAPI
+ prioritizer streamPrioritizer
+ bgctx context.Context
+ doneCancel
+ fromTracesCall *gomock.Call
+ fromMetricsCall *gomock.Call
+ fromLogsCall *gomock.Call
+ stream *Stream
+ wait sync.WaitGroup
+}
+
+func newStreamTestCase(t *testing.T, pname PrioritizerName) *streamTestCase {
+ ctrl := gomock.NewController(t)
+ producer := arrowRecordMock.NewMockProducerAPI(ctrl)
+
+ bg, dc := newDoneCancel(context.Background())
+ prio, state := newStreamPrioritizer(dc, pname, 1)
+
+ ctc := newCommonTestCase(t, NotNoisy)
+ cts := ctc.newMockStream(bg)
+
+ // metadata functionality is tested in exporter_test.go
+ ctc.requestMetadataCall.AnyTimes().Return(nil, nil)
+
+ stream := newStream(producer, prio, ctc.telset, netstats.Noop{}, state[0])
+ stream.maxStreamLifetime = 10 * time.Second
+
+ fromTracesCall := producer.EXPECT().BatchArrowRecordsFromTraces(gomock.Any()).Times(0)
+ fromMetricsCall := producer.EXPECT().BatchArrowRecordsFromMetrics(gomock.Any()).Times(0)
+ fromLogsCall := producer.EXPECT().BatchArrowRecordsFromLogs(gomock.Any()).Times(0)
+
+ return &streamTestCase{
+ commonTestCase: ctc,
+ commonTestStream: cts,
+ producer: producer,
+ prioritizer: prio,
+ bgctx: bg,
+ doneCancel: dc,
+ stream: stream,
+ fromTracesCall: fromTracesCall,
+ fromMetricsCall: fromMetricsCall,
+ fromLogsCall: fromLogsCall,
+ }
+}
+
+// start runs a test stream according to the behavior of testChannel.
+func (tc *streamTestCase) start(channel testChannel) {
+ tc.traceCall.Times(1).DoAndReturn(tc.connectTestStream(channel))
+
+ tc.wait.Add(1)
+
+ go func() {
+ defer tc.wait.Done()
+ tc.stream.run(tc.bgctx, tc.doneCancel, tc.traceClient, nil)
+ }()
+}
+
+// cancelAndWaitForShutdown cancels the context and waits for the runner to return.
+func (tc *streamTestCase) cancelAndWaitForShutdown() {
+ tc.cancel()
+ tc.wait.Wait()
+}
+
+// waitForShutdown waits for the runner to exit without canceling the context.
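+// Use this variant when the stream is expected to terminate on its
+// own (for example, after a server-side error breaks the stream, as
+// in TestStreamStatusUnrecognized below).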
+func (tc *streamTestCase) waitForShutdown() {
+ tc.wait.Wait()
+}
+
+// connectTestStream returns the stream under test from the common test's mock ArrowStream().
+func (tc *streamTestCase) connectTestStream(h testChannel) func(context.Context, ...grpc.CallOption) (
+ arrowpb.ArrowTracesService_ArrowTracesClient,
+ error,
+) {
+ return func(ctx context.Context, _ ...grpc.CallOption) (
+ arrowpb.ArrowTracesService_ArrowTracesClient,
+ error,
+ ) {
+ if err := h.onConnect(ctx); err != nil {
+ return nil, err
+ }
+ tc.sendCall.AnyTimes().DoAndReturn(h.onSend(ctx))
+ tc.recvCall.AnyTimes().DoAndReturn(h.onRecv(ctx))
+ tc.closeSendCall.AnyTimes().DoAndReturn(h.onCloseSend())
+ return tc.anyStreamClient, nil
+ }
+}
+
+// mustGet returns the stream via the prioritizer it is registered with.
+func (tc *streamTestCase) mustGet() streamWriter {
+ stream := tc.prioritizer.nextWriter()
+ if stream == nil {
+ panic("unexpected nil stream")
+ }
+ return stream
+}
+
+func (tc *streamTestCase) mustSendAndWait() error {
+ ctx := context.Background()
+ ch := make(chan error, 1)
+ wri := writeItem{
+ producerCtx: context.Background(),
+ records: twoTraces,
+ errCh: ch,
+ }
+ return tc.mustGet().sendAndWait(ctx, ch, wri)
+}
+
+// TestStreamNoMaxLifetime verifies that configuring
+// max_stream_lifetime==0 works and the client never
+// calls CloseSend().
+func TestStreamNoMaxLifetime(t *testing.T) {
+ for _, pname := range AllPrioritizers {
+ t.Run(string(pname), func(t *testing.T) {
+ tc := newStreamTestCase(t, pname)
+ tc.stream.maxStreamLifetime = 0
+
+ tc.fromTracesCall.Times(1).Return(oneBatch, nil)
+ tc.closeSendCall.Times(0)
+
+ channel := newHealthyTestChannel()
+ tc.start(channel)
+ defer tc.cancelAndWaitForShutdown()
+ var wg sync.WaitGroup
+ wg.Add(1)
+ defer wg.Wait()
+ go func() {
+ defer wg.Done()
+ batch := <-channel.sent
+ channel.recv <- statusOKFor(batch.BatchId)
+ }()
+
+ err := tc.mustSendAndWait()
+ require.NoError(t, err)
+ })
+ }
+}
+
+// TestStreamEncodeError verifies that an encoder error in the sender
+// yields a permanent error.
+func TestStreamEncodeError(t *testing.T) {
+ for _, pname := range AllPrioritizers {
+ t.Run(string(pname), func(t *testing.T) {
+ tc := newStreamTestCase(t, pname)
+
+ testErr := fmt.Errorf("test encode error")
+ tc.fromTracesCall.Times(1).Return(nil, testErr)
+
+ tc.start(newHealthyTestChannel())
+ defer tc.cancelAndWaitForShutdown()
+
+ // sender should get a permanent testErr
+ err := tc.mustSendAndWait()
+ require.Error(t, err)
+ require.True(t, errors.Is(err, testErr))
+ require.True(t, consumererror.IsPermanent(err))
+ })
+ }
+}
+
+// TestStreamUnknownBatchError verifies that the stream reader handles
+// an unknown BatchID.
+func TestStreamUnknownBatchError(t *testing.T) {
+ for _, pname := range AllPrioritizers {
+ t.Run(string(pname), func(t *testing.T) {
+ tc := newStreamTestCase(t, pname)
+
+ tc.fromTracesCall.Times(1).Return(oneBatch, nil)
+
+ channel := newHealthyTestChannel()
+ tc.start(channel)
+ defer tc.cancelAndWaitForShutdown()
+
+ var wg sync.WaitGroup
+ wg.Add(1)
+ defer wg.Wait()
+ go func() {
+ defer wg.Done()
+ <-channel.sent
+ channel.recv <- statusOKFor(-1 /*unknown*/)
+ }()
+ // sender should get ErrStreamRestarting
+ err := tc.mustSendAndWait()
+ require.Error(t, err)
+ require.True(t, errors.Is(err, ErrStreamRestarting))
+ })
+ }
+}
+
+// TestStreamStatusUnavailableInvalid verifies that the stream reader handles
+// an unavailable or invalid status w/o breaking the stream.
+func TestStreamStatusUnavailableInvalid(t *testing.T) { + for _, pname := range AllPrioritizers { + t.Run(string(pname), func(t *testing.T) { + tc := newStreamTestCase(t, pname) + + tc.fromTracesCall.Times(3).Return(oneBatch, nil) + + channel := newHealthyTestChannel() + tc.start(channel) + defer tc.cancelAndWaitForShutdown() + + var wg sync.WaitGroup + wg.Add(1) + defer wg.Wait() + go func() { + defer wg.Done() + batch := <-channel.sent + channel.recv <- statusUnavailableFor(batch.BatchId) + batch = <-channel.sent + channel.recv <- statusInvalidFor(batch.BatchId) + batch = <-channel.sent + channel.recv <- statusOKFor(batch.BatchId) + }() + // sender should get "test unavailable" once, success second time. + err := tc.mustSendAndWait() + require.Error(t, err) + require.Contains(t, err.Error(), "test unavailable") + + err = tc.mustSendAndWait() + require.Error(t, err) + require.Contains(t, err.Error(), "test invalid") + + err = tc.mustSendAndWait() + require.NoError(t, err) + }) + } +} + +// TestStreamStatusUnrecognized verifies that the stream reader handles +// an unrecognized status by breaking the stream. +func TestStreamStatusUnrecognized(t *testing.T) { + for _, pname := range AllPrioritizers { + t.Run(string(pname), func(t *testing.T) { + tc := newStreamTestCase(t, pname) + + tc.fromTracesCall.Times(1).Return(oneBatch, nil) + + channel := newHealthyTestChannel() + tc.start(channel) + defer tc.cancelAndWaitForShutdown() + + var wg sync.WaitGroup + wg.Add(1) + defer wg.Wait() + go func() { + defer wg.Done() + batch := <-channel.sent + channel.recv <- statusUnrecognizedFor(batch.BatchId) + }() + err := tc.mustSendAndWait() + require.Error(t, err) + require.Contains(t, err.Error(), "test unrecognized") + + // Note: do not cancel the context, the stream should be + // shutting down due to the error. + tc.waitForShutdown() + }) + } +} + +// TestStreamUnsupported verifies that the stream signals downgrade +// when an Unsupported code is received, which is how the gRPC client +// responds when the server does not support arrow. +func TestStreamUnsupported(t *testing.T) { + for _, pname := range AllPrioritizers { + t.Run(string(pname), func(t *testing.T) { + tc := newStreamTestCase(t, pname) + + // If the write succeeds before the read, then the FromTraces + // call will occur. Otherwise, it will not. + + tc.fromTracesCall.MinTimes(0).MaxTimes(1).Return(oneBatch, nil) + + channel := newArrowUnsupportedTestChannel() + tc.start(channel) + defer func() { + // When the stream returns, the downgrade is needed to + // cause the request to respond or else it waits for a new + // stream. + tc.waitForShutdown() + tc.cancel() + }() + + err := tc.mustSendAndWait() + require.Equal(t, ErrStreamRestarting, err) + + tc.waitForShutdown() + + require.Less(t, 0, len(tc.observedLogs.All()), "should have at least one log: %v", tc.observedLogs.All()) + require.Equal(t, tc.observedLogs.All()[0].Message, "arrow is not supported") + }) + } +} + +// TestStreamSendError verifies that the stream reader handles a +// Send() error. 
+func TestStreamSendError(t *testing.T) {
+ for _, pname := range AllPrioritizers {
+ t.Run(string(pname), func(t *testing.T) {
+ tc := newStreamTestCase(t, pname)
+
+ tc.fromTracesCall.Times(1).Return(oneBatch, nil)
+
+ channel := newSendErrorTestChannel()
+ tc.start(channel)
+ defer tc.cancelAndWaitForShutdown()
+
+ go func() {
+ time.Sleep(200 * time.Millisecond)
+ channel.unblock()
+ }()
+ // sender should get ErrStreamRestarting
+ err := tc.mustSendAndWait()
+ require.Error(t, err)
+ require.True(t, errors.Is(err, ErrStreamRestarting))
+ })
+ }
+}
diff --git a/exporter/otelarrowexporter/metadata.yaml b/exporter/otelarrowexporter/metadata.yaml
index c2e8b6f8e2dd..3a830d26c436 100644
--- a/exporter/otelarrowexporter/metadata.yaml
+++ b/exporter/otelarrowexporter/metadata.yaml
@@ -9,9 +9,6 @@ status:
 codeowners:
 active: [jmacd, moh-osman3]
 
-# TODO: Update the exporter to pass the tests
 tests:
- skip_lifecycle: true
- skip_shutdown: true
- goleak:
- skip: true
+ config:
+ endpoint: http://127.0.0.1:4317
diff --git a/exporter/otelarrowexporter/otelarrow.go b/exporter/otelarrowexporter/otelarrow.go
index c1e689e3c01a..01b21e392b00 100644
--- a/exporter/otelarrowexporter/otelarrow.go
+++ b/exporter/otelarrowexporter/otelarrow.go
@@ -7,72 +7,329 @@ import (
 "context"
 "errors"
 "fmt"
+ "runtime"
+ "time"
 
+ arrowPkg "github.com/apache/arrow/go/v14/arrow"
+ "github.com/open-telemetry/otel-arrow/collector/compression/zstd"
+ "github.com/open-telemetry/otel-arrow/collector/netstats"
+ arrowRecord "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record"
 "go.opentelemetry.io/collector/component"
+ "go.opentelemetry.io/collector/config/configcompression"
+ "go.opentelemetry.io/collector/consumer/consumererror"
 "go.opentelemetry.io/collector/exporter"
+ "go.opentelemetry.io/collector/exporter/exporterhelper"
 "go.opentelemetry.io/collector/pdata/plog"
+ "go.opentelemetry.io/collector/pdata/plog/plogotlp"
 "go.opentelemetry.io/collector/pdata/pmetric"
+ "go.opentelemetry.io/collector/pdata/pmetric/pmetricotlp"
 "go.opentelemetry.io/collector/pdata/ptrace"
+ "go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp"
+ "go.uber.org/multierr"
+ "go.uber.org/zap"
+ "google.golang.org/genproto/googleapis/rpc/errdetails"
 "google.golang.org/grpc"
+ "google.golang.org/grpc/codes"
+ "google.golang.org/grpc/credentials"
+ "google.golang.org/grpc/metadata"
+ "google.golang.org/grpc/status"
 
 "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow"
 )
 
-// baseExporter is used as the basis for all OpenTelemetry signal types.
 type baseExporter struct {
- // config is the active component.Config.
+ // Input configuration.
 config *Config
 
- // settings are the active collector-wide settings.
- settings exporter.CreateSettings
+ // gRPC clients and connection.
+ traceExporter ptraceotlp.GRPCClient
+ metricExporter pmetricotlp.GRPCClient
+ logExporter plogotlp.GRPCClient
+ clientConn *grpc.ClientConn
+ metadata metadata.MD
+ callOptions []grpc.CallOption
+ settings exporter.CreateSettings
+ netReporter *netstats.NetworkReporter
 
- // TODO: implementation
+ // Default user-agent header.
+ userAgent string
+
+ // OTel-Arrow optional state
+ arrow *arrow.Exporter
+ // streamClientFactory constructs the stream client.
+ streamClientFactory streamClientFactory
 }
 
 type streamClientFactory func(conn *grpc.ClientConn) arrow.StreamClientFunc
 
-// newExporter configures a new exporter using the associated stream factory for Arrow.
-func newExporter(cfg component.Config, set exporter.CreateSettings, _ streamClientFactory) (*baseExporter, error) {
- // TODO: Implementation.
- oCfg, ok := cfg.(*Config)
- if !ok {
- return nil, fmt.Errorf("unrecognized configuration type: %T", cfg)
- }
+// Create a new exporter and start it. The exporter will begin connecting but
+// this function may return before the connection is established.
+func newExporter(cfg component.Config, set exporter.CreateSettings, streamClientFactory streamClientFactory) (*baseExporter, error) {
+ oCfg := cfg.(*Config)
+
 if oCfg.Endpoint == "" {
- return nil, errors.New("OTel-Arrow exporter config requires an Endpoint")
+ return nil, errors.New("OTLP exporter config requires an Endpoint")
+ }
+
+ netReporter, err := netstats.NewExporterNetworkReporter(set)
+ if err != nil {
+ return nil, err
 }
+ userAgent := fmt.Sprintf("%s/%s (%s/%s)",
+ set.BuildInfo.Description, set.BuildInfo.Version, runtime.GOOS, runtime.GOARCH)
+
+ if !oCfg.Arrow.Disabled {
+ // Ignoring an error because Validate() was called.
+ _ = zstd.SetEncoderConfig(oCfg.Arrow.Zstd)
+
+ userAgent += fmt.Sprintf(" ApacheArrow/%s (NumStreams/%d)", arrowPkg.PkgVersion, oCfg.Arrow.NumStreams)
+ }
+
 return &baseExporter{
- config: oCfg,
- settings: set,
+ config: oCfg,
+ settings: set,
+ userAgent: userAgent,
+ netReporter: netReporter,
+ streamClientFactory: streamClientFactory,
 }, nil
}

-// start configures and starts the gRPC client connection.
+// start actually creates the gRPC connection. The client construction is deferred till this point as this
+// is the only place we get hold of Extensions which are required to construct the auth round tripper.
 func (e *baseExporter) start(ctx context.Context, host component.Host) (err error) {
- // TODO: Implementation: the following is a placeholder used
- // to satisfy gRPC configuration-related configuration errors.
- if _, err = e.config.ClientConfig.ToClientConn(ctx, host, e.settings.TelemetrySettings); err != nil {
+ dialOpts := []grpc.DialOption{
+ grpc.WithUserAgent(e.userAgent),
+ }
+ if e.netReporter != nil {
+ dialOpts = append(dialOpts, grpc.WithStatsHandler(e.netReporter.Handler()))
+ }
+ dialOpts = append(dialOpts, e.config.UserDialOptions...)
+ if e.clientConn, err = e.config.ClientConfig.ToClientConn(ctx, host, e.settings.TelemetrySettings, dialOpts...); err != nil {
 return err
 }
+ e.traceExporter = ptraceotlp.NewGRPCClient(e.clientConn)
+ e.metricExporter = pmetricotlp.NewGRPCClient(e.clientConn)
+ e.logExporter = plogotlp.NewGRPCClient(e.clientConn)
+ headers := map[string]string{}
+ for k, v := range e.config.ClientConfig.Headers {
+ headers[k] = string(v)
+ }
+ e.metadata = metadata.New(headers)
+ e.callOptions = []grpc.CallOption{
+ grpc.WaitForReady(e.config.ClientConfig.WaitForReady),
+ }
+
+ if !e.config.Arrow.Disabled {
+ // Note this sets static outgoing context for all future stream requests.
+ ctx := e.enhanceContext(context.Background())
+
+ var perRPCCreds credentials.PerRPCCredentials
+ if e.config.ClientConfig.Auth != nil {
+ // Get the auth extension, we'll use it to enrich the request context.
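+ //
+ // (Illustrative note, not part of the upstream code: the
+ // resulting PerRPCCredentials are handed to arrow.NewExporter
+ // below, so each long-lived stream can attach per-RPC auth
+ // metadata itself, since stream RPCs outlive any single
+ // caller's context.)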
+ authClient, err := e.config.ClientConfig.Auth.GetClientAuthenticator(host.GetExtensions())
+ if err != nil {
+ return err
+ }
+
+ perRPCCreds, err = authClient.PerRPCCredentials()
+ if err != nil {
+ return err
+ }
+ }
+
+ arrowOpts := e.config.Arrow.toArrowProducerOptions()
+
+ arrowCallOpts := e.callOptions
+
+ if e.config.ClientConfig.Compression == configcompression.TypeZstd {
+ // ignore the error below b/c Validate() was called
+ _ = zstd.SetEncoderConfig(e.config.Arrow.Zstd)
+ // use the configured compressor.
+ arrowCallOpts = append(arrowCallOpts, e.config.Arrow.Zstd.CallOption())
+ }
+
+ e.arrow = arrow.NewExporter(e.config.Arrow.MaxStreamLifetime, e.config.Arrow.NumStreams, e.config.Arrow.Prioritizer, e.config.Arrow.DisableDowngrade, e.settings.TelemetrySettings, arrowCallOpts, func() arrowRecord.ProducerAPI {
+ return arrowRecord.NewProducerWithOptions(arrowOpts...)
+ }, e.streamClientFactory(e.clientConn), perRPCCreds, e.netReporter)
+
+ if err := e.arrow.Start(ctx); err != nil {
+ return err
+ }
+ }
+
 return nil
 }
 
-func (e *baseExporter) shutdown(_ context.Context) error {
- // TODO: Implementation.
+func (e *baseExporter) shutdown(ctx context.Context) error {
+ var err error
+ if e.arrow != nil {
+ err = multierr.Append(err, e.arrow.Shutdown(ctx))
+ }
+ if e.clientConn != nil {
+ err = multierr.Append(err, e.clientConn.Close())
+ }
+ return err
+}
+
+// arrowSendAndWait gets an available stream and tries to send using
+// Arrow if it is configured. A (false, nil) result indicates that the
+// caller should fall back to ordinary OTLP.
+//
+// Note that ctx has not had enhanceContext() called, meaning it
+// will have outgoing gRPC metadata only when an upstream processor or
+// receiver placed it there.
+func (e *baseExporter) arrowSendAndWait(ctx context.Context, data any) (sent bool, _ error) {
+ if e.arrow == nil {
+ return false, nil
+ }
+ sent, err := e.arrow.SendAndWait(ctx, data)
+ if err != nil {
+ return sent, processError(err)
+ }
+ return sent, nil
+}
+
+func (e *baseExporter) pushTraces(ctx context.Context, td ptrace.Traces) error {
+ if sent, err := e.arrowSendAndWait(ctx, td); err != nil {
+ return err
+ } else if sent {
+ return nil
+ }
+ req := ptraceotlp.NewExportRequestFromTraces(td)
+ resp, respErr := e.traceExporter.Export(e.enhanceContext(ctx), req, e.callOptions...)
+ if err := processError(respErr); err != nil {
+ return err
+ }
+ partialSuccess := resp.PartialSuccess()
+ if !(partialSuccess.ErrorMessage() == "" && partialSuccess.RejectedSpans() == 0) {
+ // TODO: These should be counted, similar to dropped items.
+ e.settings.Logger.Warn("partial success",
+ zap.String("message", resp.PartialSuccess().ErrorMessage()),
+ zap.Int64("num_rejected", resp.PartialSuccess().RejectedSpans()),
+ )
+ }
 return nil
 }
 
-func (e *baseExporter) pushTraces(_ context.Context, _ ptrace.Traces) error {
- // TODO: Implementation.
+func (e *baseExporter) pushMetrics(ctx context.Context, md pmetric.Metrics) error {
+ if sent, err := e.arrowSendAndWait(ctx, md); err != nil {
+ return err
+ } else if sent {
+ return nil
+ }
+ req := pmetricotlp.NewExportRequestFromMetrics(md)
+ resp, respErr := e.metricExporter.Export(e.enhanceContext(ctx), req, e.callOptions...)
+ if err := processError(respErr); err != nil {
+ return err
+ }
+ partialSuccess := resp.PartialSuccess()
+ if !(partialSuccess.ErrorMessage() == "" && partialSuccess.RejectedDataPoints() == 0) {
+ // TODO: These should be counted, similar to dropped items.
+ e.settings.Logger.Warn("partial success",
+ zap.String("message", resp.PartialSuccess().ErrorMessage()),
+ zap.Int64("num_rejected", resp.PartialSuccess().RejectedDataPoints()),
+ )
+ }
 return nil
 }
 
-func (e *baseExporter) pushMetrics(_ context.Context, _ pmetric.Metrics) error {
- // TODO: Implementation.
+func (e *baseExporter) pushLogs(ctx context.Context, ld plog.Logs) error {
+ if sent, err := e.arrowSendAndWait(ctx, ld); err != nil {
+ return err
+ } else if sent {
+ return nil
+ }
+ req := plogotlp.NewExportRequestFromLogs(ld)
+ resp, respErr := e.logExporter.Export(e.enhanceContext(ctx), req, e.callOptions...)
+ if err := processError(respErr); err != nil {
+ return err
+ }
+ partialSuccess := resp.PartialSuccess()
+ if !(partialSuccess.ErrorMessage() == "" && partialSuccess.RejectedLogRecords() == 0) {
+ // TODO: These should be counted, similar to dropped items.
+ e.settings.Logger.Warn("partial success",
+ zap.String("message", resp.PartialSuccess().ErrorMessage()),
+ zap.Int64("num_rejected", resp.PartialSuccess().RejectedLogRecords()),
+ )
+ }
 return nil
 }
 
-func (e *baseExporter) pushLogs(_ context.Context, _ plog.Logs) error {
- // TODO: Implementation.
+func (e *baseExporter) enhanceContext(ctx context.Context) context.Context {
+ if e.metadata.Len() > 0 {
+ ctx = metadata.NewOutgoingContext(ctx, e.metadata)
+ }
+ return ctx
+}
+
+func processError(err error) error {
+ if err == nil {
+ // Request is successful, we are done.
+ return nil
+ }
+
+ // We have an error, check gRPC status code.
+ st := status.Convert(err)
+ if st.Code() == codes.OK {
+ // Not really an error, still success.
+ return nil
+ }
+
+ // Now, this is a real error.
+
+ retryInfo := getRetryInfo(st)
+
+ if !shouldRetry(st.Code(), retryInfo) {
+ // It is not a retryable error, we should not retry.
+ return consumererror.NewPermanent(err)
+ }
+
+ // Check if server returned throttling information.
+ throttleDuration := getThrottleDuration(retryInfo)
+ if throttleDuration != 0 {
+ // We are throttled. Wait before retrying as requested by the server.
+ return exporterhelper.NewThrottleRetry(err, throttleDuration)
+ }
+
+ // Need to retry.
+
+ return err
+}
+
+func shouldRetry(code codes.Code, retryInfo *errdetails.RetryInfo) bool {
+ switch code {
+ case codes.Canceled,
+ codes.DeadlineExceeded,
+ codes.Aborted,
+ codes.OutOfRange,
+ codes.Unavailable,
+ codes.DataLoss:
+ // These are retryable errors.
+ return true
+ case codes.ResourceExhausted:
+ // Retry only if RetryInfo was supplied by the server.
+ // This indicates that the server can still recover from resource exhaustion.
+ return retryInfo != nil
+ }
+ // Don't retry on any other code.
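+ //
+ // (Illustrative note, not part of the upstream code: codes such as
+ // InvalidArgument or Unimplemented reach this point, and
+ // processError() then wraps them with consumererror.NewPermanent,
+ // preventing retries. A ResourceExhausted status without RetryInfo
+ // is likewise treated as permanent via the case above, while one
+ // carrying RetryInfo becomes a throttled retry through
+ // exporterhelper.NewThrottleRetry; see
+ // TestSendTracesOnResourceExhaustion in otelarrow_test.go.)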
+ return false +} + +func getRetryInfo(status *status.Status) *errdetails.RetryInfo { + for _, detail := range status.Details() { + if t, ok := detail.(*errdetails.RetryInfo); ok { + return t + } + } return nil } + +func getThrottleDuration(t *errdetails.RetryInfo) time.Duration { + if t == nil || t.RetryDelay == nil { + return 0 + } + if t.RetryDelay.Seconds > 0 || t.RetryDelay.Nanos > 0 { + return time.Duration(t.RetryDelay.Seconds)*time.Second + time.Duration(t.RetryDelay.Nanos)*time.Nanosecond + } + return 0 +} diff --git a/exporter/otelarrowexporter/otelarrow_test.go b/exporter/otelarrowexporter/otelarrow_test.go new file mode 100644 index 000000000000..3d84fd6ca618 --- /dev/null +++ b/exporter/otelarrowexporter/otelarrow_test.go @@ -0,0 +1,1189 @@ +// Copyright The OpenTelemetry Authors +// SPDX-License-Identifier: Apache-2.0 + +package otelarrowexporter + +import ( + "context" + "fmt" + "io" + "net" + "net/http" + "path/filepath" + "runtime" + "sync" + "sync/atomic" + "testing" + "time" + + arrowpb "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1" + arrowpbMock "github.com/open-telemetry/otel-arrow/api/experimental/arrow/v1/mock" + "github.com/open-telemetry/otel-arrow/collector/testdata" + arrowRecord "github.com/open-telemetry/otel-arrow/pkg/otel/arrow_record" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "go.opentelemetry.io/collector/client" + "go.opentelemetry.io/collector/component" + "go.opentelemetry.io/collector/component/componenttest" + "go.opentelemetry.io/collector/config/configauth" + "go.opentelemetry.io/collector/config/configgrpc" + "go.opentelemetry.io/collector/config/configopaque" + "go.opentelemetry.io/collector/config/configtls" + "go.opentelemetry.io/collector/exporter" + "go.opentelemetry.io/collector/exporter/exportertest" + "go.opentelemetry.io/collector/extension" + "go.opentelemetry.io/collector/extension/auth" + "go.opentelemetry.io/collector/pdata/plog" + "go.opentelemetry.io/collector/pdata/plog/plogotlp" + "go.opentelemetry.io/collector/pdata/pmetric" + "go.opentelemetry.io/collector/pdata/pmetric/pmetricotlp" + "go.opentelemetry.io/collector/pdata/ptrace" + "go.opentelemetry.io/collector/pdata/ptrace/ptraceotlp" + "go.uber.org/mock/gomock" + "go.uber.org/zap/zaptest" + "golang.org/x/net/http2/hpack" + "google.golang.org/genproto/googleapis/rpc/errdetails" + "google.golang.org/grpc" + "google.golang.org/grpc/codes" + "google.golang.org/grpc/credentials" + "google.golang.org/grpc/metadata" + "google.golang.org/grpc/status" + "google.golang.org/protobuf/types/known/durationpb" + + "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/otelarrowexporter/internal/arrow/grpcmock" +) + +type mockReceiver struct { + srv *grpc.Server + ln net.Listener + requestCount *atomic.Int32 + totalItems *atomic.Int32 + mux sync.Mutex + metadata metadata.MD + exportError error +} + +func (r *mockReceiver) getMetadata() metadata.MD { + r.mux.Lock() + defer r.mux.Unlock() + return r.metadata +} + +func (r *mockReceiver) setExportError(err error) { + r.mux.Lock() + defer r.mux.Unlock() + r.exportError = err +} + +type mockTracesReceiver struct { + ptraceotlp.UnimplementedGRPCServer + mockReceiver + exportResponse func() ptraceotlp.ExportResponse + lastRequest ptrace.Traces +} + +func (r *mockTracesReceiver) Export(ctx context.Context, req ptraceotlp.ExportRequest) (ptraceotlp.ExportResponse, error) { + r.requestCount.Add(int32(1)) + td := req.Traces() + r.totalItems.Add(int32(td.SpanCount())) + r.mux.Lock() + 
defer r.mux.Unlock() + r.lastRequest = td + r.metadata, _ = metadata.FromIncomingContext(ctx) + return r.exportResponse(), r.exportError +} + +func (r *mockTracesReceiver) getLastRequest() ptrace.Traces { + r.mux.Lock() + defer r.mux.Unlock() + return r.lastRequest +} + +func (r *mockTracesReceiver) setExportResponse(fn func() ptraceotlp.ExportResponse) { + r.mux.Lock() + defer r.mux.Unlock() + r.exportResponse = fn +} + +func otelArrowTracesReceiverOnGRPCServer(ln net.Listener, useTLS bool) (*mockTracesReceiver, error) { + sopts := []grpc.ServerOption{} + + if useTLS { + _, currentFile, _, _ := runtime.Caller(0) + basepath := filepath.Dir(currentFile) + certpath := filepath.Join(basepath, filepath.Join("testdata", "test_cert.pem")) + keypath := filepath.Join(basepath, filepath.Join("testdata", "test_key.pem")) + + creds, err := credentials.NewServerTLSFromFile(certpath, keypath) + if err != nil { + return nil, err + } + sopts = append(sopts, grpc.Creds(creds)) + } + + rcv := &mockTracesReceiver{ + mockReceiver: mockReceiver{ + srv: grpc.NewServer(sopts...), + ln: ln, + requestCount: &atomic.Int32{}, + totalItems: &atomic.Int32{}, + }, + exportResponse: ptraceotlp.NewExportResponse, + } + + ptraceotlp.RegisterGRPCServer(rcv.srv, rcv) + + return rcv, nil +} + +func (r *mockTracesReceiver) start() { + go func() { + _ = r.srv.Serve(r.ln) + }() +} + +type mockLogsReceiver struct { + plogotlp.UnimplementedGRPCServer + mockReceiver + exportResponse func() plogotlp.ExportResponse + lastRequest plog.Logs +} + +func (r *mockLogsReceiver) Export(ctx context.Context, req plogotlp.ExportRequest) (plogotlp.ExportResponse, error) { + r.requestCount.Add(int32(1)) + ld := req.Logs() + r.totalItems.Add(int32(ld.LogRecordCount())) + r.mux.Lock() + defer r.mux.Unlock() + r.lastRequest = ld + r.metadata, _ = metadata.FromIncomingContext(ctx) + return r.exportResponse(), r.exportError +} + +func (r *mockLogsReceiver) getLastRequest() plog.Logs { + r.mux.Lock() + defer r.mux.Unlock() + return r.lastRequest +} + +func (r *mockLogsReceiver) setExportResponse(fn func() plogotlp.ExportResponse) { + r.mux.Lock() + defer r.mux.Unlock() + r.exportResponse = fn +} + +func otelArrowLogsReceiverOnGRPCServer(ln net.Listener) *mockLogsReceiver { + rcv := &mockLogsReceiver{ + mockReceiver: mockReceiver{ + srv: grpc.NewServer(), + requestCount: &atomic.Int32{}, + totalItems: &atomic.Int32{}, + }, + exportResponse: plogotlp.NewExportResponse, + } + + // Now run it as a gRPC server + plogotlp.RegisterGRPCServer(rcv.srv, rcv) + go func() { + _ = rcv.srv.Serve(ln) + }() + + return rcv +} + +type mockMetricsReceiver struct { + pmetricotlp.UnimplementedGRPCServer + mockReceiver + exportResponse func() pmetricotlp.ExportResponse + lastRequest pmetric.Metrics +} + +func (r *mockMetricsReceiver) Export(ctx context.Context, req pmetricotlp.ExportRequest) (pmetricotlp.ExportResponse, error) { + md := req.Metrics() + r.requestCount.Add(int32(1)) + r.totalItems.Add(int32(md.DataPointCount())) + r.mux.Lock() + defer r.mux.Unlock() + r.lastRequest = md + r.metadata, _ = metadata.FromIncomingContext(ctx) + return r.exportResponse(), r.exportError +} + +func (r *mockMetricsReceiver) getLastRequest() pmetric.Metrics { + r.mux.Lock() + defer r.mux.Unlock() + return r.lastRequest +} + +func (r *mockMetricsReceiver) setExportResponse(fn func() pmetricotlp.ExportResponse) { + r.mux.Lock() + defer r.mux.Unlock() + r.exportResponse = fn +} + +func otelArrowMetricsReceiverOnGRPCServer(ln net.Listener) *mockMetricsReceiver { + rcv := 
&mockMetricsReceiver{
+ mockReceiver: mockReceiver{
+ srv: grpc.NewServer(),
+ requestCount: &atomic.Int32{},
+ totalItems: &atomic.Int32{},
+ },
+ exportResponse: pmetricotlp.NewExportResponse,
+ }
+
+ // Now run it as a gRPC server
+ pmetricotlp.RegisterGRPCServer(rcv.srv, rcv)
+ go func() {
+ _ = rcv.srv.Serve(ln)
+ }()
+
+ return rcv
+}
+
+type hostWithExtensions struct {
+ component.Host
+ exts map[component.ID]component.Component
+}
+
+func newHostWithExtensions(exts map[component.ID]component.Component) component.Host {
+ return &hostWithExtensions{
+ Host: componenttest.NewNopHost(),
+ exts: exts,
+ }
+}
+
+func (h *hostWithExtensions) GetExtensions() map[component.ID]component.Component {
+ return h.exts
+}
+
+type testAuthExtension struct {
+ extension.Extension
+
+ prc credentials.PerRPCCredentials
+}
+
+func newTestAuthExtension(t *testing.T, mdf func(ctx context.Context) map[string]string) auth.Client {
+ ctrl := gomock.NewController(t)
+ prc := grpcmock.NewMockPerRPCCredentials(ctrl)
+ prc.EXPECT().RequireTransportSecurity().AnyTimes().Return(false)
+ prc.EXPECT().GetRequestMetadata(gomock.Any(), gomock.Any()).AnyTimes().DoAndReturn(
+ func(ctx context.Context, _ ...string) (map[string]string, error) {
+ return mdf(ctx), nil
+ },
+ )
+ return &testAuthExtension{
+ prc: prc,
+ }
+}
+
+func (a *testAuthExtension) RoundTripper(_ http.RoundTripper) (http.RoundTripper, error) {
+ return nil, fmt.Errorf("unused")
+}
+
+func (a *testAuthExtension) PerRPCCredentials() (credentials.PerRPCCredentials, error) {
+ return a.prc, nil
+}
+
+func TestSendTraces(t *testing.T) {
+ // Start an OTel-Arrow receiver.
+ ln, err := net.Listen("tcp", "localhost:")
+ require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err)
+ rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false)
+ rcv.start()
+ // Also closes the connection.
+ defer rcv.srv.GracefulStop()
+
+ // Start an OTLP exporter and point to the receiver.
+ factory := NewFactory()
+ authID := component.NewID(component.MustNewType("testauth"))
+ expectedHeader := []string{"header-value"}
+
+ cfg := factory.CreateDefaultConfig().(*Config)
+ // Disable queuing to ensure that we execute the request when calling ConsumeTraces
+ // otherwise we will not see any errors.
+ cfg.QueueSettings.Enabled = false
+ cfg.ClientConfig = configgrpc.ClientConfig{
+ Endpoint: ln.Addr().String(),
+ TLSSetting: configtls.ClientConfig{
+ Insecure: true,
+ },
+ Headers: map[string]configopaque.String{
+ "header": configopaque.String(expectedHeader[0]),
+ },
+ Auth: &configauth.Authentication{
+ AuthenticatorID: authID,
+ },
+ }
+ // This test fails w/ Arrow enabled because the function
+ // passed to newTestAuthExtension() below requires the
+ // caller's context, and the Arrow stream doesn't have it.
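+ // (Illustrative note, not part of the upstream code: with Arrow
+ // enabled, per-RPC credentials run against the long-lived stream's
+ // background context, so client.FromContext() would find no caller
+ // metadata there.)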
+ cfg.Arrow.Disabled = true + + set := exportertest.NewNopCreateSettings() + set.BuildInfo.Description = "Collector" + set.BuildInfo.Version = "1.2.3test" + exp, err := factory.CreateTracesExporter(context.Background(), set, cfg) + require.NoError(t, err) + require.NotNil(t, exp) + + defer func() { + assert.NoError(t, exp.Shutdown(context.Background())) + }() + + host := newHostWithExtensions( + map[component.ID]component.Component{ + authID: newTestAuthExtension(t, func(ctx context.Context) map[string]string { + return map[string]string{ + "callerid": client.FromContext(ctx).Metadata.Get("in_callerid")[0], + } + }), + }, + ) + assert.NoError(t, exp.Start(context.Background(), host)) + + // Ensure that initially there is no data in the receiver. + assert.EqualValues(t, 0, rcv.requestCount.Load()) + + newCallerContext := func(value string) context.Context { + return client.NewContext(context.Background(), + client.Info{ + Metadata: client.NewMetadata(map[string][]string{ + "in_callerid": {value}, + }), + }, + ) + } + const caller1 = "caller1" + const caller2 = "caller2" + callCtx1 := newCallerContext(caller1) + callCtx2 := newCallerContext(caller2) + + // Send empty trace. + td := ptrace.NewTraces() + assert.NoError(t, exp.ConsumeTraces(callCtx1, td)) + + // Wait until it is received. + assert.Eventually(t, func() bool { + return rcv.requestCount.Load() > 0 + }, 10*time.Second, 5*time.Millisecond) + + // Ensure it was received empty. + assert.EqualValues(t, 0, rcv.totalItems.Load()) + md := rcv.getMetadata() + + // Expect caller1 and the static header + require.EqualValues(t, expectedHeader, md.Get("header")) + require.EqualValues(t, []string{caller1}, md.Get("callerid")) + + // A trace with 2 spans. + td = testdata.GenerateTraces(2) + + err = exp.ConsumeTraces(callCtx2, td) + assert.NoError(t, err) + + // Wait until it is received. + assert.Eventually(t, func() bool { + return rcv.requestCount.Load() > 1 + }, 10*time.Second, 5*time.Millisecond) + + // Verify received span. + assert.EqualValues(t, 2, rcv.totalItems.Load()) + assert.EqualValues(t, 2, rcv.requestCount.Load()) + assert.EqualValues(t, td, rcv.getLastRequest()) + + // Test the static metadata + md = rcv.getMetadata() + require.EqualValues(t, expectedHeader, md.Get("header")) + require.Equal(t, len(md.Get("User-Agent")), 1) + require.Contains(t, md.Get("User-Agent")[0], "Collector/1.2.3test") + + // Test the caller's dynamic metadata + require.EqualValues(t, []string{caller2}, md.Get("callerid")) + + // Return partial success + rcv.setExportResponse(func() ptraceotlp.ExportResponse { + response := ptraceotlp.NewExportResponse() + partialSuccess := response.PartialSuccess() + partialSuccess.SetErrorMessage("Some spans were not ingested") + partialSuccess.SetRejectedSpans(1) + + return response + }) + + // A request with 2 Trace entries. + td = testdata.GenerateTraces(2) + + // PartialSuccess is not an error. 
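+ // (Illustrative note, not part of the upstream code: pushTraces
+ // logs a warning for the rejected span but still returns nil, so
+ // ConsumeTraces succeeds below.)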
+ err = exp.ConsumeTraces(callCtx1, td)
+ assert.NoError(t, err)
+}
+
+func TestSendTracesWhenEndpointHasHttpScheme(t *testing.T) {
+ tests := []struct {
+ name string
+ useTLS bool
+ scheme string
+ gRPCClientSettings configgrpc.ClientConfig
+ }{
+ {
+ name: "Use https scheme",
+ useTLS: true,
+ scheme: "https://",
+ gRPCClientSettings: configgrpc.ClientConfig{},
+ },
+ {
+ name: "Use http scheme",
+ useTLS: false,
+ scheme: "http://",
+ gRPCClientSettings: configgrpc.ClientConfig{
+ TLSSetting: configtls.ClientConfig{
+ Insecure: true,
+ },
+ },
+ },
+ }
+
+ for _, test := range tests {
+ t.Run(test.name, func(t *testing.T) {
+ // Start an OTel-Arrow receiver.
+ ln, err := net.Listen("tcp", "localhost:")
+ require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err)
+ rcv, err := otelArrowTracesReceiverOnGRPCServer(ln, test.useTLS)
+ require.NoError(t, err, "Failed to start mock OTLP receiver")
+ rcv.start()
+ // Also closes the connection.
+ defer rcv.srv.GracefulStop()
+
+ // Start an OTLP exporter and point to the receiver.
+ factory := NewFactory()
+ cfg := factory.CreateDefaultConfig().(*Config)
+ cfg.ClientConfig = test.gRPCClientSettings
+ cfg.ClientConfig.Endpoint = test.scheme + ln.Addr().String()
+ cfg.Arrow.MaxStreamLifetime = 100 * time.Second
+ if test.useTLS {
+ cfg.ClientConfig.TLSSetting.InsecureSkipVerify = true
+ }
+ set := exportertest.NewNopCreateSettings()
+ exp, err := factory.CreateTracesExporter(context.Background(), set, cfg)
+ require.NoError(t, err)
+ require.NotNil(t, exp)
+
+ defer func() {
+ assert.NoError(t, exp.Shutdown(context.Background()))
+ }()
+
+ host := componenttest.NewNopHost()
+ assert.NoError(t, exp.Start(context.Background(), host))
+
+ // Ensure that initially there is no data in the receiver.
+ assert.EqualValues(t, 0, rcv.requestCount.Load())
+
+ // Send empty trace.
+ td := ptrace.NewTraces()
+ assert.NoError(t, exp.ConsumeTraces(context.Background(), td))
+
+ // Wait until it is received.
+ assert.Eventually(t, func() bool {
+ return rcv.requestCount.Load() > 0
+ }, 10*time.Second, 5*time.Millisecond)
+
+ // Ensure it was received empty.
+ assert.EqualValues(t, 0, rcv.totalItems.Load())
+ })
+ }
+}
+
+func TestSendMetrics(t *testing.T) {
+ // Start an OTel-Arrow receiver.
+ ln, err := net.Listen("tcp", "localhost:")
+ require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err)
+ rcv := otelArrowMetricsReceiverOnGRPCServer(ln)
+ // Also closes the connection.
+ defer rcv.srv.GracefulStop()
+
+ // Start an OTLP exporter and point to the receiver.
+ factory := NewFactory()
+ cfg := factory.CreateDefaultConfig().(*Config)
+ // Disable queuing to ensure that we execute the request when calling ConsumeMetrics
+ // otherwise we will not see any errors.
+ cfg.QueueSettings.Enabled = false
+ cfg.RetryConfig.Enabled = false
+ cfg.ClientConfig = configgrpc.ClientConfig{
+ Endpoint: ln.Addr().String(),
+ TLSSetting: configtls.ClientConfig{
+ Insecure: true,
+ },
+ Headers: map[string]configopaque.String{
+ "header": "header-value",
+ },
+ }
+ cfg.Arrow.MaxStreamLifetime = 100 * time.Second
+ set := exportertest.NewNopCreateSettings()
+ set.BuildInfo.Description = "Collector"
+ set.BuildInfo.Version = "1.2.3test"
+ exp, err := factory.CreateMetricsExporter(context.Background(), set, cfg)
+ require.NoError(t, err)
+ require.NotNil(t, exp)
+ defer func() {
+ assert.NoError(t, exp.Shutdown(context.Background()))
+ }()
+
+ host := componenttest.NewNopHost()
+
+ assert.NoError(t, exp.Start(context.Background(), host))
+
+ // Ensure that initially there is no data in the receiver.
+ assert.EqualValues(t, 0, rcv.requestCount.Load())
+
+ // Send empty metric.
+ md := pmetric.NewMetrics()
+ assert.NoError(t, exp.ConsumeMetrics(context.Background(), md))
+
+ // Wait until it is received.
+ assert.Eventually(t, func() bool {
+ return rcv.requestCount.Load() > 0
+ }, 10*time.Second, 5*time.Millisecond)
+
+ // Ensure it was received empty.
+ assert.EqualValues(t, 0, rcv.totalItems.Load())
+
+ // Send two metrics.
+ md = testdata.GenerateMetrics(2)
+
+ err = exp.ConsumeMetrics(context.Background(), md)
+ assert.NoError(t, err)
+
+ // Wait until it is received.
+ assert.Eventually(t, func() bool {
+ return rcv.requestCount.Load() > 1
+ }, 10*time.Second, 5*time.Millisecond)
+
+ expectedHeader := []string{"header-value"}
+
+ // Verify received metrics.
+ assert.EqualValues(t, uint32(2), rcv.requestCount.Load())
+ assert.EqualValues(t, uint32(4), rcv.totalItems.Load())
+ assert.EqualValues(t, md, rcv.getLastRequest())
+
+ mdata := rcv.getMetadata()
+ require.EqualValues(t, mdata.Get("header"), expectedHeader)
+ require.Equal(t, len(mdata.Get("User-Agent")), 1)
+ require.Contains(t, mdata.Get("User-Agent")[0], "Collector/1.2.3test")
+
+ st := status.New(codes.InvalidArgument, "Invalid argument")
+ rcv.setExportError(st.Err())
+
+ // Send two metrics.
+ md = testdata.GenerateMetrics(2)
+
+ err = exp.ConsumeMetrics(context.Background(), md)
+ assert.Error(t, err)
+
+ rcv.setExportError(nil)
+
+ // Return partial success
+ rcv.setExportResponse(func() pmetricotlp.ExportResponse {
+ response := pmetricotlp.NewExportResponse()
+ partialSuccess := response.PartialSuccess()
+ partialSuccess.SetErrorMessage("Some data points were not ingested")
+ partialSuccess.SetRejectedDataPoints(1)
+
+ return response
+ })
+
+ // Send two metrics.
+ md = testdata.GenerateMetrics(2)
+ assert.NoError(t, exp.ConsumeMetrics(context.Background(), md))
+}
+
+func TestSendTraceDataServerDownAndUp(t *testing.T) {
+ // Find the addr, but don't start the server.
+ ln, err := net.Listen("tcp", "localhost:")
+ require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err)
+
+ // Start an OTel-Arrow exporter and point to the receiver.
+ factory := NewFactory()
+ cfg := factory.CreateDefaultConfig().(*Config)
+ // Disable queuing to ensure that we execute the request when calling ConsumeTraces
+ // otherwise we will not see the error.
+ cfg.QueueSettings.Enabled = false
+ cfg.ClientConfig = configgrpc.ClientConfig{
+ Endpoint: ln.Addr().String(),
+ TLSSetting: configtls.ClientConfig{
+ Insecure: true,
+ },
+ // Need to wait for every request, blocking until it either times out or succeeds.
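+ // (Illustrative note, not part of the upstream code: WaitForReady
+ // makes gRPC queue RPCs while the connection is unavailable rather
+ // than failing fast, so the ConsumeTraces calls below surface
+ // context.DeadlineExceeded instead of an immediate connection error.)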
+ // Do not rely on external retry logic here, if that is intended set InitialInterval to 100ms. + WaitForReady: true, + } + cfg.Arrow.MaxStreamLifetime = 100 * time.Second + set := exportertest.NewNopCreateSettings() + exp, err := factory.CreateTracesExporter(context.Background(), set, cfg) + require.NoError(t, err) + require.NotNil(t, exp) + defer func() { + assert.NoError(t, exp.Shutdown(context.Background())) + }() + + host := componenttest.NewNopHost() + + assert.NoError(t, exp.Start(context.Background(), host)) + + // A trace with 2 spans. + td := testdata.GenerateTraces(2) + ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second) + assert.Error(t, exp.ConsumeTraces(ctx, td)) + assert.EqualValues(t, context.DeadlineExceeded, ctx.Err()) + cancel() + + ctx, cancel = context.WithTimeout(context.Background(), 1*time.Second) + assert.Error(t, exp.ConsumeTraces(ctx, td)) + assert.EqualValues(t, context.DeadlineExceeded, ctx.Err()) + cancel() + + startServerAndMakeRequest(t, exp, td, ln) + + ctx, cancel = context.WithTimeout(context.Background(), 1*time.Second) + assert.Error(t, exp.ConsumeTraces(ctx, td)) + assert.EqualValues(t, context.DeadlineExceeded, ctx.Err()) + cancel() + + // First call to startServerAndMakeRequest closed the connection. There is a race condition here that the + // port may be reused, if this gets flaky rethink what to do. + ln, err = net.Listen("tcp", ln.Addr().String()) + require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err) + startServerAndMakeRequest(t, exp, td, ln) + + ctx, cancel = context.WithTimeout(context.Background(), 1*time.Second) + assert.Error(t, exp.ConsumeTraces(ctx, td)) + assert.EqualValues(t, context.DeadlineExceeded, ctx.Err()) + cancel() +} + +func TestSendTraceDataServerStartWhileRequest(t *testing.T) { + // Find the addr, but don't start the server. + ln, err := net.Listen("tcp", "localhost:") + require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err) + + // Start an OTel-Arrow exporter and point to the receiver. + factory := NewFactory() + cfg := factory.CreateDefaultConfig().(*Config) + cfg.ClientConfig = configgrpc.ClientConfig{ + Endpoint: ln.Addr().String(), + TLSSetting: configtls.ClientConfig{ + Insecure: true, + }, + } + cfg.Arrow.MaxStreamLifetime = 100 * time.Second + set := exportertest.NewNopCreateSettings() + exp, err := factory.CreateTracesExporter(context.Background(), set, cfg) + require.NoError(t, err) + require.NotNil(t, exp) + defer func() { + assert.NoError(t, exp.Shutdown(context.Background())) + }() + + host := componenttest.NewNopHost() + + assert.NoError(t, exp.Start(context.Background(), host)) + + // A trace with 2 spans. + td := testdata.GenerateTraces(2) + done := make(chan bool, 1) + defer close(done) + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + go func() { + assert.NoError(t, exp.ConsumeTraces(ctx, td)) + done <- true + }() + + time.Sleep(2 * time.Second) + rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false) + rcv.start() + defer rcv.srv.GracefulStop() + // Wait until one of the conditions below triggers. 
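+ // (Illustrative note, not part of the upstream code: if ctx.Done()
+ // fires first, the 10s deadline elapsed before the late-starting
+ // server answered and the test fails; otherwise the in-flight
+ // ConsumeTraces completed successfully.)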
+ select { + case <-ctx.Done(): + t.Fail() + case <-done: + assert.NoError(t, ctx.Err()) + } + cancel() +} + +func TestSendTracesOnResourceExhaustion(t *testing.T) { + ln, err := net.Listen("tcp", "localhost:") + require.NoError(t, err) + rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false) + rcv.setExportError(status.Error(codes.ResourceExhausted, "resource exhausted")) + rcv.start() + defer rcv.srv.GracefulStop() + + factory := NewFactory() + cfg := factory.CreateDefaultConfig().(*Config) + cfg.RetryConfig.InitialInterval = 0 + cfg.ClientConfig = configgrpc.ClientConfig{ + Endpoint: ln.Addr().String(), + TLSSetting: configtls.ClientConfig{ + Insecure: true, + }, + } + cfg.Arrow.MaxStreamLifetime = 100 * time.Second + set := exportertest.NewNopCreateSettings() + exp, err := factory.CreateTracesExporter(context.Background(), set, cfg) + require.NoError(t, err) + require.NotNil(t, exp) + + defer func() { + assert.NoError(t, exp.Shutdown(context.Background())) + }() + + host := componenttest.NewNopHost() + assert.NoError(t, exp.Start(context.Background(), host)) + + assert.EqualValues(t, 0, rcv.requestCount.Load()) + + td := ptrace.NewTraces() + assert.NoError(t, exp.ConsumeTraces(context.Background(), td)) + + assert.Never(t, func() bool { + return rcv.requestCount.Load() > 1 + }, 1*time.Second, 5*time.Millisecond, "Should not retry if RetryInfo is not included into status details by the server.") + + rcv.requestCount.Swap(0) + + st := status.New(codes.ResourceExhausted, "resource exhausted") + st, _ = st.WithDetails(&errdetails.RetryInfo{ + RetryDelay: durationpb.New(100 * time.Millisecond), + }) + rcv.setExportError(st.Err()) + + assert.NoError(t, exp.ConsumeTraces(context.Background(), td)) + + assert.Eventually(t, func() bool { + return rcv.requestCount.Load() > 1 + }, 10*time.Second, 5*time.Millisecond, "Should retry if RetryInfo is included into status details by the server.") +} + +func startServerAndMakeRequest(t *testing.T, exp exporter.Traces, td ptrace.Traces, ln net.Listener) { + rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false) + rcv.start() + defer rcv.srv.GracefulStop() + // Ensure that initially there is no data in the receiver. + assert.EqualValues(t, 0, rcv.requestCount.Load()) + + // Clone the request and store as expected. + expectedData := ptrace.NewTraces() + td.CopyTo(expectedData) + + // Resend the request, this should succeed. + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + assert.NoError(t, exp.ConsumeTraces(ctx, td)) + cancel() + + // Wait until it is received. + assert.Eventually(t, func() bool { + return rcv.requestCount.Load() > 0 + }, 10*time.Second, 5*time.Millisecond) + + // Verify received span. + assert.EqualValues(t, 2, rcv.totalItems.Load()) + assert.EqualValues(t, expectedData, rcv.getLastRequest()) +} + +func TestSendLogData(t *testing.T) { + // Start an OTel-Arrow receiver. + ln, err := net.Listen("tcp", "localhost:") + require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err) + rcv := otelArrowLogsReceiverOnGRPCServer(ln) + // Also closes the connection. + defer rcv.srv.GracefulStop() + + // Start an OTel-Arrow exporter and point to the receiver. + factory := NewFactory() + cfg := factory.CreateDefaultConfig().(*Config) + // Disable queuing to ensure that we execute the request when calling ConsumeLogs + // otherwise we will not see any errors. 
+ cfg.QueueSettings.Enabled = false + cfg.ClientConfig = configgrpc.ClientConfig{ + Endpoint: ln.Addr().String(), + TLSSetting: configtls.ClientConfig{ + Insecure: true, + }, + } + cfg.Arrow.MaxStreamLifetime = 100 * time.Second + set := exportertest.NewNopCreateSettings() + set.BuildInfo.Description = "Collector" + set.BuildInfo.Version = "1.2.3test" + exp, err := factory.CreateLogsExporter(context.Background(), set, cfg) + require.NoError(t, err) + require.NotNil(t, exp) + defer func() { + assert.NoError(t, exp.Shutdown(context.Background())) + }() + + host := componenttest.NewNopHost() + + assert.NoError(t, exp.Start(context.Background(), host)) + + // Ensure that initially there is no data in the receiver. + assert.EqualValues(t, 0, rcv.requestCount.Load()) + + // Send empty request. + ld := plog.NewLogs() + assert.NoError(t, exp.ConsumeLogs(context.Background(), ld)) + + // Wait until it is received. + assert.Eventually(t, func() bool { + return rcv.requestCount.Load() > 0 + }, 10*time.Second, 5*time.Millisecond) + + // Ensure it was received empty. + assert.EqualValues(t, 0, rcv.totalItems.Load()) + + // A request with 2 log entries. + ld = testdata.GenerateLogs(2) + + err = exp.ConsumeLogs(context.Background(), ld) + assert.NoError(t, err) + + // Wait until it is received. + assert.Eventually(t, func() bool { + return rcv.requestCount.Load() > 1 + }, 10*time.Second, 5*time.Millisecond) + + // Verify received logs. + assert.EqualValues(t, 2, rcv.requestCount.Load()) + assert.EqualValues(t, 2, rcv.totalItems.Load()) + assert.EqualValues(t, ld, rcv.getLastRequest()) + + md := rcv.getMetadata() + require.Equal(t, len(md.Get("User-Agent")), 1) + require.Contains(t, md.Get("User-Agent")[0], "Collector/1.2.3test") + + st := status.New(codes.InvalidArgument, "Invalid argument") + rcv.setExportError(st.Err()) + + // A request with 2 log entries. + ld = testdata.GenerateLogs(2) + + err = exp.ConsumeLogs(context.Background(), ld) + assert.Error(t, err) + + rcv.setExportError(nil) + + // Return partial success + rcv.setExportResponse(func() plogotlp.ExportResponse { + response := plogotlp.NewExportResponse() + partialSuccess := response.PartialSuccess() + partialSuccess.SetErrorMessage("Some log records were not ingested") + partialSuccess.SetRejectedLogRecords(1) + + return response + }) + + // A request with 2 log entries. + ld = testdata.GenerateLogs(2) + + err = exp.ConsumeLogs(context.Background(), ld) + assert.NoError(t, err) +} + +// TestSendArrowTracesNotSupported tests a successful OTel-Arrow export w/ +// and without Arrow, w/ WaitForReady and without. +func TestSendArrowTracesNotSupported(t *testing.T) { + for _, waitForReady := range []bool{true, false} { + for _, available := range []bool{true, false} { + t.Run(fmt.Sprintf("waitForReady=%v available=%v", waitForReady, available), + func(t *testing.T) { testSendArrowTraces(t, waitForReady, available) }) + } + } +} + +func testSendArrowTraces(t *testing.T, clientWaitForReady, streamServiceAvailable bool) { + // Start an OTel-Arrow receiver. + ln, err := net.Listen("tcp", "127.0.0.1:") + require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err) + + // Start an OTel-Arrow exporter and point to the receiver. 
+	factory := NewFactory()
+	authID := component.NewID(component.MustNewType("testauth"))
+	expectedHeader := []string{"arrow-ftw"}
+	cfg := factory.CreateDefaultConfig().(*Config)
+	cfg.ClientConfig = configgrpc.ClientConfig{
+		Endpoint: ln.Addr().String(),
+		TLSSetting: configtls.ClientConfig{
+			Insecure: true,
+		},
+		WaitForReady: clientWaitForReady,
+		Headers: map[string]configopaque.String{
+			"header": configopaque.String(expectedHeader[0]),
+		},
+		Auth: &configauth.Authentication{
+			AuthenticatorID: authID,
+		},
+	}
+	// The Arrow client is enabled; the mock server may or may not support
+	// it, depending on streamServiceAvailable.
+	cfg.Arrow = ArrowConfig{
+		NumStreams:        1,
+		MaxStreamLifetime: 100 * time.Second,
+	}
+
+	set := exportertest.NewNopCreateSettings()
+	set.TelemetrySettings.Logger = zaptest.NewLogger(t)
+	exp, err := factory.CreateTracesExporter(context.Background(), set, cfg)
+	require.NoError(t, err)
+	require.NotNil(t, exp)
+
+	defer func() {
+		assert.NoError(t, exp.Shutdown(context.Background()))
+	}()
+
+	type isUserCall struct{}
+
+	host := newHostWithExtensions(
+		map[component.ID]component.Component{
+			authID: newTestAuthExtension(t, func(ctx context.Context) map[string]string {
+				if ctx.Value(isUserCall{}) == nil {
+					return nil
+				}
+				return map[string]string{
+					"callerid": "arrow",
+				}
+			}),
+		},
+	)
+	assert.NoError(t, exp.Start(context.Background(), host))
+
+	rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false)
+	if streamServiceAvailable {
+		rcv.startStreamMockArrowTraces(t, okStatusFor)
+	}
+
+	// Delay the server start slightly.
+	go func() {
+		time.Sleep(100 * time.Millisecond)
+		rcv.start()
+	}()
+
+	// Send two trace items.
+	td := testdata.GenerateTraces(2)
+
+	// Set the context key indicating this is per-request state,
+	// so the auth extension returns data.
+	err = exp.ConsumeTraces(context.WithValue(context.Background(), isUserCall{}, true), td)
+	assert.NoError(t, err)
+
+	// Wait until it is received.
+	assert.Eventually(t, func() bool {
+		return rcv.requestCount.Load() > 0
+	}, 10*time.Second, 5*time.Millisecond)
+
+	// Verify two items, one request received.
+	assert.EqualValues(t, int32(2), rcv.totalItems.Load())
+	assert.EqualValues(t, int32(1), rcv.requestCount.Load())
+	assert.EqualValues(t, td, rcv.getLastRequest())
+
+	// Expect the correct metadata, with or without Arrow.
+	md := rcv.getMetadata()
+	require.EqualValues(t, []string{"arrow"}, md.Get("callerid"))
+	require.EqualValues(t, expectedHeader, md.Get("header"))
+
+	rcv.srv.GracefulStop()
+}
+
+func okStatusFor(id int64) *arrowpb.BatchStatus {
+	return &arrowpb.BatchStatus{
+		BatchId:    id,
+		StatusCode: arrowpb.StatusCode_OK,
+	}
+}
+
+func failedStatusFor(id int64) *arrowpb.BatchStatus {
+	return &arrowpb.BatchStatus{
+		BatchId:       id,
+		StatusCode:    arrowpb.StatusCode_INVALID_ARGUMENT,
+		StatusMessage: "test failed",
+	}
+}
+
+type anyStreamServer interface {
+	Send(*arrowpb.BatchStatus) error
+	Recv() (*arrowpb.BatchArrowRecords, error)
+	grpc.ServerStream
+}
+
+func (r *mockTracesReceiver) startStreamMockArrowTraces(t *testing.T, statusFor func(int64) *arrowpb.BatchStatus) {
+	ctrl := gomock.NewController(t)
+
+	doer := func(server anyStreamServer) error {
+		consumer := arrowRecord.NewConsumer()
+		var hdrs []hpack.HeaderField
+		hdrsDecoder := hpack.NewDecoder(4096, func(hdr hpack.HeaderField) {
+			hdrs = append(hdrs, hdr)
+		})
+		for {
+			records, err := server.Recv()
+			if status, ok := status.FromError(err); ok && status.Code() == codes.Canceled {
+				break
+			}
+			if err != nil {
+				// No errors are allowed, except EOF.
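+				// (Recv returns io.EOF once the client cleanly
+				// half-closes its send side of the stream.)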
+				require.Equal(t, io.EOF, err)
+				break
+			}
+
+			got, err := consumer.TracesFrom(records)
+			require.NoError(t, err)
+
+			// Reset and parse the hpack-encoded headers.
+			hdrs = nil
+			_, err = hdrsDecoder.Write(records.Headers)
+			require.NoError(t, err)
+			md, ok := metadata.FromIncomingContext(server.Context())
+			require.True(t, ok)
+
+			for _, hf := range hdrs {
+				md[hf.Name] = append(md[hf.Name], hf.Value)
+			}
+
+			// Place the metadata into the context, where
+			// the test framework (independent of Arrow)
+			// receives it.
+			ctx := metadata.NewIncomingContext(context.Background(), md)
+
+			for _, traces := range got {
+				_, err := r.Export(ctx, ptraceotlp.NewExportRequestFromTraces(traces))
+				require.NoError(t, err)
+			}
+			require.NoError(t, server.Send(statusFor(records.BatchId)))
+		}
+		return nil
+	}
+
+	type singleBinding struct {
+		arrowpb.UnsafeArrowTracesServiceServer
+		*arrowpbMock.MockArrowTracesServiceServer
+	}
+	svc := arrowpbMock.NewMockArrowTracesServiceServer(ctrl)
+
+	arrowpb.RegisterArrowTracesServiceServer(r.srv, singleBinding{
+		MockArrowTracesServiceServer: svc,
+	})
+	svc.EXPECT().ArrowTraces(gomock.Any()).Times(1).DoAndReturn(doer)
+}
+
+func TestSendArrowFailedTraces(t *testing.T) {
+	// Start an OTel-Arrow receiver.
+	ln, err := net.Listen("tcp", "127.0.0.1:")
+	require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err)
+
+	// Start an OTel-Arrow exporter and point to the receiver.
+	factory := NewFactory()
+	cfg := factory.CreateDefaultConfig().(*Config)
+	cfg.ClientConfig = configgrpc.ClientConfig{
+		Endpoint: ln.Addr().String(),
+		TLSSetting: configtls.ClientConfig{
+			Insecure: true,
+		},
+		WaitForReady: true,
+	}
+	// The Arrow client is enabled; the server supports it but responds
+	// with failed batch statuses.
+	cfg.Arrow = ArrowConfig{
+		NumStreams:        1,
+		MaxStreamLifetime: 100 * time.Second,
+	}
+	cfg.QueueSettings.Enabled = false
+
+	set := exportertest.NewNopCreateSettings()
+	set.TelemetrySettings.Logger = zaptest.NewLogger(t)
+	exp, err := factory.CreateTracesExporter(context.Background(), set, cfg)
+	require.NoError(t, err)
+	require.NotNil(t, exp)
+
+	defer func() {
+		assert.NoError(t, exp.Shutdown(context.Background()))
+	}()
+
+	host := componenttest.NewNopHost()
+	assert.NoError(t, exp.Start(context.Background(), host))
+
+	rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false)
+	rcv.startStreamMockArrowTraces(t, failedStatusFor)
+
+	// Delay the server start slightly.
+	go func() {
+		time.Sleep(100 * time.Millisecond)
+		rcv.start()
+	}()
+
+	// Send two trace items.
+	td := testdata.GenerateTraces(2)
+	err = exp.ConsumeTraces(context.Background(), td)
+	assert.Error(t, err)
+	assert.Contains(t, err.Error(), "test failed")
+
+	// Wait until it is received.
+	assert.Eventually(t, func() bool {
+		return rcv.requestCount.Load() > 0
+	}, 10*time.Second, 5*time.Millisecond)
+
+	// Verify two items, one request received.
+	assert.EqualValues(t, int32(2), rcv.totalItems.Load())
+	assert.EqualValues(t, int32(1), rcv.requestCount.Load())
+	assert.EqualValues(t, td, rcv.getLastRequest())
+
+	rcv.srv.GracefulStop()
+}
+
+func TestUserDialOptions(t *testing.T) {
+	// Start an OTel-Arrow receiver.
+	ln, err := net.Listen("tcp", "127.0.0.1:")
+	require.NoError(t, err, "Failed to find an available address to run the gRPC server: %v", err)
+
+	// Start an OTel-Arrow exporter and point to the receiver.
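+	// Arrow is disabled in this test, so the standard OTLP path carries
+	// the custom user-agent configured via UserDialOptions below.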
+	factory := NewFactory()
+	cfg := factory.CreateDefaultConfig().(*Config)
+	cfg.ClientConfig = configgrpc.ClientConfig{
+		Endpoint: ln.Addr().String(),
+		TLSSetting: configtls.ClientConfig{
+			Insecure: true,
+		},
+		WaitForReady: true,
+	}
+	cfg.Arrow.Disabled = true
+	cfg.QueueSettings.Enabled = false
+
+	const testAgent = "test-user-agent (release=:+1:)"
+
+	// This overrides the default provided in otelarrow.go.
+	cfg.UserDialOptions = []grpc.DialOption{
+		grpc.WithUserAgent(testAgent),
+	}
+
+	set := exportertest.NewNopCreateSettings()
+	set.TelemetrySettings.Logger = zaptest.NewLogger(t)
+	exp, err := factory.CreateTracesExporter(context.Background(), set, cfg)
+	require.NoError(t, err)
+	require.NotNil(t, exp)
+
+	defer func() {
+		assert.NoError(t, exp.Shutdown(context.Background()))
+	}()
+
+	host := componenttest.NewNopHost()
+	assert.NoError(t, exp.Start(context.Background(), host))
+
+	td := testdata.GenerateTraces(2)
+
+	rcv, _ := otelArrowTracesReceiverOnGRPCServer(ln, false)
+	rcv.start()
+	defer rcv.srv.GracefulStop()
+
+	err = exp.ConsumeTraces(context.Background(), td)
+	assert.NoError(t, err)
+
+	require.Equal(t, len(rcv.getMetadata().Get("User-Agent")), 1)
+	require.Contains(t, rcv.getMetadata().Get("User-Agent")[0], testAgent)
+}
diff --git a/exporter/otelarrowexporter/testdata/config.yaml b/exporter/otelarrowexporter/testdata/config.yaml
index 46134951f462..db9e8016ce0e 100644
--- a/exporter/otelarrowexporter/testdata/config.yaml
+++ b/exporter/otelarrowexporter/testdata/config.yaml
@@ -30,3 +30,4 @@ arrow:
   disabled: false
   max_stream_lifetime: 2h
   payload_compression: "zstd"
+  prioritizer: leastloaded8

From a133a8efefbe34dd45d8d4c8473ebbd75f4bdcc3 Mon Sep 17 00:00:00 2001
From: Dominik Rosiek <58699848+sumo-drosiek@users.noreply.github.com>
Date: Mon, 13 May 2024 10:05:36 +0200
Subject: [PATCH 2/2] [exporter/sumologic] change logs behavior (#32939)

**Description:**
* set OTLP as default format
* add support for OTLP format
* do not support metadata attributes
* do not support source headers

**Link to tracking Issue:** #32315

**Testing:**
* unit tests

**Documentation:**
* inline comments
* readme

---------

Signed-off-by: Dominik Rosiek
---
 .chloggen/drosiek-exporter-logs.yaml        |  31 +
 exporter/sumologicexporter/README.md        |  43 +-
 exporter/sumologicexporter/compress.go      |  77 --
 exporter/sumologicexporter/compress_test.go | 142 ---
 exporter/sumologicexporter/config.go        |  36 +-
 exporter/sumologicexporter/config_test.go   |  99 +-
 exporter/sumologicexporter/exporter.go      | 113 +--
 exporter/sumologicexporter/exporter_test.go | 275 ++++--
 exporter/sumologicexporter/factory.go       |   2 -
 exporter/sumologicexporter/factory_test.go  |  13 +-
 exporter/sumologicexporter/go.mod           |   2 +-
 exporter/sumologicexporter/sender.go        | 205 +++--
 exporter/sumologicexporter/sender_test.go   | 941 ++++++++++++------
 13 files changed, 1117 insertions(+), 862 deletions(-)
 create mode 100644 .chloggen/drosiek-exporter-logs.yaml
 delete mode 100644 exporter/sumologicexporter/compress.go
 delete mode 100644 exporter/sumologicexporter/compress_test.go

diff --git a/.chloggen/drosiek-exporter-logs.yaml b/.chloggen/drosiek-exporter-logs.yaml
new file mode 100644
index 000000000000..c0181af0e02d
--- /dev/null
+++ b/.chloggen/drosiek-exporter-logs.yaml
@@ -0,0 +1,31 @@
+# Use this changelog template to create an entry for release notes.
+
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: breaking
+
+# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
+component: sumologicexporter
+
+# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
+note: change logs behavior
+
+# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
+issues: [31479]
+
+# (Optional) One or more lines of additional information to render under the primary note.
+# These lines will be padded with 2 spaces and then inserted directly into the document.
+# Use pipe (|) for multiline entries.
+subtext: |
+  * set OTLP as default format
+  * add support for OTLP format
+  * do not support metadata attributes
+  * do not support source headers
+
+# If your change doesn't affect end users or the exported elements of any package,
+# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
+# Optional: The change log or logs in which this entry should be included.
+# e.g. '[user]' or '[user, api]'
+# Include 'user' if the change is relevant to end users.
+# Include 'api' if there is a change to a library API.
+# Default: '[user]'
+change_logs: [user]
diff --git a/exporter/sumologicexporter/README.md b/exporter/sumologicexporter/README.md
index c678df2c3213..39267e26e61e 100644
--- a/exporter/sumologicexporter/README.md
+++ b/exporter/sumologicexporter/README.md
@@ -18,7 +18,7 @@ For some time we have been developing the [new Sumo Logic exporter](https://github.com/SumoLogic/sumologic-otel-collector/tree/main/pkg/exporter/sumologicexporter#sumo-logic-exporter)
 and now we are in the process of moving it into this repository.
 
-The following options are deprecated for logs and already do not work for metrics:
+The following options are no longer supported:
 
 - `metric_format: {carbon2, graphite}`
 - `metadata_attributes: []`
@@ -30,7 +30,7 @@ The following options are deprecated for logs and already do not work for metric
 
 After the new exporter is moved to this repository:
 
 - `carbon2` and `graphite` are no longer supported and `prometheus` or `otlp` format should be used
-- all resource level attributes are going to be treated (are treated for metrics) as `metadata_attributes`. You can use [Group by Attributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor) to move attributes from record level to resource level. For example:
+- all resource level attributes are treated as `metadata_attributes`, so this option is no longer supported. You can use the [Group by Attributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor) to move attributes from record level to resource level. For example:
 
 ```yaml
 # before switch to new collector
@@ -45,7 +45,7 @@ After the new exporter is moved to this repository:
     - my_attribute
 ```
 
-- Source templates (`source_category`, `source_name` and `source_host`) are going to be removed from the exporter and sources may be set using `_sourceCategory`, `sourceName` or `_sourceHost` resource attributes. This feature has been already disabled for metrics. We recommend to use [Transform Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor/). For example:
+- Source templates (`source_category`, `source_name` and `source_host`) are no longer supported. We recommend using the [Transform Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor/). For example:
 
 ```yaml
 # before switch to new collector
@@ -88,12 +88,12 @@ exporters:
   # List of regexes for attributes which should be sent as metadata
   # default = []
   #
-  # This option is deprecated:
+  # This option is unsupported:
   # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sumologicexporter#migration-to-new-architecture
   metadata_attributes: []
 
-  # format to use when sending logs to Sumo Logic, default = json,
-  log_format: {json, text}
+  # format to use when sending logs to Sumo Logic, default = otlp,
+  log_format: {otlp, json, text}
 
   # format to use when sending metrics to Sumo Logic, default = otlp,
   # NOTE: only `otlp` is supported when used with sumologicextension
@@ -112,7 +112,7 @@ exporters:
   # Please refer to Source templates for formatting explanation:
   # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sumologicexporter#source-templates
   #
-  # This option is deprecated:
+  # This option is unsupported:
   # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sumologicexporter#migration-to-new-architecture
   graphite_template: