OpenTelemetry Protocol with Apache Arrow Receiver initial skeleton #30766

Merged
merged 37 commits into from
Feb 7, 2024
cd217a1
empty package
jmacd Jan 23, 2024
55b5117
skeleton code & lint
jmacd Jan 24, 2024
3e8ad65
lint
jmacd Jan 24, 2024
edbbe3c
revert one file
jmacd Jan 24, 2024
89c94a5
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 25, 2024
b372a1c
lint
jmacd Jan 25, 2024
3d045ca
version update
jmacd Jan 25, 2024
084ad4e
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 26, 2024
8089387
version update
jmacd Jan 26, 2024
f7fbcce
this is stupid
jmacd Jan 26, 2024
40a2201
make failed me, do this manually
jmacd Jan 26, 2024
3af8198
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 29, 2024
cca45f2
[chore] multimod update stable modules
jmacd Jan 29, 2024
9d61be7
[chore] multimod update beta modules
jmacd Jan 29, 2024
b8efa66
i.d.k. make update-otel
jmacd Jan 29, 2024
baca4f4
goporto
jmacd Jan 29, 2024
9487d45
again
jmacd Jan 29, 2024
32f6f3a
manual
jmacd Jan 29, 2024
23b9ade
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 30, 2024
420ebb4
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 30, 2024
85e3b76
manual
jmacd Jan 30, 2024
6e42111
one sum fix
jmacd Jan 30, 2024
22bd384
try again
jmacd Jan 30, 2024
7b1a2e9
again
jmacd Jan 30, 2024
a97992c
gendist
jmacd Jan 30, 2024
67942a8
versions.yaml
jmacd Jan 30, 2024
044bf4c
oh
jmacd Jan 31, 2024
d0c306f
fix mod
jmacd Jan 31, 2024
60ecbb6
porto-gci
jmacd Jan 31, 2024
cfa836f
repackage
jmacd Jan 31, 2024
ea166ae
gci
jmacd Jan 31, 2024
ca9ee88
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 31, 2024
fce4631
use this repo's sharedcomponent
jmacd Jan 31, 2024
ab60883
use metadata; crosslink
jmacd Jan 31, 2024
86e7771
tidy
jmacd Jan 31, 2024
6a55c31
grrr
jmacd Jan 31, 2024
6a0acc3
remove func
jmacd Jan 31, 2024
6 changes: 6 additions & 0 deletions .chloggen/new-otelarrow-receiver.yaml
@@ -0,0 +1,6 @@
change_type: new_component
component: otelarrow
note: Skeleton of new OpenTelemetry Protocol with Apache Arrow Receiver
issues: [26491]
subtext:
change_logs: [user]
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -236,6 +236,7 @@ receiver/nsxtreceiver/ @open-telemetry/collect
receiver/opencensusreceiver/ @open-telemetry/collector-contrib-approvers @open-telemetry/collector-approvers
receiver/oracledbreceiver/ @open-telemetry/collector-contrib-approvers @dmitryax @crobert-1 @atoulme
receiver/osqueryreceiver/ @open-telemetry/collector-contrib-approvers @codeboten @nslaughter @smithclay
receiver/otelarrowreceiver/ @open-telemetry/collector-contrib-approvers @jmacd @moh-osman3
receiver/otlpjsonfilereceiver/ @open-telemetry/collector-contrib-approvers @djaglowski @atoulme
receiver/podmanreceiver/ @open-telemetry/collector-contrib-approvers @rogercoll
receiver/postgresqlreceiver/ @open-telemetry/collector-contrib-approvers @djaglowski
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -231,6 +231,7 @@ body:
- receiver/opencensus
- receiver/oracledb
- receiver/osquery
- receiver/otelarrow
- receiver/otlpjsonfile
- receiver/podman
- receiver/postgresql
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yaml
@@ -225,6 +225,7 @@ body:
- receiver/opencensus
- receiver/oracledb
- receiver/osquery
- receiver/otelarrow
- receiver/otlpjsonfile
- receiver/podman
- receiver/postgresql
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/other.yaml
@@ -225,6 +225,7 @@ body:
- receiver/opencensus
- receiver/oracledb
- receiver/osquery
- receiver/otelarrow
- receiver/otlpjsonfile
- receiver/podman
- receiver/postgresql
1 change: 1 addition & 0 deletions receiver/otelarrowreceiver/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
198 changes: 198 additions & 0 deletions receiver/otelarrowreceiver/README.md
@@ -0,0 +1,198 @@
# OpenTelemetry Protocol with Apache Arrow Receiver

<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Stability | [development]: traces, metrics, logs |
| Distributions | [contrib] |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Areceiver%2Fotelarrow%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Areceiver%2Fotelarrow) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Areceiver%2Fotelarrow%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Areceiver%2Fotelarrow) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@jmacd](https://www.github.com/jmacd), [@moh-osman3](https://www.github.com/moh-osman3) |

[development]: https://github.com/open-telemetry/opentelemetry-collector#development
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
<!-- end autogenerated section -->

Receives telemetry data using [OpenTelemetry Protocol with Apache
Arrow](https://github.com/open-telemetry/otel-arrow) and standard
[OTLP](
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md)
protocol via gRPC.

## Getting Started

The OpenTelemetry Protocol with Apache Arrow receiver is an extension
of the core OpenTelemetry Collector [OTLP
receiver](https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver)
component with additional support for [OpenTelemetry Protocol with
Apache Arrow](https://github.com/open-telemetry/otel-arrow).

OpenTelemetry Protocol with Apache Arrow supports column-oriented data
transport using the Apache Arrow data format. The [OpenTelemetry
Protocol with Apache Arrow
exporter](../../exporter/otelarrowexporter/README.md)
converts OTLP data into an optimized representation and then sends
batches of data using Apache Arrow to encode the stream. This
component contains logic to reverse the process used in the
OpenTelemetry Protocol with Apache Arrow exporter.

The use of an OpenTelemetry Protocol with Apache Arrow
exporter-receiver pair is recommended when the network is expensive.
Typically, expect to see a 50% reduction in bandwidth compared with
the same data being sent using standard OTLP/gRPC and gzip
compression.

This component includes all the features and configuration of the core
OTLP receiver, making it possible to upgrade from the core component
simply by replacing "otlp" with "otelarrow" as the component name in
the collector configuration.

To enable the OpenTelemetry Protocol with Apache Arrow receiver,
include it in the list of receivers for a pipeline. No further
configuration is needed. This receiver listens on the standard
OTLP/gRPC port 4317 and serves standard OTLP over gRPC out of the box.

```yaml
receivers:
  otelarrow:
```

## Advanced Configuration

Users may wish to configure gRPC settings, for example:

```
receivers:
  otelarrow:
    protocols:
      grpc:
        ...
```

- `endpoint` (default = 0.0.0.0:4317 for grpc protocol):
  host:port on which the receiver listens for data. The valid syntax is
  described at https://github.com/grpc/grpc/blob/master/doc/naming.md.

Several common configuration structures provide additional capabilities automatically:

- [gRPC settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configgrpc/README.md)
- [TLS and mTLS settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configtls/README.md)

### Arrow-specific Configuration

In the `arrow` configuration block, the following settings are available:

- `memory_limit_mib` (default: 128): limits the amount of concurrent memory used by Arrow data buffers.

When the limit is reached, the receiver will return RESOURCE_EXHAUSTED
error codes to the exporter, which are [conditionally retryable, see
exporter retry configuration](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md).
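
The limit-then-reject behavior described above can be sketched in Go. This is a minimal illustration, not the receiver's actual implementation: `memoryLimiter`, `acquire`, `release`, and `errResourceExhausted` are hypothetical names, and the sentinel error stands in for the gRPC RESOURCE_EXHAUSTED status.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errResourceExhausted is a hypothetical stand-in for the gRPC
// RESOURCE_EXHAUSTED status code mentioned in the README.
var errResourceExhausted = errors.New("resource exhausted: arrow memory limit reached")

// memoryLimiter (hypothetical) tracks bytes held by in-flight Arrow
// buffers against a fixed limit shared by all streams.
type memoryLimiter struct {
	mu    sync.Mutex
	limit uint64 // bytes
	inUse uint64 // bytes; never exceeds limit
}

// newMemoryLimiter converts a limit in MiB to bytes.
func newMemoryLimiter(limitMiB uint64) *memoryLimiter {
	return &memoryLimiter{limit: limitMiB << 20}
}

// acquire reserves n bytes, or fails immediately when the
// reservation would exceed the limit.
func (m *memoryLimiter) acquire(n uint64) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if n > m.limit-m.inUse {
		return errResourceExhausted
	}
	m.inUse += n
	return nil
}

// release returns n bytes to the shared budget.
func (m *memoryLimiter) release(n uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if n > m.inUse {
		n = m.inUse
	}
	m.inUse -= n
}

func main() {
	lim := newMemoryLimiter(128) // mirrors the default memory_limit_mib

	fmt.Println(lim.acquire(100 << 20)) // fits within 128 MiB
	fmt.Println(lim.acquire(64 << 20))  // would exceed the limit, rejected
	lim.release(100 << 20)
	fmt.Println(lim.acquire(64 << 20)) // fits again after release
}
```

Rejecting instead of blocking keeps the decision with the exporter, whose retry settings determine whether and when the data is resent.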

### Compression Configuration

In the `arrow` configuration block, the `zstd` sub-section applies to all
compression levels used by exporters:

- `memory_limit_mib`: limits memory dedicated to Zstd decompression, per stream (default 128)
- `max_window_size_mib`: maximum size of the Zstd window in MiB; 0 means the size is determined by the compression level (default 32)
- `concurrency`: controls background CPU used for decompression; 0 lets the `zstd` library decide (default 1)

### Keepalive configuration

As a gRPC streaming service, the OTel Arrow receiver is able to limit
stream lifetime through configuration of the underlying http/2
connection via keepalive settings.

Keepalive settings are vital to the operation of OTel Arrow, because
longer-lived streams use more memory and streams are fixed to a single
host. Since every stream of data is different, we recommend
experimenting to find a good balance between memory usage, stream
lifetime, and load balance.

gRPC libraries do not have a built-in facility for long-lived RPCs to learn
about impending http/2 connection state changes, including the event
that initiates connection reset. While the receiver knows its own
keepalive settings, a shorter maximum connection lifetime can be
imposed by intermediate http/2 proxies, and therefore the receiver and
exporter are expected to independently configure these limits.

```
receivers:
  otelarrow:
    protocols:
      grpc:
        keepalive:
          server_parameters:
            max_connection_age: 1m
            max_connection_age_grace: 10m
```

In the example configuration above, OpenTelemetry Protocol with Apache
Arrow streams will have reset initiated after 10 minutes. Note that
`max_connection_age` is set to a small value and we recommend tuning
`max_connection_age_grace`.

OTel Arrow exporters are expected to configure their
`max_stream_lifetime` property to a value that is slightly smaller
than the receiver's `max_connection_age_grace` setting, which causes
the exporter to cleanly shut down streams, allowing requests to
complete before the http/2 connection is forcibly closed. While the
exporter will retry data that was in-flight during an unexpected
stream shutdown, instrumentation about the telemetry pipeline will show
RPC errors when the exporter's `max_stream_lifetime` is not configured
correctly.

[See the exporter README for more
guidance](../../exporter/otelarrowexporter/README.md). For the
example where `max_connection_age_grace` is set to 10 minutes, the
exporter's `max_stream_lifetime` should be set to the same number
minus a reasonable timeout to allow in-flight requests to complete.
For example, an exporter with `9m30s` stream lifetime:

```
exporters:
  otelarrow:
    timeout: 30s
    arrow:
      max_stream_lifetime: 9m30s
    endpoint: ...
    tls: ...
```
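
The arithmetic behind the `9m30s` value can be sketched as follows. This is only an illustration of the rule stated above (grace period minus request timeout); `streamLifetime` is a hypothetical helper, not part of the exporter or receiver.

```go
package main

import (
	"fmt"
	"time"
)

// streamLifetime (hypothetical helper) returns a candidate value for the
// exporter's max_stream_lifetime: the receiver's max_connection_age_grace
// minus the exporter's request timeout.
func streamLifetime(grace, timeout time.Duration) (time.Duration, error) {
	if timeout <= 0 || timeout >= grace {
		return 0, fmt.Errorf("timeout %s must be positive and smaller than grace period %s", timeout, grace)
	}
	return grace - timeout, nil
}

func main() {
	// Values taken from the example configurations above.
	d, err := streamLifetime(10*time.Minute, 30*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // prints 9m30s
}
```

If an intermediate proxy imposes a shorter connection lifetime than the receiver, that shorter value would be the one to subtract from.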

### Receiver metrics

In addition to the standard
[obsreport](https://pkg.go.dev/go.opentelemetry.io/collector/obsreport)
metrics, this component provides network-level measurement instruments
which we anticipate will become part of `obsreport` in the future. At
the `normal` level of metrics detail:

- `receiver_recv`: uncompressed bytes received, prior to compression
- `receiver_recv_wire`: compressed bytes received, on the wire.

Arrow's compression performance can be derived by dividing the average
`receiver_recv` value by the average `receiver_recv_wire` value.
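
As a worked example of that division, the sketch below computes the ratio from two byte counts. The function name `compressionRatio` and the sample values are illustrative assumptions, not measurements from a real deployment.

```go
package main

import (
	"errors"
	"fmt"
)

// compressionRatio (hypothetical helper) divides uncompressed bytes
// (receiver_recv) by on-the-wire bytes (receiver_recv_wire).
func compressionRatio(recvBytes, recvWireBytes float64) (float64, error) {
	if recvWireBytes <= 0 {
		return 0, errors.New("no wire bytes observed")
	}
	return recvBytes / recvWireBytes, nil
}

func main() {
	// Made-up averages, for illustration only.
	ratio, err := compressionRatio(4_000_000, 1_000_000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("compression ratio: %.1fx\n", ratio) // prints "compression ratio: 4.0x"
}
```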

At the `detailed` metrics detail level, information about the stream
of data being returned from the receiver will be instrumented:

- `receiver_sent`: uncompressed bytes sent, prior to compression
- `receiver_sent_wire`: compressed bytes sent, on the wire.

There are several OpenTelemetry Protocol with Apache Arrow
consumer-related metrics available to help diagnose internal performance.
These are disabled at the basic level of detail. At the normal level,
these metrics are introduced:

- `arrow_batch_records`: Counter of Arrow-IPC records processed
- `arrow_memory_inuse`: UpDownCounter of memory in use by current streams
- `arrow_schema_resets`: Counter of times the schema was adjusted, by data type.

```
service:
  ...
  telemetry:
    ...
    metrics:
      ...
      level: detailed
```
52 changes: 52 additions & 0 deletions receiver/otelarrowreceiver/config.go
@@ -0,0 +1,52 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package otelarrowreceiver // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/otelarrowreceiver"

import (
	"fmt"

	"github.com/open-telemetry/otel-arrow/collector/compression/zstd"
	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/config/configgrpc"
)

// Protocols is the configuration for the supported protocols.
type Protocols struct {
	GRPC  configgrpc.GRPCServerSettings `mapstructure:"grpc"`
	Arrow ArrowSettings                 `mapstructure:"arrow"`
}

// ArrowSettings supports configuring the Arrow receiver.
type ArrowSettings struct {
	// MemoryLimitMiB is the size of a shared memory region used
	// by all Arrow streams, in MiB. When too much load is
	// passing through, callers will see ResourceExhausted errors.
	MemoryLimitMiB uint64 `mapstructure:"memory_limit_mib"`

	// Zstd settings apply to OTel-Arrow use of gRPC specifically.
	Zstd zstd.DecoderConfig `mapstructure:"zstd"`
}

// Config defines configuration for OTel Arrow receiver.
type Config struct {
	// Protocols is the configuration for gRPC and Arrow.
	Protocols `mapstructure:"protocols"`
}

var _ component.Config = (*Config)(nil)

// Validate checks that the receiver configuration is valid.
func (cfg *Config) Validate() error {
	if err := cfg.Arrow.Validate(); err != nil {
		return err
	}
	return nil
}

// Validate checks the Arrow settings, including the Zstd decoder configuration.
func (cfg *ArrowSettings) Validate() error {
	if err := cfg.Zstd.Validate(); err != nil {
		return fmt.Errorf("zstd decoder: invalid configuration: %w", err)
	}
	return nil
}