Add the otelarrow receiver scaffold #26519

Closed
wants to merge 9 commits into from
Changes from 5 commits
27 changes: 27 additions & 0 deletions .chloggen/otelarrow-receiver-scaffold.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: new_component

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: otelarrowreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: The OTel Arrow receiver receives telemetry data using the OTel-Arrow protocol via gRPC and standard OTLP protocol via gRPC or HTTP.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [26491]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -223,6 +223,7 @@ receiver/nginxreceiver/ @open-telemetry/collect
receiver/nsxtreceiver/ @open-telemetry/collector-contrib-approvers @dashpole @schmikei
receiver/opencensusreceiver/ @open-telemetry/collector-contrib-approvers @open-telemetry/collector-approvers
receiver/oracledbreceiver/ @open-telemetry/collector-contrib-approvers @dmitryax @crobert-1 @atoulme
receiver/otelarrowreceiver/ @open-telemetry/collector-contrib-approvers @jmacd @moh-osman3
receiver/otlpjsonfilereceiver/ @open-telemetry/collector-contrib-approvers @djaglowski @atoulme
receiver/podmanreceiver/ @open-telemetry/collector-contrib-approvers @rogercoll
receiver/postgresqlreceiver/ @open-telemetry/collector-contrib-approvers @djaglowski
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -211,6 +211,7 @@ body:
- receiver/nsxt
- receiver/opencensus
- receiver/oracledb
- receiver/otelarrow
- receiver/otlpjsonfile
- receiver/podman
- receiver/postgresql
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yaml
@@ -205,6 +205,7 @@ body:
- receiver/nsxt
- receiver/opencensus
- receiver/oracledb
- receiver/otelarrow
- receiver/otlpjsonfile
- receiver/podman
- receiver/postgresql
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/other.yaml
@@ -205,6 +205,7 @@ body:
- receiver/nsxt
- receiver/opencensus
- receiver/oracledb
- receiver/otelarrow
- receiver/otlpjsonfile
- receiver/podman
- receiver/postgresql
10 changes: 5 additions & 5 deletions .github/dependabot.yml
@@ -1002,6 +1002,11 @@ updates:
schedule:
interval: "weekly"
day: "wednesday"
- package-ecosystem: "gomod"
directory: "/receiver/otelarrowreceiver"
schedule:
interval: "weekly"
day: "wednesday"
- package-ecosystem: "gomod"
directory: "/receiver/otlpjsonfilereceiver"
schedule:
@@ -1097,8 +1102,3 @@ updates:
schedule:
interval: "weekly"
day: "wednesday"
- package-ecosystem: "gomod"
directory: "/receiver/solacereceiver"
schedule:
interval: "weekly"
day: "wednesday"
1 change: 1 addition & 0 deletions receiver/otelarrowreceiver/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
186 changes: 186 additions & 0 deletions receiver/otelarrowreceiver/README.md
@@ -0,0 +1,186 @@
# OTel-Arrow Receiver

<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Stability | [development]: traces, metrics, logs |
| Distributions | [contrib] |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Areceiver%2Fotelarrow%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Areceiver%2Fotelarrow) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Areceiver%2Fotelarrow%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Areceiver%2Fotelarrow) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@jmacd](https://www.github.com/jmacd), [@lquerel](https://www.github.com/lquerel) |

[development]: https://github.com/open-telemetry/opentelemetry-collector#development
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
<!-- end autogenerated section -->

Receives telemetry data using the
[OTel-Arrow](https://github.com/open-telemetry/otel-arrow) protocol
via gRPC and standard [OTLP](
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md)
protocol via gRPC or HTTP.

## Getting Started

The OTel-Arrow receiver is an extension of the core OpenTelemetry
Collector [OTLP
receiver](https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver)
component with additional support for the
[OTel-Arrow](https://github.com/open-telemetry/otel-arrow) protocol.

OTel-Arrow supports column-oriented data transport using the Apache
Arrow data format. The OTel-Arrow
exporter
converts OTLP data into an optimized representation and then sends
batches of data using Apache Arrow to encode the stream. This
component contains logic to reverse the process used in the OTel-Arrow
exporter.

The use of an OTel-Arrow exporter-receiver pair is recommended when
the network is expensive. Typically, expect to see a 50% reduction in
bandwidth compared with the same data being sent using standard
OTLP/gRPC and gzip compression.

This component includes all the features and configuration of the core
OTLP receiver, making it possible to upgrade from the core component
simply by replacing "otlp" with "otelarrow" as the component name in
the collector configuration.

To enable the OTel-Arrow receiver, include it in the list of receivers
for a pipeline. No further configuration is needed. This receiver
listens on the standard OTLP/gRPC port 4317 and serves standard OTLP
over gRPC out of the box.

```yaml
receivers:
  otelarrow:
```

## Advanced Configuration

Users may wish to configure gRPC settings, for example:

```yaml
receivers:
  otelarrow:
    protocols:
      grpc:
        ...
```

- `endpoint` (default = 0.0.0.0:4317 for the gRPC protocol, 0.0.0.0:4318 for the HTTP protocol):
  host:port on which the receiver listens for data. The valid syntax is
  described at https://github.com/grpc/grpc/blob/master/doc/naming.md.

Several common configuration structures provide additional capabilities automatically:

- [gRPC settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configgrpc/README.md)
- [TLS and mTLS settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configtls/README.md)

### Arrow-specific Configuration

In the `arrow` configuration block, the following settings are available:

- `memory_limit` (default: 128MiB): limits the amount of concurrent memory used by Arrow data buffers.

When the limit is reached, the receiver will return RESOURCE_EXHAUSTED
error codes to the exporter, which are [conditionally retryable, see
exporter retry configuration](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md).
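
As a rough illustration of the behavior described above, the following Go sketch tracks bytes held by in-flight buffers against a fixed limit and fails fast once the limit would be exceeded. It is not the receiver's actual implementation: `byteBudget`, `acquire`, `release`, and `errResourceExhausted` are hypothetical names, and a plain Go error stands in for the gRPC RESOURCE_EXHAUSTED status.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errResourceExhausted stands in for a gRPC RESOURCE_EXHAUSTED status
// (assumption: a real receiver would return a gRPC status error).
var errResourceExhausted = errors.New("resource exhausted: arrow memory limit reached")

// byteBudget is a hypothetical helper that accounts for bytes held by
// in-flight Arrow buffers across all streams.
type byteBudget struct {
	mu    sync.Mutex
	limit uint64
	used  uint64
}

func newByteBudget(limit uint64) *byteBudget {
	return &byteBudget{limit: limit}
}

// acquire reserves n bytes, or fails without blocking when the
// reservation would push usage past the limit.
func (b *byteBudget) acquire(n uint64) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > b.limit-b.used { // used never exceeds limit, so this cannot underflow
		return errResourceExhausted
	}
	b.used += n
	return nil
}

// release returns n bytes to the budget once a buffer has been freed.
func (b *byteBudget) release(n uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > b.used {
		n = b.used
	}
	b.used -= n
}

func main() {
	budget := newByteBudget(128 << 20) // 128 MiB, the documented default

	fmt.Println(budget.acquire(100 << 20)) // fits within the budget
	fmt.Println(budget.acquire(64 << 20))  // would exceed the remaining budget
	budget.release(100 << 20)
	fmt.Println(budget.acquire(64 << 20)) // fits again after the release
}
```

Failing fast instead of blocking matches the documented behavior of returning an error that the exporter may retry.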

### Keepalive configuration

As a gRPC streaming service, the OTel Arrow receiver is able to limit
stream lifetime through configuration of the underlying http/2
connection via keepalive settings.

Keepalive settings are vital to the operation of OTel Arrow, because
longer-lived streams use more memory and streams are fixed to a single
host. Since every stream of data is different, we recommend
experimenting to find a good balance between memory usage, stream
lifetime, and load balance.

gRPC libraries do not have a built-in facility for long-lived RPCs to learn
about impending http/2 connection state changes, including the event
that initiates connection reset. While the receiver knows its own
keepalive settings, a shorter maximum connection lifetime can be
imposed by intermediate http/2 proxies, and therefore the receiver and
exporter are expected to independently configure these limits.

```yaml
receivers:
  otelarrow:
    protocols:
      grpc:
        keepalive:
          server_parameters:
            max_connection_age: 1m
            max_connection_age_grace: 10m
```

In the example configuration above, OTel-Arrow streams will have reset
initiated after 10 minutes. Note that `max_connection_age` is set to
a small value and we recommend tuning `max_connection_age_grace`.

OTel Arrow exporters are expected to configure their
`max_stream_lifetime` property to a value that is slightly smaller
than the receiver's `max_connection_age_grace` setting, which causes
the exporter to cleanly shut down streams, allowing requests to
complete before the http/2 connection is forcibly closed. While the
exporter will retry data that was in-flight during an unexpected
stream shutdown, instrumentation about the telemetry pipeline will show
RPC errors when the exporter's `max_stream_lifetime` is not configured
correctly.

[See the exporter README for more
guidance](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md). For the
example where `max_connection_age_grace` is set to 10 minutes, the
exporter's `max_stream_lifetime` should be set to the same number
minus a reasonable timeout to allow in-flight requests to complete.
For example, an exporter with `9m30s` stream lifetime:

```yaml
exporters:
  otelarrow:
    timeout: 30s
    arrow:
      max_stream_lifetime: 9m30s
    endpoint: ...
    tls: ...
```

### Receiver metrics

In addition to the standard
[obsreport](https://pkg.go.dev/go.opentelemetry.io/collector/obsreport)
metrics, this component provides network-level measurement instruments
which we anticipate will become part of `obsreport` in the future. At
the `normal` level of metrics detail:

- `receiver_recv`: uncompressed bytes received, prior to compression
- `receiver_recv_wire`: compressed bytes received, on the wire.

Arrow's compression performance can be derived by dividing the average
`receiver_recv` value by the average `receiver_recv_wire` value.
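
A minimal Go sketch of that division follows. The function name `compressionRatio` and the sample values are illustrative assumptions; how the two averages are read from a metrics backend is out of scope here.

```go
package main

import "fmt"

// compressionRatio divides uncompressed bytes (receiver_recv) by
// on-the-wire bytes (receiver_recv_wire). ok is false when no wire
// bytes were observed, which avoids dividing by zero.
func compressionRatio(recvBytes, recvWireBytes float64) (ratio float64, ok bool) {
	if recvWireBytes <= 0 {
		return 0, false
	}
	return recvBytes / recvWireBytes, true
}

func main() {
	// Hypothetical averages taken from the two instruments.
	ratio, ok := compressionRatio(4_000_000, 500_000)
	fmt.Println(ratio, ok) // prints 8 true
}
```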

At the `detailed` metrics detail level, information about the stream
of data being returned from the receiver will be instrumented:

- `receiver_sent`: uncompressed bytes sent, prior to compression
- `receiver_sent_wire`: compressed bytes sent, on the wire.

## HTTP-specific documentation

To enable optional OTLP/HTTP support, the HTTP protocol must be
explicitly listed. It will use port 4318 by default. The OTel Arrow
protocol is not currently supported over HTTP.

```yaml
receivers:
  otelarrow:
    protocols:
      http:
```

See the core OTLP receiver for documentation specific to HTTP
connections, including:

- [Writing with HTTP/JSON](https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver#writing-with-httpjson)
- [CORS (Cross-origin resource sharing)](https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver#cors-cross-origin-resource-sharing)
119 changes: 119 additions & 0 deletions receiver/otelarrowreceiver/config.go
@@ -0,0 +1,119 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package otelarrowreceiver // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/otelarrowreceiver"

import (
"errors"
"fmt"
"net/url"
"path"

"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/config/configgrpc"
"go.opentelemetry.io/collector/config/confighttp"
"go.opentelemetry.io/collector/confmap"
)

const (
// Protocol values.
protoGRPC = "protocols::grpc"
protoHTTP = "protocols::http"
)

type httpServerSettings struct {
*confighttp.HTTPServerSettings `mapstructure:",squash"`

// The URL path to receive traces on. If omitted "/v1/traces" will be used.
TracesURLPath string `mapstructure:"traces_url_path,omitempty"`

// The URL path to receive metrics on. If omitted "/v1/metrics" will be used.
MetricsURLPath string `mapstructure:"metrics_url_path,omitempty"`

// The URL path to receive logs on. If omitted "/v1/logs" will be used.
LogsURLPath string `mapstructure:"logs_url_path,omitempty"`
}

// Protocols is the configuration for the supported protocols.
type Protocols struct {
GRPC *configgrpc.GRPCServerSettings `mapstructure:"grpc"`
HTTP *httpServerSettings `mapstructure:"http"`
Arrow *ArrowSettings `mapstructure:"arrow"`
Contributor

Does this need a separate section for configuration? If the only setting is a memory limit, any reason not to include it at the top level?

Contributor Author

Relates to open-telemetry/otel-arrow#43. This code is designed to drop in where an OTLP receiver once stood, so the idea was to leave the Arrow settings in a separate section for future compatibility.

}

// ArrowSettings support configuring the Arrow receiver.
type ArrowSettings struct {
// MemoryLimit is the size of a shared memory region used by
// all Arrow streams. When too much load is passing through, they
// will see ResourceExhausted errors.
MemoryLimit uint64 `mapstructure:"memory_limit"`
}

// Config defines configuration for OTel Arrow receiver.
type Config struct {
// Protocols is the configuration for the supported protocols, currently gRPC and HTTP (Proto and JSON).
Protocols `mapstructure:"protocols"`
}

var _ component.Config = (*Config)(nil)
var _ confmap.Unmarshaler = (*Config)(nil)

// Validate checks the receiver configuration is valid
func (cfg *Config) Validate() error {
if cfg.GRPC == nil && cfg.HTTP == nil {
return errors.New("must specify at least one protocol when using the OTel Arrow receiver")
}
if cfg.Arrow != nil && cfg.GRPC == nil {
return errors.New("must specify the gRPC protocol when using the OTel Arrow receiver")
}
return nil
}

// Unmarshal a confmap.Conf into the config struct.
func (cfg *Config) Unmarshal(conf *confmap.Conf) error {
// first load the config normally
err := conf.Unmarshal(cfg, confmap.WithErrorUnused())
if err != nil {
return err
}

// Note: since this is the OTel-Arrow receiver, not the core component,
Contributor

Is HTTP-only not a valid configuration for this receiver? If it is, then I would remove this comment and re-enable the check below.

Contributor Author

HTTP only means you've disabled OTel Arrow and you're identical to an OTLP receiver. I'm not sure why the user would want this. (Again, open-telemetry/otel-arrow#43 comes up.)

README states:

This receiver listens on the standard OTLP/gRPC port 4317 and serves standard OTLP over gRPC out of the box.

and later

To enable optional OTLP/HTTP support, the HTTP protocol must be explicitly listed. It will use port 4318 by default. The OTel Arrow protocol is not currently supported over HTTP.

if we added OTel Arrow support for HTTP streams, possibly, then it would make sense to apply the change you described, but I think we should separate the two transports into separate components.
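
The defaulting behavior discussed in this thread can be sketched with simplified stand-in types. This is an illustration under assumptions, not the PR's code: `protocols`, `defaultProtocols`, `resolve`, and `validate` are hypothetical names, and the sketch assumes the default configuration populates both gRPC and HTTP settings before unmarshaling, as the core OTLP receiver's factory does.

```go
package main

import (
	"errors"
	"fmt"
)

// protocols uses empty structs as stand-ins for the real settings types.
type protocols struct {
	GRPC  *struct{} // stands in for the gRPC server settings
	HTTP  *struct{} // stands in for the HTTP server settings
	Arrow *struct{} // stands in for ArrowSettings
}

// defaultProtocols is an assumed default: both transports populated.
func defaultProtocols() protocols {
	return protocols{GRPC: &struct{}{}, HTTP: &struct{}{}}
}

// resolve mirrors the Unmarshal logic: gRPC stays enabled even when the
// user lists no protocol, while HTTP is dropped unless explicitly set.
func resolve(httpIsSet bool) protocols {
	p := defaultProtocols()
	if !httpIsSet {
		p.HTTP = nil
	}
	return p
}

// validate mirrors the two checks in Validate.
func validate(p protocols) error {
	if p.GRPC == nil && p.HTTP == nil {
		return errors.New("must specify at least one protocol")
	}
	if p.Arrow != nil && p.GRPC == nil {
		return errors.New("arrow settings require the gRPC protocol")
	}
	return nil
}

func main() {
	bare := resolve(false) // configuration with no explicit protocol
	fmt.Println(bare.GRPC != nil, bare.HTTP != nil, validate(bare))

	withHTTP := resolve(true) // "protocols::http" listed explicitly
	fmt.Println(withHTTP.GRPC != nil, withHTTP.HTTP != nil, validate(withHTTP))
}
```

Under these assumptions gRPC is never cleared during unmarshaling, so an HTTP-only configuration cannot arise from user input alone.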

// we allow a configuration that is free of an explicit protocol, i.e.,
// we assume gRPC but we do not assume HTTP, whereas the core component
// also has:
//
// if !conf.IsSet(protoGRPC) {
// cfg.GRPC = nil
// }

if !conf.IsSet(protoHTTP) {
cfg.HTTP = nil
} else {
var err error

if cfg.HTTP.TracesURLPath, err = sanitizeURLPath(cfg.HTTP.TracesURLPath); err != nil {
return err
}
if cfg.HTTP.MetricsURLPath, err = sanitizeURLPath(cfg.HTTP.MetricsURLPath); err != nil {
return err
}
if cfg.HTTP.LogsURLPath, err = sanitizeURLPath(cfg.HTTP.LogsURLPath); err != nil {
return err
}
}

return nil
}

// Verify signal URL path sanity
func sanitizeURLPath(urlPath string) (string, error) {
u, err := url.Parse(urlPath)
if err != nil {
return "", fmt.Errorf("invalid HTTP URL path set for signal: %w", err)
}

if !path.IsAbs(u.Path) {
u.Path = "/" + u.Path
}
return u.Path, nil
}
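
To show what `sanitizeURLPath` does with a few inputs, here is a self-contained Go sketch that copies the function from this diff and runs it. The sample inputs are illustrative assumptions and are not taken from the PR's tests.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// sanitizeURLPath is copied from config.go above: it parses the value
// as a URL, keeps only the path component, and makes it absolute.
func sanitizeURLPath(urlPath string) (string, error) {
	u, err := url.Parse(urlPath)
	if err != nil {
		return "", fmt.Errorf("invalid HTTP URL path set for signal: %w", err)
	}

	if !path.IsAbs(u.Path) {
		u.Path = "/" + u.Path
	}
	return u.Path, nil
}

func main() {
	// A relative path, an absolute path, and an invalid percent-escape.
	for _, in := range []string{"v1/traces", "/v1/metrics", "%zz"} {
		out, err := sanitizeURLPath(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Because only `u.Path` is returned, any query string or fragment in the configured value is silently dropped instead of being rejected.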