Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OTLP Ingestion endpoint #12571

Merged
merged 2 commits into from Jul 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
@@ -1,6 +1,7 @@
/web/ui @juliusv
/web/ui/module @juliusv @nexucis
/storage/remote @csmarchbanks @cstyan @bwplotka @tomwilkie
/storage/remote/otlptranslator @gouthamve @jesusvazquez
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/discovery/kubernetes @brancz
/tsdb @jesusvazquez
/promql @roidelapluie
Expand Down
3 changes: 3 additions & 0 deletions .golangci.yml
Expand Up @@ -3,6 +3,9 @@ run:
skip-files:
# Skip autogenerated files.
- ^.*\.(pb|y)\.go$
skip-dirs:
# Copied it from a different source
- storage/remote/otlptranslator/prometheusremotewrite

output:
sort-results: true
Expand Down
5 changes: 4 additions & 1 deletion cmd/prometheus/main.go
Expand Up @@ -170,6 +170,9 @@ func (c *flagConfig) setFeatureListOptions(logger log.Logger) error {
case "remote-write-receiver":
c.web.EnableRemoteWriteReceiver = true
level.Warn(logger).Log("msg", "Remote write receiver enabled via feature flag remote-write-receiver. This is DEPRECATED. Use --web.enable-remote-write-receiver.")
case "otlp-write-receiver":
c.web.EnableOTLPWriteReceiver = true
level.Info(logger).Log("msg", "Experimental OTLP write receiver enabled")
case "expand-external-labels":
c.enableExpandExternalLabels = true
level.Info(logger).Log("msg", "Experimental expand-external-labels enabled")
Expand Down Expand Up @@ -420,7 +423,7 @@ func main() {
a.Flag("scrape.discovery-reload-interval", "Interval used by scrape manager to throttle target groups updates.").
Hidden().Default("5s").SetValue(&cfg.scrape.DiscoveryReloadInterval)

a.Flag("enable-feature", "Comma separated feature names to enable. Valid options: agent, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-at-modifier, promql-negative-offset, promql-per-step-stats, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details.").
a.Flag("enable-feature", "Comma separated feature names to enable. Valid options: agent, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-at-modifier, promql-negative-offset, promql-per-step-stats, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details.").
Default("").StringsVar(&cfg.featureList)

promlogflag.AddFlags(a, &cfg.promlogConfig)
Expand Down
2 changes: 1 addition & 1 deletion docs/command-line/prometheus.md
Expand Up @@ -52,7 +52,7 @@ The Prometheus monitoring server
| <code class="text-nowrap">--query.timeout</code> | Maximum time a query may take before being aborted. Use with server mode only. | `2m` |
| <code class="text-nowrap">--query.max-concurrency</code> | Maximum number of queries executed concurrently. Use with server mode only. | `20` |
| <code class="text-nowrap">--query.max-samples</code> | Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the number of samples a query can return. Use with server mode only. | `50000000` |
| <code class="text-nowrap">--enable-feature</code> | Comma separated feature names to enable. Valid options: agent, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-at-modifier, promql-negative-offset, promql-per-step-stats, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details. | |
| <code class="text-nowrap">--enable-feature</code> | Comma separated feature names to enable. Valid options: agent, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-at-modifier, promql-negative-offset, promql-per-step-stats, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details. | |
| <code class="text-nowrap">--log.level</code> | Only log messages with the given severity or above. One of: [debug, info, warn, error] | `info` |
| <code class="text-nowrap">--log.format</code> | Output format of log messages. One of: [logfmt, json] | `logfmt` |

Expand Down
3 changes: 3 additions & 0 deletions go.mod
Expand Up @@ -54,6 +54,8 @@ require (
github.com/shurcooL/httpfs v0.0.0-20190707220628-8d4bc4ba7749
github.com/stretchr/testify v1.8.4
github.com/vultr/govultr/v2 v2.17.2
go.opentelemetry.io/collector/pdata v0.66.0
go.opentelemetry.io/collector/semconv v0.81.0
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.42.0
go.opentelemetry.io/otel v1.16.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.16.0
Expand All @@ -64,6 +66,7 @@ require (
go.uber.org/atomic v1.11.0
go.uber.org/automaxprocs v1.5.2
go.uber.org/goleak v1.2.1
go.uber.org/multierr v1.8.0
golang.org/x/net v0.12.0
golang.org/x/oauth2 v0.9.0
golang.org/x/sync v0.2.0
Expand Down
9 changes: 8 additions & 1 deletion go.sum
Expand Up @@ -447,7 +447,7 @@ github.com/hashicorp/go-uuid v1.0.0/go.mod h1:6SBZvOh/SIDV7/2o3Jml5SYk/TvGqwFJ/b
github.com/hashicorp/go-uuid v1.0.1/go.mod h1:6SBZvOh/SIDV7/2o3Jml5SYk/TvGqwFJ/bN7x4byOro=
github.com/hashicorp/go-uuid v1.0.2 h1:cfejS+Tpcp13yd5nYHWDI6qVCny6wyX2Mt5SGur2IGE=
github.com/hashicorp/go-version v1.2.0/go.mod h1:fltr4n8CU8Ke44wwGCBoEymUuxUHl09ZGVZPK5anwXA=
github.com/hashicorp/go-version v1.2.1 h1:zEfKbn2+PDgroKdiOzqiE8rsmLqU2uwi5PB5pBJ3TkI=
github.com/hashicorp/go-version v1.6.0 h1:feTTfFNnjP967rlCxM/I9g701jU+RN74YKx2mOkIeek=
github.com/hashicorp/go.net v0.0.1/go.mod h1:hjKkEWcCURg++eb33jQU7oqQcI9XDCnUzHA0oac0k90=
github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
github.com/hashicorp/golang-lru v0.5.1/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
Expand Down Expand Up @@ -794,6 +794,10 @@ go.opencensus.io v0.22.3/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw=
go.opencensus.io v0.22.4/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw=
go.opencensus.io v0.24.0 h1:y73uSU6J157QMP2kn2r30vwW1A2W2WFwSCGnAVxeaD0=
go.opencensus.io v0.24.0/go.mod h1:vNK8G9p7aAivkbmorf4v+7Hgx+Zs0yY+0fOtgBfjQKo=
go.opentelemetry.io/collector/pdata v0.66.0 h1:UdE5U6MsDNzuiWaXdjGx2lC3ElVqWmN/hiUE8vyvSuM=
go.opentelemetry.io/collector/pdata v0.66.0/go.mod h1:pqyaznLzk21m+1KL6fwOsRryRELL+zNM0qiVSn0MbVc=
go.opentelemetry.io/collector/semconv v0.81.0 h1:lCYNNo3powDvFIaTPP2jDKIrBiV1T92NK4QgL/aHYXw=
go.opentelemetry.io/collector/semconv v0.81.0/go.mod h1:TlYPtzvsXyHOgr5eATi43qEMqwSmIziivJB2uctKswo=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.42.0 h1:pginetY7+onl4qN1vl0xW/V/v6OBZ0vVdH+esuJgvmM=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.42.0/go.mod h1:XiYsayHc36K3EByOO6nbAXnAWbrUxdjUROCEeeROOH8=
go.opentelemetry.io/otel v1.16.0 h1:Z7GVAX/UkAXPKsy94IU+i6thsQS4nb7LviLpnaNeW8s=
Expand All @@ -817,6 +821,7 @@ go.opentelemetry.io/proto/otlp v0.19.0 h1:IVN6GR+mhC4s5yfcTbmzHYODqvWAp3ZedA2SJP
go.opentelemetry.io/proto/otlp v0.19.0/go.mod h1:H7XAot3MsfNsj7EXtrA2q5xSNQ10UqI405h3+duxN4U=
go.uber.org/atomic v1.3.2/go.mod h1:gD2HeocX3+yG+ygLZcrzQJaqmWj9AIm7n08wl/qW/PE=
go.uber.org/atomic v1.5.0/go.mod h1:sABNBOSYdrvTF6hTgEIbc7YasKWGhgEQZyfxyTvoXHQ=
go.uber.org/atomic v1.7.0/go.mod h1:fEN4uk6kAWBTFdckzkM89CLk9XfWZrxpCo0nPH17wJc=
go.uber.org/atomic v1.11.0 h1:ZvwS0R+56ePWxUNi+Atn9dWONBPp/AUETXlHW0DxSjE=
go.uber.org/atomic v1.11.0/go.mod h1:LUxbIzbOniOlMKjJjyPfpl4v+PKK2cNJn91OQbhoJI0=
go.uber.org/automaxprocs v1.5.2 h1:2LxUOGiR3O6tw8ui5sZa2LAaHnsviZdVOUZw4fvbnME=
Expand All @@ -825,6 +830,8 @@ go.uber.org/goleak v1.2.1 h1:NBol2c7O1ZokfZ0LEU9K6Whx/KnwvepVetCUhtKja4A=
go.uber.org/goleak v1.2.1/go.mod h1:qlT2yGI9QafXHhZZLxlSuNsMw3FFLxBr+tBRlmO1xH4=
go.uber.org/multierr v1.1.0/go.mod h1:wR5kodmAFQ0UK8QlbwjlSNy0Z68gJhDJUG5sjR94q/0=
go.uber.org/multierr v1.3.0/go.mod h1:VgVr7evmIr6uPjLBxg28wmKNXyqE9akIJ5XnfpiKl+4=
go.uber.org/multierr v1.8.0 h1:dg6GjLku4EH+249NNmoIciG9N/jURbDG+pFlTkhzIC8=
go.uber.org/multierr v1.8.0/go.mod h1:7EAYxJLBy9rStEaz58O2t4Uvip6FSURkq8/ppBp95ak=
go.uber.org/tools v0.0.0-20190618225709-2cfd321de3ee/go.mod h1:vJERXedbb3MVM5f9Ejo0C68/HhF8uaILCdgjnY+goOA=
go.uber.org/zap v1.10.0/go.mod h1:vwi/ZaCAaUcBkycHslxD9B2zi4UTXhF60s6SWpuDF0Q=
go.uber.org/zap v1.13.0/go.mod h1:zwrFLgMcdUuIBviXEYEH1YKNaOBnKXsx2IPda5bBwHM=
Expand Down
64 changes: 62 additions & 2 deletions storage/remote/codec.go
Expand Up @@ -14,6 +14,7 @@
package remote

import (
"compress/gzip"
"errors"
"fmt"
"io"
Expand All @@ -26,6 +27,7 @@ import (
"github.com/gogo/protobuf/proto"
"github.com/golang/snappy"
"github.com/prometheus/common/model"
"go.opentelemetry.io/collector/pdata/pmetric/pmetricotlp"
"golang.org/x/exp/slices"

"github.com/prometheus/prometheus/model/exemplar"
Expand All @@ -38,8 +40,13 @@ import (
"github.com/prometheus/prometheus/tsdb/chunks"
)

// decodeReadLimit is the maximum size of a read request body in bytes.
const decodeReadLimit = 32 * 1024 * 1024
const (
// decodeReadLimit is the maximum size of a read request body in bytes.
decodeReadLimit = 32 * 1024 * 1024

pbContentType = "application/x-protobuf"
jsonContentType = "application/json"
)

type HTTPError struct {
msg string
Expand Down Expand Up @@ -806,3 +813,56 @@ func DecodeWriteRequest(r io.Reader) (*prompb.WriteRequest, error) {

return &req, nil
}

func DecodeOTLPWriteRequest(r *http.Request) (pmetricotlp.ExportRequest, error) {
contentType := r.Header.Get("Content-Type")
var decoderFunc func(buf []byte) (pmetricotlp.ExportRequest, error)
switch contentType {
case pbContentType:
decoderFunc = func(buf []byte) (pmetricotlp.ExportRequest, error) {
req := pmetricotlp.NewExportRequest()
return req, req.UnmarshalProto(buf)
}

case jsonContentType:
decoderFunc = func(buf []byte) (pmetricotlp.ExportRequest, error) {
req := pmetricotlp.NewExportRequest()
return req, req.UnmarshalJSON(buf)
}

default:
return pmetricotlp.NewExportRequest(), fmt.Errorf("unsupported content type: %s, supported: [%s, %s]", contentType, jsonContentType, pbContentType)
}

reader := r.Body
// Handle compression.
switch r.Header.Get("Content-Encoding") {
case "gzip":
gr, err := gzip.NewReader(reader)
if err != nil {
return pmetricotlp.NewExportRequest(), err
}
reader = gr

case "":
// No compression.

default:
return pmetricotlp.NewExportRequest(), fmt.Errorf("unsupported compression: %s. Only \"gzip\" or no compression supported", r.Header.Get("Content-Encoding"))
}

body, err := io.ReadAll(reader)
if err != nil {
r.Body.Close()
return pmetricotlp.NewExportRequest(), err
}
if err = r.Body.Close(); err != nil {
return pmetricotlp.NewExportRequest(), err
}
otlpReq, err := decoderFunc(body)
if err != nil {
return pmetricotlp.NewExportRequest(), err
}

return otlpReq, nil
}
23 changes: 23 additions & 0 deletions storage/remote/otlptranslator/README.md
@@ -0,0 +1,23 @@
## Copying from opentelemetry/opentelemetry-collector-contrib

This files in the `prometheus/` and `prometheusremotewrite/` are copied from the OpenTelemetry Project[^1].

This is done instead of adding a go.mod dependency because OpenTelemetry depends on `prometheus/prometheus` and a cyclic dependency will be created. This is just a temporary solution and the long-term solution is to move the required packages from OpenTelemetry into `prometheus/prometheus`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am the author the normalization of metrics and labels of the Prometheus exporters in the OpenTelemetry Contrib project.

Several comments:

  • I hope you guys intend to enable the code that used to be behind the feature gate, to make sure Otel metrics are converted to proper Prometheus metrics that follow the metrics naming conventions (with unit, etc.).
  • OpenTelemetry plans to use namespaces everywhere in metric attributes, which will inevitably lead to very long labels in Prometheus. We may want to consider dropping the namespace prefixes in Otel attributes to get "simpler" labels.
  • To improve the performance, I wanted to create a sort of cache in the the translation package, in the form of a dictionary associating a triplet (metric name/type/unit) to the corresponding Prometheus metric name. I think that would be a nice addition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the original package and the review!

I hope you guys intend to enable the code that used to be behind the feature gate

Already done :) We don't want our users to disable the feature gate, hence its removed, and the functionality is enabled by default.

We may want to consider dropping the namespace prefixes in Otel attributes to get "simpler" labels.

Interesting, can you give an example?

I wanted to create a sort of cache in the the translation package, in the form of a dictionary associating a triplet (metric name/type/unit) to the corresponding Prometheus metric name.

That sounds great. We're using the same package in Mimir and I think we're seeing CPU cycles being spent on the string replaces. Looking it up using a cache sounds good. cc @bboreham @charleskorn

Do you still have plans to implement it upstream? If no, could you open an issue to track it, and we'll try to get someone to work on it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otel Technical Committee recently decided to enforce namespace prefixes for all metrics attributes. All metrics semantic conventions from OpenTelemetry are expected to be updated accordingly in the future.

For example, for JVM metrics: the process.runtime.jvm.gc.duration metric will come with the process.runtime.jvm.gc.name
and process.runtime.jvm.gc.action attributes. This will end up as process_runtime_jvm_gc_duration_seconds_counter{process_runtime_jvm_gc_name="...", process_runtime_jvm_gc_action="..."} in PromQL.

We could decide to simplify process_runtime_jvm_gc_name and process_runtime_jvm_gc_action labels to name and action respectively by removing the matching namespaces in metric name and its labels.

The benefit is that we get metrics and labels that look much like "classic" Prometheus metrics and labels. The problem is that the behavior could be counter-intuitive and difficult to predict in some cases (which labels get simplified or not?).

IMHO, everything would be better if OpenTelemetry didn't enforce namespaces in metrics attributes, but that boat has sailed 😅

Regarding the cache mechanism, I still had hopes to eventually implement it, but who am I kidding, I've been considering this for a year now, and haven't done a thing. I'll open an issue here ;-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New issue: #12627

We don't copy in `./prometheus` through this script because that package imports a collector specific featuregate package we don't want to import. The featuregate package is being removed now, and in the future we will copy this folder too.

To update the dependency is a multi-step process:
1. Vendor the latest `prometheus/prometheus`@`main` into [`opentelemetry/opentelemetry-collector-contrib`](https://github.com/open-telemetry/opentelemetry-collector-contrib)
1. Update the VERSION in `update-copy.sh`.
1. Run `./update-copy.sh`.

### Why copy?

This is because the packages we copy depend on the [`prompb`](https://github.com/prometheus/prometheus/blob/main/prompb) package. While the package is relatively stable, there are still changes. For example, https://github.com/prometheus/prometheus/pull/11935 changed the types.
This means if we depend on the upstream packages directly, we will never able to make the changes like above. Hence we're copying the code for now.

### I need to manually change these files

When we do want to make changes to the types in `prompb`, we might need to edit the files directly. That is OK, please let @gouthamve or @jesusvazquez know so they can take care of updating the upstream code (by vendoring in `prometheus/prometheus` upstream and resolving conflicts) and then will run the copy
script again to keep things updated.

[^1]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/translator/prometheus and https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/translator/prometheusremotewrite
41 changes: 41 additions & 0 deletions storage/remote/otlptranslator/prometheus/normalize_label.go
@@ -0,0 +1,41 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package normalize

import (
"strings"
"unicode"
)

// Normalizes the specified label to follow Prometheus label names standard
//
// See rules at https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
//
// Labels that start with non-letter rune will be prefixed with "key_"
//
// Exception is made for double-underscores which are allowed
func NormalizeLabel(label string) string {
// Trivial case
if len(label) == 0 {
return label
}

// Replace all non-alphanumeric runes with underscores
label = strings.Map(sanitizeRune, label)

// If label starts with a number, prepend with "key_"
if unicode.IsDigit(rune(label[0])) {
label = "key_" + label
}

return label
}

// Return '_' for anything non-alphanumeric
func sanitizeRune(r rune) rune {
if unicode.IsLetter(r) || unicode.IsDigit(r) {
return r
}
return '_'
}
@@ -0,0 +1,19 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package normalize

import (
"testing"

"github.com/stretchr/testify/require"
)

func TestSanitizeDropSanitization(t *testing.T) {
require.Equal(t, "", NormalizeLabel(""))
require.Equal(t, "_test", NormalizeLabel("_test"))
require.Equal(t, "key_0test", NormalizeLabel("0test"))
require.Equal(t, "test", NormalizeLabel("test"))
require.Equal(t, "test__", NormalizeLabel("test_/"))
require.Equal(t, "__test", NormalizeLabel("__test"))
}