[RFC] Integrations in Grafana Agent Operator #1224

Merged: 13 commits, Feb 18, 2022

Member

@rfratto rfratto commented Jan 4, 2022

This RFC proposes a way to add support for integrations to Grafana Agent Operator by:

  • Defining a new Integration CRD, which specifies an integration to run.
  • Updating GrafanaAgent to discover Integrations and run them.

Supersedes #883.

@rfratto rfratto added the proposal Proposal or RFC label Jan 4, 2022
@rfratto rfratto requested review from rlankfo and rgeyer January 4, 2022 17:06
@rfratto
Member Author

rfratto commented Jan 4, 2022

@flokli I know you've used the operator in the past and given valuable feedback, so just giving you a heads up for this in case you wanted to weigh in :)

Contributor

@flokli flokli left a comment

Some comments. Thanks for the highlight!

docs/rfcs/xxxx-integrations-in-operator.md
Comment on lines 128 to 132
Some integrations may require changes to the deployed Pods to function
properly. MetricsIntegrations will additionally support declaring `volumes`,
`volumeMounts`, `secrets` and `configMaps`. These fields will be merged with
the fields of the same name from the root GrafanaAgent resource when creating
integration pods.
Contributor

What would this look like? Will I need to pass the credential inline inside the MetricsIntegration resource? Or what would the syntax to refer to a secret from a ConfigMap look like?

Maybe an example with credentials should be used.
I skimmed https://grafana.com/docs/grafana-cloud/integrations/integrations/integration-github/, and there the token is simply provided inline.

Member Author

Right, (unfortunately) not all of the exporters we embed support file-based secrets. If an integration doesn't support it, you'd have to declare the secret verbatim in the YAML block. But you could use it for things like kafka_exporter:

apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsIntegration
metadata:
  name: kafka
  namespace: default
spec:
  name: kafka_exporter
  type: normal
  config: |
    ca_file: /etc/grafana-agent/secrets/kafka-ca-file
    # ... 
  # Same "secrets" field present in GrafanaAgent.spec, where each secret 
  # is loaded from the same namespace and gets exposed at 
  # /etc/grafana-agent/secrets/<secret name>
  secrets: [kafka-ca-file]

Comment on lines 16 to 29
available in Grafana Agent. This document proposes adding support for
integrations into the Grafana Agent Operator.
Contributor

I'm struggling to properly define what an "Integration" actually /is/.

https://grafana.com/docs/grafana-cloud/integrations/integrations/ seems to contain a mix of scraping configs, "grafana-agent integrations" (like the github one) and things you can "install" into your grafana installation (etcd, cloudwatch integration).

For now I'll just assume the second type in my list is what this proposal is about, but the "integrations" name is really used for a bunch of different things that should probably have distinct names.

Member Author

Agreed that naming is hard :) In the context of Grafana Agent, an integration is something that helps you generate telemetry data. Historically that's been embedded Prometheus exporters, but people are starting work on other types of integrations for things like logging.

I've tried to clarify this in the doc.

docs/rfcs/xxxx-integrations-in-operator.md
Comment on lines 73 to 76
* `daemonset`: Declares that the `name` integration should be run on every Node in the
Kubernetes cluster. It is invalid to have a GrafanaAgent CRD discover more
than one `daemonset` integration with the same `name`. Example integration:
`node_exporter`.
Contributor

While practical for sure, I'm not convinced we should ship all those exporters in the grafana-agent binary at all. It's probably perfectly okay to tell people to install prometheus-node-exporter from https://prometheus-community.github.io/helm-charts with the values necessary to get monitors installed and labels into the shape dashboards expect (and to document this prominently).

Member Author

We'll probably always want to continue embedding some exporters in the agent, since it makes things easier for non-Kubernetes users, which is still a userbase we're interested in helping.

Though if an exporter only makes sense for Kubernetes (or isn't popular enough to justify embedding it), we would definitely prefer recommending installing that exporter with Helm and using a ServiceMonitor/PodMonitor/Probe for collecting metrics from it instead of embedding it directly into the agent.

@flokli
Contributor

flokli commented Jan 4, 2022

Orthogonal question: Tracing wouldn't be a daemonset integration, but its own thing, right?

I currently manage that daemonset manually; it isn't under the operator's control yet.

@rfratto
Member Author

rfratto commented Jan 4, 2022

Orthogonal question: Tracing wouldn't be a daemonset integration, but its own thing, right?

I currently manage that daemonset manually; it isn't under the operator's control yet.

Right, Tracing support would be its own thing with its own set of CRDs at some point. The agent has four "subsystems":

  1. Metrics
  2. Logs
  3. Integrations
  4. Traces

We've added support for the first two subsystems in the operator, and this adds support for the third. Traces would be next up after this.

(and thanks for reviewing this!)

rgeyer previously approved these changes Jan 7, 2022
Contributor

@rgeyer rgeyer left a comment

Slow to respond, but this LGTM.

@rfratto
Member Author

rfratto commented Feb 16, 2022

I've been trying to find a way to support integrations in the operator since September, and the most recent attempt at implementing this proposal revealed yet more issues.

My biggest takeaway is that integrations are really complicated, and we need to take baby steps towards a nicer solution. I've just pushed a third attempt at this proposal which supports all integrations in a generic way, but does so at the expense of some dramatic tradeoffs. It's not perfect, but I can't think of a simpler solution that we can build on top of.

PTAL @hjet @captncraig @rlankfo

@rfratto rfratto dismissed rgeyer’s stale review February 16, 2022 19:19

Rejecting approval because the nature of the proposal has changed


* `allNodes`: True when the `name` integration should run on all Kubernetes
Nodes.
* `unique`: True when the `name` integration must be unique across a
Contributor

To motivate this, maybe provide a sample use case here, or an example of an integration where you'd need this to be the case.

Contributor

Also, what happens if you say allNodes: true and unique: true? Not all combinations make sense. Maybe a single field with three possibilities is the way to go (although I don't love "singleton" to describe it either).

Member Author

Here's a truth table for relevant integrations:

| allNodes | unique | example integration |
|----------|--------|---------------------|
| false    | false  | mysqld_exporter     |
| false    | true   | statsd_exporter     |
| true     | false  | process_exporter    |
| true     | true   | node_exporter       |

Some of the table is arbitrary; there's no reason you couldn't deploy statsd_exporter on all nodes.

It's also worth noting that allNodes has no bearing on the generated agent config, while unique does.
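
As a rough illustration of how the two booleans could be consumed, the Go sketch below keeps them independent: `allNodes` only selects the Kubernetes workload, while `unique` affects validation and the generated config. Everything here is an assumption made for illustration, including the type names, the DaemonSet/Deployment split, and the choice of a plain `<name>` key for unique integrations versus a `<name>_configs` list otherwise; the thread does not specify these details.

```go
package main

import "fmt"

// IntegrationType mirrors the two booleans discussed in this thread.
type IntegrationType struct {
	AllNodes bool
	Unique   bool
}

// Integration is a hypothetical, trimmed-down view of a discovered resource.
type Integration struct {
	Name string
	Type IntegrationType
}

// workloadKind picks the Kubernetes workload for an integration.
// Assumption: allNodes maps to a DaemonSet, anything else to a Deployment.
func workloadKind(t IntegrationType) string {
	if t.AllNodes {
		return "DaemonSet"
	}
	return "Deployment"
}

// configKey picks the key used in the generated agent config.
// Assumption: unique integrations use "<name>", others use "<name>_configs".
func configKey(i Integration) string {
	if i.Type.Unique {
		return i.Name
	}
	return i.Name + "_configs"
}

// validateUnique rejects a set in which a unique integration appears twice.
func validateUnique(discovered []Integration) error {
	seen := make(map[string]bool)
	for _, i := range discovered {
		if !i.Type.Unique {
			continue // non-unique integrations may repeat freely
		}
		if seen[i.Name] {
			return fmt.Errorf("integration %q is unique but was discovered more than once", i.Name)
		}
		seen[i.Name] = true
	}
	return nil
}

func main() {
	discovered := []Integration{
		{Name: "node_exporter", Type: IntegrationType{AllNodes: true, Unique: true}},
		{Name: "mysqld_exporter"},
		{Name: "mysqld_exporter"},
	}
	if err := validateUnique(discovered); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	for _, i := range discovered {
		fmt.Printf("%s: %s, config key %s\n", i.Name, workloadKind(i.Type), configKey(i))
	}
}
```

Under these assumptions all four combinations of the two fields remain meaningful, because neither field constrains the other.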

Member

@rlankfo rlankfo left a comment

LGTM 👍

I think we can iterate on this to make configuration easier for users in the future, but doing so would likely involve some trade-offs you've already considered (e.g., the operator being aware of specific integrations and how to handle them).

Contributor

@hjet hjet left a comment

LGTM as well! This is a great start and, like @rlankfo said, we can incrementally hide more config from the user and build more validation logic into the operator. In most cases we'll be generating these for our users anyway.

Contributor

@captncraig captncraig left a comment

I like it. My biggest point of confusion is just the "unique" constraint and what it means. I'd like to make sure we have the correct terminology to make that distinction clearer.

@rfratto
Member Author

rfratto commented Feb 18, 2022

I like it. My biggest point of confusion is just the "unique" constraint and what it means. I'd like to make sure we have the correct terminology to make that distinction clearer.

Yeah, I've been struggling with the naming convention too. I'll merge the RFC as-is in Draft, and we can revisit brainstorming names to use consistently across the operator and metrics-next before this is included in a release.

@rfratto
Member Author

rfratto commented Feb 18, 2022

I'm assigning this RFC-0002.

@rfratto rfratto enabled auto-merge (squash) February 18, 2022 17:46
@rfratto rfratto enabled auto-merge (squash) February 18, 2022 17:50
@rfratto rfratto merged commit 6cb2cac into grafana:main Feb 18, 2022
mattdurham added a commit that referenced this pull request Feb 25, 2022
* Update node_exporter dependency to v1.3.1 (#1228)

* Add node_exporter to depcheck

* update weaveworks/common dependency

* map current release flags and changed defaults

* documentation

* revert accidental checkin

* print out flags when node_exporter test fails to assist debugging

* oops, i introduced some flags from master by mistake

* Introduce experimental integrations revamp (#1198)

* [dev.multiple-integrations] Enable present integrations by default, deprecate enabled field (#1062)

* integrations: default to enabled by default

* document deprecation of enabled

* pkg/integrations: support *_configs field for integrations (#1130)

Creates the basic code to unmarshal integrations from a YAML field
called <integration name>_configs, which is a slice of that integration.

Note that this is NOT wired up to the integrations manager yet, and
trying to run the agent with more than one integration of the same type
will likely cause problems.

* [dev.multiple-integrations] Prototype new integrations subsystem (#1142)

* wip: prototype new integrations subsystem

* implement Controller with basic logic for Integration and UpdateIntegration

* Implement HTTPIntegration for Controller

* decouple controller and subsystem

* don't have controller implement integration

slightly less smelly now

* multiplexer integration

* rely on boilerplate for multiplexing for now

generics would be nice here

* remove multiplex_integration.go

Also a little code smelly. Instead of having integrations that run other
integrations, I'm going to fall back to having only one controller.

* introduce Subsystem, unexport Controller

start wiring up things to Subsystem

* introduce v2 agent integration to use for testing

* start wiring metrics integrations

* rename Options to Globals

call a spade a spade

* add subsystem options to globals

* remove dead code

* metricsutils: calculate self-scraping based on globals

* complete HTTP target API

* working example with agent integration

* appease the linter

* don't return an error when context to cancel an integration is closed

* once again i am asking the linter to forgive my typos

* fix bug where labels from individual targets were getting dropped at the API endpoint

* pkg/config: fix broken test

* finish unit tests for integrations v2 controller

* metricsutil/metricshandler_integration: make job name unique

Before this change, the job name would have collided when using multiple
instances of the same integration.

* ensure that global subsystem labels are injected into targets

* integrations/v2: Infer target hostname from SD API host (#1175)

* [dev.multiple-integrations] integrations/v2: allow shimming between v1 and v2 integrations. (#1179)

* integrations/v2: allow shimming between v1 and v2 integrations.

Shimming is done by changing how the integration registration works; a
new RegisterDynamic was added that allows for creating Configs at
runtime. Here be dragons; this should be removed whenever we no longer
have a need for it.

* fix lint

* pkg/integrations/v2: use "RegisterLegacy" instead of a generic mechanism

* fine, I won't add the deprecation notice if it will make the linter sad

* pkg/integrations: re-align (#1181)

This commit reverts 69ba2dd in favor of
allowing the new subsystem to handle multiple instances of integrations.

This commit also removes the wal_truncate_frequency field from
integrations as it is the only field from old integrations that does not
have a current counterpart.

* [dev.multiple-integrations] Hide integrations/v2 behind a feature flag (#1185)

* feature flag wip

* dynamically switch between integrations v1 and v2

default to v1.

* pkg/integrations/versionselector to file in pkg/config

* pkg/config: fix defaults for Integrations

* pkg/config: use more generic way to unmarshal differently based on flag

* add missing godoc comment

* more comments

* switch to deferred unmarshaling

* remove unused Config field

* simplify completeUnmarshal

* do not perform lazy deferred unmarshaling

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

* Merge main into dev.multiple-integrations (#1184)

* Fix typo (#1141)

* Traces: Improved pod association in PromSD processor (#1137)

* Improve k8s pod association

* Add tests

* Changelog

* typo

* Add prom_sd_pod_association

* Extend tests for pod associations

* Docs for pod association config

* Lint fixes

* Move to unreleased

* Add instrumentation recommendations

* Remove unnecessary constants

* Improve tests

* remote config with http(s) provider (#1143)

* sample remote config code with http provider

* use t.TempDir() in unit test

* no need to clean up after T.TempDir()

* use NewClientFromConfig and make caller responsible for calling SetDirectory

* handle nil HTTPClientConfig

* remove blank identifier assignment

* pass basic auth command line flags for remote config

* address pr nits

* add experimental flag

* set loader inline

* update changelog

* add remote config section in docs

* pr comment updates

* announce patch releases for cve-2021-41090 (#1152)

* Merge patch release to main (#1153)

* Add secret type to sensitive values

* Break out config tests to their own implementation. Also remove username as a sensitive value.

* Update changelog

* Fix failing test

* Scrub secrets when marshaling instance configs

* update for v0.21

* Updated changes from the merge.

* Remove changelog

* Scrub out receivers as ***receivers_scrubber***:null

* obscure etcd/consul credentials

* Update pkg/traces/config_test.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Update pkg/config/config.go

* go fmt

* Change to using custom object and return <secret>

* Fix bad merge

* [v0.21.2] toggle config endpoint (#19)

* disable /-/config endpoint by default

* disable scraping api get endpoint as well

* fix new test

* add test and rename flag

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Update version to v0.21.2

* Update defaults.go

* fix /-/config endpoint

* also fix non-pointer config bug

* temporarily disable linting for release

* fix lint errors

Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter (#1162)

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter

A recent change broke the usage of POSTGRES_EXPORTER_DATA_SOURCE_NAME for the postgres_exporter.
As the incorrect variable was checked in the if clause, it always raises an error.

* changelog: keep feature -> enhancement -> bugfix order

* postgres_exporter: add regression test

Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix syntax error in Jsonnet logs helper method (#1174)

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* cAdvisor Integration (#1081)

* Add cadvisor module

* Begin creating common config for cadvisor

* Don't export internal state

* Finish config options for cadvisor

* Set config options, and implement cAdvisor collectors

* Linting

* Buildflags for cadvisor only in linux

* I R LEArN Build Tags

* Don't zero value the zero value

* Offload sketchy global var manipulation to the integrations Run func

* Remove unused collectors

* Lint

* Create generic stub integration and use it for cadvisor

* Lint

* Final refactor of cAdvisor config for unsupported platforms. Pared down stub integrations.

* Lint

* Docs for cadvisor config

* Update changelog

* Update pkg/integrations/stub_integration.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Reorder changelog

* Instance key clarity

* Inclusive naming

* Finish name changes

Keep default disable metric list in sync with upstream

Idiomatic golang

* Hardcode disabled metrics for cadvisor

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Remove log-level flag from systemd unit file (#1177)

* Upgrade to OTel v0.40.0 (#1176)

* Upgrade to OTel v0.40.0

* Changelog

* Add factories check

* go mod tidy

* config/features: create package to standardize experimental features (#1170)

* config/features: create package to standardize experimental features

This commit introduces a new package, pkg/config/features, which allows
defining a set of features and validating whether flags associated with
those features are allowed to be set.

Closes #1163
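
The idea of gating flags behind features can be sketched with the standard library `flag` package alone. This is a minimal sketch under stated assumptions, not the pkg/config/features API: the `Feature` and `dependency` types, the `validate` and `run` helpers, the `remote-configs` feature name, and the `config.url` flag are all hypothetical; only the `enable-features` flag name appears in this changelog.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// Feature names an experimental feature.
type Feature string

// dependency ties a flag to the feature that must be enabled before the
// flag may be set.
type dependency struct {
	Flag    string
	Feature Feature
}

// parseFeatures splits a comma-separated feature list, skipping empty items.
func parseFeatures(s string) []Feature {
	var out []Feature
	for _, part := range strings.Split(s, ",") {
		if part = strings.TrimSpace(part); part != "" {
			out = append(out, Feature(part))
		}
	}
	return out
}

// validate returns an error if a gated flag was explicitly set on fs while
// its feature is missing from enabled.
func validate(fs *flag.FlagSet, enabled []Feature, deps []dependency) error {
	on := make(map[Feature]bool, len(enabled))
	for _, f := range enabled {
		on[f] = true
	}
	required := make(map[string]Feature, len(deps))
	for _, d := range deps {
		required[d.Flag] = d.Feature
	}
	var err error
	// Visit only walks flags that were explicitly set, so defaults never trip the check.
	fs.Visit(func(f *flag.Flag) {
		feature, gated := required[f.Name]
		if err == nil && gated && !on[feature] {
			err = fmt.Errorf("flag -%s requires feature %q to be enabled", f.Name, feature)
		}
	})
	return err
}

// run parses args and validates them against one hypothetical gated flag.
func run(args []string) error {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	enable := fs.String("enable-features", "", "comma-separated list of experimental features")
	fs.String("config.url", "", "hypothetical flag gated behind remote-configs")
	if err := fs.Parse(args); err != nil {
		return err
	}
	deps := []dependency{{Flag: "config.url", Feature: "remote-configs"}}
	return validate(fs, parseFeatures(*enable), deps)
}

func main() {
	fmt.Println(run([]string{"-config.url=http://example.invalid"}))
	fmt.Println(run([]string{"-enable-features=remote-configs", "-config.url=http://example.invalid"}))
}
```

Checking only explicitly set flags means a gated flag can keep a non-empty default without forcing every user to enable the feature.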

* update documentation

(also s/enabled-features/enable-features)

* Fix typo

* Update pkg/config/features/features.go

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Revert "Merge main into dev.multiple-integrations (#1184)" (#1189)

This reverts commit ad76ec5.

* [dev.multiple-integrations] Revert breaking changes to existing integrations (#1191)

* revert breaking changes to integrations v1

This commit reverts #1062 in favor of making breaking changes directly
in integrations-next instead. The part of #1181 to remove
`wal_truncate_frequency` has also been reverted.

As part of this change, the enabled field is removed from the v2
common metrics configs, and v2 integrations can no longer be disabled.
v2 integrations can only be disabled by removing them from the YAML.

* integrations/v2: remove stale reference to ErrDisabled

(fix typo too)

* integrations/v2: bring in common config decoupling

* [dev.multiple-integrations] Introduce autoscraper (#1195)

* pkg/integrations/v2: introduce self-scraping

* linting

* [dev.multiple-integrations] Multiple instances of integrations (#1196)

* multiple instances of integrations

opt in relevant v1 integrations into supporting multiple instances

* shims should check for instance key override

* Document integrations-next (#1197)

* document integrations-next

* remove json tags since they make markdown unhappy

* changelog

* s/Run/RunIntegration

* remove stale comment about integrations.controller purpose

* create dedicated run method for instanceScraper

* s/expoter/exporter/g

* Document why an autoscrape.Scraper manages a set of per-instance scrapers

* spell out prerequisite instead of pre-req

* use go.uber.org/atomic to make the code a little easier to follow

* remove started callback for running integration

* use smaller interface for autoscrape

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Fix panic when using 'stdout' in automatic logging (#1233)

* integrations-next: fix bug where v2 integrations were not being strictly unmarshaled (#1235)

* Remove jsonnet vendor folders (#1222)

* remove jsonnet vendor

This adds all vendor folders into .gitignore and removes cached vendor
files from the repository.

Closes #1221

* Update scripts and instructions for jsonnet vendor removal

* `make example-dashboards` will now also run `jb install`
* k3d environment instructions now include `jb install`
* smoke-test.bash will now run `jb install` prior to `tk apply`

* Fix link to k3d example in DEVELOPERS.md (#1242)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix node_exporter upgrade docs (#1239)

* Fix panic in automatic logging with stdout backend (#1243)

* pkg/util: support custom yaml.Unmarshaler implementations for util.UnmarshalYAMLMerged (#1244)

It's common for config types to implement yaml.Unmarshaler for:

* Applying defaults
* Applying extra logic post-unmarshal

If these config types were unmarshaled through util.UnmarshalYAMLMerged,
the yaml.Unmarshaler implementation would never complete successfully,
preventing the post-unmarshal logic from running.

This issue was introduced in #1192, but went unnoticed until #1228
implemented yaml.Unmarshaler to perform field migrations. #1240 reported
the issue.

This commit fixes the bug by performing a second non-strict unmarshal to
ensure that all input values unmarshal successfully, with the exception
of unmarshal errors unrelated to unrecognized field names.

This is hacky, but it's worthwhile noting that util.UnmarshalYAMLMerged
is a temporary workaround needed for the integrations-next migration,
and will eventually be removed.

* Update k3d example grafana/grafonnet-lib version (#1246)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Create an e2e framework with support for running tests against k8s (#1234)

* e2e: create an e2e framework with support for running tests against a k3d cluster

* add new E2E drone job

* E2E tests should pass when doing a release

* sign drone.yml again

* move e2e lint to different step that has golangci-lint installed

* upgrade golangci-lint and go for e2e test

* e2e: add gcc

* E2E: install build-essential to get a working full gcc env

* :(

* e2e: support running from inside of docker

* fix lint error

* address review feedback

* Operator: fix bug where /-/ready and /-/healthy always returned 404 (#1252)

* operator: fix bug where /-/ready and /-/healthy always returned 404

controller-runtime must have at least one ready/healthy check for the endpoints to exist

* fix lint error, use healthz.Ping

* Make scraping-svc use the new `metrics:` key (#1259)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* update prometheus dependency (#1260)

* corrected typo (#1265)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags (#1264)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use :main tag for images in smoke test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Set IMAGE_BRANCH_TAG env var in drone and actions pipelines

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove quotes from Makefile variable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove force_release action

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* prepare for v0.22.0 release (#1266)

* prepare for v0.22.0 release

* remove E2E pipeline

* Add basic testing framework for operator (#1268)

* remove dedicated go.mod for e2e/

* move e2e/k8s to pkg/util/k8s

* Migrate operator tests to pkg/util/k8s

* remove dedicated e2e tests

* allow skipping TestCluster in pkg/util/k8s

* remove e2e/

* fix bad merge

* fix order of make env args for windows

* actually declare referenced docker volume

* introduce pkg/util/subset for asserting subset of objects

* refactor operator so it's testable

* define basic integration test for operator

* fix lint errors

* fix invalid address in operator test config

* Update release-note.md (#1267)

* Set scrape User-Agent header during init (#1274)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Upgrade to Go 1.17 (#1278)

* Upgrade to 1.17.6 in go.mod and Dockerfiles

* Update CHANGELOG.md to mention the update

* Update Go version in drone/actions pipelines

* Update go.mod, go.sum files via

* Re-sign drone.yml

* Remove leading newline causing drone build to fail

* Bump golangci-lint image to a version using Go 1.17

* Re-attempt to solve linter issue with new golangci-lint image

* Remove suffix of exclude rules

* Clean previous Go version before unpacking Go 1.17

* Also clean up previous Go versions in other steps

* fix typo (#1284)

* Use custom Go version in agent-operator Dockerfile (#1286)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* pkg/operator: refactor resource hierarchy discovery (#1271)

* pkg/operator: refactor resource hierarchy discovery

This commit moves common logic related to discovering the resource
hierarchy to pkg/operator/hierarchy. This new package requires less
boilerplate, which the reconciler is updated to take advantage of.

* remove unused code

* test construction of resource hierarchy

* add missing build constraints

* small extra cleanup to use pointer package

* review feedback

* update agent-build-image for go 1.17 (#1287)

(also use a consistent base image tag instead of latest)

* Skip non-ready entries when listing instances (#1289)

* Skip non-ready instances in LoadInstances()

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix panic in prom_sd_processor when address is empty (#1279)

* Fix panic in prom_sd_processor when address is empty

* Fix panic in prom_sd_processor when address is empty

* Fix docs

* Add test case

* Lint

* Move to unreleased

* Operator: generate proxy_url for remote_write (#1298)

* operator: generate proxy_url for remote_write

* fix weird indentation in test

* Use log format in traces subsystem (#1272)

* Use log format in traces subsystem

* Changelog

* Undo unwanted change

* Fix changelog entry

* integrations-next: Add extra_labels to inject extra labels for an integration (#1312)

* integrations-next: Add extra_labels to inject extra labels for an integration.

* separate tests

* Fix anchor link on operator docs (#1302)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* updated config URL (#1304)

The existing URL returns a 404: https://grafana.com/docs/agent/latest/getting-started/configuration/_index.md 
Updated to https://grafana.com/docs/agent/latest/configuration/

* Fix typo in node_exporter (#1325)

* Allow remote_write URL credentials (#1329)

* Bypass Prometheus password redaction

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add inline secret in existing test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

* Add to scrubbed testcase as well

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Stop appending duplicate exemplars (#1316)

* Add memExemplar in stripeSeries as first iteration

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add test for skipped duplicate exemplars; Simplify conditional

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry; discard test errors

* Move changelog entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add Benchmark for AppendExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Discard error on added benchmark

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use original exemplar struct instead of custom memExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Surround benchmark loop with start/stop timers and close test storage

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add comment about prepopulating exemplars on WAL startup

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in the totalAppendedExemplars metric

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make comment more discoverable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make sure we're recording exemplars for non-nil series ref only

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* integrations-next: wait for integrations to exit after stopping them (#1318)

* integrations-next: wait for integrations to exit after stopping them

* fix lint errors

* minor refactor

* integrations-next: stop holding config mutex for entire reload

* make controller.run authoritative over running integrations

* fix log line

* move running integrations into a dedicated worker pool
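
A rough sketch of the "wait for integrations to exit after stopping them" pattern from #1318 is given here. It assumes a hypothetical `pool` type built on `context` cancellation and `sync.WaitGroup`; the names are illustrative and are not taken from the agent's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// pool is a hypothetical worker pool that runs integrations.
type pool struct {
	ctx    context.Context
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

func newPool() *pool {
	ctx, cancel := context.WithCancel(context.Background())
	return &pool{ctx: ctx, cancel: cancel}
}

// Run starts fn in its own goroutine and tracks it until it returns.
func (p *pool) Run(fn func(ctx context.Context)) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		fn(p.ctx)
	}()
}

// Stop signals every worker to stop and blocks until all have returned.
func (p *pool) Stop() {
	p.cancel()
	p.wg.Wait()
}

// countExits runs n workers, stops the pool, and returns how many workers
// had finished by the time Stop returned.
func countExits(n int) int {
	p := newPool()
	var mu sync.Mutex
	exited := 0
	for i := 0; i < n; i++ {
		p.Run(func(ctx context.Context) {
			<-ctx.Done() // block until Stop cancels the context
			mu.Lock()
			exited++
			mu.Unlock()
		})
	}
	p.Stop()
	return exited
}

func main() {
	fmt.Println("workers exited before Stop returned:", countExits(3))
}
```

Because `Stop` waits on the `WaitGroup`, a caller that reloads configuration afterwards can assume no old worker is still running.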

* operator/hierarchy: stop using field selector when listing Secrets & ConfigMaps (#1340)

The initial implementation of hierarchy.KeySelector injected a
FieldSelector when listing Secrets and ConfigMaps to immediately return
the single object being queried for.

This causes a problem with the client generated by the
controller-runtime framework, where the client is wrapped in a cache and
field indexer (where only the namespace is indexed by default).

This commit avoids using the field selector and the index lookup. The
resulting behavior aligns more closely with discovering other resources
in the hierarchy (i.e., ServiceMonitors), where the List call is also
insufficient and needs post-processing via Matches to find the final
list of resources.

Given the controller-runtime client uses an informer for reads, all
relevant Secrets and ConfigMaps are already in-memory anyway, and using
the index for a faster List is a bit of an over-optimization at the
moment.
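
The "list, then post-filter with Matches" approach described in #1340 can be illustrated with a small, dependency-free sketch. The `object` type, the fields of `keySelector`, and the helper functions are assumptions made for illustration; only the names `KeySelector` and `Matches` come from the commit message, and the real implementation works against the controller-runtime client rather than a slice.

```go
package main

import "fmt"

// object is a hypothetical, simplified stand-in for a Secret or ConfigMap.
type object struct {
	Namespace, Name string
}

// keySelector identifies a single object by namespace and name.
type keySelector struct {
	Namespace, Name string
}

// Matches reports whether o is the object the selector refers to.
func (s keySelector) Matches(o object) bool {
	return o.Namespace == s.Namespace && o.Name == s.Name
}

// listNamespace stands in for a cached List call that can only narrow
// results by namespace.
func listNamespace(cache []object, ns string) []object {
	var out []object
	for _, o := range cache {
		if o.Namespace == ns {
			out = append(out, o)
		}
	}
	return out
}

// find lists by namespace and then post-filters the result with Matches,
// instead of asking the List call for a single named object.
func find(cache []object, sel keySelector) []object {
	var out []object
	for _, o := range listNamespace(cache, sel.Namespace) {
		if sel.Matches(o) {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	cache := []object{{"default", "a"}, {"default", "b"}, {"other", "a"}}
	fmt.Println(find(cache, keySelector{Namespace: "default", Name: "a"}))
}
```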

* Add dependabot to update go modules and github actions. (#1217)

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* smoke framework refactor (#1326)

* Agent smoke test (#1291)

* convert smoke script to go program

* update build for agent-smoke

* fix pr comments

* use existing log helper package

* refactor context cancel

* update exit codes

* use ticker

* prefer oklog/run instead of errgroup

* use nop logger

* refactor task interface

* remove functional options

* log.With for task loggers

* move smoke to tools

* build smoke image, push to internal registry

* move crow to tools

* add gcr_admin secret

* fix link to crow

* add smoke libsonnet and use in local k3d smoke test

* add deletePodBySelectorTask

* scale smoke-test replica down after local test

* refactor smoke Options to Config

* update duration usage message

* add some basic unit tests

* newlines

* pass mutation frequency and chaos frequency from smoke script

* pull crow image from gcr

* update smoke script

* move monitoring to smoke libsonnet

* move additional smoke resources needed in deployment tools

* reference libsonnet files from grafana-agent dep

* make drone

* fix images in smoke script

* get rid of extVars

* update k3d example environment to reference etcd from new location

* update smoke docker builds to use go1.17

* use pointer.Int64

* refactor smoke jsonnet (#1296)

* add policy rule for list and delete pods (#1319)

* refactor smoke.new function to take config object (#1327)

* Apply suggestions from code review

* Update production/tanka/grafana-agent/smoke/crow/main.libsonnet

* Update production/tanka/grafana-agent/smoke/main.libsonnet

* Update example/k3d/scripts/smoke-test.bash

Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* readme update (#1338)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Correct link to the configuration (#1036)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Add stale check Github Action (#1345)

* Add a stale check GH action to run every 24 hours

* remove old stale.yml file

* add permissions to action

* update the stale message to clarify when the stale label will get
removed

* Update .github/workflows/stale.yml

* stale action: fix missing indent (#1346)

* Fix mssql issue (#1351)

* Add K8s Events integration (#1330)

* Add K8s eventhandler integration (#1310)
* Add docs and sample manifests to eventhandler integration (#1328)
* Wait for cache to flush before returning
* Clarify eventhandler docs (#1334)
* Clarify docs
* Update CHANGELOG.md
* Review changes (#1349)

* stale action: fix typo in label exemptions (#1347)

* update withVolumesMixin for agent jsonnet (#1358)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Configure cluster label using logs client external_labels param (#1357)

* Configure cluster label using logs client external_labels param
* Update CHANGELOG.md

* add password file and basic auth round tripper in crow (#1361)

* add password file and basic auth round tripper in crow

* add ca-certificates in crow image

* add orgID flag

* update help text

* default send_exemplars to true in remote_write (#1352)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update eventhandler labels (#1368)

* Update eventhandler integration labels
* Update CHANGELOG
* Remove unnecessary kind label

* update changelog (#1374)

Remove BUGFIX entries that fix a bug introduced by main (i.e., bugs
which were never part of a release)

* Prepare for release of v0.23.0 (#1377)

* Update version references

* Fix fat-fingered delete; Remove mention of upgrade Go

* RFC: Design in the open (#1055)

* rfc: first draft of RFC0001

* add placeholder for PR

* update PR link

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify "designing in the open" is best-effort

* update 0001

* fix dead link in production/README.md

* add recommended sections for RFC proposals

* describe the process for approving a proposal

* ignore RFC template in link checker

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* do my nitty 80-char line length limit change

* indent pros/cons to a single section

* document process for superseding RFCs

* remove RFC mutability requirement

* add extra flavor around not recommending google docs

* require Google Doc -> RFC conversion

* move new files

Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Add Grafana Labs SECURITY.md (#1356)

Signed-off-by: Richard Hartmann <richih@richih.org>

* Add readiness check to metrics component (#1369)

* PR Base

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix autoscrape's mockInstance

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in atomic readiness check

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Add CHANGELOG.md entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Reference page to download windows installer (#1372)

Fixes #1366

* fix typo in node_exporter_config (#1389)

which should be `privileged` instead of `priviliged`

* Add option for Operator to pass arguments to GrafanaAgent #1227 (#1248)

* 1250 oauth2 tracing (#1386)

* Add OAuth support for the OTel trace exporter via opentelemetry-collector-contrib oauth2clientauthextension

* start extensions on collector instance startup

fix decoding to otelconfig

build extensions

add oauth extension to service map

* Update traces config documentation

* lint fixes

* fix godoc comments

* pass exporter index directly to exporter name generator

* PR feedback; Update Changelog

* sort extensions when sorting pipelines for testing determinism

* README: Fix link to agent logo (#1396)

* update MAINTAINERS.md (#1402)

* add smoke alerts to mixin; move local alerts into examples dir (#1397)

* add smoke alerts to mixin; move local alerts into examples dir

* add podPrefix for smoke test

* podPrefix in libsonnet config

* [RFC] Integrations in Grafana Agent Operator (#1224)

* rfc: integrations in grafana agent operator

Supersedes #883

* add missing links

* Apply suggestions from code review

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify how many daemonsets/deployments/service/secrets are created

* add example of defining secrets

* try defining integrations

* s/IntegrationsMonitor/IntegrationMonitor/g

* simplify proposal

* add alternatives

* remove old reference to `hasMetrics` field

* document example generated agent configuration file

* assign ID RFC-0002

* add missing PR link

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* add fake rw endpoint to smoke program (#1405)

* fix alerts typo (#1407)

* continuous delivery for smoke images (#1408)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* fix continuous delivery job errors (#1409)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* [operator] - Use _file variants for basic auth credentials. (#1411)

* use password_file alternatives in operator config

* update tests

* reduce smoke alert noise (#1412)

* reduce smoke alert noise

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update production/grafana-agent-mixin/alerts.libsonnet

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* update cpu check comment

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* add minimum load threshold to cpu alert

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Clarify usage of instanceNamespaceSelector (#1413)

* RFC-0001: Add status to RFC (#1391)

* rfc-0001: add rules for when RFC PRs should be merged

* use status field instead of merge to indicate state

* Parametrize logs DaemonSet K8s manifests (#1420)

* Parametrize logs daemonset K8s manifests
* Update CHANGELOG.md

* Extend linting configuration file (#1421)

* Add depguard linter to reject packages we tend to avoid
* Replace golint with revive, since golint is deprecated
* Remove interfacer, which is deprecated with no replacement
* Add makezero linter to detect misuse of make with append
* Add tenv to prefer t.Setenv over os.Setenv in tests
* Add whitespace to report unnecessary blank lines
* Ignore test files for errcheck

In addition to the above, the following changes were made:

* Remove settings that just re-set default values, instead pointing to the website to retrieve defaults.
* Simplify the errcheck rule to only include functions we actually need to ignore.
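
For context on the makezero entry above, the kind of mistake that linter targets looks roughly like the sketch below. The function names `buggy` and `fixed` are illustrative only.

```go
package main

import "fmt"

// buggy allocates n zero-valued elements and then appends, so the result
// holds n leading zeros followed by the real values.
func buggy(n int) []int {
	out := make([]int, n)
	for i := 0; i < n; i++ {
		out = append(out, i+1)
	}
	return out
}

// fixed allocates capacity only (length 0), so append fills from the start.
func fixed(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i+1)
	}
	return out
}

func main() {
	fmt.Println(len(buggy(3)), buggy(3))
	fmt.Println(len(fixed(3)), fixed(3))
}
```

The two functions differ only in the `make` call, which is why the mistake is easy to miss in review and worth catching mechanically.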

* Main merge changes

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>
Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Paschalis Tsilias <tpaschalis@users.noreply.github.com>
Co-authored-by: Patrick Koenig <pkoenig10@gmail.com>
Co-authored-by: DataPoints <langer.markus@gmail.com>
Co-authored-by: Alex <52292902+alexrudd2@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: melGL <81323402+melgl@users.noreply.github.com>
Co-authored-by: Tom Wilkie <tomwilkie@users.noreply.github.com>
Co-authored-by: Joseph Woodward <josephwoodward@xeuse.com>
Co-authored-by: hanif <hjet@users.noreply.github.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>
Co-authored-by: laiwei <laiwei.ustc@gmail.com>
Co-authored-by: Sam <shamsalmon@users.noreply.github.com>
Co-authored-by: Chris Knutson <christopher.knutson@gmail.com>
Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Craig Peterson <192540+captncraig@users.noreply.github.com>
mattdurham added a commit that referenced this pull request Feb 25, 2022
* Update node_exporter dependency to v1.3.1 (#1228)

* Add node_exporter to depcheck

* update weaveworks/common dependency

* map current release flags and changed defaults

* documentation

* revert accidental checkin

* print out flags when node_exporter test fails to assist debugging

* oops, i introduced some flags from master by mistake

* Introduce experimental integrations revamp (#1198)

* [dev.multiple-integrations] Enable present integrations by default, deprecate enabled field (#1062)

* integrations: default to enabled by default

* document deprecation of enabled

* pkg/integrations: support *_configs field for integrations (#1130)

Creates the basic code to unmarshal integrations from a YAML field
called <integration name>_configs, which is a slice of that integration.

Note that this is NOT wired up to the integrations manager yet, and
trying to run the agent with more than one integration of the same type
will likely cause problems.

* [dev.multiple-integrations] Prototype new integrations subsystem (#1142)

* wip: prototype new integrations subsystem

* implement Controller with basic logic for Integration and UpdateIntegration

* Implement HTTPIntegration for Controller

* decouple controller and subsystem

* don't have controller implement integration

slightly less smelly now

* multiplexer integration

* rely on boilerplate for multiplexing for now

generics would be nice here

* remove multiplex_integration.go

Also a little code smelly. Instead of having integrations that run other
integrations, I'm going to fall back to having only one controller.

* introduce Subsystem, unexport Controller

start wiring up things to Subsystem

* introduce v2 agent integration to use for testing

* start wiring metrics integrations

* rename Options to Globals

call a spade a spade

* add subsystem options to globals

* remove dead code

* metricsutils: calculate self-scraping based on globals

* complete HTTP target API

* working example with agent integration

* appease the linter

* don't return an error when context to cancel an integration is closed

* once again i am asking the linter to forgive my typos

* fix bug where labels from individual targets were getting dropped at the API endpoint

* pkg/config: fix broken test

* finish unit tests for integrations v2 controller

* metricsutil/metricshandler_integration: make job name unique

Before this change, the job name would have collided when using multiple
instances of the same integration.

* ensure that global subsystem labels are injected into targets

* integrations/v2: Infer target hostname from SD API host (#1175)

* [dev.multiple-integrations] integrations/v2: allow shimming between v1 and v2 integrations. (#1179)

* integrations/v2: allow shimming between v1 and v2 integrations.

Shimming is done by changing how the integration registration works; a
new RegisterDynamic was added that allows for creating Configs at
runtime. Here be dragons; this should be removed whenever we no longer
have a need for it.

* fix lint

* pkg/integrations/v2: use "RegisterLegacy" instead of a generic mechanism

* fine, I won't add the deprecation notice if it will make the linter sad

* pkg/integrations: re-align (#1181)

This commit reverts 69ba2dd in favor of
allowing the new subsystem to handle multiple instances of integrations.

This commit also removes the wal_truncate_frequency field from
integrations as it is the only field from old integrations that does not
have a current counterpart.

* [dev.multiple-integrations] Hide integrations/v2 behind a feature flag (#1185)

* feature flag wip

* dynamically switch between integrations v1 and v2

default to v1.

* pkg/integrations/versionselector to file in pkg/config

* pkg/config: fix defaults for Integrations

* pkg/config: use more generic way to unmarshal differently based on flag

* add missing godoc comment

* more comments

* switch to deferred unmarshaling

* remove unused Config field

* simplify completeUnmarshal

* do not perform lazy deferred unmarshaling

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

* Merge main into dev.multiple-integrations (#1184)

* Fix typo (#1141)

* Traces: Improved pod association in PromSD processor (#1137)

* Improve k8s pod association

* Add tests

* Changelog

* typo

* Add prom_sd_pod_association

* Extend tests for pod associations

* Docs for pod association config

* Lint fixes

* Move to unreleased

* Add instrumentation recommendations

* Remove unnecessary constants

* Improve tests

* remote config with http(s) provider (#1143)

* sample remote config code with http provider

* use t.TempDir() in unit test

* no need to clean up after T.TempDir()

* use NewClientFromConfig and make caller responsible for calling SetDirectory

* handle nil HTTPClientConfig

* remove blank identifier assignment

* pass basic auth command line flags for remote config

* address pr nits

* add experimental flag

* set loader inline

* update changelog

* add remote config section in docs

* pr comment updates

* announce patch releases for cve-2021-41090 (#1152)

* Merge patch release to main (#1153)

* Add secret type to sensitive values

* Break out config tests to their own implementation. Also remove username as a sensitive value.

* Update changelog

* Fix failing test

* Scrub secrets when marshaling instance configs

* update for v0.21

* Updated changes from the merge.

* Remove changelog

* Scrub out receivers as ***receivers_scrubber***:null

* obscure etcd/consul credentials

* Update pkg/traces/config_test.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Update pkg/config/config.go

* go fmt

* Change to using custom object and return <secret>

* Fix bad merge

* [v0.21.2] toggle config endpoint (#19)

* disable /-/config endpoint by default

* disable scraping api get endpoint as well

* fix new test

* add test and rename flag

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Update version to v0.21.2

* Update defaults.go

* fix /-/config endpoint

* also fix non-pointer config bug

* temporarily disable linting for release

* fix lint errors

Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter (#1162)

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter

A recent change broke the usage of POSTGRES_EXPORTER_DATA_SOURCE_NAME for the postgres_exporter.
As the incorrect variable was checked in the if clause, it always raises an error.

* changelog: keep feature -> enhancement -> bugfix order

* postgres_exporter: add regression test

Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix syntax error in Jsonnet logs helper method (#1174)

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* cAdvisor Integration (#1081)

* Add cadvisor module

* Begin creating common config for cadvisor

* Don't export internal state

* Finish config options for cadvisor

* Set config options, and implement cAdvisor collectors

* Linting

* Buildflags for cadvisor only in linux

* I R LEArN Build Tags

* Don't zero value the zero value

* Offload sketchy global var manipulation to the integrations Run func

* Remove unused collectors

* Lint

* Create generic stub integration and use it for cadvisor

* Lint

* Final refactor of cAdvisor config for unsupported platforms. Pared down stub integrations.

* Lint

* Docs for cadvisor config

* Update changelog

* Update pkg/integrations/stub_integration.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Reorder changelog

* Instance key clarity

* Inclusive naming

* Finish name changes

Keep default disable metric list in sync with upstream

Idiomatic golang

* Hardcode disabled metrics for cadvisor

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Remove log-level flag from systemd unit file (#1177)

* Upgrade to OTel v0.40.0 (#1176)

* Upgrade to OTel v0.40.0

* Changelog

* Add factories check

* go mod tidy

* config/features: create package to standardize experimental features (#1170)

* config/features: create package to standardize experimental features

This commit introduces a new package, pkg/config/features, which allows
defining a set of features and validating whether flags associated with
those features are allowed to be set.

Closes #1163
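
A minimal sketch of the idea, using only the standard `flag` package, might look like the following. The `Feature` and `dependency` types, the `validate` and `check` helpers, and the `config.url` flag gated by a `remote-configs` feature are all hypothetical names chosen for illustration; they are not the API of pkg/config/features.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// Feature names an experimental feature.
type Feature string

// dependency ties a flag name to the feature that must be enabled to set it.
type dependency struct {
	Flag    string
	Feature Feature
}

// validate returns an error if any flag that was explicitly set depends on a
// feature missing from enabled.
func validate(fs *flag.FlagSet, enabled []Feature, deps []dependency) error {
	on := make(map[Feature]bool, len(enabled))
	for _, f := range enabled {
		on[f] = true
	}
	need := make(map[string]Feature, len(deps))
	for _, d := range deps {
		need[d.Flag] = d.Feature
	}
	var err error
	fs.Visit(func(f *flag.Flag) { // Visit only sees flags that were set
		if feat, ok := need[f.Name]; ok && !on[feat] && err == nil {
			err = fmt.Errorf("flag %q requires feature %q", f.Name, feat)
		}
	})
	return err
}

// check parses args with a hypothetical flag set and validates the result.
func check(args []string) error {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	enable := fs.String("enable-features", "", "comma-separated features")
	fs.String("config.url", "", "hypothetical flag gated by remote-configs")
	if err := fs.Parse(args); err != nil {
		return err
	}
	var enabled []Feature
	for _, name := range strings.Split(*enable, ",") {
		if name != "" {
			enabled = append(enabled, Feature(name))
		}
	}
	return validate(fs, enabled, []dependency{{"config.url", "remote-configs"}})
}

func main() {
	fmt.Println(check([]string{"-config.url=http://example.invalid"}))
	fmt.Println(check([]string{"-enable-features=remote-configs", "-config.url=http://example.invalid"}))
}
```

Checking only the flags that were actually set means a gated flag's default value never triggers an error on its own.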

* update documentation

(also s/enabled-features/enable-features)

* Fix typo

* Update pkg/config/features/features.go

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Revert "Merge main into dev.multiple-integrations (#1184)" (#1189)

This reverts commit ad76ec5.

* [dev.multiple-integrations] Revert breaking changes to existing integrations (#1191)

* revert breaking changes to integrations v1

This commit reverts #1062 in favor of making breaking changes directly
in integrations-next instead. The part of #1181 to remove
`wal_truncate_frequency` has also been reverted.

As part of this change, the enabled field is removed from the v2
common metrics configs, and v2 integrations can no longer be disabled.
v2 integrations can only be disabled by removing them from the YAML.

* integrations/v2: remove stale reference to ErrDisabled

(fix typo too)

* integrations/v2: bring in common config decoupling

* [dev.multiple-integrations] Introduce autoscraper (#1195)

* pkg/integrations/v2: introduce self-scraping

* linting

* [dev.multiple-integrations] Multiple instances of integrations (#1196)

* multiple instances of integrations

opt in relevant v1 integrations into supporting multiple instances

* shims should check for instance key override

* Document integrations-next (#1197)

* document integrations-next

* remove json tags since they make markdown unhappy

* changelog

* s/Run/RunIntegration

* remove stale comment about integrations.controller purpose

* create dedicated run method for instanceScraper

* s/expoter/exporter/g

* Document why an autoscrape.Scraper manages a set of per-instance scrapers

* spell out prerequisite instead of pre-req

* use go.uber.org/atomic to make the code a little easier to follow

* remove started callback for running integration

* use smaller interface for autoscrape

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Fix panic when using 'stdout' in automatic logging (#1233)

* integrations-next: fix bug where v2 integrations were not being strictly unmarshaled (#1235)

* Remove jsonnet vendor folders (#1222)

* remove jsonnet vendor

This adds all vendor folders into .gitignore and removes cached vendor
files from the repository.

Closes #1221

* Update scripts and instructions for jsonnet vendor removal

* `make example-dashboards` will now also run `jb install`
* k3d environment instructions now include `jb install`
* smoke-test.bash will now run `jb install` prior to `tk apply`

* Fix link to k3d example in DEVELOPERS.md (#1242)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix node_exporter upgrade docs (#1239)

* Fix panic in automatic logging with stdout backend (#1243)

* pkg/util: support custom yaml.Unmarshaler implementations for util.UnmarshalYAMLMerged (#1244)

It's common for config types to implement yaml.Unmarshaler for:

* Applying defaults
* Applying extra logic post-unmarshal

If these config types were unmarshaled through util.UnmarshalYAMLMerged,
the yaml.Unmarshaler implementation would never complete successfully,
preventing the post-unmarshal logic from running.

This issue was introduced in #1192, but went unnoticed until #1228
implemented yaml.Unmarshaler to perform field migrations. #1240 reported
the issue.

This commit fixes the bug by performing a second non-strict unmarshal to
ensure that all input values unmarshal successfully, with the exception
of unmarshal errors unrelated to unrecognized field names.

This is hacky, but it's worthwhile noting that util.UnmarshalYAMLMerged
is a temporary workaround needed for the integrations-next migration,
and will eventually be removed.

* Update k3d example grafana/grafonnet-lib version (#1246)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Create an e2e framework with support for running tests against k8s (#1234)

* e2e: create an e2e framework with support for running tests against a k3d cluster

* add new E2E drone job

* E2E tests should pass when doing a release

* sign drone.yml again

* move e2e lint to different step that has golangci-lint installed

* upgrade golangci-lint and go for e2e test

* e2e: add gcc

* E2E: install build-essential to get a working full gcc env

* :(

* e2e: support running from inside of docker

* fix lint error

* address review feedback

* Operator: fix bug where /-/ready and /-/healthy always returned 404 (#1252)

* operator: fix bug where /-/ready and /-/healthy always returned 404

controller-runtime must have at least one ready/healthy check for the endpoints to exist

* fix lint error, use healthz.Ping

* Make scraping-svc use the new `metrics:` key (#1259)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* update prometheus dependency (#1260)

* corrected typo (#1265)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags (#1264)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use :main tag for images in smoke test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Set IMAGE_BRANCH_TAG env var in drone and actions pipelines

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove quotes from Makefile variable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove force_release action

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* prepare for v0.22.0 release (#1266)

* prepare for v0.22.0 release

* remove E2E pipeline

* Add basic testing framework for operator (#1268)

* remove dedicated go.mod for e2e/

* move e2e/k8s to pkg/util/k8s

* Migrate operator tests to pkg/util/k8s

* remove dedicated e2e tests

* allow skipping TestCluster in pkg/util/k8s

* remove e2e/

* fix bad merge

* fix order of make env args for windows

* actually declare referenced docker volume

* introduce pkg/util/subset for asserting subset of objects

* refactor operator so it's testable

* define basic integration test for operator

* fix lint errors

* fix invalid address in operator test config

* Update release-note.md (#1267)

* Set scrape User-Agent header during init (#1274)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Upgrade to Go 1.17 (#1278)

* Upgrade to 1.17.6 in go.mod and Dockerfiles

* Update CHANGELOG.md to mention the update

* Update Go version in drone/actions pipelines

* Update go.mod, go.sum files via

* Re-sign drone.yml

* Remove leading newline causing drone build to fail

* Bump golangci-lint image to a version using Go 1.17

* Re-attempt to solve linter issue with new golangci-lint image

* Remove suffix of exclude rules

* Clean previous Go version before unpacking Go 1.17

* Also clean up previous Go versions in other steps

* fix typo (#1284)

* Use custom Go version in agent-operator Dockerfile (#1286)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* pkg/operator: refactor resource hierarchy discovery (#1271)

* pkg/operator: refactor resource hierarchy discovery

This commit moves common logic related to discovering the resource
hierarchy to pkg/operator/hierarchy. This new package requires less
boilerplate, which the reconciler is updated to take advantage of.

* remove unused code

* test construction of resource hierarchy

* add missing build constraints

* small extra cleanup to use pointer package

* review feedback

* update agent-build-image for go 1.17 (#1287)

(also use a consistent base image tag instead of latest)

* Skip non-ready entries when listing instances (#1289)

* Skip non-ready instances in LoadInstances()

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix panic in prom_sd_processor when address is empty (#1279)

* Fix panic in prom_sd_processor when address is empty

* Fix panic in prom_sd_processor when address is empty

* Fix docs

* Add test case

* Lint

* Move to unreleased

* Operator: generate proxy_url for remote_write (#1298)

* operator: generate proxy_url for remote_write

* fix weird indentation in test

* Use log format in traces subsystem (#1272)

* Use log format in traces subsystem

* Changelog

* Undo unwanted change

* Fix changelog entry

* integrations-next: Add extra_labels to inject extra labels for an integration (#1312)

* integrations-next: Add extra_labels to inject extra labels for an integration.

* separate tests

* Fix anchor link on operator docs (#1302)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* updated config URL (#1304)

The existing URL returns a 404: https://grafana.com/docs/agent/latest/getting-started/configuration/_index.md 
Updated to https://grafana.com/docs/agent/latest/configuration/

* Fix typo in node_exporter (#1325)

* Allow remote_write URL credentials (#1329)

* Bypass Prometheus password redaction

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add inline secret in existing test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

* Add to scrubbed testcase as well

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Stop appending duplicate exemplars (#1316)

* Add memExemplar in stripeSeries as first iteration

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add test for skipped duplicate exemplars; Simplify conditional

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry; discard test errors

* Move changelog entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add Benchmark for AppendExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Discard error on added benchmark

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use original exemplar struct instead of custom memExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Surround benchmark loop with start/stop timers and close test storage

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add comment about prepopulating exemplars on WAL startup

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in the totalAppendedExemplars metric

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make comment more discoverable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make sure we're recording exemplars for non-nil series ref only

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* integrations-next: wait for integrations to exit after stopping them (#1318)

* integrations-next: wait for integrations to exit after stopping them

* fix lint errors

* minor refactor

* integrations-next: stop holding config mutex for entire reload

* make controller.run authoritative over running integrations

* fix log line

* move running integrations into a dedicated worker pool

* operator/hierarchy: stop using field selector when listing Secrets & ConfigMaps (#1340)

The initial implementation of hierarchy.KeySelector injected a
FieldSelector when listing Secrets and ConfigMaps to immediately return
the single object being queried for.

This causes a problem with the client generated by the
controller-runtime framework, where the client is wrapped in a cache and
field indexer (where only the namespace is indexed by default).

This commit avoids using the field selector and the index lookup. The
resulting behavior aligns more closely with discovering other resources
in the hierarchy (e.g., ServiceMonitors), where the List call is also
insufficient and needs post-processing via Matches to find the final
list of resources.

Given the controller-runtime client uses an informer for reads, all
relevant Secrets and ConfigMaps are already in-memory anyway, and using
the index for a faster List is a bit of an over-optimization at the
moment.
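
A minimal sketch of the list-then-filter approach described above. The `object` type and both helper functions are hypothetical stand-ins written for illustration; they are not the operator's or controller-runtime's API.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Secret or ConfigMap.
type object struct {
	Namespace string
	Name      string
}

// listNamespace mimics a cached List call that can only narrow results by
// namespace, the one field assumed to be indexed.
func listNamespace(cache []object, namespace string) []object {
	var out []object
	for _, o := range cache {
		if o.Namespace == namespace {
			out = append(out, o)
		}
	}
	return out
}

// findByName post-processes the listed objects to pick out the single
// object being queried for, instead of relying on a field selector.
func findByName(objs []object, name string) (object, bool) {
	for _, o := range objs {
		if o.Name == name {
			return o, true
		}
	}
	return object{}, false
}

func main() {
	cache := []object{
		{Namespace: "monitoring", Name: "agent-credentials"},
		{Namespace: "monitoring", Name: "agent-config"},
		{Namespace: "default", Name: "agent-credentials"},
	}
	got, ok := findByName(listNamespace(cache, "monitoring"), "agent-config")
	fmt.Println(got, ok)
}
```

Because the objects are assumed to already be in memory, the extra pass over the namespace's objects is cheap compared to a round trip to the API server.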

* Add dependabot to update go modules and github actions. (#1217)

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* smoke framework refactor (#1326)

* Agent smoke test (#1291)

* convert smoke script to go program

* update build for agent-smoke

* fix pr comments

* use existing log helper package

* refactor context cancel

* update exit codes

* use ticker

* prefer oklog/run instead of errgroup

* use nop logger

* refactor task interface

* remove functional options

* log.With for task loggers

* move smoke to tools

* build smoke image, push to internal registry

* move crow to tools

* add gcr_admin secret

* fix link to crow

* add smoke libsonnet and use in local k3d smoke test

* add deletePodBySelectorTask

* scale smoke-test replica down after local test

* refactor smoke Options to Config

* update duration usage message

* add some basic unit tests

* newlines

* pass mutation frequency and chaos frequency from smoke script

* pull crow image from gcr

* update smoke script

* move monitoring to smoke libsonnet

* move additional smoke resources needed in deployment tools

* reference libsonnet files from grafana-agent dep

* make drone

* fix images in smoke script

* get rid of extVars

* update k3d example environment to reference etcd from new location

* update smoke docker builds to use go1.17

* use pointer.Int64

* refactor smoke jsonnet (#1296)

* add policy rule for list and delete pods (#1319)

* refactor smoke.new function to take config object (#1327)

* Apply suggestions from code review

* Update production/tanka/grafana-agent/smoke/crow/main.libsonnet

* Update production/tanka/grafana-agent/smoke/main.libsonnet

* Update example/k3d/scripts/smoke-test.bash

Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* readme update (#1338)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Correct link to the configuration (#1036)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Add stale check Github Action (#1345)

* Add a stale check GH action to run every 24 hours

* remove old stale.yml file

* add permissions to action

* update the stale message to clarify when the stale label will get
removed

* Update .github/workflows/stale.yml

* stale action: fix missing indent (#1346)

* Fix mssql issue (#1351)

* Add K8s Events integration (#1330)

* Add K8s eventhandler integration (#1310)
* Add docs and sample manifests to eventhandler integration (#1328)
* Wait for cache to flush before returning
* Clarify eventhandler docs (#1334)
* Clarify docs
* Update CHANGELOG.md
* Review changes (#1349)

* stale action: fix typo in label exemptions (#1347)

* update withVolumesMixin for agent jsonnet (#1358)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Configure cluster label using logs client external_labels param (#1357)

* Configure cluster label using logs client external_labels param
* Update CHANGELOG.md

* add password file and basic auth round tripper in crow (#1361)

* add password file and basic auth round tripper in crow

* add ca-certificates in crow image

* add orgID flag

* update help text

* default send_exemplars to true in remote_write (#1352)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update eventhandler labels (#1368)

* Update eventhandler integration labels
* Update CHANGELOG
* Remove unnecessary kind label

* update changelog (#1374)

Remove BUGFIX entries that fix a bug introduced by main (i.e., bugs
which were never part of a release)

* Prepare for release of v0.23.0 (#1377)

* Update version references

* Fix fat-fingered delete; Remove mention of upgrade Go

* RFC: Design in the open (#1055)

* rfc: first draft of RFC0001

* add placeholder for PR

* update PR link

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify "designing in the open" is best-effort

* update 0001

* fix dead link in production/README.md

* add recommended sections for RFC proposals

* describe the process for approving a proposal

* ignore RFC template in link checker

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* do my nitty 80-char line length limit change

* indent pros/cons to a single section

* document process for superseding RFCs

* remove RFC mutability requirement

* add extra flavor around not recommending google docs

* require Google Doc -> RFC conversion

* move new files

Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Add Grafana Labs SECURITY.md (#1356)

Signed-off-by: Richard Hartmann <richih@richih.org>

* Add readiness check to metrics component (#1369)

* PR Base

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix autoscrape's mockInstance

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in atomic readiness check

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Add CHANGELOG.md entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Reference page to download windows installer (#1372)

Fixes #1366

* fix typo in node_exporter_config (#1389)

which should be `privileged` instead of `priviliged`

* Add option for Operator to pass arguments to GrafanaAgent #1227 (#1248)

* 1250 oauth2 tracing (#1386)

* Add OAuth support for the OTel trace exporter via opentelemetry-collector-contrib oauth2clientauthextension

* start extensions on collector instance startup

fix decoding to otelconfig

build extensions

add oauth extension to service map

* Update traces config documentation

* lint fixes

* fix godoc comments

* pass exporter index directly to exporter name generator

* PR feedback; Update Changelog

* sort extensions when sorting pipelines for testing determinism

* README: Fix link to agent logo (#1396)

* update MAINTAINERS.md (#1402)

* add smoke alerts to mixin; move local alerts into examples dir (#1397)

* add smoke alerts to mixin; move local alerts into examples dir

* add podPrefix for smoke test

* podPrefix in libsonnet config

* [RFC] Integrations in Grafana Agent Operator (#1224)

* rfc: integrations in grafana agent operator

Supersedes #883

* add missing links

* Apply suggestions from code review

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify how many daemonsets/deployments/service/secrets are created

* add example of defining secrets

* try defining integrations

* s/IntegrationsMonitor/IntegrationMonitor/g

* simplify proposal

* add alternatives

* remove old reference to `hasMetrics` field

* document example generated agent configuration file

* assign ID RFC-0002

* add missing PR link

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* add fake rw endpoint to smoke program (#1405)

* fix alerts typo (#1407)

* continuous delivery for smoke images (#1408)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* fix continuous delivery job errors (#1409)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* [operator] - Use _file variants for basic auth credentials. (#1411)

* use password_file alternatives in operator config

* update tests

* reduce smoke alert noise (#1412)

* reduce smoke alert noise

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update production/grafana-agent-mixin/alerts.libsonnet

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* update cpu check comment

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* add minimum load threshold to cpu alert

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Clarify usage of instanceNamespaceSelector (#1413)

* RFC-0001: Add status to RFC (#1391)

* rfc-0001: add rules for when RFC PRs should be merged

* use status field instead of merge to indicate state

* Parametrize logs DaemonSet K8s manifests (#1420)

* Parametrize logs daemonset K8s manifests
* Update CHANGELOG.md

* Extend linting configuration file (#1421)

* Add depguard linter to reject packages we tend to avoid
* Replace golint with revive, since golint is deprecated
* Remove interfacer, which is deprecated with no replacement
* Add makezero linter to detect misuse of make with append
* Add tenv to prefer t.Setenv over os.Setenv in tests
* Add whitespace to report unnecessary blank lines
* Ignore test files for errcheck

In addition to the above, the following changes were made:

* Remove settings that just re-set default values, instead pointing to the website to retrieve defaults.
* Simplify the errcheck rule to only include functions we actually need to ignore.
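
For context on the makezero rule listed above, here is a small sketch of the kind of mistake it is meant to catch: creating a slice with a non-zero length and then appending to it. The function names `buggy` and `fixed` are made up for this example.

```go
package main

import "fmt"

// buggy allocates n zero-valued elements and then appends, so the result
// has 2*n elements and the first n are left at zero.
func buggy(n int) []int {
	out := make([]int, n)
	for i := 0; i < n; i++ {
		out = append(out, i+1)
	}
	return out
}

// fixed starts with length 0 and capacity n, so append fills the slice
// from the beginning without reallocating.
func fixed(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i+1)
	}
	return out
}

func main() {
	fmt.Println(buggy(3)) // [0 0 0 1 2 3]
	fmt.Println(fixed(3)) // [1 2 3]
}
```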

* Merging again!

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>
Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Paschalis Tsilias <tpaschalis@users.noreply.github.com>
Co-authored-by: Patrick Koenig <pkoenig10@gmail.com>
Co-authored-by: DataPoints <langer.markus@gmail.com>
Co-authored-by: Alex <52292902+alexrudd2@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: melGL <81323402+melgl@users.noreply.github.com>
Co-authored-by: Tom Wilkie <tomwilkie@users.noreply.github.com>
Co-authored-by: Joseph Woodward <josephwoodward@xeuse.com>
Co-authored-by: hanif <hjet@users.noreply.github.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>
Co-authored-by: laiwei <laiwei.ustc@gmail.com>
Co-authored-by: Sam <shamsalmon@users.noreply.github.com>
Co-authored-by: Chris Knutson <christopher.knutson@gmail.com>
Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Craig Peterson <192540+captncraig@users.noreply.github.com>
mattdurham added a commit that referenced this pull request Feb 25, 2022
* Update node_exporter dependency to v1.3.1 (#1228)

* Add node_exporter to depcheck

* update weaveworks/common dependency

* map current release flags and changed defaults

* documentation

* revert accidental checkin

* print out flags when node_exporter test fails to assist debugging

* oops, i introduced some flags from master by mistake

* Introduce experimental integrations revamp (#1198)

* [dev.multiple-integrations] Enable present integrations by default, deprecate enabled field (#1062)

* integrations: default to enabled by default

* document deprecation of enabled

* pkg/integrations: support *_configs field for integrations (#1130)

Creates the basic code to unmarshal integrations from a YAML field
called <integration name>_configs, which is a slice of that integration.

Note that this is NOT wired up to the integrations manager yet, and
trying to run the agent with more than one integration of the same type
will likely cause problems.

* [dev.multiple-integrations] Prototype new integrations subsystem (#1142)

* wip: prototype new integrations subsystem

* implement Controller with basic logic for Integration and UpdateIntegration

* Implement HTTPIntegration for Controller

* decouple controller and subsystem

* don't have controller implement integration

slightly less smelly now

* multiplexer integration

* rely on boilerplate for multiplexing for now

generics would be nice here

* remove multiplex_integration.go

Also a little code smelly. Instead of having integrations that run other
integrations, I'm going to fall back to having only one controller.

* introduce Subsystem, unexport Controller

start wiring up things to Subsystem

* introduce v2 agent integration to use for testing

* start wiring metrics integrations

* rename Options to Globals

call a spade a spade

* add subsystem options to globals

* remove dead code

* metricsutils: calculate self-scraping based on globals

* complete HTTP target API

* working example with agent integration

* appease the linter

* don't return an error when context to cancel an integration is closed

* once again i am asking the linter to forgive my typos

* fix bug where labels from individual targets were getting dropped at the API endpoint

* pkg/config: fix broken test

* finish unit tests for integrations v2 controller

* metricsutil/metricshandler_integration: make job name unique

Before this change, the job name would have collided when using multiple
instances of the same integration.

* ensure that global subsystem labels are injected into targets

* integrations/v2: Infer target hostname from SD API host (#1175)

* [dev.multiple-integrations] integrations/v2: allow shimming between v1 and v2 integrations. (#1179)

* integrations/v2: allow shimming between v1 and v2 integrations.

Shimming is done by changing how the integration registration works; a
new RegisterDynamic was added that allows for creating Configs at
runtime. Here be dragons; this should be removed whenever we no longer
have a need for it.

* fix lint

* pkg/integrations/v2: use "RegisterLegacy" instead of a generic mechanism

* fine, I won't add the deprecation notice if it will make the linter sad

* pkg/integrations: re-align (#1181)

This commit reverts 69ba2dd in favor of
allowing the new subsystem to handle multiple instances of integrations.

This commit also removes the wal_truncate_frequency field from
integrations as it is the only field from old integrations that does not
have a current counterpart.

* [dev.multiple-integrations] Hide integrations/v2 behind a feature flag (#1185)

* feature flag wip

* dynamically switch between integrations v1 and v2

default to v1.

* pkg/integrations/versionselector to file in pkg/config

* pkg/config: fix defaults for Integrations

* pkg/config: use more generic way to unmarshal differently based on flag

* add missing godoc comment

* more comments

* switch to deferred unmarshaling

* remove unused Config field

* simplify completeUnmarshal

* do not perform lazy deferred unmarshaling

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

* Merge main into dev.multiple-integrations (#1184)

* Fix typo (#1141)

* Traces: Improved pod association in PromSD processor (#1137)

* Improve k8s pod association

* Add tests

* Changelog

* typo

* Add prom_sd_pod_association

* Extend tests for pod associations

* Docs for pod association config

* Lint fixes

* Move to unreleased

* Add instrumentation recommendations

* Remove unnecessary constants

* Improve tests

* remote config with http(s) provider (#1143)

* sample remote config code with http provider

* use t.TempDir() in unit test

* no need to clean up after T.TempDir()

* use NewClientFromConfig and make caller responsible for calling SetDirectory

* handle nil HTTPClientConfig

* remove blank identifier assignment

* pass basic auth command line flags for remote config

* address pr nits

* add experimental flag

* set loader inline

* update changelog

* add remote config section in docs

* pr comment updates

* announce patch releases for cve-2021-41090 (#1152)

* Merge patch release to main (#1153)

* Add secret type to sensitive values

* Break out config tests to their own implementation. Also remove username as a sensitive value.

* Update changelog

* Fix failing test

* Scrub secrets when marshaling instance configs

* update for v0.21

* Updated changes from the merge.

* Remove changelog

* Scrub out receivers as ***receivers_scrubber***:null

* obscure etcd/consul credentials

* Update pkg/traces/config_test.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Update pkg/config/config.go

* go fmt

* Change to using custom object and return <secret>

* Fix bad merge

* [v0.21.2] toggle config endpoint (#19)

* disable /-/config endpoint by default

* disable scraping api get endpoint as well

* fix new test

* add test and rename flag

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Update version to v0.21.2

* Update defaults.go

* fix /-/config endpoint

* also fix non-pointer config bug

* temporarily disable linting for release

* fix lint errors

Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter (#1162)

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter

A recent change broke the usage of POSTGRES_EXPORTER_DATA_SOURCE_NAME for the postgres_exporter.
As the incorrect variable was checked in the if clause, it always raises an error.

* changelog: keep feature -> enhancement -> bugfix order

* postgres_exporter: add regression test

Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix syntax error in Jsonnet logs helper method (#1174)

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* cAdvisor Integration (#1081)

* Add cadvisor module

* Begin creating common config for cadvisor

* Don't export internal state

* Finish config options for cadvisor

* Set config options, and implement cAdvisor collectors

* Linting

* Buildflags for cadvisor only in linux

* I R LEArN Build Tags

* Don't zero value the zero value

* Offload sketchy global var manipulation to the integrations Run func

* Remove unused collectors

* Lint

* Create generic stub integration and use it for cadvisor

* Lint

* Final refactor of cAdvisor config for unsupported platforms. Pared down stub integrations.

* Lint

* Docs for cadvisor config

* Update changelog

* Update pkg/integrations/stub_integration.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Reorder changelog

* Instance key clarity

* Inclusive naming

* Finish name changes

Keep default disable metric list in sync with upstream

Idiomatic golang

* Hardcode disabled metrics for cadvisor

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Remove log-level flag from systemd unit file (#1177)

* Upgrade to OTel v0.40.0 (#1176)

* Upgrade to OTel v0.40.0

* Changelog

* Add factories check

* go mod tidy

* config/features: create package to standardize experimental features (#1170)

* config/features: create package to standardize experimental features

This commit introduces a new package, pkg/config/features, which allows
defining a set of features and validating whether flags associated with
those features are allowed to be set.

Closes #1163

* update documentation

(also s/enabled-features/enable-features)

* Fix typo

* Update pkg/config/features/features.go

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Revert "Merge main into dev.multiple-integrations (#1184)" (#1189)

This reverts commit ad76ec5.

* [dev.multiple-integrations] Revert breaking changes to existing integrations (#1191)

* revert breaking changes to integrations v1

This commit reverts #1062 in favor of making breaking changes directly
in integrations-next instead. The part of #1181 to remove
`wal_truncate_frequency` has also been reverted.

As part of this change, the enabled field is removed from the v2
common metrics configs, and v2 integrations can no longer be disabled.
v2 integrations can only be disabled by removing them from the YAML.

* integrations/v2: remove stale reference to ErrDisabled

(fix typo too)

* integrations/v2: bring in common config decoupling

* [dev.multiple-integrations] Introduce autoscraper (#1195)

* pkg/integrations/v2: introduce self-scraping

* linting

* [dev.multiple-integrations] Multiple instances of integrations (#1196)

* multiple instances of integrations

opt in relevant v1 integrations into supporting multiple instances

* shims should check for instance key override

* Document integrations-next (#1197)

* document integrations-next

* remove json tags since they make markdown unhappy

* changelog

* s/Run/RunIntegration

* remove stale comment about integrations.controller purpose

* create dedicated run method for instanceScraper

* s/expoter/exporter/g

* Document why an autoscrape.Scraper manages a set of per-instance scrapers

* spell out prerequisite instead of pre-req

* use go.uber.org/atomic to make the code a little easier to follow

* remove started callback for running integration

* use smaller interface for autoscrape

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Fix panic when using 'stdout' in automatic logging (#1233)

* integrations-next: fix bug where v2 integrations were not being strictly unmarshaled (#1235)

* Remove jsonnet vendor folders (#1222)

* remove jsonnet vendor

This adds all vendor folders into .gitignore and removes cached vendor
files from the repository.

Closes #1221

* Update scripts and instructions for jsonnet vendor removal

* `make example-dashboards` will now also run `jb install`
* k3d environment instructions now include `jb install`
* smoke-test.bash will now run `jb install` prior to `tk apply`

* Fix link to k3d example in DEVELOPERS.md (#1242)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix node_exporter upgrade docs (#1239)

* Fix panic in automatic logging with stdout backend (#1243)

* pkg/util: support custom yaml.Unmarshaler implementations for util.UnmarshalYAMLMerged (#1244)

It's common for config types to implement yaml.Unmarshaler for:

* Applying defaults
* Applying extra logic post-unmarshal

If these config types were unmarshaled through util.UnmarshalYAMLMerged,
the yaml.Unmarshaler implementation would never complete successfully,
preventing the post-unmarshal logic from running.

This issue was introduced in #1192, but went unnoticed until #1228
implemented yaml.Unmarshaler to perform field migrations. #1240 reported
the issue.

This commit fixes the bug by performing a second non-strict unmarshal to
ensure that all input values unmarshal successfully, with the exception
of unmarshal errors unrelated to unrecognized field names.

This is hacky, but it's worthwhile noting that util.UnmarshalYAMLMerged
is a temporary workaround needed for the integrations-next migration,
and will eventually be removed.

* Update k3d example grafana/grafonnet-lib version (#1246)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Create an e2e framework with support for running tests against k8s (#1234)

* e2e: create an e2e framework with support for running tests against a k3d cluster

* add new E2E drone job

* E2E tests should pass when doing a release

* sign drone.yml again

* move e2e lint to different step that has golangci-lint installed

* upgrade golangci-lint and go for e2e test

* e2e: add gcc

* E2E: install build-essential to get a working full gcc env

* :(

* e2e: support running from inside of docker

* fix lint error

* address review feedback

* Operator: fix bug where /-/ready and /-/healthy always returned 404 (#1252)

* operator: fix bug where /-/ready and /-/healthy always returned 404

controller-runtime must have at least one ready/healthy check for the endpoints to exist

* fix lint error, use healthz.Ping

* Make scraping-svc use the new `metrics:` key (#1259)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* update prometheus dependency (#1260)

* corrected typo (#1265)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags (#1264)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use :main tag for images in smoke test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Set IMAGE_BRANCH_TAG env var in drone and actions pipelines

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove quotes from Makefile variable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove force_release action

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* prepare for v0.22.0 release (#1266)

* prepare for v0.22.0 release

* remove E2E pipeline

* Add basic testing framework for operator (#1268)

* remove dedicated go.mod for e2e/

* move e2e/k8s to pkg/util/k8s

* Migrate operator tests to pkg/util/k8s

* remove dedicated e2e tests

* allow skipping TestCluster in pkg/util/k8s

* remove e2e/

* fix bad merge

* fix order of make env args for windows

* actually declare referenced docker volume

* introduce pkg/util/subset for asserting subset of objects

* refactor operator so it's testable

* define basic integration test for operator

* fix lint errors

* fix invalid address in operator test config

* Update release-note.md (#1267)

* Set scrape User-Agent header during init (#1274)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Upgrade to Go 1.17 (#1278)

* Upgrade to 1.17.6 in go.mod and Dockerfiles

* Update CHANGELOG.md to mention the update

* Update Go version in drone/actions pipelines

* Update go.mod, go.sum files via

* Re-sign drone.yml

* Remove leading newline causing drone build to fail

* Bump golangci-lint image to a version using Go 1.17

* Re-attempt to solve linter issue with new golangci-lint image

* Remove suffix of exclude rules

* Clean previous Go version before unpacking Go 1.17

* Also clean up previous Go versions in other steps

* fix typo (#1284)

* Use custom Go version in agent-operator Dockerfile (#1286)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* pkg/operator: refactor resource hierarchy discovery (#1271)

* pkg/operator: refactor resource hierarchy discovery

This commit moves common logic related to discovering the resource
hierarchy to pkg/operator/hierarchy. This new package requires less
boilerplate, which the reconciler is updated to take advantage of.

* remove unused code

* test construction of resource hierarchy

* add missing build constraints

* small extra cleanup to use pointer package

* review feedback

* update agent-build-image for go 1.17 (#1287)

(also use a consistent base image tag instead of latest)

* Skip non-ready entries when listing instances (#1289)

* Skip non-ready instances in LoadInstances()

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix panic in prom_sd_processor when address is empty (#1279)

* Fix panic in prom_sd_processor when address is empty

* Fix panic in prom_sd_processor when address is empty

* Fix docs

* Add test case

* Lint

* Move to unreleased

* Operator: generate proxy_url for remote_write (#1298)

* operator: generate proxy_url for remote_write

* fix weird indentation in test

* Use log format in traces subsystem (#1272)

* Use log format in traces subsystem

* Changelog

* Undo unwanted change

* Fix changelog entry

* integrations-next: Add extra_labels to inject extra labels for an integration (#1312)

* integrations-next: Add extra_labels to inject extra labels for an integration.

* separate tests

* Fix anchor link on operator docs (#1302)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* updated config URL (#1304)

The existing URL returns a 404: https://grafana.com/docs/agent/latest/getting-started/configuration/_index.md 
Updated to https://grafana.com/docs/agent/latest/configuration/

* Fix typo in node_exporter (#1325)

* Allow remote_write URL credentials (#1329)

* Bypass Prometheus password redaction

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add inline secret in existing test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

* Add to scrubbed testcase as well

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Stop appending duplicate exemplars (#1316)

* Add memExemplar in stripeSeries as first iteration

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add test for skipped duplicate exemplars; Simplify conditional

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry; discard test errors

* Move changelog entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add Benchmark for AppendExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Discard error on added benchmark

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use original exemplar struct instead of custom memExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Surround benchmark loop with start/stop timers and close test storage

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add comment about prepopulating exemplars on WAL startup

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in the totalAppendedExemplars metric

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make comment more discoverable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make sure we're recording exemplars for non-nil series ref only

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* integrations-next: wait for integrations to exit after stopping them (#1318)

* integrations-next: wait for integrations to exit after stopping them

* fix lint errors

* minor refactor

* integrations-next: stop holding config mutex for entire reload

* make controller.run authoritative over running integrations

* fix log line

* move running integrations into a dedicated worker pool

* operator/hierarchy: stop using field selector when listing Secrets & ConfigMaps (#1340)

The initial implementation of hierarchy.KeySelector injected a
FieldSelector when listing Secrets and ConfigMaps to immediately return
the single object being queried for.

This causes a problem with the client generated by the
controller-runtime framework, where the client is wrapped in a cache and
field indexer (where only the namespace is indexed by default).

This commit avoids using the field selector and the index lookup. The
resulting behavior aligns more closely with discovering other resources
in the hierarchy (i.e., ServiceMonitors), where the List call is also
insufficient and needs post-processing via Matches to find the final
list of resources.

Given the controller-runtime client uses an informer for reads, all
relevant Secrets and ConfigMaps are already in-memory anyway, and using
the index for a faster List is a bit of an over-optimization at the
moment.
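
The list-then-filter approach described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `object`, `cache`, `keySelector`, `matches`, and `find` are hypothetical stand-ins for the Kubernetes types and the controller-runtime client, and the only assumption carried over from the commit message is that the cache indexes by namespace alone.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Secret or ConfigMap.
type object struct {
	Namespace string
	Name      string
}

// cache is a hypothetical stand-in for a cached client whose only
// index is the namespace.
type cache struct {
	byNamespace map[string][]object
}

// list returns every cached object in the namespace. No field selector
// is involved, so no extra index is required.
func (c *cache) list(namespace string) []object {
	return c.byNamespace[namespace]
}

// keySelector identifies a single object by namespace and name.
type keySelector struct {
	Namespace string
	Name      string
}

// matches reports whether obj is the object the selector refers to.
func (s keySelector) matches(obj object) bool {
	return obj.Namespace == s.Namespace && obj.Name == s.Name
}

// find lists by namespace first and then post-filters the result,
// instead of asking the client to filter by name.
func find(c *cache, sel keySelector) []object {
	var out []object
	for _, obj := range c.list(sel.Namespace) {
		if sel.matches(obj) {
			out = append(out, obj)
		}
	}
	return out
}

func main() {
	c := &cache{byNamespace: map[string][]object{
		"monitoring": {
			{Namespace: "monitoring", Name: "agent-creds"},
			{Namespace: "monitoring", Name: "other"},
		},
	}}
	fmt.Println(find(c, keySelector{Namespace: "monitoring", Name: "agent-creds"}))
}
```

Because the objects are assumed to already be in memory, the extra loop trades a small amount of CPU for not depending on a name index.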

* Add dependabot to update go modules and github actions. (#1217)

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* smoke framework refactor (#1326)

* Agent smoke test (#1291)

* convert smoke script to go program

* update build for agent-smoke

* fix pr comments

* use existing log helper package

* refactor context cancel

* update exit codes

* use ticker

* prefer oklog/run instead of errgroup

* use nop logger

* refactor task interface

* remove functional options

* log.With for task loggers

* move smoke to tools

* build smoke image, push to internal registry

* move crow to tools

* add gcr_admin secret

* fix link to crow

* add smoke libsonnet and use in local k3d smoke test

* add deletePodBySelectorTask

* scale smoke-test replica down after local test

* refactor smoke Options to Config

* update duration usage message

* add some basic unit tests

* newlines

* pass mutation frequency and chaos frequency from smoke script

* pull crow image from gcr

* update smoke script

* move monitoring to smoke libsonnet

* move additional smoke resources needed in deployment tools

* reference libsonnet files from grafana-agent dep

* make drone

* fix images in smoke script

* get rid of extVars

* update k3d example environment to reference etcd from new location

* update smoke docker builds to use go1.17

* use pointer.Int64

* refactor smoke jsonnet (#1296)

* add policy rule for list and delete pods (#1319)

* refactor smoke.new function to take config object (#1327)

* Apply suggestions from code review

* Update production/tanka/grafana-agent/smoke/crow/main.libsonnet

* Update production/tanka/grafana-agent/smoke/main.libsonnet

* Update example/k3d/scripts/smoke-test.bash

Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* readme update (#1338)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Correct link to the configuration (#1036)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Add stale check Github Action (#1345)

* Add a stale check GH action to run every 24 hours

* remove old stale.yml file

* add permissions to action

* update the stale message to clarify when the stale label will get
removed

* Update .github/workflows/stale.yml

* stale action: fix missing indent (#1346)

* Fix mssql issue (#1351)

* Add K8s Events integration (#1330)

* Add K8s eventhandler integration (#1310)
* Add docs and sample manifests to eventhandler integration (#1328)
* Wait for cache to flush before returning
* Clarify eventhandler docs (#1334)
* Clarify docs
* Update CHANGELOG.md
* Review changes (#1349)

* stale action: fix typo in label exemptions (#1347)

* update withVolumesMixin for agent jsonnet (#1358)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Configure cluster label using logs client external_labels param (#1357)

* Configure cluster label using logs client external_labels param
* Update CHANGELOG.md

* add password file and basic auth round tripper in crow (#1361)

* add password file and basic auth round tripper in crow

* add ca-certificates in crow image

* add orgID flag

* update help text

* default send_exemplars to true in remote_write (#1352)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update eventhandler labels (#1368)

* Update eventhandler integration labels
* Update CHANGELOG
* Remove unnecessary kind label

* update changelog (#1374)

Remove BUGFIX entries that fix a bug introduced by main (i.e., bugs
which were never part of a release)

* Prepare for release of v0.23.0 (#1377)

* Update version references

* Fix fat-fingered delete; Remove mention of upgrade Go

* RFC: Design in the open (#1055)

* rfc: first draft of RFC0001

* add placeholder for PR

* update PR link

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify "designing in the open" is best-effort

* update 0001

* fix dead link in production/README.md

* add recommended sections for RFC proposals

* describe the process for approving a proposal

* ignore RFC template in link checker

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* do my nitty 80-char line length limit change

* indent pros/cons to a single section

* document process for superseding RFCs

* remove RFC mutability requirement

* add extra flavor around not recommending google docs

* require Google Doc -> RFC conversion

* move new files

Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Add Grafana Labs SECURITY.md (#1356)

Signed-off-by: Richard Hartmann <richih@richih.org>

* Add readiness check to metrics component (#1369)

* PR Base

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix autoscrape's mockInstance

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in atomic readiness check

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Add CHANGELOG.md entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Reference page to download windows installer (#1372)

Fixes #1366

* fix typo in node_exporter_config (#1389)

which should be `privileged` instead of `priviliged`

* Add option for Operator to pass arguments to GrafanaAgent #1227 (#1248)

* 1250 oauth2 tracing (#1386)

* Add OAuth support for the OTel trace exporter via opentelemetry-collector-contrib oauth2clientauthextension

* start extensions on collector instance startup

fix decoding to otelconfig

build extensions

add oauth extension to service map

* Update traces config documentation

* lint fixes

* fix godoc comments

* pass exporter index directly to exporter name generator

* PR feedback; Update Changelog

* sort extensions when sorting pipelines for testing determinism

* README: Fix link to agent logo (#1396)

* update MAINTAINERS.md (#1402)

* add smoke alerts to mixin; move local alerts into examples dir (#1397)

* add smoke alerts to mixin; move local alerts into examples dir

* add podPrefix for smoke test

* podPrefix in libsonnet config

* [RFC] Integrations in Grafana Agent Operator (#1224)

* rfc: integrations in grafana agent operator

Supersedes #883

* add missing links

* Apply suggestions from code review

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify how many daemonsets/deployments/service/secrets are created

* add example of defining secrets

* try defining integrations

* s/IntegrationsMonitor/IntegrationMonitor/g

* simplify proposal

* add alternatives

* remove old reference to `hasMetrics` field

* document example generated agent configuration file

* assign ID RFC-0002

* add missing PR link

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* add fake rw endpoint to smoke program (#1405)

* fix alerts typo (#1407)

* continuous delivery for smoke images (#1408)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* fix continuous delivery job errors (#1409)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* [operator] - Use _file variants for basic auth credentials. (#1411)

* use password_file alternatives in operator config

* update tests

* reduce smoke alert noise (#1412)

* reduce smoke alert noise

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update production/grafana-agent-mixin/alerts.libsonnet

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* update cpu check comment

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* add minimum load threshold to cpu alert

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Clarify usage of instanceNamespaceSelector (#1413)

* RFC-0001: Add status to RFC (#1391)

* rfc-0001: add rules for when RFC PRs should be merged

* use status field instead of merge to indicate state

* Parametrize logs DaemonSet K8s manifests (#1420)

* Parametrize logs daemonset K8s manifests
* Update CHANGELOG.md

* Extend linting configuration file (#1421)

* Add depguard linter to reject packages we tend to avoid
* Replace golint with revive, since golint is deprecated
* Remove interfacer, which is deprecated with no replacement
* Add makezero linter to detect misuse of make with append
* Add tenv to prefer t.Setenv over os.Setenv in tests
* Add whitespace to report unnecessary blank lines
* Ignore test files for errcheck

In addition to the above, the following changes were made:

* Remove settings that just re-set default values, instead pointing to the website to retrieve defaults.
* Simplify the errcheck rule to only include functions we actually need to ignore.
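
As an illustration of the kind of mistake the makezero entry above is meant to catch, here is a small standalone sketch. The function names `buggy` and `fixed` are hypothetical and do not come from the repository.

```go
package main

import "fmt"

// buggy creates a slice with a non-zero length and then appends to it,
// so the result keeps len(in) leading zero values.
func buggy(in []int) []int {
	out := make([]int, len(in))
	for _, v := range in {
		out = append(out, v*2)
	}
	return out
}

// fixed creates a zero-length slice with enough capacity, which is what
// appending in a loop usually intends.
func fixed(in []int) []int {
	out := make([]int, 0, len(in))
	for _, v := range in {
		out = append(out, v*2)
	}
	return out
}

func main() {
	fmt.Println(buggy([]int{1, 2, 3})) // [0 0 0 2 4 6]
	fmt.Println(fixed([]int{1, 2, 3})) // [2 4 6]
}
```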

* Merging again!

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>
Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Paschalis Tsilias <tpaschalis@users.noreply.github.com>
Co-authored-by: Patrick Koenig <pkoenig10@gmail.com>
Co-authored-by: DataPoints <langer.markus@gmail.com>
Co-authored-by: Alex <52292902+alexrudd2@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: melGL <81323402+melgl@users.noreply.github.com>
Co-authored-by: Tom Wilkie <tomwilkie@users.noreply.github.com>
Co-authored-by: Joseph Woodward <josephwoodward@xeuse.com>
Co-authored-by: hanif <hjet@users.noreply.github.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>
Co-authored-by: laiwei <laiwei.ustc@gmail.com>
Co-authored-by: Sam <shamsalmon@users.noreply.github.com>
Co-authored-by: Chris Knutson <christopher.knutson@gmail.com>
Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Craig Peterson <192540+captncraig@users.noreply.github.com>
mattdurham added a commit that referenced this pull request Feb 25, 2022
* Update node_exporter dependency to v1.3.1 (#1228)

* Add node_exporter to depcheck

* update weaveworks/common dependency

* map current release flags and changed defaults

* documentation

* revert accidental checkin

* print out flags when node_exporter test fails to assist debugging

* oops, i introduced some flags from master by mistake

* Introduce experimental integrations revamp (#1198)

* [dev.multiple-integrations] Enable present integrations by default, deprecate enabled field (#1062)

* integrations: default to enabled by default

* document deprecation of enabled

* pkg/integrations: support *_configs field for integrations (#1130)

Creates the basic code to unmarshal integrations from a YAML field
called <integration name>_configs, which is a slice of that integration.

Note that this is NOT wired up to the integrations manager yet, and
trying to run the agent with more than one integration of the same type
will likely cause problems.

* [dev.multiple-integrations] Prototype new integrations subsystem (#1142)

* wip: prototype new integrations subsystem

* implement Controller with basic logic for Integration and UpdateIntegration

* Implement HTTPIntegration for Controller

* decouple controller and subsystem

* don't have controller implement integration

slightly less smelly now

* multiplexer integration

* rely on boilerplate for multiplexing for now

generics would be nice here

* remove multiplex_integration.go

Also a little code smelly. Instead of having integrations that run other
integrations, I'm going to fall back to having only one controller.

* introduce Subsystem, unexport Controller

start wiring up things to Subsystem

* introduce v2 agent integration to use for testing

* start wiring metrics integrations

* rename Options to Globals

call a spade a spade

* add subsystem options to globals

* remove dead code

* metricsutils: calculate self-scraping based on globals

* complete HTTP target API

* working example with agent integration

* appease the linter

* don't return an error when context to cancel an integration is closed

* once again i am asking the linter to forgive my typos

* fix bug where labels from individual targets were getting dropped at the API endpoint

* pkg/config: fix broken test

* finish unit tests for integrations v2 controller

* metricsutil/metricshandler_integration: make job name unique

Before this change, the job name would have collided when using multiple
instances of the same integration.

* ensure that global subsystem labels are injected into targets

* integrations/v2: Infer target hostname from SD API host (#1175)

* [dev.multiple-integrations] integrations/v2: allow shimming between v1 and v2 integrations. (#1179)

* integrations/v2: allow shimming between v1 and v2 integrations.

Shimming is done by changing how the integration registration works; a
new RegisterDynamic was added that allows for creating Configs at
runtime. Here be dragons; this should be removed whenever we no longer
have a need for it.

* fix lint

* pkg/integrations/v2: use "RegisterLegacy" instead of a generic mechanism

* fine, I won't add the deprecation notice if it will make the linter sad

* pkg/integrations: re-align (#1181)

This commit reverts 69ba2dd in favor of
allowing the new subsystem to handle multiple instances of integrations.

This commit also removes the wal_truncate_frequency field from
integrations as it is the only field from old integrations that does not
have a current counterpart.

* [dev.multiple-integrations] Hide integrations/v2 behind a feature flag (#1185)

* feature flag wip

* dynamically switch between integrations v1 and v2

default to v1.

* pkg/integrations/versionselector to file in pkg/config

* pkg/config: fix defaults for Integrations

* pkg/config: use more generic way to unmarshal differently based on flag

* add missing godoc comment

* more comments

* switch to deferred unmarshaling

* remove unused Config field

* simplify completeUnmarshal

* do not perform lazy deferred unmarshaling

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

* Merge main into dev.multiple-integrations (#1184)

* Fix typo (#1141)

* Traces: Improved pod association in PromSD processor (#1137)

* Improve k8s pod association

* Add tests

* Changelog

* typo

* Add prom_sd_pod_association

* Extend tests for pod associations

* Docs for pod association config

* Lint fixes

* Move to unreleased

* Add instrumentation recommendations

* Remove unnecessary constants

* Improve tests

* remote config with http(s) provider (#1143)

* sample remote config code with http provider

* use t.TempDir() in unit test

* no need to clean up after T.TempDir()

* use NewClientFromConfig and make caller responsible for calling SetDirectory

* handle nil HTTPClientConfig

* remove blank identifier assignment

* pass basic auth command line flags for remote config

* address pr nits

* add experimental flag

* set loader inline

* update changelog

* add remote config section in docs

* pr comment updates

* announce patch releases for cve-2021-41090 (#1152)

* Merge patch release to main (#1153)

* Add secret type to sensitive values

* Break out config tests to their own implementation. Also remove username as a sensitive value.

* Update changelog

* Fix failing test

* Scrub secrets when marshaling instance configs

* update for v0.21

* Updated changes from the merge.

* Remove changelog

* Scrub out receivers as ***receivers_scrubber***:null

* obscure etcd/consul credentials

* Update pkg/traces/config_test.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Update pkg/config/config.go

* go fmt

* Change to using custom object and return <secret>

* Fix bad merge

* [v0.21.2] toggle config endpoint (#19)

* disable /-/config endpoint by default

* disable scraping api get endpoint as well

* fix new test

* add test and rename flag

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Update version to v0.21.2

* Update defaults.go

* fix /-/config endpoint

* also fix non-pointer config bug

* temporarily disable linting for release

* fix lint errors

Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter (#1162)

* Fix POSTGRES_EXPORTER_DATA_SOURCE_NAME usage for postgres_exporter

A recent change broke the usage of POSTGRES_EXPORTER_DATA_SOURCE_NAME for the postgres_exporter.
As the incorrect variable was checked in the if clause, it always raises an error.

* changelog: keep feature -> enhancement -> bugfix order

* postgres_exporter: add regression test

Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix syntax error in Jsonnet logs helper method (#1174)

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* cAdvisor Integration (#1081)

* Add cadvisor module

* Begin creating common config for cadvisor

* Don't export internal state

* Finish config options for cadvisor

* Set config options, and implement cAdvisor collectors

* Linting

* Buildflags for cadvisor only in linux

* I R LEArN Build Tags

* Don't zero value the zero value

* Offload sketchy global var manipulation to the integrations Run func

* Remove unused collectors

* Lint

* Create generic stub integration and use it for cadvisor

* Lint

* Final refactor of cAdvisor config for unsupported platforms. Pared down stub integrations.

* Lint

* Docs for cadvisor config

* Update changelog

* Update pkg/integrations/stub_integration.go

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Reorder changelog

* Instance key clarity

* Inclusive naming

* Finish name changes

Keep default disable metric list in sync with upstream

Idiomatic golang

* Hardcode disabled metrics for cadvisor

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Remove log-level flag from systemd unit file (#1177)

* Upgrade to OTel v0.40.0 (#1176)

* Upgrade to OTel v0.40.0

* Changelog

* Add factories check

* go mod tidy

* config/features: create package to standardize experimental features (#1170)

* config/features: create package to standardize experimental features

This commit introduces a new package, pkg/config/features, which allows
defining a set of features and validating whether flags associated with
those features are allowed to be set.

Closes #1163
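
The idea of gating flags behind features, as described above, could look something like the sketch below. It is not the API of pkg/config/features: the names `Feature`, `dependency`, `validate`, and `newDemoFlagSet`, along with the `demo.option` flag and `demo-feature` feature, are all made up for illustration. The only library behavior relied on is that `flag.FlagSet.Visit` walks just the flags that were explicitly set.

```go
package main

import (
	"flag"
	"fmt"
)

// Feature names an experimental feature (hypothetical type).
type Feature string

// dependency ties a flag name to the feature that must be enabled
// before that flag may be set.
type dependency struct {
	Flag    string
	Feature Feature
}

// validate returns an error if any explicitly set flag depends on a
// feature that is not enabled.
func validate(fs *flag.FlagSet, enabled map[Feature]bool, deps []dependency) error {
	required := make(map[string]Feature, len(deps))
	for _, d := range deps {
		required[d.Flag] = d.Feature
	}

	var err error
	// Visit only calls the function for flags that were set, so flags
	// left at their defaults never trigger an error.
	fs.Visit(func(f *flag.Flag) {
		feat, ok := required[f.Name]
		if ok && !enabled[feat] && err == nil {
			err = fmt.Errorf("flag %q requires feature %q to be enabled", f.Name, feat)
		}
	})
	return err
}

// newDemoFlagSet builds a FlagSet with one hypothetical gated flag and
// parses args into it.
func newDemoFlagSet(args []string) *flag.FlagSet {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.String("demo.option", "", "hypothetical flag gated behind demo-feature")
	_ = fs.Parse(args)
	return fs
}

func main() {
	deps := []dependency{{Flag: "demo.option", Feature: "demo-feature"}}
	fs := newDemoFlagSet([]string{"-demo.option=x"})

	// Feature disabled: the set flag is rejected.
	fmt.Println(validate(fs, map[Feature]bool{}, deps))
	// Feature enabled: validation passes.
	fmt.Println(validate(fs, map[Feature]bool{"demo-feature": true}, deps))
}
```

Checking only flags that were actually set means a gated flag can still be registered with a default value without forcing users to enable the feature.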

* update documentation

(also s/enabled-features/enable-features)

* Fix typo

* Update pkg/config/features/features.go

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* enable cadvisor by default

* switch to using real feature flag

* fix postgres_exporter

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Revert "Merge main into dev.multiple-integrations (#1184)" (#1189)

This reverts commit ad76ec5.

* [dev.multiple-integrations] Revert breaking changes to existing integrations (#1191)

* revert breaking changes to integrations v1

This commit reverts #1062 in favor of making breaking changes directly
in integrations-next instead. The part of #1181 to remove
`wal_truncate_frequency` has also been reverted.

As part of this change, the enabled field is removed from the v2
common metrics configs, and v2 integrations can no longer be disabled.
v2 integrations can only be disabled by removing them from the YAML.

* integrations/v2: remove stale reference to ErrDisabled

(fix typo too)

* integrations/v2: bring in common config decoupling

* [dev.multiple-integrations] Introduce autoscraper (#1195)

* pkg/integrations/v2: introduce self-scraping

* linting

* [dev.multiple-integrations] Multiple instances of integrations (#1196)

* multiple instances of integrations

opt in relevant v1 integrations into supporting multiple instances

* shims should check for instance key override

* Document integrations-next (#1197)

* document integrations-next

* remove json tags since they make markdown unhappy

* changelog

* s/Run/RunIntegration

* remove stale comment about integrations.controller purpose

* create dedicated run method for instanceScraper

* s/expoter/exporter/g

* Document why an autoscrape.Scraper manages a set of per-instance scrapers

* spell out prerequisite instead of pre-req

* use go.uber.org/atomic to make the code a little easier to follow

* remove started callback for running integration

* use smaller interface for autoscrape

Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: Matt Durham <mattdurham@ppog.org>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Fix panic when using 'stdout' in automatic logging (#1233)

* integrations-next: fix bug where v2 integrations were not being strictly unmarshaled (#1235)

* Remove jsonnet vendor folders (#1222)

* remove jsonnet vendor

This adds all vendor folders into .gitignore and removes cached vendor
files from the repository.

Closes #1221

* Update scripts and instructions for jsonnet vendor removal

* `make example-dashboards` will now also run `jb install`
* k3d environment instructions now include `jb install`
* smoke-test.bash will now run `jb install` prior to `tk apply`

* Fix link to k3d example in DEVELOPERS.md (#1242)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix node_exporter upgrade docs (#1239)

* Fix panic in automatic logging with stdout backend (#1243)

* pkg/util: support custom yaml.Unmarshaler implementations for util.UnmarshalYAMLMerged (#1244)

It's common for config types to implement yaml.Unmarshaler for:

* Applying defaults
* Applying extra logic post-unmarshal

If these config types were unmarshaled through util.UnmarshalYAMLMerged,
the yaml.Unmarshaler implementation would never complete successfully,
preventing the post-unmarshal logic from running.

This issue was introduced in #1192, but went unnoticed until #1228
implemented yaml.Unmarshaler to perform field migrations. #1240 reported
the issue.

This commit fixes the bug by performing a second non-strict unmarshal to
ensure that all input values unmarshal successfully, with the exception
of unmarshal errors unrelated to unrecognized field names.

This is hacky, but it's worth noting that util.UnmarshalYAMLMerged
is a temporary workaround needed for the integrations-next migration,
and will eventually be removed.

* Update k3d example grafana/grafonnet-lib version (#1246)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Create an e2e framework with support for running tests against k8s (#1234)

* e2e: create an e2e framework with support for running tests against a k3d cluster

* add new E2E drone job

* E2E tests should pass when doing a release

* sign drone.yml again

* move e2e lint to different step that has golangci-lint installed

* upgrade golangci-lint and go for e2e test

* e2e: add gcc

* E2E: install build-essential to get a working full gcc env

* :(

* e2e: support running from inside of docker

* fix lint error

* address review feedback

* Operator: fix bug where /-/ready and /-/healthy always returned 404 (#1252)

* operator: fix bug where /-/ready and /-/healthy always returned 404

controller-runtime must have at least one ready/healthy check for the endpoints to exist

* fix lint error, use healthz.Ping

* Make scraping-svc use the new `metrics:` key (#1259)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* update prometheus dependency (#1260)

* corrected typo (#1265)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags (#1264)

* Use RELEASE_TAG to choose between `:main` and `:latest` docker tags

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use :main tag for images in smoke test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Set IMAGE_BRANCH_TAG env var in drone and actions pipelines

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove quotes from Makefile variable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Remove force_release action

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* prepare for v0.22.0 release (#1266)

* prepare for v0.22.0 release

* remove E2E pipeline

* Add basic testing framework for operator (#1268)

* remove dedicated go.mod for e2e/

* move e2e/k8s to pkg/util/k8s

* Migrate operator tests to pkg/util/k8s

* remove dedicated e2e tests

* allow skipping TestCluster in pkg/util/k8s

* remove e2e/

* fix bad merge

* fix order of make env args for windows

* actually declare referenced docker volume

* introduce pkg/util/subset for asserting subset of objects

* refactor operator so it's testable

* define basic integration test for operator

* fix lint errors

* fix invalid address in operator test config

* Update release-note.md (#1267)

* Set scrape User-Agent header during init (#1274)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Upgrade to Go 1.17 (#1278)

* Upgrade to 1.17.6 in go.mod and Dockerfiles

* Update CHANGELOG.md to mention the update

* Update Go version in drone/actions pipelines

* Update go.mod, go.sum files via

* Re-sign drone.yml

* Remove leading newline causing drone build to fail

* Bump golangci-lint image to a version using Go 1.17

* Re-attempt to solve linter issue with new golangci-lint image

* Remove suffix of exclude rules

* Clean previous Go version before unpacking Go 1.17

* Also clean up previous Go versions in other steps

* fix typo (#1284)

* Use custom Go version in agent-operator Dockerfile (#1286)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* pkg/operator: refactor resource hierarchy discovery (#1271)

* pkg/operator: refactor resource hierarchy discovery

This commit moves common logic related to discovering the resource
hierarchy to pkg/operator/hierarchy. This new package requires less
boilerplate, which the reconciler is updated to take advantage of.

* remove unused code

* test construction of resource hierarchy

* add missing build constraints

* small extra cleanup to use pointer package

* review feedback

* update agent-build-image for go 1.17 (#1287)

(also use a consistent base image tag instead of latest)

* Skip non-ready entries when listing instances (#1289)

* Skip non-ready instances in LoadInstances()

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Fix panic in prom_sd_processor when address is empty (#1279)

* Fix panic in prom_sd_processor when address is empty

* Fix panic in prom_sd_processor when address is empty

* Fix docs

* Add test case

* Lint

* Move to unreleased

* Operator: generate proxy_url for remote_write (#1298)

* operator: generate proxy_url for remote_write

* fix weird indentation in test

* Use log format in traces subsystem (#1272)

* Use log format in traces subsystem

* Changelog

* Undo unwanted change

* Fix changelog entry

* integrations-next: Add extra_labels to inject extra labels for an integration (#1312)

* integrations-next: Add extra_labels to inject extra labels for an integration.

* separate tests

* Fix anchor link on operator docs (#1302)

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* updated config URL (#1304)

The existing URL returns a 404: https://grafana.com/docs/agent/latest/getting-started/configuration/_index.md 
Updated to https://grafana.com/docs/agent/latest/configuration/

* Fix typo in node_exporter (#1325)

* Allow remote_write URL credentials (#1329)

* Bypass Prometheus password redaction

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add inline secret in existing test

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry

* Add to scrubbed testcase as well

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Stop appending duplicate exemplars (#1316)

* Add memExemplar in stripeSeries as first iteration

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add test for skipped duplicate exemplars; Simplify conditional

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add changelog entry; discard test errors

* Move changelog entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add Benchmark for AppendExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Discard error on added benchmark

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Use original exemplar struct instead of custom memExemplar

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Surround benchmark loop with start/stop timers and close test storage

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Add comment about prepopulating exemplars on WAL startup

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in the totalAppendedExemplars metric

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make comment more discoverable

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Make sure we're recording exemplars for non-nil series ref only

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* integrations-next: wait for integrations to exit after stopping them (#1318)

* integrations-next: wait for integrations to exit after stopping them

* fix lint errors

* minor refactor

* integrations-next: stop holding config mutex for entire reload

* make controller.run authoritative over running integrations

* fix log line

* move running integrations into a dedicated worker pool

* operator/hierarchy: stop using field selector when listing Secrets & ConfigMaps (#1340)

The initial implementation of hierarchy.KeySelector injected a
FieldSelector when listing Secrets and ConfigMaps to immediately return
the single object being queried for.

This causes a problem with the client generated by the
controller-runtime framework, where the client is wrapped in a cache and
field indexer (where only the namespace is indexed by default).

This commit avoids using the field selector and the index lookup. The
resulting behavior aligns more closely with discovering other resources
in the hierarchy (i.e., ServiceMonitors), where the List call is also
insufficient and needs post-processing via Matches to find the final
list of resources.

Given the controller-runtime client uses an informer for reads, all
relevant Secrets and ConfigMaps are already in-memory anyway, and using
the index for a faster List is a bit of an over-optimization at the
moment.
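
The list-then-filter behavior described in this commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the operator's actual code: `object`, `listNamespace`, and `findByName` are hypothetical names standing in for the cached controller-runtime client and the post-processing step that the message attributes to `Matches`.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Secret or ConfigMap.
type object struct {
	Namespace string
	Name      string
}

// listNamespace mimics a cached List call that can only narrow results by
// namespace, the one field the commit message says is indexed by default.
func listNamespace(cache []object, namespace string) []object {
	var out []object
	for _, o := range cache {
		if o.Namespace == namespace {
			out = append(out, o)
		}
	}
	return out
}

// findByName filters the listed objects in memory instead of asking the
// client to apply a field selector on the object name.
func findByName(listed []object, name string) (object, bool) {
	for _, o := range listed {
		if o.Name == name {
			return o, true
		}
	}
	return object{}, false
}

func main() {
	// Hypothetical cache contents; the names are made up for illustration.
	cache := []object{
		{Namespace: "monitoring", Name: "remote-write-creds"},
		{Namespace: "monitoring", Name: "agent-config"},
		{Namespace: "default", Name: "remote-write-creds"},
	}

	listed := listNamespace(cache, "monitoring")
	if o, ok := findByName(listed, "agent-config"); ok {
		fmt.Printf("found %s/%s\n", o.Namespace, o.Name)
	}
}
```

Because the reads are served from memory in this sketch, the extra filtering pass costs little, which matches the message's point that an index-backed List is an over-optimization for now.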

* Add dependabot to update go modules and github actions. (#1217)

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

* smoke framework refactor (#1326)

* Agent smoke test (#1291)

* convert smoke script to go program

* update build for agent-smoke

* fix pr comments

* use existing log helper package

* refactor context cancel

* update exit codes

* use ticker

* prefer oklog/run instead of errgroup

* use nop logger

* refactor task interface

* remove functional options

* log.With for task loggers

* move smoke to tools

* build smoke image, push to internal registry

* move crow to tools

* add gcr_admin secret

* fix link to crow

* add smoke libsonnet and use in local k3d smoke test

* add deletePodBySelectorTask

* scale smoke-test replica down after local test

* refactor smoke Options to Config

* update duration usage message

* add some basic unit tests

* newlines

* pass mutation frequency and chaos frequency from smoke script

* pull crow image from gcr

* update smoke script

* move monitoring to smoke libsonnet

* move additional smoke resources needed in deployment tools

* reference libsonnet files from grafana-agent dep

* make drone

* fix images in smoke script

* get rid of extVars

* update k3d example environment to reference etcd from new location

* update smoke docker builds to use go1.17

* use pointer.Int64

* refactor smoke jsonnet (#1296)

* add policy rule for list and delete pods (#1319)

* refactor smoke.new function to take config object (#1327)

* Apply suggestions from code review

* Update production/tanka/grafana-agent/smoke/crow/main.libsonnet

* Update production/tanka/grafana-agent/smoke/main.libsonnet

* Update example/k3d/scripts/smoke-test.bash

Co-authored-by: Robert Lankford <robert.lankford@grafana.com>

* readme update (#1338)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Correct link to the configuration (#1036)

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>

* Add stale check Github Action (#1345)

* Add a stale check GH action to run every 24 hours

* remove old stale.yml file

* add permissions to action

* update the stale message to clarify when the stale label will get
removed

* Update .github/workflows/stale.yml

* stale action: fix missing indent (#1346)

* Fix mssql issue (#1351)

* Add K8s Events integration (#1330)

* Add K8s eventhandler integration (#1310)
* Add docs and sample manifests to eventhandler integration (#1328)
* Wait for cache to flush before returning
* Clarify eventhandler docs (#1334)
* Clarify docs
* Update CHANGELOG.md
* Review changes (#1349)

* stale action: fix typo in label exemptions (#1347)

* update withVolumesMixin for agent jsonnet (#1358)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Configure cluster label using logs client external_labels param (#1357)

* Configure cluster label using logs client external_labels param
* Update CHANGELOG.md

* add password file and basic auth round tripper in crow (#1361)

* add password file and basic auth round tripper in crow

* add ca-certificates in crow image

* add orgID flag

* update help text

* default send_exemplars to true in remote_write (#1352)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update eventhandler labels (#1368)

* Update eventhandler integration labels
* Update CHANGELOG
* Remove unnecessary kind label

* update changelog (#1374)

Remove BUGFIX entries that fix a bug introduced by main (i.e., bugs
which were never part of a release)

* Prepare for release of v0.23.0 (#1377)

* Update version references

* Fix fat-fingered delete; Remove mention of upgrade Go

* RFC: Design in the open (#1055)

* rfc: first draft of RFC0001

* add placeholder for PR

* update PR link

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify "designing in the open" is best-effort

* update 0001

* fix dead link in production/README.md

* add recommended sections for RFC proposals

* describe the process for approving a proposal

* ignore RFC template in link checker

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Update docs/rfcs/0001-designing-in-the-open.md

Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* do my nitty 80-char line length limit change

* indent pros/cons to a single section

* document process for superseding RFCs

* remove RFC mutability requirement

* add extra flavor around not recommending google docs

* require Google Doc -> RFC conversion

* move new files

Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>

* Add Grafana Labs SECURITY.md (#1356)

Signed-off-by: Richard Hartmann <richih@richih.org>

* Add readiness check to metrics component (#1369)

* PR Base

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Fix autoscrape's mockInstance

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

* Wire in atomic readiness check

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Add CHANGELOG.md entry

Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Reference page to download windows installer (#1372)

Fixes #1366

* fix typo in node_exporter_config (#1389)

which should be `privileged` instead of `priviliged`

* Add option for Operator to pass arguments to GrafanaAgent #1227 (#1248)

* 1250 oauth2 tracing (#1386)

* Add OAuth support for the OTel trace exporter via the opentelemetry-collector-contrib oauth2clientauthextension

* start extensions on collector instance startup

fix decoding to otelconfig

build extensions

add oauth extension to service map

* Update traces config documentation

* lint fixes

* fix godoc comments

* pass exporter index directly to exporter name generator

* PR feedback; Update Changelog

* sort extensions when sorting pipelines for testing determinism

* README: Fix link to agent logo (#1396)

* update MAINTAINERS.md (#1402)

* add smoke alerts to mixin; move local alerts into examples dir (#1397)

* add smoke alerts to mixin; move local alerts into examples dir

* add podPrefix for smoke test

* podPrefix in libsonnet config

* [RFC] Integrations in Grafana Agent Operator (#1224)

* rfc: integrations in grafana agent operator

Supersedes #883

* add missing links

* Apply suggestions from code review

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* clarify how many daemonsets/deployments/service/secrets are created

* add example of defining secrets

* try defining integrations

* s/IntegrationsMonitor/IntegrationMonitor/g

* simplify proposal

* add alternatives

* remove old reference to `hasMetrics` field

* document example generated agent configuration file

* assign ID RFC-0002

* add missing PR link

Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>

* add fake rw endpoint to smoke program (#1405)

* fix alerts typo (#1407)

* continuous delivery for smoke images (#1408)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* fix continuous delivery job errors (#1409)

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* [operator] - Use _file variants for basic auth credentials. (#1411)

* use password_file alternatives in operator config

* update tests

* reduce smoke alert noise (#1412)

* reduce smoke alert noise

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* Update production/grafana-agent-mixin/alerts.libsonnet

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* update cpu check comment

Signed-off-by: Robbie Lankford <robert.lankford@grafana.com>

* add minimum load threshold to cpu alert

Co-authored-by: Robert Fratto <robertfratto@gmail.com>

* Clarify usage of instanceNamespaceSelector (#1413)

* RFC-0001: Add status to RFC (#1391)

* rfc-0001: add rules for when RFC PRs should be merged

* use status field instead of merge to indicate state

* Parametrize logs DaemonSet K8s manifests (#1420)

* Parametrize logs daemonset K8s manifests
* Update CHANGELOG.md

* Extend linting configuration file (#1421)

* Add depguard linter to reject packages we tend to avoid
* Replace golint with revive, since golint is deprecated
* Remove interfacer, which is deprecated with no replacement
* Add makezero linter to detect misuse of make with append
* Add tenv to prefer t.Setenv over os.Setenv in tests
* Add whitespace to report unnecessary blank lines
* Ignore test files for errcheck

In addition to the above, the following changes were made:

* Remove settings that just re-set default values, instead pointing to the website to retrieve defaults.
* Simplify the errcheck rule to only include functions we actually need to ignore.
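
For readers unfamiliar with the "misuse of make with append" that the makezero linter targets, the following is a minimal sketch of the general pattern. The function names `buggyCopy` and `fixedCopy` are hypothetical and do not come from the agent codebase.

```go
package main

import "fmt"

// buggyCopy allocates a slice with a non-zero length and then appends to it.
// The result keeps len(in) leading zero values before the appended elements.
func buggyCopy(in []int) []int {
	out := make([]int, len(in))
	for _, v := range in {
		out = append(out, v)
	}
	return out
}

// fixedCopy allocates with zero length and the desired capacity, so append
// fills the slice from the start.
func fixedCopy(in []int) []int {
	out := make([]int, 0, len(in))
	for _, v := range in {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(buggyCopy([]int{1, 2})) // prints [0 0 1 2]
	fmt.Println(fixedCopy([]int{1, 2})) // prints [1 2]
}
```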

Co-authored-by: Robert Fratto <robert.fratto@grafana.com>
Co-authored-by: Ursula Kallio <73951760+osg-grafana@users.noreply.github.com>
Co-authored-by: Mario <mariorvinas@gmail.com>
Co-authored-by: Robert Lankford <robert.lankford@grafana.com>
Co-authored-by: f11r <fiete.gruenter@rwth-aachen.de>
Co-authored-by: f11r <f11r@users.noreply.github.com>
Co-authored-by: Nick Pillitteri <56quarters@users.noreply.github.com>
Co-authored-by: Ryan Geyer <me@ryangeyer.com>
Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Co-authored-by: Robert Lankford <rlankfo@gmail.com>
Co-authored-by: Paschalis Tsilias <tpaschalis@users.noreply.github.com>
Co-authored-by: Patrick Koenig <pkoenig10@gmail.com>
Co-authored-by: DataPoints <langer.markus@gmail.com>
Co-authored-by: Alex <52292902+alexrudd2@users.noreply.github.com>
Co-authored-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: melGL <81323402+melgl@users.noreply.github.com>
Co-authored-by: Tom Wilkie <tomwilkie@users.noreply.github.com>
Co-authored-by: Joseph Woodward <josephwoodward@xeuse.com>
Co-authored-by: hanif <hjet@users.noreply.github.com>
Co-authored-by: Richard Hartmann <RichiH@users.noreply.github.com>
Co-authored-by: laiwei <laiwei.ustc@gmail.com>
Co-authored-by: Sam <shamsalmon@users.noreply.github.com>
Co-authored-by: Chris Knutson <christopher.knutson@gmail.com>
Co-authored-by: Florian Klink <flokli@flokli.de>
Co-authored-by: Craig Peterson <192540+captncraig@users.noreply.github.com>
@rfratto rfratto deleted the rfc-integrations-in-operator branch March 14, 2022 21:18
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Apr 1, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 1, 2024

6 participants