
Promtail config conversion - part 2 #4500

Merged — 20 commits merged into main on Jul 20, 2023

Conversation

@thampiotr (Contributor) commented Jul 18, 2023

PR Description

  • Adds further support for Promtail config conversions:
    • All the SD features
    • Some of the log-producing elements, e.g. the Loki push API
  • Increases testing coverage by attempting to load the Flow config
  • Fixes an issue with prometheus.discovery.relabel using the wrong exported field name
  • Fixes a cloudflare typo

Notes to the Reviewer

Work still left (see TODO comments):

  • Remaining log-producing features
  • Some top-level features
  • Remaining loki.process stages

@thampiotr force-pushed the thampiotr/promtail-config-conversion-part-2 branch from e5ccbfe to ed149f5 on July 19, 2023 12:57
@thampiotr thampiotr marked this pull request as ready for review July 19, 2023 13:27
@thampiotr thampiotr requested a review from a team as a code owner July 19, 2023 13:27
@thampiotr thampiotr requested a review from rfratto July 19, 2023 13:28
@@ -49,7 +49,7 @@ func appendDiscoveryRelabel(pb *prometheusBlocks, relabelConfigs []*prom_relabel
 	pb.discoveryRelabelBlocks = append(pb.discoveryRelabelBlocks, newPrometheusBlock(block, name, label, "", ""))

 	return &disc_relabel.Exports{
-		Output: newDiscoveryTargets(fmt.Sprintf("discovery.relabel.%s.targets", label)),
+		Output: newDiscoveryTargets(fmt.Sprintf("discovery.relabel.%s.output", label)),
@thampiotr (author): Found this issue thanks to adding an attempt to load the Flow config in testing.go.

@@ -8,7 +8,7 @@ loki.relabel "fun" {
 	max_cache_size = 0
 }

-loki.source.cloudfare "fun" {
+loki.source.cloudflare "fun" {
@thampiotr (author): Also found this thanks to attempting to load the Flow config in testing.go.

@@ -96,6 +105,7 @@ func validateDiags(t *testing.T, expectedDiags []string, actualDiags diag.Diagno
 	if len(expectedDiags) > ix {
 		require.Equal(t, expectedDiags[ix], diag.String())
 	} else {
+		fmt.Printf("=== EXTRA DIAGS FOUND ===\n%s\n===========================\n", actualDiags[ix:])
@thampiotr (author): Prints all the diags that are left, so one can fix the expectations more quickly.

Comment on lines +49 to +51:

	if !fileExists(riverFile) && !fileExists(diagsFile) {
		t.Fatalf("no expected diags or river for %s - missing test expectations?", path)
	}
@thampiotr (author): Catches the case where there is a typo in the file names.

@erikbaranowski (Contributor) left a comment:

Looking solid. There are some questions on consistency between the converters that I'd like to see us reach a consensus on, so that we can jump between the converters in code with as little context switching as possible.

@@ -131,12 +131,33 @@ func appendScrapeConfig(
 	diags *diag.Diagnostics,
 	gctx *build.GlobalContext,
@erikbaranowski: I like the builder concept a lot and I'm thinking about it some more. It's not in this changeset, but I wonder whether build.GlobalContext should have the cfg *promtailcfg.Config on it rather than the more specific TargetSyncPeriod. You could then use that as the context for anything you need from the entire config, rather than wiring in specific properties. WDYT?

@thampiotr (author): I think that if we were using more fields (e.g. more than 3?) from the global context, that would certainly make sense.

Currently I suspect this will stay quite limited, with just a few fields; adding a large struct with everything in it, such as cfg *promtailcfg.Config, may add cognitive load when reading this, IMO. Seeing that there are only a few fields in the GlobalContext, I can quickly tell that only a few things are considered global.

@thampiotr (author):

> I like the builder concept a lot and I'm thinking about it some more.

Thanks! One thing that builder is good for is a long pipeline where some steps are optional. I can have discovery.* -> discovery.relabel -> local.file_match -> loki.source.file -> loki.process -> loki.write and some of these intermediate components are optional. With the builder pattern I can skip a stage in the pipeline without too much code.
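To illustrate the optional-stage idea, here is a rough, hypothetical sketch (the names and API are illustrative only, not the PR's actual builder code): the builder threads the "current output" export reference through the pipeline, so skipping an optional stage costs a single call instead of an if/else around the wiring.

```go
package main

import "fmt"

// pipelineBuilder threads the export reference produced by the last
// appended stage through a discovery.* -> ... -> loki.write pipeline.
// (Hypothetical illustration, not the actual converter code.)
type pipelineBuilder struct {
	current string   // export reference produced by the last appended stage
	blocks  []string // rendered component blocks, in order
}

func newPipelineBuilder(source string) *pipelineBuilder {
	return &pipelineBuilder{current: source}
}

// appendStage adds a component that consumes the current output and
// replaces it with the component's own export reference.
func (b *pipelineBuilder) appendStage(component, label string) *pipelineBuilder {
	b.blocks = append(b.blocks, fmt.Sprintf("%s %q { input = %s }", component, label, b.current))
	b.current = fmt.Sprintf("%s.%s.output", component, label)
	return b
}

// appendStageIf skips the stage entirely when enabled is false, so an
// optional intermediate component needs no extra branching at call sites.
func (b *pipelineBuilder) appendStageIf(enabled bool, component, label string) *pipelineBuilder {
	if enabled {
		return b.appendStage(component, label)
	}
	return b
}

func main() {
	hasRelabel := false // e.g. the source config has no relabel_configs
	b := newPipelineBuilder("discovery.kubernetes.default.targets").
		appendStageIf(hasRelabel, "discovery.relabel", "default").
		appendStage("loki.source.file", "default").
		appendStage("loki.write", "default")
	for _, block := range b.blocks {
		fmt.Println(block)
	}
}
```

When the optional relabel stage is skipped, loki.source.file is wired directly to the discovery targets; each later stage always consumes whatever reference the previous appended stage left behind.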

@erikbaranowski: Prometheus has a similar case too, chaining together up to 5 potential components.

return cfg
}

func ValidateWeaveWorksServerCfg(cfg server.Config) diag.Diagnostics {
@erikbaranowski: I'm commenting on this line just to get the conversation going.

I think we should discuss and narrow in on a consistent approach for adding diags during the conversion. It seems like we currently have too much variance in how we do things, which makes it hard to follow along when looking at the converter validations:

  • a general validate.go in the *convert package
  • various validate* funcs in Flow-component-specific files, i.e. prometheusconvert.validatePrometheusScrape, which returns diag.Diagnostics
  • adding them directly to the property on a builder, i.e. build.AppendKubernetesSDs
  • passing a *diag.Diagnostics into another func, i.e. build.convertStage

Not to suggest this is the best final pattern, especially since I am interested in the builder pattern, but the current pattern I have in place for the prom/static converters looks like this:

  • Any diags for the main parsing are produced in the top-level prometheusconvert.Convert func
  • Validation diags aren't added while appending components (maybe this is too strict for all cases, but it does separate the responsibilities)
  • validate.go provides a top-level validate func that returns all validation diags not covered by prometheusconvert.Convert
    • The validation is ordered by the order of the properties in the source config type
    • Any non-component-specific validation lives in this file
    • Any component-specific validation calls a validation func that lives in that component's specific convert file
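As a rough illustration of that layout (all names here are hypothetical, not the actual converter code): a single top-level Validate walks the source config in property order and delegates component-specific checks to funcs that live next to the corresponding conversion code.

```go
package main

import "fmt"

// Diagnostics is a minimal stand-in for the converters' diag.Diagnostics.
type Diagnostics []string

func (d *Diagnostics) Add(msg string) { *d = append(*d, msg) }

// ScrapeConfig and Config are simplified stand-ins for the source config types.
type ScrapeConfig struct {
	JobName string
}

type Config struct {
	ScrapeConfigs []ScrapeConfig
}

// validateScrapeConfig would live in the scrape component's convert file,
// next to the code that actually builds the Flow block.
func validateScrapeConfig(sc ScrapeConfig) Diagnostics {
	var diags Diagnostics
	if sc.JobName == "" {
		diags.Add("scrape_config is missing job_name")
	}
	return diags
}

// Validate is the single top-level entry point (the validate.go layer):
// it walks the config in property order and collects every validation
// diag not already produced by the top-level Convert.
func Validate(cfg Config) Diagnostics {
	var diags Diagnostics
	for _, sc := range cfg.ScrapeConfigs {
		diags = append(diags, validateScrapeConfig(sc)...)
	}
	return diags
}

func main() {
	diags := Validate(Config{ScrapeConfigs: []ScrapeConfig{{JobName: ""}}})
	fmt.Println(diags) // one diagnostic for the missing job_name
}
```

The design choice being discussed is exactly this separation: validation happens in one ordered pass before any components are appended, at the cost of walking the config twice.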

@thampiotr (author): Yeah, I've noticed these differences too and I agree we could make things a bit more consistent (in this or, preferably, a future PR).

However, there are a few reasons why I am hesitant about one big validate step followed by building the config separately:

  • It spreads the code around a bit... I would prefer if adding support for a new feature ideally happened in one place. This is somewhat alleviated by the convention of having a validate* function in the same file as the conversion code, though.
  • It is still possible to forget to call the validation, or to forget to write it at all.
  • In a one-step validation approach, some issues to validate would require iterating over arrays, layers of nil checking, and reaching deep into the config struct; for example, one of them lives in scrape_configs[*].journal.max_age. Since we'll already be walking through this config later to convert it, I found it redundant to walk through it again just to validate it.

Here's an idea for how I think we could make the validation and conversion more consistent:

We could define an interface for config-fragment conversions and enforce, via the API, that validation always happens when converting - this way it is harder to forget to call validate. An example interface could look like this:

// Converts config from F to T after validating and providing diagnostics.
type ConfigConverter[F interface{}, T interface{}] interface {
	Convert(fCfg F) (T, diag.Diagnostics)
}

And an example implementation and usage:

type KubernetesSDConverter struct {
}

func (k KubernetesSDConverter) Convert(fCfg promkubernetes.SDConfig) (kubernetes.Arguments, diag.Diagnostics) {
	// here goes the current implementation
	panic("implement me")
}

func actualConversionCode(fCfg promkubernetes.SDConfig, f *builder.File, diags *diag.Diagnostics) {
	args, newDiags := KubernetesSDConverter{}.Convert(fCfg)
	diags.AddAll(newDiags)
	f.Body().AppendBlock(common.NewBlockWithOverride([]string{"discovery", "kubernetes"}, "default", args))
}

Alternatively, I can refactor my code to match the pattern we have in the other converters - it may end up being a bit more verbose, but it'd be a feasible option too.

Lastly, we could keep the mixed approach - I agree that being consistent is better, but calling validation is also not the most complex part around here, so I don't think it's too much of an issue.

@erikbaranowski: For future us: we talked offline about moving forward as-is for now and thinking about a refactor of some of this stuff after things settle. Both approaches are very functional, and we can easily test any refactor.

converter/internal/promtailconvert/promtailconvert.go - outdated review comment, resolved.
@erikbaranowski (Contributor) left a comment:

LGTM

@erikbaranowski erikbaranowski merged commit f23f310 into main Jul 20, 2023
8 checks passed
@erikbaranowski erikbaranowski deleted the thampiotr/promtail-config-conversion-part-2 branch July 20, 2023 15:39
clayton-cornell pushed a commit that referenced this pull request Aug 14, 2023
* wip

* fixes

* static SD

* file discovery

* consul

* consul, DO, docker swarm

* other unsupported

* GCE

* format-yaml

* ec2

* azure

* todos

* server validation

* loki push api

* goimports

* fix issue in prometheus/discovery/relabel

* fix lint

* fix another test case

* use random port

* move loki.write to own file
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 22, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 22, 2024