Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework command line arguments parsing #2343

Merged
merged 1 commit into from
Dec 15, 2022
Merged

Conversation

dmitryax
Copy link
Contributor

This PR solves two problems:

  1. In the current implementation, command flags specific to the Splunk distro are dropped from the arguments list passed to the collector core service by their names. It means that all the flag values are unintentionally passed downstream. Also, flags with = are not removed from the list and cause failures downstream. For example, ./bin/otelcol --config-dir=/tmp/conf ... fails with:
2022/12/10 19:15:52 main.go:106: application run finished with error: unknown flag: --config-dir

And removeFlag function is called from methods that are not expected to be mutating by their definition like ResolverURIs.

  1. Currently, we rely on the downstream collector core service to display --help flag output and get the following:
Usage:
  otelcol [flags]

Flags:
      --config --config=file:/path/to/first --config=file:path/to/second   Locations to the config file(s), note that only a single location can be set per flag entry e.g. --config=file:/path/to/first --config=file:path/to/second. (default [])
      --feature-gates Flag                                                 Comma-delimited list of feature gate identifiers. Prefix with '-' to disable the feature. '+' or no prefix will enable the feature.
  -h, --help                                                               help for otelcol
      --set func                                                           Set arbitrary component config property. The component has to be defined in the config file and the flag has a higher precedence. Array config properties are overridden and maps are joined. Example --set=processors.batch.timeout=2s
  -v, --version                                                            version for otelcol

It's not just incomplete but also provides a misleading description for --config flag: we don't support config schemas, so using any of the provided examples will fail. This change defines all the flags in our distro and makes the program use that definition in --help output instead of relying on the collector core service. Experimental flags like --configd, --config-dir and --discovery are hidden for now, but can be easily enabled. It provides the following output for --help flag:

Usage of otelcol:
      --config string          Locations to the config file(s), note that only a single location can be set per flag entry e.g. --config=/path/to/first --config=path/to/second. (default "[]")
      --dry-run                Don't run the service, just show the configuration
      --feature-gates string   Comma-delimited list of feature gate identifiers. Prefix with '-' to disable the feature. '+' or no prefix will enable the feature. (default "[]")
      --no-convert-config      Do not translate old configurations to the new format automatically. By default, old configurations are translated to the new format for backward compatibility.
      --set string             Set arbitrary component config property. The component has to be defined in the config file and the flag has a higher precedence. Array config properties are overridden and maps are joined. Example --set=processors.batch.timeout=2s (default "[]")
  -v, --version                Version of the collector.

Now we also explicitly pass all the flags that are handled by the collector core service instead of passing what we don't use. The reason for that is that there are no plans in the collector core to add any new flags. Most of the old flags were migrated to the configuration. And, even if something will be added, it potentially can be incompatible with our distro and introduce some broken flags to the --help output without us noticing.

And, as an opportunistic simplification, this change replaces an unnecessary Settings interface with a struct.

@dmitryax dmitryax requested review from a team as code owners December 11, 2022 03:50
This change solves two problems:

1. In the current implementation, command flags specific to the Splunk distro are dropped from the arguments list passed to the collector core service by their names. It means that all the flag values are unintentionally passed downstream. Also, flags with `=` are not removed from the list and cause failures downstream. For example, `./bin/otelcol --config-dir=/tmp/conf ...` fails with:

```
2022/12/10 19:15:52 main.go:106: application run finished with error: unknown flag: --config-dir
```

And `removeFlag` function is called from methods that are not expected to be mutating by their definition like `ResolverURIs`.

2. Currently, we rely on the downstream collector core service to display `--help` flag output and get the following:

```
Usage:
  otelcol [flags]

Flags:
      --config --config=file:/path/to/first --config=file:path/to/second   Locations to the config file(s), note that only a single location can be set per flag entry e.g. --config=file:/path/to/first --config=file:path/to/second. (default [])
      --feature-gates Flag                                                 Comma-delimited list of feature gate identifiers. Prefix with '-' to disable the feature. '+' or no prefix will enable the feature.
  -h, --help                                                               help for otelcol
      --set func                                                           Set arbitrary component config property. The component has to be defined in the config file and the flag has a higher precedence. Array config properties are overridden and maps are joined. Example --set=processors.batch.timeout=2s
  -v, --version                                                            version for otelcol
```

It's not just incomplete but also provides a misleading description for `--config` flag: we don't support config schemas, so using any of the provided examples will fail.  This change defines all the flags in our distro and makes the program use that definition in `--help` output instead of relying on the collector core service. Experimental flags like `--configd`, `--config-dir` and `--discovery` are hidden for now, but can be easily enabled. It provides the following output for `--help` flag:

```
Usage of otelcol:
      --config string          Locations to the config file(s), note that only a single location can be set per flag entry e.g. --config=/path/to/first --config=path/to/second. (default "[]")
      --dry-run                Don't run the service, just show the configuration
      --feature-gates string   Comma-delimited list of feature gate identifiers. Prefix with '-' to disable the feature. '+' or no prefix will enable the feature. (default "[]")
      --no-convert-config      Do not translate old configurations to the new format automatically. By default, old configurations are translated to the new format for backward compatibility.
      --set string             Set arbitrary component config property. The component has to be defined in the config file and the flag has a higher precedence. Array config properties are overridden and maps are joined. Example --set=processors.batch.timeout=2s (default "[]")
  -v, --version                Version of the collector.
```

Now we also explicitly pass all the flags that are handled by the collector core service instead of passing what we don't use. Reason for that is that there are no plans in the collector core to add any new flags. Most of the old flags were migrated to the configuration. And, even if something will be added, it potentially can be incompatible with our distro and introduce some broken flags to the `--help` output without us noticing.

And, as an opportunistic simplification, this change replaces an unnecessary `Settings` interface with a struct.
@@ -60,77 +57,58 @@ const (
ConfigDScheme = "splunk.configd"
)

type Settings interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, as an opportunistic simplification, this change replaces an unnecessary Settings interface with a struct.

The aim of this interface was to help clarify the intended consumer usage or available settings while hiding all the underlying flag/state management that was scattered throughout main.go. What are we gaining by using the exported internal struct directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any IDE or editor can show all available methods and point you right to the implementation, which is much better than pointing to an interface declaration that force you to look around for the actual implementation after that. Also, all the methods will be listed the same way as they are in the go doc https://pkg.go.dev/github.com/signalfx/splunk-otel-collector@v0.66.0/internal/settings#Settings. So would disagree that using interfaces for one struct makes developer live any better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more for maintainability and trying to establish a clear convention for how future settings should be added, and nothing to do w/ IDEs. Given these changes don't deviate too far from what was there I think it's ok but may be something to revisit as we add future options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more for maintainability

IMHO adding unnecessary interfaces reduces maintainability

establish a clear convention for how future settings should be added

I think it's rather an unnecessarily complicated convention for such a simple interface. Not sure if there is even a reason to expose them as methods... If they are all just getters using data from the struct that was set during initialization, we can even expose them as precalculated struct fields. That will put them all together in one place as it was with the interface. And it'll help to avoid extra work if they would be used more than once in the future. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to get too caught up in a dispute of personal preference, but they were interfaces because interfaces are collections of methods and not fields. The accessors do the computing and return the setting value, but could definitely be improved. They were exposed via the settings interface to encapsulate the ~legacy flag helpers while migrating to a central setting store to prevent breaking changes.

Processing them w/ initial parsing stages in New() and using exported fields is acceptable to me and would address their idempotence issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processing them w/ initial parsing stages in New() and using exported fields is acceptable to me and would address their idempotence issues.

Sounds good, I was thinking about doing that but didn't want to go too far. I can do it in this or separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say let's cut an issue for

Processing them w/ initial parsing stages in New() and using exported fields is acceptable to me and would address their idempotence issues.

and keep it as the struct version for now.

cpArgs = append(cpArgs, args...)
// parseArgs returns new Settings instance from command line arguments.
func parseArgs(args []string) (*Settings, error) {
flagSet := flag.NewFlagSet("otelcol", flag.ContinueOnError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented the aim here is to not have to track or manage upstream options in our init logic unless we are using them directly. Why is this no longer desired?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for that is that there are no plans in the collector core to add any new flags.

Do you have an issue for this?

it potentially can be incompatible with our distro and introduce some broken flags to the --help output without us noticing.

Do you mind clarifying why this expectation is something to avoid and why it's better to not inherit* features as upstream provides them?

Copy link
Contributor Author

@dmitryax dmitryax Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an issue for this?

No

Do you mind clarifying why this expectation is something to avoid and why it's better to not inherit* features as upstream provides them?

Because additional flags introduce some optional functionality. And that functionality potentially can be incompatible with our distro, which will lead to an additional broken flag in --help output without us noticing. If a new additional flag in core, for whatever reason, is required, we will catch it during the upgrade of the core dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we still want to define all known flags (including core's) in our distro but pass everything that we don't know to the core just to potentially catch a new flag added to the core without documenting it, we can do that. But it will require manual argument parsing. I don't think it is worth it. I'd rather keep an eye on new flags added to the core (which is very unlikely to happen any time soon) and add them properly along with documentation in our distro, the same way as we do for components.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still under the impression we probably don't want to define any flags we aren't using and the current problems we are running into are longstanding implementation defects. Are we not already manually parsing args atm and in your changes? I can see an inverse flagSetToArgs() that removes our distro-only flags and values as a major improvement to the current removeFlag().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it makes sense to fix the config parsing and other issues alone since we aren't in alignment on how to move forward otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing each issue alone requires adding more hacks on top of what we have. I tried to come up with one set of related changes that doesn't require doing that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context Dmitrii, are you saying that not doing the opportunistic change regarding Settings makes it more difficult to deliver/fix the two problems mentioned in the issue body?

There's a lot of "issues" and decision points to be made so I want to be 100% certain on which specific issues and challenges are meant by "fixing each issue alone" and "one set of related changes" and "doing that". Want to make sure I didn't get lost in the thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still disagree that we should pass all unknown flags, but happy to discuss that separately and apply another PR if needed.

Yeah, it's harder to remove that functionality if someone ends up needing it for some esoteric triage. Better imho to go with an explicit enumerative set of flags and let the customers cut tickets if they have use cases instead of them just silently DIYing stuff unbeknownst to us. Still catching up on this though, so treat this as my bit-more-than-initial impression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like (love?) the vision for a better future Ryan is illustrating and agree that in general this is an unfortunate place to be in, stuck between a bunch of suboptimal design decisions and context we as rotel can't fully control. I think these integration pains need to be addressed and agree that Ryan's concerns of software quality need to be better internalized in our culture and inter-team communication, and I wouldn't say this PR fixes the "root" cause for some of the aforementioned issues. The timeframe for addressing this is "the sooner the better".

All that said, I still think we should merge this as-is for now. To my understanding the interface and customer messaging won't change drastically from the implicit one we have today, and this change fixes bug(s) without (imho) introducing any design decisions that we can't easily walk back from, and overall improves the customer experience.

For that reason I'm inclined to ship as-is.

Copy link
Contributor

@hughesjj hughesjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved so long as it's agreed that any design decisions introduced here won't be too hard to change or walk back in the future, which I believe to be the case.

Copy link
Contributor

@atoulme atoulme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved - will use this discussion for tech debt around this module.

@dmitryax dmitryax merged commit 3b9164c into main Dec 15, 2022
@delete-merged-branch delete-merged-branch bot deleted the exclude-flags-explicitly branch December 15, 2022 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants