Develop a way to handle application configuration #3210

bgrant0607 · 2022-05-22T16:14:20Z

This is related to #3119, but deserves its own issue.

In application-related resources, application configuration often constitutes a large proportion of the overall configuration size.

Application configuration is special in multiple ways:

Attributes can't be derived from well known KRM resource types
Many different formats, which are not KRM and sadly not as standardized as they could be
No explicit schema that kpt has access to

Command-line flags are evil, so I'll punt on them for now, other than using env var substitution to define their values.

Env vars are about the best case. Kustomize has support for generating ConfigMaps from env files, and Kubernetes can inject them as env vars. And, if represented natively in a ConfigMap or in a pod template, then they are KRM and could be edited as such. There's still no native schema though (kubernetes/kubernetes#4210). A command for editing env vars would also be nice.
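
To make the env var case concrete, here is a minimal sketch of turning an env file into ConfigMap data and then editing one variable as ordinary structured data. The function names (`parse_env_file`, `env_config_map`, `set_env_var`) are hypothetical, and the parser only handles plain `KEY=VALUE` lines; it is not how Kustomize implements its generator.

```python
import copy


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    data = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        data[key.strip()] = value.strip()
    return data


def env_config_map(name, env_text):
    """Wrap the parsed variables in a ConfigMap-shaped dict."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": parse_env_file(env_text),
    }


def set_env_var(config_map, key, value):
    """Return a copy of the ConfigMap with one variable set (hypothetical edit command)."""
    updated = copy.deepcopy(config_map)
    updated["data"][key] = str(value)
    return updated
```

Because the variables end up as individual `data` keys, an edit touches exactly one key and leaves the rest of the resource alone.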

I haven't looked for any kind of data, but presumably there are some relatively common file formats, such as INI, TOML, Spring Boot properties, etc.

A common, rational instinct is to normalize such formats into a universal, simpler structured form, generally a simple map or nested map. The most common approach is templating and template parameters, with all the consequences that implies. It's less terrible than other uses of templating if one views config files of unknown formats as just unstructured text, but does feel suboptimal. For instance, anyone familiar with how an application is configured would then need to learn the new representation and how it maps to the application-native one, since often syntax, capitalization, etc. are different. It also frequently requires insertion of conditional logic to handle whether each property is present or absent. With some formats, such as JSON, it is particularly challenging to ensure the output is valid.
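
The JSON validity problem can be illustrated with a small sketch. The template text and function names here are made up for illustration; the point is only that substituting raw text into a JSON template breaks as soon as a value contains a quote, while serializing a structured value does not.

```python
import json
from string import Template

# Hypothetical template of the kind a chart might carry for a JSON config file.
TEMPLATE = Template('{"greeting": "$greeting", "port": $port}')


def render_with_template(greeting, port):
    """Text substitution: the value is pasted in without any JSON escaping."""
    return TEMPLATE.substitute(greeting=greeting, port=port)


def render_structured(greeting, port):
    """Structured approach: build the data, then let the serializer escape it."""
    return json.dumps({"greeting": greeting, "port": port})


def is_valid_json(text):
    """Report whether the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

With a plain value both renderings parse, but a value such as `say "hi"` makes the templated output invalid, which is the kind of failure that usually gets patched with extra escaping helpers in the template.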

For a variety of reasons, we rejected several proposals to support templating in Kubernetes itself (e.g., kubernetes/kubernetes#30716, kubernetes/kubernetes#89738, kubernetes/kubernetes#96346).

We investigated this issue some when we were designing ConfigMap (kubernetes/kubernetes#1553, kubernetes/kubernetes#2068).

I wonder if we could do something with http://augeas.net/index.html
"Augeas is a configuration editing tool. It parses configuration files in their native formats and transforms them into a tree. Configuration changes are made by manipulating this tree and saving it back into native config files."

We would like to provide a similar WYSIWYG transformation and editing experience for application configuration as for KRM resources, at least for a subset of common formats. We could even recommend an automation-friendly format for people writing their own applications.

This affects ~all the functionality of kpt: update merging, diffs, source and sink, function SDKs, the UI.

For example, we also need to be able to do granular merging during updates, in the original non-KRM config file, and then ensure any ConfigMaps they are embedded into are updated (#3119).
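
As a rough illustration of what granular merging could mean for a non-KRM file, here is a sketch of a three-way merge over flattened key-value pairs. It assumes the file has already been parsed into a flat dict of strings; `merge3` is a hypothetical helper, not kpt's actual update logic, and it simply keeps the local value when both sides changed a key differently.

```python
def merge3(base, local, upstream):
    """Three-way merge of flat string dicts.

    Returns (merged, conflicts). A missing key is treated as None, so
    additions and deletions are merged the same way as value changes.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(upstream)):
        b, l, u = base.get(key), local.get(key), upstream.get(key)
        if l == u:
            value = l          # both sides agree
        elif l == b:
            value = u          # only upstream changed
        elif u == b:
            value = l          # only local changed
        else:
            conflicts.append(key)
            value = l          # assumption: local wins, conflict is reported
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

Merging per key rather than per line means a local port change and an upstream tuning change in the same file do not collide.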

justinsb · 2022-07-12T01:37:44Z

One thing that Craig Box got me noodling about ... does yaml matter to kpt / to kubernetes? It clearly doesn't really matter; it's just a representation that we've decided upon.

Craig (jokingly?) suggested INI files as an alternative to yaml, and perhaps that is the path here. When we write configuration in INI or toml, we are actually setting values in a configuration object. That configuration object doesn't allow all keys, and has various restrictions on the values of those keys. In other words, even though we're writing in a different "expression" language, we could imagine writing an OpenAPI spec to describe the schema of the configuration.

This suggests we could think about writing a set of transformation functions from instances of CRDs to the various common configuration file formats. By doing so, we bring legacy configuration into the better-structured world of kubernetes and KRM.
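
A minimal sketch of that idea, under stated assumptions: a small Python type map stands in for an OpenAPI schema, the `mysqld` section and its keys are only an example, and `validate` and `to_ini` are hypothetical names. The structured object is checked against the schema and then rendered to INI with the standard `configparser` module.

```python
import configparser
import io

# Hypothetical schema: allowed sections, keys, and value types.
SCHEMA = {"mysqld": {"port": int, "bind-address": str, "max_connections": int}}


def validate(spec, schema=SCHEMA):
    """Return a list of error strings; an empty list means the spec is valid."""
    errors = []
    for section, values in spec.items():
        allowed = schema.get(section)
        if allowed is None:
            errors.append(f"unknown section: {section}")
            continue
        for key, value in values.items():
            expected = allowed.get(key)
            if expected is None:
                errors.append(f"unknown key: {section}.{key}")
            elif not isinstance(value, expected):
                errors.append(f"{section}.{key}: expected {expected.__name__}")
    return errors


def to_ini(spec):
    """Render a spec as INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    for section, values in spec.items():
        parser[section] = {key: str(value) for key, value in values.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

The same spec could be rendered by a different function for TOML or properties files, which is what makes a per-format set of transformation functions plausible.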

We could do so either as a client-side object or as a true CRD with an operator.

This doesn't obviously solve #3119, so I'd imagine we would start client-side.

bgrant0607 · 2022-07-12T23:28:35Z

An example of an application with lots of configuration is kafka:
https://github.com/bitnami/charts/blob/master/bitnami/kafka/values.yaml#L93
https://github.com/mesosphere/dcos-kafka-service/blob/master/frameworks/kafka/universe/config.json

Similar to the overall approach to WYSIWYG configuration, I wouldn't want to abstract the application configuration. For instance, as a user or developer I'd expect it to match what I saw in the code or development environment or documentation:
https://kafka.apache.org/documentation/#configuration

So, yes, some apps would express configuration in INI or TOML.

This is where something like Augeas is interesting. "Augeas is a configuration editing tool. It parses configuration files in their native formats and transforms them into a tree." Looking at http://augeas.net/docs/augeas.pdf, the idea sounds very close to what we would want. Like a pluggable source/sink for specific non-KRM file types.

bgrant0607 · 2022-07-15T03:42:11Z

With #3118, we wouldn't technically need a custom source/sink. We'd still need custom parsing, marshaling, and visualization, though.

bgrant0607 · 2022-07-20T20:06:58Z

As a concrete example that would address a segment of applications, we looked at Spring Boot config (application.properties) in the early days of the kpt project, but it looks like the demo video recordings don't exist any more. This post discusses it:
https://www.springboottutorial.com/spring-boot-application-configuration

bgrant0607 · 2022-07-26T01:56:48Z

One specific category of application configuration is resource-dependent configuration: VM heap size, thread pool sizes, simultaneous connections, cache sizes, etc. Network- and disk-intensive applications often have a number of these tunable settings.

A number of legacy applications and even language runtimes are not container-aware. As an example, before Java became container-aware, additional automation was necessary that is no longer needed with more recent versions of the JDK.

Ideally these settings would be derived from container resource limits, either at run time, such as using an init container, or an application-specific function, which would be lighter weight than either an Operator or admission controller.
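
A sketch of what such an application-specific function might compute, assuming the container memory limit is available as a Kubernetes quantity string. The helper names and the 75% heap fraction are illustrative assumptions, and only whole-number quantities with a handful of suffixes are handled.

```python
# Subset of Kubernetes quantity suffixes, in bytes (assumption: integers only).
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_memory(quantity):
    """Convert a quantity such as '512Mi' or '1G' to bytes."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _UNITS[suffix]
    return int(quantity)  # plain byte count


def jvm_heap_flag(memory_limit, fraction=0.75):
    """Derive a -Xmx flag from the container memory limit.

    The fraction is a hypothetical default that leaves room for non-heap memory.
    """
    heap_mib = int(parse_memory(memory_limit) * fraction) // 2**20
    return f"-Xmx{heap_mib}m"
```

For example, `jvm_heap_flag("1Gi")` yields `-Xmx768m`. The same derivation could run in an init container at run time or in a function at render time; only the place where the limit is read differs.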

cc @johnbelamaric

johnbelamaric · 2022-07-26T18:04:14Z

I like an app-specific function, especially if it is written in something like Starlark that does not require building and maintaining and coordinating versioning for a separate container image. An init container or custom Go function would require that.

bgrant0607 · 2022-07-28T16:06:47Z

This video discusses an in-pod templating approach using init containers, which is a variation on the entrypoint.sh script approach: https://youtu.be/eJmNSYvelSw?t=1087

I'm liking the Augeas idea, though. If we could convert lots of config formats to a canonical form in kpt fn source, we could manipulate the canonical form and write it back using kpt fn sink.
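
As a rough sketch of that source, edit, sink flow for one format, the following parses properties-style text (as in Spring Boot's application.properties) into a nested tree, which can be edited and written back. It is deliberately simplified: no escapes, no line continuations, comments are dropped, and a key that is both a leaf and a prefix (`a=1` plus `a.b=2`) is not handled. The function names are hypothetical.

```python
def parse_properties(text):
    """Parse 'a.b.c=value' lines into a nested dict tree."""
    tree = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":  # blank line or comment
            continue
        key, _, value = line.partition("=")
        *parents, leaf = key.strip().split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value.strip()
    return tree


def render_properties(tree, prefix=""):
    """Flatten the tree back into 'a.b.c=value' lines, in insertion order."""
    lines = []
    for key, value in tree.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(render_properties(value, path + "."))
        else:
            lines.append(f"{path}={value}")
    return lines
```

An edit is then a plain tree manipulation such as `tree["server"]["port"] = "9090"` between the parse and render steps.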

bgrant0607 · 2022-07-28T16:32:04Z

https://osquery.io/ apparently integrates with Augeas.
https://www.uptycs.com/blog/using-augeas-with-osquery-how-to-access-configuration-files-from-hundreds-of-applications

That's read-only, for queries.

Puppet integrates it also, for setting values:
https://puppet.com/docs/puppet/5.5/resources_augeas.html

And there's Go integration:
https://dev.to/raphink/configuration-surgery-with-go-structure-tags-12a4

bgrant0607 · 2022-07-28T17:41:21Z

More examples:
https://ghost.org/docs/config/
https://dev.mysql.com/doc/refman/8.0/en/server-configuration-defaults.html
https://www.postgresql.org/docs/current/config-setting.html#CONFIG-SETTING-CONFIGURATION-FILE
https://www.rabbitmq.com/configure.html
https://redis.io/docs/manual/config/
https://www.nginx.com/resources/wiki/start/topics/examples/full/
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
https://etcd.io/docs/v3.4/op-guide/configuration/
https://www.vaultproject.io/docs/configuration
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html
https://wpmudev.com/blog/wordpress-wp-config-file-guide/ (php may be too hard)
https://www.drupal.org/docs/configuration-management/managing-your-sites-configuration

We can look through charts for more examples:
https://github.com/bitnami/charts/tree/master/bitnami

selfmanagingresource · 2022-07-28T17:42:25Z

Some discussion in our kpt office hours

bgrant0607 · 2022-07-28T21:56:38Z

List of formats it looks like we need support for:

INI: https://github.com/go-ini/ini
Properties: https://github.com/magiconair/properties
JSON: https://pkg.go.dev/encoding/json
YAML: https://github.com/go-yaml/yaml
XML: https://pkg.go.dev/encoding/xml
TOML: https://github.com/pelletier/go-toml
env: https://github.com/caarlos0/env (?)
Line-delimited text, as a fallback for non-data-format cases like php, sql, lisp, etc.

This is not a lot of formats. They all have Go implementations with permissive open-source licenses, though they may not preserve comments and whitespace.

I like what Augeas has done, but most of the 300 formats it supports are for system files, which we don't need, so it would probably be easiest for us to develop our own implementation and canonical representation. We would want the mechanism to be similarly pluggable.

We will want to be able to infer the format, such as from file extension and/or trying to parse the file, with a fallback for the user to be able to specify the format.
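
A minimal sketch of that inference order, using only the Python standard library. The extension table is an illustrative assumption, only JSON and INI are probed by parsing, and here an explicit user-specified format simply takes precedence, which is one possible reading of the fallback.

```python
import configparser
import json
import os

# Hypothetical extension table; a real one would be pluggable.
_BY_EXTENSION = {
    ".ini": "ini", ".cnf": "ini", ".json": "json", ".properties": "properties",
    ".toml": "toml", ".yaml": "yaml", ".yml": "yaml", ".xml": "xml",
}


def _parses_as_json(text):
    try:
        return isinstance(json.loads(text), (dict, list))
    except json.JSONDecodeError:
        return False


def _parses_as_ini(text):
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error:
        return False
    return bool(parser.sections())


def infer_format(filename, text, override=None):
    """Pick a format: user override, then extension, then trial parsing."""
    if override:
        return override
    extension = os.path.splitext(filename)[1].lower()
    if extension in _BY_EXTENSION:
        return _BY_EXTENSION[extension]
    if _parses_as_json(text):
        return "json"
    if _parses_as_ini(text):
        return "ini"
    return "text"  # line-delimited fallback
```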

bgrant0607 · 2022-07-29T01:29:49Z

Because kubernetes/kubernetes#831 was never done, the configuration needs to be in a ConfigMap in order for it to be injected into the application in a straightforward manner.

Options we discussed today for how to represent app config:

1) Only represented in the native format (INI, etc.) in the package. This would require apply-time conversion to a ConfigMap, which is kind of similar to what is sometimes done for Secrets. It would also require translation to/from a canonical internal representation for update, diff, KRM function format (kpt fn source and sink), KRM function SDKs, the UI, and anything else manipulating configuration (e.g., a command like jx gitops yset). This experience would be most similar to the current kustomize experience, for kustomize users that don't commit the kustomize build output.
2) Only represented in KRM, analogous to the internal format mentioned above, in the package. This requires a migration tool to convert from/to the native format and a way to translate it into the native format for consumption by the application, such as apply-time translation and wrapping in a ConfigMap, or apply-time wrapping in a ConfigMap and runtime translation in an init container or controller.
3) Represented in both the native format and KRM in the package, with one of them as the source of truth and translation performed eagerly.

The advantage of the application's native format as the source of truth (option 1 or 3) is easier compatibility with the existing application ecosystem(s), without frequent format migrations: reference documentation, tutorials, samples, generators, editors, IDE plugins, Augeas plugins, etc. For instance, here's a mariadb config I could copy/paste:
https://www.ibm.com/docs/en/ztpf/1.1.0.15?topic=collection-mariadb-configuration-file-example

I personally don't have a problem with option 3, but it would be useful to get feedback from actual users.

For all the options, our tooling would manipulate our canonical representation.

The problem of a lack of a schema exists for all the options. We'd design the schema to match our canonical format regardless of which option we picked.

Option 1 requires more conversions back and forth by kpt. Option 2 requires more conversions back and forth by the user. Option 3 is the simplest and most flexible, but possibly harder to understand.

bgrant0607 · 2022-07-29T04:39:36Z

An example of toml embedded in helm chart values:
https://github.com/influxdata/helm-charts/blob/master/charts/telegraf/templates/configmap.yaml
https://github.com/influxdata/telegraf/tree/master/plugins/
and one opinion on that experience:
https://youtu.be/LBCmMTofNxw?t=1937

johnbelamaric · 2022-07-29T16:07:51Z

I suspect all three will be needed, but in order of preference, I find Option 3 more aligned with the vision, for a couple of reasons:

I think apply-time transformations should be avoided when possible, to ensure the integrity of the storage vs live state comparison.
Option 2 is painful, as evidenced by the opinion expressed above. But for very simple cases, it could be useful.

For Option 3, we can make it more palatable with a convention to identify the generated ConfigMaps. We have also discussed management of historical ConfigMaps so this fits in pretty well with that concept. For example, a particular annotation or even storing them in a special directory. A couple other considerations, that perhaps should be discussed on #3119 are: 1) how to combine multiple non-KRM files into a single ConfigMap; 2) how to name, annotate, label, etc. the ConfigMap. I am imagining a "stub" ConfigMap such that functions take in that CM, the raw file resource, and a key name.
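
A sketch of the "stub" ConfigMap idea, assuming resources are handled as plain dicts. The function name and the annotation key are hypothetical; the function takes the stub, a key name, and the raw file contents, and records which keys were generated so that tooling can identify them later.

```python
import copy

# Hypothetical annotation marking which data keys were generated from files.
GENERATED_ANNOTATION = "example.com/generated-from-files"


def embed_file(stub, key, file_text):
    """Return a copy of the stub ConfigMap with the raw file stored under key."""
    config_map = copy.deepcopy(stub)
    config_map.setdefault("data", {})[key] = file_text
    annotations = config_map.setdefault("metadata", {}).setdefault("annotations", {})
    sources = [s for s in annotations.get(GENERATED_ANNOTATION, "").split(",") if s]
    if key not in sources:
        sources.append(key)
    annotations[GENERATED_ANNOTATION] = ",".join(sources)
    return config_map
```

Calling it once per file with different keys combines several non-KRM files into one ConfigMap, while the name, labels, and other metadata come from the stub rather than from the function.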

bgrant0607 · 2022-07-29T18:01:29Z

I think what @yuwenma demoed was essentially option 2: represent the configuration in a canonical KRM format in the package. But instead of adapting the format in the apply step, it used a ConfigMap with granular key-value pairs as the canonical format and added an init container to convert that to INI for the application.

johnbelamaric · 2022-07-29T18:11:14Z

Agreed. What I missed in your description of 3 above was that we would store in the canonical format - I was reading it as representing the native format and the generated ConfigMap(s), treating the canonical format as an intermediate in-memory representation. So we actually have three different formats: native, canonical, and generated ConfigMap. That expands the options a bit, in terms of which subset of these three we store.

johnbelamaric · 2022-07-29T18:28:23Z

The other point we need to consider is the source of truth. Clearly the generated ConfigMaps are not it. So it leaves the native and canonical formats. If we store the canonical format, then we will have some confusion as to which is SoT.

Another way to think about SoT is to make it an opinionated pipeline of overrides. The native format - the one most easily edited by humans - is the input to the pipeline, which then may override values in that input to produce the final ConfigMap. This works pretty well for the simple case of an independent file and is straightforward: I edit the native file, but my fn render pipeline may tweak it further and rewrite the file. If we store the canonical format too, I think it muddies these waters.
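
The "input with pipeline overrides" model can be sketched in a few lines, assuming the native file has already been parsed into a flat dict. `set_value` and `render` are hypothetical names, not kpt APIs; the sketch also reports which hand-edited keys the pipeline ended up overriding.

```python
def set_value(key, value):
    """Build a pipeline step that forces one key to a fixed value."""
    def step(config):
        return {**config, key: value}
    return step


def render(native, pipeline):
    """Apply each step in order to a copy of the native input.

    Returns (result, overridden), where overridden lists the input keys
    whose values differ after the pipeline has run.
    """
    result = dict(native)
    for step in pipeline:
        result = step(result)
    overridden = sorted(k for k in native if result.get(k) != native[k])
    return result, overridden
```

The native input stays the single place a human edits, and the list of overridden keys makes it visible when the pipeline has the last word.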

This method doesn't preclude us being smart about the updates to the native files by internally parsing them to the canonical format, nor does it preclude us using that canonical format to present edits in the UI. Those updates and UI-based edits are subject to being overridden by the pipeline, of course.

It gets tricky when we have inputs that are interrelated between the config file and other resources, though. For example, if we change the port in the native file, does that propagate through to the Service port? Or vice-versa? While the "input with pipeline overrides" doesn't solve this problem, I think that's OK. This is actually the same problem we have for any other resources wrt SoT; the input just happens to be in a different format.

bgrant0607 · 2022-07-29T20:25:44Z

Ooh, I like the idea of storing generated objects in a subdirectory. That might be a useful pattern for generators more generally, especially in the case that post-generation edits aren't feasible: #2528.

Something I proposed in slack: Any applications that can specify config via environment variables should probably do so for now. The ConfigMap with granular key-value pairs could serve as the canonical format. Though it's not quite the native env file format that could be sourced by the shell (added to list above), it should be familiar to Kubernetes users.

bgrant0607 · 2022-07-29T20:28:57Z

Regarding 3 formats: fair point.

johnbelamaric · 2022-07-29T20:33:45Z

#3422 is relevant to this discussion

bgrant0607 · 2022-07-29T22:42:10Z

This PR has an example possible canonical format using granular, flattened key-value pairs (similar to Augeas's internal format) in a ConfigMap:
https://github.com/GoogleContainerTools/kpt-samples/pull/11/files

The python program that converts the corresponding env vars to the app's native INI format, which runs as an init container, is in that PR also. Presumably there's also a program that converts INI to the canonical format. Here there are only 2 formats because the canonical format is fed directly to the init container as opposed to generating a ConfigMap with an embedded INI file.
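
For illustration only, and not taken from that PR, here is a sketch of both directions between INI and granular, flattened key-value pairs. It assumes a `section.key` naming convention (ConfigMap keys may contain letters, digits, `-`, `_`, and `.`), so section names containing a dot are not supported, and it uses the standard `configparser` module, which does not preserve comments.

```python
import configparser
import io


def _new_parser():
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    return parser


def ini_to_flat(text):
    """Flatten INI text into {'section.key': value} pairs."""
    parser = _new_parser()
    parser.read_string(text)
    return {
        f"{section}.{key}": value
        for section in parser.sections()
        for key, value in parser[section].items()
    }


def flat_to_ini(data):
    """Rebuild INI text from {'section.key': value} pairs."""
    parser = _new_parser()
    for flat_key, value in data.items():
        section, _, key = flat_key.partition(".")
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

The `flat_to_ini` direction is the part an init container would run; `ini_to_flat` is what a migration or import step would need.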

yuwenma · 2022-07-30T02:03:22Z

One big advantage for option 3:
Once users accept the idea of using a canonical format to represent their non-KRM app config, they can build logic between the non-KRM files and their k8s resources directly, and this will give them more flexibility to mutate and validate the package as a whole.

For example, by writing a simple KRM validator function, the platform developer can guarantee the MariaDB port number in INI file is the same as the Ghost deployment database port number. Right now, the most feasible way to do this is to use multi-line setters (not sure if it still works or not), which is the opposite of what we want.
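
A sketch of such a validator over plain dicts, without using any KRM function SDK. The resource shapes are assumptions made for illustration: the MariaDB INI file is embedded under a `my.cnf` key in a ConfigMap, and the Ghost Deployment receives the database port through an env var named `database__connection__port`.

```python
import configparser


def mariadb_port(config_map, key="my.cnf"):
    """Read [mysqld] port from the INI file embedded in the ConfigMap."""
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(config_map["data"][key])
    return parser.get("mysqld", "port")


def ghost_db_port(deployment, env_name="database__connection__port"):
    """Find the database port env var in the Deployment's containers."""
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env["name"] == env_name:
                return env.get("value")
    return None


def validate_ports(config_map, deployment):
    """Return a list of error messages; empty means the ports agree."""
    db_port = mariadb_port(config_map)
    app_port = ghost_db_port(deployment)
    if db_port != app_port:
        return [f"MariaDB listens on {db_port} but Ghost connects to {app_port}"]
    return []
```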

I really like the summary that "Option 1 requires more conversions back and forth by kpt. Option 2 requires more conversions back and forth by the user. Option 3 is the simplest and most flexible, but possibly harder to understand." For now, I'm leaning towards Option 1 because it gives the best user experience to get started. Only once they do can we "get feedback from actual users."

bgrant0607 · 2024-04-26T21:46:48Z

There's also still the configmap rollout issue.
kubernetes/kubernetes#22368

bgrant0607 added enhancement New feature or request design-doc labels May 22, 2022

mortent added this to ToDo in kpt kanban board Jun 8, 2022

bgrant0607 mentioned this issue Jul 8, 2022

Epic: WYSIWYG Kubernetes Application Configuration #3351

Open

29 tasks

yuwenma mentioned this issue Jul 18, 2022

Add a basic ghost package #3381

Merged

This was referenced Jul 26, 2022

Flesh out the input data model and patterns #3396

Open

Sane merge behavior for non-KRM files #3418

Open

yuwenma mentioned this issue Jul 30, 2022

A simple POC to change non KRM files to a structured k8s resource GoogleContainerTools/kpt-samples#11

Draft

yuwenma closed this as completed Jul 30, 2022

kpt kanban board automation moved this from ToDo to Done Jul 30, 2022

yuwenma reopened this Jul 30, 2022

yuwenma moved this from Done to ToDo in kpt kanban board Jul 30, 2022

yuwenma mentioned this issue Aug 15, 2022

Augeas exploration and conclusion #3465

Closed

mortent added area/site triaged Issue has been triaged by adding an `area/` label labels Jan 26, 2023

liamfallon mentioned this issue Apr 8, 2024

Flesh out the input data model and patterns nephio-project/porch-issue-transfer#132

Closed

liamfallon mentioned this issue Apr 23, 2024

Flesh out the input data model and patterns nephio-project/nephio#662

Open
