
Automate reference documentation as YAML or JSON #24189

Closed
theletterf opened this issue Jul 11, 2023 · 24 comments
Labels: closed as inactive, documentation, enhancement, Stale

Comments

@theletterf
Member

theletterf commented Jul 11, 2023

Reference documentation for each component is hard to come by, update, and produce. Settings and metrics are, by far, the user-facing elements that change most often between releases. This imposes significant overhead on anyone trying to keep documentation up to date, both upstream and downstream.

As suggested, hinted at, or tried in #23054, open-telemetry/opentelemetry-collector#7679, #20509, #15233, or #22962, I think we could generate reference documentation for each component as YAML files, using mdatagen (?) and pulling descriptions and other information from the code.

The YAML reference could then be used to generate Markdown files or even the README file itself. It could also be used downstream by distributions, thus faithfully passing on documentation for each component. Elements that could be automated include:

  • Metrics (if produced, particularly by receivers)
  • Configuration settings
  • Component metadata

An example is what Splunk has been doing here: https://github.com/splunk/collector-config-tools/tree/main/cfg-metadata

@theletterf theletterf added enhancement New feature or request needs triage New item requiring triage labels Jul 11, 2023
@atoulme atoulme added documentation Improvements or additions to documentation and removed needs triage New item requiring triage labels Jul 11, 2023
@atoulme
Contributor

atoulme commented Jul 11, 2023

As it stands, I think we are in a slow process of automating whatever parts of components we can generate using templates.

Ideally, with this yaml fragment, we could drive the config.go file and the README to contain a complete table of all config options. This comment is the closest we have, with the latest progress.

@theletterf
Member Author

That sounds great, @atoulme. Would this apply to metrics as well?

For settings, I was thinking of a schema like this:

name: <name_of_entity>
fields:
- name: <field_name>
  value_type: <data_type>
  default: <default_value>
  description: |
    <description>
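Most of that schema could be pulled straight from the code. As a sketch (the `Config` struct and helper are invented for illustration; the Collector does use `mapstructure` tags in this way, but this is not mdatagen code), reflection over struct tags yields the user-facing field names:

```go
package main

import (
	"fmt"
	"reflect"
)

// Config is a stand-in component configuration; the mapstructure
// tags follow the Collector convention, but the struct is invented here.
type Config struct {
	Endpoint string `mapstructure:"endpoint"`
	Timeout  int    `mapstructure:"timeout"`
}

// fieldNames extracts the user-facing setting names from struct tags,
// the same information a schema generator would pull from the code.
func fieldNames(cfg any) []string {
	t := reflect.TypeOf(cfg)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if tag := t.Field(i).Tag.Get("mapstructure"); tag != "" {
			names = append(names, tag)
		}
	}
	return names
}

func main() {
	fmt.Println(fieldNames(Config{})) // prints [endpoint timeout]
}
```

Defaults could come the same way from each factory's default-config constructor; descriptions would still need doc comments or a sidecar file.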

For metrics, thinking also of APM instrumentations, it'd be something like:

name: <name_of_component_or_instrumentation>

metrics:
  <name_of_metric>:
    status: <default|custom|arbitrary_vendor_value>
    enabled: <true|false>
    type: <sum|gauge|counter|histogram|others>
      value_type: <int|string|...>
      monotonic: <true|false>
      aggregation: <cumulative|...>
    unit: <unit_of_measurement>
    description: <metric_description>
    attributes: [list_of_attributes]

dimensions:
  <name_of_dimension>:
    description: <description>
    properties:

resource_attributes:
  <name_resource_attribute>:
    description: <description>
    enabled: <true|false>
    value_type: <data_type>

attributes:
  <name_attribute>:
    description: <description>
    value_type: <data_type>
    enum: [possible_values_list]

@mx-psi
Member

mx-psi commented Jul 13, 2023

It's unclear to me how to model the value_type. In principle, the value type may be any Go type at all with appropriate marshal and unmarshal functions. For example, how would we model the following?

It's also unclear to me how we would model Validate and Unmarshal while allowing for code generation. Do we generate a struct that is embedded in the actual configuration?
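One possible answer, sketched here under the assumption that the generator emits only plain data fields (all names hypothetical, not actual mdatagen output): embed the generated struct in a hand-written Config that keeps Validate as ordinary Go code, outside the generator's scope.

```go
package main

import (
	"errors"
	"fmt"
)

// generatedConfig is what a schema-driven generator might emit:
// plain data fields only. (Hypothetical shape for illustration.)
type generatedConfig struct {
	Endpoint string `mapstructure:"endpoint"`
}

// Config embeds the generated part and keeps hand-written behavior,
// so Validate stays ordinary Go code the generator never touches.
type Config struct {
	generatedConfig `mapstructure:",squash"`
}

func (c *Config) Validate() error {
	if c.Endpoint == "" {
		return errors.New("endpoint must be set")
	}
	return nil
}

func main() {
	c := &Config{}
	fmt.Println(c.Validate()) // prints: endpoint must be set
}
```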

@kevinslin
Contributor

I recently used mdatagen to create JSON Schemas of all the OTel components in order to add IntelliSense support for OTel configs in VS Code.

One limitation of the currently generated metadata is that it does not capture whether certain fields are optional or required. It is also unable to handle custom validation logic.

On the validation front: because ConfigValidator can contain arbitrary Go code, it's infeasible to convert it to a more declarative representation like jsonschema.

Apologies if this has been brought up in the past, but I'm curious whether there has been discussion about flipping the order: authoring configuration in something like jsonschema and generating Go code from it. JSON Schema seems to cover most of the component configurations I've seen in the wild, including the validations. If components really need arbitrary validation, that can still be kept as an escape hatch and documented in the jsonschema.

@theletterf
Member Author

Trying to galvanize this one a bit more. @chalin, what do you think would be required to steer this initiative forward? Is there anything the docs SIG could do here?

@kevinslin
Contributor

I've done this sort of work in past projects (jsonschema -> code). If we want to move forward with this, I'm happy to step in.

@theletterf
Member Author

@kevinslin That sounds fantastic. What do you suggest?

@kevinslin
Contributor

kevinslin commented Aug 29, 2023

High level flow:

  • create jsonschemas for all existing components (using mdatagen and some manual work)
  • create a spec and CLI tool for converting jsonschema to go config

At this point, minus custom validation logic, the jsonschema should generate the same config code that is manually written today.

  • version both the jsonschema and the generated structs
  • update documentation to go over the new authoring process

In terms of arbitrary validation rules:

  • for simple validation rules, convert them to jsonschema (e.g. check that a field exists, or a simple comparison)
  • for complex validation that cannot be encoded in jsonschema, add an extra boolean field (e.g. dynamicValidation: true) to indicate this
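A hypothetical schema fragment illustrating both bullets: the simple checks (required, minLength, minimum) are expressed declaratively, while the dynamicValidation flag marks logic that stays in Go. The component and field names are invented for illustration.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "examplereceiver",
  "type": "object",
  "properties": {
    "endpoint": { "type": "string", "minLength": 1 },
    "timeout": { "type": "integer", "minimum": 0 }
  },
  "required": ["endpoint"],
  "dynamicValidation": true
}
```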

Ideally, we'd keep the exact same interfaces as exist today. Just generated via jsonschema instead of manually written.

Recently discovered that the SDK team is starting to do similar work for generating SDK configuration from jsonschema.
see open-telemetry/opentelemetry-go-contrib#4228 and open-telemetry/opentelemetry-java#5399

@mx-psi
Member

mx-psi commented Aug 30, 2023

@kevinslin it would be great if you could bring this up at one of the Collector SIG meetings (see here for the current times). Just add it to the agenda of a meeting you can attend, and we can help you discuss the plan to ensure we are all aligned.

I am not familiar enough with jsonschema to answer this, but I would like to see the questions I asked in #24189 (comment) resolved before moving forward, to avoid being blocked mid-way.

@codeboten
Contributor

I think there was some effort somewhat along these lines in #13384

@djaglowski
Member

It's also unclear to me how do we model Validate and Unmarshal while allowing for code generation. Do we generate a struct that is embedded in the actual configuration?

As a data point, it appears we have roughly 20 custom unmarshal functions.

@dashpole
Contributor

One other note: Some components have configuration structs that live in another repository. E.g.:

import (
	"fmt"

	"github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// Config defines configuration for Google Cloud exporter.
type Config struct {
	collector.Config `mapstructure:",squash"`
	// ...
}

@bryan-aguilar
Contributor

(Quoting @kevinslin's high-level plan above.)

I was thinking about this a bit more after the Collector SIG this morning. By chance I was listening to a podcast that ended with some talking points on markup languages, and it made me wonder.

Do we really care which markup language is used to generate the config struct and thus the documentation? Do we have to pick only one, whether it be jsonschema or yaml (plz no)? Other markup languages exist, toml or cue for example, that could also work.

Do we want to take a higher-level approach to this problem and instead make the source of truth configurable? The source of truth should be owned by the codeowners. As long as source-of-truth format -> code is possible, do we need to standardize? Or can we choose a solution that looks something like the confmap provider package and allows us to translate from one format to another?

For example, component owner A wants to use jsonschema and component owner B wants to use cue. It's the format they and their team are most familiar with. Should we cater to that scenario?

Ignoring my big comment that just asks a lot of questions and doesn't propose much other than a scope explosion for a small problem... I do like jsonschema and think it can be quite powerful.
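For what it's worth, the "configurable source of truth" idea could look like a small interface over a common intermediate model, in the spirit of the confmap providers. Everything here (SchemaSource, jsonSchemaSource) is a hypothetical sketch, not an existing package:

```go
package main

import "fmt"

// SchemaSource is a hypothetical confmap-provider-style abstraction:
// each codeowner picks a format, and any source that can produce the
// common intermediate model can feed the generator.
type SchemaSource interface {
	Load(path string) (map[string]any, error)
}

type jsonSchemaSource struct{}

func (jsonSchemaSource) Load(path string) (map[string]any, error) {
	// A real implementation would read and parse the file.
	return map[string]any{"format": "jsonschema", "path": path}, nil
}

// describe shows the generator consuming any source uniformly.
func describe(s SchemaSource, path string) string {
	m, _ := s.Load(path)
	return fmt.Sprintf("%s via %s", m["path"], m["format"])
}

func main() {
	fmt.Println(describe(jsonSchemaSource{}, "config.schema.json"))
}
```

A cue or toml source would implement the same interface, so the documentation pipeline would not care which format a component's owners chose.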

@kevinslin
Contributor

Created an initial proposal for authoring component configuration using a declarative schema. Would love to get any feedback, either in this issue or on the doc 🙏

cc: @bryan-aguilar @dashpole @djaglowski @codeboten @mx-psi @theletterf

@theletterf
Member Author

@kevinslin I love it. Just curious: how hard would it be to extend this model to trace instrumentation and other projects?

@mx-psi
Member

mx-psi commented Sep 5, 2023

@kevinslin I love it. Just curious: how hard would it be to extend this model to trace instrumentation and other projects?

@theletterf See https://github.com/open-telemetry/opentelemetry-configuration for that :)

@theletterf
Member Author

@mx-psi Looks great, but how would Kevin's proposal intersect with the above? Would it build upon it? Would it then be the responsibility of each project's maintainers to adopt those conventions and generate the files?

@mx-psi
Member

mx-psi commented Sep 5, 2023

Looks great, but how would Kevin's proposal intersect with the above?

AIUI these are independent:

  • Kevin's proposal is about the Collector components and its configuration
  • The configuration WG is about how to configure trace/metrics/logs instrumentation

The only point of overlap when it comes to the Collector is configuring the Collector's own telemetry. We are working on that in open-telemetry/opentelemetry-collector/issues/7532

@theletterf
Member Author

The stake for tech writers and documentarians collaborating with the OTel projects is having mostly similar mechanisms for producing and consuming reference documentation. Despite the differences in usage scenarios and stack, I think it'd be great to align as much as possible in the way the output is produced and presented; settings and metrics can be described in very similar ways, after all. Not sure if my concern is clear enough, though?

@mx-psi
Member

mx-psi commented Sep 6, 2023

The stake for tech writers and documentarians collaborating with the OTel projects is having mostly similar mechanisms for producing and consuming reference documentation.

This makes sense. @codeboten is involved both in the Configuration WG and in the Collector so we can use his input to ensure that we are aligned in that sense.

I think it'd be great to align as much as possible in the way the output is produced and presented; settings and metrics can be described in very similar ways, after all. Not sure if my concern is clear enough, though?

Not sure I understand this second part. Is this about what way to express things in jsonschema given several options to do so? Is this about having a similar configuration schema for configuration sections configuring the same thing?

@theletterf
Member Author

Not sure I understand this second part. Is this about what way to express things in jsonschema given several options to do so? Is this about having a similar configuration schema for configuration sections configuring the same thing?

More like the second. For example, settings: whether they are Collector component settings or instrumentation settings, they could share the same structure with a few non-overlapping extensions:

name: <name_of_entity>
fields:
- name: <field_name>
  value_type: <data_type>
  default: <default_value>
  description: |
    <description>

@kevinslin
Contributor

Created an initial PR that goes over some additional design decisions that have come up while doing the implementation > #27003

@github-actions (bot)

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 20, 2023
@github-actions (bot)

This issue has been closed as inactive because it has been stale for 120 days with no activity.
