Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF8: Content negotiation (encoding side) #570

Merged
merged 1 commit into from Feb 6, 2024

Conversation

ywwg
Copy link
Contributor

@ywwg ywwg commented Jan 18, 2024

This PR builds on #537 by adding content negotation during scraping.

Scrapers can announce that they support utf-8 names by adding "escaping=allow-utf-8" in the Content-Type header. In cases where utf8 is not available, metric providers can be configured to escape names in a few different ways: values (U__ utf value escaping for perfect round-tripping), underscores (all invalid chars become _), dots (dots become _dot_, _ becomes __, all other values become __). Escaping can either be a global default (model.NameEscapingScheme) or can also be specific in Content-Type with the "escaping=" term, which can be "allow-utf-8" (for UTF8-compatible), "underscores", "dots", or "values".

This is still a noop for existing configurations because scrapers will not be passing the escaping key in the Content-Type header. Existing functionality is maintained.

@ywwg ywwg force-pushed the owilliams/quoted-metric-name-02 branch from eef9af4 to f4d7f87 Compare January 23, 2024 15:59
@ywwg ywwg marked this pull request as ready for review January 23, 2024 16:34
@ywwg
Copy link
Contributor Author

ywwg commented Jan 23, 2024

This will be squashed :)

expfmt/encode.go Outdated
}
}
return FmtText
return FmtText_0_0_4 + Format(fmt.Sprintf("; escaping=%s", model.EscapeValues))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this specifies a default escaping scheme when none other is specified. not sure if this is the right thing to do

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General direction looks OK. I will only manage to do a code level review next week.

Some broad thoughts:

First the versioning. See the individual comment below. It feels weird to pretend to support a version that doesn't exist yet. (But as said, I'm not an expert and I don't know how this is usually done.)

Another gut feeling: validchars and escaping never show up together. Maybe they can be merged into one option? escaping=none could take the role of validchars=utf8, and escaping=values is the default. (Maybe that default should be part of the (upcoming) standard and hardwired in the code. It feels weird that the default is set via a package variable and that would change the behavior for the same accept header.)

expfmt/expfmt.go Outdated
OpenMetricsType = `application/openmetrics-text`
OpenMetricsVersion_0_0_1 = "0.0.1"
OpenMetricsVersion_2_0_0 = "2.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for the classic text format, anything goes because it hasn't really been formally specified, but here I would definitively add a comment that this doesn't imply that we are really implementing OM v2 from the future.

On a more general note, I'm wondering if pretending we are supporting a version of the standard that doesn't exist yet is the right way to go. I'm not an expert in those content-type shenanigans. Maybe we should add some "alpha" or "pre" in here?

expfmt/expfmt.go Outdated Show resolved Hide resolved
@beorn7
Copy link
Member

beorn7 commented Jan 25, 2024

Another thought: Maybe we don't need to change version numbers at all right now. Any escaping that is not none will produce valid classic text format or OMv1. So we could just document the escaping=none (currently validchars=utf8, see comment above) as an experimental feature that creates a slightly different format, that will be part of a future standard. We can later decide if we want to keep supporting the "illegal" combination of OMv1 and escaping=none, but that's still better than asking clients to negotiate OMv2 although OMv2 will certainly look much differently once it really exists.

@ywwg
Copy link
Contributor Author

ywwg commented Jan 26, 2024

I like the idea of using escaping to also select for utf8 mode. maybe instead of "none" we call it escaping=utf8-ok or something like that, to make it more explicit?

@ywwg
Copy link
Contributor Author

ywwg commented Jan 26, 2024

can't remember if dashes are allowed in content-type parameters though

@beorn7
Copy link
Member

beorn7 commented Feb 1, 2024

Yes, let's do that. I.e. only one parameter escaping that defines the behavior (missing = old behavior, any escaping schema = do the escaping, allow-utf-8 or something = use the new format).

I would also say, for now, that we don't change the version numbers but "tolerate" that the client can ask for allow-utf-8 and we pragmatically create an exposition format that is technically invalid. Just add a comment in the code that says that this is just for experimentation and that there will be new versions of the expositions formats eventually that properly describe the new syntax and should be selected whenever UTF-8 names are requested.

And yes, dashes are allowed in the content type, we use it for charset=utf-8 and encoding=compact-text.

The former reminds me that we should try to be consistent and spell UTF-8 as UTF-8 where possible.

@beorn7
Copy link
Member

beorn7 commented Feb 1, 2024

The Go test errors seem to have to do with Go modules. Probably need to run go mod tidy or something.

Could you fix that and work in the ideas above? I think the code otherwise is technically fine. I'll do a final review once that's done.

@ywwg
Copy link
Contributor Author

ywwg commented Feb 5, 2024

notes should all be addressed

@ywwg
Copy link
Contributor Author

ywwg commented Feb 5, 2024

oh except for adding comments

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I assume you will squash the commits yourself and then take care of DCO signing.

expfmt/encode.go Outdated Show resolved Hide resolved
model/metric.go Outdated Show resolved Hide resolved
Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg ywwg force-pushed the owilliams/quoted-metric-name-02 branch from 78b2c9a to 9b1ede5 Compare February 6, 2024 16:10
@ywwg
Copy link
Contributor Author

ywwg commented Feb 6, 2024

squashed

@ywwg ywwg force-pushed the owilliams/quoted-metric-name-02 branch from 9b1ede5 to 319c62c Compare February 6, 2024 16:11
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much.

@beorn7 beorn7 merged commit 773d566 into prometheus:main Feb 6, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants