Limit HTTP request method cardinality: original_method (option A) #17

lmolkova · 2023-05-12T18:15:31Z

Moving in open-telemetry/opentelemetry-specification#3478.

Changes

restricts http.request.method values to well-known names
requires to use other for unknown methods to prevent cardinality explosion if a malicious (or buggy) client sends custom and dynamic methods.
allows recording actual method value in the http.request.original_method

There was a discussion on the original PR and at HTTP SIG meeting, the summary and options are in the comments.

lmolkova · 2023-05-12T18:16:15Z

Option A (currently implemented in this PR)

http.request.method represents the canonical method, values are restricted on all signals.

Example:

Invalid method:
- http.request.method = other
- http.request.original_method = XYZZ (only if different from method)
Valid method:
- http.request.method = GET

Cons:

breaking for existing ECS users who populate it on logs.
it's necessary to check two fields to find the method: http.request.original_method first and if missing, check http.request.method

Possible solution: Different requirement levels for different signals

Logs: http.request.original_method could be required
Traces http.request.original_method could be conditionally required only if different from http.request.method

I.e. example of a valid request would be

http.request.method = GET
http.request.original_method is empty on traces and is GET on logs

lmolkova · 2023-05-12T18:16:32Z

Option B: `http.request.method` represents original method, `http.request.canonical_method` captures canonical one

PR: #34

Invalid method:

http.request.method = XYZZ
http.request.canonical_method = other

Valid method:

http.request.method = GET
http.request.canonical_method = GET

Cons: duplication (otherwise http.request.method would capture invalid methods only).

lmolkova · 2023-05-12T18:16:59Z

Option C: Single attribute `http.request.method`, but special rules for span names and metrics

PR: #35

Invalid method:

Traces:
- http.request.method = XYZZZ
- span name: HTTP other
Metrics: http.request.method = other
Logs: http.request.method = XYZZZ

Valid method:

Traces:
- http.request.method = GET
- span name: HTTP GET
Metrics: http.request.method = GET
Logs: http.request.method = GET

Cons:

Slightly different attribute semantics for metrics
Harder to do traces/logs-to-metrics aggregation

ruflin · 2023-05-15T10:52:28Z

Different requirement levels for different signals

I picked this from A but was thinking of option C when I first read this. I agree, different signals have potentially different requirements but still the same field should be used. For a trace, only valid fields should be accepted / are part of the standard. For logs the same fields is more lenient but it is still encouraged, to be strict.

With the above, we could get the best of both worlds. There is a way to express in the schema requirements for different signals but still optimise for bringing all the different signals together. It is then up to the storage engine / query engine of the different implementations to see how they deal with the mixed data if signals should be brought together.

The part I'm most worried about A is:

it's necessary to check two fields to find the method: http.request.original_method first and if missing, check http.request.method

I expect if we go with A, we will apply this solution in many different places. I'm not against having an additional field to store the original or canonical one if needed, but there must be a field where everything can be queried together.

HTTP SIG meeting,

@lmolkova I unfortunately couldn't join, so hopefully the above does not just repeat what was already discussed, otherwise please ignore.

lmolkova · 2023-05-15T20:54:53Z

Discussed during Semconv SIG meeting:

we should preserve attribute semantics consistency across signals (eliminating option C)
we can make orinigal_method required on HTTP logs semconv (or perhaps security events semconv?) once they are added

based on this, I updated the PR to make original_method conditionally required for traces. Once we have a log convention that should reference it, we can move the attribute to a common yaml.

@ruflin do you think the following approach could work:

a specific semconv for HTTP access/security log would define a set of required attributes
it can require http.request.original_method to always be present

I understand it's still a breaking change for ECS users since the attribute is being renamed. It seems the alternatives listed above would not work in the long term once we take other signals, cardinality, and minimal duplication into consideration.

ruflin · 2023-05-16T08:58:06Z

I'm thinking of the sem-conv like an onion. In the center, we have the shared semconv across all signals. It is the most lenient version of it. Each layer of the onion can become more strict (but not more lenient). What this means, http.request.method would be specified as SHOULD use GET etc. and not MUST. Then traces or metrics could use MUST. I tried to create a quick (ugly) diagram for this:

For me this adheres to the base principle mentioned above: "we should preserve attribute semantics consistency across signals".

The main lines in the PR we are discussing are the following. The discussion is if it should be MUST or SHOULD.

If the HTTP request method is not known to the instrumentation, it MUST set the `http.request.method` attribute to `other` and SHOULD populate the exact method passed by client on `http.request.original_method` attribute.

HTTP method names are case-sensitive and `http.request.method` attribute value MUST match a standard (or documented elsewhere) HTTP method name exactly.

If there is agreement in the group to proceed with the proposal, I don't want to block it. One of the reason is, that from my perspective the schema can become more lenient without a breaking change but not the other way around. So my proposed change could still be made later on.

jsuereth · 2023-05-16T11:56:40Z

@ruflin I think there's a few points maybe not getting across from the semconv WG here.

There are two MAIN uses cases in OTEL that need to be addressed here, and are different than ECS.

X-Signal joins. If signals have different meaning/values for the SAME attributes between logs/metrics/traces it makes cross signal queries and dashboarding much harder. This is one of the big "wins" for using OTEL over individual instrumentation is that we offer signals that can be joined and understood in unison.
Instrumentation. When you go to produce metrics,traces,logs it can be MUCH easier to write a common set of signals to Context in OpenTelemetry and the pull those attributes as you generate each signal. For this, attributes MUST have the same value wherever they're used.

As such, in terms of design space, we CAN play with the following:

Some signals have more attributes than others. E .g. metrics require low cardinality, so we expect their attribute set to always be a subset of similar logs/events/spans.
We CAN specify having multiple attributes with nuanced meaning, e.g. http.method vs. http.raw_method

But the "layer of the onion" with strictness analogy doesn't really work here. I like the layer of the onion, but think about it in terms of common key-value that can be shared across signals, with each signal being ALLOWED to have more attributes if needed, and each "layer" of your o11y pipeline being able to add more attributes (e.g. otel collector enhancing telemetry with extra information).

Given the need for metric cardinality concerns, I think http.method needs to be the core in the onion with low-cardinality restrictions (what @lmolkova has in this PR). We can then add additional attributes to Logs (where higher cardinality is acceptable) without breaking metric<->log correlation.

joaopgrassi · 2023-05-16T11:57:14Z

I read through the original PR discussions and will try to summarize what I understood

http.request.method can receive non well-known values
Which leads to cardinality problems on metrics BUT also to spans, since we recommend using that for the span name
If I do aggregation on my span names, then this will also break.

I like the direction of option A here, except, and I quote/agree with @ruflin's comment on the original PR:

having Get under OTHER sounds also confusing to me.

I would be OK with XYZ becoming other but Get I'm not sure. Putting myself in a OTel user shoes, I would rather expect the library/backend to figure this out for me and "map" Get to GET or something standard.

Then I'd like to maybe bring again the suggestion from @Oberon00 as I think it has good value and it was something that also came to mind while I was reading all. With this, we essentially introduce a new attribute and keep recording http.request.method as-is today:

Instead of adding original_method with the unaltered method, I'd suggest adding method_kind with the normalized method and keeping method aligned with ECS and old OTel. Then we only need an adjustment to the span name wording to be generated like method_kind, and probably requirement levels for metrics. This would also align with the proposed http.status_code_class from

Option D

http.request.method represents the original "as-is". We introduce a new attribute http.request.method_canonical that represents the normalized/standard/canonical and basically has the value from https://www.rfc-editor.org/rfc/rfc9110#name-method-definitions

Example:

Invalid method:
- http.request.method = XYZZ
- http.request.method_canonical = INVALID
Non-normalized:
http.request.method = Get
http.request.method_canonical = GET
Valid method:
- http.request.method = GET
- http.request.method_canonical = GET

Pros

Metric semconv is experimental, so we can still change it to use http.request.method_canonical instead of
We address the cardinality problem also present in span names with http.request.method today
We don't diverge from ECS, if I got it right, no problem with ecs users and logs with this approach

Cons:

In the case of Valid method we have a duplication
We will have to change the recommendation of how we populate Span name. It might be "non-breaking" for users, if libraries were already reporting the "canonical" values but maybe not

For the name of the "canonical" attribute, chatGPT suggested

http.request.method.normalized
http.request.method.standard
http.request.method.canonical
http.request.method.normalized_value
http.request.method.normalized_method

ruflin · 2023-05-16T12:18:32Z

There are two MAIN uses cases in OTEL that need to be addressed here, and are different than ECS.

I think Otel and ECS are perfectly aligned here. ECS was built to exactly allow these cross signal queries. It is more a question on how we approach it. @jsuereth Please let me know if I miss here some fundamental differences.

For Elasticsearch the lenient approach works really well, because even in the data is mixed on ingest time (get, GET, GeT) I can still do strict queries. I understand, other storage engines might not support this that is why there is a preference to be strict in the core and instead deal with it by adding additional attributes. My concern with the attributes addition is that assuming we will do this for quite a few fields, the number of fields in the standard will explode quickly.

If the "addition" of attributes is the path forward, I like Option D proposed by @joaopgrassi

@lmolkova @jsuereth Sorry for the extra rounds on this one. But I think if we settle this it will become easy to apply the same logic to all the future changes by just referring to the discussion here.

Oberon00 · 2023-05-16T12:53:55Z

@joaopgrassi

I would be OK with XYZ becoming other but Get I'm not sure. Putting myself in a OTel user shoes, I would rather expect the library/backend to figure this out for me and "map" Get to GET or something standard.

I think somebody else has mentioned this already but it depends on the particular HTTP implementation if Get is treated like GET or like an unknown method. In the latter case, it would be extremely confusing if instrumentations mapped it to GET. However, if the instrumentation has the knowledge whether the instrumented server maps it itself, or ideally even has access to a pre-normalized method, then it could do some sort of "smart" mapping.

epixa · 2023-05-16T14:53:27Z

I'm strongly in favor of Option D. This addresses the concerns I raised in the previous PR about unexpected behavior for security users, but most critically I think it's also the ideal user experience when consuming this data. I hate the idea of users having to make decisions about which fields they query based on the value of other fields, so I like that if a user wants to query the original value then they use the base field, and if they want to accept some context loss in order to benefit from low cardinality, they use that field. In both cases, they just use the one field that makes sense and it just works.

This does mean that we're duplicating data, but I think that should be acceptable if it's truly in the best interests of the overall user experience of consuming the data. Storage mechanisms like Elasticsearch could implement solutions to ensuring duplicate data like this does not incur twice the storage requirements overall - that many don't should be a factor to consider, but it shouldn't be the deciding factor IMO.

lmolkova · 2023-05-16T16:44:35Z

As a comparison, I have implemented option B (which seems to be option D from @joaopgrassi proposal, but canonical method stays case-sensitive) in #34. PTAL and provide feedback or approve if you're onboard.

reyang · 2023-05-16T16:47:26Z

Option E: not doing anything special - if we get a weird HTTP method "XYZ", just keep it.

And here are the reasons:

Doing nothing special is simple.
For logs/traces, keeping the original value would help the user to understand what's going on, if they see something like "OTHER" / "other" / "INVALID" - it'll be hard to dig deeper unless the information is stored somewhere else (e.g. another key-value pair).
For metrics, cardinality problem would exist for any metrics and dimensions, it'll be better to have a generic solution instead of special solutions for each type of data or dimensions (e.g. SQL "INSERT" / "UPDATE" / "DELETE"). I don't think semantic conventions should mandate the cardinality (we can encourage best practices).

lmolkova · 2023-05-16T17:05:55Z

Option E: not doing anything special - if we get a weird HTTP method "XYZ", just keep it.

@reyang it's probably fine for clients, but makes it easy to create some form of ddos attack on metrics when server does not sanitize it. Anyone can send unauthorized requests with random methods. Presumably, there are just a few points for an invalid method and they're not really interesting to anyone (i.e. do not skew any other data), how bad would it be to have a lot of (but quite short) time series?

reyang · 2023-05-16T17:17:32Z

@reyang it's probably fine for clients, but makes it easy to create some form of ddos attack on metrics when server does not sanitize it. Anyone can send unauthorized requests with random methods. Presumably, there are just a few points for an invalid method and they're not really interesting to anyone (i.e. do not skew any other data), how bad would it be to have a lot of (but quite short) time series?

Like I explained in #17 (comment), metrics cardinality is a general problem - nothing specific to HTTP methods. I think we should tackle this issue in the metrics API/SDK space so it can lift all boats (e.g. #2960 is a great example / starting point).

lmolkova · 2023-05-16T18:11:16Z

For logs/traces, keeping the original value would help the user to understand what's going on, if they see something like "OTHER" / "other" / "INVALID" - it'll be hard to dig deeper unless the information is stored somewhere else (e.g. another key-value pair).

For metrics, cardinality problem would exist for any metrics and dimensions, it'll be better to have a generic solution instead of special solutions for each type of data or dimensions (e.g. SQL "INSERT" / "UPDATE" / "DELETE"). I don't think semantic conventions should mandate the cardinality (we can encourage best practices).

@reyang essentially what you're saying is Option C, but generalized beyond this specific attribute with solution postponed till after HTTP semconv stability? correct?
The proposal might look like "Metric instrumentations SHOULD populate canonical method and populate unknown methods as 'other'" Correct?

Without it, cardinality limits would protect from DDOS attacks, but can make time series that got through less useful.

reyang · 2023-05-16T18:26:27Z

@reyang essentially what you're saying is Option C, but generalized beyond this specific attribute with solution postponed till after HTTP semconv stability? correct?

Yes. I think this is how semantic conventions should be positioned:

HTTP method has a preferred list of values - e.g. "GET", "POST", "PUT", etc.
The typical cardinality is around 10.
The semconv should not say something like "values not listed here MUST be converted to ..." or "values not listed here MUST be dropped" or anything strong (with "MUST" wording).
In the future we might say "instrumentations should provide hint/advice about the preferred list of values".

The API/SDK spec should be positioned to solve the cardinality/reliability problem:

The API should provide a way (e.g. hint/advice) to indicate the cardinality of certain attributes, and what are the preferred values vs. noise.
The SDK should provide mechanisms to control how to deal with noise and cardinality pressure (e.g. dim-capping).
The SDK should provide a reasonable default behavior.

lmolkova · 2023-05-16T18:58:41Z

@reyang essentially what you're saying is Option C, but generalized beyond this specific attribute with solution postponed till after HTTP semconv stability? correct?

Yes. I think this is how semantic conventions should be positioned:

HTTP method has a preferred list of values - e.g. "GET", "POST", "PUT", etc.

The typical cardinality is around 10.

The semconv should not say something like "values not listed here MUST be converted to ..." or "values not listed here MUST be dropped" or anything strong (with "MUST" wording).

In the future we might say "instrumentations should provide hint/advice about the preferred list of values".

The API/SDK spec should be positioned to solve the cardinality/reliability problem:

The API should provide a way (e.g. hint/advice) to indicate the cardinality of certain attributes, and what are the preferred values vs. noise.

The SDK should provide mechanisms to control how to deal with noise and cardinality pressure (e.g. dim-capping).

THe SDK should provide a reasonable default behavior.

There was a general push-back to have different semantics for different signals. Dim-capping on metrics can be considered as such inconsistency. cc @jsuereth @arminru

Still, I drafted it here: open-telemetry/opentelemetry-specification#35 with the caveat that I don't believe instrumentation should provide hints on HTTP methods since we have a standard and common 10 methods that are used in the absolute majority of requests.

the drawback of this change is that span name is not quite as specific as it was before, but I don't see a huge harm even if the span name is of high cardinality for invalid requests. group-by, top-N queries should still work great

semantic_conventions/http-common.yaml

joaopgrassi

I guess we need a changelog for this now :)

trask · 2023-06-19T19:49:22Z

@Oberon00 do you have any remaining concerns that you'd like to see addressed before merging?

semantic_conventions/http-common.yaml

Oberon00 · 2023-06-20T07:13:48Z

I guess we are aware that this will basically pre-determine the general "_original" naming convention and the "_OTHER" value that open-telemetry/opentelemetry-specification#117 asks for, as soon as HTTP is marked stable + a release is cut. I think that's fine from a priority perspective.

…ditionally required

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

Co-authored-by: Christian Neumüller <christian+github@neumueller.me>

lmolkova requested review from a team as code owners May 12, 2023 18:15

github-actions bot assigned reyang May 12, 2023

lmolkova mentioned this pull request May 12, 2023

BREAKING: Limit HTTP method values to closed set (for cardinality reason) open-telemetry/opentelemetry-specification#3478

Closed

lmolkova force-pushed the http-method-cardinality branch from 74f8313 to c750799 Compare May 15, 2023 20:32

lmolkova requested a review from a team as a code owner May 15, 2023 20:32

mateuszrzeszutek approved these changes May 16, 2023

View reviewed changes

carlosalberto approved these changes May 16, 2023

View reviewed changes

lmolkova mentioned this pull request May 16, 2023

Limit HTTP request method cardinality: canonical method (Option B/D) #34

Closed

lmolkova changed the title ~~Limit HTTP request method cardinality~~ Limit HTTP request method cardinality: original_method (option A) May 16, 2023

lmolkova mentioned this pull request May 16, 2023

Limit HTTP request method cardinality: use one attribute (option C/E) #35

Closed

jsuereth mentioned this pull request May 22, 2023

Publish a first release #37

Closed

trask reviewed Jun 15, 2023

View reviewed changes

semantic_conventions/http-common.yaml Outdated Show resolved Hide resolved

ruflin mentioned this pull request Jun 15, 2023

What capitalization of field names to use for trace context in non-OTLP Log Formats open-telemetry/opentelemetry-specification#3518

Closed

lmolkova force-pushed the http-method-cardinality branch from 69adeaf to c5ddb40 Compare June 15, 2023 20:26

lmolkova mentioned this pull request Jun 15, 2023

Document general attribute normalization conventions #117

Open

joaopgrassi approved these changes Jun 16, 2023

View reviewed changes

lmolkova force-pushed the http-method-cardinality branch from ce7fa6b to 0c77dee Compare June 16, 2023 16:36

reyang mentioned this pull request Jun 19, 2023

What to do with non well-known http.method values? open-telemetry/opentelemetry-specification#3470

Closed

Oberon00 reviewed Jun 20, 2023

View reviewed changes

semantic_conventions/http-common.yaml Outdated Show resolved Hide resolved

semantic_conventions/http-common.yaml Outdated Show resolved Hide resolved

lmolkova and others added 12 commits June 20, 2023 08:55

Limit HTTP request method cardinality

f73fe33

lint

4c5dfe8

move http.request.original_method to trace attributes and make it con…

83646aa

…ditionally required

Update specification/trace/semantic_conventions/http.md

06952bb

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>

clean up and add env var

773d1fc

lint

60c2096

addresses part of the feedback

90865e9

other to _OTHER?

1350010

changelog

18dd006

Update semantic_conventions/http-common.yaml

58ca441

Co-authored-by: Christian Neumüller <christian+github@neumueller.me>

Update semantic_conventions/http-common.yaml

e2c6724

Co-authored-by: Christian Neumüller <christian+github@neumueller.me>

generate md

35a44be

lmolkova force-pushed the http-method-cardinality branch from 68a7d63 to 35a44be Compare June 20, 2023 15:57

Oberon00 approved these changes Jun 20, 2023

View reviewed changes

reyang merged commit be0fab1 into open-telemetry:main Jun 20, 2023
9 checks passed

mateuszrzeszutek mentioned this pull request Jun 21, 2023

Support the http.request.method_original attribute open-telemetry/opentelemetry-java-instrumentation#8779

Merged

lmolkova mentioned this pull request Oct 11, 2023

Solving http.request.method cardinality (and similar) with metrics hints API #68

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit HTTP request method cardinality: original_method (option A) #17

Limit HTTP request method cardinality: original_method (option A) #17

lmolkova commented May 12, 2023

lmolkova commented May 12, 2023 •

edited

lmolkova commented May 12, 2023 •

edited

lmolkova commented May 12, 2023 •

edited

ruflin commented May 15, 2023

lmolkova commented May 15, 2023

ruflin commented May 16, 2023

jsuereth commented May 16, 2023 •

edited

joaopgrassi commented May 16, 2023

ruflin commented May 16, 2023

Oberon00 commented May 16, 2023 •

edited

epixa commented May 16, 2023 •

edited

lmolkova commented May 16, 2023

reyang commented May 16, 2023 •

edited

lmolkova commented May 16, 2023 •

edited

reyang commented May 16, 2023 •

edited

lmolkova commented May 16, 2023

reyang commented May 16, 2023 •

edited

lmolkova commented May 16, 2023

joaopgrassi left a comment

trask commented Jun 19, 2023

Oberon00 commented Jun 20, 2023

Limit HTTP request method cardinality: original_method (option A) #17

Limit HTTP request method cardinality: original_method (option A) #17

Conversation

lmolkova commented May 12, 2023

Changes

lmolkova commented May 12, 2023 • edited

Option A (currently implemented in this PR)

lmolkova commented May 12, 2023 • edited

Option B: http.request.method represents original method, http.request.canonical_method captures canonical one

lmolkova commented May 12, 2023 • edited

Option C: Single attribute http.request.method, but special rules for span names and metrics

ruflin commented May 15, 2023

lmolkova commented May 15, 2023

ruflin commented May 16, 2023

jsuereth commented May 16, 2023 • edited

joaopgrassi commented May 16, 2023

Option D

ruflin commented May 16, 2023

Oberon00 commented May 16, 2023 • edited

epixa commented May 16, 2023 • edited

lmolkova commented May 16, 2023

reyang commented May 16, 2023 • edited

lmolkova commented May 16, 2023 • edited

reyang commented May 16, 2023 • edited

lmolkova commented May 16, 2023

reyang commented May 16, 2023 • edited

lmolkova commented May 16, 2023

joaopgrassi left a comment

Choose a reason for hiding this comment

trask commented Jun 19, 2023

Oberon00 commented Jun 20, 2023

lmolkova commented May 12, 2023 •

edited

lmolkova commented May 12, 2023 •

edited

Option B: `http.request.method` represents original method, `http.request.canonical_method` captures canonical one

lmolkova commented May 12, 2023 •

edited

Option C: Single attribute `http.request.method`, but special rules for span names and metrics

jsuereth commented May 16, 2023 •

edited

Oberon00 commented May 16, 2023 •

edited

epixa commented May 16, 2023 •

edited

reyang commented May 16, 2023 •

edited

lmolkova commented May 16, 2023 •

edited

reyang commented May 16, 2023 •

edited

reyang commented May 16, 2023 •

edited