Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added remaining ECS fields to the useragent #452

Merged
merged 15 commits into from
Jan 24, 2024

Conversation

trisch-me
Copy link
Contributor

@trisch-me trisch-me commented Oct 26, 2023

Changes

This PR adds remaining 3 fields from ECS to the user-agent namespace

Merge requirement checklist

@trisch-me
Copy link
Contributor Author

@jsuereth to answer your questions from my previous PR about possible multiple names/versions.
I have read some docs about user-agent formatting and creating. It can present multiple information - about browser, OS, device etc, but there is only one browser name/version which we should put into fields. It's a bit complicated and there are specific libs to extract that data from user-agent but from what I have read there should be no multiple versions or names.

@trask
Copy link
Member

trask commented Oct 26, 2023

cc @open-telemetry/sandbox-web-js-maintainers

@martinkuba
Copy link
Contributor

There are some existing definitions that might be duplicated (or at least not clear which one should be used) -

  • There is the definition for device.model.name in the device resource namespace.
  • And there are also the attributes in the browser namespace. These are intended to work with the User-Agent Client Hints API, but contain the same information.

@trask
Copy link
Member

trask commented Nov 8, 2023

There are some existing definitions that might be duplicated (or at least not clear which one should be used) -

  • There is the definition for device.model.name in the device resource namespace.
  • And there are also the attributes in the browser namespace. These are intended to work with the User-Agent Client Hints API, but contain the same information.

based on this, it's not clear to me whether the new attributes in this PR are needed

@AlexanderWert
Copy link
Member

AlexanderWert commented Nov 8, 2023

There are some existing definitions that might be duplicated (or at least not clear which one should be used) -

  • There is the definition for device.model.name in the device resource namespace.
  • And there are also the attributes in the browser namespace. These are intended to work with the User-Agent Client Hints API, but contain the same information.

based on this, it's not clear to me whether the new attributes in this PR are needed

I think the important difference here is that both device.* (device.model.name) and also browser.* attributes are describing the resource on which the data is being collected. user_agent.* attributes are rather meant to describe the user agent properties (broken down / parsed into individual properties) on signals that happen outside that browser or device.

A great example is NGINX (or webserver in general) access logs: As a user you would like to group the access logs / requests by device name / version, etc.

Plus, we are adding attributes to the registry here and not proposing their usage in existing semantic conventions with that PR. But, rather enable logging use cases and migration of ECS-based use cases into OTel.

@trask
Copy link
Member

trask commented Nov 8, 2023

I think the important difference here is that both device.* (device.model.name) and also browser.* attributes are describing the resource on which the data is being collected. user_agent.* attributes are rather meant to describe the user agent properties (broken down / parsed into individual properties) on signals that happen outside that browser or device.

could the browser.* attributes be used also on spans/logs? (it seems like this could improve correlation with client-side telemetry)

@martinkuba
Copy link
Contributor

The NGINX use case is a great example, and I think these attributes make more sense there than re-using the browser namespace, so 👍 to adding these. I mostly wanted to clarify if there is still a use case for user-agent name/version, given that relying on the user-agent string in browsers is problematic and is being replaced with the new User-Agent Client Hints API.

could the browser.* attributes be used also on spans/logs? (it seems like this could improve correlation with client-side telemetry)

I suppose the browser attributes could be used as a source for the user-agent name/value attributes, which is better than parsing the user-agent string.

I am not sure about device name. That still seems like a duplicate to me.

@trisch-me
Copy link
Contributor Author

given that relying on the user-agent string in browsers is problematic and is being replaced with the new User-Agent Client Hints API.

I see that this is still a draft, which means it might change/take a long time to stabilise and even after release it might take some (long) time to update existing use cases to rely on it instead of existing user-agent string.

I suppose the browser attributes could be used as a source for the user-agent name/value attributes, which is better than parsing the user-agent string.

How we could use browser attributes as a source if we only have user-agent string in logs for example? Like in nginx case described before? Maybe I'm missing something there?

In addition these fields are extracted from user-agent string to make later search or analysing the data easier, because the string is parsed already and most important data is extracted. Also these fields are added into registry so people could use them if needed and that gives ECS use cases a green light. Please correct me if I'm missing something here

@martinkuba
Copy link
Contributor

How we could use browser attributes as a source if we only have user-agent string in logs for example? Like in nginx case described before? Maybe I'm missing something there?

This was in response to @trask's question of using browser attributes on spans/logs to correlate client telemetry. I am not sure if it makes sense to copy the browser attributes, but adding user_agent name/version would; and the values could come from either parsing the user_agent string or the browser resource attributes. This is probably a separate conversation.

All that to say, I am fine with adding these attributes 👍

@AlexanderWert
Copy link
Member

All that to say, I am fine with adding these attributes 👍

Thanks for the clarification, @martinkuba . In that case would you mind to approve the PR technically?

@martinkuba
Copy link
Contributor

Thanks for the clarification, @martinkuba . In that case would you mind to approve the PR technically?

I am fine with user_agent.name and user_agent.version, but I am still not clear whether the device name is duplicated.

@trisch-me trisch-me requested a review from a team as a code owner December 7, 2023 13:54
@trisch-me
Copy link
Contributor Author

@martinkuba I have moved it out of this PR not to block other fields to be added. I will create a dedicated PR for this field to discuss it separately

@jsuereth
Copy link
Contributor

@jsuereth to answer your questions from my #418 (comment) about possible multiple names/versions.
I have read some docs about user-agent formatting and creating. It can present multiple information - about browser, OS, device etc, but there is only one browser name/version which we should put into fields. It's a bit complicated and there are specific libs to extract that data from user-agent but from what I have read there should be no multiple versions or names.

Unfortunately, this isn't true for gRPC, see e.g. https://stackoverflow.com/questions/60527092/how-to-specify-a-user-agent-header-for-grpc-java-client

It's expected gRPC users will add their app version and gRPC will append its version.

@trisch-me
Copy link
Contributor Author

It's expected gRPC users will add their app version and gRPC will append its version.

I have checked what will be returned for provided user-agent in example from SO post YourApp/1.0.0 grpc-java-okhttp/1.27.2 - existing user-agent parsing libraries can't parse it, because they are looking for browser's data.
I have also a note about browser in brief section. Can we/should we restrict those name and version to only browser if available?

Or do you think we should expand user-agent to support all possible products inside user-agent? I think it might depend on implementation of particular product on how do they present it in user-agent.

@joaopgrassi
Copy link
Member

Or do you think we should expand user-agent to support all possible products inside user-agent? I think it might depend on implementation of particular product on how do they present it in user-agent.

@trisch-me given this comes from ECS, how was the gRPC "problem" handled there? I'm sure probably folks using ECS ran into that or?

@jsuereth
Copy link
Contributor

It's expected gRPC users will add their app version and gRPC will append its version.

I have checked what will be returned for provided user-agent in example from SO post YourApp/1.0.0 grpc-java-okhttp/1.27.2 - existing user-agent parsing libraries can't parse it, because they are looking for browser's data. I have also a note about browser in brief section. Can we/should we restrict those name and version to only browser if available?

Or do you think we should expand user-agent to support all possible products inside user-agent? I think it might depend on implementation of particular product on how do they present it in user-agent.

I actually think both are viable solutions here. At a minimum if you currently restrict what you have to just the browser, and we have an open ticket to expand further to gRPC-like use cases, then I'd be ok merging, and we can look at the broader problem later.

My ideal solution would be we define use agent name version to be "the outer most" or "most important" name/version found in the string. For YourApp/1.0.0 grpc-java-okhttp/1.27.2 I think this means YourApp and 1.0.0. However, it's ok to make progress and open a ticket to resolve this later if we tighten up the current wording.

@trisch-me
Copy link
Contributor Author

given this comes from ECS, how was the gRPC "problem" handled there? I'm sure probably folks using ECS ran into that or?

@joaopgrassi I have talked with other folks from Elastic, and we found no such usage in our codebase. However, this doesn't necessarily mean that our customers haven't used it in that way, they might have chosen, for example, the "most dominant" or "most important" out of possible options.

@trisch-me
Copy link
Contributor Author

@jsuereth thanks for clarification. I would vote then to have this PR to be limited to browser first and will file new PR to support other apps, I have created an issue to track it.

That way we could have already those field inside schema and discuss further development.

@trisch-me
Copy link
Contributor Author

@AlexanderWert @jsuereth I have fixed conflicts. This one is good to go and I need these changes to be merged in order to have another PR opened for discussions.

CHANGELOG.md Outdated Show resolved Hide resolved
trisch-me and others added 2 commits January 24, 2024 09:34
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
@jsuereth jsuereth merged commit c3e35ee into open-telemetry:main Jan 24, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants