Support array of strings as a value for tags/labels #82
Comments
I don't think that this is compatible with ECS:
{
  "tags": {
    "teams": ["team1","team2"]
  }
}
Not sure about that version:
{
  "labels": {
    "teams": ["team1","team2"]
  }
}
I would argue it's not because it's not a key/value pair. But it's probably debatable if non-scalar values are allowed. However, this version would be compatible with ECS:
{
  "tags": ["team1","team2"]
}
This version would also solve the specific use case this has been requested for, which is to consistently |
@felixbarny, I think this is the same as what I called the "alternative approach" in the issue description. So to be clear, the second approach is to support both labels for key/values and tags for string arrays, is that what you meant? It seems redundant to me to support both, but we definitely can do that as well! |
@ruflin Do you know about prior art regarding labels with multiple values? Is that something you'd consider legal from an ECS perspective? |
@felixbarny I would consider the following "legal" from an ECS perspective:
The reason is that any field in Elasticsearch (Lucene) can be/is an array. Even What I would consider "illegal" (😆 ) from an ECS perspective is:
This changes the mapping type of ECS. |
I agree with @felixbarny and @ruflin that the suggested way of storing tags is not ECS compatible. When the argument is consistency with ECS, I don't see why having both, |
I'm assuming that we will at some point rename anything with "tags" to "labels" on the agent side, i.e.
I'm ok with supporting both Tags and Labels as well, however, supporting ECS style tags is a breaking (if we keep the same name e.g. We can take the following path: We add |
If I recall correctly, the reason why we renamed tags to labels was that tags in ECS were just an array of values with no keys, whereas our definition of tags was "key: value". For that format, labels in ECS were a better fit, hence the rename. If labels support one label key having multiple values stored as an array, we could technically add support for it, but I'm not sure what the use case for our users would be. I find having multiple label values for the same label key confusing 🤔 |
No suggestion yet, but let's separate terminology and review how agents, server and ES behave at the moment. Agents and Server currently only process and index Agents
python:
ruby:
rum:
go
java
.Net
APM Server Intake API
ES
This means that the |
What @watson said (replace |
I meant to give a neutral summary of where we are right now, as I found the arguments on what is used where and what would be a breaking change hard to follow. (No need for explanation or defense for already changing to |
Maybe this helps: Having an array here would be a cleaner solution. |
@webmat We discussed this issue in our agents meeting yesterday. We were wondering if you could comment on the compatibility of the proposed format for @simitt Would supporting the format mentioned in this comment entail a lot of work on the server side? |
I concur with @ruflin, nesting keys under In recent months there's been an increasing demand to clearly identify which ECS fields are expected to be arrays. Even if it's not necessary for Elasticsearch itself (there's no way to identify that in a mapping), it's helpful to clarify that expectation for consumers of the data, as well as for libraries that are mapping the schema to various programming languages (see https://github.com/elastic/ecs-logging). A PR for this is in progress (elastic/ecs#727), and Sorry for the unexpected wall of text here :-) But long story short, supporting arrays in Another point: the mapping for |
Makes sense, Mat. In APM, label values can be of type string, boolean, or number. This makes it possible, for example, to create graphs based on label values. Any plans to support that with ECS? |
The current approach of having labels be The reality is that both the sample ECS Elasticsearch template and the Beats templates I've looked at have this field as "labels" : {
"type" : "object"
} So I confirmed the behaviour of subkeys of
The mapping generated is "labels" : {
"properties" : {
"label1" : {
"type" : "long"
},
"label2" : {
"properties" : {
"foo" : {
"type" : "keyword"
}
}
}
}
}, So in effect, the label values can already be numerics or booleans (or even nested objects), provided the values have the right format, in the document's |
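To make the behaviour described above easier to follow, here is a rough Python sketch of how a mapping like the one quoted could be derived from a document's values. It is only a simplified model, not Elasticsearch code: `infer_mapping` is a hypothetical helper, the sample document is invented for illustration, and mapping strings to `keyword` is an assumption made to match the quoted mapping (it depends on the index template in use).

```python
def infer_mapping(value):
    """Return a simplified mapping fragment for a JSON-like Python value.

    Assumption: strings become "keyword", as in the mapping quoted above.
    """
    if isinstance(value, dict):
        # Objects become nested "properties", one entry per sub-key.
        return {"properties": {key: infer_mapping(sub) for key, sub in value.items()}}
    if isinstance(value, list):
        # Arrays have no mapping type of their own; the first element decides.
        return infer_mapping(value[0]) if value else None
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {"type": "keyword"}
    raise TypeError(f"unsupported value: {value!r}")


# Hypothetical document body, chosen to reproduce the mapping shown above.
labels = {"label1": 1, "label2": {"foo": "bar"}}
mapping = infer_mapping(labels)
```

With this model, an array such as `["team1", "team2"]` produces the same fragment as a single string, which is the point made earlier in the thread about any field being able to hold an array.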
FWIW, this is the Which gets converted to this dynamic template: "dynamic_templates" : [
{
"labels" : {
"path_match" : "labels.*",
"mapping" : {
"type" : "keyword"
},
"match_mapping_type" : "string"
}
},
{
"labels" : {
"path_match" : "labels.*",
"mapping" : {
"type" : "boolean"
},
"match_mapping_type" : "boolean"
}
},
{
"labels" : {
"path_match" : "labels.*",
"mapping" : {
"scaling_factor" : 1000000,
"type" : "scaled_float"
},
"match_mapping_type" : "*"
}
},
...
] The APM Server also makes sure that the |
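As an illustration of how the three dynamic templates quoted above interact, the following Python sketch picks the first template whose `path_match` and `match_mapping_type` both fit a field. It is a simplified model under stated assumptions, not Elasticsearch's implementation: `json_type` and `pick_mapping` are hypothetical helpers, glob matching is approximated with `fnmatch`, only scalars and arrays of scalars are handled, and the entries hidden behind `...` in the original are left out.

```python
from fnmatch import fnmatchcase

# Copy of the three templates quoted above (the "..." entries are omitted).
DYNAMIC_TEMPLATES = [
    {"labels": {"path_match": "labels.*", "match_mapping_type": "string",
                "mapping": {"type": "keyword"}}},
    {"labels": {"path_match": "labels.*", "match_mapping_type": "boolean",
                "mapping": {"type": "boolean"}}},
    {"labels": {"path_match": "labels.*", "match_mapping_type": "*",
                "mapping": {"type": "scaled_float", "scaling_factor": 1000000}}},
]


def json_type(value):
    """Return a coarse JSON type name for a scalar or an array of scalars."""
    if isinstance(value, list):
        # Arrays have no type of their own; the first element decides.
        return json_type(value[0]) if value else None
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    raise TypeError(f"unsupported value: {value!r}")


def pick_mapping(path, value, templates=DYNAMIC_TEMPLATES):
    """Return the mapping of the first template matching path and value type."""
    detected = json_type(value)
    if detected is None:
        return None  # empty array: nothing to map yet
    for template in templates:
        for rule in template.values():
            type_ok = rule["match_mapping_type"] in ("*", detected)
            if type_ok and fnmatchcase(path, rule["path_match"]):
                return rule["mapping"]
    return None  # no template applies to this field
```

Under this model, `pick_mapping("labels.teams", ["team1", "team2"])` selects the `keyword` template, so an array of strings would be mapped exactly like a single string label, while any numeric label falls through to the `scaled_float` catch-all.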
Sorry for the late reply. Adding support for arrays of strings, booleans and numbers on the APM Server side would be relatively straightforward. Are agent devs considering releasing this support as a major version change? If an agent sends an array of values, the request would be rejected by an older APM Server. So this is not a breaking change for the APM Server, as it still works with older agents, but I think it would need to be for the agents. The server cannot simply ignore payloads with arrays, as the field and the validation against it already exist in older versions. |
If agents check the version of the APM Server, there is no breaking change. But for the RUM agent, that might not be possible. |
@felixbarny what would you do if the user sets an array for a label, but the server version doesn't support it? Only sending the first value seems dangerous, e.g. for the security use case outlined above, as does completely ignoring that label. |
There's currently an API for If it's critical for users that all values get sent, they just have to ensure that they are running on a certain version of APM Server. We can also log a warning when discarding values due to a version mismatch. |
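A minimal sketch of the version-gated behaviour described above, written in Python purely for illustration. Everything here is an assumption: `filter_labels` and `parse_version` are hypothetical helpers rather than part of any agent API, the minimum supporting server version is passed in by the caller because the thread does not name one, and dropping the whole label is just one possible policy.

```python
import logging

logger = logging.getLogger("apm.labels")


def parse_version(text):
    """Turn a plain "major.minor.patch" string into a comparable tuple."""
    return tuple(int(part) for part in text.split(".")[:3])


def filter_labels(labels, server_version, min_array_version):
    """Drop array-valued labels when the server is too old to accept them.

    min_array_version is a hypothetical (major, minor, patch) tuple naming
    the first APM Server release assumed to accept array label values.
    """
    if parse_version(server_version) >= min_array_version:
        return dict(labels)
    kept = {}
    for key, value in labels.items():
        if isinstance(value, (list, tuple)):
            # Warn instead of silently sending a partial value.
            logger.warning(
                "Discarding label %r: APM Server %s does not accept arrays",
                key, server_version,
            )
            continue
        kept[key] = value
    return kept
```

For example, `filter_labels({"teams": ["a", "b"], "env": "prod"}, "7.10.2", (7, 11, 0))` keeps only `env` and logs a warning for `teams`. Pre-release suffixes such as `-SNAPSHOT` are not handled by this simple parser.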
Hi @felixbarny, I am not sure whether the following should work in the Python client:
I am using Python APM client version 6.0.0 and APM Server version 7.10.2.
Am I doing something wrong, or is it not working as intended? |
I found a workaround: adding an extra APM pipeline that splits the required fields. |
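The comment above does not show the pipeline itself, so the following is only a guess at what such a workaround could look like, expressed as a Python dictionary in the shape of an Elasticsearch ingest pipeline that uses the `split` processor. The field name `labels.teams` and the comma separator are assumptions, and `simulate_split` is a hypothetical local helper that mimics the processor for testing; it is not an Elasticsearch API.

```python
import re

# Hypothetical pipeline body: split a comma-joined label into an array.
PIPELINE = {
    "description": "Split comma-separated label values into arrays",
    "processors": [
        {"split": {"field": "labels.teams", "separator": ",", "ignore_missing": True}}
    ],
}


def simulate_split(doc, processor):
    """Apply a split processor config to a nested dict, in place."""
    config = processor["split"]
    *parents, leaf = config["field"].split(".")
    node = doc
    for part in parents:
        node = node.get(part)
        if not isinstance(node, dict):
            return doc  # field missing: leave the document untouched
    if isinstance(node.get(leaf), str):
        # The separator is treated as a regular expression.
        node[leaf] = re.split(config["separator"], node[leaf])
    return doc


doc = simulate_split({"labels": {"teams": "team1,team2"}}, PIPELINE["processors"][0])
```

The agent would then send a single string such as `"team1,team2"`, which an older APM Server accepts, and the array is only produced at ingest time.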
Description of the issue
Currently tags/labels are stored as key/value pairs in ES. However, we have seen users expecting tags to be stored as an array of strings, which is consistent with ECS-style tags.
I'm proposing to support arrays of strings for adding tags/labels both in the APM Server and in the agent APIs.
The alternative to this approach is to have both labels for key/values and tags for arrays, but since supporting just labels (with support for arrays) is more generic, IMO we should implement the proposed solution.
An example for JavaScript:
What we are voting on
Is this an acceptable solution for agents/apm server?
cc @elastic/apm-agent-devs , @elastic/apm-server
Vote