Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only store defined attributes from context in ES. #411

Closed
simitt opened this issue Dec 19, 2017 · 31 comments · Fixed by #1863
Closed

Only store defined attributes from context in ES. #411

simitt opened this issue Dec 19, 2017 · 31 comments · Fixed by #1863
Assignees
Milestone

Comments

@simitt
Copy link
Contributor

simitt commented Dec 19, 2017

Json validation allows agents to send up additional information without failing. Ensure that only the defined information is stored in ES, except for context.custom and context.tags where we specifically allow for additional values.

@roncohen
Copy link
Contributor

roncohen commented Jan 3, 2018

We've previously decided to keep context open in order to give us the freedom to dynamically add namespaces under context in agents if needed, without requiring an upgrade to the Elastic Stack. I believe this is an edge case, but lets keep it open for 6.2 and then consider locking it down for 6.3.

@simitt
Copy link
Contributor Author

simitt commented Feb 7, 2018

@roncohen should we look into this now for 6.3?

@simitt
Copy link
Contributor Author

simitt commented Feb 9, 2018

@roncohen can we please revisit this for 6.3 as we realized that e.g. @jahtalab is sending additional data within context for the frontend agent. It might be legit to do so but we should discuss this, as for example right now we would store the userAgent information additionally there, although there is already another defined place in the context to store this information.

In case we decide to add more defined attributes in the future that should be indexed, this would be a breaking change afaict, as it could break existing indices if the new attribute was stored with a different type.

Also we might end up with a completely different schema for different agents.

@simitt
Copy link
Contributor Author

simitt commented Mar 26, 2018

update: as those fields won't get indexed, changes to the schema should not break anything. However this could still lead to a different data structure in ES than defined or expected.

@watson
Copy link
Member

watson commented Jul 4, 2018

From an agent developers point of view, I'm ok with locking down the properties under context.

From time to time we need a new root property under context. But in those cases we probably need to involve the APM UI team anyway to make sure it's displayed as expected. So I'm fine with adding that overhead.

@felixbarny
Copy link
Member

Especially with the unauthorized endpoint for the RUM agent, a malicious user could send arbitrary data and flood ES. This is a general problem we might want to address with rate limiting, but we might also want to limit which types of fields can be sent. Together with an upper limit for the values (for example 1024 chars), 50 stacktrace elements etc. we effectively have an upper limit for one span, which probably helps.

@simitt
Copy link
Contributor Author

simitt commented Aug 23, 2018

@roncohen can we revisit if this is a feature or a bug for V2, and then decide and close this issue.

@roncohen
Copy link
Contributor

thanks @simitt.
@jahtalab which additional information are you sending today?

@hmdhk
Copy link
Contributor

hmdhk commented Sep 11, 2018

@roncohen ,

Currently we're sending the following additional fields:

{
  page: { referer: "",  url: "" }
}

I'm ok with locking down the context and/or setting limits on the values.

@simitt
Copy link
Contributor Author

simitt commented Sep 14, 2018

I suggest all @elastic/apm-agent-devs are listing the currently added attributes, so we can formalize them on the JSON schema and then stop processing additional data sent within context.
@alvarolobato, @roncohen can we please move forward with this.

@roncohen
Copy link
Contributor

sounds good to me

@mikker
Copy link

mikker commented Sep 14, 2018

No added attributes in Ruby agent.

@felixbarny
Copy link
Member

No added attributes in the Java agent.

@axw
Copy link
Member

axw commented Sep 17, 2018

None for Go.

@alvarolobato
Copy link

alvarolobato commented Sep 17, 2018

We are going to aim to do this for V2, but needs more investigation and probably will also be applied to V1. This isn't considered a blocker for V2 GA.

@hmdhk
Copy link
Contributor

hmdhk commented Sep 24, 2018

I need to add these additional fields to the span context:
note: the http context is already defined with url as it's only field.

{
            "http": {
                "method": "GET",
                "sync": true,
                "status_code": 200
            }
}

I think we should start adding these additional fields to the v1 and v2 schemas for documentation purposes even though this is not enforced yet. Should I make PR?

Also a general question, will the the whole request fail if there are additional fields or will they just not get stored?

@simitt
Copy link
Contributor Author

simitt commented Sep 24, 2018

@jahtalab feel free to open a PR on this.

In case additional fields are sent the server will just ignore them, but process the request otherwise. This is already the case for all other fields, to keep the upgrade path easy. (Otherwise the server would need to be updated before the agents can update).

@jalvz
Copy link
Contributor

jalvz commented Oct 15, 2018

We are going to aim to do this for V2

Trying to catch up - what is the current status, and how is this going to look like with ECS?

@simitt
Copy link
Contributor Author

simitt commented Oct 15, 2018

We haven't implemented that yet, but according to #411 (comment) it would be fine to add it if we manage in time.

@jalvz
Copy link
Contributor

jalvz commented Jan 17, 2019

@roncohen @alvarolobato I would appreciate a decision on this to either close the issue or plan/prioritize 7.0 work accordingly

@roncohen
Copy link
Contributor

thanks @jalvz

Reading through the issue, I'm just making sure: Is the intention to change the server so that it would reject unknown fields under context or just that we should stop storing them?

I'm fine to stop storing them, but I think rejection could be problematic. We're also not rejection unknown fields anywhere on the root, for example, so it would be strange to start doing it here.

@simitt
Copy link
Contributor Author

simitt commented Jan 17, 2019

In case additional fields are sent the server will just ignore them, but process the request otherwise. This is already the case for all other fields, to keep the upgrade path easy. (Otherwise the server would need to be updated before the agents can update).

Thanks for the update @roncohen. This issue is about stop storing unknown attributes, not rejecting them.
We are trying to get this in for 7.0 then (cc @elastic/apm-agent-devs ).

@simitt
Copy link
Contributor Author

simitt commented Jan 30, 2019

@elastic/apm-agent-devs as discussed offline, the plan is to NOT store additional, not defined attributes sent up by the agents from 7.0 on.
If there are no concerns or counter proposals raised we will move forward with this implementation.

@simitt
Copy link
Contributor Author

simitt commented Jan 31, 2019

We decided to only store objects defined in the json spec for the Intake API sent under the key context from 7.0 on.
On ES level we don't store any information within context anymore, that is defined on Intake API level. Therefore the only necessary change left is to ensure context itself is not stored on ES level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants