Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional field transaction.sampled #441

Merged
merged 5 commits into from
Jan 5, 2018
Merged

Conversation

roncohen
Copy link
Contributor

@roncohen roncohen commented Dec 24, 2017

This adds a required field transaction.sampled to the intake API and the Elasticsearch output document for transactions.

This doesn't change the requirements around what data is included when sampled is false/true. That is, agents can still send spans and set sampled: false. We might want to add this enforcement later, but I wanted to get this in now as it's a breaking change.

Why required?
I considered defaulting it to true when spans is missing in the payload in order to maintain backwards compatibility with current agents, but decided against it because now is the time to make breaking changes and because i believe having it default to true when spans is missing in the payload will add complexity in the long run. Happy to discuss though.

@roncohen
Copy link
Contributor Author

feedback appreciated @elastic/apm-agent-devs @elastic/apm-ui

@watson
Copy link
Contributor

watson commented Dec 25, 2017

@roncohen you mention considering defaulting it to true - you meant false right? update: I had too much Christmas food yesterday, and can't think straight 🍗

Your consideration related to the default value made me think that we have other fields with default values where we are currently allowed to omit the property if we just want the default value. This reasoning should apply to all such fields right?

But in general, if we want the property to be present with its default value in the database, I think it would be more user-friendly if it was the job of the APM Server to populate it in case it wasn't present in the intake JSON. Wouldn't that be better?

@mikker
Copy link

mikker commented Dec 25, 2017

I like it 👍

Copy link
Member

@graphaelli graphaelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong feelings either way on making apm-server default to true when spans are present. One thought would be to fail validation in the only invalid case - where spans are present and sampled is false. 👍 for getting this in now and making those decisions as needed.

@roncohen
Copy link
Contributor Author

roncohen commented Dec 27, 2017

we are currently allowed to omit the property if we just want the default value.

@watson which fields are those? 🤔

@graphaelli
Copy link
Member

library_frame for one will default to false if omitted

@beniwohli
Copy link
Contributor

FYI, my sampling PR (elastic/apm-agent-python/pull/116) already contains this field. I hope to merge it soon.

As for default value vs. required, I more or less agree with @watson. Fields that have an obvious default value shouldn't be required by the API. The more required fields we have, the harder it is to interact with the APM server through things like curl, which makes it harder to play around with, which is important for authors of new agents.

@simitt
Copy link
Contributor

simitt commented Jan 2, 2018

What exactly does the transaction.sampled field indicate? I am not sure I fully understand the value of this.
In case it only indicates whether or not the transaction includes spans then I would not add this to the public API but rather just set the flag accordingly in the transforming process.

@beniwohli
Copy link
Contributor

@simitt the absence of spans doesn't necessarily indicate that the transaction wasn't sampled. It could just mean that the transaction didn't have any spans. Having a boolean makes it explicit, which is good IMO.

We could say that spans: null means "not sampled", while spans: [] means "sampled, but didn't have spans", but that seems like somewhat magic to me, and is easy to get wrong in the agent.

@simitt
Copy link
Contributor

simitt commented Jan 2, 2018

Thanks @beniwohli for explanation.
In this case I suggest to not set a default value (in case we make the field optional) as this could only be a rough estimation.
However, I have no strong feelings about making it required or not.


type: boolean

Unsampled transactions have no spans.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I the only one who's brain explodes making sense of double negatives? Sampled transactions have spans.

@roncohen
Copy link
Contributor Author

roncohen commented Jan 2, 2018

Fields that have an obvious default value shouldn't be required by the API.

@beniwohli To be clear, you're arguing that sampled should default to true no matter what the rest of the transaction contains?

@beniwohli
Copy link
Contributor

@roncohen yes. My concern with adding more required fields is that the API becomes less explorable with tools like curl.

Specifically in this case, I think it's ok to assume that a transaction has been sampled unless the agent specifically tells us otherwise. If an agent doesn't support sampling, by definition all transactions are sampled. If the agent supports sampling, it'll know what value to set sampled to.

@hmdhk
Copy link
Contributor

hmdhk commented Jan 3, 2018

I also agree with @beniwohli about not making it required.
Regarding the naming, people (including myself) seem to be confused about sampled. Personally I think we should reverse the meaning and use "partial" or "optimized" or "reduced" for the name (something like that but I'm not sure). This is also consistent with assuming if the field is not there then the agent doesn't support sampling.

@mikker
Copy link

mikker commented Jan 3, 2018

The logic also confused me at first. Agree with @jahtalab .

@beniwohli
Copy link
Contributor

I like sampled. It says what it is on the tin. partial or optimized don't describe why data is missing, sampled: False does IMO.

@roncohen
Copy link
Contributor Author

roncohen commented Jan 4, 2018

Made 'sampled' optional and it now defaults to true. Also rebased to fix conflicts.

@sorenlouv
Copy link
Member

@roncohen When are you planning to merge this?
I'm working on https://github.com/elastic/x-pack-kibana/issues/3739 but can't verify my ES queries until the documents have the sampled field.

@roncohen
Copy link
Contributor Author

roncohen commented Jan 4, 2018

Thanks for your input @mikker, @jahtalab and @beniwohli!
I'd like to hear from @watson @Qard @graphaelli @simitt (and whoever has an opinion) on "sampled" vs. "partial", "optimized", "reduced".

@watson
Copy link
Contributor

watson commented Jan 4, 2018

I think sampled is still the best option

@simitt
Copy link
Contributor

simitt commented Jan 4, 2018

To me the confusing part is that sampled indicates whether the context and the spans have been sampled, rather than the transaction itself, as I'd expect a non-sampled transaction not to show up at all.

But overall I'd also go with sampled.

@watson
Copy link
Contributor

watson commented Jan 4, 2018

I think the details of the missing context and spans are purely an implementation detail. For Node.js at least, there's a lot more that we're not doing when a transaction is not sampled. As I see it, the main reason why we're still sending up transaction objects for non-sampled transactions, is because we can't afford to change the API in a breaking way right now (this might change in the future). So I still think it's important that we convey the reason behind the missing data - i.e. that the transaction isn't sampled.

@@ -115,6 +115,7 @@
},
"name": "GET /api/types",
"result": "success",
"sampled": true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove from the minimal example as it is not a required field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a minimal example?

@@ -12,7 +12,8 @@
"name": "GET /api/types",
"type": "request",
"duration": 32.592981,
"timestamp": "2017-05-09T15:04:05.999999Z"
"timestamp": "2017-05-09T15:04:05.999999Z",
"sampled": false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove from the minimal example as it is not a required field.

@@ -15,7 +15,8 @@
"name": "GET /api/types",
"type": "request",
"duration": 32.592981,
"timestamp": "2017-05-30T18:53:27.154Z"
"timestamp": "2017-05-30T18:53:27.154Z",
"sampled": false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove from the minimal example as it is not a required field.

@@ -13,6 +13,7 @@
"type": "request",
"duration": 32.592981,
"timestamp": "2017-05-30T18:53:27.154Z",
"sampled": true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove from the minimal example as it is not a required field.

@@ -31,14 +31,16 @@
"timestamp": "2017-05-30T18:53:27.154Z",
"result": "200",
"context": null,
"spans": null
"spans": null,
"sampled": false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rset to null to test that you actually allow null values (this test will fail, as there is a null option missing in the spec).

@@ -16,7 +16,8 @@
"duration": 32.592981,
"start": 24.01,
"timestamp": "2017-05-09T15:04:05.999999Z",
"spans": []
"spans": [],
"sampled": true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove as it is not a required field.


type: boolean

Sampled transactions have spans. Defaults to true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: `If the transaction is sampled, then all available information has been recorded, otherwise information about spans and context are discarded.

@@ -55,6 +55,10 @@
}
},
"additionalProperties": false
},
"sampled": {
"type": "boolean",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"type": ["boolean", "null"] otherwise you cannot send the key with an empty value.

- name: sampled
type: boolean
description: >
Sampled transactions have spans. Defaults to true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment about description in json spec.

@@ -4,5 +4,6 @@
"type": "request",
"duration": 1.0,
"timestamp": "2017-05-30T18:53:27.154Z",
"spans": []
"spans": [],
"sampled": true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove as it is not a required field, also from the other test json files.

@simitt
Copy link
Contributor

simitt commented Jan 4, 2018

@roncohen you probably didn't run approval, as those still include default values for minimal examples.

@roncohen
Copy link
Contributor Author

roncohen commented Jan 4, 2018

@simitt "approved" files deal with the output right? those should have the sampled value as it defaults to true if it's not supplied in the intake api.

EDIT: nevermind ;)

@simitt
Copy link
Contributor

simitt commented Jan 4, 2018

@roncohen yeah right, weird that some specs are still failing for approved files. Is there an issue with files that you had added and are removed again?

@roncohen
Copy link
Contributor Author

roncohen commented Jan 4, 2018

Is there an issue with files that you had added and are removed again?

Not sure what you mean. Could you elaborate?

@roncohen roncohen merged commit 84aad2b into elastic:master Jan 5, 2018
@roncohen roncohen changed the title Add required field transaction.sampled Add optional field transaction.sampled Jan 5, 2018
simitt pushed a commit to simitt/apm-server that referenced this pull request Jan 8, 2018
@roncohen roncohen deleted the sampling branch April 23, 2018 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants