Emission Heuristics to Maximize Ability to Measure Variance in Duration #86

Closed
kevinkreiser opened this issue Jul 12, 2017 · 3 comments · Fixed by #87

Comments

@kevinkreiser
Contributor

Currently, when the reporter gets 5 observations for a given segment-next-segment pair, it averages them all together and reports that average when it is time to emit. This means we lose some of the ability to measure variance unless we get more observations for this pair later on in wall time (but for the same point in GPS time). So we don't want to just average all the measurements together. At the point when we go to emit these measurements, we want to group them in such a way that we can still measure variance without skewing the averages.

Say you have 5 observations for a given segment-next-segment pair, and your privacy setting is 2, which means you have enough data to emit these observations in some form. Today we average all of these into one measurement with a count of 5. But to preserve the ability to measure variance we should probably emit 2 measurements, one with a count of 2 and one with a count of 3. We need a heuristic to do that, though. Let's say the 5 observations have durations: 10, 12, 20, 25, 65

How do we group these observations so that we most accurately represent the data?
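
One possible answer, as a rough sketch rather than what the reporter actually does: sort the durations, cut them into contiguous groups that each hold at least `privacy` observations, and keep the cut with the smallest within-group squared deviation. The function names, the choice to form as many groups as the privacy setting allows, and the squared-deviation cost are all assumptions made for illustration.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def group_observations(durations, privacy):
    """Return (count, mean duration) pairs, one per emitted measurement.

    Hypothetical heuristic: form as many groups as the privacy setting
    allows, then pick the contiguous split of the sorted durations that
    minimizes the total within-group squared deviation.
    """
    ordered = sorted(durations)
    n = len(ordered)
    if n < privacy:
        return []  # not enough observations to emit anything
    k = n // privacy  # most groups possible with >= privacy in each
    best = None
    # Try every way of placing k - 1 cuts between the sorted values.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        groups = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        if any(len(g) < privacy for g in groups):
            continue  # this split would leak an under-populated group
        cost = sum(sse(g) for g in groups)
        if best is None or cost < best[0]:
            best = (cost, groups)
    return [(len(g), sum(g) / len(g)) for g in best[1]]


print(group_observations([10, 12, 20, 25, 65], privacy=2))
# [(3, 14.0), (2, 45.0)]
```

For the example durations this picks {10, 12, 20} and {25, 65}. The count-weighted mean of the two emitted measurements, (3 × 14 + 2 × 45) / 5 = 26.4, matches the mean of all five observations, so the overall average is not skewed. The exhaustive search is only reasonable for the handful of observations discussed here.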

@dnesbitt61
Contributor

Why not just emit a histogram? A vector of count/duration?

@kevinkreiser
Contributor Author

@dnesbitt61 Because then non-anonymised data would leave the reporter, for example when the count is 1 for a given slot in the histogram.
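
To make the concern concrete, here is a small sketch using the five example durations from above. The fixed bin width of 10 and the helper names are assumptions for illustration only; nothing in this thread specifies how a histogram would be binned.

```python
from collections import Counter


def histogram(durations, bin_width):
    """Count observations per fixed-width bin, keyed by the bin's lower edge."""
    return Counter((d // bin_width) * bin_width for d in durations)


def unsafe_bins(hist, privacy):
    """Lower edges of bins holding fewer observations than the privacy setting."""
    return sorted(start for start, count in hist.items() if count < privacy)


hist = histogram([10, 12, 20, 25, 65], bin_width=10)
print(sorted(hist.items()))            # [(10, 2), (20, 2), (60, 1)]
print(unsafe_bins(hist, privacy=2))    # [60]
```

The bin starting at 60 holds a single observation, so emitting the histogram as-is would expose one individual measurement even though the pair as a whole has enough observations to satisfy the privacy setting.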

@kevinkreiser
Contributor Author

Just to be clear, I think things would be vastly simplified if we do what @dnesbitt61 is suggesting, so yeah, I hope that when we get clarification the answer is "make it so" 😄
