Survey existing metrics definitions across existing libraries #3

Open
tsloughter opened this issue May 30, 2019 · 8 comments

@tsloughter
Collaborator

From the meeting notes where this action item was created:

  • Lower level Telemetry.Metrics interface in Erlang
    • Currently using Structs and Protocols, so hard to convert to Erlang
    • Docs might not be as good
    • Intention with Phoenix 1.5 is to include this by default, so might not be as seamless
    • The API needs to be really good because end-user developers are going to interact with it; not just library authors
    • Main issue is with reporters, because if the internal data structures are different, they’d need to support both - need some kind of abstraction that both can handle (like maps)
    • How will this interact with OpenTelemetry’s metrics feature set? Probably a lot of overlap, so we need to make sure that it’s not too confusing for people.
    • Action: Arkadiusz to Survey existing metrics definitions across existing libraries (Prometheus, OpenCensus, Statix, Telemetry.Metrics) before next meeting
@hauleth
Collaborator

hauleth commented May 30, 2019

Currently using Structs and Protocols, so hard to convert to Erlang

I haven't found any usage of protocols in telemetry_metrics. Have I missed something? As for structs, Elixir provides fairly easy support for records (though without protocol support), so I don't think that should be much of a problem.

Intention with Phoenix 1.5 is to include this by default, so might not be as seamless

The Erlang implementation can still provide an Elixir-like API. BTW, the same should be done for telemetry itself to make migration more seamless for consumers.

How will this interact with OpenTelemetry’s metrics feature set?

I would suggest that we ignore the direct API in OT and instead "force" users to always send data to OT through telemetry, with OT being only a consumer. That way we would sacrifice part of the OT spec for a better user experience.


As for existing metric types, the most common ones I am aware of are:

  • counter/sum - these two are equivalent
  • histogram + sometimes more specialized versions of it like timing
  • gauge/value - single value at the measurement time

Some other tools also provide metrics like meter, which works like taking the derivative of a gauge, but I think that is out of scope for telemetry_metrics.
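A minimal Python sketch of the three common types listed above, assuming measurements arrive as plain numbers. The function names and the bucket convention (inclusive upper bounds plus an overflow bucket, boundaries sorted ascending) are illustrative choices, not taken from any of the libraries discussed.

```python
import bisect


def sum_metric(increments):
    """counter/sum: add up recorded increments; a plain counter is the case where each increment is 1."""
    return sum(increments)


def gauge(values):
    """gauge/value: the single value at measurement time, here the most recent one (None if empty)."""
    return values[-1] if values else None


def histogram(values, boundaries):
    """Bucket counts: bucket i holds values <= boundaries[i]; the last bucket holds the rest."""
    counts = [0] * (len(boundaries) + 1)
    for value in values:
        counts[bisect.bisect_left(boundaries, value)] += 1
    return counts


durations_ms = [12, 45, 7, 230, 88]
print(sum_metric([1, 1, 1]), sum_metric(durations_ms))  # 3 382
print(gauge(durations_ms))                               # 88
print(histogram(durations_ms, [10, 50, 100]))            # [1, 2, 1, 1]
```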

@arkgil
Collaborator

arkgil commented Jun 6, 2019

BTW the same should be done for telemetry itself to provide more seamless migration for consumers.

Do you mean creating an Elixir module delegating to the Erlang one?

@hauleth
Collaborator

hauleth commented Jun 6, 2019

@arkgil yes. It could even be written in Erlang, but in general it should be made easy for consumers to "migrate" to newer versions.

@arkgil
Collaborator

arkgil commented Jun 6, 2019

@hauleth I'm not sure what you mean, or maybe I don't see the problem we're trying to solve here 😄

Regarding use of records, I would vote against it, because IMO they are problematic when they show up in stacktraces. I would say that if we aim to have a common structure for both Erlang and Elixir, then maps are the way to go (they might be structs on the Elixir side, although that too might confuse folks when debugging from Erlang).
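To illustrate the maps option: a metric definition can be a plain key/value map that Erlang code can build directly and Elixir code can build as a map or a struct (structs are maps underneath), while a reporter only does key lookups. The Python sketch below uses dicts to stand in for such maps; the helper names and the key set are hypothetical, not an agreed format.

```python
# Hypothetical metric-definition shape; the keys are illustrative, not an agreed format.
def metric(metric_type, name, event_name, measurement, tags=(), **options):
    """Build a metric definition as a plain map, with type-specific options merged in."""
    return {
        "type": metric_type,
        "name": name,
        "event_name": tuple(event_name),
        "measurement": measurement,
        "tags": list(tags),
        **options,
    }


def describe(definition):
    """A reporter needs only key lookups, whichever language produced the map."""
    text = f"{definition['type']} {definition['name']} <- {'.'.join(definition['event_name'])}"
    if "buckets" in definition:
        text += f" buckets={definition['buckets']}"
    return text


metrics = [
    metric("counter", "http.request.count", ["http", "request", "stop"], "duration", tags=["route"]),
    metric("histogram", "http.request.duration", ["http", "request", "stop"], "duration",
           buckets=[10, 50, 100]),
]
for definition in metrics:
    print(describe(definition))
# counter http.request.count <- http.request.stop
# histogram http.request.duration <- http.request.stop buckets=[10, 50, 100]
```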

@arkgil
Collaborator

arkgil commented Jun 6, 2019

As Łukasz wrote in a comment above, the metric types supported by existing libraries fall into the following buckets:

  • metric counting the number of measurements. AFAIK this kind of counter is supported only by OpenCensus and Telemetry.Metrics; other libraries instead let you increment/decrement the counter by an arbitrary value
  • metric for summing up recorded measurements
  • metric keeping track of the last recorded measurement
  • metric building a histogram of recorded values
  • metric exposing a set of basic statistics about recorded values, like minimum, maximum, mean, chosen percentiles etc. The set of statistics varies depending on the library/system
  • other, more sophisticated time-series analyses, like moving weighted averages or derivatives
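The first and fifth buckets can be sketched in Python as follows: a counter that counts measurements regardless of their value, and a summary of basic statistics. Nearest-rank percentiles are an assumed choice for this sketch; real libraries use a variety of estimators, and the function names are hypothetical.

```python
import statistics


def count_measurements(values):
    """Counter in the OpenCensus/Telemetry.Metrics sense: how many measurements, ignoring their values."""
    return len(values)


def percentile(ordered, p):
    """Nearest-rank percentile of a sorted, non-empty list, for an integer 0 < p <= 100."""
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) using integer arithmetic
    return ordered[rank - 1]


def summary(values, percentiles=(50, 90)):
    """Basic statistics over the recorded values; the exact set varies between libraries."""
    ordered = sorted(values)
    stats = {"min": ordered[0], "max": ordered[-1], "mean": statistics.fmean(ordered)}
    for p in percentiles:
        stats[f"p{p}"] = percentile(ordered, p)
    return stats


durations_ms = [12, 45, 7, 230, 88]
print(count_measurements(durations_ms))  # 5
print(summary(durations_ms))  # {'min': 7, 'max': 230, 'mean': 76.4, 'p50': 45, 'p90': 230}
```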

When it comes to defining metrics, most of the libraries use the "registry" approach: you call a function, the metric is registered somewhere globally, and the registry is queried whenever the metric is updated or needs to be exported. I haven't found any library other than Telemetry.Metrics that uses plain data structures for defining metrics and passing them around.

@bryannaegele
Collaborator

I haven't found any library other than Telemetry.Metrics that uses plain data structures for defining metrics and passing them around.

How many of those are attempting to interact with multiple implementations without the use of an agent though? I see one of the benefits of using data structures to define metrics is the flexibility they provide for simple migrations via reporters. OpenCensus is the only one I'm aware of that attempts abstracting the destination but moves that abstraction to the agent.

@arkgil
Collaborator

arkgil commented Jul 5, 2019

exometer, folsom and metrics (which uses the first two as backends) are all quite popular (judging by the number of downloads on Hex) and allow exporting metrics to multiple external systems.
The idea is that reporters subscribe to metric updates and are notified every x seconds that they should export the metric.
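A Python sketch of that subscribe-and-notify model, under stated assumptions: the `Registry`, `ListReporter`, and method names are hypothetical and do not reproduce exometer's or folsom's actual API, and the periodic timer is simulated with a loop that calls `tick` once per "second" so the behaviour is deterministic.

```python
class Registry:
    """Hypothetical registry with reporter subscriptions; not exometer's or folsom's actual API."""

    def __init__(self):
        self.values = {}
        self.subscriptions = []  # (reporter, metric name, interval in seconds)

    def new_counter(self, name):
        self.values[name] = 0

    def update(self, name, by=1):
        self.values[name] += by

    def subscribe(self, reporter, name, interval_s):
        self.subscriptions.append((reporter, name, interval_s))

    def tick(self, now_s):
        """Called once per second by a timer; notifies reporters whose interval has elapsed."""
        for reporter, name, interval_s in self.subscriptions:
            if now_s % interval_s == 0:
                reporter.export(name, self.values[name], now_s)


class ListReporter:
    """Stands in for a reporter that would push values to an external system."""

    def __init__(self):
        self.sent = []

    def export(self, name, value, now_s):
        self.sent.append((now_s, name, value))


registry = Registry()
registry.new_counter("requests")
reporter = ListReporter()
registry.subscribe(reporter, "requests", interval_s=10)
for second in range(1, 31):  # simulate 30 seconds with one request per second
    registry.update("requests")
    registry.tick(second)
print(reporter.sent)  # [(10, 'requests', 10), (20, 'requests', 20), (30, 'requests', 30)]
```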

@arkgil
Collaborator

arkgil commented Jul 5, 2019

To me, the difference between using a registry and data structures boils down to these two things:

  1. With data structures, we need to tell the reporter which metrics it shall export. With the registry, metrics can be registered earlier, and we either tell the reporter which ones it should use or which ones it should ignore.
  2. With data structures, libraries cannot register metrics; they can only emit events using Telemetry, which gives more control to the user. With a registry, libraries could register metrics directly so that the user can export them.


4 participants