
New metric type gauge supplier #42

Closed
mixalturek opened this issue Jan 22, 2019 · 17 comments
@mixalturek
Contributor

It would be great to have a gauge that can call a function to obtain the value to report. Current gauges use a push model; this would be pull. Inspired by https://metrics.dropwizard.io/3.2.3/manual/core.html#gauges.

Use cases:

  • various buffer sizes
  • dynamic thread pool sizes
  • system and process metrics, partially from /proc filesystem
    • application uptime
    • RAM allocated by process
    • number of threads
    • number of open files
  • many more

Reporting of the application uptime using the existing API:

// Setup
let app_start_time = std::time::Instant::now();
let bucket: AtomicBucket = ...;
let uptime = bucket.gauge("uptime");

// Manually schedule a timer to periodically call this code.
// Note: as_millis() returns u128, so a cast to the gauge's value type may be needed.
uptime.value(app_start_time.elapsed().as_millis() as isize);

Proposal for the new API. (Borrow-checker and multi-threading issues may occur; this is only to convey the idea.)

// Setup
let app_start_time = std::time::Instant::now();
let bucket: AtomicBucket = ...;
// The callback is evaluated on each reporting run (or scheduled internally, if enabled).
bucket.gauge_supplier("uptime", || app_start_time.elapsed().as_millis() as isize);

I plan to implement the feature soon, either directly in Dipstick or as an extension. If you like it, I will be happy to contribute. I have already looked at the code, and it would probably require significant internal changes due to the push model in InputMetric.write(), so I'm asking in advance...
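To make the push/pull distinction concrete, here is a minimal self-contained sketch of the idea. None of these names (`GaugeSupplier`, `read`) are dipstick API; a pull gauge stores a supplier closure and evaluates it only when the reporter reads it, so no manual timer is needed.

```rust
use std::time::Instant;

// Hypothetical pull-model gauge: holds a supplier closure instead of a
// last-pushed value (names are illustrative, not dipstick API).
struct GaugeSupplier {
    supply: Box<dyn Fn() -> i64 + Send>,
}

impl GaugeSupplier {
    fn new<F: Fn() -> i64 + Send + 'static>(supply: F) -> Self {
        GaugeSupplier { supply: Box::new(supply) }
    }

    // Called by the reporter on each reporting cycle; pulls a fresh value.
    fn read(&self) -> i64 {
        (self.supply)()
    }
}

fn main() {
    let app_start_time = Instant::now();
    let uptime = GaugeSupplier::new(move || app_start_time.elapsed().as_millis() as i64);

    // Each read pulls the current uptime; no periodic push code required.
    let first = uptime.read();
    let second = uptime.read();
    assert!(second >= first);
    println!("uptime readings: {} ms, {} ms", first, second);
}
```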

@fralalonde
Owner

I agree this feature would be useful; I had it implemented in another metrics library some time ago. However I don't think a new metric type is required.

I would rather define a standard Watch closure type whose job is to observe a resource and report its value to a regular, previously defined metric (such as a gauge). This could then be either scheduled using a mechanism similar to flush_every, or set to be triggered when flushing an output (Bucket or I/O).

// Design sketch only, not compiling code.
type Watch = Box<dyn Fn(u64) + Send>;

let gauge = metrics.gauge("some_resource");
let watch: Watch = Box::new(move |_now| gauge.value(resource.len()));

// Either scheduled...
let scheduled_watch = Scheduler::watch_every(Duration::from_secs(10), watch);
scheduled_watch.cancel();
// ...or triggered when an output is flushed:
bucket.watch_on_flush(watch);

Maybe the target metric could be maintained externally to the closure and passed on every invocation, then it could also be used as map key to register / unregister watch on flush. Maybe the Watch signature could include returning a Result<bool> to simplify required error management in I/O resource monitoring and track internal metrics about how often watches actually record values.

As an extension to this, a dedicated Heartbeat (a constant-reporting Watch) could be enabled to make sure at least one metric is sent to downstream systems, marking the application as "active" even if no other activity is recorded. See #8.
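A scheduled watch along these lines can be sketched with nothing but the standard library. This is a hypothetical illustration, not dipstick code: `watch_every` and the atomic "gauge" cell are stand-ins, and the tick count is bounded only so the example terminates.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical scheduled "watch": a closure that observes a resource and
// pushes the reading into a gauge-like cell on a fixed period.
fn watch_every<F>(period: Duration, ticks: u32, watch: F) -> thread::JoinHandle<()>
where
    F: Fn() + Send + 'static,
{
    thread::spawn(move || {
        for _ in 0..ticks {
            watch();
            thread::sleep(period);
        }
    })
}

// Runs an uptime watch for three ticks and returns the last observed value.
fn run_uptime_watch() -> i64 {
    let app_start = Instant::now();
    // Stand-in for a gauge: a shared atomic the reporter would read on flush.
    let uptime_ms = Arc::new(AtomicI64::new(-1));

    let gauge = Arc::clone(&uptime_ms);
    let handle = watch_every(Duration::from_millis(10), 3, move || {
        gauge.store(app_start.elapsed().as_millis() as i64, Ordering::Relaxed);
    });

    handle.join().unwrap();
    uptime_ms.load(Ordering::Relaxed)
}

fn main() {
    println!("last observed uptime: {} ms", run_uptime_watch());
}
```

A real implementation would keep the loop running until a cancel handle is dropped or signalled, rather than counting ticks.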

@mixalturek
Contributor Author

Ok, that's also possible; this would be the external implementation I was writing about. I'm starting to work on it.

mixalturek added a commit to mixalturek/dipstick that referenced this issue Jan 23, 2019
mixalturek added a commit to mixalturek/dipstick that referenced this issue Jan 23, 2019
- Use `MetricValue` instead of `isize`.
- Cancel handle in example.
- Configurable thread name.
@mixalturek
Contributor Author

I have finished a proof of concept which works, but it is still quite dirty and only for AtomicBucket. What do you think about the API? I don't like the Arc::new() that is necessary when defining the callbacks, and I'm not sure whether they should return MetricValue or Result.

I'm now fighting with generic support for all the other types and with scheduling independent of flush_every(). It's quite difficult for me and I'm uncertain about many things. Should I create new traits similar to Flush and ScheduleFlush? Is that the correct direction?

@fralalonde
Owner

I had a quick look at the PR; I will try to find some time to play with it soon so I can give you feedback. I like the observe method. I think maybe it should be on the Input trait so that any implementer can benefit from it. As you mention (if I understood correctly), observe could be generic, taking in any type of Metric (Counter, Gauge, etc.), with the observing closure required to take in a Metric of the same type. Another way of doing it would be to have observe defined on every Metric type, so you could have let observer = my_gauge.observe(|gauge| gauge.value(resource)) and then either register the observer to be triggered on flush using a method on Input, or schedule it using Scheduler.
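The "observe defined on every metric type" variant can be sketched as follows. None of these names are actual dipstick API; the `Cell` stands in for the recorded metric value, and `trigger()` represents whatever flush or scheduler hook would eventually fire the observer.

```rust
use std::cell::Cell;

// Hypothetical sketch of per-metric observe(); names are illustrative only.
struct Gauge {
    name: String,
    last: Cell<i64>, // stand-in for the recorded metric value
}

impl Gauge {
    fn value(&self, v: i64) {
        self.last.set(v);
    }

    // observe() pairs the gauge with the closure that feeds it, returning an
    // observer that could be registered on flush or handed to a scheduler.
    fn observe<F: Fn(&Gauge) + 'static>(self, f: F) -> Observer {
        Observer { gauge: self, f: Box::new(f) }
    }
}

struct Observer {
    gauge: Gauge,
    f: Box<dyn Fn(&Gauge)>,
}

impl Observer {
    fn trigger(&self) {
        (self.f)(&self.gauge)
    }
}

fn main() {
    let resource_len = 7;
    let my_gauge = Gauge { name: "some_resource".into(), last: Cell::new(0) };
    let observer = my_gauge.observe(move |gauge| gauge.value(resource_len));

    observer.trigger(); // would normally happen on flush or on a schedule
    assert_eq!(observer.gauge.last.get(), 7);
    println!("{} = {}", observer.gauge.name, observer.gauge.last.get());
}
```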

@mixalturek
Contributor Author

I like the gauge|timer.observe(...) idea, but I would be afraid of the cyclic dependencies (borrow checker) that would probably emerge. The current implementation of Scheduler fires a new thread for each schedule, which is a no-go for now. But we can switch to a different implementation backed by a single thread or a thread pool; there are multiple crates for that.

My colleague showed me today how to get rid of the Arc in the client API with no dirty hacks. It should work; I will give it a try. (I'm still pretty new to Rust...)

@fralalonde
Owner

fralalonde commented Jan 24, 2019

The current scheduler is very basic and was always intended to be supplanted by a user-supplied thread / thread pool in any non-trivial situation. The only thing I'd be wary of is bringing in any non-optional heavy dependency set, especially tokio...

Be mindful that the Arcs appear in a lot of places because the configured components may be operating on different threads (using the async queue), and there's no way to know in advance. Also, the Arc overhead should be minimal once the app is running, since Arcs are only used for "persistent" components, not for wrapping the metrics data itself. The exception is the async queue, in which case the Arc overhead is expected to be offset by the extra threading. Again, this could be optimized using a preallocated circular buffer or some other fancy struct.
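The reason the Arcs are hard to avoid can be shown in a few lines: a callback registered on one thread but invoked by a reporting thread must be shared ownership, which in safe Rust means reference counting. This is an illustration, not dipstick code.

```rust
use std::sync::Arc;
use std::thread;

// Illustration: a callback shared between the registering thread and a
// reporting thread must be reference-counted, hence the Arc.
fn main() {
    let callback: Arc<dyn Fn() -> i64 + Send + Sync> = Arc::new(|| 42);

    // Clone the handle for the "reporter" thread; both threads may call it.
    let for_reporter = Arc::clone(&callback);
    let reporter = thread::spawn(move || for_reporter());

    assert_eq!(callback(), 42);
    assert_eq!(reporter.join().unwrap(), 42);
    println!("both threads observed the callback");
}
```

The Arc clone itself is a single atomic increment, which is why the overhead is negligible next to thread scheduling or I/O.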

@fralalonde
Owner

The other place where Arc could show some minor overhead is dynamic metrics creation, which I think you're doing. I would expect the total overhead of dynamic metrics to overshadow any Arc overhead. Another solution would be to use static metrics!() with Labels and a custom output format, which is new in 0.7, but this would not work with AtomicBucket since labels are discarded on aggregation.

mixalturek added a commit to mixalturek/dipstick that referenced this issue Jan 25, 2019
- If multiple callbacks are registered under the same conflicting key, only the last one will survive.
- It would be especially bad if the callback was re-registered periodically. The memory would increase and even all historical callbacks would be executed during each iteration.
mixalturek added a commit to mixalturek/dipstick that referenced this issue Jan 25, 2019
mixalturek added a commit to mixalturek/dipstick that referenced this issue Jan 25, 2019
@mixalturek
Contributor Author

I have managed to remove the Arc that was wrapping the closure in the client API; metrics.observe("answer", || 42); looks much better. And I have also fixed a memory leak that was in the initial implementation.

I have rethought the idea of counter.observe(...), timer.observe(...), etc. once again. I'm still pretty unsure, but observing probably makes sense only for gauges, or at least in 99.99% of cases. I can't imagine a real metric being passed to a counter or timer this way.

If you are ok with the current API and code, there will be two next steps:

  • It will probably be beneficial to make observing independent of flushing and ScheduleFlush.flush_every(); in general the two may use different periods.
  • Remove default implementation of InputScope.new_observer() and add concrete ones wherever necessary.
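The memory-leak fix mentioned above (and in the commit notes) comes down to keyed registration: re-registering a callback under an existing key must replace the old closure, not accumulate it. A hypothetical sketch of that registry, with illustrative names only:

```rust
use std::collections::HashMap;

// Hypothetical observer registry keyed by metric name; re-registering under
// the same key replaces the old closure instead of leaking historical ones.
struct ObserverRegistry {
    observers: HashMap<String, Box<dyn Fn() -> i64 + Send>>,
}

impl ObserverRegistry {
    fn new() -> Self {
        ObserverRegistry { observers: HashMap::new() }
    }

    fn observe<F: Fn() -> i64 + Send + 'static>(&mut self, key: &str, f: F) {
        // HashMap::insert drops any previous closure stored under this key.
        self.observers.insert(key.to_string(), Box::new(f));
    }

    // Called on each flush: evaluate every registered observer exactly once.
    fn run_all(&self) -> Vec<(String, i64)> {
        self.observers.iter().map(|(k, f)| (k.clone(), f())).collect()
    }
}

fn main() {
    let mut reg = ObserverRegistry::new();
    reg.observe("answer", || 41);
    reg.observe("answer", || 42); // replaces, does not accumulate

    let readings = reg.run_all();
    assert_eq!(readings.len(), 1); // only the latest callback survives
    println!("{:?}", readings);
}
```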

@fralalonde
Owner

I'm having a look at your observer branch. Here are a few comments:

  • There should be a new trait, FlushObserve: Flush
  • AtomicBucket::new_observer(...) should be split into two methods, GaugeObserver::new() -> Self and FlushObserve::observe<GO: Into<GaugeObserver>>(obs: GO) -> CancelHandle. AtomicBucket would impl this new trait, as eventually all other Flush implementers would, maybe with a helper class.
  • I am unfamiliar with the pub(crate) syntax; is this Rust 2018? Let's stick with just pub for now.
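A minimal skeleton of the design described in the bullets above might look like this. All bodies are stubs and all names (FlushObserve, GaugeObserver, CancelHandle) are taken from the comment, not from existing dipstick code.

```rust
// Hypothetical skeleton of the proposed traits; bodies are stubs only.

pub struct CancelHandle;

impl CancelHandle {
    pub fn cancel(&self) { /* would deregister the observer */ }
}

/// Stand-in for dipstick's existing Flush trait.
pub trait Flush {
    fn flush(&self) {}
}

/// An anonymous observer wrapping a gauge and the closure that feeds it.
pub struct GaugeObserver {
    observe: Box<dyn Fn() -> i64 + Send + Sync>,
}

impl GaugeObserver {
    pub fn new<F: Fn() -> i64 + Send + Sync + 'static>(observe: F) -> Self {
        GaugeObserver { observe: Box::new(observe) }
    }

    pub fn run(&self) -> i64 {
        (self.observe)()
    }
}

/// Flushing components that can also trigger observers on each flush.
/// AtomicBucket would implement this, as other Flush implementers eventually could.
pub trait FlushObserve: Flush {
    fn observe<GO: Into<GaugeObserver>>(&self, obs: GO) -> CancelHandle;
}

fn main() {
    let obs = GaugeObserver::new(|| 42);
    assert_eq!(obs.run(), 42);
    println!("observer ran: {}", obs.run());
}
```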

@fralalonde
Owner

continued:

  • AtomicBucket observers should be a Set, since the observers are anonymous, and handles are used to cancel them (see Scheduler).
  • (Maybe in another PR) GaugeObserver could have its own observe_every(Duration) -> CancelHandle method that would schedule it for recurrent observation, untied to any flushing activity.
  • An example program demoing / testing the feature(s) would be great.

If you have other ideas, find that my design isn't cutting it, or are just somehow uneasy with this, let me know and we'll find a way to make this work. This is a valuable feature, which is why I'm raising the bar. I am glad to have someone besides me take an interest in this.

@mixalturek
Contributor Author

I'm sorry, I had other tasks with higher priority; I will probably continue this week.

mixalturek added a commit to mixalturek/dipstick that referenced this issue Feb 22, 2019
@mixalturek
Contributor Author

mixalturek commented Feb 22, 2019

Hi Francis, it's quite difficult for me to start from your high-level description. I still have some difficulties with thinking in Rust and its compositions; it isn't a simple language and the entry barrier is quite high. Can you please spend an hour writing a skeleton of the traits and API with empty methods to make it clear? I hope concrete code, or at least pseudo-code, will significantly help me see all the connections in your design. I don't want to waste my time slot searching for proper solutions if you already have the design in your head.

@fralalonde
Owner

Sure, I'll try to find some time today or this weekend to do that.

@fralalonde
Owner

Ok, finally started to work on it, I'll post a link to my branch soon.

fralalonde pushed a commit that referenced this issue Mar 8, 2019
fralalonde pushed a commit that referenced this issue Mar 8, 2019
- Use `MetricValue` instead of `isize`.
- Cancel handle in example.
- Configurable thread name.
fralalonde pushed a commit that referenced this issue Mar 8, 2019
- If multiple callbacks are registered under the same conflicting key, only the last one will survive.
- It would be especially bad if the callback was re-registered periodically. The memory would increase and even all historical callbacks would be executed during each iteration.
fralalonde pushed a commit that referenced this issue Mar 8, 2019
@fralalonde
Owner

Here's a tentative PR #50

@mixalturek
Contributor Author

I'm sorry, I don't have any time for this these days. I will hopefully continue in the near future.

@fralalonde
Owner

Addressed in version 0.7.2 with observe() from #50
