Add queue monitoring #62

thomas-girotto · 2022-09-29T23:31:30Z

This PR implements issue #61 .

I changed my mind on the way to monitor the queue...

Instead of configuring just a threshold that would trigger one callback when going above and another one when going below as proposed in the issue, i figured it is more interesting to monitor it "real time" with a timer and a callback that takes the number of events in the queue.

The hard part was to have a not flaky unit test (combination of the new monitoring timer with the existing timer was quite hard to synchronize in tests). After a lot of effort to try to make it work, i decided to create a way to mock the timers and have a better control over the timing.

The impact of this refactoring seems reasonable to me as there is no change publicly, just a new internal constructor only visible in tests (so the PortableTimer is still not visible by users of the lib).

Let me know if you agree with the approach !

It was really hard to unit test the queue filling up too quickly and observe it via the monitoring callback in a not flaky unit test => prepare for manually controlling the timer's ticks

nblumhardt · 2022-10-10T00:58:06Z

Hey Thomas! Sorry about the slow response, I've just arrived home after a couple of weeks travelling - still a bit jet lagged today, but starting to catch up on things 😅

I think this approach is nice, but given how central this package is, how slow it moves, and how difficult it can be to change, I'd lean towards something much simpler/more minimal, if it's possible. I like how you've removed the need for the sink to track thresholds etc - definitely something developers can layer on top if needed 👍

Would it work for us to just add MonitoringCallbackAsync to the options, and then call await (_monitoringCallbackAsync?.Invoke(_queue.Count).ConfigureAwait(false) ?? Task.CompletedTask) in OnTick()?

thomas-girotto · 2022-10-10T23:02:01Z

Hello ! No worries, even open source maintainers can have holidays right ? :)

I actually thought about this, but:

Either we do it before the while(batchWasFull) loop -> i guess it wouldn't help much in monitoring because if the queue really grows a lot, it will most likely be because EmitBatchAsync takes longer than the addition of a full batch in the queue. In that case, we wouldn't allow the timer to tick again and just stays inside the loop, so we wouldn't call the monitoring callback while the queue is growing.

Either it's inside the loop, like right before or after EmitBatchAsync -> i was a bit worried that a callback where a user of the lib could do anything (like network calls to store monitoring data somewhere) would be called in the same thread than the one emptying the queue... We could harm the logging rate just by monitoring it. Even a fire and forget call to the monitoring callback in another thread could harm the system in a very sneaky way if for some reason the monitoring callback takes longer than EmitBatchAsync (by creating more and more threads).

I also liked to be able to desynchronize the monitoring from the actual logging, because we may need some intense logging while the monitoring could just peacefully check the queue state from time to time.

That were my thoughts, but if you still prefer only one timer, i could live with it and change my PR accordingly !

Thanks a lot for your time!

nblumhardt · 2022-10-20T02:54:40Z

Sorry about the slow reply on this again; looping back around ASAP.

nblumhardt · 2022-11-03T22:34:50Z

(Just realized that I left a null-coalescing operator out of my snippet illustrating the callback approach above.)

I think you have good points - I'm just not sure that we need to implement the sampling in this library - ideally, whatever hook we provide will give the consumer options to implement whatever monitoring scheme they need.

Perhaps instead of the callback providing the metrics directly, the callback could provide a user-callable way to retrieve the queue stats?

Something like:

// User creates timer-driven monitoring service, or some other way to surface
// info to the app:
var monitoringService = new MySinkMonitor();

// PeriodicBatchingSinkOptions callback that provides a way to sample the
// sink state on demand:
options.OnCreate = sinkInfo => {
    // This closure is called once so `sinkInfo` needs to be held for future use:
    monitoringService.StartMonitoring(sinkInfo);
};

The sinkInfo object would provide access to current queue information (probably just QueueCount or similar for now).

Still a few moving parts, but avoids us needing to worry about tasks/timers/leaks/threads/etc., and hopefully less to worry about maintaining in this very central/hard to update/deeply depended upon package :-)

Just trying to explore our options, so keen to know what you think.

thomas-girotto · 2022-11-15T21:43:16Z

Good idea, i guess that would work.
I was busy last week, hopefuly i'll give it a try next week.

One thing that may be painful is that we would need a monitoringService instance at bootstrap time before service injection is ready... But my proposition would have more or less the same issue.

I'll include an aspnetcore sample to check what it would look like. We may throw it away later if you think it's out of context, as this lib is not particularly intended to be used in aspnetcore context, but that would still be interesting to check how we would use it from a end user perspective.

thomas-girotto added 4 commits September 30, 2022 01:07

Fix dotnet tests for net6.0 that were not able to run

04300c0

Prepare the sink for mocking timers in unit test context

72efa3b

It was really hard to unit test the queue filling up too quickly and observe it via the monitoring callback in a not flaky unit test => prepare for manually controlling the timer's ticks

Add queue monitoring support

13fbb58

Test monitoring inputs

389ea4c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add queue monitoring #62

Add queue monitoring #62

thomas-girotto commented Sep 29, 2022

nblumhardt commented Oct 10, 2022 •

edited

thomas-girotto commented Oct 10, 2022

nblumhardt commented Oct 20, 2022

nblumhardt commented Nov 3, 2022

thomas-girotto commented Nov 15, 2022

Add queue monitoring #62

Are you sure you want to change the base?

Add queue monitoring #62

Conversation

thomas-girotto commented Sep 29, 2022

nblumhardt commented Oct 10, 2022 • edited

thomas-girotto commented Oct 10, 2022

nblumhardt commented Oct 20, 2022

nblumhardt commented Nov 3, 2022

thomas-girotto commented Nov 15, 2022

nblumhardt commented Oct 10, 2022 •

edited