Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add queue monitoring #62

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

thomas-girotto
Copy link

This PR implements issue #61 .

I changed my mind on the way to monitor the queue...

Instead of configuring just a threshold that would trigger one callback when going above and another one when going below as proposed in the issue, i figured it is more interesting to monitor it "real time" with a timer and a callback that takes the number of events in the queue.

The hard part was to have a not flaky unit test (combination of the new monitoring timer with the existing timer was quite hard to synchronize in tests). After a lot of effort to try to make it work, i decided to create a way to mock the timers and have a better control over the timing.

The impact of this refactoring seems reasonable to me as there is no change publicly, just a new internal constructor only visible in tests (so the PortableTimer is still not visible by users of the lib).

Let me know if you agree with the approach !

It was really hard to unit test the queue filling up too quickly and observe it via the monitoring callback in a not flaky unit test => prepare for manually controlling the timer's ticks
@nblumhardt
Copy link
Member

nblumhardt commented Oct 10, 2022

Hey Thomas! Sorry about the slow response, I've just arrived home after a couple of weeks travelling - still a bit jet lagged today, but starting to catch up on things 😅

I think this approach is nice, but given how central this package is, how slow it moves, and how difficult it can be to change, I'd lean towards something much simpler/more minimal, if it's possible. I like how you've removed the need for the sink to track thresholds etc - definitely something developers can layer on top if needed 👍

Would it work for us to just add MonitoringCallbackAsync to the options, and then call await (_monitoringCallbackAsync?.Invoke(_queue.Count).ConfigureAwait(false) ?? Task.CompletedTask) in OnTick()?

@thomas-girotto
Copy link
Author

Hello ! No worries, even open source maintainers can have holidays right ? :)

I actually thought about this, but:

Either we do it before the while(batchWasFull) loop -> i guess it wouldn't help much in monitoring because if the queue really grows a lot, it will most likely be because EmitBatchAsync takes longer than the addition of a full batch in the queue. In that case, we wouldn't allow the timer to tick again and just stays inside the loop, so we wouldn't call the monitoring callback while the queue is growing.

Either it's inside the loop, like right before or after EmitBatchAsync -> i was a bit worried that a callback where a user of the lib could do anything (like network calls to store monitoring data somewhere) would be called in the same thread than the one emptying the queue... We could harm the logging rate just by monitoring it. Even a fire and forget call to the monitoring callback in another thread could harm the system in a very sneaky way if for some reason the monitoring callback takes longer than EmitBatchAsync (by creating more and more threads).

I also liked to be able to desynchronize the monitoring from the actual logging, because we may need some intense logging while the monitoring could just peacefully check the queue state from time to time.

That were my thoughts, but if you still prefer only one timer, i could live with it and change my PR accordingly !

Thanks a lot for your time!

@nblumhardt
Copy link
Member

Sorry about the slow reply on this again; looping back around ASAP.

@nblumhardt
Copy link
Member

(Just realized that I left a null-coalescing operator out of my snippet illustrating the callback approach above.)

I think you have good points - I'm just not sure that we need to implement the sampling in this library - ideally, whatever hook we provide will give the consumer options to implement whatever monitoring scheme they need.

Perhaps instead of the callback providing the metrics directly, the callback could provide a user-callable way to retrieve the queue stats?

Something like:

// User creates timer-driven monitoring service, or some other way to surface
// info to the app:
var monitoringService = new MySinkMonitor();

// PeriodicBatchingSinkOptions callback that provides a way to sample the
// sink state on demand:
options.OnCreate = sinkInfo => {
    // This closure is called once so `sinkInfo` needs to be held for future use:
    monitoringService.StartMonitoring(sinkInfo);
};

The sinkInfo object would provide access to current queue information (probably just QueueCount or similar for now).

Still a few moving parts, but avoids us needing to worry about tasks/timers/leaks/threads/etc., and hopefully less to worry about maintaining in this very central/hard to update/deeply depended upon package :-)

Just trying to explore our options, so keen to know what you think.

@thomas-girotto
Copy link
Author

Good idea, i guess that would work.
I was busy last week, hopefuly i'll give it a try next week.

One thing that may be painful is that we would need a monitoringService instance at bootstrap time before service injection is ready... But my proposition would have more or less the same issue.

I'll include an aspnetcore sample to check what it would look like. We may throw it away later if you think it's out of context, as this lib is not particularly intended to be used in aspnetcore context, but that would still be interesting to check how we would use it from a end user perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants