Summary
If we want to submit events reliably and prevent the loss of any log messages, exceptions, or whatever else a user wants to submit, we probably need to
- assign a globally unique ID to each event (a client-side sketch follows this list).
- filter out duplicate events on the server.
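To illustrate the first point, here is a minimal client-side sketch in TypeScript, assuming a Node-style client; the `Event` shape and `createEvent` helper are hypothetical, not the actual client API:

```typescript
import { randomUUID } from "crypto";

// Hypothetical event shape; field names follow the discussion above.
interface Event {
  reference_id: string;
  type: string;
  message?: string;
}

// Assign the globally unique ID at creation time, before the event ever
// enters the persistent queue, so every retry resends the same
// reference_id and the server can recognize duplicates.
function createEvent(type: string, message?: string): Event {
  return { reference_id: randomUUID(), type, message };
}
```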
Do we need reliability?
As a developer, I probably won't care if an event that I didn't create manually sometimes disappears, e.g. an unhandled exception on a production client that I don't know about. After all, I didn't expect the event to be there in the first place.
On the other hand, when I create events manually and some of them don't appear on the server, this leads to confusion - I can't know for sure whether there is a problem with Exceptionless or whether my code is broken. This gets even more problematic if one event depends on another and one of them is lost. If this happens frequently, I will start questioning the quality of the service. Equally frustrating would be events that should have been sent only once but appear multiple times on the server because the submission components messed up.
Those reasons alone make me think that reliable submission shouldn't be opt-in, but the default behavior. From a service provider's perspective, we should care even more about getting all the events, because that means more billable events per month for that customer.
Can't the client figure it out?
There are situations where our request-response pattern just doesn't work. We can't rely on having a useful response from the server if...
- there was a network problem (meaning the events may have been submitted successfully, partially, or not at all).
- the application was killed while submitting or updating the persistent queue.
- there was a race condition (the queue is already being processed just when the app exits, for whatever reason, and we try to force-submit any new events).
I tried to visualize that last point in the following diagram:
In the last step ("doesn't know what to send"), the queue can't just wait for the current submission operation to finish, because waiting would have to happen asynchronously - but asynchronous operations are not possible in the onExit handler, as the application terminates right after the method execution ends.
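A minimal sketch of that constraint, assuming a Node-style client; the queue class and its `processing` flag are hypothetical:

```typescript
class EventQueue {
  private processing = false;

  async flushAsync(): Promise<void> {
    this.processing = true;
    try {
      // ... submit pending events over the network ...
    } finally {
      this.processing = false;
    }
  }

  flushOnExit(): void {
    if (this.processing) {
      // We cannot await the in-flight submission here: the process
      // terminates as soon as this synchronous handler returns, so the
      // queue "doesn't know what to send". Persisting the unsent events
      // (with their reference_ids) is the most we can safely do.
      return;
    }
    // ... synchronously persist any remaining events to disk ...
  }
}

const queue = new EventQueue();
// Node's "exit" handler must be synchronous; pending promises never run.
process.on("exit", () => queue.flushOnExit());
```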
Solutions
None of those situations would benefit from client-side deduplication in any way. Either the client has to keep asking the server whether an event already exists until it gets a positive response, which would be chatty and doesn't feel right, or the server must figure it out on its own.
In any case, events need an ID:
- We could just use the `reference_id` field that already seems to be supported by the server and API.
- We could drastically increase the information density of the ID by using ASCII word characters in upper- and lower-case (see the sketch after this list).
- The server would check for the existence of the ID and cancel the pipeline early on collision, or just let ElasticSearch handle it.
- To maximize performance, it could cache recent reference ids in a dictionary or a Trie.
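To make both halves concrete, here is a minimal sketch; the alphabet, ID length, and cache size are my assumptions for illustration, not values from this thread:

```typescript
import { randomBytes } from "crypto";

// Client side: 62 ASCII word characters instead of 16 hex digits means
// ~5.95 bits per character instead of 4, so a 10-character ID already
// carries ~59.5 bits of entropy. (The slight modulo bias is fine here.)
const ALPHABET =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function newReferenceId(length = 10): string {
  const bytes = randomBytes(length);
  let id = "";
  for (const b of bytes) id += ALPHABET[b % ALPHABET.length];
  return id;
}

// Server side: a bounded, insertion-ordered Set stands in for the
// dictionary/Trie of recent reference_ids. Returns true on a duplicate
// so the pipeline can be cancelled early.
const recentIds = new Set<string>();
const MAX_CACHED = 100_000;

function isDuplicate(referenceId: string): boolean {
  if (recentIds.has(referenceId)) return true;
  if (recentIds.size >= MAX_CACHED) {
    // Evict the oldest entry; Sets iterate in insertion order.
    const oldest = recentIds.values().next().value;
    if (oldest !== undefined) recentIds.delete(oldest);
  }
  recentIds.add(referenceId);
  return false;
}
```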
The common solution to this kind of problem is adding an idempotent PUT method to the API, where the client chooses the resource ID and can PUT it multiple times, always getting the same result.
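For comparison, a minimal Express-style sketch of such a PUT; the route and in-memory store are hypothetical, not the actual Exceptionless API:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// In-memory store standing in for the real pipeline/index.
const events = new Map<string, unknown>();

// PUT is idempotent: the client chooses the resource ID, and repeating
// the same request any number of times leaves the server in the same
// state. The first call reports 201 Created, later calls 200 OK.
app.put("/api/events/:referenceId", (req, res) => {
  const existed = events.has(req.params.referenceId);
  events.set(req.params.referenceId, req.body);
  res.status(existed ? 200 : 201).json({ id: req.params.referenceId });
});

app.listen(3000);
```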