Fix duplicate data frame writes in Aggregator agent #366
Description
This PR adds an unsubscribe step to the `OCSAgent` class' `onJoin` method. We necessarily cache the `Subscription` objects to do so. This avoids any duplicate subscriptions that might be left over after a disconnect and re-connection to the crossbar server.

In my testing I've noticed some strange behavior of the subscriptions depending on the nature of the crossbar disconnection. The two possible outcomes of resubscribing without this patch are:
1. [...] `subscription_id` to the existing subscription.
2. [...] `subscription_id`.

Case 1 is where we end up getting duplicated data frames as observed in #365. Case 2, while it appears similar, does not result in duplicate data frames.
Case 1 seems to occur only when the reason for the disconnection is network related (i.e. the crossbar server stays online) and the crossbar server (container) was freshly created and has never been restarted. If it has been restarted at least once, case 1 does not occur on future network interruptions; instead you get repeated instances of case 2. I don't have an explanation for this.
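To illustrate the cache-and-unsubscribe idea, here is a minimal toy sketch. It is not the actual `OCSAgent` code and does not use the real WAMP client library: `ToyRouter`, `ToySubscription`, `ToyAgent`, and `frames_after_reconnect` are all hypothetical stand-ins, and the router is assumed to keep the old subscription alive across a re-join (the case 1 behavior).

```python
class ToySubscription:
    """Stand-in for a subscription handle (hypothetical, not a real WAMP class)."""

    def __init__(self, router, topic, handler):
        self.router = router
        self.topic = topic
        self.handler = handler

    def unsubscribe(self):
        # Remove this handle from the router so it stops receiving events.
        self.router.subscriptions.remove(self)


class ToyRouter:
    """Toy router that keeps old subscriptions alive across a client re-join."""

    def __init__(self):
        self.subscriptions = []

    def subscribe(self, handler, topic):
        sub = ToySubscription(self, topic, handler)
        self.subscriptions.append(sub)
        return sub

    def publish(self, topic, message):
        # Deliver the message once per live subscription on the topic.
        for sub in list(self.subscriptions):
            if sub.topic == topic:
                sub.handler(message)


class ToyAgent:
    """Toy agent whose on_join mimics the subscribe step run on every (re)join."""

    def __init__(self, router, topics, unsubscribe_on_join=True):
        self.router = router
        self.topics = topics
        self.unsubscribe_on_join = unsubscribe_on_join
        self.cached = {}    # topic -> subscription handle from the previous join
        self.received = []  # every message delivered to this agent

    def on_join(self):
        for topic in self.topics:
            old = self.cached.pop(topic, None)
            if old is not None and self.unsubscribe_on_join:
                old.unsubscribe()  # drop the leftover subscription first
            self.cached[topic] = self.router.subscribe(self.received.append, topic)


def frames_after_reconnect(unsubscribe_on_join):
    """Count deliveries of one published frame after a join and a re-join."""
    router = ToyRouter()
    agent = ToyAgent(router, ["feeds"], unsubscribe_on_join)
    agent.on_join()  # initial connection
    agent.on_join()  # re-join after a network interruption
    router.publish("feeds", {"t": 0})
    return len(agent.received)
```

In this toy model, a single published frame is delivered twice when the old subscription is left in place and once when it is unsubscribed first. Whether the real router keeps the old subscription depends on the disconnection scenario described above.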
The one thing I'm now seeing, though, and would like to resolve, is an "Unhandled error in Deferred:" message whose source I'm not sure of.
Here's an example of the log output after this patch in context:
Motivation and Context
Resolves #365.
How Has This Been Tested?
My test setup consists of a crossbar server container and two agents being run on my host system -- the aggregator agent and the fake data agent. It's important to run them on the host for the network connection interruption test, which I'll describe shortly.
Steps to run are:
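1. Bring up the crossbar server container with `docker-compose up -d`.
2. Start up both agents with `ocs-agent-cli`.
3. Disconnect the container from the network with `docker network disconnect <network name> <container name>`. This breaks the connection to the agents.
4. Reconnect it with `docker network connect <network name> <container name>`.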
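The network interruption part of these steps could be scripted roughly as follows. This is only a sketch: the function names, the `pause` length, and the `dry_run` default are assumptions for illustration and are not part of this PR. Only the two `docker network` commands come from the steps above.

```python
import subprocess
import time


def interruption_commands(network, container):
    """Return the two docker commands used for the network interruption test."""
    return [
        ["docker", "network", "disconnect", network, container],
        ["docker", "network", "connect", network, container],
    ]


def interrupt_network(network, container, pause=10.0, dry_run=True):
    """Disconnect the container, wait, then reconnect it.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected first.
    """
    commands = interruption_commands(network, container)
    if not dry_run:
        disconnect, connect = commands
        subprocess.run(disconnect, check=True)
        time.sleep(pause)  # assumed outage length, not taken from the PR
        subprocess.run(connect, check=True)
    return commands
```

Calling `interrupt_network("<network name>", "<container name>", dry_run=False)` with the agents already running on the host would perform the disconnect and reconnect in one go.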
The same process has been tested with a `docker stop <crossbar container>` and a `docker start <crossbar container>` to the same effect (though this doesn't duplicate the data).

Here's an example of performing a disconnect/reconnect, getting two subscriptions with identical `subscription_id` and thus writing duplicate data:

![times_1704397881](https://private-user-images.githubusercontent.com/7691438/294385943-4d51de16-f67e-45b7-a9c7-b0e929ae5518.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg0NDI1NDgsIm5iZiI6MTcxODQ0MjI0OCwicGF0aCI6Ii83NjkxNDM4LzI5NDM4NTk0My00ZDUxZGUxNi1mNjdlLTQ1YjctYTljNy1iMGU5MjlhZTU1MTgucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MTVUMDkwNDA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Y2Y4NmMxNWM0YzBmY2ExYWI3ZmJhZTQ4OGFjY2U0ODBjY2MyNGYyZmMwZTkxNGMyZWY4MzZjMDg1ZTFiMDllYSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.XijbLSl2mBr6zAkrKnl_I2Q-jOdSjxuE8xsbUXOut3s)

And here is the same example, with this patch:
![times_1704422191](https://private-user-images.githubusercontent.com/7691438/294386117-92385589-2328-48f2-9b60-c130c67b0abd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg0NDI1NDgsIm5iZiI6MTcxODQ0MjI0OCwicGF0aCI6Ii83NjkxNDM4LzI5NDM4NjExNy05MjM4NTU4OS0yMzI4LTQ4ZjItOWI2MC1jMTMwYzY3YjBhYmQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MTVUMDkwNDA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZmM3MjA2Mzg3N2E3ZTQ1ZjczMDBhNGFkMTM2MzFmM2YwNmI4NDIxNWY1NjU0ZWEyMjdiYmVkNjNiNGVjNmVhZSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.rf-6Kl2w2lR5IZi_KEDT6UWxBtDJDQq8epM7f22WH-U)
Another example with this patch, but with a `docker restart` of the crossbar container:

![times_1704422353](https://private-user-images.githubusercontent.com/7691438/294386136-4bb466dc-3a75-40dd-8b4c-7cbfae93b540.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg0NDI1NDgsIm5iZiI6MTcxODQ0MjI0OCwicGF0aCI6Ii83NjkxNDM4LzI5NDM4NjEzNi00YmI0NjZkYy0zYTc1LTQwZGQtOGI0Yy03Y2JmYWU5M2I1NDAucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYxNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MTVUMDkwNDA4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MmIzZjFhYmU3NmM3YmFmMTEzNjg2ZWM0ZTgwNGM3ZWQ1NTQxYTJhMDdmNzNjYjJjOWU1YzY0YjFjOTI4ZTAyMSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.w82Pjb6GFPKPfnizAe5hXSMSv0qRjSoImxOWKMjB5AE)

Types of changes
Checklist: