Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcastify Calls Plugin Retries? #728

Closed
blantonl opened this issue Sep 16, 2022 · 11 comments
Closed

Broadcastify Calls Plugin Retries? #728

blantonl opened this issue Sep 16, 2022 · 11 comments

Comments

@blantonl
Copy link
Sponsor

Does the Broadcastify Calls Plugin have retry functionality that attempts to resend calls when a call send process fails?

I ask this because I'm seeing in the logs notifications about a retries and calls in queue, presumably for calls that are rejected by Broadcastify calls as duplicate. Broadcastify will reject a call if it's already receive the call from a different node.

If this is the case, I would recommend we have an option to enable/disable this functionality, with the default being disabled.

@robotastic
Copy link
Owner

robotastic commented Sep 16, 2022 via email

@blantonl
Copy link
Sponsor Author

We return a 200 code on the duplicate call rejection but the body of the message as a preceding "1" flag indicating in was an error.

Ex:

200 OK
Content-Type: text/plain; charset=utf-8
1 SKIPPED---ALREADY-RECEIVED-THIS-CALL

vs

0 [uploadUIRL]

@robertlynch3
Copy link

robertlynch3 commented Sep 17, 2022

I ask this because I'm seeing in the logs notifications about a retries and calls in queue, presumably for calls that are rejected by Broadcastify calls as duplicate. Broadcastify will reject a call if it's already receive the call from a different node.

TR does resend calls if another uploader fails. I.E. uploading to OpenMHZ fails, it will retry uploading to all player systems that are configured (even if it was already successfully uploaded to the other ones) until it finally fails out. I don’t know if that’s what you’re seeing.

@blantonl
Copy link
Sponsor Author

I'm going to strongly recommend that this feature be made optional and be disabled by default.

Given the realtime nature of a lot of call consuming applications, this causes problems if a client is retrying sending calls that might have been rejected for valid reasons (duplicate, etc)

In our case on Broadcastify, everything is real time in nature and we will reject calls if they are already received by a different node. If a node decides to wait and retry, then that will simply reintroduce the call into the system at a later time/date which then completely defeats the logic.

@robotastic
Copy link
Owner

I just dove into the code to refresh my memory. Each upload plugin defines when/if a call will be re-uploaded. In general, it looks like re-tries are only attempted if it was unable to reach the server at all and gets a 200 error. This is helpful if the local network temporarily goes down, otherwise all those calls would be lost.

In the case of the Broadcastify Plugin, it looks like it will not do a retry if it gets a response saying the file is a duplicate:
https://github.com/robotastic/trunk-recorder/blob/master/plugins/broadcastify_uploader/broadcastify_uploader.cc#L281

I am not sure about all the different cases in the plugin, but it looks like it will only retry if it gets a HTTP 200 or if the code the server returns is not of a certain value. You could further tune things here if codes have changed on the Broadcastify side.

@blantonl
Copy link
Sponsor Author

Yikes, I see what's happened.

I changed the skip error message to include some metadata about the duplicate call being rejected. it doesn't directly match that error message that is hard coded, so the clients are now retrying. I've gone ahead and reverted it back.

I've also implemented a new error message where we are rejecting calls that have a call skew of greater than 30 seconds. Call skew for us is defined as the time difference between the timestamp at which the call was captured on the node and when it was received in our systems. If that skew gets too large, we aren't able to properly detect and skip duplicate calls. This typically happens for two reasons 1) The time/date set on the system is incorrect or off 2) The system/network is underpowered and is falling behind in sending calls up to our infrastructure.

It might be helpful to just match the two words "SKIPPED" and "REJECTED" and not retrying sending any calls given an error message returned with those words.

@blantonl
Copy link
Sponsor Author

I'm not a C++ coder, however I think this would be the appropriate patch

if (code == "1" && (message.rfind("SKIPPED", 0) == 0)) {
    BOOST_LOG_TRIVIAL(info) << "[" << call_info.short_name << "]\tTG: " << call_info.talkgroup << "\tFreq: " << format_freq(call_info.freq) << "\tBroadcastify Upload Skipped: " << message;
    return 0;
}
if (code == "1" && (message.rfind("REJECTED", 0) == 0)) {
    BOOST_LOG_TRIVIAL(info) << "[" << call_info.short_name << "]\tTG: " << call_info.talkgroup << "\tFreq: " << format_freq(call_info.freq) << "\tBroadcastify Upload REJECTED: " << message;
    return 0;
}

@robotastic
Copy link
Owner

Good Catch!!

That code looks good. I created a branch with it in there, replacing the more specific SKIP detection one. I don't have a great way to test it. If you know someone who can give it a try, I can merge the branch in if it works.

https://github.com/robotastic/trunk-recorder/tree/broadcastify-fix

@blantonl
Copy link
Sponsor Author

blantonl commented Sep 25, 2022 via email

@blantonl
Copy link
Sponsor Author

blantonl commented Nov 3, 2022

I've compiled and tested this test fix branch and it's working fine from what I can see. I think we're good to release it.

@robotastic
Copy link
Owner

I merged in the changes - going to close things out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants