
Audio in wrong WAV file - Pi 4 #731

Closed
aaknitt opened this issue Oct 26, 2022 · 65 comments

@aaknitt
Contributor

aaknitt commented Oct 26, 2022

Raspberry Pi 4 is having issues with audio from the end of one call ending up at the start of the audio file of a subsequent call. It seems to get worse the longer trunk-recorder is running, and seems to be related to audio from a specific recorder getting stuck, not getting flushed to the current WAV file, and ending up in the next WAV for that same recorder.

Issue does not appear to be present in release 4.3.2 but is in 4.4.0 and 4.4.1.

Mar 29 d616a86 commit seems ok
Apr 3 1697f7e commit seems ok
Apr 5 7036348 commit seems ok
Apr 7 71564fa commit seems ok
Apr 7 669ca00 commit has the issue
Apr 7 b2786c9 commit has the issue
Apr 9 dd3e6bf commit has the issue

Normal/correct transmission on paging TGID:
(screenshot)

Transmission from another TGID on the same frequency ending up at the end of the same WAV file (this is sometimes present in 4.3.2 and is not the issue in question):
(screenshot)

Audio from prior transmission of the same recorder at the beginning of WAV file (this is the issue in question):
(screenshot)

@robotastic
Owner

Thanks @aaknitt - If I got things in there correctly, this should be the diff showing what was added in the commit that broke things: 7036348...669ca00

Just to check - Is the audio ending up in only one of the Wav files, or is there some duplication where the same audio is in 2 different files?

Are you able to share the logs around when you are having the issue? They may help show what is happening. I can add in some additional debug messages if needed.

With the latest change it should be writing everything that is in the Recorder out to a wav... I wonder if it has something to do with the Pi.

Would you be able to run one of the versions that is not working on a more powerful computer?

@aaknitt
Contributor Author

aaknitt commented Nov 1, 2022

@robotastic as best I can tell, the last commit without the issue is 71564fa and the first one with the issue is 669ca00.

A log file is attached that shows a recording of TGID 58909. Some of the highlights are pasted below. The TGID is granted a channel at 20:36:00 and the last grant update message is received at 20:36:34.5. 5 seconds later, the call is removed due to inactivity. So the call is active for 39 seconds, which is reflected in the JSON file (also attached). The actual length of audio is about 32 seconds, but only 24.2 seconds make it into the WAV associated with this call. The remaining 7.6 seconds end up in the WAV file for the next call on that TGID, which doesn't occur until 21:15. In this setup I'm only recording that single TGID. If my memory is correct, if I'm set up to record more than one TGID, the "missing" audio would end up in the WAV file for the next call that uses the same recorder (vs. same frequency). I don't think there's duplication of audio going on...it seems like the end of one call gets put in the beginning of the WAV of the next call, and this continues and it gets worse and worse over time.

I'll try the Docker image for 4.4.1 on an Ubuntu machine when I get a chance.

[2022-10-31 20:36:00.101324] (debug) tsbk00 Chan Grant Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1 sa 6800152 Encrypt 0 Bandwidth: 12.5
[2022-10-31 20:36:00.101464] (trace) tsbk00 Chan Grant Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1 sa 6800152 Encrypt 0 Bandwidth: 12.5
[2022-10-31 20:36:00.101577] (info) [TrunkedSystem] �[0;34m154C�[0m TG: 58909 (�[35m All Pa�[0m) Freq: 852.800000 MHz �[32mStarting P25 Recorder Num [2]�[0m TDMA: false Slot: 0 QPSK: true
[2022-10-31 20:36:00.101775] (trace) [TrunkedSystem] �[0;34m154C�[0m TG: �[35m 58909�[0m Freq: 852.800000 MHz Starting wavfile sink SRC ID: 6800152
[2022-10-31 20:36:00.304967] (debug) tsbk00 Chan Grant Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1 sa 6800152 Encrypt 0 Bandwidth: 12.5
[2022-10-31 20:36:00.306301] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
[2022-10-31 20:36:00.734373] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
[2022-10-31 20:36:00.734990] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
[2022-10-31 20:36:00.766537] (trace) [TrunkedSystem] �[0;34m154C�[0m TG: �[35m 58909�[0m Freq: 852.800000 MHz Starting new Transmission Src ID: 6800152
[2022-10-31 20:36:01.087434] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
[2022-10-31 20:36:01.087978] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
[2022-10-31 20:36:01.487394] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
[2022-10-31 20:36:01.945488] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
...
...
[2022-10-31 20:36:34.489489] (trace) tsbk02 Grant Update Channel ID: 287 Freq: 852.800000 MHz ga 58909 TDMA -1
...
...
[2022-10-31 20:36:40.008910] (trace) [TrunkedSystem] �[0;34m154C�[0m TG: 58909 (�[35m All Pa�[0m) Freq: 852.800000 MHz �[36m Removing call that has been inactive for more than 5 Sec �[0m Rec last write: 1 State: 1
[2022-10-31 20:36:40.009423] (trace) last_update: 1667266594 diff: 6
[2022-10-31 20:36:40.009397] (trace) [TrunkedSystem] �[0;34m154C�[0m TG: 58909 (�[35m All Pa�[0m) Freq: 852.800000 MHz �[33mConcluding Recorded Call�[0m - Last Update: 6s Call Elapsed: 40
[2022-10-31 20:36:40.010248] (trace) stop_recording() - stopping wavfile sink but recorder state is: 1
[2022-10-31 20:36:40.010429] (info) [TrunkedSystem] �[0;34m154C�[0m TG: �[35m 58909�[0m Freq: 852.800000 MHz - Transmission src: 6800152 pos: 0 length: 24.192
[2022-10-31 20:36:40.011073] (trace) last_update: 1667266599 diff: 1
[2022-10-31 20:36:40.091923] (trace) Converting: recordings/TrunkedSystem/2022/10/31/58909-1667266560_852800000-call_154.m4a
[2022-10-31 20:36:40.092081] (trace) Command: sox recordings/TrunkedSystem/2022/10/31/58909-1667266560_852800000-call_154.wav --norm -t wav - | fdkaac --silent -p 2 --ignorelength -b 8000 -o recordings/TrunkedSystem/2022/10/31/58909-1667266560_852800000-call_154.m4a -

10-31-2022_2022_00.log

58909-1667266560_852800000-call_154.zip

@robotastic
Owner

Thanks for sharing - I will go look through the log files. In the snippet you shared, it looks like it is not getting the End Of Transmission message on the voice channel... or more likely, the voice channel is really far behind the control channel. So when the Control Channel stops sending Updates for that call, the voice channel is only halfway through processing the buffer for that call. The Pi 4 should be fast enough to handle one channel, easily. I wonder if you have a very slow memory card and whether that could be holding things back?

One thing to try is using a RAM FS instead of the SD for the Capture dir and see if that changes things.

@aaknitt
Contributor Author

aaknitt commented Nov 1, 2022

> Thanks for sharing - I will go look through the log files. In the snippet you shared, it looks like it is not getting the End Of Transmission message on the voice channel... or more likely, the voice channel is really far behind the control channel. So when the Control Channel stops sending Updates for that call, the voice channel is only halfway through processing the buffer for that call. The Pi 4 should be fast enough to handle one channel, easily. I wonder if you have a very slow memory card and whether that could be holding things back?
>
> One thing to try is using a RAM FS instead of the SD for the Capture dir and see if that changes things.

I was thinking along the same lines, but prior versions work well on the same Pi with the same SD card, etc.

@aaknitt
Contributor Author

aaknitt commented Nov 1, 2022

@robotastic I may be totally barking up the wrong tree here since I'm not well-versed in understanding the diffs and commit history, but it looks like perhaps the change from nonstop_wavfile_sink_impl.cc to transmission_sink.cc could be related?

In transmission_sink.cc, wav_write_sample() is only called when state == RECORDING, but that wasn't the case in nonstop_wavfile_sink_impl.cc. nonstop_wavfile_sink_impl.cc may even hold things in the RECORDING state until everything gets written? I'm not very confident I'm interpreting any of this correctly, but it seems like a logical place for things to go wrong.

@tadscottsmith
Contributor

If you're comfortable modifying the code, changing this line from 0 to 10, rebuilding, and capturing the additional logging while only recording that one TG, would probably be super helpful.

int verbosity = 0; // 10 = lots of debug messages

@aaknitt
Contributor Author

aaknitt commented Nov 1, 2022

> If you're comfortable modifying the code, changing this line from 0 to 10, rebuilding, and capturing the additional logging while only recording that one TG, would probably be super helpful.
>
> int verbosity = 0; // 10 = lots of debug messages

Will do

@robotastic
Owner

I created a debug branch and that turns on some additional debug statements. I have been getting some reports on my DC system that some calls are getting cut off.... and the audio is just getting lost. Give this branch a try:
https://github.com/robotastic/trunk-recorder/tree/debug-call-mismatch

@aaknitt
Contributor Author

aaknitt commented Nov 2, 2022

Here's a log from the debug-call-mismatch branch. Search for 58909, which is the only TGID being recorded.
11-01-2022_2238_00.zip

@tadscottsmith
Contributor

If you're able to rebuild it with the verbosity cranked up, I have a hunch.

@aaknitt
Contributor Author

aaknitt commented Nov 3, 2022

> If you're able to rebuild it with the verbosity cranked up, I have a hunch.

verbosity was set to 10 when I built.

@tadscottsmith
Contributor

Can you double check? I know he set it back to 0 in his last commit on the debug branch. With it at 10, you should see much more detailed logs similar to this...

11/02/22 19:10:01.109880 [0] IMBE 9a 3f 22 ef 7e 2f ff fc a3 10 3b errs 0
11/02/22 19:10:01.111191 [0] IMBE 9a bf 63 83 b2 03 ff f0 8e cc 0a errs 0
11/02/22 19:10:01.112511 [0] IMBE 8e c1 f4 8d bf 93 00 09 1b 65 bf errs 0
11/02/22 19:10:01.113527 [0] IMBE 9a bf b7 e6 36 32 87 f4 7a 50 56 errs 0
11/02/22 19:10:01.114212 [0] IMBE 9a 66 ea 42 e6 26 ff f0 b5 96 7f errs 2
11/02/22 19:10:01.115181 [0] IMBE 99 fe 44 72 39 41 f8 1e 8c e9 9c errs 0
11/02/22 19:10:01.115993 [0] IMBE 96 a3 dc 7b c5 d4 ff 8a 09 43 4f errs 0
11/02/22 19:10:01.116801 [0] IMBE 96 cf 94 79 ad d4 ff b8 37 09 aa errs 0
11/02/22 19:10:01.138303 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=1, lco=34 : 62 00 34 0f 01 02 01 7d 70, gly_errs=0
11/02/22 19:10:01.138893 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=1, lco=2 : 42 00 2d 01 f8 01 09 0e a6, ch_A=45, grp_A=504, ch_B=265, grp_B=3750, gly_errs=0
11/02/22 19:10:01.139487 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=0, lco=0 : 00 00 04 00 09 01 00 00 00, srcaddr=0, grpaddr=2305, gly_errs=0
11/02/22 19:10:01.190265 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=0, lco=0 : 00 00 04 00 69 ef 00 00 00, srcaddr=0, grpaddr=27119, gly_errs=0
11/02/22 19:10:01.190432 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=0, lco=0 : 00 00 04 00 69 ef 00 00 00, srcaddr=0, grpaddr=27119, gly_errs=0
11/02/22 19:10:01.214195 [0] NAC 0x40d HDU:  ESS: tgid=11558, mfid=0, algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, gly_errs=0, rs_errs=0
11/02/22 19:10:01.219082 [0] NAC 0x40d LDU2: ESS: algid=80, keyid=0, mi=00 00 00 00 00 00 00 00 00, rs_errs=3
11/02/22 19:10:01.219181 [0] IMBE 92 d7 54 53 0e 69 ff c9 e2 c9 af errs 0
11/02/22 19:10:01.219925 [0] IMBE 92 d7 74 79 ca 68 ff f9 71 60 d2 errs 0
11/02/22 19:10:01.220608 [0] IMBE 8e 6f 6a 73 43 48 ff ee 52 77 4f errs 0
11/02/22 19:10:01.221346 [0] IMBE 8e 5e 91 1e bd c7 ff c1 e9 50 08 errs 0
11/02/22 19:10:01.221993 [0] IMBE 89 da 58 2e 7e f6 ff cd 0f 79 27 errs 0
11/02/22 19:10:01.222598 [0] IMBE 89 db ed 8a 7c 14 f9 c5 6d 90 3e errs 0
11/02/22 19:10:01.223178 [0] IMBE 89 db cc 81 2e d0 f8 02 51 95 c7 errs 0
11/02/22 19:10:01.223659 [0] IMBE 99 a5 f3 c6 a6 e2 98 0a c2 cc e4 errs 2
11/02/22 19:10:01.224155 [0] IMBE a5 28 f9 94 63 a9 80 0a f9 4c 0b errs 0
11/02/22 19:10:01.247667 [0] NAC 0x40d TDU3:
11/02/22 19:10:01.248300 [0] NAC 0x40d TDU3:
11/02/22 19:10:01.300848 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=0, lco=0 : 00 00 04 00 69 ef 00 00 00, srcaddr=0, grpaddr=27119, gly_errs=0
11/02/22 19:10:01.300994 [0] NAC 0x40d TDU15:  LCW: ec=0, pb=0, sf=1, lco=34 : 62 00 34 0f 01 01 05 c5 70, gly_errs=0
11/02/22 19:10:01.301051 [0] NAC 0x40d TDU3:
11/02/22 19:10:01.301084 [0] NAC 0x40d TDU3:
11/02/22 19:10:01.301115 [0] NAC 0x40d TDU3:
11/02/22 19:10:01.321408 [0] NAC 0x40d LDU1: LCW: ec=0, pb=0, sf=1, lco=34 : 62 00 34 0f 01 01 05 c5 70

@aaknitt
Contributor Author

aaknitt commented Nov 3, 2022

I'm not sure what I might be doing wrong - any ideas? Is there anything else I need to change or have set a certain way in the config.json file? I changed verbosity to 10 prior to building, but I'm not seeing anything like that in the logs.
(screenshot)

@tadscottsmith
Contributor

That looks right. Only thing I can think of is that WinSCP (or whatever tool you're using) isn't working with Git? Can you SSH directly in and verify?

cat /home/pi/trunk-recorder/trunk-recorder/recorders/p25_recorder_decode.cc | grep 'int verbosity'

@aaknitt
Contributor Author

aaknitt commented Nov 3, 2022

> That looks right. Only thing I can think of is that WinSCP (or whatever tool you're using) isn't working with Git? Can you SSH directly in and verify?
>
> cat /home/pi/trunk-recorder/trunk-recorder/recorders/p25_recorder_decode.cc | grep 'int verbosity'

(screenshot)

@tadscottsmith
Contributor

@aaknitt That's weird! That should be displaying a ton of debug messages.

@robotastic Just thought I'd jot this theory down while it's in my head. I don't have a good way to know if this is really the issue or not...

  1. The system isn't sending a lot of TDU/TDULC messages after the end of the transmission.
  2. Too few TDUs causes this section of code not to be called, since there's never a TDU while amt_produce == 0 (see the sketch after this list).
  3. The transmission_sink never gets a tag to end_transmission().
  4. Back in main.cc, manage_calls() sets the call state to COMPLETED and calls conclude_call().
  5. conclude_call() attempts to stop() the recorder.
  6. Things make their way down to transmission_sink::stop_recording(), but by this time the samples have all been written, this line never calls end_transmission(), and the wav file isn't closed. d_fp also isn't reset, so some of the error checking on opening the next wav file isn't successful.
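A minimal standalone sketch of steps 2-3 (hypothetical names and structure, not the actual op25/trunk-recorder code): if the terminator is only honored when the output queue is empty, a system that rarely sends TDUs never ends the transmission, and the leftover samples wait for the next call on the same recorder.

#include <cstdio>
#include <deque>

// Hypothetical sketch only - not the real p25_frame_assembler/transmission_sink code.
struct FrameAssemblerSketch {
  std::deque<short> output_queue;   // decoded voice samples not yet written out
  bool terminator_pending = false;  // would become the stream tag the sink acts on

  void on_tdu() {
    if (output_queue.empty()) {
      terminator_pending = true;    // step 3: the sink could now end_transmission()
      std::puts("ADDING TERMINATOR");
    } else {
      std::puts("IGNORING TERMINATOR"); // samples still queued, so the tag is skipped
    }
  }
};

int main() {
  FrameAssemblerSketch rec;
  rec.output_queue.assign(864, 0);  // a TDU arrives while samples are still queued
  rec.on_tdu();                     // prints IGNORING TERMINATOR - transmission stays open
  rec.output_queue.clear();
  rec.on_tdu();                     // prints ADDING TERMINATOR - the WAV could now be closed
}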

@aaknitt
Contributor Author

aaknitt commented Nov 3, 2022

@tadscottsmith I'm not sure I totally follow all of that, but would that theory align with the first WAV file after startup being too short, with some of its audio making its way into the start of the next file? It seems like samples from the transmission aren't making their way into the WAV file before it gets closed out.

@tadscottsmith
Contributor

My theory is that things are writing to the WAV file correctly, but the WAV files aren't always being closed correctly, so that multiple transmissions from the same recorder might be getting dumped into the same WAV file.

@robotastic
Owner

robotastic commented Nov 3, 2022 via email

@robotastic
Owner

Checked out the logs - it looks like there was only one TDU detected in the voice channel. The rest of the calls ended because they had timed out - which means that they were still recording but an update message hadn't been heard in the past 5 seconds. Let me go look at that code path and see if there is something weird there. It might be a less used code path.

@tadscottsmith
Contributor

I'd add a check similar to this. I know there are instances on my system where transmissions aren't ended correctly because there's never a TDU string that doesn't also contain voice frames.
https://github.com/tadscottsmith/trunk-recorder/blob/29943bb6562afa9175e0a431286724e659128811/lib/op25_repeater/lib/p25_frame_assembler_impl.cc#L216-L218
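For illustration, a rough sketch of the kind of deferral such a check might perform (hypothetical names, not the code behind the permalink above): remember a terminator that arrived while voice frames were still queued, and only emit it once the queue drains.

// Hypothetical sketch of a deferred-terminator check - not the linked op25 code.
struct TerminatorLatch {
  bool pending = false;

  // Called when a TDU is seen; amt_produce is the number of samples still queued.
  // Returns true when it is safe to attach the terminator tag right away.
  bool on_tdu(int amt_produce) {
    if (amt_produce > 0) {
      pending = true;   // voice frames still queued: defer instead of dropping it
      return false;
    }
    pending = false;
    return true;
  }

  // Called on later passes; emits the deferred terminator once the queue is empty.
  bool flush(int amt_produce) {
    if (pending && amt_produce == 0) {
      pending = false;
      return true;
    }
    return false;
  }
};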

@aaknitt
Contributor Author

aaknitt commented Nov 5, 2022

I added in a second talkgroup to troubleshoot further. Some logs and recordings are attached.

It seems like things work pretty well for short transmissions. It's the long transmissions (which is what is always occurring on the paging talkgroup) that seem to cause issues.

However, even with the short transmissions there's some funny business going on.

For example, Recorder 2 is used for the first transmission on TGID 58910 after startup. That transmission eventually times out. While it's in the process of timing out, Recorder 3 is spooled up for additional traffic in TGID 58910.

The next time that Recorder 2 is used, a TDU is detected immediately after the recorder is started. I think it's likely that this is actually the TDU from the end of the previous transmission that got stuck somehow?

The part that puzzles me is why Recorder 2 stops getting or processing data prematurely for the original call. It eventually times out, but its TDU seems to show up the next time that recorder is used.

I haven't really pieced all of this together (long recordings and misplaced TDUs) into a coherent theory as to what's actually going on yet, but I'm wondering if they're related as both seem to be examples of recorders ending processing of data prematurely, and that data ending up being processed the next time that recorder is used.

image

logs_recordings.zip

@robotastic
Owner

TLDR; I made a small change and added some more logs. Prob won't fix things but may give some more clues.

Just paging through logs and taking note of interesting things. From that initial instance, it does look like Rec #2 just stopped for some reason, like there should be audio and it is not recording it. Some Transmissions do start with a TDU though; when Rec 3 starts, it has a TDU initially, so things may not be getting stuck in the buffer, it may just be part of the P25 process:

[2022-11-04 18:24:49.461960] (info)   [TrunkedSystem]	�[0;34m41C�[0m	TG: �[35m     58910�[0m	Freq: 852.350000 MHz	Termination Voice Channel Flag - rec sample count 0 pos: 0 offset: 0

One weird thing is that the 1st call should have probably Concluded after this.... record_more_transmissions is set to false, I am going to go check the code:

[2022-11-04 18:24:50.180473] (info)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Termination Voice Channel Flag - rec sample count 36000 pos: 36000 offset: 61927
[2022-11-04 18:24:50.180787] (trace)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Spike Count: 3 pos: 36000 offset: 61927
[2022-11-04 18:24:50.181054] (trace)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Error Count: 45 pos: 36000 offset: 61927
[2022-11-04 18:24:50.181257] (info)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	TERM - record_more_transmissions = false, setting Recorder More: false - count: 36000

OK - Checked the code... right now, I don't have a block of code checking for when a Call is Inactive (hasn't received a Call Update in > 1) and the Recorder has received a TDU. It looks like I have code to do that... no idea when I commented it out though!

// In this case, the Call is inactive and was waiting for the recorder to finish. In this
// case you can now conclude the call.
/*if ((recorder->get_state() == IDLE) || (recorder->get_state() == STOPPED)) {
  //BOOST_LOG_TRIVIAL(info) << "Recorder state: " << format_state(recorder->get_state());
  //BOOST_LOG_TRIVIAL(info) << "[" << call->get_short_name() << "]\t\033[0;34m" << call->get_call_num() << "C\033[0m\tTG: " << call->get_talkgroup_display() << "\tFreq: " << format_freq(call->get_freq()) << "\t\u001b[36mCompleting Call, its state is INACTIVE and its recorder state is STOPPED or IDLE\u001b[0m";
  // since the Call state is INACTIVE and the Recorder has reached a state of IDLE or STOPPED, we can now
  // set the Call state to COMPLETED
  call->set_state(COMPLETED);
  call->conclude_call();
  if (recorder != NULL) {
    plugman_setup_recorder(recorder);
  }
  // The State of the Recorders has changed, so lets send an update
  ended_call = true;
  it = calls.erase(it);
  delete call;
  continue;
}*/

It looks like the Control Channel is already ahead of the Voice Channel here. The Chan Grant comes in before the TDU on the voice channel. Interestingly, this causes the Source ID to be missed initially, since it is in the Grant:

2022-11-04 18:24:38.783009] (debug)   tsbk00	Chan Grant	Channel ID:   257	Freq: 852.612500 MHz	ga   58910	TDMA -1	sa 6822658	Encrypt 0	Bandwidth: 12.5
[2022-11-04 18:24:38.783227] (trace)   nac 691 type 7 size 12 mesg len: 12
[2022-11-04 18:24:38.783373] (trace)   TSBK: opcode: $2
[2022-11-04 18:24:38.783552] (trace)   tsbk02	Grant Update	Channel ID:   245	Freq: 852.537500 MHz	ga   58524	TDMA -1
[2022-11-04 18:24:38.783852] (trace)   tsbk02	Grant Update	Channel ID:   245	Freq: 852.537500 MHz	ga   58524	TDMA -1tsbk02	Grant Update	Channel ID:   335	Freq: 853.100000 MHz	ga   58265	TDMA -1
[2022-11-04 18:24:38.784222] (trace)   nac 691 type 7 size 12 mesg len: 12
[2022-11-04 18:24:38.784360] (trace)   TSBK: opcode: $9
[2022-11-04 18:24:38.784475] (debug)   tsbk09: Telephone Interconnect Voice Channel Grant Update
[2022-11-04 18:24:38.785806] (trace)   nac 691 type 7 size 12 mesg len: 12
[2022-11-04 18:24:38.786395] (trace)   TSBK: opcode: $0
[2022-11-04 18:24:38.787532] (debug)   tsbk00	Chan Grant	Channel ID:   257	Freq: 852.612500 MHz	ga   58910	TDMA -1	sa 6822658	Encrypt 0	Bandwidth: 12.5
[2022-11-04 18:24:38.789733] (trace)   nac 691 type 7 size 12 mesg len: 12
[2022-11-04 18:24:38.790881] (trace)   TSBK: opcode: $2
[2022-11-04 18:24:38.791944] (trace)   tsbk02	Grant Update	Channel ID:   287	Freq: 852.800000 MHz	ga    1073	TDMA -1
[2022-11-04 18:24:38.793515] (trace)   tsbk02	Grant Update	Channel ID:   287	Freq: 852.800000 MHz	ga    1073	TDMA -1tsbk02	Grant Update	Channel ID:   245	Freq: 852.537500 MHz	ga   58524	TDMA -1
[2022-11-04 18:24:38.797463] (trace)   nac 691 type 7 size 12 mesg len: 12
[2022-11-04 18:24:38.797665] (trace)   TSBK: opcode: $2
[2022-11-04 18:24:38.798325] (trace)   tsbk02	Grant Update	Channel ID:   335	Freq: 853.100000 MHz	ga   58265	TDMA -1
[2022-11-04 18:24:38.798611] (info)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Termination Voice Channel Flag - rec sample count 17280 pos: 17280 offset: 17280
[2022-11-04 18:24:38.798824] (trace)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Spike Count: 6 pos: 17280 offset: 17280
[2022-11-04 18:24:38.799049] (trace)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Error Count: 90 pos: 17280 offset: 17280
[2022-11-04 18:24:38.799267] (info)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	TERM - record_more_transmissions = false, setting Recorder More: true - count: 17280
[2022-11-04 18:24:38.982979] (info)   [TrunkedSystem]	�[0;34m37C�[0m	TG: �[35m     58910�[0m	Freq: 852.612500 MHz	Termination Voice Channel Flag - rec sample count 0 pos: 0 offset: 17281

@aaknitt
Contributor Author

aaknitt commented Nov 7, 2022

For what it's worth, I finally got around to building and running on my Ubuntu machine today, and got the same results.

Caveat is that while doing testing I was running two instances...one for my feed (running a much, much older build) while at the same time running a build of the debug-call-mismatch branch.

So processor load was higher than it would be normally, but the old build continued to perform normally while the debug-call-mismatch build had the same results that I'm seeing on the Pi (if anything, it's worse). Really puzzling why no one else seems to be seeing the same thing.

@aaknitt
Contributor Author

aaknitt commented Nov 7, 2022

Here are logs from the latest build. This is just recording the single paging TGID. Let me know if it's more useful to collect data with just that single TGID or with another (non-paging) one also moving forward.

Edit: I just saw you added another commit around mid-day. The recordings attached are from the morning commit, I'll rebuild and try again.

6NovSingleTGID.zip

@robotastic
Owner

Yea - it is all sort of weird. On that one paging channel, one clear issue is that there are no TDUs being received... but that shouldn't matter too much; it should just revert back to the old way of doing things and time out after X number of seconds if it hasn't seen an Update message. It is good that it is also happening on the Ubuntu box. It at least means that it is a general problem and not some weird Pi corner case... which might have explained why other people have not reported it.

The only other thing I can think to try is turning on the debug for the control channel.

Change this line to 10:

int verbosity = 0; // 10 = lots of debug messages

And set this line back to 0:

This should give us everything that is happening on the Voice Channel and maybe give us a sense of why it is doing that. Maybe...

I am going to try some experiments on my side too.

@aaknitt
Contributor Author

aaknitt commented Nov 8, 2022

ok it's getting a bit weirder. With the latest build, Ubuntu now seems to be behaving while the Pi is still having issues. Starting to make me doubt my sanity. Log files for each are attached with some overlapping time periods. The call at 9:42 p.m. on TG 58909 (the paging talkgroup) is captured by both. The Ubuntu instance is recording 58909 (paging) and 58910 (dispatch), while the Pi is only capturing 58909.

Ubuntu.zip
Pi.zip

@robotastic
Owner

Hi @aaknitt I wanted to check back and see how things were looking. Is there any chance that the signal is not tuned correctly? Trunk Recorder should still handle it better, but that might explain the changing behavior.

@aaknitt
Contributor Author

aaknitt commented Nov 29, 2022

@robotastic sorry I've been out of town a bunch and haven't been doing any more digging on this lately. I should be able to get back into it if needed, but I've kind of lost track of what the current theory is and what additional data/logs are needed to help track it down.

The fact that I can turn the problem on and off by using different build versions makes me think that it's not a hardware/tuning issue, but maybe I'm misunderstanding what you're getting at.

Let me know what you need from me to keep the investigation moving.

@robotastic
Owner

No problem - I sort of lost the bubble on it too. If you are able to give it one more try with the debug-call-mismatch branch, that would be great. It has some additional messages turned on and I disabled the verbose logging from the Control Channel.

Good point about different versions handling it differently. It probably is not a tuning thing... unless the newer version has some problems where if a message gets dropped it starts misbehaving.

(For future me - this looks like the correct link for comparing changes between the 2 versions: 669ca00...71564fa )

One thing I noticed from the last batch you uploaded, the big difference between the Pi version and the Ubuntu one is that the Pi is not getting any Termination flags on the Voice Channel. The Recorder on the Pi Channel is abruptly stopped. I wonder if there is some decoding problem? Or maybe it is falling behind a ton?

Pi:

[2022-11-06 21:42:29.016069] (info)   [TrunkedSystem]	�[0;34m793C�[0m	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz	�[36m Removing call that has been inactive for more than 5 Sec �[0m Rec last write: 1 State: 1
[2022-11-06 21:42:29.016196] (info)   [TrunkedSystem]	�[0;34m793C�[0m	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz	�[33mConcluding Recorded Call_impl�[0m - Last Update: 6s	Call_impl Elapsed: 30
[2022-11-06 21:42:29.016390] (info)   [TrunkedSystem]	�[0;34m793C�[0m	- Stopping P25 Recorder Num [2]	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz 	TDMA: false	Slot: 0	Hz Error: 494
[2022-11-06 21:42:29.016796] (trace)   stop_recording() - stopping wavfile sink but recorder state is: 1
[2022-11-06 21:42:29.016901] (info)   [TrunkedSystem]	�[0;34m793C�[0m	TG: �[35m     58909�[0m	Freq: 852.800000 MHz	- Transmission src: 6800152 pos: 0 length: 17.388
[2022-11-06 21:42:29.034563] (trace)   Converting: recordings/TrunkedSystem/2022/11/6/58909-1667792519_852800000-call_793.m4a
[2022-11-06 21:42:29.034778] (trace)   Command: sox recordings/TrunkedSystem/2022/11/6/58909-1667792519_852800000-call_793.wav --norm=-.01 -t wav - | fdkaac --silent  -p 2 --moov-before-mdat --ignorelength -b 8000 -o recordings/TrunkedSystem/2022/11/6/58909-1667792519_852800000-call_793.m4a -

Ubuntu:

[2022-11-06 21:42:24.011059] (trace)   wav - setting record_more to: false, State: idle sample count: 0
[2022-11-06 21:42:24.011146] (info)   [TrunkedSystem]	�[0;34m199C�[0m	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz	�[36m Setting Call state to INACTIVE last update: 2
[2022-11-06 21:42:24.011206] (info)   Recorder state: idle
[2022-11-06 21:42:24.011225] (info)   [TrunkedSystem]	�[0;34m199C�[0m	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz	�[36mCompleting Call, its state is INACTIVE and its recorder state is STOPPED or IDLE�[0m
[2022-11-06 21:42:24.011252] (info)   [TrunkedSystem]	�[0;34m199C�[0m	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz	�[33mConcluding Recorded Call_impl�[0m - Last Update: 2s	Call_impl Elapsed: 25
[2022-11-06 21:42:24.011294] (info)   [TrunkedSystem]	�[0;34m199C�[0m	- Stopping P25 Recorder Num [0]	TG:      58909 (�[35m                 All Pa�[0m)	Freq: 852.800000 MHz 	TDMA: false	Slot: 0	Hz Error: -1496
[2022-11-06 21:42:24.011333] (info)   [TrunkedSystem]	�[0;34m199C�[0m	TG: �[35m     58909�[0m	Freq: 852.800000 MHz	- Transmission src: 6800152 pos: 0 length: 22.32
[2022-11-06 21:42:24.015309] (trace)   Converting: /home/andy/trunk-build-debug/recordings/TrunkedSystem/2022/11/6/58909-1667792519_852800000-call_199.m4a
[2022-11-06 21:42:24.015362] (trace)   Command: sox /home/andy/trunk-build-debug/recordings/TrunkedSystem/2022/11/6/58909-1667792519_852800000-call_199.wav --norm=-.01 -t wav - | fdkaac --silent  -p 2 --moov-before-mdat --ignorelength -b 8000 -o /home/andy/trunk-build-debug/recordings/TrunkedSystem/2022/11/6/58909-1667792519_852800000-call_199.m4a -

In the Pi Logs, there is only 1 Voice Termination Message:

[2022-11-06 21:01:25.588224] (info)   [TrunkedSystem]	�[0;34m462C�[0m	TG: �[35m     58909�[0m	Freq: 852.537500 MHz	Termination Voice Channel Flag - rec sample count 0 pos: 0 offset: 0

But on the Ubuntu Logs, there are > 1300!!

I have no idea what could be causing this....

@robotastic
Owner

Ah! - I must have messed up the search. Good find! So that is one obvious difference. The question is whether there was a change that stopped the TDUs from being generated/processed or if there was a change in the code that makes it fail when TDUs don't come in.

@tadscottsmith
Contributor

I can't seem to get the hyperlink quite right, but there are a few instances of call->stop_call(); in main.cc that seemed to be commented out in the new version. I'm having a hard time stepping all the way through it, but any chance they were closing up the previous WAV file when new calls came in?

@robotastic
Owner

Interesting - I had to go through the code to refresh my memory. It looks like call->stop_call(); doesn't change things much. It just sets the Call's State based upon whether the recorder has reached a termination flag yet or not. This gets handled anyhow in the manage_calls() loop, so it wasn't needed anymore.

@aaknitt
Contributor Author

aaknitt commented Dec 5, 2022

Oh boy.

I think I'm going to owe everyone some beer for all of the collective time spent on this.

I spooled up another Pi from scratch and that one is working perfectly on both master and debug-call-mismatch

I swapped SDRs, power supplies, and SD cards between the two Pis to try to isolate the issue. The problem follows the SD card, nothing else. However, I suspect that it may not be an issue with the physical SD card but rather something with the OS/build configuration on that card. I say this because I did get the problem to occur on Ubuntu once (but was unable to reproduce it). Hard to say for sure though.

Still puzzling to me is that I could turn the problem off and on on the troublesome Pi/SD card by switching between older and newer versions.

I'd love to be able to home in on the root cause so that others can avoid it moving forward, so if anyone has any ideas on what might be different in the OS, Pi settings, etc. I'm open to doing more experimentation. However, since this doesn't seem to be widespread and doesn't duplicate/reproduce easily, I'm leaning towards closing the issue for now.

Edit: Note that I had the same issue on the troublesome Pi/SD card when running via Docker. Not sure what that means for isolating the issue but it's pertinent, I think.

@sally-yachts

I think that, compared to other resources, disk performance is one that hasn't historically been inspected very closely, and it would be a good idea to try to get a handle on what performance TR needs before things start going sideways.

It would be nice if we could configure a work/temp directory for active recordings; this would let us mount some small but quick storage/a ramdisk for the performance-sensitive task and then move it to slower storage once the call has been concluded. Just spitballing here of course.
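For what it's worth, a minimal sketch of the move step such a feature could use once a call is concluded (hypothetical helper, not an existing trunk-recorder option; paths are placeholders):

#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: record to a fast temp dir (e.g. a tmpfs mount), then move
// the finished file to the slower permanent capture dir after the call concludes.
void move_finished_recording(const fs::path &temp_wav, const fs::path &capture_dir) {
  fs::create_directories(capture_dir);
  std::error_code ec;
  fs::rename(temp_wav, capture_dir / temp_wav.filename(), ec);
  if (ec) {
    // rename() can fail across filesystems (tmpfs -> disk), so fall back to copy + remove
    fs::copy_file(temp_wav, capture_dir / temp_wav.filename(),
                  fs::copy_options::overwrite_existing);
    fs::remove(temp_wav);
  }
}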

@robotastic
Owner

No worries at all @aaknitt - it was a real good exercise and always helpful for me to go back and check things. This is definitely really weird either way! Just to double-check, was the GNU Radio version different on any of the installs? You can check with gnuradio-config-info -v

That is a great point @sally-yachts ! It is probably most noticeable on SD cards, which I think can be pretty slow... but if you have a lot of recorders writing at once, that could also hit a disk I/O threshold. Having documentation for using a ramdisk is a good idea. Especially if you are just uploading everything and not planning on storing it locally long term.

@jessm33

jessm33 commented Dec 8, 2022

I have the same issue, but it only seems to affect my system when I reduce the bandwidth on my SDRs. At 2.3 MHz or 2.56 MHz I don't have the issue; if I drop the bandwidth to 2 MHz or below, I start having this issue. I'm also running a decently powerful system: an AMD Ryzen 5 PRO 5650GE processor and an M.2 PCIe SSD, using Docker on Ubuntu 22.

@aaknitt
Contributor Author

aaknitt commented Dec 10, 2022

@jessm33 is on to something here. I think I've got my "problem" SD card behaving by messing with the sample rate. Currently working to tease out whether it's an issue with the sample rate itself or related to which frequencies in the system are covered by which SDR, how close a frequency is to the edge of the SDR BW, etc.

@jessm33 can you post the details of your SDR config (center frequency & BW) for when it has the issue and when it doesn't, along with the frequencies of the system you're monitoring?

@jessm33

jessm33 commented Dec 12, 2022

@aaknitt I'm monitoring 2 systems:

Fairfax County: https://www.radioreference.com/db/sid/6957
MWAA: https://www.radioreference.com/db/sid/10697

This issue is very noticeable on the Fairfax County system, as fire dispatches are announced by an automated voice on 2 different talkgroups; each dispatch is at least 30 seconds long and is very easy to detect when it ends up in the wrong talkgroup's recording. The active control channel on Fairfax County is 856.262500 MHz.

Working centers & rates:

            "center": 851803125,
            "rate": 2560000,
            "center": 854890625,
            "rate": 2560000,
            "center": 857434375,
            "rate": 2560000,
            "center": 859759375,
            "rate": 2560000,

Not working centers & rates:

            "center": 851803125,
            "rate": 1531256,
            "center": 854590625,
            "rate": 1606256,
            "center": 856759375,
            "rate": 1618752,
            "center": 859259375,
            "rate": 2018752,

@tadscottsmith
Contributor

@jessm33 Could you run the debug branch with the not working rates? It'll be pretty verbose logging, but if you could capture the logs surrounding an instance of it happening, it might really help.

https://github.com/robotastic/trunk-recorder/tree/debug-call-mismatch

@aaknitt
Contributor Author

aaknitt commented Dec 12, 2022

I'll post more details later tonight, but there seems to be a correlation between choosing a sample rate that produces this message on startup and the issue being present in the newer versions:

[2022-12-11 22:28:51.253444] (info) [oasiswauk] Started with Control Channel: 853.712500 MHz
[2022-12-11 22:28:51.254846] (error) p25 trunking Decim: Nothing found
[2022-12-11 22:28:51.255042] (info) P25 Trunking single-stage decimator - Initial decimated rate: 0 Second decimated rate: 0 Initial Decimation: 80 System Rate: 2012504

On the other hand, if I choose a sample rate that produces this message on startup, the issue does not seem to be present:

[2022-12-11 13:17:59.420498] (info) P25 Trunking two-stage decimator - Initial decimated rate: 100000 Second decimated rate: 25000 FA: 6250 FB: 12500 System Rate: 1400000
[2022-12-11 13:17:59.439656] (info) P25 Trunking ARB - Initial Rate: 1400000 Resampled Rate: 25000 Initial Decimation: 14 System Rate: 24000 ARB Rate: 0.96

On the older (4.3.2 release) versions, the issue is not present even with the "single-stage decimator" message on startup. I could be chasing ghosts here as this is correlation, not causation. I'll post more details later.

Sample rates that seem ok include 0.9 MHz, 1.4 MHz, 2.4 MHz. An example of a sample rate that's problematic for the newer versions but not the older versions is 2.212504 MHz.

@taclane
Contributor

taclane commented Dec 12, 2022

> Sample rates that seem ok include 0.9 MHz, 1.4 MHz, 2.4 MHz. An example of a sample rate that's problematic for the newer versions but not the older versions is 2.212504 MHz.

The two-stage decimator should kick in when a sample rate is an even multiple of 24000, 25000, or 32000. Do things seem ok if you try 2050000 or 2016000?
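A quick way to check a candidate rate against that rule (a standalone check of the rates mentioned in this thread; the modulo test is just the "even multiple" rule above, not trunk-recorder's actual decimator selection code):

#include <cstdio>

int main() {
  // Rates from this thread, tested against "even multiple of 24000, 25000, or 32000".
  const long rates[] = {900000, 1400000, 2016000, 2050000, 2400000, 2560000,
                        1531256, 2212504};
  for (long r : rates) {
    bool two_stage = (r % 24000 == 0) || (r % 25000 == 0) || (r % 32000 == 0);
    std::printf("%8ld -> %s\n", r, two_stage ? "two-stage decimator"
                                             : "single-stage (the problem rates here)");
  }
  return 0;
}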

@jessm33

jessm33 commented Dec 12, 2022

@tadscottsmith here's a debug log from around a time when this issue happened:

debuglog.txt

At 17:18:29, calls start on TGs 1519 and 12 (the automated dispatch channels); each call should be approximately 30 seconds.

The audio in the recordings for both TG 1519 and TG 12 has the end cut off, and audio from one of those talkgroups ends up at the beginning of a TG 1107 call.

@aaknitt
Contributor Author

aaknitt commented Dec 12, 2022

> Sample rates that seem ok include 0.9 MHz, 1.4 MHz, 2.4 MHz. An example of a sample rate that's problematic for the newer versions but not the older versions is 2.212504 MHz.
>
> The two-stage decimator should kick in when a sample rate is an even multiple of 24000, 25000, or 32000. Do things seem ok if you try 2050000 or 2016000?

Yes, those rates are working correctly.

@jessm33

jessm33 commented Dec 13, 2022

These rates (1664000, 1792000, 2176000) are working for me without the error message @aaknitt mentioned or the audio-in-the-wrong-WAV issue.

@robotastic
Owner

Well that is pretty wild! I would totally believe there could be something up with sample rate and those filters... the weird part is that it works in one version and not the other. I just double checked using FileMerge on the Mac and there weren't any changes to code that touches the actual signal.

@aaknitt Can you just double check that these are the versions you are using for the one that works and the broken one?

> @robotastic as best I can tell, the last commit without the issue is 71564fa and the first one with the issue is 669ca00.

@jessm33 Any chance you could try using this commit ( 71564fa ) and seeing if it works better on those rates/centers you were having trouble with?

@aaknitt
Contributor Author

aaknitt commented Dec 13, 2022

@robotastic so far that still seems to be the case, even though it doesn't make much sense to me. A couple more data points:

8e9e770 (June 5) which uses p25_recorder_impl.cc and is prior to the OP25 update (tested on Ubuntu) does have the issue
193054e (May 8) which uses p25_recorder.cc does have the issue (tested on a Pi)

@robotastic
Owner

robotastic commented Dec 13, 2022

@aaknitt Thanks for doing all this testing. And just to double, double check my understanding:
with 71564fa you are able to use all of the following sample rates: 0.9 MHz, 1.4 MHz, 2.4 MHz, 2.212504 MHz, 2050000, 2016000?

With 669ca00, it no longer works with 2.212504 MHz?

Would it be possible to use the Ubuntu machine and keep everything the same and just change between those 2 versions and see which sample rates work? The goal is to double check it is a code thing and not a HW thing. It could also be that some RTL-SDR dongles can handle being tuned to sample rates that are not easily divided.

I think you can just do a git checkout 71564fabe1d33974e27ebb754bc43b7c7902a910 and git checkout 669ca00ad10c652bacc5b2d7c1a112192c85a8af and recompile after each time.

Rate    | 669ca00 | 71564fa
--------|---------|--------
900000  |         |
1400000 |         |
2016000 |         |
2048000 |         |
2050000 |         |
2212504 |         |
2016000 |         |

@tadscottsmith
Contributor

tadscottsmith commented Dec 15, 2022

This is actually very easily reproducible on my Pi using a sample rate (2400100) that can't be decimated and a single recorder. One thing I've noticed is that the transmission sink is really struggling to process the samples in 669ca00. They then just get left in the output_queue and wait for the next call on the same recorder.

Here you can see in 71564fa that the transmission_sink easily clears the output_queue and there are plenty of additional TDUs to pass the termination tag and cause the transmission_sink to end the transmission.

[2022-12-15 15:22:56.809054] (info)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 5760 noutput_items: 864
[2022-12-15 15:22:56.924356] (info)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 4896 noutput_items: 864
[2022-12-15 15:22:57.118709] (info)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 4032 noutput_items: 864
[2022-12-15 15:22:57.303387] (info)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 3168 noutput_items: 864
[2022-12-15 15:22:57.466899] (info)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 1728 output_queue: 2304 noutput_items: 1728
[2022-12-15 15:22:57.655950] (info)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 576 output_queue: 576 noutput_items: 864
[2022-12-15 15:22:57.827356] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:57.992084] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:58.175882] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:58.344793] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:58.537923] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:58.702614] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:58.865858] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:59.053204] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:59.213210] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:59.411530] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:59.575128] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:59.740086] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:22:59.932850] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:00.089218] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:00.277325] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:00.449207] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:00.614498] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:00.798882] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:00.964182] (info)   P25 Frame Assembler - ADDING TERMINATOR - Amt Prod: 0 output_queue: 0 noutput_items: 864
[2022-12-15 15:23:01.007694] (trace)   [ramsey] 186C    TG:               EMMRCMAIN (      2306)        Freq: 852.662500 MHz    Stopping Recorded Call, setting call state to COMPLETED - Last Update: 4s
[2022-12-15 15:23:01.008176] (trace)   [ramsey] 186C    TG:               EMMRCMAIN (      2306)        Freq: 852.662500 MHz    Concluding Recorded Call - Last Update: 4s      Call Elapsed: 7

And here you can see on 669ca00, that when the call is concluded, there are almost 80,000 samples left in the queue.

[2022-12-15 15:45:02.359108] (info)   [ramsey]  Decoding System ID 40F WACN: BEE07 NAC: 40D
[2022-12-15 15:45:56.695791] (info)   [ramsey]  26C     TG:            RE FIRE NRTH (     11524)        Freq: 851.762500 MHz    Starting P25 Recorder Num [0]   TDMA: false     Slot: 0 QPSK: true
[2022-12-15 15:46:01.997155] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 13824 noutput_items: 864
[2022-12-15 15:46:30.177295] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 102528 noutput_items: 864
[2022-12-15 15:46:30.385119] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 101664 noutput_items: 864
[2022-12-15 15:46:30.554366] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 100800 noutput_items: 864
[2022-12-15 15:46:30.721797] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 99936 noutput_items: 864
[2022-12-15 15:46:30.883628] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 99072 noutput_items: 864
[2022-12-15 15:46:31.047425] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 98208 noutput_items: 864
[2022-12-15 15:46:31.277916] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 97344 noutput_items: 864
[2022-12-15 15:46:31.427602] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 96480 noutput_items: 864
[2022-12-15 15:46:31.593052] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 95616 noutput_items: 864
[2022-12-15 15:46:31.756447] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 94752 noutput_items: 864
[2022-12-15 15:46:31.943579] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 93888 noutput_items: 864
[2022-12-15 15:46:32.176322] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 93024 noutput_items: 864
[2022-12-15 15:46:32.304989] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 92160 noutput_items: 864
[2022-12-15 15:46:32.467908] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 91296 noutput_items: 864
[2022-12-15 15:46:32.632048] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 90432 noutput_items: 864
[2022-12-15 15:46:32.824196] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 89568 noutput_items: 864
[2022-12-15 15:46:33.034516] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 88704 noutput_items: 864
[2022-12-15 15:46:33.179190] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 87840 noutput_items: 864
[2022-12-15 15:46:33.341737] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 86976 noutput_items: 864
[2022-12-15 15:46:33.503860] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 86112 noutput_items: 864
[2022-12-15 15:46:33.690932] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 85248 noutput_items: 864
[2022-12-15 15:46:33.921135] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 84384 noutput_items: 864
[2022-12-15 15:46:34.050850] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 83520 noutput_items: 864
[2022-12-15 15:46:34.217784] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 82656 noutput_items: 864
[2022-12-15 15:46:34.381271] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 81792 noutput_items: 864
[2022-12-15 15:46:34.562996] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 80928 noutput_items: 864
[2022-12-15 15:46:34.766436] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 80064 noutput_items: 864
[2022-12-15 15:46:34.926677] (error)   P25 Frame Assembler - IGNORING TERMINATOR - Amt Prod: 864 output_queue: 79200 noutput_items: 864
[2022-12-15 15:46:35.008126] (trace)   [ramsey] 26C     TG:            RE FIRE NRTH (     11524)        Freq: 851.762500 MHz    Concluding Recorded Call - Last Update: 4s      Call Elapsed: 39
[2022-12-15 15:46:35.009047] (info)   [ramsey]  26C     TG:      11524  Freq: 851.762500 MHz    - Transmission src: 0 pos: 0 length: 22.248

That being said, I haven't been able to find a smoking gun as to what change between the two versions is causing the issue.

@tadscottsmith
Contributor

Still digging, but I'm wondering if somewhere there's a STATE mismatch. The newer version only writes if the recorder is in a RECORDING state.

if (state == RECORDING) {
  for (nwritten = 0; nwritten < noutput_items; nwritten++) {
    for (int chan = 0; chan < d_nchans; chan++) {
      // Write zeros to channels which are in the WAV file
      // but don't have any inputs here
      if (chan < n_in_chans) {
        int16_t **in = (int16_t **)&input_items[0];
        sample_buf_s = in[chan][nwritten];
      } else {
        sample_buf_s = 0;
      }
      wav_write_sample(d_fp, sample_buf_s, d_bytes_per_sample);
      d_sample_count++;
    }
  }
}

@robotastic
Owner

oohhh.... good call! I will just go spin up a local version too so I can do some debugging too.

I am not sure if it is a state mis-match. If it was just that, we would only end up with some gaps in the files; it wouldn't get backed up. The function always returns the total number of items it was given as input, which essentially always clears the items it was given.... I think. The warning message should also fire, because the number written would be less than the number of samples that were input.

if (nwritten < noutput_items) {
  char formattedTalkgroup[62];
  snprintf(formattedTalkgroup, 61, "%c[%dm%10ld%c[0m", 0x1B, 35, d_current_call_talkgroup, 0x1B);
  std::string talkgroup_display = boost::lexical_cast<std::string>(formattedTalkgroup);
  BOOST_LOG_TRIVIAL(error) << "[" << d_current_call_short_name << "]\t\033[0;34m" << d_current_call_num << "C\033[0m\tTG: " << formattedTalkgroup << "\tFreq: " << format_freq(d_current_call_freq) << "\tFailed to Write! Wrote: " << nwritten << " of " << noutput_items;
}
return noutput_items;

@Dygear
Contributor

Dygear commented Dec 16, 2022

@tadscottsmith That was one hell of a find!

@aaknitt
Contributor Author

aaknitt commented Dec 16, 2022

I think I have the breaking change identified:
06e8011
(screenshot)

06e8011 - Feb 23 - NOT OK
85810ff - Feb 23 - OK

(screenshot)

@aaknitt
Contributor Author

aaknitt commented Dec 16, 2022

Update:

Changing this in the latest master commit makes the issue go away:

use:
if (recorder->since_last_write() > config.call_timeout) {
instead of:
if (call->since_last_update() > config.call_timeout) {

However, I'm sure there may have been a reason for making that change in the first place, so I'm not sure what else it might be breaking by changing back.
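For anyone skimming, a standalone sketch of the difference between the two checks (stub types and names are illustrative only, not the actual main.cc structures):

#include <ctime>

// Illustrative stubs only - not trunk-recorder's real Call/Recorder classes.
struct RecorderStub {
  std::time_t last_write; // updated whenever samples are written to the WAV
  double since_last_write() const { return std::difftime(std::time(nullptr), last_write); }
};

struct CallStub {
  std::time_t last_update; // updated on control-channel grant/update messages
  RecorderStub *recorder;
  double since_last_update() const { return std::difftime(std::time(nullptr), last_update); }
};

bool should_time_out(const CallStub &call, double call_timeout) {
  // Old check: the control channel has gone quiet, even if the voice channel is
  // still draining its buffer into the recorder.
  // return call.since_last_update() > call_timeout;

  // Restored check: the recorder itself has stopped writing audio, so a slow
  // voice channel is not cut off while samples are still queued.
  return call.recorder->since_last_write() > call_timeout;
}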

@robotastic
Owner

WHOA!! Amazing find - let me go trace things backwards and see where that change hits. Nice work and thank you so much for keeping at this!

@aaknitt
Contributor Author

aaknitt commented Dec 16, 2022

@robotastic the commit where that change hits is listed in my comment above with the screenshots

#731 (comment)

@aaknitt
Contributor Author

aaknitt commented Dec 21, 2022

Agreeing with everyone in the Discord chat...the changes made in the last 48 hours seem to have resolved the issue as best I can tell. Nice work @robotastic !
