Not emitting frame issues #87
Comments
Thanks for the well-structured description. I am sure we will find out what is going on in your setup. Judging by the logs, your frame is definitely placed into the radio TX buffer, but the TX status verification is failing. These are time-critical operations. From the high-level description, it sounds to me as if the TX timing parameters are not properly tuned to the expected SPI delays for your gateway: Lines 190 to 193 in bd17e53
To find out, please do the following:
|
Thanks for the quick reply:
I will work on libloragw in DEBUG and use a spectrum analyzer to see if the packet is actually sent. I would say it is not, since our end device sees a frame in 100% of the success cases and does not see the frame in 100% of the 'frame not emitted' cases. |
@beitler Here is a logfile again, with XDEBUG. I've been working with Multitech on the debug libloragw. They sent me a new ipk for station; let me know if this has the appropriate level. This one failed on the 9th frame I sent down. I added a space between each frame for readability. Also working on that spectrum analyzer: I need to borrow one from the office, so it might be a few days. |
Thank you for the details. Let me walk through each line of the relevant part of the log:
The HAL prints this line as part of the lgw_send setup sequence:
This is the Websocket reporting that a downlink message came in. Notice, that a class C unsolicited downlink does not require
RXDelay parameter was not passed, so 1 second is assumed.
This is the xtime related warning. Basically the timestamp cannot be properly mapped to any time reference and cannot be converted to radio time. However, since class C unsolicited downlinks are scheduled 'asap' the xtime value is ignored anyway and this warning does not have an effect on the downlink processing.
The received
Line 635 in bd17e53
The error message indicates that the radio was found not to be in TXSTATUS_EMITTING state.
The retry happens
Unfortunately, I was not able to reproduce the problem on the hardware that is available to me. The following flow diagram may help to understand how these values work together:
Also, it would be good to print out the actual tx_status of the radio at the point where
Line 236 in bd17e53
|
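A debug print of the radio status is easier to read when the numeric code is mapped to a name. The sketch below assumes three states with hypothetical numeric values. Only `TXSTATUS_EMITTING` is named in this thread; the other names, all values, and both helper functions are assumptions that should be checked against the station headers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes for illustration; verify against the real headers. */
enum { TXSTATUS_IDLE = 0, TXSTATUS_SCHEDULED = 1, TXSTATUS_EMITTING = 2 };

/* Map a status code to a readable name (hypothetical helper). */
const char *txstatus_name(int txs) {
    switch (txs) {
    case TXSTATUS_IDLE:      return "IDLE";
    case TXSTATUS_SCHEDULED: return "SCHEDULED";
    case TXSTATUS_EMITTING:  return "EMITTING";
    default:                 return "UNKNOWN";
    }
}

/* Format a debug line of the form "tx_status=<code> (<name>)" into buf.
 * Returns the value reported by snprintf. */
int format_txstatus(char *buf, size_t len, int txs) {
    assert(buf != NULL);
    return snprintf(buf, len, "tx_status=%d (%s)", txs, txstatus_name(txs));
}
```

The formatted string could then be passed to whatever logging call is used at the point where the EMITTING state is expected.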
Thanks for the very descriptive reply! I will start on this right away and ping MT about the rebuild with some extra debug statements. Seeing that no frame is seen on the RF detector, would you expect "TXCHECK_FUDGE" to have an impact? I know I am getting beyond my understanding of the internals, but if this error is that we are checking at the wrong time, it would seem like I would still see the packet at the RF out. |
Would this patch file work to output the status, or would you need it to be more specific? `--- ral_lgw.c 2021-02-11 11:20:53.683043806 -0600
|
@pferland-multitech Indeed, this patch would help to visualize the radio status when station expects the EMITTING state. Also, what struck me in the log is that the retries are scheduled only
diff --git a/src/s2e.c b/src/s2e.c
index f8fd7f5..8d74d18 100644
--- a/src/s2e.c
+++ b/src/s2e.c
@@ -637,6 +645,7 @@ ustime_t s2e_nextTxAction (s2ctx_t* s2ctx, u1_t txunit) {
             // Something went wrong - should be emitting
             LOG(MOD_S2E|ERROR, "%J - radio is not emitting frame - abandoning TX, trying alternative", curr);
             ral_txabort(txunit);
+            curr->txflags &= ~TXFLAG_TXING;
             goto check_alt;
         }
         // Looks like it's on air
This patch will not fix the issue you are observing. It will just fix the retries after the initial TX failure. While it is still puzzling why the TX is failing in the first place, this patch will at least prevent the retries from failing. Please apply this patch for your further tests. It will flow into the next release as well. |
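One plausible reading of why clearing the flag matters for the retries: if a job stays marked as transmitting after a failed status check, the next pass only re-checks the status instead of handing the frame to the radio again. The sketch below models that idea in isolation. The struct, the flag value, and the `step` function are assumptions made for illustration only; they do not reproduce the actual definitions or control flow in `s2e.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; the real TXFLAG_TXING is defined by Basic Station. */
#define TXFLAG_TXING 0x01u

/* Hypothetical, stripped-down TX job. */
typedef struct {
    uint8_t txflags;
    int     submissions;   /* how often the frame was handed to the radio */
} txjob_t;

/* One scheduler pass: a job not marked TXING is submitted to the radio;
 * a job marked TXING only has its status checked. On a failed check the
 * flag is cleared only if clear_flag_on_failure is set (the patched case). */
void step(txjob_t *job, bool radio_emitting, bool clear_flag_on_failure) {
    assert(job != NULL);
    if (!(job->txflags & TXFLAG_TXING)) {
        job->submissions++;
        job->txflags |= TXFLAG_TXING;
        return;
    }
    if (!radio_emitting && clear_flag_on_failure) {
        job->txflags = (uint8_t)(job->txflags & ~TXFLAG_TXING);
    }
}
```

In this model, three passes with a radio that never emits submit the frame once without the flag reset and twice with it, which matches the reported symptom that retries never succeeded before the patch.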
I thought I would post the first of the 6 combinations of GAP/FUDGE, since it has the patch and TXSTATUS debug print, currently working on the rest of the combinations |
Great, thanks for posting the data, which is already quite revealing. Have a look at these lines:
At time 08:27.644 the downlink command is received from the LNS and scheduled for transmission ~20ms later. This indicates that at the time when the status is checked, the frame had not yet started emitting. This points at severe drift between MCU clock and radio clock.
but 100 ppm is not enough to drift away more than 5ms between sync events (occurring every TIMESYNC_RADIO_INTV - 2100ms by default). Still, it would be interesting to see the behavior with larger values for TXCHECK_FUDGE. If you could, please incorporate the patch from my previous comment into your tests going forward. Thanks! |
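The drift bound mentioned above can be checked with a little arithmetic: a relative clock error of 100 ppm accumulated over one sync interval of 2100 ms amounts to 0.21 ms, well below 5 ms. A minimal sketch in C follows; the function name `max_drift_us` is hypothetical and not part of Basic Station.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case drift, in microseconds, accumulated between two sync events.
 * ppm         - relative clock error in parts per million
 * interval_ms - time between sync events in milliseconds
 * Hypothetical helper for illustration; not part of Basic Station. */
int64_t max_drift_us(int64_t ppm, int64_t interval_ms) {
    assert(ppm >= 0 && interval_ms >= 0);
    /* interval_ms * 1000 us/ms * ppm / 1e6 simplifies to interval_ms * ppm / 1000 */
    return interval_ms * ppm / 1000;
}
```

With `ppm = 100` and `interval_ms = 2100` this gives 210 microseconds, so a 100 ppm error alone would not explain a miss of several milliseconds.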
OK, these tests took a while today, but here are the 9 combinations of TXFUDGE & GAP, and their results. The test consists of a script that sends down 200 frames per test, spaced out by 5 seconds.
Logs for reference: |
Thank you for your effort. These are good results. Apparently the default timing parameters are too tight for your hardware platform. I would suggest you configure your installation with
@pferland-multitech Maybe Multitech could evaluate the findings in this thread and check if this is an isolated issue or whether the default (compile time) timing parameters should be adjusted for all Multitech installations. For example, if the SPI speed is clocked down relative to the default, or if the SPI driver is introducing delays, this will require adjustments to the timing parameters. |
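As a rough illustration of why a slower SPI bus calls for looser timing parameters, the raw transfer time of a buffer scales inversely with the clock rate. The function name and the example numbers below are assumptions, and driver and chip-select overhead are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Raw time in microseconds to clock nbytes over SPI at clock_hz
 * (8 bits per byte), rounded up. Ignores driver and chip-select overhead.
 * Hypothetical helper for illustration only. */
uint64_t spi_transfer_us(uint64_t nbytes, uint64_t clock_hz) {
    assert(clock_hz > 0);
    return (nbytes * 8u * 1000000u + clock_hz - 1) / clock_hz;
}
```

For example, 1000 bytes take about 1 ms at 8 MHz but 8 ms at 1 MHz, which is already in the range of the timing margins discussed in this thread.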
I'm going to close this issue as it seems the fix on the retry bug has been tested enough on our end. Further timing research on the conduit can occur outside this issue |
* deps: Updated sx1302_hal dependency to version 2.1.0 (no LBT yet) (Resolves #89, #103, #121, #130)
* deps: Added sx1302_hal patch for handling of latched xticks rollover
* deps: Updated mbedTLS dependency to version 2.28.0 (LTS)
* deps: Fixed lgw patch causing IQ inversion in 500kHz channel (Resolves #81)
* s2e: Added support for AU915 (Resolves #43)
* s2e: Added support for LoRaWAN Regional Parameters Common Names (Resolves #18)
* s2e: Fixed dnchnl2 issue (Resolves #79)
* s2e: Fixed class C backoff logic (Resolves #87)
* s2e: Fixed class B beacon format (Resolves #129, #131)
* s2e: Fixed DR range check in upchannels list parser (Resolves #141)
* ral: Changed handling of xticks for lgw1302
* ral: Fixed radio in use issue (Resolves #53, #62)
* ral: Fixed types in txpow assignment (master/slave) (Resolves #118)
* ral: Fixed class B beacon parameters (Resolves #132)
* sx130xconf: Fixed parsing of rssi_tcomp values for sx1302 (Resolves #144)
* tls: Fixed TLS cert parsing issue (Resolves #76)
* sys_linux: Added support for usb/spi prefix in radio devname
* sys_linux: Added mbedTLS version to startup log
* sys_linux: Changed version to be printed to stdout (Resolves #51)
* sys_linux: Changed default max dbuf size (Resolves #95)
* sys_linux: Fixed relative home path handling (Resolves #140)
* sys_linux: Fixed memory corruption during system command execution (Resolves #146)
* tc/cups: Fixed sync on credset file IO (Resolves #94)
* timesync: Fixed UTC to PPS alignment
* log: Changed verbosity of XDEBUG log level
* log: Changed logging experience for improved clarity
* log: Added HAL log integration into logging module
* make: Changed makefiles for more space-friendliness (Resolves #66)
* net: Changed strictness on line-endings in key files (Resolves #68)
* gps: Fixed parsing of ublox NAV-TIMEGPS message
* Restored LICENSE file (Resolves #63, #67)
We've been banging our heads for a bit on this issue, and I'd like to get some insight into what might be going on in the s2e_nextTxAction state machine to cause this "radio is not emitting frame" error. Firstly, the issue is intermittent (one out of every 10-30 or so downlinks). A bit about the setup
Environment
The Test
We are manually issuing an unsolicited downlink frame in class C mode (so no RX1/RX2 timing windows come into play here). The device is muted after joining the LNS, so as to not interfere with the test. We manually send a 2 byte dummy frame to this device every 5-10 seconds, though we have seen the error occur up to an hour after the previous packet, so I can safely say this is not a "back to back" frame issue. The attached logfile shows 7 successful downlinks, then the 8th fails. (primary error logfile: successes_then_failure.txt) When this occurs, it will never succeed in the "retry" (it always goes through 10 retries and eventually fails); maybe that nuance of behavior is important.
At this point I am not convinced it is a server issue, so I'd like to get some insight on this as a potential issue in basic station.
Also, I am very curious why the retries don't seem to function. For one test, I set the CLASS_C_BACKOFF_BY setting in the station conf to allow more time between retries. It still failed all 10 attempts across 5 seconds: emitting_frame_error_500ms.txt. Maybe this information is also useful.