(v2) occasional Rx incomplete telegrams #460
Comments
In 1.9.5 it's not there. Here it started somewhere around v2b9. I have about 2 to 5% incomplete telegrams. |
@bbqkees I only tested from b9 onwards. I'll try b8 |
Tested the latest Travis-built 8266 firmware for 4 h without any error.
Before that I was running v2_cmd in tx_mode 1, compiled here, for over 12 h without errors. |
ok, thanks for testing. Same error in b8. In all earlier versions the error was suppressed which is why we never saw it before. So I guess it's always been like that. Same on the ESP32 where the first byte of each telegram would be a 0xFE. Need to get the logic analyzer out and leave it running for 1-2hrs to capture the log |
I have tested v2b12 and v2b13 on the ESP32 for longer periods with logging.
It was 4 errors in 40000 telegrams, not much, but I think it's better to use a small queue again. I added a 3x queue for the next tests. With the ESP32 I see some CRC errors, around 1 per 10000 telegrams, but not the shortened telegrams (mainly replies) that @proddy sees. I see other errors, mainly in broadcast telegrams:
I didn't see this with the ESP8266 before. I get an MQTT disconnect every 12032 seconds. The broker logs: This is the log: esp32.log |
Before I removed the Rx queue I ran some tests to see the max size of the queue. It never got above 1, which is why I removed it. But this was only on the ESP8266. I guess the ESP32 can process faster. I'll add back the queue, no problem. I'm seeing the same behavior on the ESP32: garbage telegrams after startup and an FE in front of normal telegrams. I don't remember seeing this before and I'm not sure where it is coming from. That MQTT disconnect isn't good. I'll run some tests here and check the logs to see if I see the same. |
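For illustration, a small bounded Rx queue that also records its high-water mark would answer both questions at once (does the queue ever fill, and are telegrams being dropped). This is only a sketch in Python, not the firmware code; the class name `RxQueue` and its fields are made up for this example, and the capacity of 3 is taken from the "3x queue" mentioned above.

```python
from collections import deque


class RxQueue:
    """Hypothetical bounded queue for received telegrams (illustration only)."""

    def __init__(self, capacity=3):
        self._items = deque()
        self.capacity = capacity
        self.dropped = 0      # telegrams rejected because the queue was full
        self.high_water = 0   # largest queue length seen so far

    def push(self, telegram: bytes) -> bool:
        """Queue a telegram; return False (and count a drop) if the queue is full."""
        if len(self._items) >= self.capacity:
            self.dropped += 1
            return False
        self._items.append(telegram)
        self.high_water = max(self.high_water, len(self._items))
        return True

    def pop(self):
        """Return the oldest telegram, or None if the queue is empty."""
        return self._items.popleft() if self._items else None
```

If `high_water` stays at 1 over a long run, the queue is not buying anything on that board; if `dropped` is non-zero, the consumer is too slow for the chosen capacity.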
Seems the garbage telegrams on startup and the MQTT disconnects are ESP32-specific. I'm now running the ESP8266 for 3h40min without a disconnect and with no Rx errors so far. |
try going back to use the stable version of the core. In |
Tried, but same result: garbage on startup and an MQTT reconnect after 3h20:
With the 8266 before, I didn't get reconnects and had only one Rx error after 20h.
|
Regarding the mqtt-disconnects i've found this issue of AsyncMqttClient |
No change with this lib either, MQTT reconnects every 3h20m.
|
I looked at dx168b's fork a while back. Another one worth trying is https://github.com/bertmelis/async-mqtt-client although neither really addresses the disconnects. And another is kleini/async-mqtt-client@f1b4205. I have also been actively following https://github.com/philbowles/PangolinMQTT which attempts to fix a lot of the shortcomings of Marvin's original library and is perhaps worth a try, but would require a rewrite in some areas. I lost interest in Pangolin because of the maintainer's arrogance, poor code quality and because his library is not open to feedback or contributions unless you join one of his Facebook clubs. If MQTT is disconnecting because of TCP errors it may be related to the AsyncTCP library and not MQTT. Do you get the same errors when NTP is disabled (NTP=0 in build)? It could be the UDP interrupting transmission. me-no-dev/AsyncTCP#94 |
Disabling NTP (FT_NTP=0) does not change the MQTT disconnects. Since the period is 3h20, i.e. 200 minutes, it could be 200 heartbeats (every 60 sec) or 400 publishes (every 30 sec). I'm now checking with heartbeat disabled and a 10 sec publish time to see if that influences the reconnects. |
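The arithmetic behind that guess is easy to check. A minimal sketch in Python, using only the numbers quoted in this thread (60 s heartbeat, 30 s publish, 3h20 period, and the 12032 s figure reported earlier); the function name is made up for the example.

```python
# Which periodic event lines up with the reconnect period?
RECONNECT_S = 3 * 3600 + 20 * 60   # 3h20m = 12000 s
MEASURED_S = 12032                 # disconnect period reported earlier in the thread

INTERVALS_S = {"heartbeat": 60, "publish": 30}


def events_per_period(period_s, interval_s):
    """Return (whole events, leftover seconds) that fit into one period."""
    return divmod(period_s, interval_s)


for name, interval in INTERVALS_S.items():
    count, rest = events_per_period(RECONNECT_S, interval)
    print(f"{name}: {count} events, {rest} s left over")
```

With the rounded 12000 s both intervals divide evenly (200 and 400 events). With the measured 12032 s there is a remainder of 32 s for the heartbeat and 2 s for the publish interval, so the match is only approximate and a counter overflow is just one possible explanation.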
Should I open a new issue for the ESP32 MQTT disconnects? I have no idea what causes them. What I know about this issue:
Since MQTT reconnects immediately, it does not do much harm. |
I'm still getting 10% Rx Errors (BBQKees) so going to reserve some time to look again into the Rx logic. And perhaps create a new TxMode based on the 1.9 way of working without interrupts. |
Hm, i'm getting zero rx and tx errors with 8266 and only 0.01% rx errors on esp32. It's difficult for me to reproduce and understand what is different. |
I'll start today by looking at previous UART Rx logic until I get one that works, then figure out the difference. I've tried 10 versions up until July 27th and they all produce the same results. This is using the same wiring and the same circuit as I used in 1.9, so that's at least consistent. The telegrams are cut short, probably because of a false break detection. Interestingly, the results are more stable when USB-powered instead of bus-powered. USB powered, 25268 and 10 fails in a 12 hour period. It's only 0.2% so no huge deal. |
If it is always the missing break, it could be a timing issue: the break interrupt is triggered, but the 0x00 is not actually in the FIFO yet. The next received telegram, which is normally a poll, then starts with 0 and is ignored. |
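To make the idea concrete, here is a rough sketch of such a check, written in Python rather than the firmware's own code. It assumes that a valid telegram never starts with 0x00, so a leading 0x00 can only be the late break byte of the previous telegram. The function name and the example poll byte 0x8B are made up for illustration.

```python
BREAK_BYTE = 0x00  # a UART break is read back from the FIFO as a 0x00 byte


def drop_stale_break(raw: bytes) -> bytes:
    """Remove a leading 0x00 left over from the previous telegram's break.

    Assumption: no valid telegram starts with 0x00, so dropping it is safe.
    Anything else, including a trailing 0x00, is returned unchanged.
    """
    if raw and raw[0] == BREAK_BYTE:
        return raw[1:]
    return raw


# A poll that arrived with the previous break byte stuck in front of it:
print(drop_stale_break(bytes([0x00, 0x8B])).hex())
```

Only the first byte is touched on purpose: stripping a trailing 0x00 blindly could remove a checksum byte that happens to be zero.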
that may work. first I need to do some debugging and see what is happening. didn't have any time today |
I saw the rx break change. I'm out for the next few days but will check this weekend when I'm back. |
I've updated the uart branch to this check (and lift the branch to b4). Please check if it helps. |
for info, |
@giovanne123: What tx_mode do you use? Have you tried other modes? (You can change it from the web settings, save, no reboot required.) A small number of rx_fails is common and they were not measured in v1.9.5, but the number of tx_fails is much too high. |
As far as I can see in #554 you get the data. Are all your devices detected? Maybe a device is not recognized correctly and ems-esp sends commands that are rejected by the device; this will also give Tx errors. |
devices:
two tx errors in log debug:
|
Ok, you have a modem device at 0x0D and ems-esp requests version information of this device every minute, but the device does not answer. You can ignore these errors. The bus communication is fine.
|
You are right, I have an Easycom on the EMS bus. Is there a possibility to prevent the calls and therefore the errors? It doesn't look so nice to have errors ;-) |
Is this an original Easycom from a Bosch brand or a third-party device like Hometop? |
Original from Buderus with serial connection to PC running Eco-Soft Software. |
Ok, the original Buderus does not reply to the version request. Then it makes sense to skip the version request for this device. |
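Roughly, skipping the request could look like the sketch below. This is Python for illustration only and not the actual change; the set name and function are hypothetical, and keying on the device ID alone (0x0D, the modem mentioned above) is a simplification, since a third-party device at the same ID might well answer.

```python
# Hypothetical list of device IDs that are known not to answer a version request.
NO_VERSION_REPLY = {0x0D}  # modem (Easycom) as discussed above


def devices_to_query(detected_ids):
    """Return the detected device IDs that should still get a version request."""
    return [dev for dev in detected_ids if dev not in NO_VERSION_REPLY]


# Example bus with three devices; the IDs other than 0x0D are just placeholders.
print([hex(d) for d in devices_to_query([0x08, 0x0D, 0x10])])
```

Since no request goes out to the silent device, no Tx error is counted for it, which is the effect asked for above.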
@giovanne123: I've made the change, update to get rid of these tx-errors. |
FYI, |
On the ESP8266 and ESP32 I'm seeing the occasional Rx error with the latest v2.
In the logs I get
So I went back to earlier v2 releases, to b9 from July and the same error is there too. Why it wasn't noticed before baffles me.
what I've tried, but didn't help:
Some users are not seeing this behavior. Like Michael and some HT3 users.
what I need to do is use a scope and see why the BREAK is being detected too early on.