This repository has been archived by the owner on Oct 4, 2021. It is now read-only.

(v2) ESP8266 & ESP32 UART optimizations #398

Closed
proddy opened this issue Jun 13, 2020 · 107 comments
Labels
enhancement New feature or request

Comments

@proddy
Collaborator

proddy commented Jun 13, 2020

We can merge later. My system is also EMS 1.0 and all modes are working. Maybe it is not critical there because all components are on the boiler and the EMS bus lines are less than 1 m long (no room controller, only weather-controlled): no EMC problems, no reflections, etc.
Can you check with less delay? Set emsTxWait = 5 * EMSUART_BIT_TIME * 11; for 11 bit times.
You can also check with the break ended by the timer in the ISR: uncomment the last lines, but set the timer to timer1_write(5 * EMSUART_TX_BRK_WAIT).

Originally posted by @MichaelDvP in #397 (comment)
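
For readers following the arithmetic in the suggestion above, here is a minimal sketch of how a wait expressed in bit times turns into microseconds and timer ticks. It assumes a bit time of 104 µs (9600 baud, consistent with the "7 Bits (728us)" figure quoted later in this thread) and assumes that the factor 5 is the number of timer1 ticks per microsecond (80 MHz base clock with a /16 prescaler). The helper names tx_wait_us and tx_wait_ticks are hypothetical and not part of the firmware.

```cpp
#include <cstdint>

// Roughly 104 us per bit at 9600 baud; matches "7 Bits (728us)" quoted in the thread.
constexpr uint32_t EMSUART_BIT_TIME = 104;

// Assumption: timer1 is clocked at 80 MHz / 16 = 5 MHz, i.e. 5 ticks per microsecond.
constexpr uint32_t TIMER1_TICKS_PER_US = 5;

// Hypothetical helper: a wait of `bits` bit times, in microseconds.
constexpr uint32_t tx_wait_us(uint32_t bits) {
    return EMSUART_BIT_TIME * bits;
}

// Hypothetical helper: the same wait, in timer1 ticks.
constexpr uint32_t tx_wait_ticks(uint32_t bits) {
    return TIMER1_TICKS_PER_US * tx_wait_us(bits);
}

// Compile-time sanity check against the figure from the thread.
static_assert(tx_wait_us(7) == 728, "7 bit times should be 728 us");
```

Under these assumptions, the suggested 11 bit times correspond to 1144 µs, or 5720 ticks.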

@proddy
Collaborator Author

proddy commented Jun 13, 2020

@MichaelDvP With the latest version 2.0.0a18 with your UART changes I can only get tx_mode 1 working. tx_mode 2 gives me:

[telegram] Sending read Tx [#12], telegram: 0B 97 02 00 20 44
[telegram] [DEBUG] New Tx [#25] telegram, length 1
[emsesp] [DEBUG] Last Tx operation failed. Retry #1. Sent: 0B 97 02 00 20, received: 09
[telegram] Sending read Tx [#25], telegram: 0B 97 02 00 20 44
[telegram] [DEBUG] New Tx [#26] telegram, length 1
[emsesp] [DEBUG] Last Tx operation failed. Retry #2. Sent: 0B 97 02 00 20, received: 09
[telegram] Sending read Tx [#26], telegram: 0B 97 02 00 20 44
[emsesp] [DEBUG] Last Tx operation failed. Retry #0. Sent: 0B 97 02 00 20, received: 09
[emsesp] Last Tx operation failed after 3 retries. Ignoring request.

EMS Bus info:
  Bus protocol: Buderus
  #telegrams received: 41
  #read requests sent: 0
  #write requests sent: 0
  #incomplete telegrams: 0 (0%)
  #tx fails (after 3 retries): 9

I noticed there is also no echo. With tx_mode 1 I would get the echo after the send, like

[telegram] Sending read Tx [#64], telegram: 0B 97 06 00 20 54
[emsesp] [DEBUG] Echo: 0B 97 06 00 20 54
[emsesp] Last Tx read successful
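
The behaviour visible in these logs (send, compare what came back, retry, give up after three attempts) can be sketched roughly as below. This is a simplified model and not the firmware's actual code: the Telegram alias, the SendFn transport callback and send_with_retry are hypothetical names, and the real firmware also waits for the addressed device's reply rather than only checking the echo.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Telegram = std::vector<uint8_t>;

// Hypothetical transport: sends a telegram and returns whatever was read back from the bus.
using SendFn = std::function<Telegram(const Telegram&)>;

constexpr int MAX_TX_ATTEMPTS = 3; // the logs give up "after 3 retries"

// In this model a Tx counts as delivered only if the bus echoed exactly the bytes that were sent.
bool echo_matches(const Telegram& sent, const Telegram& echoed) {
    return sent == echoed;
}

// Returns the number of attempts used (1..3), or 0 if every attempt failed.
int send_with_retry(const Telegram& telegram, const SendFn& send) {
    for (int attempt = 1; attempt <= MAX_TX_ATTEMPTS; ++attempt) {
        if (echo_matches(telegram, send(telegram))) {
            return attempt;
        }
    }
    return 0;
}
```

With a transport that always reads back a single 09 byte, as in the tx_mode 2 log above, send_with_retry returns 0 after three attempts.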

@proddy
Collaborator Author

proddy commented Jun 13, 2020

Michael, are you also using the same platformio.ini file with the ESP8266 running at 160 MHz instead of 80 MHz (board_build.f_cpu = 160000000L)? I was thinking this could perhaps interfere with the timing.

@MichaelDvP
Collaborator

Yes, I'm running at 160 MHz, but I tested the timer before with dummy counts.
Have you tried mode 2 or 3? Mode 3 gives no echo, since it clears the FIFOs before sending the break. The timer-controlled mode 2 should echo.
With mode 3 I see a strange thing: the device answers, but ems-esp says there is no response. It only happens in mode 3.
ems:/ems# 000+00:10:01.547 D 653: [telegram] Sending read Tx [#101], telegram: 0B 88 33 00 20 78
ems:/ems# 000+00:10:01.670 D 654: [telegram] New Rx [#166] telegram, length 11
ems:/ems# 000+00:10:01.670 T 655: [emsesp] Boiler(0x08) -> Me(0x0B), UBAParameterWW(0x33), data: 08 FF 30 FB FF 28 FF 07 46 00 00
ems:/ems# 000+00:10:01.670 D 656: [emsdevice] Processing UBAParameterWW...
ems:/ems# 000+00:10:02.087 D 657: [telegram] Sending read Tx [#110], telegram: 0B 88 33 00 20 78
ems:/ems# 000+00:10:02.170 D 658: [telegram] New Rx [#167] telegram, length 11
ems:/ems# 000+00:10:02.170 T 659: [emsesp] Boiler(0x08) -> Me(0x0B), UBAParameterWW(0x33), data: 08 FF 30 FB FF 28 FF 07 46 00 00
ems:/ems# 000+00:10:02.170 D 660: [emsdevice] Processing UBAParameterWW...
ems:/ems# 000+00:10:02.182 D 661: [mqtt] Publishing topic stat/ems/boiler_data (#102, attempt #1, pid 1)
ems:/ems# 000+00:10:02.566 D 662: [telegram] Sending read Tx [#111], telegram: 0B 88 33 00 20 78
ems:/ems# 000+00:10:02.575 E 663: [emsesp] Last Tx operation failed after 3 retries. Ignoring request.
ems:/ems# 000+00:10:02.667 D 664: [telegram] New Rx [#168] telegram, length 11
ems:/ems# 000+00:10:02.667 T 665: [emsesp] Boiler(0x08) -> Me(0x0B), UBAParameterWW(0x33), data: 08 FF 30 FB FF 28 FF 07 46 00 00

@proddy
Collaborator Author

proddy commented Jun 13, 2020

Only tx_mode 1 is working. Modes 2, 3 and 4 give failures; not even a single Tx makes it through. I think I just need to bring out the scope and see what is actually happening on the line.

For tx_mode 3 this was a special modification I worked with @philrich on last year. See #103 (comment). The timings are slightly different for Junkers/HT3.

@MichaelDvP
Collaborator

In this comment he wrote:

For me this patch works and i get close to none Corrupted Telegrams when i set EMS_TX_WAIT_GAP to 7 Bits (728us). (3, 4, 5, 6 Bits also works, when setting this to 10 or more Bits i get many Corrupted Telegrams)

So we need a gap of 3-9 bit times for HT3, 10 for EMS+, and a low value for EMS 1.0?
I did this in the UART code and reverted mode 2; it's never good to change a working mode. Now there are:

  • modes 1..3 as before, except that I do not clear the buffers on break, so the echo shows up in Rx.

  • mode 4 as before, purely hardware-controlled.

  • mode 5, same as mode 4 but with 1.5 stop bits. I see in Support for Junkers CerastarComfort ZWR and Junkers Bosch CR 100 #103 that the time between master poll and master break is 1.5 bit times.

  • modes 11..30: timer-controlled modes with a configurable number of bit times. EMS+ should work with mode 20, HT3 with mode 17.

  • modes 6..10: same as modes 12, 14, 16, 18, 20, but with timer-controlled clearing of the break flag.

  • for the ESP32 I have also implemented timer modes: 4 and 6..30 work the same way, 1..3 are also timer modes, and 5 is the same as 4. I still have to look into how to change the stop bits.

All modes can be changed without a reboot. You know, I don't like modes 1-3, where the processor spins through millions of NOP loops waiting for Tx to finish. There are other useful things to do.

I've also merged your latest commit. Should i make a pr, or wait for feedback from hans?

In my system all modes work. Mode 1 has very few Rx CRC errors, and only if room control is activated and a room-control poll-ack comes right before an incoming telegram (it's always the UBAMonitorFast broadcast to all). Then I get a 19 as an additional first value in this telegram.
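
As a reading aid for the mode list in this comment, the mapping from mode number to gap length might be sketched as follows. The formula is inferred from the examples given (mode 17 for a 7-bit gap on HT3, mode 20 for a 10-bit gap on EMS+, modes 6..10 mirroring 12, 14, 16, 18, 20) and is an assumption, not the actual implementation. tx_gap_bits and tx_gap_us are hypothetical names, and the 104 µs bit time follows from the "7 Bits (728us)" figure quoted above.

```cpp
#include <cstdint>

constexpr uint32_t EMSUART_BIT_TIME = 104; // microseconds per bit at 9600 baud

// Hypothetical helper: gap in bit times for the timer-controlled modes.
// Returns 0 for modes outside 6..30, which have no timer-controlled gap in this scheme.
constexpr uint32_t tx_gap_bits(uint32_t tx_mode) {
    return (tx_mode >= 11 && tx_mode <= 30) ? tx_mode - 10      // assumed: mode 17 -> 7 bits, mode 20 -> 10 bits
         : (tx_mode >= 6 && tx_mode <= 10)  ? 2 * tx_mode - 10  // modes 6..10 use the gaps of modes 12, 14, ..., 20
                                            : 0;
}

// Hypothetical helper: the same gap in microseconds.
constexpr uint32_t tx_gap_us(uint32_t tx_mode) {
    return tx_gap_bits(tx_mode) * EMSUART_BIT_TIME;
}
```

Under this reading, mode 17 gives the 728 µs gap reported to work well for HT3, and mode 8 uses the same gap as mode 16.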

@proddy
Collaborator Author

proddy commented Jun 14, 2020

I like the approach. Plenty of modes to test with, and we can ask some of the Junkers users to also test. I'd say push the PR with the changes you made for Hans as you also fixed a lot of silly mistakes I made so thanks again for that. I agree the original rx/tx code is not efficient and blocking other important cycles and your approach is cleaner and makes use of the hardware interrupts.

@MichaelDvP
Collaborator

As long as you never need another timer: the ESP8266 has only 2 timers, one is used by the WiFi and the other one here. The ESP32 has more resources.

@proddy
Collaborator Author

proddy commented Jun 14, 2020

did a quick test, still only tx_mode 1 works on my system. I'll try all the other combinations later tonight.

@MichaelDvP
Collaborator

Something strange: it seems the timer is now used, blocked, or reconfigured by the logger for debug and trace. In both of those log modes Tx fails in all timer-controlled modes; with log level info the timer works, also if watch is on. As soon as I switch the log to info or lower, Tx works again, show emsbus counts no more errors, and I can see the reply to me in watch on/raw.

This log shows only the relevant lines.
20200614_163200.log

I checked the logs from before merging your latest changes: all modes work in debug/trace modes.

@proddy
Collaborator Author

proddy commented Jun 14, 2020

Hmm, can’t think of what that could be. And Log Level INFO with ‘watch on’ works you say? I’ll check the code when I’m back home.

@proddy
Collaborator Author

proddy commented Jun 15, 2020

did some more testing with ranges 4, 5, 6-10, 11-30 and only tx_mode 1 and tx_mode 5 work in my environment. See screenshots:

[Screenshot: Capture_1]

[Screenshot: Capture_5]

I can't see how the debug would affect the timings. It's best not to include any logger objects in the UART code as it may be blocking.

@MichaelDvP
Collaborator

Oh, good, mode 5 working.

My issues with mode 1 come from the poll. Poll uses tx_brk for mode 1 as well, but there is no delay in it for mode 1. I also found an issue with HT3.

I deleted logging from emsuart, but that doesn't help. It is the LOG_DEBUG(F("Sending.. near the transmit() call: if I delete these log calls, everything works. It seems the logger uses the timer for output. The ESP32 has more timers and no conflict.
I'll push and open a PR with the changes after testing. I've also adapted the ESP32 UART completely, but have not tested it yet.

@proddy
Collaborator Author

proddy commented Jun 15, 2020

tx_mode 5 uses the same code as tx_mode 1 (the legacy stuff) but with 1.5 stop bits?

I can't see why the logs would affect the timings, the LOG_DEBUG just appends the message to a message queue. I think these messages are important (for debugging!) so does it work if you move them after the transmit() ?

@MichaelDvP
Collaborator

No, mode 5 uses the same code as mode 4, but with 1.5 stop bits: everything is pushed into the FIFO, and the hardware sends it out and appends the break. The transmit function doesn't have to wait and returns instantly. I think it is the best mode for EMS 1.0.
Putting the Sending.. message after the transmit call doesn't help (I tried that), because in timer mode the sending happens in the background and isn't finished yet. There are other debug messages if Tx fails, and Last Tx read successful on receive, which don't conflict, since the Tx is finished by then. So there is debug feedback either way.

@proddy
Collaborator Author

proddy commented Jun 15, 2020

ah, I see what I did wrong now. I modified some of the esp8266 uart code, let me revert and test again. I don't expect tx_mode 5 will work sadly.

@proddy
Collaborator Author

proddy commented Jun 15, 2020

ok, tx_mode 5 doesn't work. Only tx_mode 1 on my setup.

proddy changed the title from "ESP8266 & ESP32 UART optimizations (tx_mode 2 and 4)" to "ESP8266 & ESP32 UART optimizations (tx_mode 4 and above)" on Jun 16, 2020
@MichaelDvP
Collaborator

I have tested a bit more and think I understand it better. I tried to measure the time from the start of sending to the complete echo being received, but with confusing results. I realized that you have modified get_uptime(), and the time is only updated in set_uptime once per loop cycle, so everything within one loop gets the same time (tx modes 1-3 execute in 0 ms ;-)). Is there a special reason for this modification?
I used ::millis() instead and can see the time.
The ESP UART seems to have a lag of one byte length when receiving a byte; maybe there is a shift register for Rx, and after the stop bit it is shifted to the FIFO in the next cycle. tx_mode 1 needs 16 ms for 6 bytes + break, same as tx_mode 2 or tx_mode 21. tx_mode 4 executes in 9 ms.
So it's 1 dummy + 6 + break for mode 4, and 6 dummy + 6 + break for mode 1. Your EMS 1.0 seems to be more EMS+ than mine. Try tx_mode 21 or 22 and see if it works for you.

Therefore I changed the modes once again, leaving 1..4, but from 5 on the mode number sets the timer delay between bytes: mode 5 is 1 byte time + 5 bit times until the next byte. If there is no delay needed, mode 4 will do; HT3 will work with 5, 6, 7; EMS+ with 20, 21, 22. My system responds to all modes, even 50 works.
In mode 1 I added back the timeout; without it, if there is no loopback from the bus, it will hang forever.

With the ESP32 I can't get the modes with delayMicroseconds working, but 4..50 work without any collision with the logger.

@proddy
Collaborator Author

proddy commented Jun 16, 2020

The reason for implementing get_uptime() is because I thought millis() was quite expensive and used many times in the code. I decided to control the timestamp once in the loop() cycle.

https://github.com/esp8266/Arduino/blob/1bfb29395f71da2caa5d14cbc6bdf8cf9c092d7a/cores/esp8266/core_esp8266_wiring.cpp#L166

I'll test your changes, thanks
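
The trade-off discussed here (a cheap cached timestamp versus a live clock read) can be illustrated with a small PC-side sketch. std::chrono stands in for Arduino's millis() so the example runs off-target, and the CachedUptime class is a hypothetical reconstruction of the idea rather than the firmware's code.

```cpp
#include <chrono>
#include <cstdint>

// Live clock, standing in for Arduino's millis(): milliseconds since the first call.
uint32_t live_millis() {
    using namespace std::chrono;
    static const steady_clock::time_point start = steady_clock::now();
    return static_cast<uint32_t>(duration_cast<milliseconds>(steady_clock::now() - start).count());
}

// Hypothetical cached clock that is refreshed once per main-loop pass.
class CachedUptime {
  public:
    // Call once at the top of loop().
    void set_uptime() { uptime_ms_ = live_millis(); }

    // Cheap to call, but the value is frozen until the next set_uptime().
    uint32_t get_uptime() const { return uptime_ms_; }

  private:
    uint32_t uptime_ms_ = 0;
};
```

Any two get_uptime() calls made within the same loop pass return the same value, which is why a blocking transmit measured this way appears to take 0 ms, while a live clock read shows the real duration.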

@proddy
Collaborator Author

proddy commented Jun 16, 2020

quickly tested. I couldn't get any of the modes to work. tx_mode 1 also now shows errors so I need to check what was changed

[telegram] Sending read Tx [#29], telegram: 0B 97 06 00 20 54
[emsesp] [DEBUG] Echo after 11 ms: 0B 99 F0 C0
[telegram] [DEBUG] New Tx [#62] telegram, length 1
[emsesp] [DEBUG] Last Tx operation failed. Retry #1. Sent: 0B 97 06 00 20, received: 89
[telegram] [DEBUG] New Rx [#80] telegram, message length 25
[emsdevice] Processing UBAMonitorFast...
[telegram] [DEBUG] New Rx [#81] telegram, message length 21
[emsdevice] Processing MC10Status...
[telegram] [DEBUG] New Rx [#82] telegram, message length 19
[emsdevice] Processing UBAMonitorWW...
[telegram] [DEBUG] New Rx [#83] telegram, message length 13
[mqtt] Publishing topic homeassistant/climate/ems-esp/state (#45, attempt #1, pid 1)
[telegram] Sending read Tx [#62], telegram: 0B 97 06 00 20 54
[emsesp] [DEBUG] Echo after 33 ms: 0B 9F F0 C8 D4 FC
[telegram] [DEBUG] New Tx [#63] telegram, length 1
[emsesp] [DEBUG] Last Tx operation failed. Retry #2. Sent: 0B 97 06 00 20, received: 89
[telegram] Sending read Tx [#63], telegram: 0B 97 06 00 20 54
[emsesp] [DEBUG] Echo after 12 ms: 0B 99 F0 C0
[emsesp] [DEBUG] Last Tx operation failed. Retry #0. Sent: 0B 97 06 00 20, received: 89
[emsesp] Last Tx operation failed after 3 retries. Ignoring request.

EMS Bus info:
  Tx mode: 1
  Bus protocol: Buderus
  #telegrams received: 257
  #read requests sent: 39
  #write requests sent: 0
  #corrupted telegrams: 0 (0%)
  #tx fails (after 3 retries): 23

Also I think it's time we have other people test to rule out something strange in my environment.

@MichaelDvP
Collaborator

The change in tx mode 1 since a19 is that the old timeout (22 bit times) is back and the poll-ack was changed to the same logic. Before, poll used tx_brk, but tx_brk has no wait defined for mode 1, which caused some CRC errors for me.
Strange that the echo is corrupted in that way. That's not a timing issue: the bits aren't shifted, they are scrambled. Bad contacts? Cold solder joints?

@MichaelDvP
Collaborator

Here is a log with nearly all modes on my system, you can see the different timings. The roomcontroller simulation is on, the 0xAF from 0x19 is from that function, so ems-esp answers all polls to 0x0B and 0x19.
20200616_165225_EMS.log
I'm surprised that the master does not interrupt the very long modes.

@proddy
Collaborator Author

proddy commented Jun 16, 2020

Nice and clean. You know, perhaps it's my circuit. I have a few older and newer ones which bbqkees gave me to try. I'll go back to an earlier board and try the jack interface with a shorter cable and see if it makes any difference.

@proddy
Collaborator Author

proddy commented Jun 16, 2020

@bbqkees we have a similar setup. When you find time could you grab the latest v2 and test tx_mode 1, 4 and 5? Instructions are in https://github.com/proddy/EMS-ESP/tree/v2#uploading-the-firmware

@bbqkees

bbqkees commented Jun 17, 2020

I'm already running v2 with tx_mode 1 on a Premium II Gateway with a Lolin ESP8266.

This is from about 12 hours or so:
EMS Bus info:
  Tx mode: 1
  Bus protocol: Buderus
  #telegrams received: 24713
  #read requests sent: 4522
  #write requests sent: 0
  #corrupted telegrams: 0 (0%)
  #tx fails (after 3 retries): 0

Will do some logging when I'm at home.

@proddy
Collaborator Author

proddy commented Jun 17, 2020

I'm already running v2 with tx_mode 1 on a Premium II Gateway with a Lolin ESP8266.

is this using the latest a21 version with Michael's new tx modes?

@bbqkees

bbqkees commented Jun 17, 2020

No you're too fast for me :-).
Its on a19 from yesterday morning. Will update it tonight.

@bbqkees

bbqkees commented Jun 17, 2020

on a21 all modes give TX errors.
log-mode-1-4-5-bbqkees-17062020.log

@MichaelDvP
Collaborator

Oh, I'll look at what changed in mode 1; I thought nothing was changed. Can you please also try mode 10 or 12, with longer timer delays?

@joanwa
Contributor

joanwa commented Jul 27, 2020

For my EMS2 system, the fewest errors with b8 now apparently occur with tx_mode 22 (tried 0-5, 18, 22). On b7, tx_mode 2 and 3 didn't produce any errors, but on b8 they now occur quite regularly.

@proddy
Collaborator Author

proddy commented Jul 27, 2020

@MichaelDvP as mentioned above. I think for tx_mode 1 we stick to the new EMSUART_TX_WAIT_BRK of 10 bits as in 2.0.0.b8 now and for tx_mode 2 go back to the previous 11 bits. Is that worth trying? I can't see any other changes between b7 and b8 that would give Johannes errors on his EMS2 system.

@joanwa are you comfortable changing a file, building and upload the firmware yourself?

@proddy
Collaborator Author

proddy commented Jul 27, 2020

One thing I did notice is that the display of the boiler turns on sometimes and the boiler restarts.
But the ESP never reboots. So it seems it locks up the bus occasionally.

@bbqkees when the boiler starts you'll see a UBAMaintenanceData message appear in the console (telegram type 0x15). You could watch for this telegram in a console to see how often it happens (watch raw 15) or use syslog and check the logs the next day.

@joanwa
Contributor

joanwa commented Jul 27, 2020

@MichaelDvP as mentioned above. I think for tx_mode 1 we stick to the new EMSUART_TX_WAIT_BRK of 10 bits as in 2.0.0.b8 now and for tx_mode 2 go back to the previous 11 bits. Is that worth trying? I can't see any other changes between b7 and b8 that would give Johannes errors on his EMS2 system.

@joanwa are you comfortable changing a file, building and upload the firmware yourself?

Yes, I already forked and committed some cosmetic changes in my fork. I just saw that you built a newer b8 binary. Shall I give this a go? Or what are you interested in that I can build myself?

@proddy
Collaborator Author

proddy commented Jul 27, 2020

Or what are you interested in that I can build myself?

change EMSUART_TX_WAIT_BRK back to using 11 bits in emsuart_esp8266.h

#define EMSUART_TX_WAIT_BRK (EMSUART_TX_BIT_TIME * 11)

and test with tx_mode 2

@MichaelDvP
Collaborator

@MichaelDvP as mentioned above. I think for tx_mode 1 we stick to the new EMSUART_TX_WAIT_BRK of 10 bits as in 2.0.0.b8 now and for tx_mode 2 go back to the previous 11 bits. Is that worth trying?

Yes, I updated the uart branch. So far we know:

  • mode 1 works better with a break of less than 11 bits
  • Junkers needs a break of more than 10 bits
  • EMS+ is unknown, but going by v1.9, 11 bits should work.

For the timer modes I set the break to a bit more than 10 bits; maybe this works in both worlds.
I've also tested the ESP32 and ESP8266 with a logic analyzer now and checked the timing.

@joanwa
Contributor

joanwa commented Jul 28, 2020

Or what are you interested in that I can build myself?

change EMSUART_TX_WAIT_BRK back to using 11 bits in emsuart_esp8266.h

#define EMSUART_TX_WAIT_BRK (EMSUART_TX_BIT_TIME * 11)

and test with tx_mode 2

I switched to EMSUART_TX_BIT_TIME * 11 on the commit that is now the b9 build.
tx_mode 2 still produces Tx read errors. On a side note, not sure if it is related: I think since using your second b8 build, USB power doesn't work anymore. The ESP basically starts up, blinks once or a few times, and then the LED goes dark and stays that way, and the ESP becomes unreachable. On bus power it works.
The issue is that only on USB power did I have 0 Tx read errors. Since I now have to use bus power, I can't rule out that this is responsible for the Tx read errors...

Update: tx_mode 1 appears to produce no Tx read errors. tx_mode 1 never worked for me before:

┌──────────────────────────────────────────┐
│ EMS-ESP version 2.0.0b9_joanwa           │
│ https://github.com/proddy/EMS-ESP        │
│                                          │
│ type help to show available commands     │
└──────────────────────────────────────────┘

ems-esp:/$ show ems
EMS Bus is connected.

EMS Bus info:
  Tx mode: 1
  Bus protocol: HT3
  #telegrams received: 364
  #read requests sent: 73
  #write requests sent: 0
  #corrupted telegrams: 0 (0%)
  #tx fails (after 3 retries): 0

Rx Queue is empty

Tx Queue is empty

@proddy
Collaborator Author

proddy commented Jul 28, 2020

ok that looks promising, although not sure why tx_mode 1 works better than tx_mode 2 on your HT3 system. I'll look into the USB/LED issue tonight.

@MichaelDvP tx_mode 1 works fine with both the ESP8266 and ESP32 on my EMS1.0 system. This is bus-powered. When I plug in USB from the desktop PC (which is a stable 5 V) I start to see errors. My focus is on bus-powered to be compliant with BBQKees's independent circuits. I think now we need some Buderus EMS2.0 and more Junkers/HT3 guinea pigs to try this out.

@MichaelDvP
Collaborator

@proddy if you power from USB, do you have the DC-DC converter removed from the gateway? If it is powered from both sides it could cause strange effects.

@proddy
Collaborator Author

proddy commented Jul 29, 2020

@proddy if you power from USB, do you have the DC-DC converter removed from the gateway? If it is powered from both sides it could cause strange effects.

yes, the buck is removed in the ESP32. Ideally I would need to do a logic analyzer scan of bus-powered vs USB (via a powerbank) and compare the charts. And also do a test with the mqtt disabled and then later the complete wifi to see if the wifi is somehow affecting it. And lastly maybe we should look into isolating some of the commands to a single-core on the ESP32 (using SemaphoreHandle_t).

@bbqkees

bbqkees commented Jul 30, 2020

I'm now on v2.0.0b9 (not the uart one) and mode 1 works fine with zero errors.
As a test I uploaded the bin file via the web interface of b7 and that went smoothly.

@proddy
Collaborator Author

proddy commented Jul 30, 2020

I've been running ESP8266, bus-powered on tx_mode 2 with 0 errors for the last 24hrs. Weird.

@proddy
Collaborator Author

proddy commented Aug 14, 2020

On ESP8266, TxMode 1, USB-powered, I'm getting a 100% success rate with Rx and Tx. But as soon as I change it to bus-powered I get the odd corrupted/incomplete Rx, e.g.

[telegram] Rx: 17 0B A8 00 01 00 FF F6 01 06 00 01 0D 01 00 FF FF 01 02 02 02 00 00 05 1F 05 1F 02 0E 00 90 (CRC 90 != 10) ERROR

because it's missing the last byte of the telegram. A correct telegram looks like:

[telegram] Rx: 17 0B A8 00 01 00 FF F6 01 06 00 01 0D 01 00 FF FF 01 02 02 02 00 00 05 1F 05 1F 02 0E 00 FF DF (notice the extra byte) OK

@MichaelDvP this is since the last UART updates from d6c5321

@bbqkees this is also what you've been experiencing testing the latest v2_cmd branch.

I'll make a capture from USB vs bus-powered with the logic analyzer to see what the differences are.
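
To make the "CRC xx != yy" check in these log lines concrete, here is a minimal sketch of a telegram checksum test. The shift-and-XOR algorithm with the constant 0x0C is an assumption that is not stated anywhere in this thread; it does reproduce the trailing checksum bytes 44, 54 and 78 of the short read requests logged earlier (0B 97 02 00 20 44, 0B 97 06 00 20 54 and 0B 88 33 00 20 78). The function names ems_crc and telegram_crc_ok are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed checksum: rotate the running value left by one bit (XOR-ing 0x0C in first
// when the top bit is set), then XOR in the next data byte.
uint8_t ems_crc(const uint8_t* data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        uint8_t carry = 0;
        if (crc & 0x80) {
            crc ^= 0x0C;
            carry = 1;
        }
        crc = static_cast<uint8_t>((crc << 1) | carry);
        crc ^= data[i];
    }
    return crc;
}

// True if the last byte of the telegram equals the checksum of all bytes before it.
bool telegram_crc_ok(const uint8_t* telegram, size_t len) {
    return len >= 2 && ems_crc(telegram, len - 1) == telegram[len - 1];
}
```

A telegram that loses its final byte, as described above, fails this check because the last data byte is then compared against the checksum of the bytes before it.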

@MichaelDvP
Collaborator

Are you sure it's the UART update? There is no change in Rx, and I used it before without CRC errors, but now I also get some (3 in 36 h). I'm now logging every CRC error to syslog to see if there is a pattern (same src/dest/type).
The UART change was only a different break length for the tx modes due to this

@proddy
Collaborator Author

proddy commented Aug 15, 2020

I'm seeing this also with USB power (only 5 incompletes in 20hrs) and it only started to happen in the last days when I pushed the big change with commands (v2_cmd), so I agree I don't think it is related to these UART changes. I'm tracing too to syslog to find a pattern

@proddy
Collaborator Author

proddy commented Aug 16, 2020

I have a suspicion it's due to the NTP server which I re-enabled for the ESP8266 build. When removed I'm not getting Rx errors but it's only been running for the last 2 hrs so need to leave it a while longer to confirm. It may be that the lwip and sntp service is conflicting with the interrupts we use in the UART.

@MichaelDvP
Collaborator

I had 2 errors yesterday with very strange telegrams; the thermostat doesn't send these messages.

Aug 15 13:34:54 ems - 000+00:00:17.350 E 3: [telegram] Rx: 10 08 19 00 01 08 01 04 01 05 FF FF FF 00 02 0F 48 09 8A E7 00 00 00 09 02 83 02 04 49 7D 00 1D #033[0;31m(CRC 1D != 1F)#033[0m
Aug 15 15:00:54 ems - 000+01:26:18.051 E 449: [telegram] Rx: 10 08 18 1B 1A 08 00 34 00 30 02 72 7D 00 21 00 00 03 00 01 19 45 00 18 B8 D7 #033[0;31m(CRC D7 != A2)#033[0m

Then I updated platformio and the framework and have had no errors since (NTP activated, tx_mode 1, virtual roomcontrol thermostat activated). Maybe it was something in the framework.

@proddy
Collaborator Author

proddy commented Aug 19, 2020

@MichaelDvP just tried the latest v2 (which was v2_cmd) on an ESP32 and it crashes. Error says "Debug exception reason: Stack canary watchpoint triggered (emsuart_recvTas)" pointing to somewhere in the UART code. Are you seeing the same?

@MichaelDvP
Collaborator

Tried the latest v2 code. When connected to the bus the esp32 restarts after around half a minute.

@proddy
Collaborator Author

proddy commented Aug 19, 2020

Same here. Something in the Tx code I think, because when I use tx_mode 0 it works. My first thought was that the dual cores are calling the callback functions too fast at the same time, now that there is no Rx queue. So we may need to stick in some mutex calls (or use std::atomic). Still investigating.

@MichaelDvP
Collaborator

I moved process_telegram back to the main thread, but without queue, and esp32 worked without crashes. Since process_telegram also calls a lot of other stuff like mqtt, i think it's better to keep recvTask fast and clean.

void RxService::loop() {
    if (telegram_rx_) {
        (void)EMSESP::process_telegram(rx_telegram); // further process the telegram
        increment_telegram_count();                  // increase count
        telegram_rx_ = false;
    }
}
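
The pattern above (the receive task only raises a flag, the main loop does the heavy processing) combined with the std::atomic idea mentioned earlier could look roughly like this. It is a sketch under stated assumptions, not the project's code: RxMailbox, publish and poll are hypothetical names, the 32-byte buffer size is an assumption, and the scheme only holds for exactly one producer and one consumer.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical single-slot mailbox between a receive task (producer) and the main loop (consumer).
class RxMailbox {
  public:
    // Called from the receive task: copy the bytes and raise the flag.
    // Returns false (telegram dropped) if it does not fit or the slot is still occupied.
    bool publish(const uint8_t* data, size_t len) {
        if (len > buffer_.size() || ready_.load(std::memory_order_acquire)) {
            return false;
        }
        std::memcpy(buffer_.data(), data, len);
        length_ = len;
        ready_.store(true, std::memory_order_release); // makes the copied bytes visible to the consumer
        return true;
    }

    // Called from the main loop: run the handler if a telegram is waiting, then free the slot.
    template <typename Handler>
    bool poll(Handler&& handle) {
        if (!ready_.load(std::memory_order_acquire)) {
            return false;
        }
        handle(buffer_.data(), length_);
        ready_.store(false, std::memory_order_release); // slot may be reused by the producer
        return true;
    }

  private:
    std::array<uint8_t, 32> buffer_{}; // assumed maximum telegram length
    size_t length_ = 0;
    std::atomic<bool> ready_{false};
};
```

The receive task stays short because it only copies bytes and sets a flag, while MQTT publishing and other slow work run from the main loop inside the handler. The cost of having no queue is that a telegram arriving before the previous one was processed is dropped.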

@proddy
Collaborator Author

proddy commented Aug 20, 2020

thanks Michael. I'll try this too and push a fix

@proddy
Collaborator Author

proddy commented Aug 20, 2020

ok, back to a working v2 on ESP32 (thanks Michael!) and seeing some errors with bus-powered vs USB-powered (tx mode 1). Running for ~30 minutes below are the stats:

via usb: all is good

[Screenshot: usb-p]

via bus-powered: not so good. Both Rx and Tx fails.

[Screenshot: bus-p]

@proddy proddy closed this as completed Oct 18, 2020