UART0 freezing RTL8710 + flash layout talks #91

Closed
hn opened this issue Feb 25, 2023 · 50 comments
Labels
bug Something isn't working

Comments

@hn
Contributor

hn commented Feb 25, 2023

I'm trying to replace the stock (AliOs) firmware of a Ginlong Solis Solar Inverter. The device is based on the EMW3080 MCU, which is an (identical?) clone of the RTL8710BN (my project page with datasheet, more info on the MCU, link to AliOs EMW3080).

libretuya is booting (fantastic work, thanks!), WiFi and GPIOs for reset pin and status LEDs seem to work fine (basic tests done: WiFi connect/DHCP, button press, blinking LEDs, ...).

If I try to use hardware serial port 0 (for interfacing modbus later), the system freezes, e.g. with plain libretuya:

void setup() {
  Serial.begin(115200);
  Serial.println("Started log serial (uart2)");

  Serial0.begin(9600);
  Serial.println("Started modbus serial (uart0)");
}

The MCU freezes at Serial0.begin; 'Started modbus serial' is never reached (and the main loop never starts either).

Same test with libretuya-esphome:

libretuya:
  board: generic-rtl8710bn-2mb-788k
  framework:
    version: latest

[... WiFi setup ...]

binary_sensor:
  - platform: gpio
    pin: PA08
    name: "Reset button"

#uart:
#  id: uart_0
#  tx_pin: PA23
#  rx_pin: PA18
#  baud_rate: 9600

If I uncomment the 'uart' block, the MCU freezes at the 'Setting up UART...' log output.

Any help on how to fix or debug this is welcome!

@kuba2k2
Member

kuba2k2 commented Feb 25, 2023

Can you add some printf's in SerialClass::begin to see where exactly it halts? The file is libretuya/arduino/realtek-ambz/cores/arduino/SerialClass.cpp. It might be something related to the IRQs, maybe.

@hn
Contributor Author

hn commented Feb 25, 2023

Thanks for your quick answer!

You can see a correct init for the log uart2 first, then uart0 freezes:

I [      0.000] LibreTuya v0.12.6 on generic-rtl8710bn-2mb-788k, compiled at Feb 23 2023 15:45:47
I [      0.000] Reset reason: 0
SerialClass::begin before 'UART_InitTypeDef cfg'
SerialClass::begin before 'UART_Init(data.uart'
SerialClass::begin before 'UART_SetBaud('
SerialClass::begin before 'if (data.buf)'
SerialClass::begin before 'Pinmux_Config'
SerialClass::begin before 'VECTOR_IrqUnRegister'
Started log serial (uart2)
SerialClass::begin before 'UART_InitTypeDef cfg'
SerialClass::begin before 'UART_Init(data.uart'

Seems to freeze at 'UART_Init':

        uint8_t stopBits = (config & SERIAL_STOP_BIT_MASK) == SERIAL_STOP_BIT_2;

        Serial2.println("SerialClass::begin before 'UART_InitTypeDef cfg'");

        UART_InitTypeDef cfg;
        UART_StructInit(&cfg);
        cfg.WordLen    = dataWidth;
        cfg.Parity     = parity;
        cfg.ParityType = parityType;
        cfg.StopBit    = stopBits;

        Serial2.println("SerialClass::begin before 'UART_Init(data.uart'");

        UART_Init(data.uart, &cfg);

        Serial2.println("SerialClass::begin before 'UART_SetBaud('");

        UART_SetBaud(data.uart, baudrate);

        Serial2.println("SerialClass::begin before 'if (data.buf)'");

        if (data.buf) {

@kuba2k2
Member

kuba2k2 commented Mar 5, 2023

Found the issue - turns out that I probably haven't tested Serial on RTL at all... there was a system call missing. Also, I found out that Serial interrupts (reading data) didn't work, so I'm fixing that too.

The fix will be included in the structure-refactor branch.

@kuba2k2 kuba2k2 added the bug Something isn't working label Mar 5, 2023
@hn
Contributor Author

hn commented Mar 5, 2023

Thanks! Is there an easy way to test the fix? If I specify platform = https://github.com/kuba2k2/libretuya.git#structure-refactor in platformio.ini, pio run fails with:

File ".platformio/penv/lib/python3.9/site-packages/ltchiptool/models/family.py", line 28, in __init__
    for key, value in data.items():
AttributeError: 'str' object has no attribute 'items'

(side note: how to specify a platform branch in esphome.yaml?)

@kuba2k2
Member

kuba2k2 commented Mar 5, 2023

Yeah, you'd need to git clone the ltchiptool repository on the refactor branch, and install it inside the PlatformIO virtualenv. Not that straightforward, but doable.

@hn
Contributor Author

hn commented Mar 6, 2023

Hmm, that ltchiptool thing completely broke my platformio installation :) In the end it was easier to backport the essential lines of 046f7df and voila:

SerialClass::begin fix active
Started modbus serial (uart0)
Hello World!

It no longer crashes! At the time of writing, I don't have a Modbus slave device available, so I'll have to test later to see if any data is actually flowing.

@kuba2k2
Member

kuba2k2 commented Mar 6, 2023

The installation will be automatic when I release this. So it won't break platformio installations :)

@kuba2k2
Member

kuba2k2 commented Mar 13, 2023

Hi @hn
I saw the mention in hn/ginlong-solis@a67e2cc. Since the structure refactor won't be out for at least a few more days, I've backported the fix on master branch as well 😄 Compilation works, runtime should work too, but I didn't have time to test it.

EDIT: let's keep the issue open so that I remember to add that "MX1290" to the list of supported chips.

@hn
Contributor Author

hn commented Mar 14, 2023

Thanks @kuba2k2 ! I removed the "UART fix" hint.

Things stay a little bit shaky with the EMW3080, e.g. the WiFi AP mode and/or captive portal don't seem to work (probably this is issue #13). I've no time to dig in deeper right now but I'll report later if problems persist.

@kuba2k2
Member

kuba2k2 commented Mar 14, 2023

Yes, that's #13, a lucky one. For some reason AP mode works just fine when used in a simple Arduino sketch, but not in ESPHome.

@hn
Contributor Author

hn commented Mar 14, 2023

The ESPhome AP seemed to collapse after some seconds, didn't work at all. With plain arduino AP mode via WiFi.softAP(ssid) my phone connected but did not get an IP address via DHCP.

This is what I remember from one single test last week, no details today :)

@kuba2k2
Member

kuba2k2 commented Mar 14, 2023

That's the second part of the issue. On esphome it's just not visible at all, and on Arduino it is but DHCP sometimes doesn't work.

@hn
Contributor Author

hn commented Mar 17, 2023

Hehe, I experienced very strange errors with some received ModBus messages. After further investigating I saw that zero (0x00) bytes were missing from the input stream. After checking cabling, beating up the ModBus lib and so on I noticed that the following fix is simply missing from your backport:

-	UART_CharGet(data->uart, &c);
-	if (c)
+	while (UART_Readable(data->uart)) {
+		UART_CharGet(data->uart, &c);
 		data->buf->store_char(c);
+	}
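
The effect of this patch can be reproduced off-target. The following is a minimal host-side sketch, not the LibreTuya code: FakeUart, handlerOld, handlerNew and receive are hypothetical stand-ins for the hardware FIFO and the two versions of the receive handler, and it assumes the handler runs once per received byte.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the hardware RX FIFO (not a Realtek SDK type).
struct FakeUart {
    std::deque<uint8_t> fifo;
    bool readable() const { return !fifo.empty(); }
    uint8_t get() {
        uint8_t c = fifo.front();
        fifo.pop_front();
        return c;
    }
};

// Old behaviour: read one byte and store it only if it is non-zero.
void handlerOld(FakeUart &uart, std::vector<uint8_t> &buf) {
    uint8_t c = uart.get();
    if (c)
        buf.push_back(c);
}

// Patched behaviour: drain the FIFO and store every byte, including 0x00.
void handlerNew(FakeUart &uart, std::vector<uint8_t> &buf) {
    while (uart.readable())
        buf.push_back(uart.get());
}

// Push a frame through one of the handlers, one "interrupt" per byte,
// and return what would end up in the receive buffer.
std::vector<uint8_t> receive(const std::vector<uint8_t> &frame, bool patched) {
    FakeUart uart;
    std::vector<uint8_t> buf;
    for (uint8_t b : frame) {
        uart.fifo.push_back(b);
        if (patched)
            handlerNew(uart, buf);
        else
            handlerOld(uart, buf);
    }
    return buf;
}
```

With a frame such as 01 04 02 00 01 78 F0, the old handler returns six bytes (the 0x00 is gone) and the patched one returns all seven, which matches the symptom of missing zero bytes.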

@kuba2k2
Member

kuba2k2 commented Mar 18, 2023

Ah, yes, sorry for that. Will you be fine having that fix only locally for now? I don't want to clutter the master branch, and I think the refactor is close to being finished.

@hn
Contributor Author

hn commented Mar 19, 2023

For me it's fine, the inverter reliably reads various ModBus registers and pushes data to ESPhome / HA, that's really great!

Side note: OTA seems to be broken? Logs are ok

Starting OTA Update ...
OTA in progress: 0.1%
OTA in progress: 36.4%
OTA in progress: 72.5%
OTA update finished!
Rebooting safely...

But no new firmware is showing up. So I'm still working with serial uploads.

@kuba2k2
Member

kuba2k2 commented Mar 19, 2023

Not really, I don't recall it being broken. Does your board have 2MB of flash? Can you dump it via serial and post it here? You can mask out your SSID/pass as long as you keep it the same length.

@hn
Contributor Author

hn commented Mar 20, 2023

I have to double-check the OTA thing. I think it might be related to the fact that so far I have only flashed image_0x00B000.ota1.bin to address 0xb000 (so the rest of the flash is still stock AliOs and the other parts of the UF2 are missing). It is probably a good idea to flash the complete UF2 with ltchiptool to get OTA working, right?

@kuba2k2
Member

kuba2k2 commented Mar 20, 2023

Yes, you should always flash the UF2, unless you want to break something. There's some information at 0x9000 that indicates the OTA2 address, and it's possible that yours is different from the standard 0xD0000.

@hn
Contributor Author

hn commented Mar 20, 2023

Hm, AliOs OTA2 seems to be at 0x100000 while LibreTuya expects 0xD0000:

I: Detected file type: UF2 - esphome 2023.3.0-dev
I: Connecting to 'Realtek AmebaZ' on /dev/ttyUSB0 @ 1500000
I: |-- Success! Chip info: Realtek RTL87xxB
I: Writing 'firmware.uf2'
I: |-- esphome 2023.3.0-dev @ 2023-03-20 10:22:10 -> generic-rtl8710bn-2mb-788k
E: ValueError: Invalid OTA2 address on chip - found 1048576, expected 851968
E: |-- File ".local/lib/python3.9/site-packages/ltchiptool/soc/ambz/flash.py", line 152, in flash_write_uf2

Is there a way to conveniently "force" ltchiptool to overwrite system at 0x9000 or do I have to manually read->change->write the system block?

PS: From the dump I see that the ESPhome OTA routine has placed the image at 0xD0000 so this (very first) part of the update process seems to be successful.

@kuba2k2
Member

kuba2k2 commented Mar 20, 2023

I believe generic-rtl8710bx-4mb-980k has exactly that offset. I presume your flash is 4MB then. You might want to override the MCU name and clock frequency in PlatformIO options.

@hn
Contributor Author

hn commented Mar 20, 2023

Hm, the datasheet says "2M bytes XIP flash". On the other hand, the reconstructed partition table lists some addresses above 0x200000. Hard to believe that they put in more memory than printed in the datasheet ...

@kuba2k2
Member

kuba2k2 commented Mar 20, 2023

That partition table doesn't look real. There's no such thing as "recovery", "OTA storage" on AmebaZ, and the application is always at 0xB000. Do you have a full flash dump of the stock firmware that you can post?

@kuba2k2
Member

kuba2k2 commented Mar 20, 2023

Ah, I see. The "2ndboot" is what's stored at 0xB000, and it probably performs tasks like OTA unpackaging (from 0x150000) and booting the main app (from 0x19000), hence it's called "recovery". Nevertheless, if you've been flashing to 0xB000 manually, the 2ndboot is long gone, so only Realtek's bootloader remains.

Now, when you flash an OTA update from a UF2 file (through web_server), it gets written to 0xD0000. The "system" partition is then switched to boot from OTA2, but since there's nothing valid at 0x100000, the Realtek bootloader jumps back to 0xB000.

LibreTuya, when used as an Arduino framework, has an LT.getFlashChipSize() function that can be used to retrieve the actual size.

@hn
Contributor Author

hn commented Mar 20, 2023

The AliOs guys have added their own layer on top of the AmebaZ standard. Their "application" is just the "2ndboot boot loader", which has its own AliOs update mechanism. I wiped out their stuff by overwriting 2ndboot.

You can see their partition table

I'll patch system part to 0xd0000 and see what happens then.

@hn
Contributor Author

hn commented Mar 20, 2023

With patched system block the OTA seems to work :-)

I'm still confused with the 2MB/4MB thing, though. You can read from 0x2a2000 (WiFi credentials), 0x2a4000 (AliOs cloud credentials) and so on. But 0x2a0000 is not within 2MB ...

I had to use Python2-rtltool to upload the system block, ltchiptool fails with:

$ ltchiptool flash write -f RTL8710B -s 0x9000 systempart-d0000.bin
I: Available COM ports:
I: |-- ttyUSB0 - FT232R USB UART - FT232R USB UART - VID=0403 (FTDI), PID=6001
I: |   |-- Selecting this port. To override, use -d/--device
I: |-- ttyUSB1 - USB Serial - VID=1A86 (None), PID=7523
I: |-- ttyAMA0 - ttyAMA0 - HWID=3f201000.serial
C: Unknown error in parameter processing logic

@kuba2k2
Member

kuba2k2 commented Mar 20, 2023

Oops, thanks for the error report, fixed in v3.0.3.

If you can read at these addresses, the flash is most definitely 4 MiB. Also, an OTA partition of 0x100000 size wouldn't fit on 2 MiB flash, as there's another OTA partition to fit as well.

@hn
Contributor Author

hn commented Mar 22, 2023

Hm, things are getting even stranger. Playing with the Arduino LT.getFlashChipSize() and related functions shows:

Started log serial (uart2)
getBoard: generic-rtl8710bn-2mb-788k
getChipCoreType: ARM Cortex-M4F
getChipCores: 1
getChipModel: RTL8710BN
getFlashChipSize: 8388608
getRamSize: 262144
getFlashChipId->chipId: 64
getFlashChipId->chipSizeId: 23
getFlashChipId->manufacturerId: 104    // = 0x68 => SPI flash 'Boya Microelectronics Inc' ?

So do we have an 8MB flash? I did not find a quick way (yet) to map the getFlashChipId() values to anything meaningful.

I dumped 10MB flash address space with rtltool and after 0x800000 the same byte pattern appears. Somewhat reinforces the idea that it really is 8MB flash.
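
The wrap-around check described above can be sketched as follows. This is only an illustration of the idea, assuming the flash address space mirrors at a power-of-two boundary; detectMirrorSize is a hypothetical helper, not part of rtltool or ltchiptool.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the smallest power-of-two size at which the dump starts repeating
// itself (dump[i] == dump[i + size] for every i that fits), or 0 if the dump
// shows no such repetition.
std::size_t detectMirrorSize(const std::vector<uint8_t> &dump) {
    for (std::size_t size = 1; size < dump.size(); size *= 2) {
        bool mirrored = true;
        for (std::size_t i = 0; i + size < dump.size(); ++i) {
            if (dump[i] != dump[i + size]) {
                mirrored = false;
                break;
            }
        }
        if (mirrored)
            return size;
    }
    return 0;
}
```

Note that the result is only meaningful if the dump is longer than the real flash and the content is not trivially periodic (a fully erased chip reading all 0xFF would be reported as size 1).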

ltchiptool does not allow dumping more than 2MB for this chip:

$ ltchiptool flash read -l 4194304 RTL8710B dummy-dump-4m.bin
I: Connecting to 'Realtek AmebaZ' on /dev/ttyUSB0 @ 1500000
I: |-- Success! Chip info: Realtek RTL87xxB
E: ValueError: Reading length 4 MiB @ 0x0 is more than chip capacity (2 MiB)

@hn
Contributor Author

hn commented Mar 22, 2023

Anyway, using this ESPhome config

esphome:
  name: solis-inv
  platformio_options:
    board_build.mcu: rtl8710bn
    board_build.f_cpu: 125000000L

libretuya:
  board: generic-rtl8710bx-4mb-980k
  framework:
    version: latest

I can successfully upload and OTA-update the EMW3080 (with 0x9000 system block set back to stock AliOS OTA2=0x100000).

@kuba2k2
Member

kuba2k2 commented Mar 22, 2023

So do we have an 8MB flash? I did not find a quick way (yet) to map the getFlashChipId() values to anything meaningful.

It appears so, yes. The chipSizeId is a bit shift operation specifier, and the size is usually calculated from 1 << chipSizeId (that's how LT.getFlashChipSize() does this).
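
As a small worked example of that shift (a sketch only, not the actual LT.getFlashChipSize() source; flashSizeFromId is a hypothetical name):

```cpp
#include <cstdint>

// Capacity in bytes implied by the size ID, i.e. 2^chipSizeId.
// Returns 0 for IDs that cannot be represented in 32 bits.
uint32_t flashSizeFromId(uint8_t chipSizeId) {
    if (chipSizeId >= 32)
        return 0;
    return UINT32_C(1) << chipSizeId;
}
```

For the chipSizeId of 23 reported above this gives 8388608 bytes (8 MiB), the same value getFlashChipSize printed; a 2 MiB part would report 21 and a 4 MiB part 22.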

Most of other LT.get...() functions are actually hardcoded (like the core type or RAM size).

About ltchiptool - I didn't find a reliable way to detect the flash size during download mode, yet. Maybe I'll just skip the size validity check at some point.

I'll also try to add an option to customize flash partition layout, so that one can use any compatible board and just change whatever they need in the flash layout. Maybe I'll include that in the refactor, but most likely it'll be later.

@hn
Contributor Author

hn commented Mar 24, 2023

About ltchiptool - I didn't find a reliable way to detect the flash size during download mode, yet. Maybe I'll just skip the size validity check at some point.

I would vote to change the hard check to a warning. If someone explicitly sets start address or length, they should know what they are doing.

I'll also try to add an option to customize flash partition layout, so that one can use any compatible board and just change whatever they need in the flash layout. Maybe I'll include that in the refactor, but most likely it'll be later.

Nice. Maybe even provide a way to add local MCU definitions inherited from the official ones. I don't know how this would fit into the ESPhome environment, but it should be doable somehow.

@kuba2k2
Member

kuba2k2 commented Mar 24, 2023

I would vote to change the hard check to a warning. If someone explicitly sets start address or length, they should know what they are doing.

Will do.

Nice. Maybe even provide a way to add local MCU definitions inherited from the official ones. I don't know how this would fit into the ESPhome environment, but it should be doable somehow.

What do you mean by this? I'm not sure what MCU definitions are, or which ones are the official ones.

@hn
Contributor Author

hn commented Mar 24, 2023

What do you mean by this? I'm not sure what MCU definitions are, or which ones are the official ones.

I was roughly thinking of a local emw3080-8mb.json file where I can set MCU manufacturer name, cpu speed, pins, flash layout etc. This file inherits everything from one of your 'official' jsons, e.g. generic-rtl8710bn-2mb-788k.json. Don't know if this is a good idea.

@kuba2k2
Member

kuba2k2 commented Mar 24, 2023

So something like board_build.mcu: rtl8710bn but in a JSON? That could work, if done well, but surely ESPHome support for that would be a bit problematic. LT/PlatformIO has no idea about ESPHome YAMLs and their path. ESPHome just generates PIO files that are then built by LT.

@kuba2k2
Member

kuba2k2 commented Mar 25, 2023

I figured that something like this should be okay:
[screenshot]

With this, you can override any parameter of the board JSON, either one by one in platformio.ini, or with your own JSON having multiple parameters.

Flash partitions can also be customized, using the shorthand option custom_flash.xxxx = offset. Specifying that in JSON is also possible, but then it won't recalculate the flash lengths for you. So, if you change, let's say, app to 0x140000 like I did, it will overlap with download which starts at 0x132000 and goes for 0xA6000 bytes (note that on the screenshot it's automatically recalculated to 0xE000).
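
The overlap and the recalculated length can be checked with a few lines of arithmetic. This is a minimal sketch, not the LibreTuya build logic; Partition, overlaps and lengthUpTo are hypothetical names, and only the offsets and the 0xA6000 length come from the description above.

```cpp
#include <cstdint>

struct Partition {
    uint32_t offset;
    uint32_t length;
    // First address past the end of the partition.
    uint32_t end() const { return offset + length; }
};

// True if the two address ranges share at least one byte.
bool overlaps(const Partition &a, const Partition &b) {
    return a.offset < b.end() && b.offset < a.end();
}

// Length that makes `p` end exactly where `next` begins (0 if `next` starts before `p`).
uint32_t lengthUpTo(const Partition &p, const Partition &next) {
    return next.offset > p.offset ? next.offset - p.offset : 0;
}
```

With download at 0x132000+0xA6000 and app moved to 0x140000, overlaps() is true, and lengthUpTo(download, app) yields 0x140000 - 0x132000 = 0xE000, the value shown in the screenshot.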

It is of course possible to include all custom_ options in another custom JSON, but I think that misses the point. Additionally, things like custom options:

custom_options.lwip =
    LWIP_DEFINE_HELLO = 1

would be problematic, because the code expects them to actually be in the form of \n LWIP_DEFINE_HELLO = 1\n, and that would be tedious to do in JSON.

So for now we'll stick with allowing custom board JSONs only. If you want to change the entire flash layout and include it in many platformio.inis, change (and recalculate) the entire partition table, and put it in custom_board JSON.

@kuba2k2 kuba2k2 changed the title Using hardware uart0 freezes EMW3080 (RTL8710BN) UART0 freezing RTL8710 + flash layout talks Mar 25, 2023
@kuba2k2
Member

kuba2k2 commented Mar 25, 2023

Regarding ESPHome, I will probably add dedicated options to set some of the custom_ properties (incl. the board JSON), so that can work really well. This was a cool idea!

There's still another problem: OTA. Currently, the flashing file (firmware.uf2) only contains the partition names (plus the binary content to write to them). When customizing the partition layout, the UF2 has no indication of the new offsets. This is not a problem when flashing a UF2 with a matching partition table to a device which has this partition table already, but a device without the new table will still flash to old offsets - and, most definitely, brick itself.

I'm thinking of embedding the partition table in the UF2 in all cases. That way you could freely change the partition table, and OTA will always respect it before booting the new firmware. That will not work on older devices, though (which will just ignore the "partition table block"), so they'll have to be flashed to new firmware first.

@hn
Contributor Author

hn commented Mar 25, 2023

Sounds quite cool. I was just lazily brainstorming what a local mxchip-emw3080-8mb-980k.json could possibly look like:

{
        "_base": [
 // maybe it would be possible to someway inherit from non-_base boards here,
 // e.g. "generic-rtl8710bn-2mb-788k.json", because otherwise you have to include
 // everything (pcb-pinouts ...) here. I think it would be nice to inherit everything
 // and just overwrite the needed parts.
               "generic",
                "realtek-ambz",
                "realtek-ambz-2mb-788k",
                "ic/rtl8710bn"
        ],
        "build": {
                "mcu": "rtl8710bn",
                "variant": "mxchip-emw3080-8mb-980k"
        },
        "name": "MXCHIP - EMW3080 (8M/980k)",
        "symbol": "EMW3080 (8M/980k)",
        "url": "https:// ... ",
        "vendor": "MXCHIP",
        "flash": {
                "ota1": "0x00B000+0xF5000",
                "ota2": "0x100000+0xF5000",
                "kvs": "0x1F5000+0x8000",
                "userdata": "0x1FD000+0x202000",
                "rdp": "0x3FF000+0x1000"
        },
        "upload": {
                "flash_size": 8388608,
                "maximum_size": 1003520
        }

// "pcb": {
//                "pinout":   // needed here if not inherited from "generic-rtl8710bn-2mb-788k.json"

}
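
A quick way to sanity-check such "offset+length" entries before using them is to parse them and verify that each region starts where the previous one ends. The sketch below is illustrative only; Region and parseRegion are hypothetical names and do not reflect how LibreTuya itself reads board JSONs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

struct Region {
    uint32_t offset;
    uint32_t length;
    // First address past the end of the region.
    uint32_t end() const { return offset + length; }
};

// Parse a string such as "0x00B000+0xF5000" into {offset, length}.
// Throws std::invalid_argument if the '+' separator or a number is missing.
Region parseRegion(const std::string &spec) {
    const std::size_t plus = spec.find('+');
    if (plus == std::string::npos)
        throw std::invalid_argument("missing '+' in region spec");
    Region r;
    // Base 16 accepts an optional "0x" prefix.
    r.offset = static_cast<uint32_t>(std::stoul(spec.substr(0, plus), nullptr, 16));
    r.length = static_cast<uint32_t>(std::stoul(spec.substr(plus + 1), nullptr, 16));
    return r;
}
```

Applied to the draft above, the five regions are contiguous from 0x00B000 and end at 0x400000, and the ota1 length 0xF5000 equals the maximum_size of 1003520 bytes.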

But ... in my case it might be easier not to use a json and just overwrite the needed things with custom_ options. Would it be possible to set the vendor/name as well?

@kuba2k2
Member

kuba2k2 commented Mar 25, 2023

You can set everything you posted in the custom_board JSON. The base JSON is inherited from the board specified by board = , i.e. there's no need for _base because it's set by default. The custom JSON inherits everything from the board you've chosen - an empty JSON will just not change anything.

It's possible to override vendor/name, but there's actually no point since they're not used anywhere except in docs.

If you want, you can actually create a PR to add the EMW3080 board to the list of supported boards. But since the boards will change a bit after the refactor, you'd have to make it to the other branch... Or you can wait till I merge the refactor and create the PR then 🙂

@hn
Contributor Author

hn commented Mar 25, 2023

I just pushed a beta version of the Solis S3 WiFi stick ESPhome replacement firmware. Things are going really well, except that after some hours the ModBus traffic somehow gets out of sync and fails with CRC errors. I have to dig deeper into this later (but I think it is caused by the a-little-bit-shaky modbus component and not by LibreTuya).

LibreTuya Wishlist ;-) :

  • time/ntp component: Would allow to sync 'internet time' to the inverter DSP via ModBus
  • stable OTA/captive portal: Would allow to offer precompiled firmware binaries without hardcoded WiFi credentials

@kuba2k2
Member

kuba2k2 commented Mar 25, 2023

IIRC, NTP works just fine. I tested it some time ago and there were no issues.

Captive portal is working on the refactor branch, but only for the first time (i.e. it won't work after disconnecting from the WiFi, and being unable to join the network). In fact, it won't even reconnect to the network after losing connection - I wasn't able to get the RTL SDK to reconnect, but maybe that's an issue with my network only.

@hn
Contributor Author

hn commented Mar 25, 2023

You can set everything you posted in the custom_board JSON. The base JSON is inherited from the board specified by board = , i.e. there's no need for _base because it's set by default. The custom JSON inherits everything from the board you've chosen - an empty JSON will just not change anything.

Ahh, great, I just misunderstood your first posting.

If you want, you can actually create a PR to add the EMW3080 board to the list of supported boards. But since the boards will change a bit after the refactor, you'd have to make it to the other branch... Or you can wait till I merge the refactor and create the PR then

The EMW3080 seems not to be very common. And they produce an 8MB version without publishing matching datasheets. This gives me the slight feeling that it is not appropriate for them to get their own JSON :) But, I'll think about this later.

@hn
Contributor Author

hn commented Mar 25, 2023

IIRC, NTP works just fine. I tested it some time ago and there were no issues.

Failed config
time: [source solis-inv-esphome.yaml:45]
    Component not found: time.
  - platform: sntp
    id: sntp_time
    timezone: CET-1CEST,M3.5.0,M10.5.0/3
    servers: de.pool.ntp.org

Source YAML. I have not checked this in detail as the LibreTuya page denies support for NTP.

Captive portal is working on the refactor branch, ...

Ok, I'll re-check when the refactor branch has been published.

@catalin2402
Collaborator

catalin2402 commented Mar 25, 2023

Use:

libretuya:
  framework:
    version: dev

version dev should be used instead of latest

@hn
Contributor Author

hn commented Mar 25, 2023

Same error here with:

libretuya:
  board: generic-rtl8710bx-4mb-980k
  framework:
    version: dev

I just changed the string. Do I need to somehow re-download/configure libretuya?

@catalin2402
Collaborator

It should do it automatically. "latest" is the latest one published on platformio, "dev" is pulling the libretuya platform from git

@hn
Contributor Author

hn commented Mar 25, 2023

Not a LibreTuya problem: (Debian) system package python3-tzlocal was missing, which meant that the ESPhome time component could not be activated.

@kuba2k2
Member

kuba2k2 commented Mar 26, 2023

In your guide I can see you're installing packages manually, while you should do it with pip install -r requirements.txt

@hn
Contributor Author

hn commented Mar 31, 2023

Sorry to disturb you again. Things are generally working very well. But after a few hours, sometimes the ModBus/UART traffic gets out of sync and cannot be resynchronised except by rebooting the stick.

More detail: The correct ModBus response string is 01 04 02 00 01 78 f0 (last two bytes are CRC). The problem is that the very last byte (0xf0) comes 'late' and gets 'mixed' into the next ModBus response (t=millis() extra debug added by me):

[modbus_controller:035]: t=57137 Sending next modbus command to device 1 register 0xBFF count 1
[modbus:199]: Modbus write: 01.04.0B.FF.00.01.03.DE (8)
[modbus_controller:486]: Command sent 4 0xBFF 
[modbus:042]: t=57228 Modbus received Byte  240 (0xF0)   <----- this is from the previous response
[modbus:042]: t=57234 Modbus received Byte  1 (0x 1)
[modbus:042]: t=57239 Modbus received Byte  4 (0x 4)
[modbus:042]: t=57245 Modbus received Byte  2 (0x 2)
[modbus:042]: t=57251 Modbus received Byte  0 (0x 0)
[modbus:042]: t=57257 Modbus received Byte  1 (0x 1)
[modbus:042]: t=57263 Modbus received Byte  120 (0x78)
[modbus_controller:035]: t=57456 Sending next modbus command to device 1 register 0xBFF count 1
[modbus:199]: Modbus write: 01.04.0B.FF.00.01.03.DE (8)
[modbus_controller:486]: Command sent 4 0xBFF 
[modbus:042]: t=57535 Modbus received Byte  240 (0xF0)  <----- this is from the previous response
[modbus:042]: t=57541 Modbus received Byte  1 (0x 1)
[modbus:042]: t=57546 Modbus received Byte  4 (0x 4)
[modbus:042]: t=57552 Modbus received Byte  2 (0x 2)
[modbus:042]: t=57558 Modbus received Byte  0 (0x 0)
[modbus:042]: t=57564 Modbus received Byte  1 (0x 1)
[modbus:042]: t=57570 Modbus received Byte  120 (0x78)
[modbus_controller:029]: t=57775 Modbus command to device=1 register=0xBFF countdown=0 no response received - removed from send queue
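
To see why the shifted byte stream shows up as CRC errors, the frames from the log can be checked with the standard Modbus RTU CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, low byte transmitted first). This is a standalone sketch, not the ESPHome modbus component; crc16Modbus and frameIsValid are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard Modbus RTU CRC-16 over `len` bytes.
uint16_t crc16Modbus(const uint8_t *data, std::size_t len) {
    uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1)
                crc = static_cast<uint16_t>((crc >> 1) ^ 0xA001);
            else
                crc = static_cast<uint16_t>(crc >> 1);
        }
    }
    return crc;
}

// A frame is valid if its last two bytes (low byte first) match the CRC
// computed over everything before them.
bool frameIsValid(const std::vector<uint8_t> &frame) {
    if (frame.size() < 4) // address + function + 2 CRC bytes at minimum
        return false;
    const std::size_t n = frame.size() - 2;
    const uint16_t received =
        static_cast<uint16_t>(frame[n] | (frame[n + 1] << 8));
    return crc16Modbus(frame.data(), n) == received;
}
```

The correct response 01 04 02 00 01 78 F0 passes this check, while the seven bytes actually seen after the request in the log (F0 01 04 02 00 01 78) do not, because the stray leading 0xF0 shifts every field by one position.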

I wonder if this is a problem of LibreTuya or the ModBus component. It's somewhat hard to debug because you have to wait more or less half a day until it appears.

The ModBus component just checks if bytes are available() (esphome/components/modbus/modbus.cpp Modbus::loop), so for me it's unclear where the time lag / desync comes from.

I have a very vague suspicion that the serial.available()=false status may be wrong (and thus the last byte remains in the buffer) and is only set to true again by the arrival of the next real ModBus response. Hmmm.

@RoganDawes

Connect to the UART with a USB-RS232 adapter, send 1 byte at a time, and see if it shows up when there is only 1 byte in the buffer?

@hn
Contributor Author

hn commented Apr 24, 2023

I'm still having problems with ModBus traffic getting de-sync-ed (as described above). I've tried numerous things and in the end it depends on one single line in SerialClass.cpp. Believe it or not:

  • with foo-println (Serial2.println("foo"), see below) I had no CRC errors in the logs for 5+ days
  • without the foo-println I get CRC errors after some hours every day

Obviously, this line makes no sense. I don't know if the println changes some timing edge case or some inner workings inside the UART or something .. just strange.

I'll double/triple-check this during the next days and/or try the structure-refactor branch when it is released (yeah!).

--- libretuya/arduino/realtek-ambz/cores/arduino/SerialClass-orig.cpp	2023-04-24 22:25:29.470688094 +0200
+++ libretuya/arduino/realtek-ambz/cores/arduino/SerialClass.cpp	2023-04-24 22:25:06.562122497 +0200
@@ -31,6 +31,7 @@
 		UART_CharGet(data->uart, &c);
  		data->buf->store_char(c);
 	}
+	Serial2.println("foo");
 
 	data->uart->DLH_INTCR = intcr;
 	return 0;

@hn
Contributor Author

hn commented May 7, 2023

A lot of topics have been discussed in this issue and it may be too long overall. In particular, the original problem has been solved, so I am closing the issue.

I've opened a new issue to track the ModBus desync problem.

@hn hn closed this as completed May 7, 2023