Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esp_ota_begin() breaks wifi connection (IDFGH-3656) #5591

Closed
dreske opened this issue Jul 14, 2020 · 13 comments
Closed

esp_ota_begin() breaks wifi connection (IDFGH-3656) #5591

dreske opened this issue Jul 14, 2020 · 13 comments
Labels
Awaiting Response awaiting a response from the author

Comments

@dreske
Copy link

dreske commented Jul 14, 2020

Environment

  • Module or chip used: ESP32-WROOM-32D
  • IDF version: v4.2-dev-1856-g00148cd0c
  • Build System: idf.py
  • Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp-2020r2) 8.2.0
  • Operating System: Linux
  • Using an IDE?: No
  • Power Supply: external 3.3V

Problem Description

Calling esp_ota_begin() with OTA_SIZE_UNKNOWN randomly disconnects wifi.
Found this old issue #2083 with exactly the same behavior.

I've already set some 'aggressive' settings for spi flash I think (50 Ticks yield every 1ms):

CONFIG_SPI_FLASH_YIELD_DURING_ERASE=y
CONFIG_SPI_FLASH_ERASE_YIELD_DURATION_MS=1
CONFIG_SPI_FLASH_ERASE_YIELD_TICKS=50

Tested with different modules, same behavior.

Expected Behavior

Flash is erased successfully without wifi interruption

Actual Behavior

Wifi breaks in some (around 30%) cases

Steps to reproduce

Calling esp_ota_begin() in a loop and log output

Code to reproduce this issue

void start_ota(void* param)
{
    while(1) {
        esp_err_t err;        
        esp_ota_handle_t update_handle = 0 ;
        const esp_partition_t *update_partition = NULL;

        ESP_LOGI(TAG, "Starting OTA example");
        const esp_partition_t *configured = esp_ota_get_boot_partition();
        const esp_partition_t *running = esp_ota_get_running_partition();

        if (configured != running) {
            ESP_LOGW(TAG, "Configured OTA boot partition at offset 0x%08x, but running from offset 0x%08x",
                 configured->address, running->address);
            ESP_LOGW(TAG, "(This can happen if either the OTA boot data or preferred boot image become corrupted somehow.)");
        }
        ESP_LOGI(TAG, "Running partition type %d subtype %d (offset 0x%08x)",
                running->type, running->subtype, running->address);


        update_partition = esp_ota_get_next_update_partition(NULL);
        update_partition->subtype, update_partition->address);
        
        ESP_LOGI(TAG, "esp_ota_begin");
        err = esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &update_handle);
        if (err != ESP_OK) {
            ESP_LOGE(TAG, "esp_ota_begin failed (%s)", esp_err_to_name(err));
            continue;
        }
        ESP_LOGI(TAG, "esp_ota_begin succeeded");
        vTaskDelay(5000 / portTICK_RATE_MS);
    }
}


void app_main(void)
{
    // Initialize NVS.
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        // 1.OTA app partition table has a smaller NVS partition size than the non-OTA
        // partition table. This size mismatch may cause NVS initialization to fail.
        // 2.NVS partition contains data in new format and cannot be recognized by this version of code.
        // If this happens, we erase NVS partition and initialize NVS again.
        ESP_ERROR_CHECK(nvs_flash_erase());
        err = nvs_flash_init();
    }
    ESP_ERROR_CHECK( err );

    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());

    /* This helper function configures Wi-Fi or Ethernet, as selected in menuconfig.
     * Read "Establishing Wi-Fi or Ethernet Connection" section in
     * examples/protocols/README.md for more information about this function.
    */
    ESP_ERROR_CHECK(example_connect());

#if CONFIG_EXAMPLE_CONNECT_WIFI
    /* Ensure to disable any WiFi power save mode, this allows best throughput
     * and hence timings for overall OTA operation.
     */
    esp_wifi_set_ps(WIFI_PS_NONE);
#endif // CONFIG_EXAMPLE_CONNECT_WIFI

    xTaskCreate(&start_ota, "advanced_ota_example_task", 1024 * 8, NULL, 1, NULL);
}

Debug Logs

I (893078) advanced_https_ota_example: Starting OTA example                                                                                                                                                                                  
I (893078) advanced_https_ota_example: Running partition type 0 subtype 0 (offset 0x00010000)                                                                                                                                                
I (893078) advanced_https_ota_example: Writing to partition subtype 16 at offset 0x110000                                                                                                                                                    
I (893088) advanced_https_ota_example: esp_ota_begin                                                                                                                                                                                         
I (896548) wifi:state: run -> init (cf00)                                                                                                                                                                                                    
I (896548) wifi:pm stop, total sleep time: 0 us / 17210844 us                                                                                                                                                                                
                                                                                                                                                                                                                                             
I (896548) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1                                                                                                                                                                        
I (896558) example_connect: Wi-Fi disconnected, trying to reconnect...                                                                                                                                                                       
I (896798) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1                                                                                                                                                                        
I (896798) wifi:state: init -> auth (b0)
I (896798) wifi:state: auth -> assoc (0)
I (896808) wifi:state: assoc -> run (10)
I (896818) wifi:connected with home, aid = 1, channel 6, BW20, bssid = 18:e8:29:c1:dd:75
I (896818) wifi:security: WPA2-PSK, phy: bgn, rssi: -54
I (896818) wifi:pm start, type: 0

I (896888) advanced_https_ota_example: esp_ota_begin succeeded

@github-actions github-actions bot changed the title esp_ota_begin() breaks wifi connection esp_ota_begin() breaks wifi connection (IDFGH-3656) Jul 14, 2020
@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, we will look into.

@copercini
Copy link

copercini commented Jul 27, 2020

I'm having the exactly same issue but with 100% of fail rate. Here I'm using a webserver to upload the new .bin file to OTA, and it only happens in WiFi STA mode, when using WiFi AP or ethernet, the connection with WiFi STA still drops, but OTA works

EDIT: some logs when OTAing from AP:

I (24886) OTA: Running partition type 0 subtype 16 (offset 0x00010000)
I (24886) OTA: OTA started....
I (24886) OTA: Writing to partition subtype 17 at offset 0x210000
I (26306) wifi:state: run -> init (cf00)
I (26576) wifi:pm stop, total sleep time: lu us / lu us

I (26586) wifi:new:<1,0>, old:<1,1>, ap:<255,255>, sta:<1,1>, prof:1
[W][WiFiGeneric.cpp:815] _eventCallback(): Reason: 8 - ASSOC_LEAVE
I (69246) OTA: esp_ota_begin succeeded
I (69246) OTA: Have written image length 438
I (69286) OTA: Have written image length 1874
....

@shubhamkulkarni97
Copy link
Contributor

Hi @copercini @dreske,

I'm enable to reproduce this issue. Is there anything specific change in configuration which causes this issue?
Are you facing this issue with custom code or with IDF examples as well?

Can you share sdkconfig file for the same?

@dreske
Copy link
Author

dreske commented Jul 29, 2020

Hi, also with the OTA examples.
I'll share my sdkconfig later, but its default, except the wifi config and my changes regarding spi flash (mentioned above)

@Alvin1Zhang Alvin1Zhang added the Awaiting Response awaiting a response from the author label Aug 6, 2020
@Alvin1Zhang
Copy link
Collaborator

@dreske Thanks for reporting, would you please help share if any updates for this issue? Thanks.

@copercini
Copy link

copercini commented Aug 21, 2020

@shubhamkulkarni97 @Alvin1Zhang it's happening with simple_ota_example too https://github.com/espressif/esp-idf/tree/master/examples/system/ota/simple_ota_example

just changing Wifi credentials and OTA URL in IDF v4.2-dev-1856-g00148cd0c-dirty

Here are the files including sdkconfig, I kept the OTA link with a hello_word bin, you can use it if you want simple_ota_example.zip

Logs here:

rst:0x1 (POWERON_RESET),boot:0x37 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4
load:0x3fff0034,len:7216
ho 0 tail 12 room 4
load:0x40078000,len:13696
load:0x40080400,len:4000
0x40080400: _init at ??:?

entry 0x40080688
I (31) boot: ESP-IDF v4.2-dev-1856-g00148cd0c-dirty 2nd stage bootloader
I (31) boot: compile time 12:33:34
I (32) boot: chip revision: 3
I (36) boot_comm: chip revision: 3, min. bootloader chip revision: 0
I (57) boot.esp32: SPI Speed      : 40MHz
I (57) boot.esp32: SPI Mode       : DIO
I (57) boot.esp32: SPI Flash Size : 4MB
I (60) boot: Enabling RNG early entropy source...
I (65) boot: Partition Table:
I (69) boot: ## Label            Usage          Type ST Offset   Length
I (76) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (83) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (91) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (98) boot:  3 factory          factory app      00 00 00010000 00100000
I (106) boot:  4 ota_0            OTA app          00 10 00110000 00100000
I (113) boot:  5 ota_1            OTA app          00 11 00210000 00100000
I (121) boot: End of partition table
I (125) boot: Defaulting to factory image
I (130) boot_comm: chip revision: 3, min. application chip revision: 0
I (137) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x1fbb4 (129972) map
I (195) esp_image: segment 1: paddr=0x0002fbdc vaddr=0x3ffb0000 size=0x0043c (  1084) load
I (196) esp_image: segment 2: paddr=0x00030020 vaddr=0x400d0020 size=0x8f22c (586284) map
0x400d0020: _stext at ??:?

I (425) esp_image: segment 3: paddr=0x000bf254 vaddr=0x3ffb043c size=0x03554 ( 13652) load
I (431) esp_image: segment 4: paddr=0x000c27b0 vaddr=0x40080000 size=0x00404 (  1028) load
0x40080000: _WindowOverflow4 at C:/Users/Desk/Desktop/esp32/esp-idf-v4.2/components/freertos/xtensa/xtensa_vectors.S:1730

I (432) esp_image: segment 5: paddr=0x000c2bbc vaddr=0x40080404 size=0x159c8 ( 88520) load
I (491) boot: Loaded app from partition at offset 0x10000
I (491) boot: Disabling RNG early entropy source...
I (492) cpu_start: Pro cpu up.
I (495) cpu_start: Application information:
I (500) cpu_start: Project name:     simple_ota
I (505) cpu_start: App version:      1
I (510) cpu_start: Compile time:     Aug 21 2020 12:33:21
I (516) cpu_start: ELF file SHA256:  2e538721fcdaa367...
I (522) cpu_start: ESP-IDF:          v4.2-dev-1856-g00148cd0c-dirty
I (529) cpu_start: Starting app cpu, entry point is 0x400818bc
0x400818bc: call_start_cpu1 at C:/Users/Desk/Desktop/esp32/esp-idf-v4.2/components/esp32/cpu_start.c:282

I (0) cpu_start: App cpu up.
I (539) heap_init: Initializing. RAM available for dynamic allocation:
I (546) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (552) heap_init: At 3FFB9418 len 00026BE8 (154 KiB): DRAM
I (558) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (565) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (571) heap_init: At 40095DCC len 0000A234 (40 KiB): IRAM
I (577) cpu_start: Pro cpu start user code
I (596) spi_flash: detected chip: generic
I (596) spi_flash: flash io: dio
W (596) spi_flash: Detected size(8192k) larger than the size in the binary image header(4096k). Using the size in the binary image header.
I (607) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (753) wifi:wifi driver task: 3ffc18e8, prio:23, stack:6656, core=0
I (753) system_api: Base MAC address is not set
I (753) system_api: read default base MAC address from EFUSE
I (773) wifi:wifi firmware version: 10db11f
I (773) wifi:wifi certification version: v7.0
I (773) wifi:config NVS flash: enabled
I (773) wifi:config nano formating: disabled
I (783) wifi:Init data frame dynamic rx buffer num: 32
I (783) wifi:Init management frame dynamic rx buffer num: 32
I (793) wifi:Init management short buffer num: 32
I (793) wifi:Init dynamic tx buffer num: 32
I (803) wifi:Init static rx buffer size: 1600
I (803) wifi:Init static rx buffer num: 10
I (803) wifi:Init dynamic rx buffer num: 32
I (813) example_connect: Connecting to --hidden--...
I (933) phy: phy_version: 4182, f1ba940, Jun  4 2020, 19:40:07, 0, 0
I (933) wifi:mode : sta (24:0a:c4:e3:2a:b8)
I (933) example_connect: Waiting for IP(s)
I (1053) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (1833) wifi:state: init -> auth (b0)
I (1833) wifi:state: auth -> init (cf00)
I (1833) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (1833) wifi:new:<1,1>, old:<1,0>, ap:<255,255>, sta:<1,1>, prof:1
I (1843) wifi:state: init -> auth (b0)
I (1843) wifi:state: auth -> assoc (0)
I (1853) wifi:state: assoc -> run (10)
I (1863) wifi:connected with --hidden--, aid = 4, channel 1, 40U, bssid = --hidden--
I (1863) wifi:security: WPA2-PSK, phy: bgn, rssi: -54
I (1863) wifi:pm start, type: 1

I (1923) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (2743) esp_netif_handlers: example_connect: sta ip: 192.168.0.10, mask: 255.255.255.0, gw: 192.168.0.1
I (2743) example_connect: Got IPv4 event: Interface "example_connect: sta" address: 192.168.0.10
I (3743) example_connect: Got IPv6 event: Interface "example_connect: sta" address: fe80:0000:0000:0000:260a:c4ff:fee3:2ab8, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (3743) example_connect: Connected to example_connect: sta
I (3753) example_connect: - IPv4 address: 192.168.0.10
I (3763) example_connect: - IPv6 address: fe80:0000:0000:0000:260a:c4ff:fee3:2ab8, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (3773) wifi:Set ps type: 0

I (3773) simple_ota_example: Starting OTA example
I (3823) esp_https_ota: Starting OTA...
I (3823) esp_https_ota: Writing to partition subtype 16 at offset 0x110000
I (4553) wifi:state: run -> init (cf00)
I (4553) wifi:pm stop, total sleep time: 1214675 us / 2685964 us

I (4553) wifi:new:<1,0>, old:<1,1>, ap:<255,255>, sta:<1,1>, prof:1
I (4923) example_connect: Wi-Fi disconnected, trying to reconnect...
I (5293) wifi:state: init -> init (cf00)
I (5293) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (5293) wifi:state: init -> auth (b0)
I (5293) wifi:state: auth -> assoc (0)
I (5663) wifi:state: assoc -> run (10)
I (6033) wifi:state: run -> init (cf00)
I (6033) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (6033) wifi:new:<1,1>, old:<1,0>, ap:<255,255>, sta:<1,1>, prof:1
I (6043) wifi:state: init -> auth (b0)
I (6413) wifi:state: auth -> assoc (0)
I (6783) wifi:state: assoc -> run (10)
I (7153) wifi:connected with --hidden--, aid = 4, channel 1, 40U, bssid = --hidden--
I (7153) wifi:security: WPA2-PSK, phy: bgn, rssi: -54
I (7153) wifi:pm start, type: 0

I (7523) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (7523) wifi:state: run -> init (cf00)
I (7523) wifi:pm stop, total sleep time: 0 us / 366053 us

I (7523) wifi:new:<1,0>, old:<1,1>, ap:<255,255>, sta:<1,1>, prof:1
I (7893) example_connect: Wi-Fi disconnected, trying to reconnect...
I (8253) wifi:state: init -> init (cf00)
I (8253) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (8253) wifi:state: init -> auth (b0)
I (8623) wifi:state: auth -> assoc (0)
I (8623) wifi:state: assoc -> init (cf00)
I (8623) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (8633) wifi:new:<1,1>, old:<1,0>, ap:<255,255>, sta:<1,1>, prof:1
I (8993) wifi:state: init -> auth (b0)
I (9363) wifi:state: auth -> assoc (0)
I (9363) wifi:state: assoc -> run (10)
E (9753) TRANS_TCP: tcp_poll_read select error 113, errno = Software caused connection abort, fd = 54
E (9753) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (9763) esp_image: segment 0: paddr=0x00110020 vaddr=0x3f400020 size=0x05b58 ( 23384) map
I (9763) wifi:connected with --hidden--, aid = 4, channel 1, 40U, bssid = --hidden--
I (9773) wifi:security: WPA2-PSK, phy: bgn, rssi: -53
E (9773) esp_image: invalid segment length 0xffffffff
I (9783) wifi:pm start, type: 0
E (9783) simple_ota_example: Firmware upgrade failed

I (9803) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (9803) wifi:state: run -> init (cf00)
I (9813) wifi:pm stop, total sleep time: 0 us / 15966 us

I (9813) wifi:new:<1,0>, old:<1,1>, ap:<255,255>, sta:<1,1>, prof:1
I (9813) example_connect: Wi-Fi disconnected, trying to reconnect...
I (9943) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (9943) wifi:state: init -> auth (b0)
I (9943) wifi:state: auth -> assoc (0)
I (9953) wifi:state: assoc -> run (10)
I (9973) wifi:connected with --hidden--, aid = 7, channel 1, BW20, bssid = --hidden--
I (9973) wifi:security: WPA2-PSK, phy: bgn, rssi: -16
I (9973) wifi:pm start, type: 0

I (9983) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (10743) esp_netif_handlers: example_connect: sta ip: 192.168.0.10, mask: 255.255.255.0, gw: 192.168.0.1
I (10743) example_connect: Got IPv4 event: Interface "example_connect: sta" address: 192.168.0.10
I (11743) example_connect: Got IPv6 event: Interface "example_connect: sta" address: fe80:0000:0000:0000:260a:c4ff:fee3:2ab8, type: ESP_IP6_ADDR_IS_LINK_LOCAL

note that right after I (3823) esp_https_ota: Writing to partition subtype 16 at offset 0x110000 the WiFi STA connection drops

@copercini
Copy link

copercini commented Aug 21, 2020

This problem doesn't happen in an esp32 dev kit (using same code, sdkconfig, IDF version, PC, everything), just in my custom board ESP32 D0WD V3, 8MB of flash, PSRAM (present in hardware but not enabled)

EDIT: it's also happening in esp32 dev kit (ESP32D0WDQ6 revision 0), but somehow in the first run after an erase_flash it works, after that doesn't

@shubhamkulkarni97
Copy link
Contributor

Hello @copercini, thanks for the update.

I could reproduce the issue on mentioned IDF version, however it is fixed in 4761c40.

@dreske @copercini, updating your branch with latest changes should fix the issue.

Thanks,
Shubham

@shubhamkulkarni97
Copy link
Contributor

@dreske @copercini Any updates on the issue?

@dreske
Copy link
Author

dreske commented Sep 19, 2020

Looks good after some quick tests. Thank you!

@Alvin1Zhang
Copy link
Collaborator

Good to hear the issue fixed. Feel free to reopen. Thanks.

@chrismerck
Copy link

chrismerck commented Mar 12, 2021

I could reproduce the issue on mentioned IDF version, however it is fixed in 4761c40. [release/v4.2]

Yikes, has this fix been backported to IDF v3?

A new ESP32-WROVER-E product we are launching takes 14+ seconds for esp_ota_begin. Sometimes it even triggers the task watchdog triggers shortly esp_ota_begin returns.

We are on IDF v3.3.4.

Screen Shot 2021-03-12 at 11 29 19

@Alvin1Zhang
Copy link
Collaborator

@chrismerck Thanks for sharing the updates. After preliminary analysis, seems what you shared may be a new issue, would you please help create a new ticket per the issue template? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Response awaiting a response from the author
Projects
None yet
Development

No branches or pull requests

5 participants