Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TW#17597] OTA updates failing randomly inside spi_flash_write_inner #1479

Closed
MrSurly opened this issue Jan 8, 2018 · 23 comments
Closed

[TW#17597] OTA updates failing randomly inside spi_flash_write_inner #1479

MrSurly opened this issue Jan 8, 2018 · 23 comments

Comments

@MrSurly
Copy link

MrSurly commented Jan 8, 2018

EDIT 3: esp_ota_begin sometimes takes a few seconds to return. Sometimes, it never returns, and a WDT reboot occurs.
EDIT 3.5: Increasing WDT timeout to 10s (per #578) fixes this the esp_ota_begin issue, but not the main issue outlined below.
EDIT 2: Increasing IPC stack size does not help.
EDIT 1: Calling esp_ota_write with 4096 bytes (full flash sector) seems to help, a bit. It now sometimes completes. Probably because this results in fewer calls.

Having a lot of trouble with OTA updates.

After doing a bit of digging, it appears that spi_flash_write_inner is not returning properly from where it's called from flash_ops.c:
https://github.com/espressif/esp-idf/blob/master/components/spi_flash/flash_ops.c#L375

Line 377 is never reached.

I've updated to the latest IDF (hash f9ad17e), and the IDF submodules are up-to-date as well. Toolchain is: xtensa-esp32-elf-gcc (crosstool-NG crosstool-ng-1.22.0-75-gbaf03c2) 5.2.0

Doesn't seem to matter if SPI_FLASH_VERIFY_WRITE (and setting all suboptions) is set or not. Unsetting CONFIG_SPI_FLASH_ROM_DRIVER_PATCH seems to make it worse, when usin

When it freezes, the serial monitor shows:

Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: ipc1
Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: ipc1

Relevant code section is below. esp_ota_write fails to return.

    bool downloadUpdate(const string& url, string updateHash) {
        const char * TAG = "update";

        const esp_partition_t *configured = esp_ota_get_boot_partition();
        const esp_partition_t *running = esp_ota_get_running_partition();
        const esp_partition_t *update_partition = esp_ota_get_next_update_partition(NULL);

        if (configured != NULL ) {
            L(TAG, "Configured for partition subtype %d at offset 0x%x", configured->subtype, update_partition->address);
        } else {
            L(TAG, "Configured partition is NULL");
        }

        if (running != NULL) {

            L(TAG, "Running from partition subtype %d at offset 0x%x", running->subtype, update_partition->address); 
        } else {
            L(TAG, "Running partition is NULL");
        }

        if(update_partition != NULL) {
            L(TAG, "Writing to partition subtype %d at offset 0x%x", update_partition->subtype, update_partition->address);

        } else {
            L(TAG, "Running partition is NULL");
        }
        esp_ota_handle_t update_handle;
        L(TAG, "update begin");
        if (esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &update_handle) == ESP_OK) {
            L(TAG, "after update begin");

            Http http("<redacted>/" VERSION_STRING, ROOT_CERT);
            int result = http.get(url);
            if (result == 200) {


                Sha256 sha256;
                uint8_t buf [1024];
                int read;
                int ttl_read = 0;
                int timeout = 5000;
                elapsedMillis printTimer = 3000;

                while (true) {
                    read = http.read(buf, sizeof(buf), timeout);

                    if (read == 0) {
                        break;
                    }

                    if ( read < 0 ) {
                        if (errno == EAGAIN) {
                            L(TAG, "read timeout");
                            continue;
                        }
                        L(TAG,"read error, aborting");
                        return false;
                    }

                    ttl_read += read;
                    sha256.update(buf, read);
                    L(TAG, "> esp_ota_write %d", read);
                    if (esp_ota_write(update_handle, buf, read) != ESP_OK) {
                        L(TAG,"ota_esp_write failed");
                        return false;
                    }
                    L(TAG, "< esp_ota_write");
                    if (printTimer > 100) {
                        L(TAG, "downloaded %d bytes", ttl_read);
                        printTimer = 0;
                    }
                }
                L(TAG, "downloaded %d bytes total", ttl_read);
                if (esp_ota_end(update_handle) != ESP_OK) {
                    L(TAG,"esp_ota_end() failed");
                    return false;
                }

                uint8_t computedHash[65]; 
                memset(computedHash, 0, sizeof(computedHash));
                sha256.finish(computedHash);

                for (std::string::iterator i = updateHash.begin(); i != updateHash.end(); ++i) {
                    *i = toupper(*i);
                }

                for (int i = 31; i >= 0; --i) {
                    uint8_t b = computedHash[i];
                    computedHash[i * 2] = HEXDIGIT(b >> 4);
                    computedHash[i * 2 + 1] = HEXDIGIT(b & 0xF);
                }
                L(TAG, "computed hash %s", computedHash);

                if (updateHash != (char *)computedHash) {
                    L(TAG, "computed hash of %s does not equal update hash of %s, aborting", computedHash, updateHash.c_str());
                    return false;
                }

                L(TAG, "Updating bootable partition");
                if (esp_ota_set_boot_partition(update_partition, 3) != ESP_OK) {
                    L(TAG,"esp_ota_set_boot_partition() failed");
                    return false;
                }
            } else {
                L(TAG,"HTTP GET failed");
                return false;
            }
        } else {
            L(TAG, "esp_ota_begin() returned an error");
            return false;
        }

        L(TAG, "OTA update successful");
        esp_ota_boot_count_op(OTA_BOOT_COUNT_ZERO, NULL);
        return true;

    }

sdkconfig:

#
# Automatically generated file; DO NOT EDIT.
# Espressif IoT Development Framework Configuration
#

#
# SDK tool configuration
#
CONFIG_TOOLPREFIX="xtensa-esp32-elf-"
CONFIG_PYTHON="python"
CONFIG_MAKE_WARN_UNDEFINED_VARIABLES=y

#
# Bootloader config
#
CONFIG_LOG_BOOTLOADER_LEVEL_NONE=
CONFIG_LOG_BOOTLOADER_LEVEL_ERROR=
CONFIG_LOG_BOOTLOADER_LEVEL_WARN=
CONFIG_LOG_BOOTLOADER_LEVEL_INFO=y
CONFIG_LOG_BOOTLOADER_LEVEL_DEBUG=
CONFIG_LOG_BOOTLOADER_LEVEL_VERBOSE=
CONFIG_LOG_BOOTLOADER_LEVEL=3
CONFIG_BOOTLOADER_VDDSDIO_BOOST=y

#
# Security features
#
CONFIG_SECURE_BOOT_ENABLED=
CONFIG_FLASH_ENCRYPTION_ENABLED=

#
# Serial flasher config
#
CONFIG_ESPTOOLPY_PORT="/dev/ttyUSB0"
CONFIG_ESPTOOLPY_BAUD_115200B=
CONFIG_ESPTOOLPY_BAUD_230400B=
CONFIG_ESPTOOLPY_BAUD_921600B=
CONFIG_ESPTOOLPY_BAUD_2MB=y
CONFIG_ESPTOOLPY_BAUD_OTHER=
CONFIG_ESPTOOLPY_BAUD_OTHER_VAL=115200
CONFIG_ESPTOOLPY_BAUD=2000000
CONFIG_ESPTOOLPY_COMPRESSED=y
CONFIG_FLASHMODE_QIO=
CONFIG_FLASHMODE_QOUT=
CONFIG_FLASHMODE_DIO=y
CONFIG_FLASHMODE_DOUT=
CONFIG_ESPTOOLPY_FLASHMODE="dio"
CONFIG_ESPTOOLPY_FLASHFREQ_80M=
CONFIG_ESPTOOLPY_FLASHFREQ_40M=y
CONFIG_ESPTOOLPY_FLASHFREQ_26M=
CONFIG_ESPTOOLPY_FLASHFREQ_20M=
CONFIG_ESPTOOLPY_FLASHFREQ="40m"
CONFIG_ESPTOOLPY_FLASHSIZE_1MB=
CONFIG_ESPTOOLPY_FLASHSIZE_2MB=y
CONFIG_ESPTOOLPY_FLASHSIZE_4MB=
CONFIG_ESPTOOLPY_FLASHSIZE_8MB=
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=
CONFIG_ESPTOOLPY_FLASHSIZE="2MB"
CONFIG_ESPTOOLPY_FLASHSIZE_DETECT=y
CONFIG_ESPTOOLPY_BEFORE_RESET=y
CONFIG_ESPTOOLPY_BEFORE_NORESET=
CONFIG_ESPTOOLPY_BEFORE="default_reset"
CONFIG_ESPTOOLPY_AFTER_RESET=y
CONFIG_ESPTOOLPY_AFTER_NORESET=
CONFIG_ESPTOOLPY_AFTER="hard_reset"
CONFIG_MONITOR_BAUD_9600B=
CONFIG_MONITOR_BAUD_57600B=
CONFIG_MONITOR_BAUD_115200B=y
CONFIG_MONITOR_BAUD_230400B=
CONFIG_MONITOR_BAUD_921600B=
CONFIG_MONITOR_BAUD_2MB=
CONFIG_MONITOR_BAUD_OTHER=
CONFIG_MONITOR_BAUD_OTHER_VAL=115200
CONFIG_MONITOR_BAUD=115200

#
# Partition Table
#
CONFIG_PARTITION_TABLE_SINGLE_APP=
CONFIG_PARTITION_TABLE_TWO_OTA=
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"
CONFIG_PARTITION_TABLE_CUSTOM_APP_BIN_OFFSET=0x10000
CONFIG_PARTITION_TABLE_FILENAME="partitions.csv"
CONFIG_APP_OFFSET=0x10000

#
# Compiler options
#
CONFIG_OPTIMIZATION_LEVEL_DEBUG=y
CONFIG_OPTIMIZATION_LEVEL_RELEASE=
CONFIG_OPTIMIZATION_ASSERTIONS_ENABLED=y
CONFIG_OPTIMIZATION_ASSERTIONS_SILENT=
CONFIG_OPTIMIZATION_ASSERTIONS_DISABLED=
CONFIG_CXX_EXCEPTIONS=
CONFIG_STACK_CHECK_NONE=y
CONFIG_STACK_CHECK_NORM=
CONFIG_STACK_CHECK_STRONG=
CONFIG_STACK_CHECK_ALL=
CONFIG_STACK_CHECK=

#
# Component config
#

#
# Application Level Tracing
#
CONFIG_ESP32_APPTRACE_DEST_TRAX=
CONFIG_ESP32_APPTRACE_DEST_NONE=y
CONFIG_ESP32_APPTRACE_ENABLE=
CONFIG_ESP32_APPTRACE_LOCK_ENABLE=y

#
# FreeRTOS SystemView Tracing
#
CONFIG_AWS_IOT_SDK=y
CONFIG_AWS_IOT_MQTT_HOST=""
CONFIG_AWS_IOT_MQTT_PORT=8883
CONFIG_AWS_IOT_MQTT_TX_BUF_LEN=1024
CONFIG_AWS_IOT_MQTT_RX_BUF_LEN=3192
CONFIG_AWS_IOT_MQTT_NUM_SUBSCRIBE_HANDLERS=2
CONFIG_AWS_IOT_MQTT_MIN_RECONNECT_WAIT_INTERVAL=1000
CONFIG_AWS_IOT_MQTT_MAX_RECONNECT_WAIT_INTERVAL=128000

#
# Bluetooth
#
CONFIG_BT_ENABLED=y
CONFIG_BTDM_CONTROLLER_PINNED_TO_CORE_0=y
CONFIG_BTDM_CONTROLLER_PINNED_TO_CORE_1=
CONFIG_BTDM_CONTROLLER_PINNED_TO_CORE=0
CONFIG_BTDM_CONTROLLER_HCI_MODE_VHCI=y
CONFIG_BTDM_CONTROLLER_HCI_MODE_UART_H4=
CONFIG_BLUEDROID_ENABLED=y
CONFIG_BLUEDROID_PINNED_TO_CORE_0=y
CONFIG_BLUEDROID_PINNED_TO_CORE_1=
CONFIG_BLUEDROID_PINNED_TO_CORE=0
CONFIG_BTC_TASK_STACK_SIZE=3072
CONFIG_BLUEDROID_MEM_DEBUG=
CONFIG_CLASSIC_BT_ENABLED=
CONFIG_GATTS_ENABLE=
CONFIG_GATTC_ENABLE=
CONFIG_BLE_SMP_ENABLE=
CONFIG_BT_STACK_NO_LOG=
CONFIG_BT_ACL_CONNECTIONS=4
CONFIG_SMP_ENABLE=
CONFIG_BT_RESERVE_DRAM=0x10000

#
# ESP32-specific
#
CONFIG_ESP32_DEFAULT_CPU_FREQ_80=
CONFIG_ESP32_DEFAULT_CPU_FREQ_160=y
CONFIG_ESP32_DEFAULT_CPU_FREQ_240=
CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ=160
CONFIG_MEMMAP_SMP=y
CONFIG_SPIRAM_SUPPORT=
CONFIG_MEMMAP_TRACEMEM=
CONFIG_MEMMAP_TRACEMEM_TWOBANKS=
CONFIG_ESP32_TRAX=
CONFIG_TRACEMEM_RESERVE_DRAM=0x0
CONFIG_ESP32_ENABLE_COREDUMP_TO_FLASH=
CONFIG_ESP32_ENABLE_COREDUMP_TO_UART=
CONFIG_ESP32_ENABLE_COREDUMP_TO_NONE=y
CONFIG_ESP32_ENABLE_COREDUMP=
CONFIG_TWO_UNIVERSAL_MAC_ADDRESS=
CONFIG_FOUR_UNIVERSAL_MAC_ADDRESS=y
CONFIG_NUMBER_OF_UNIVERSAL_MAC_ADDRESS=4
CONFIG_SYSTEM_EVENT_QUEUE_SIZE=32
CONFIG_SYSTEM_EVENT_TASK_STACK_SIZE=4096
CONFIG_MAIN_TASK_STACK_SIZE=1024
CONFIG_IPC_TASK_STACK_SIZE=1024
CONFIG_TIMER_TASK_STACK_SIZE=4096
CONFIG_NEWLIB_STDOUT_LINE_ENDING_CRLF=y
CONFIG_NEWLIB_STDOUT_LINE_ENDING_LF=
CONFIG_NEWLIB_STDOUT_LINE_ENDING_CR=
CONFIG_NEWLIB_STDIN_LINE_ENDING_CRLF=
CONFIG_NEWLIB_STDIN_LINE_ENDING_LF=
CONFIG_NEWLIB_STDIN_LINE_ENDING_CR=y
CONFIG_NEWLIB_NANO_FORMAT=
CONFIG_CONSOLE_UART_DEFAULT=y
CONFIG_CONSOLE_UART_CUSTOM=
CONFIG_CONSOLE_UART_NONE=
CONFIG_CONSOLE_UART_NUM=0
CONFIG_CONSOLE_UART_BAUDRATE=115200
CONFIG_ULP_COPROC_ENABLED=
CONFIG_ULP_COPROC_RESERVE_MEM=0
CONFIG_ESP32_PANIC_PRINT_HALT=
CONFIG_ESP32_PANIC_PRINT_REBOOT=y
CONFIG_ESP32_PANIC_SILENT_REBOOT=
CONFIG_ESP32_PANIC_GDBSTUB=
CONFIG_ESP32_DEBUG_OCDAWARE=y
CONFIG_INT_WDT=y
CONFIG_INT_WDT_TIMEOUT_MS=300
CONFIG_INT_WDT_CHECK_CPU1=y
CONFIG_TASK_WDT=y
CONFIG_TASK_WDT_PANIC=
CONFIG_TASK_WDT_TIMEOUT_S=5
CONFIG_TASK_WDT_CHECK_IDLE_TASK_CPU0=y
CONFIG_TASK_WDT_CHECK_IDLE_TASK_CPU1=y
CONFIG_BROWNOUT_DET=y
CONFIG_BROWNOUT_DET_LVL_SEL_0=y
CONFIG_BROWNOUT_DET_LVL_SEL_1=
CONFIG_BROWNOUT_DET_LVL_SEL_2=
CONFIG_BROWNOUT_DET_LVL_SEL_3=
CONFIG_BROWNOUT_DET_LVL_SEL_4=
CONFIG_BROWNOUT_DET_LVL_SEL_5=
CONFIG_BROWNOUT_DET_LVL_SEL_6=
CONFIG_BROWNOUT_DET_LVL_SEL_7=
CONFIG_BROWNOUT_DET_LVL=0
CONFIG_ESP32_TIME_SYSCALL_USE_RTC_FRC1=y
CONFIG_ESP32_TIME_SYSCALL_USE_RTC=
CONFIG_ESP32_TIME_SYSCALL_USE_FRC1=
CONFIG_ESP32_TIME_SYSCALL_USE_NONE=
CONFIG_ESP32_RTC_CLOCK_SOURCE_INTERNAL_RC=y
CONFIG_ESP32_RTC_CLOCK_SOURCE_EXTERNAL_CRYSTAL=
CONFIG_ESP32_RTC_CLK_CAL_CYCLES=1024
CONFIG_ESP32_DEEP_SLEEP_WAKEUP_DELAY=2000
CONFIG_ESP32_XTAL_FREQ_40=y
CONFIG_ESP32_XTAL_FREQ_26=
CONFIG_ESP32_XTAL_FREQ_AUTO=
CONFIG_ESP32_XTAL_FREQ=40
CONFIG_DISABLE_BASIC_ROM_CONSOLE=
CONFIG_ESP_TIMER_PROFILING=
CONFIG_COMPATIBLE_PRE_V2_1_BOOTLOADERS=

#
# Wi-Fi
#
CONFIG_SW_COEXIST_ENABLE=
CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=10
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM=32
CONFIG_ESP32_WIFI_STATIC_TX_BUFFER=
CONFIG_ESP32_WIFI_DYNAMIC_TX_BUFFER=y
CONFIG_ESP32_WIFI_TX_BUFFER_TYPE=1
CONFIG_ESP32_WIFI_DYNAMIC_TX_BUFFER_NUM=32
CONFIG_ESP32_WIFI_AMPDU_TX_ENABLED=y
CONFIG_ESP32_WIFI_TX_BA_WIN=6
CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED=y
CONFIG_ESP32_WIFI_RX_BA_WIN=6
CONFIG_ESP32_WIFI_NVS_ENABLED=y

#
# PHY
#
CONFIG_ESP32_PHY_CALIBRATION_AND_DATA_STORAGE=y
CONFIG_ESP32_PHY_INIT_DATA_IN_PARTITION=
CONFIG_ESP32_PHY_MAX_WIFI_TX_POWER=20
CONFIG_ESP32_PHY_MAX_TX_POWER=20

#
# Power Management
#
CONFIG_PM_ENABLE=

#
# Ethernet
#
CONFIG_DMA_RX_BUF_NUM=10
CONFIG_DMA_TX_BUF_NUM=10
CONFIG_EMAC_L2_TO_L3_RX_BUF_MODE=
CONFIG_EMAC_TASK_PRIORITY=20

#
# FAT Filesystem support
#
CONFIG_FATFS_CODEPAGE_DYNAMIC=
CONFIG_FATFS_CODEPAGE_437=y
CONFIG_FATFS_CODEPAGE_720=
CONFIG_FATFS_CODEPAGE_737=
CONFIG_FATFS_CODEPAGE_771=
CONFIG_FATFS_CODEPAGE_775=
CONFIG_FATFS_CODEPAGE_850=
CONFIG_FATFS_CODEPAGE_852=
CONFIG_FATFS_CODEPAGE_855=
CONFIG_FATFS_CODEPAGE_857=
CONFIG_FATFS_CODEPAGE_860=
CONFIG_FATFS_CODEPAGE_861=
CONFIG_FATFS_CODEPAGE_862=
CONFIG_FATFS_CODEPAGE_863=
CONFIG_FATFS_CODEPAGE_864=
CONFIG_FATFS_CODEPAGE_865=
CONFIG_FATFS_CODEPAGE_866=
CONFIG_FATFS_CODEPAGE_869=
CONFIG_FATFS_CODEPAGE_932=
CONFIG_FATFS_CODEPAGE_936=
CONFIG_FATFS_CODEPAGE_949=
CONFIG_FATFS_CODEPAGE_950=
CONFIG_FATFS_CODEPAGE=437
CONFIG_FATFS_LFN_NONE=y
CONFIG_FATFS_LFN_HEAP=
CONFIG_FATFS_LFN_STACK=
CONFIG_FATFS_FS_LOCK=0
CONFIG_FATFS_TIMEOUT_MS=10000
CONFIG_FATFS_PER_FILE_CACHE=y

#
# FreeRTOS
#
CONFIG_FREERTOS_UNICORE=
CONFIG_FREERTOS_CORETIMER_0=y
CONFIG_FREERTOS_CORETIMER_1=
CONFIG_FREERTOS_HZ=100
CONFIG_FREERTOS_ASSERT_ON_UNTESTED_FUNCTION=y
CONFIG_FREERTOS_CHECK_STACKOVERFLOW_NONE=
CONFIG_FREERTOS_CHECK_STACKOVERFLOW_PTRVAL=
CONFIG_FREERTOS_CHECK_STACKOVERFLOW_CANARY=y
CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK=y
CONFIG_FREERTOS_INTERRUPT_BACKTRACE=y
CONFIG_FREERTOS_THREAD_LOCAL_STORAGE_POINTERS=1
CONFIG_FREERTOS_ASSERT_FAIL_ABORT=y
CONFIG_FREERTOS_ASSERT_FAIL_PRINT_CONTINUE=
CONFIG_FREERTOS_ASSERT_DISABLE=
CONFIG_FREERTOS_IDLE_TASK_STACKSIZE=1024
CONFIG_FREERTOS_ISR_STACKSIZE=1536
CONFIG_FREERTOS_LEGACY_HOOKS=
CONFIG_FREERTOS_MAX_TASK_NAME_LEN=16
CONFIG_SUPPORT_STATIC_ALLOCATION=
CONFIG_TIMER_TASK_PRIORITY=1
CONFIG_TIMER_TASK_STACK_DEPTH=2048
CONFIG_TIMER_QUEUE_LENGTH=10
CONFIG_FREERTOS_QUEUE_REGISTRY_SIZE=0
CONFIG_FREERTOS_USE_TRACE_FACILITY=
CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS=
CONFIG_FREERTOS_DEBUG_INTERNALS=

#
# Heap memory debugging
#
CONFIG_HEAP_POISONING_DISABLED=y
CONFIG_HEAP_POISONING_LIGHT=
CONFIG_HEAP_POISONING_COMPREHENSIVE=
CONFIG_HEAP_TRACING=

#
# libsodium
#
CONFIG_LIBSODIUM_USE_MBEDTLS_SHA=y

#
# Log output
#
CONFIG_LOG_DEFAULT_LEVEL_NONE=
CONFIG_LOG_DEFAULT_LEVEL_ERROR=y
CONFIG_LOG_DEFAULT_LEVEL_WARN=
CONFIG_LOG_DEFAULT_LEVEL_INFO=
CONFIG_LOG_DEFAULT_LEVEL_DEBUG=
CONFIG_LOG_DEFAULT_LEVEL_VERBOSE=
CONFIG_LOG_DEFAULT_LEVEL=1
CONFIG_LOG_COLORS=y

#
# LWIP
#
CONFIG_L2_TO_L3_COPY=
CONFIG_LWIP_MAX_SOCKETS=10
CONFIG_LWIP_SO_REUSE=y
CONFIG_LWIP_SO_REUSE_RXTOALL=y
CONFIG_LWIP_SO_RCVBUF=
CONFIG_LWIP_DHCP_MAX_NTP_SERVERS=1
CONFIG_LWIP_IP_FRAG=
CONFIG_LWIP_IP_REASSEMBLY=
CONFIG_LWIP_STATS=
CONFIG_LWIP_ETHARP_TRUST_IP_MAC=y
CONFIG_TCPIP_RECVMBOX_SIZE=32
CONFIG_LWIP_DHCP_DOES_ARP_CHECK=y

#
# DHCP server
#
CONFIG_LWIP_DHCPS_LEASE_UNIT=60
CONFIG_LWIP_DHCPS_MAX_STATION_NUM=8
CONFIG_LWIP_AUTOIP=
CONFIG_LWIP_NETIF_LOOPBACK=y
CONFIG_LWIP_LOOPBACK_MAX_PBUFS=8

#
# TCP
#
CONFIG_LWIP_MAX_ACTIVE_TCP=16
CONFIG_LWIP_MAX_LISTENING_TCP=16
CONFIG_TCP_MAXRTX=12
CONFIG_TCP_SYNMAXRTX=6
CONFIG_TCP_MSS=1436
CONFIG_TCP_MSL=60000
CONFIG_TCP_SND_BUF_DEFAULT=5744
CONFIG_TCP_WND_DEFAULT=5744
CONFIG_TCP_RECVMBOX_SIZE=6
CONFIG_TCP_QUEUE_OOSEQ=y
CONFIG_TCP_OVERSIZE_MSS=y
CONFIG_TCP_OVERSIZE_QUARTER_MSS=
CONFIG_TCP_OVERSIZE_DISABLE=

#
# UDP
#
CONFIG_LWIP_MAX_UDP_PCBS=16
CONFIG_UDP_RECVMBOX_SIZE=6
CONFIG_TCPIP_TASK_STACK_SIZE=2560
CONFIG_PPP_SUPPORT=

#
# ICMP
#
CONFIG_LWIP_MULTICAST_PING=
CONFIG_LWIP_BROADCAST_PING=

#
# LWIP RAW API
#
CONFIG_LWIP_MAX_RAW_PCBS=16

#
# mbedTLS
#
CONFIG_MBEDTLS_SSL_MAX_CONTENT_LEN=16384
CONFIG_MBEDTLS_DEBUG=
CONFIG_MBEDTLS_HARDWARE_AES=y
CONFIG_MBEDTLS_HARDWARE_MPI=
CONFIG_MBEDTLS_HARDWARE_SHA=
CONFIG_MBEDTLS_HAVE_TIME=y
CONFIG_MBEDTLS_HAVE_TIME_DATE=
CONFIG_MBEDTLS_TLS_SERVER_AND_CLIENT=y
CONFIG_MBEDTLS_TLS_SERVER_ONLY=
CONFIG_MBEDTLS_TLS_CLIENT_ONLY=
CONFIG_MBEDTLS_TLS_DISABLED=
CONFIG_MBEDTLS_TLS_SERVER=y
CONFIG_MBEDTLS_TLS_CLIENT=y
CONFIG_MBEDTLS_TLS_ENABLED=y

#
# TLS Key Exchange Methods
#
CONFIG_MBEDTLS_PSK_MODES=
CONFIG_MBEDTLS_KEY_EXCHANGE_RSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_DHE_RSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ELLIPTIC_CURVE=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_RSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDH_ECDSA=y
CONFIG_MBEDTLS_KEY_EXCHANGE_ECDH_RSA=y
CONFIG_MBEDTLS_SSL_RENEGOTIATION=y
CONFIG_MBEDTLS_SSL_PROTO_SSL3=
CONFIG_MBEDTLS_SSL_PROTO_TLS1=y
CONFIG_MBEDTLS_SSL_PROTO_TLS1_1=y
CONFIG_MBEDTLS_SSL_PROTO_TLS1_2=y
CONFIG_MBEDTLS_SSL_PROTO_DTLS=
CONFIG_MBEDTLS_SSL_ALPN=y
CONFIG_MBEDTLS_SSL_SESSION_TICKETS=y

#
# Symmetric Ciphers
#
CONFIG_MBEDTLS_AES_C=y
CONFIG_MBEDTLS_CAMELLIA_C=
CONFIG_MBEDTLS_DES_C=
CONFIG_MBEDTLS_RC4_DISABLED=y
CONFIG_MBEDTLS_RC4_ENABLED_NO_DEFAULT=
CONFIG_MBEDTLS_RC4_ENABLED=
CONFIG_MBEDTLS_BLOWFISH_C=
CONFIG_MBEDTLS_XTEA_C=
CONFIG_MBEDTLS_CCM_C=y
CONFIG_MBEDTLS_GCM_C=y
CONFIG_MBEDTLS_RIPEMD160_C=

#
# Certificates
#
CONFIG_MBEDTLS_PEM_PARSE_C=y
CONFIG_MBEDTLS_PEM_WRITE_C=y
CONFIG_MBEDTLS_X509_CRL_PARSE_C=y
CONFIG_MBEDTLS_X509_CSR_PARSE_C=y
CONFIG_MBEDTLS_ECP_C=y
CONFIG_MBEDTLS_ECDH_C=y
CONFIG_MBEDTLS_ECDSA_C=y
CONFIG_MBEDTLS_ECP_DP_SECP192R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP224R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP256R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP384R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP521R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP192K1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP224K1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_SECP256K1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_BP256R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_BP384R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_BP512R1_ENABLED=y
CONFIG_MBEDTLS_ECP_DP_CURVE25519_ENABLED=y
CONFIG_MBEDTLS_ECP_NIST_OPTIM=y

#
# OpenSSL
#
CONFIG_OPENSSL_DEBUG=
CONFIG_OPENSSL_ASSERT_DO_NOTHING=y
CONFIG_OPENSSL_ASSERT_EXIT=

#
# PThreads
#
CONFIG_ESP32_PTHREAD_TASK_PRIO_DEFAULT=5
CONFIG_ESP32_PTHREAD_TASK_STACK_SIZE_DEFAULT=2048

#
# SPI Flash driver
#
CONFIG_SPI_FLASH_VERIFY_WRITE=
CONFIG_SPI_FLASH_ENABLE_COUNTERS=
CONFIG_SPI_FLASH_ROM_DRIVER_PATCH=y
CONFIG_SPI_FLASH_WRITING_DANGEROUS_REGIONS_ABORTS=y
CONFIG_SPI_FLASH_WRITING_DANGEROUS_REGIONS_FAILS=
CONFIG_SPI_FLASH_WRITING_DANGEROUS_REGIONS_ALLOWED=

#
# SPIFFS Configuration
#
CONFIG_SPIFFS_MAX_PARTITIONS=3

#
# SPIFFS Cache Configuration
#
CONFIG_SPIFFS_CACHE=y
CONFIG_SPIFFS_CACHE_WR=y
CONFIG_SPIFFS_CACHE_STATS=
CONFIG_SPIFFS_PAGE_CHECK=y
CONFIG_SPIFFS_GC_MAX_RUNS=10
CONFIG_SPIFFS_GC_STATS=
CONFIG_SPIFFS_OBJ_NAME_LEN=32
CONFIG_SPIFFS_USE_MAGIC=y
CONFIG_SPIFFS_USE_MAGIC_LENGTH=y
CONFIG_SPIFFS_META_LENGTH=4
CONFIG_SPIFFS_USE_MTIME=y

#
# Debug Configuration
#
CONFIG_SPIFFS_DBG=
CONFIG_SPIFFS_API_DBG=
CONFIG_SPIFFS_GC_DBG=
CONFIG_SPIFFS_CACHE_DBG=
CONFIG_SPIFFS_CHECK_DBG=
CONFIG_SPIFFS_TEST_VISUALISATION=

#
# tcpip adapter
#
CONFIG_IP_LOST_TIMER_INTERVAL=120

#
# Wear Levelling
#
CONFIG_WL_SECTOR_SIZE_512=
CONFIG_WL_SECTOR_SIZE_4096=y
CONFIG_WL_SECTOR_SIZE=4096
@MrSurly
Copy link
Author

MrSurly commented Jan 10, 2018

I have a minimalized reproduction of this issue.

@negativekelvin
Copy link
Contributor

EDIT 3.5: Increasing WDT timeout to 10s (per #578) fixes this

Fixes all the issues or just with esp_ota_begin?

@FayeY FayeY changed the title OTA updates failing randomly inside spi_flash_write_inner [TW#17597] OTA updates failing randomly inside spi_flash_write_inner Jan 11, 2018
@MrSurly
Copy link
Author

MrSurly commented Jan 11, 2018

Fixes all the issues or just with esp_ota_begin?

Just the esp_ota_begin, the main issue is still there.

@projectgus
Copy link
Contributor

Hi @MrSurly,

Thanks for being patient while someone gets back to you about this. I have a couple of questions:

  • If you add a vTaskDelay(1) somewhere in the main OTA while loop, does anything change?
  • Which CPU is the update task running on?
  • What other tasks (non-IDF-internal) are running in the system while the OTA update is running?
  • If you turn on Comprehensive heap checking, does anything jump out at you?

The first question is because we've noticed, particularly on LANs where the TCP latency is minimal and bandwidth is much higher than the flash write bandwidth, the OTA loop can starve other tasks in the system. This may explain the the task WDT, although it doesn't explain the other part.

The high-priority IPC (inter-processor) task being involved in the crash dump is related to flash write operations. The CPU doing a flash write will call spi_flash_guard_start() which signals the IPC task on the other CPU to prevent it from reading flash (via code cached in flash) until the flash write is over. This doesn't explain why it's not returning from this properly, though... which is why I mention heap corruption as a possibility.

I have a minimalized reproduction of this issue.

Is this something you're able to share?

@MrSurly
Copy link
Author

MrSurly commented Feb 6, 2018

If you add a vTaskDelay(1) somewhere in the main OTA while loop, does anything change?

Yes, this helps immensely. Seems much more reliable. I set it to iterate the download 20 times, and it failed during # 12 with:

Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: ipc1

Which CPU is the update task running on?

Honestly, not sure. Using xTaskCreate(mainTask, "mainTask", 32768, NULL, tskIDLE_PRIORITY, NULL); from app_main. Code below.

What other tasks (non-IDF-internal) are running in the system while the OTA update is running?

For my actual project, there are several; stuff like keeping Wifi active and what not. For the example code below, there's only one user-space task.

If you turn on Comprehensive heap checking, does anything jump out at you?

Not for the example code; haven't tried yet for production.

The first question is because we've noticed, particularly on LANs where the TCP latency is minimal and bandwidth is much higher than the flash write bandwidth, the OTA loop can starve other tasks in the system. This may explain the the task WDT, although it doesn't explain the other part.

I am downloading from my local workstation; latency is likely < 1ms

The high-priority IPC (inter-processor) task being involved in the crash dump is related to flash write operations. The CPU doing a flash write will call spi_flash_guard_start() which signals the IPC task on the other CPU to prevent it from reading flash (via code cached in flash) until the flash write is over. This doesn't explain why it's not returning from this properly, though... which is why I mention heap corruption as a possibility.

Hrm. Let me turn on the heap checking for my main code and we'll see

I have a minimalized reproduction of this issue.

Is this something you're able to share?

Yes, below. I can provide the supporting code if you actually wish to compile.

// C
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <stddef.h>
#include <string.h>
#include <sys/param.h> // MIN/MAX

// ESP
#include <esp_task_wdt.h>
#include <esp_wifi.h>
#include <esp_wifi_types.h>
#include <esp_log.h>
#include <esp_event_loop.h>
#include <esp_ota_ops.h>
#include <nvs_flash.h>

// local
#include "Http.hpp"
#include "Sha256.hpp"
#include "elapsedMillis.h"
#include "log.h"
#include "config.h"

#define HEXDIGIT(x) (x) <= 9 ? '0' + (x): 'A' + ((x)-10)
#define ARRAY_LEN(x) (sizeof(x) / sizeof(x[0]))

using namespace std;

class Main {



    public:

        Main() : m_haveIp(false) {}

        static esp_err_t eventHandler(void *ctx, system_event_t *event) {
            const char * TAG = "Event";
            Main& self = *(Main*)ctx;
            (void)self;
            switch (event->event_id) {
                case SYSTEM_EVENT_STA_START:
                    L(TAG, "STA_START");
                    break;
                case SYSTEM_EVENT_STA_GOT_IP:
                    L(TAG, "Got IP");
                    self.m_haveIp = true;
                    break;

                case SYSTEM_EVENT_STA_DISCONNECTED: 
                    {
                        self.m_haveIp = false;

                        // This is a workaround as ESP32 WiFi libs don't currently
                        // auto-reassociate.
                        system_event_sta_disconnected_t *disconn = &event->event_info.disconnected;
                        L(TAG, "STA_DISCONNECTED, reason:%d", disconn->reason);
                        switch (disconn->reason) {
                            case WIFI_REASON_BEACON_TIMEOUT:
                                L(TAG,"beacon timeout");
                                break;
                            case WIFI_REASON_NO_AP_FOUND:
                                L(TAG,"no AP found");
                                break;
                            case WIFI_REASON_AUTH_FAIL:
                                L(TAG,"authentication failed");
                                break;

                            default:
                                // Let other errors through and try to reconnect.
                                break;
                        }

                        break;

                    }
                default:
                    L(TAG, "Some unhandled event: %d", event->event_id);
            }
            return ESP_OK;
        }


        void restart() {
            esp_restart();
        }




        bool downloadUpdate(const string& url, string updateHash) {
            const char * TAG = "update";

            const esp_partition_t *configured = esp_ota_get_boot_partition();
            const esp_partition_t *running = esp_ota_get_running_partition();
            const esp_partition_t *update_partition = esp_ota_get_next_update_partition(NULL);

            if (configured != NULL ) {
                L(TAG, "Configured for partition subtype %d at offset 0x%x", configured->subtype, update_partition->address);
            } else {
                L(TAG, "Configured partition is NULL");
            }

            if (running != NULL) {

                L(TAG, "Running from partition subtype %d at offset 0x%x", running->subtype, update_partition->address); 
            } else {
                L(TAG, "Running partition is NULL");
            }

            if(update_partition != NULL) {
                L(TAG, "Writing to partition subtype %d at offset 0x%x", update_partition->subtype, update_partition->address);

            } else {
                L(TAG, "Running partition is NULL");
            }
            esp_ota_handle_t update_handle;
            L(TAG, "update begin");
            elapsedMillis beginTimer;
            if (esp_ota_begin(update_partition, OTA_SIZE_UNKNOWN, &update_handle) == ESP_OK) {
                L(TAG, "after update begin, %d milliseconds", (int) beginTimer);

                Http http("mango/1.0");
                int result = http.get(url);
                if (result == 200) {


                    Sha256 sha256;
                    uint8_t buf [4096];
                    int read;
                    int buf_fill = 0;
                    int ttl_read = 0;
                    int timeout = 5000;
                    elapsedMillis printTimer = 3000;

                    while (true) {
                        read = http.read(buf + buf_fill, sizeof(buf) - buf_fill , timeout);

                        if ( read < 0 ) {
                            if (errno == EAGAIN) {
                                L(TAG, "read timeout");
                                continue;
                            }
                            L(TAG,"read error, aborting");
                            return false;
                        }

                        buf_fill += read;

                        if (read != 0 && buf_fill < sizeof(buf)) {
                            continue;
                        }


                        ttl_read += buf_fill;
                        sha256.update(buf, buf_fill);
                        //L(TAG, "> esp_ota_write");
                        if (esp_ota_write(update_handle, buf, buf_fill) != ESP_OK) {
                            L(TAG,"ota_esp_write failed");
                            return false;
                        }
                        //L(TAG, "< esp_ota_write");
                        if (printTimer > 500) {
                            L(TAG, "downloaded %d bytes", ttl_read);
                            printTimer = 0;
                        }
                        if (read == 0) {
                            break;
                        }
                        buf_fill = 0;
                    }
                    L(TAG, "downloaded %d bytes total", ttl_read);
                    if (esp_ota_end(update_handle) != ESP_OK) {
                        L(TAG,"esp_ota_end() failed");
                        return false;
                    }

                    uint8_t computedHash[65]; 
                    memset(computedHash, 0, sizeof(computedHash));
                    sha256.finish(computedHash);

                    for (std::string::iterator i = updateHash.begin(); i != updateHash.end(); ++i) {
                        *i = toupper(*i);
                    }

                    for (int i = 31; i >= 0; --i) {
                        uint8_t b = computedHash[i];
                        computedHash[i * 2] = HEXDIGIT(b >> 4);
                        computedHash[i * 2 + 1] = HEXDIGIT(b & 0xF);
                    }
                    L(TAG, "computed hash %s", computedHash);

                    if (updateHash != (char *)computedHash) {
                        L(TAG, "computed hash of %s does not equal update hash of %s, aborting", computedHash, updateHash.c_str());
                        return false;
                    }

                    L(TAG, "Updating bootable partition");
                    if (esp_ota_set_boot_partition(update_partition) != ESP_OK) {
                        L(TAG,"esp_ota_set_boot_partition() failed");
                        return false;
                    }
                } else {
                    L(TAG,"HTTP GET failed");
                    return false;
                }
            } else {
                L(TAG, "esp_ota_begin() returned an error");
                return false;
            }

            L(TAG, "OTA update successful");
            return true;

        }

        void wifi() {

            const char * ssid = WIFI_SSID;
            const char * pass = WIFI_PASS;

            tcpip_adapter_init();
            esp_event_loop_init(eventHandler, this);

            const char * TAG = "initWifi";
            wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();

            L(TAG, "esp_wifi_init()");
            if (esp_wifi_init(&cfg) != ESP_OK) {
                L(TAG, "esp_wifi_init() failed");
                restart();
            }

            L(TAG, "esp_wifi_set_storage()");
            if(esp_wifi_set_storage(WIFI_STORAGE_RAM) != ESP_OK) {
                L(TAG, "esp_wifi_set_storage() failed");
                restart();
            }


            L(TAG, "esp_wifi_set_mode()");
            if(esp_wifi_set_mode(WIFI_MODE_STA) != ESP_OK) {
                L(TAG, "esp_wifi_set_mode() failed");
                restart();
            }


            L(TAG, "esp_wifi_start()");
            if(esp_wifi_start() != ESP_OK) {
                L(TAG, "esp_wifi_start() failed");
                restart();
            }
            wifi_config_t config;
            memset(&config, 0, sizeof(config));
            memcpy(config.sta.ssid, ssid, MIN(strlen(ssid), sizeof(config.sta.ssid)));
            memcpy(config.sta.password, pass, MIN(strlen(pass), sizeof(config.sta.password)));

            if(esp_wifi_set_config(ESP_IF_WIFI_STA, &config) != ESP_OK) {
                L(TAG, "esp_wifi_set_config() failed");
                restart();
            }

            if (esp_wifi_connect() != ESP_OK) {
                L(TAG, "esp_wifi_connect() failed");
                restart();
            }
        }


        void run() {
            nvs_flash_init(); //  Required for Wifi, BT
            const char * TAG = "main";
            wifi();

            while(!m_haveIp) {
                L(TAG, "waiting for IP");
                sleep(1);
            }

            for(int i = 0; i < 20; i++) {
                L(TAG, "Begin test #%d\n", i + 1);
                downloadUpdate("http://172.16.23.157/test.bin","13fbead996e7fa70612467c3ea478fb8db80db47b0d5f97e0f7662027bbf9cd9");
                L(TAG, "End test #%d\n", i + 1);
            }
            L(TAG, "finished, idle");
            while(true) {
                sleep(3);
            }
        }

    private:
        volatile bool m_haveIp;
};



void mainTask (void* ctx) {
    (void)ctx;
    Main test;
    test.run();
}


extern "C" {
    void app_main() {
        xTaskCreate(mainTask, "mainTask", 32768, NULL, tskIDLE_PRIORITY, NULL);
    }
};

@MrSurly
Copy link
Author

MrSurly commented Feb 6, 2018

Just got this ... a little different this time since CPU 0 says "wifi". The ones after it say "IDLE" for CPU 0.

Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 1)
Tasks currently running:
CPU 0: wifi
CPU 1: ipc1

Also, sometimes the watchdog message is after download is complete, during a call to esp_ota_end

@MrSurly
Copy link
Author

MrSurly commented Feb 6, 2018

Not sure of the relevance, but upon success, this is seen when finishing the OTA

I (34910) esp_image: segment 1: paddr=0x0017a324 vaddr=0x3ffc0000 size=0x03320 ( 13088) 
I (34940) esp_image: segment 2: paddr=0x0017d64c vaddr=0x40080000 size=0x00400 (  1024) 
0x40080000: _iram_start at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/freertos/./xtensa_vectors.S:1685

I (34970) esp_image: segment 3: paddr=0x0017da54 vaddr=0x40080400 size=0x025bc (  9660) 
I (35000) esp_image: segment 4: paddr=0x00180018 vaddr=0x400d0018 size=0xbc07c (770172) map
0x400d0018: _stext at ??:?

I (36070) esp_image: segment 5: paddr=0x0023c09c vaddr=0x400829bc size=0x110c0 ( 69824) 
0x400829bc: _xt_coproc_exc at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/freertos/./xtensa_vectors.S:999

I (36190) esp_image: segment 6: paddr=0x0024d164 vaddr=0x400c0000 size=0x00064 (   100) 
I (36220) esp_image: segment 7: paddr=0x0024d1d0 vaddr=0x50000000 size=0x00008 (     8) 
[  update]: computed hash 13FBEAD996E7FA70612467C3EA478FB8DB80DB47B0D5F97E0F7662027BBF9CD9
[  update]: Updating bootable partition
I (36280) esp_image: segment 0: paddr=0x00140020 vaddr=0x3f400020 size=0x3a2fc (238332) map
I (36610) esp_image: segment 1: paddr=0x0017a324 vaddr=0x3ffc0000 size=0x03320 ( 13088) 
I (36630) esp_image: segment 2: paddr=0x0017d64c vaddr=0x40080000 size=0x00400 (  1024) 
0x40080000: _iram_start at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/freertos/./xtensa_vectors.S:1685

I (36660) esp_image: segment 3: paddr=0x0017da54 vaddr=0x40080400 size=0x025bc (  9660) 
I (36680) esp_image: segment 4: paddr=0x00180018 vaddr=0x400d0018 size=0xbc07c (770172) map
0x400d0018: _stext at ??:?

I (37740) esp_image: segment 5: paddr=0x0023c09c vaddr=0x400829bc size=0x110c0 ( 69824) 
0x400829bc: _xt_coproc_exc at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/freertos/./xtensa_vectors.S:999

I (37850) esp_image: segment 6: paddr=0x0024d164 vaddr=0x400c0000 size=0x00064 (   100) 
I (37870) esp_image: segment 7: paddr=0x0024d1d0 vaddr=0x50000000 size=0x00008 (     8) 

@projectgus
Copy link
Contributor

projectgus commented Feb 7, 2018

Thanks for all this info. There's a lot here, but one thing jumped out at me:

Which CPU is the update task running on?

Honestly, not sure. Using xTaskCreate(mainTask, "mainTask", 32768, NULL, tskIDLE_PRIORITY, NULL); from app_main. Code below.

This will allow the task to migrate between cores depending on which is idle. Given the crash seems to be related to inter-core communciation in a busy system, this may be relevant.

Can you please try changing this to xTaskCreatePinnedToCore(). You'll need another argument at the end of the args list, either 0 or 1 (for which core to pin to). If you're not sure, try 1.

(This isn't a fix as much as trying to narrow down whatever the underlying cause of the bug is.)

@MrSurly
Copy link
Author

MrSurly commented Feb 7, 2018

Edit: Pinning to core 0 seems to cause the WDT loop

Using xTaskCreatePinnedToCore it's better. It no longer goes into a WDT loop w/o resetting, but now it resets (maybe 10% of the time):


rst:0x7 (TG0WDT_SYS_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5632
ho 0 tail 12 room 4
load:0x40078000,len:0
load:0x40078000,len:13716
entry 0x40078fb4
W (30) boot: PRO CPU has been reset by WDT.
W (30) boot: WDT reset info: PRO CPU PC=0x40083a99
0x40083a99: spi_flash_op_block_func at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/spi_flash/./cache_utils.c:82 (discriminator 1)

W (31) boot: WDT reset info: APP CPU PC=0x400d1bde (waiti mode)
0x400d1bde: esp_vApplicationIdleHook at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/esp32/./freertos_hooks.c:85

I (38) boot: ESP-IDF v3.1-dev-171-gf9ad17e-dirty 2nd stage bootloader

@MrSurly
Copy link
Author

MrSurly commented Feb 8, 2018

So, this just happened, inside of esp_wifi_init(). Wifi is managed by a different task. Should all tasks I create be pinned to a core to prevent this issue?

[initWifi]: esp_wifi_init()
I (8855) wifi: wifi firmware version: 403db1d
I (8855) wifi: config NVS flash: enabled
I (8855) wifi: config nano formating: disabled
I (8855) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (8865) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 1)
Tasks currently running:
CPU 0: IDLE
CPU 1: ipc1
< loops forever >

@rahul-bedarkar
Copy link

I am facing exactly same with OTA. I have external SPI RAM. There are many tasks running while OTA is running. If I run only wifi and OTA tasks, OTA completes successfully without watchdog trigger issue.

@MrSurly
Copy link
Author

MrSurly commented Feb 15, 2018

Now I'm seeing WDT resets during normal operation, which appear flash related:

ets Jun  8 2016 00:22:57

rst:0x7 (TG0WDT_SYS_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5832
load:0x40078000,len:0
ho 12 tail 0 room 4
load:0x40078000,len:15376
entry 0x400790d4
W (30) boot: PRO CPU has been reset by WDT.
W (30) boot: WDT reset info: PRO CPU PC=0x40083c37
0x40083c37: spi_flash_op_block_func at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/spi_flash/./cache_utils.c:82 (discriminator 1)

W (31) boot: WDT reset info: APP CPU PC=0x400d1e92 (waiti mode)
0x400d1e92: esp_vApplicationIdleHook at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/esp32/./freertos_hooks.c:85

I (38) boot: ESP-IDF v3.1-dev-185-gdcd7a1e 2nd stage bootloader
I (44) boot: compile time 09:18:35
I (49) boot: Enabling RNG early entropy source...
I (54) boot: SPI Speed      : 40MHz
I (58) boot: SPI Mode       : DIO
I (62) boot: SPI Flash Size : 4MB
I (66) boot: Partition Table:
I (70) boot: ## Label            Usage          Type ST Offset   Length
I (77) boot:  0 nvs              WiFi data        01 02 00009000 00003000
I (84) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (92) boot:  2 factory          factory app      00 00 00010000 00130000
I (99) boot:  3 otadata          OTA data         01 00 00140000 00004000
I (107) boot:  4 XXX           WiFi data        01 02 00144000 00006000
I (114) boot:  5 ota_0            OTA app          00 10 00150000 00130000
I (122) boot:  6 ota_1            OTA app          00 11 00280000 00130000
I (130) boot: End of partition table
I (134) boot: Defaulting to factory image
I (138) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x3c988 (248200) map
I (234) esp_image: segment 1: paddr=0x0004c9b0 vaddr=0x3ffc0000 size=0x034e0 ( 13536) load
I (239) esp_image: segment 2: paddr=0x0004fe98 vaddr=0x40080000 size=0x00178 (   376) load
0x40080000: _WindowOverflow4 at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/freertos/./xtensa_vectors.S:1685

I (241) esp_image: segment 3: paddr=0x00050018 vaddr=0x400d0018 size=0xcf5c4 (849348) map
0x400d0018: _flash_cache_start at ??:?

I (544) esp_image: segment 4: paddr=0x0011f5e4 vaddr=0x40080178 size=0x00288 (   648) load
0x40080178: _WindowUnderflow12 at ??:?

I (544) esp_image: segment 5: paddr=0x0011f874 vaddr=0x40080400 size=0x14e9c ( 85660) load
I (586) esp_image: segment 6: paddr=0x00134718 vaddr=0x400c0000 size=0x00064 (   100) load
I (586) esp_image: segment 7: paddr=0x00134784 vaddr=0x50000000 size=0x00008 (     8) load
I (605) boot: Loaded app from partition at offset 0x10000
I (605) boot: Disabling RNG early entropy source...
I (607) cpu_start: Pro cpu up.
I (610) cpu_start: Starting app cpu, entry point is 0x40081014
0x40081014: call_start_cpu1 at /home/epoulsen/workspaces/XXX/ext_deps/esp-idf/components/esp32/./cpu_start.c:215

I (0) cpu_start: App cpu up.
I (621) heap_init: Initializing. RAM available for dynamic allocation:
I (628) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (634) heap_init: At 3FFD01E8 len 0000FE18 (63 KiB): DRAM
I (640) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (646) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (653) heap_init: At 4009529C len 0000AD64 (43 KiB): IRAM
I (659) cpu_start: Pro cpu start user code
I (9) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (10) gpio: GPIO[13]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 0| Pulldown: 0| Intr:0 
I (10) gpio: GPIO[14]| InputEn: 1| Out

@projectgus
Copy link
Contributor

Hi @MrSurly ,

Sorry, I've been out sick the past week.

Can I please take you up on the offer of a full project I can reproduce this in?

Angus

@MrSurly
Copy link
Author

MrSurly commented Feb 16, 2018

Sorry, I've been out sick the past week.

=( Sorry to hear that. I hope you're feeling better.

Can I please take you up on the offer of a full project I can reproduce this in?

Absolutely. It's presently a branch from a client project. I'm out for the next 4 days -- I'll share a repo when I can get it properly sanitized and anonymized.

@MrSurly
Copy link
Author

MrSurly commented Feb 21, 2018

@projectgus
Sorry for the delay: https://github.com/MrSurly/ESPIDF-issue-1479

I use an image > 1MB. Usually fails around attempt # 10, with a WDT.

For the production project I'm working on, this fails 3/4 times; probably because of more threads running ..?

@MrSurly
Copy link
Author

MrSurly commented Mar 20, 2018

@projectgus Any word?

@chegewara
Copy link
Contributor

I have one question, maybe its relevant or maybe not. Why you create main task with so low priority?
xTaskCreate(mainTask, "mainTask", 32768, NULL, tskIDLE_PRIORITY, NULL);

Could you try something like:
xTaskCreate(mainTask, "mainTask", 32768, NULL, 5, NULL);

@projectgus
Copy link
Contributor

projectgus commented Mar 21, 2018

Hi @MrSurly ,

Sorry for the extended silence I've been busy with some other tasks and this slipped.

@chegewara makes a very good point about the task priority. I totally missed this when you posted it earlier. This means that the main task is always competing with the idle task for CPU time, and this may have unexpected consequences (at minimum, degraded performance as the idle task generally assumes that any "real" task will pre-empt it). Try raising it to a value like the 5 suggested, or even (tskIDLE_PRIORITY + 1).

Angus

@MrSurly
Copy link
Author

MrSurly commented Mar 22, 2018

@chegewara @projectgus

Increasing the priority seems to have entirely fixed the issue, though I don't understand the mechanism of why.

@chegewara
Copy link
Contributor

https://www.freertos.org/Documentation/161204_Mastering_the_FreeRTOS_Real_Time_Kernel-A_Hands-On_Tutorial_Guide.pdf

This is good position to start. Sure, you dont need to know freeRTOS, but if you will learn just a basic stuff then it will help you a lot in work with esp-idf.

Task has few parameters, one of them is priority. In your code it was set to tskIDLE_PRIORITY which is lowest possible. This priority is for idle freeRTOS task to make some cleanings etc. When you got your task set to tskIDLE_PRIORITY freeRTOS has been switching between your task and idle task even if your main task required more time to finish job. Basicaly for normal applications its good to set task priority between 5 and 10. But like i said, its good to read about it from this document to save time and health and avoid strange app behaviour.

@projectgus
Copy link
Contributor

@MrSurly ,

To add to what @chegewara says, the "idle task" is special because it's assumed that it's the task that runs when there's nothing of higher priority to run. So it may do things like, for example, suspend the CPU clock to put the system into a low power state.

The idle task can do this because it knows that if anything else needed to run, it would have a higher priority so it would already be running (in a hard real-time system like an RTOS, the highest priority "ready" task will always run before any lower priority task(s)).

There is also a small difference in the ESP-IDF implementation of the FreeRTOS scheduler where we don't guarantee fair round-robin preemption of tasks at the same priority. The idle task never suspends waiting on a mutex or a semaphore (by design, because the assumption is that any other task in the system will have higher priority so why should the idle task ever need to block). So there's no guarantee that preemption will fairly wake up another task at the same "idle" priority if it's blocked on something.

(Normally this scenario would trigger the ESP-IDF task watchdog to say "hey, task X is never blocking and might be starving something else out." But, again, the idle task is special as the assumption is that it's the lowest priority thing in the system so it can never starve anything out.)

TLDR: Every task created should be at least tskIDLE_PRIORITY +1 priority, unless you're doing something very complex with FreeRTOS internals.

Sorry that I didn't spot this in your sample code the first time around, and well done @chegewara for spotting it.

@chegewara
Copy link
Contributor

Its my bad too, because i dont read most of issues ( i just dont feel i have knowledge to answer, this time i just got lucky), and im glad it solved your problem.

Have a fun with esp32.

@MrSurly
Copy link
Author

MrSurly commented Mar 23, 2018

@chegewara @projectgus

Thank you both for this. That's what I get for copy-paste the task create code. Going to close this.

@MrSurly MrSurly closed this as completed Mar 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants