SPIRAM corruption in light sleep mode (IDFGH-10297) #11558

Closed
3 tasks done
Erlkoenig90 opened this issue May 31, 2023 · 3 comments
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@Erlkoenig90
Contributor

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

both v5.0.2 and master/903af13e847cd301e476d8b16b4ee1c21b30b5c6

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

PowerShell

Development Kit.

ESP32-S3-DevKitC

Power Supply used.

USB

What is the expected behavior?

When using sleep mode, SPIRAM contents should stay intact.

What is the actual behavior?

When the CPU is in light sleep for more than about 10 min, individual bits in SPIRAM are flipped. If data structures or pointers are affected, this leads to random crashes and errors.

Steps to reproduce.

  1. Compile the test case application.
  2. Flash it onto the ESP32-S3-DevKitC.
  3. Run for 10-30 min.

This error occurs when memory in SPIRAM is allocated via heap_caps_malloc(bufSize, MALLOC_CAP_SPIRAM), some data is written to it, sleep mode is entered for some time (10 s to 10 min), and the memory is then read back. Some of the bytes read back have one bit flipped.
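The write-then-verify part of this procedure can be sketched roughly as follows. This is not the attached test case: fill_pattern and check_pattern are hypothetical helper names, and the pattern (byte i holds the low 8 bits of i) is only inferred from the expect= values in the log. The sketch is plain C so it can be tried on any host; on the target the buffer would come from heap_caps_malloc(bufSize, MALLOC_CAP_SPIRAM) and the sleep period would sit between the two calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fill the buffer so that byte i holds the low 8 bits of i.
 * (Assumed pattern; it matches the expect= values in the log.) */
void fill_pattern(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        buf[i] = (uint8_t)(i & 0xffu);
    }
}

/* Compare the buffer against the same pattern, print every mismatch
 * together with the XOR of read and expected byte, and return the
 * number of mismatching bytes. */
size_t check_pattern(const uint8_t *buf, size_t len)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < len; ++i) {
        uint8_t expect = (uint8_t)(i & 0xffu);
        if (buf[i] != expect) {
            printf("mismatch at i=%zu got=%02x expect=%02x xor=%02x\n",
                   i, (unsigned)buf[i], (unsigned)expect,
                   (unsigned)(buf[i] ^ expect));
            ++mismatches;
        }
    }
    return mismatches;
}
```

A return value of 0 from check_pattern after wake-up means the buffer survived the sleep period unchanged.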

The following config options are set to reproduce this:
ESPTOOLPY_FLASHSIZE_8MB
SPIRAM
SPIRAM_MODE_OCT
SPIRAM_USE_CAPS_ALLOC
PM_ENABLE
FREERTOS_USE_TICKLESS_IDLE
ESP_DEFAULT_CPU_FREQ_MHZ_240
COMPILER_OPTIMIZATION_PERF

However, when ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND and ESP_SLEEP_FLASH_LEAKAGE_WORKAROUND are set, this issue does not occur.

Debug Logs.

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3820,len:0x1718
load:0x403c9700,len:0x4
load:0x403c9704,len:0xc08
load:0x403cc700,len:0x2f08
entry 0x403c990c
I (27) boot: ESP-IDF v5.2-dev-823-g903af13e84 2nd stage bootloader
I (27) boot: compile time May 31 2023 14:56:00
I (27) boot: Multicore bootloader
I (32) boot: chip revision: v0.1
I (35) boot.esp32s3: Boot SPI Speed : 80MHz
I (40) boot.esp32s3: SPI Mode       : DIO
I (45) boot.esp32s3: SPI Flash Size : 8MB
I (50) boot: Enabling RNG early entropy source...
I (55) boot: Partition Table:
I (59) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (73) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (81) boot:  2 factory          factory app      00 00 00010000 00100000
I (88) boot: End of partition table
I (92) esp_image: segment 0: paddr=00010020 vaddr=3c020020 size=0aa50h ( 43600) map
I (109) esp_image: segment 1: paddr=0001aa78 vaddr=3fc93500 size=02b08h ( 11016) load
I (112) esp_image: segment 2: paddr=0001d588 vaddr=40374000 size=02a90h ( 10896) load
I (120) esp_image: segment 3: paddr=00020020 vaddr=42000020 size=18098h ( 98456) map
I (144) esp_image: segment 4: paddr=000380c0 vaddr=40376a90 size=0ca38h ( 51768) load
I (156) esp_image: segment 5: paddr=00044b00 vaddr=600fe000 size=00034h (    52) load
I (162) boot: Loaded app from partition at offset 0x10000
I (163) boot: Disabling RNG early entropy source...
I (176) cpu_start: Multicore app
I (176) octal_psram: vendor id    : 0x0d (AP)
I (176) octal_psram: dev id       : 0x02 (generation 3)
I (179) octal_psram: density      : 0x03 (64 Mbit)
I (185) octal_psram: good-die     : 0x01 (Pass)
I (190) octal_psram: Latency      : 0x01 (Fixed)
I (195) octal_psram: VCC          : 0x01 (3V)
I (200) octal_psram: SRF          : 0x01 (Fast Refresh)
I (206) octal_psram: BurstType    : 0x01 (Hybrid Wrap)
I (212) octal_psram: BurstLen     : 0x01 (32 Byte)
I (218) octal_psram: Readlatency  : 0x02 (10 cycles@Fixed)
I (224) octal_psram: DriveStrength: 0x00 (1/1)
I (229) esp_psram: Found 8MB PSRAM device
I (234) esp_psram: Speed: 40MHz
I (238) cpu_start: Pro cpu up.
I (241) cpu_start: Starting app cpu, entry point is 0x403754d4
0x403754d4: call_start_cpu1 at C:/Users/n.guertler/Projects/ESP-IDF/esp-idf-v5.0.2/components/esp_system/port/cpu_start.c:154

I (0) cpu_start: App cpu up.
I (979) esp_psram: SPI SRAM memory test OK
I (988) cpu_start: Pro cpu start user code
I (988) cpu_start: cpu freq: 240000000 Hz
I (988) cpu_start: Application information:
I (991) cpu_start: Project name:     esp32s3-spiram-corrupt
I (997) cpu_start: App version:      0.1
I (1002) cpu_start: Compile time:     May 31 2023 14:54:23
I (1008) cpu_start: ELF file SHA256:  b619a63b95568f30...
I (1014) cpu_start: ESP-IDF:          v5.2-dev-823-g903af13e84
I (1020) cpu_start: Min chip rev:     v0.0
I (1025) cpu_start: Max chip rev:     v0.99
I (1030) cpu_start: Chip rev:         v0.1
I (1035) heap_init: Initializing. RAM available for dynamic allocation:
I (1042) heap_init: At 3FC96960 len 00052DB0 (331 KiB): DRAM
I (1049) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (1055) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (1062) heap_init: At 600FE034 len 00001FCC (7 KiB): RTCRAM
I (1068) esp_psram: Adding pool of 8192K of PSRAM memory to heap allocator
I (1076) spi_flash: detected chip: generic
I (1080) spi_flash: flash io: dio
I (1084) sleep: Configure to isolate all GPIO pins in sleep state
I (1091) sleep: Enable automatic switching of GPIO sleep configuration
I (1110) app_start: Starting scheduler on CPU0
I (1110) app_start: Starting scheduler on CPU1
I (1110) main_task: Started on CPU0
I (1120) main_task: Calling app_main()
I (1120) pm: Frequency switching config: CPU_MAX: 240, APB_MAX: 80, APB_MIN: 40, Light sleep: ENABLED
I (1130) sleep: Code start at 0x42000020, total 98455, data start at 0x3c000000, total 33554432 Bytes
0x42000020: _stext at ??:?

I (3440) main: Sleeping for 600 s
I (603440) main: Checking
E (603440) main: Memory check failed at addr=0x3c03092f i=     47, got=6f, expect=2f, xor=40, mismatches=0, label=loop
E (603440) main: Memory check failed at addr=0x3c030938 i=     56, got=78, expect=38, xor=40, mismatches=0, label=loop
E (603450) main: Memory check failed at addr=0x3c030986 i=    134, got=87, expect=86, xor=01, mismatches=0, label=loop
E (603470) main: Memory check failed at addr=0x3c0309aa i=    170, got=ab, expect=aa, xor=01, mismatches=0, label=loop
E (603480) main: Memory check failed at addr=0x3c030a0e i=    270, got=0c, expect=0e, xor=02, mismatches=0, label=loop
E (603490) main: Memory check failed at addr=0x3c030a23 i=    291, got=63, expect=23, xor=40, mismatches=0, label=loop
E (603500) main: Memory check failed at addr=0x3c030a3a i=    314, got=32, expect=3a, xor=08, mismatches=0, label=loop
E (603510) main: Memory check failed at addr=0x3c030ae8 i=    488, got=ec, expect=e8, xor=04, mismatches=0, label=loop
E (603520) main: Memory check failed at addr=0x3c030b1a i=    538, got=18, expect=1a, xor=02, mismatches=0, label=loop
E (603530) main: Memory check failed at addr=0x3c030b5d i=    605, got=55, expect=5d, xor=08, mismatches=0, label=loop
E (603550) main: Memory check failed at addr=0x3c030c37 i=    823, got=35, expect=37, xor=02, mismatches=0, label=loop
E (603560) main: Memory check failed at addr=0x3c030ccb i=    971, got=4b, expect=cb, xor=80, mismatches=0, label=loop
E (603570) main: Memory check failed at addr=0x3c030df6 i=   1270, got=f4, expect=f6, xor=02, mismatches=0, label=loop
E (603580) main: Memory check failed at addr=0x3c030e6d i=   1389, got=65, expect=6d, xor=08, mismatches=0, label=loop
E (603590) main: Memory check failed at addr=0x3c030e86 i=   1414, got=c6, expect=86, xor=40, mismatches=0, label=loop
E (603600) main: Memory check failed at addr=0x3c030e8b i=   1419, got=9b, expect=8b, xor=10, mismatches=0, label=loop
E (603610) main: Memory check failed at addr=0x3c030e9b i=   1435, got=99, expect=9b, xor=02, mismatches=0, label=loop
E (603620) main: Memory check failed at addr=0x3c030ecb i=   1483, got=c9, expect=cb, xor=02, mismatches=0, label=loop
E (603640) main: Memory check failed at addr=0x3c030ee3 i=   1507, got=63, expect=e3, xor=80, mismatches=0, label=loop
E (603650) main: Memory check failed at addr=0x3c030f97 i=   1687, got=17, expect=97, xor=80, mismatches=0, label=loop
E (603660) main: Memory check failed at addr=0x3c030fa3 i=   1699, got=e3, expect=a3, xor=40, mismatches=0, label=loop
E (603670) main: Memory check failed at addr=0x3c030fd8 i=   1752, got=dc, expect=d8, xor=04, mismatches=0, label=loop
E (603680) main: Memory check failed at addr=0x3c030fe8 i=   1768, got=f8, expect=e8, xor=10, mismatches=0, label=loop
E (603690) main: Memory check failed at addr=0x3c03102e i=   1838, got=2c, expect=2e, xor=02, mismatches=0, label=loop
E (603700) main: Memory check failed at addr=0x3c03103d i=   1853, got=35, expect=3d, xor=08, mismatches=0, label=loop
E (603720) main: Memory check failed at addr=0x3c031120 i=   2080, got=21, expect=20, xor=01, mismatches=0, label=loop
E (603730) main: Memory check failed at addr=0x3c03113e i=   2110, got=36, expect=3e, xor=08, mismatches=0, label=loop
E (603740) main: Memory check failed at addr=0x3c031178 i=   2168, got=58, expect=78, xor=20, mismatches=0, label=loop
E (603750) main: Memory check failed at addr=0x3c031194 i=   2196, got=95, expect=94, xor=01, mismatches=0, label=loop
E (603760) main: Memory check failed at addr=0x3c0311aa i=   2218, got=2a, expect=aa, xor=80, mismatches=0, label=loop
E (603770) main: Memory check failed at addr=0x3c0311e2 i=   2274, got=e0, expect=e2, xor=02, mismatches=0, label=loop
E (603780) main: Memory check failed at addr=0x3c031209 i=   2313, got=0d, expect=09, xor=04, mismatches=0, label=loop
E (603800) main: Memory check failed at addr=0x3c03120e i=   2318, got=0f, expect=0e, xor=01, mismatches=0, label=loop
E (603810) main: Memory check failed at addr=0x3c031222 i=   2338, got=02, expect=22, xor=20, mismatches=0, label=loop
E (603820) main: Memory check failed at addr=0x3c03122d i=   2349, got=4d, expect=2d, xor=60, mismatches=0, label=loop
E (603830) main: Memory check failed at addr=0x3c031256 i=   2390, got=57, expect=56, xor=01, mismatches=0, label=loop
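The xor= column above shows which bits differ between the byte read back and the byte written. Most entries have a single bit set, but the entry at i=2349 (xor=60) has two. A small helper such as the one below (flipped_bits is a hypothetical name, not part of the test case) can count the flipped bits per byte when evaluating such a log:

```c
#include <stdint.h>

/* Number of bits that differ between the byte read back (got)
 * and the byte that was written (expect). */
unsigned flipped_bits(uint8_t got, uint8_t expect)
{
    unsigned diff = (unsigned)(got ^ expect);
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1; /* clear the lowest set bit */
        ++count;
    }
    return count;
}
```

For the first log line (got=6f, expect=2f) this returns 1, and for the line at i=2349 (got=4d, expect=2d) it returns 2.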

More Information.

It took me a long time to narrow this issue down. I experienced random crashes in a complex application after waking up from sleep mode, which turned out to be caused by the memory corruption. I had accidentally disabled ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND and ESP_SLEEP_FLASH_LEAKAGE_WORKAROUND. I didn't realize that those are necessary to use SPIRAM together with sleep mode. The documentation for those settings states that they can be used to reduce power consumption in sleep mode, but not that they are needed to keep SPIRAM data valid.

I am not sure whether this is normal/intended behavior. Should SPIRAM work normally even without ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND and ESP_SLEEP_FLASH_LEAKAGE_WORKAROUND, just with higher power consumption?

Perhaps this is just a documentation issue? Should the documentation for those config settings state that they are necessary for using sleep mode? Maybe the doc for SPIRAM and sleep modes should have a warning that those settings are needed?

Would it make sense to run a memory check after waking up from sleep mode to verify that the SPIRAM contents are still valid (keeping a checksum in IRAM)?
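To illustrate the idea, here is a minimal sketch of such a check, assuming a CRC-32 is computed over the SPIRAM region before sleep and compared after wake-up. All names (crc32_bitwise, region_guard_t, region_guard_arm, region_guard_intact) are hypothetical and not ESP-IDF APIs. On the target, the region_guard_t object itself would have to live in internal RAM so that it is not affected by the corruption.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib). */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xedb88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Remembers a region and the checksum it had when the guard was armed. */
typedef struct {
    const uint8_t *data;
    size_t len;
    uint32_t crc;
} region_guard_t;

/* Call just before entering sleep. */
void region_guard_arm(region_guard_t *g, const uint8_t *data, size_t len)
{
    g->data = data;
    g->len = len;
    g->crc = crc32_bitwise(data, len);
}

/* Call after wake-up; returns 1 if the region still has the same
 * checksum, 0 if it changed. */
int region_guard_intact(const region_guard_t *g)
{
    return crc32_bitwise(g->data, g->len) == g->crc;
}
```

Computing a checksum over a large region costs CPU time on every sleep cycle, so this is more of a diagnostic aid than something to leave enabled permanently.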

@Erlkoenig90 Erlkoenig90 added the Type: Bug bugs in IDF label May 31, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label May 31, 2023
@github-actions github-actions bot changed the title SPIRAM corruption in light sleep mode SPIRAM corruption in light sleep mode (IDFGH-10297) May 31, 2023
@huming2207
Contributor

I think we had a similar issue before, and I think enabling ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND is the proper fix unless you have an external PSRAM with an external pull-up. Otherwise, two things will happen:

  1. The SPI flash shares the same SPI master with the PSRAM, and the microcontroller needs to talk to the flash as well. If the CS/CE# pin of the PSRAM is floating while the ESP32 is preparing to go to sleep or to wake up, garbage data may be written into the PSRAM, which causes corruption;
  2. Regarding power consumption, at least for the ESP-PSRAM64H and APS12808L-3OBM, both datasheets state: "When CE#=1, the chip is in standby state". Thus, if you disable ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND and the CS/CE# pin does not have an external pull-up, it will probably waste more power, whereas enabling it avoids that. Although pulling up the CE# pin during sleep may also cause minor leakage, that should be far less than leaving the PSRAM awake, probably a few uA versus a few mA.

See:

@Erlkoenig90
Contributor Author

Hi, thanks for the information. Then I will keep ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND enabled.

Which PSRAM chip is included in the ESP32-S3R8 chip, and does it have pull-up resistors? We use the ESP32-S3-WROOM-1-N16R8 module on a custom PCB, and the module datasheet does not show any pull-up on SPICS0/SPICS1, but perhaps the ESP32-S3R8 contains one internally?

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Jun 19, 2023
@esp-wzh
Collaborator

esp-wzh commented Jun 19, 2023

@huming2207 Thanks for your explanation, it's absolutely correct.

@Erlkoenig90, the CS pin is not pulled up inside the chip package, and it is not routed out to a module pin, so if you are using modules, you must enable this option (maybe we should emphasize this in the help doc). Compared with adding a pull-up resistor outside the chip, this software solution only increases the current by a few uA, which is acceptable for light sleep.
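Until the help text is updated, a project can catch an accidentally disabled option at build time. The following is only a sketch under some assumptions: that disabled boolean Kconfig options are left undefined in the generated sdkconfig.h, that automatic light sleep is what the project uses (hence the check on FREERTOS_USE_TICKLESS_IDLE), and that PSRAM_SLEEP_CONFIG_CHECKED is a made-up marker macro. It checks only the PSRAM option discussed here.

```c
/* sdkconfig.h is generated by the ESP-IDF build; skip the include when
 * this file is compiled outside of an ESP-IDF project. */
#if defined(__has_include)
#  if __has_include("sdkconfig.h")
#    include "sdkconfig.h"
#  endif
#endif

/* Refuse to build when SPIRAM and automatic light sleep are enabled
 * but the CS pull-up workaround is not. */
#if defined(CONFIG_SPIRAM) && defined(CONFIG_FREERTOS_USE_TICKLESS_IDLE) \
    && !defined(CONFIG_ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND)
#error "SPIRAM with light sleep requires ESP_SLEEP_PSRAM_LEAKAGE_WORKAROUND"
#endif

/* Hypothetical marker showing that the guard above was evaluated. */
#define PSRAM_SLEEP_CONFIG_CHECKED 1
```

Placing this in any source file of the application is enough, since the check is evaluated entirely by the preprocessor.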

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Jun 19, 2023
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Reviewing Issue is being reviewed labels Jun 26, 2023

4 participants