Memory allocation issues (IDFGH-2698) #4769

Closed
enricop opened this issue Feb 15, 2020 · 6 comments

@enricop

enricop commented Feb 15, 2020

Dear support,
we're facing some memory allocation issues with our ESP-WROVER based BLE Mesh IoT gateway.

We're currently trying to assess the limits of our system by increasing the number of mesh nodes and devices connected to our ESP32.
Everything is going pretty well so far, except for a few sporadic crashes, which we have traced to our chronic shortage of internal memory.

We've seen this pattern occur several times:

  1. With large installations (more than 40 devices), free internal DRAM tends to hover around 25 KiB and is probably highly fragmented. We've already moved everything humanly possible to external SPI RAM, including setting CONFIG_SPIRAM_USE_MALLOC=y and SPIRAM_MALLOC_ALWAYSINTERNAL=256, plus writing custom allocators for our C++ code that force the use of heap_caps_malloc with MALLOC_CAP_SPIRAM.
  2. Some parts of our code then try to operate on the NVS, calling nvs_flash_init_partition. This triggers the loading of pages from the NVS, which in turn invokes spi_flash_read;
  3. spi_flash_read invokes esp_flash_read, which returns ESP_ERR_NO_MEM (0x101);
  4. The application calls abort() after printing "unexpected SPI flash error code: 101" (aka ESP_ERR_NO_MEM).
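
The "probably highly fragmented" suspicion in step 1 can be checked by comparing total free internal memory with the largest single free block. The sketch below is only the arithmetic, written so it also builds on a host. `fragmentation_percent` and `can_allocate` are hypothetical helper names, not ESP-IDF functions; on the target, the inputs would typically come from `heap_caps_get_free_size()` and `heap_caps_get_largest_free_block()` called with `MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT`.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of ESP-IDF): percentage of the free heap
 * that cannot be obtained with one allocation. 0 means all free memory is
 * one contiguous block; values near 100 mean it is split into small pieces.
 *
 * On the target, the inputs would typically be:
 *   free_bytes    = heap_caps_get_free_size(MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT)
 *   largest_block = heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT)
 */
unsigned fragmentation_percent(size_t free_bytes, size_t largest_block)
{
    if (free_bytes == 0 || largest_block >= free_bytes) {
        return 0;
    }
    return (unsigned)(100 - (largest_block * 100) / free_bytes);
}

/* Returns 1 if a single allocation of `wanted` bytes could succeed. */
int can_allocate(size_t largest_block, size_t wanted)
{
    return wanted <= largest_block;
}
```

For example, with 25 KiB free and an assumed (illustrative) largest block of 4 KiB, `fragmentation_percent(25 * 1024, 4 * 1024)` gives 84, and `can_allocate(4 * 1024, 16 * 1024)` is 0: a 16 KiB request fails even though more than 16 KiB is free in total.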

We strongly suspect that this error originates from the following code:

esp_flash_api.c:

esp_err_t IRAM_ATTR esp_flash_read(esp_flash_t *chip, void *buffer, uint32_t address, uint32_t length)
{
   [..]
    //when the cache is disabled, only the DRAM can be read, check whether we need to receive in another buffer in DRAM.
    bool direct_read = chip->host->supports_direct_read(chip->host, buffer);

    uint8_t* temp_buffer = NULL;
    
    if (!direct_read) {
        uint32_t length_to_allocate = MAX(MAX_READ_CHUNK, length);
        length_to_allocate = (length_to_allocate+3)&(~3);
        temp_buffer = heap_caps_malloc(length_to_allocate, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
        ESP_LOGV(TAG, "allocate temp buffer: %p", temp_buffer);
        if (temp_buffer == NULL) return ESP_ERR_NO_MEM;
    }

    [..]

It seems that, because of the conflict between SPI RAM access and NVS reads, if the destination buffer is not in DRAM (and since we've set SPIRAM_MALLOC_ALWAYSINTERNAL=256, almost all large allocations end up in SPI RAM), the function attempts to allocate a temporary buffer of at least 16 KiB from internal memory. This allocation always fails because our roughly 25 KiB of free internal memory is heavily fragmented.
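
To make the "at least 16 KiB" point concrete, here is a small host-side sketch of the two sizing lines quoted above. `temp_buffer_size` is a hypothetical wrapper added for illustration, and the value of `MAX_READ_CHUNK` is assumed from the 16 KiB figure mentioned in this thread.

```c
#include <stdint.h>

/* Assumed value, taken from the 16 KiB figure mentioned in this thread. */
#define MAX_READ_CHUNK (16u * 1024u)
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical wrapper around the two sizing lines quoted from
 * esp_flash_read(): pick the larger of MAX_READ_CHUNK and the requested
 * length, then round up to a multiple of 4 bytes.
 */
uint32_t temp_buffer_size(uint32_t length)
{
    uint32_t length_to_allocate = MAX(MAX_READ_CHUNK, length);
    length_to_allocate = (length_to_allocate + 3) & (~3u);
    return length_to_allocate;
}
```

With these definitions, a 32-byte read and a 4096-byte read both request 16384 bytes of internal memory, and a 20001-byte read requests 20004 bytes, so the request is never smaller than 16 KiB.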

Do you have any tips about how to avoid this issue? We've already tried freeing up as much internal memory as possible. In particular, we would like to know if you can recommend some methods to prevent or at least mitigate memory fragmentation.

Thank you in advance for your support.

github-actions bot changed the title from "Memory allocation issues" to "Memory allocation issues (IDFGH-2698)" on Feb 15, 2020
@negativekelvin
Contributor

uint32_t length_to_allocate = MAX(MAX_READ_CHUNK, length);

uint32_t length_to_read = MIN(MAX_READ_CHUNK, length);

Should be MIN not MAX
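
As a rough illustration of why MIN is the right bound, the following host-side sketch caps the temporary buffer at `MAX_READ_CHUNK` and copies the data out chunk by chunk. It is not the actual ESP-IDF fix: `fake_flash`, `fake_flash_read`, `bounce_size_for` and `chunked_read` are hypothetical names, plain `malloc` stands in for the internal-RAM allocation, and the 16 KiB chunk size is assumed from the discussion above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value, taken from the 16 KiB figure mentioned in this thread. */
#define MAX_READ_CHUNK (16u * 1024u)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the flash chip, for host-side illustration. */
static uint8_t fake_flash[64 * 1024];

static void fake_flash_read(void *dst, uint32_t address, uint32_t length)
{
    memcpy(dst, &fake_flash[address], length);
}

/* Bounce buffer size: never more than MAX_READ_CHUNK, rounded up to 4. */
uint32_t bounce_size_for(uint32_t length)
{
    uint32_t n = MIN(MAX_READ_CHUNK, length);
    return (n + 3) & ~3u;
}

/* Returns 0 on success, -1 if the bounce buffer cannot be allocated. */
int chunked_read(void *buffer, uint32_t address, uint32_t length)
{
    if (length == 0) {
        return 0;
    }
    /* On the target this would be an internal-RAM allocation. */
    uint8_t *bounce = malloc(bounce_size_for(length));
    if (bounce == NULL) {
        return -1;
    }
    uint8_t *out = buffer;
    while (length > 0) {
        uint32_t n = MIN(MAX_READ_CHUNK, length);
        fake_flash_read(bounce, address, n); /* read into the small buffer */
        memcpy(out, bounce, n);              /* then copy to the destination */
        out += n;
        address += n;
        length -= n;
    }
    free(bounce);
    return 0;
}
```

Under these assumptions a 30-byte read needs only a 32-byte temporary buffer, and a 40000-byte read needs 16384 bytes and three loop iterations, instead of one allocation that grows with the request.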

@enricop
Author

enricop commented Feb 17, 2020

Hi @negativekelvin , thank you.

Could this be fixed in release/v4.0 branch?

@mahavirj
Member

@enricop We will fix this ASAP and backport for previous applicable release branches as well.

espressif-bot pushed a commit that referenced this issue Mar 12, 2020
…ash to PSRAM buffers

Previously would try allocate buffer of minimum size 16KB not maximum size 16KB, causing
out of memory errors for any large reads, or if less than 16KB contiguous free heap.

Also, if using legacy API and internal allocation failed then implementation would abort()
instead of returning the error to the caller.

Added test for using large buffers in PSRAM.

Closes #4769

Also reported on forum: https://esp32.com/viewtopic.php?f=13&t=14304&p=55972
espressif-bot pushed a commit that referenced this issue Mar 18, 2020
…ash to PSRAM buffers

@nacpem

nacpem commented Apr 29, 2020

+1 opening this issue

following all the threads which have led me here

W (198897) wifi:alloc eb len=24 type=3 fail, heap:0

W (198897) wifi:m f null

@projectgus
Contributor

@rebeliousconformist This doesn't seem to be the same as the issue reported above. The log message indicates that Wi-Fi is failing to allocate 24 bytes from internal memory because there is no free heap remaining ("heap:0" indicates zero bytes available).

If you think this is an issue in ESP-IDF, rather than the firmware simply running out of free memory, then please open a new issue with the extra details requested in the issue template.

If you suspect a memory leak somewhere, there are some tips for heap memory debugging here: https://docs.espressif.com/projects/esp-idf/en/release-v4.1/api-reference/system/heap_debug.html#

@nacpem

nacpem commented May 6, 2020

@projectgus Yes, I'll report them if I do find any bugs. The shortage of heap memory started after updating to the latest commit. In the past I have spent days debugging, only to find that the bug was fixed in the next commit. Hence my post, to make sure I'm not running blind.
Best regards.
