Adding a device to the SPI master (i.e. spi_bus_add_device()) fails on interrupt allocation (IDFGH-4917) #6709

Closed
eflukx opened this issue Mar 13, 2021 · 8 comments
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@eflukx

eflukx commented Mar 13, 2021

I'm using LEDC, Ethernet (EMAC with LAN8710 PHY) and the GPIO interrupt service in conjunction with SPI master. If I run my application with LEDC and/or Ethernet enabled and the GPIO interrupt service installed with default flags (gpio_install_isr_service(0)), initialization of my SPI devices fails.

spi_bus_add_device(...) returns an ESP_ERR_NOT_FOUND error code. The IDF manual claims this return code means that there's no free CS slot available; digging down further, however, shows that this is not the case for me. The error code bubbles up from a call to esp_intr_alloc in spi_master_init_driver. Apparently no free interrupt is available (?).

I see some weird behavior: things start working when I shuffle the initialization around, e.g. when starting the GPIO ISR service after setting up the SPI device, or when disabling initialization of either LEDC or the Ethernet (EMAC) module.

Tried with ESP-IDF v4.2 and v4.3, error is the same.
The initialization functions are all called from main, not running in a separate rtos task.

What is going on here, what could be causing this? How to handle?

@espressif-bot espressif-bot added the Status: Opened Issue is new label Mar 13, 2021
@github-actions github-actions bot changed the title Adding a device to the SPI master (i.e. spi_bus_add_device()) fails on interrupt allocation Adding a device to the SPI master (i.e. spi_bus_add_device()) fails on interrupt allocation (IDFGH-4917) Mar 13, 2021
@negativekelvin
Contributor

If possible move some of your interrupts to the other core

@KaeLL
Contributor

KaeLL commented Mar 15, 2021

If possible move some of your interrupts to the other core

Why? And how to know if that should be done other than doing it and falling for implying causation out of the correlation because it started working? Asking for learning purposes

@igrr
Member

igrr commented Mar 15, 2021

@eflukx If you are running out of available interrupt numbers, there are two possible solutions:

  1. Use shared interrupt for some of the peripherals you are initializing. Most peripheral drivers have an "interrupt flags" argument which you can pass when initializing the driver. For example, this is the intr_flags field in spi_bus_config_t, intr_alloc_flags argument in gpio_install_isr_service, intr_alloc_flags in ledc_fade_func_install or ledc_isr_register, etc. One of the flags you can pass there is ESP_INTR_FLAG_SHARED, see https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/intr_alloc.html#multiple-handlers-sharing-a-source. When you pass this flag when registering an interrupt for two or more drivers, they will share one CPU interrupt input, making more interrupt inputs available.
  2. As @negativekelvin has suggested, move initialization of some of the drivers to another CPU, if you are using a dual-core chip. Since each CPU has its own interrupt inputs, you can register twice as many drivers.
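
The two options above can be pictured with a small toy model in plain C. This is only a sketch, not ESP-IDF code: the pool size, `toy_attach`, `toy_count_failures` and `TOY_FLAG_SHARED` are made up for illustration, and the reuse rule for shared inputs is a simplifying assumption rather than the real allocator's policy.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS_PER_CORE 2   /* made-up pool size, far smaller than a real chip */
#define TOY_CORES          2
#define TOY_FLAG_SHARED    0x1 /* hypothetical stand-in for ESP_INTR_FLAG_SHARED */

typedef struct {
    int  handlers;  /* number of handlers attached to this CPU interrupt input */
    bool shared;    /* true if the input was claimed with the shared flag */
} toy_input_t;

typedef struct {
    toy_input_t core[TOY_CORES][TOY_SLOTS_PER_CORE];
} toy_chip_t;

/* Try to attach one handler on the given core; returns true on success. */
static bool toy_attach(toy_chip_t *chip, int core, int flags)
{
    toy_input_t *in = chip->core[core];

    if (flags & TOY_FLAG_SHARED) {
        /* Assumption: reuse any input on this core that is already shared. */
        for (size_t i = 0; i < TOY_SLOTS_PER_CORE; i++) {
            if (in[i].shared && in[i].handlers > 0) {
                in[i].handlers++;
                return true;
            }
        }
    }

    /* Otherwise claim a free input on this core. */
    for (size_t i = 0; i < TOY_SLOTS_PER_CORE; i++) {
        if (in[i].handlers == 0) {
            in[i].handlers = 1;
            in[i].shared = (flags & TOY_FLAG_SHARED) != 0;
            return true;
        }
    }

    return false;   /* the toy analogue of ESP_ERR_NOT_FOUND */
}

/* Attach n_drivers handlers, driver i on core core_of[i], all with the
 * same flags. Returns how many attachments failed. */
static int toy_count_failures(const int *core_of, size_t n_drivers, int flags)
{
    toy_chip_t chip = {0};
    int failures = 0;

    for (size_t i = 0; i < n_drivers; i++) {
        if (!toy_attach(&chip, core_of[i], flags)) {
            failures++;
        }
    }
    return failures;
}
```

In this model, four drivers that all initialize on core 0 without the shared flag leave two of them without an input. Spreading them two per core, or passing the shared flag on core 0, lets all four attach.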

That said, there are 12 external interrupts available per core on interrupt levels 1-3. ESP-IDF will use 3 of them by default: one for cross-core communication, one for the esp_timer, and one for the Task Watchdog. Each peripheral instance you initialize (HSPI, VSPI, UART1, UART2, etc.) will also require one interrupt. Based on this, if you think that the application should be using 9 interrupts or fewer, then the issue might not be related to running out of interrupt numbers. One thing to check would be whether you are zero-initializing the spi_bus_config_t structure before filling it and passing it to the SPI driver. For further debugging it would be helpful to build the project with Debug logging enabled (in menuconfig, component config, log output) and then run the program, filtering the logs by the intr_alloc tag. You can post the logs here for further analysis.

And how to know if that should be done

@KaeLL very good point, I think the drivers should be logging an error message when the interrupt allocation fails. The doxygen comments of the drivers should also be updated to indicate ESP_ERR_NOT_FOUND as being possibly caused by the lack of interrupt numbers.

@eflukx
Author

eflukx commented Mar 16, 2021

Thanks for your helpful feedback so far!

I use several peripherals in my application (iot gateway): wifi, ethernet, spi, i2c, gpio ints, ledc and probably something i'm forgetting right now (console UART?).

It did not occur to me that the number of available interrupts (per core) is such a scarce resource, especially because I'm using the "high level" functions provided by the IDF, which seem to mostly abstract away all these interrupt shenanigans.

Overall I'm very happy with the quality and vastness/breadth of the IDF, its documentation and available example code(!). However, the complexities around interrupts seem to be a somewhat underexposed topic, both in example code (interrupt limits are never reached) and in the documentation. The documentation mentions that there are 32 interrupts per CPU core; @igrr, you mention 12 available interrupts. Are the 20 remaining interrupts reserved for specific internal CPU functions?

@igrr I will play around with the interrupt allocation flags. Do you have any recommendations on how to do this? For instance, are there downsides to declaring interrupts as shared (higher latency?), or are there specific peripherals that are best left at their defaults?

What I do not understand is how I got my app working again (with all peripherals enabled) just by swapping around some initialization code. To me this does not really make sense, i.e. the total number of required interrupts should be the same regardless of the order of initialization, right? It makes me feel a little wary when stuff "magically" starts behaving correctly after shuffling some code around (the application seems to be running stably, though!).

@negativekelvin thanks for your feedback! What would be the approach to allocate the interrupts on a specific core? The only way I can think of is to create a core-pinned task that does the initialization (and thus int allocation) of the peripherals on a specific core. Would that assumption/approach be correct?

Is there a way to list the currently allocated irq's (and peripherals/handlers) per CPU core?

@igrr
Member

igrr commented Mar 16, 2021

how I got my app working again (with all peripherals enabled) just by swapping around some initialization code

I agree, this looks pretty odd. I see two possible reasons:

  1. Somewhere in the code, an uninitialized variable is passed in place of the interrupt allocation flags, like for example in an uninitialized spi_bus_config_t structure. If the structure is declared on the stack, and not explicitly zero-initialized, then the value in that variable might be "sane" (from the perspective of the interrupt allocator) or not, depending on changes to your initialization code flow. My recommendation for enabling debug logging and filtering intr_alloc logs should help in this case.
  2. One of the peripherals is initialized from a non-pinned FreeRTOS task, and in some cases the task gets picked up by the 2nd CPU, resulting in the interrupt being allocated there. This leaves more interrupts on the 1st CPU free. Again, inspecting intr_alloc debug logs should help in this case.

As you probably see by now, I recommend inspecting intr_alloc logs at debug level in the situation when the issue is reproduced. This will help figure out the precise cause of the issue. As you noted, changes to your initialization flow make the issue disappear, so "playing around" with interrupt allocation flags may fix the issue or just mask it. For all we know, this might be a bug somewhere in IDF.
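
As a minimal sketch of what "explicitly zero-initialized" means in C, the two helpers below build a config struct so that any field not set by the caller is guaranteed to be zero. `toy_bus_config_t` and both function names are hypothetical stand-ins, not ESP-IDF types; the real spi_bus_config_t has more fields.

```c
#include <string.h>

/* Hypothetical stand-in for a driver config struct such as spi_bus_config_t. */
typedef struct {
    int mosi_io_num;
    int miso_io_num;
    int sclk_io_num;
    int intr_flags;   /* easy to forget; stack garbage here would change
                         what the interrupt allocator is asked for */
} toy_bus_config_t;

/* Option 1: designated initializers. Members that are not named
 * (here intr_flags) are zero-initialized by the language. */
static toy_bus_config_t toy_config_designated(int mosi, int miso, int sclk)
{
    toy_bus_config_t cfg = {
        .mosi_io_num = mosi,
        .miso_io_num = miso,
        .sclk_io_num = sclk,
    };
    return cfg;
}

/* Option 2: clear the whole struct with memset, then fill in fields. */
static toy_bus_config_t toy_config_memset(int mosi, int miso, int sclk)
{
    toy_bus_config_t cfg;
    memset(&cfg, 0, sizeof(cfg));
    cfg.mosi_io_num = mosi;
    cfg.miso_io_num = miso;
    cfg.sclk_io_num = sclk;
    return cfg;
}
```

Either form leaves intr_flags at 0. A bare `toy_bus_config_t cfg;` on the stack followed by field assignments does not give that guarantee, which is the failure mode described in point 1.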

to create a core-pinned task that does the initialization (and thus int allocation) of the peripherals on a specific core.

Correct, you can use this approach to force the interrupt allocation to happen on the specific CPU.

You can also use a convenience function, esp_ipc_call_blocking to call a function on the given CPU in a high-priority task. Take care though, IPC task stack size is limited, but can be increased using CONFIG_ESP_IPC_TASK_STACK_SIZE Kconfig option.

are the 20 remaining interrupts reserved for specific internal cpu functions?

The remaining interrupts either have a fixed purpose (such as CPU internal timer, software, and profiling interrupts), or are high-level interrupts (level 4, 5, 7), or are reserved for the BT and Wi-Fi stacks. You can find the textual list of interrupts here (might be slightly out of date) and the list used by the interrupt allocator here.

Is there a way to list the currently allocated irq's (and peripherals/handlers) per CPU core?

I'm afraid not; however, collecting intr_alloc logs at debug level should provide the complete list of interrupts which got allocated.

@eflukx
Author

eflukx commented Mar 17, 2021

Ok.. did some more testing: ran the app with debug logging in the working (with SPI initialization early on) and failing conditions (gist).

Somewhere in the code, an uninitialized variable is passed in place of the interrupt allocation flags,

I'm aware that not zero-initializing config structs can lead to 'interesting' behavior :) As far as I can tell I'm doing that correctly, (see SPI-init routine below).

As suggested I have tried to pin the init routines to a specific core using the esp_ipc_call_blocking helper: this worked, the problem went away!

Still, the "why does stuff start to work when I swap around some code" question wasn't solved.
Then my eye caught that I was initializing the SPI bus with intr_flags set to ESP_INTR_FLAG_LEVEL1 🤔 (at that time probably thinking: "well... lowest prio is fine for me"). But what it does mean (I think) is that we request an interrupt that matches this prio level 1 exactly. Setting buscfg.intr_flags to a more lenient setting of 0 or ESP_INTR_FLAG_LOWMED allows me to initialize the SPI on CPU core 0 just fine.

Now my hypothesis is that (because of my strict ESP_INTR_FLAG_LEVEL1 requirement) initializing SPI early succeeds, as the system still has a level 1 prio slot free. After that, peripherals with less strict interrupt requirements can initialize fine, but not the other way around. If someone can confirm this hypothesis, I'd say: mystery solved (and I like solved -firmware- mysteries! 😄)

esp_err_t _sx12xx_spi_init(sx12xx_pin_map_t *pinmap)
{
    ESP_LOGD(TAG, "%s on core %d", __func__, xPortGetCoreID());

    spi_bus_config_t buscfg;
    spi_device_interface_config_t devcfg;

    memset(&buscfg, 0, sizeof(buscfg));
    //buscfg.intr_flags = ESP_INTR_FLAG_LEVEL1; // Lowest priority (see comments above)
    buscfg.mosi_io_num = pinmap->mosi;
    buscfg.miso_io_num = pinmap->miso;
    buscfg.sclk_io_num = pinmap->sck;
    buscfg.quadwp_io_num = -1;
    buscfg.quadhd_io_num = -1;
    ESP_ERROR_CHECK(spi_bus_initialize(HAL_SPI_HOST, &buscfg, 1)); // Init with DMA for transfers > 64 bytes

    memset(&devcfg, 0, sizeof(devcfg));
    devcfg.clock_speed_hz = 1000000;
    devcfg.mode = 0;
    devcfg.command_bits = 0;
    devcfg.address_bits = 8;
    devcfg.queue_size = 1;

    if (pinmap->cs_0 >= 0)
    {
        ESP_LOGD(TAG, "Setup SPI device for radio 0");
        devcfg.spics_io_num = pinmap->cs_0;
        ESP_ERROR_CHECK(spi_bus_add_device(HAL_SPI_HOST, &devcfg, &_spi_handles[0]));
    }

    if (pinmap->cs_1 >= 0)
    {
        ESP_LOGD(TAG, "Setup SPI device for radio 1");
        devcfg.spics_io_num = pinmap->cs_1; // radio 1 uses its own chip select
        ESP_ERROR_CHECK(spi_bus_add_device(HAL_SPI_HOST, &devcfg, &_spi_handles[1]));
    }

    return ESP_OK;
}
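
The ordering hypothesis above can be reproduced in a toy model. This is a sketch in plain C, not the ESP-IDF allocator: the pool of four inputs is made up, the `TOY_*` masks are only modelled on the ESP_INTR_FLAG_LEVELn idea, and the rule that a request gets the lowest free level its mask allows is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit n set means "level n is acceptable" (hypothetical masks). */
#define TOY_LEVEL1 (1u << 1)
#define TOY_LEVEL2 (1u << 2)
#define TOY_LEVEL3 (1u << 3)
#define TOY_LOWMED (TOY_LEVEL1 | TOY_LEVEL2 | TOY_LEVEL3)

typedef struct {
    int  level;   /* priority level of this CPU interrupt input */
    bool used;
} toy_slot_t;

/* Claim the free slot with the lowest level allowed by mask.
 * Returns the slot index, or -1 if nothing matches. */
static int toy_alloc_level(toy_slot_t *slots, size_t n, unsigned mask)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (slots[i].used || !(mask & (1u << slots[i].level))) {
            continue;
        }
        if (best < 0 || slots[i].level < slots[best].level) {
            best = (int)i;
        }
    }
    if (best >= 0) {
        slots[best].used = true;
    }
    return best;
}

/* Issue the requests in order against a fresh made-up pool with
 * two level-1 inputs, one level-2 input and one level-3 input.
 * Returns how many requests could not be satisfied. */
static int toy_failures_in_order(const unsigned *masks, size_t n_req)
{
    toy_slot_t slots[] = {
        {1, false}, {1, false}, {2, false}, {3, false},
    };
    const size_t n_slots = sizeof(slots) / sizeof(slots[0]);
    int failures = 0;

    for (size_t i = 0; i < n_req; i++) {
        if (toy_alloc_level(slots, n_slots, masks[i]) < 0) {
            failures++;
        }
    }
    return failures;
}
```

With the request order {TOY_LEVEL1, TOY_LOWMED, TOY_LOWMED} every request is served. With {TOY_LOWMED, TOY_LOWMED, TOY_LEVEL1} the two lenient requests take both level-1 inputs first, so the strict request fails even though level-2 and level-3 inputs are still free. Relaxing the last request to TOY_LOWMED makes that order succeed as well.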

@igrr
Member

igrr commented Mar 17, 2021

But what it does mean (I think) is that we request an interrupt that matches this prio level 1 exactly. Setting buscfg.intr_flags to a more lenient setting of 0 or ESP_INTR_FLAG_LOWMED allows me to initialize the SPI on CPU core 0 just fine.

Yes, that explains the issue you have observed! Good to know the mystery is solved!

@Alvin1Zhang
Collaborator

Thanks for reporting, and glad the issue is resolved. Feel free to reopen. Thanks.
