GPIO Interrupts Lost (IDFGH-8258) #9746

Open
pgreenland opened this issue Sep 7, 2022 · 14 comments
Labels
Status: Selected for Development Issue is selected for development Type: Bug bugs in IDF

@pgreenland

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.1-dev-644-g867745a05c

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

No response

Development Kit.

LOLIN32 v1.0.0

Power Supply used.

USB

What is the expected behavior?

I have two sensor ICs, generating edge triggered GPIO interrupts, waking their respective worker tasks.

After a random period, between minutes and hours, one of these interrupts is missed. In the case of our particular device the missed interrupt results in a FIFO overflow and no further communication from the device.

I've captured several of these events on a logic analyser, and found them to be related to the two interrupts arriving at virtually the same time (separated by microseconds).

I'd expect the ESP32 to be able to handle interrupts arriving close together.

What is the actual behavior?

Instead the earlier interrupt is handled and the slightly later interrupt is lost.

Screenshot 2022-09-07 at 16 46 39

In the screenshot above:
The blue trace, "Pres ISR", is one of the interrupt lines.
The purple trace "IMU ISR" is the other interrupt line.

Both rising edge triggered.

In this case the blue rising edge occurs 1.55 µs before the purple rising edge.

The green trace "IMU ISR -> Task" is that of a GPIO line which is set in the GPIO interrupt handler and cleared in the FreeRTOS task after the interrupt has been handled.

It remains low the entire time, indicating that the interrupt handler hasn't been called, which in turn hasn't woken the worker task.

Steps to reproduce.

I was able to reproduce this behaviour with a signal generator and the following minimal application.

main.c.zip

The signal generator provides 1 kHz square waves on GPIO 23 and 27 (selected because they are the lines used in our product), synchronised but delayed with respect to one another.

The interrupt handler for the lower-numbered GPIO (23) sets an output GPIO line high, and the handler for the higher-numbered GPIO (27) clears the same line. The expected behaviour is therefore short pulses on the output line, as follows:

image

Varying the delay between the first and second interrupt and zooming out, we see periods where the second interrupt begins to be missed (indicated by the GPIO line remaining high).

image

The "sweet spot" on this particular device appears to be 2.44 µs.

image

Marker 3, with a delay of 2.41 µs, is handled correctly, whereas marker 4, with a delay of 2.44 µs, is consistently missed. Increasing the delay further to 2.46 µs resolves the issue and both interrupts are once again seen.

The duty cycle of the first interrupt is longer to match my original hardware setup; the same behaviour occurs with similarly short duty cycles.

While debugging this issue I found changeset 0637ea9 by @songruo, which I'd hoped would resolve the issue. Unfortunately, it hasn't.

I'm happy to help debug the issue and try any fixes or workarounds. I just really need to get it fixed.

Debug Logs.

No response

More Information.

No response

@pgreenland pgreenland added the Type: Bug bugs in IDF label Sep 7, 2022
@espressif-bot espressif-bot added the Status: Opened Issue is new label Sep 7, 2022
@github-actions github-actions bot changed the title GPIO Interrupts Lost GPIO Interrupts Lost (IDFGH-8258) Sep 7, 2022
@negativekelvin
Contributor

Check errata 3.14 in ECO documents https://espressif.com/en/support/documents/technical-documents

@songruo
Collaborator

songruo commented Sep 13, 2022

Hi @pgreenland,

Very nice troubleshooting! Based on your results, I do think this is due to the ESP32 hardware bug described in section 3.14 of the document "ECO and Workarounds for Bugs in ESP32", as mentioned by @negativekelvin.

Let me give you some more information about this bug: on the clock cycle in which GPIO_STATUS_W1TS_REG or GPIO_STATUS_W1TC_REG is being written, the GPIO status of the whole 32-bit register cannot be updated. Therefore, an edge-triggered interrupt will be lost if the GPIO interrupt is sampled at the same time as an interrupt status clear operation. And this is what I believe happened at the "sweet spot" moment!

Please check whether you can work around this problem by changing the edge-triggered GPIO interrupt to level-triggered.
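
The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the real silicon: it assumes a single 32-bit status register that ignores all new events on any cycle in which a write-1-to-clear happens, and the names toy_gpio_t and toy_gpio_clock are hypothetical. It shows why a rising edge that coincides with the clear disappears for good, while a level-triggered pin is simply latched on the next cycle because the level is still present.

```c
#include <stdint.h>

/* Toy model only: latched status bits plus the pin levels seen last cycle. */
typedef struct {
    uint32_t status;      /* latched interrupt status bits */
    uint32_t prev_level;  /* pin levels sampled on the previous cycle */
} toy_gpio_t;

/* Advance the model by one clock cycle.
 *   levels    - current pin levels
 *   edge_mask - pins configured as rising-edge triggered
 *   high_mask - pins configured as high-level triggered
 *   w1tc      - value written to the clear register this cycle (0 = no write)
 */
static void toy_gpio_clock(toy_gpio_t *g, uint32_t levels, uint32_t edge_mask,
                           uint32_t high_mask, uint32_t w1tc)
{
    uint32_t rising = levels & ~g->prev_level & edge_mask;
    uint32_t level_hits = levels & high_mask;

    if (w1tc != 0) {
        /* Assumed bug behaviour: the clear is applied and every event
         * arriving on this cycle is dropped for the whole register. */
        g->status &= ~w1tc;
    } else {
        g->status |= rising | level_hits;
    }
    g->prev_level = levels;
}
```

In this model an edge that lands on the clear cycle never reappears, because the next cycle no longer sees a low-to-high transition; a high-level trigger on the same pin is picked up one cycle later.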

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Sep 13, 2022
@pgreenland
Author

Hi @songruo and @negativekelvin ,

Thanks for the reply and explanation.

I believe the hardware bug indicated is what I've been experiencing.

I implemented the workaround as described and can no longer reproduce the issue with the test described above.

Considering the nature of this issue, it may be worth either fixing it in the SDK, so that all edge-triggered interrupts are converted to level-triggered as per the workaround, or adding a warning to the documentation noting that edge-triggered interrupts do not work reliably. I had a quick look and couldn't see anything to suggest there is a problem with them.

I've attached my workaround in case other users find it useful.

gpio_isr_workaround.h.txt

Which can be used as follows

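/* Workaround state for this pin, shared between setup and the handler */
static stGPIO_ISR_WORKAROUND_State_t eISRState;

/* Defined before SetupInterrupt() so it is declared when it is registered */
static void IRAM_ATTR ExampleInterruptHandler(void* pvArg)
{
	if (!GPIO_ISR_WORKAROUND_Process(&eISRState))
	{
		/* Workaround says interrupt hasn't really triggered */
		return;
	}

	/* Handle interrupt as required */
}

void SetupInterrupt(void)
{
	/* Register the handler, then let the workaround configure and enable the pin */
	gpio_isr_handler_add(INTERRUPT_PIN_NUMBER, ExampleInterruptHandler, NULL);
	GPIO_ISR_WORKAROUND_InitAndEnableISR(&eISRState, INTERRUPT_PIN_NUMBER, false);
}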

Regards,

Phil

@kriegste

With this workaround I get a
Guru Meditation Error: Core 0 panic'ed (Cache disabled but cached memory region accessed).
in gpio_set_intr_type. Will try to investigate further.

@KaeLL
Contributor

KaeLL commented Sep 13, 2022

Put the functions called from the ISR and eISRState in D/IRAM or don't pass the ESP_INTR_FLAG_IRAM flag to interrupt setup functions.
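
A minimal sketch of that placement, assuming the IRAM_ATTR and DRAM_ATTR macros from ESP-IDF's esp_attr.h. The state struct and function names are hypothetical, and the empty fallback defines are there only so the snippet also compiles off-target. Any driver function the handler calls has to be IRAM-resident too; marking the handler alone is not enough.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include("esp_attr.h")
#    include "esp_attr.h"  /* ESP-IDF: provides IRAM_ATTR and DRAM_ATTR */
#  endif
#endif
#ifndef IRAM_ATTR
#  define IRAM_ATTR        /* non-ESP build: no special placement */
#endif
#ifndef DRAM_ATTR
#  define DRAM_ATTR
#endif

/* Hypothetical per-pin state touched from the ISR. */
typedef struct {
    volatile bool expect_high;   /* true while waiting for the line to return high */
    volatile uint32_t events;    /* number of real (active-low) events seen */
} example_isr_state_t;

/* Explicitly kept in internal data RAM so it stays reachable with the cache off. */
static DRAM_ATTR example_isr_state_t s_example_state;

/* Helper called from the ISR: it must be in IRAM as well as the ISR itself.
 * Returns true when the interrupt is the real event, false for the return
 * to the idle level. */
static bool IRAM_ATTR example_isr_helper(example_isr_state_t *st)
{
    bool real_event = !st->expect_high;
    st->expect_high = !st->expect_high;
    return real_event;
}

static void IRAM_ATTR example_isr(void *arg)
{
    example_isr_state_t *st = (example_isr_state_t *)arg;
    if (example_isr_helper(st)) {
        st->events++;
    }
}
```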

@pgreenland
Author

@KaeLL Could you correct my example above?

I've not seen any Guru Meditation errors myself, but any pointers on getting the interrupt handling code into the right memory would be appreciated.

Thanks, Phil

@kriegste

Espressif needs to make gpio_set_intr_type IRAM_ATTR.

@kriegste

Or use
GPIO.pin[NUMBER].int_type = GPIO_INTR_HIGH_LEVEL;
and
GPIO.pin[NUMBER].int_type = GPIO_INTR_LOW_LEVEL;
in the ISR.

@KaeLL
Contributor

KaeLL commented Sep 26, 2022

@songruo

  1. Why isn't the workaround suggested by the errata already in IDF?
  2. What exactly are the risks of making changes to the GPIO struct directly instead of through the public APIs/gpio_spinlock?

@KaeLL
Contributor

KaeLL commented Sep 28, 2022

Also, either the wording for workaround 1 in the errata is ambiguous, or I'm completely oblivious as to what it means.

  1. Set the GPIO interrupt type to low/high
  2. Set the interrupt trigger type of the CPU to edge.

Aren't interrupt type and trigger type the same thing, in the context of the GPIO public API or otherwise? If so, is that really suggesting to call gpio_set_intr_type twice, one after the other, the first setting the interrupt type to level and the second to edge? Seems redundant, which is why I'm wary of my interpretation.
And if they're not the same, how exactly do 'interrupt type' and 'interrupt trigger type' differ?

@songruo
Collaborator

songruo commented Sep 29, 2022

Hi @KaeLL,

The workaround suggested by the errata is more of an application-level workaround. However, I think there might be something we can add to the GPIO ISR to work around the hardware bug, by comparing GPIO.in.val before and after the interrupt status clear operation. Concurrency needs to be handled carefully, so we will investigate this.
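
One way to picture the before/after comparison is the pure function below. It is only an illustration of the idea and not what IDF implements: in_before and in_after stand for two reads of GPIO.in.val taken around the status clear, status_after for the interrupt status read back afterwards, and missed_rising_edges is a hypothetical name. It does nothing about the concurrency concerns, and a pulse that rises and falls entirely between the two reads stays invisible to it.

```c
#include <stdint.h>

/* Returns the rising-edge pins that went low -> high across the status clear
 * but have no status bit set afterwards, i.e. candidates for a lost edge.
 *   in_before    - pin levels read just before the clear
 *   in_after     - pin levels read just after the clear
 *   status_after - interrupt status bits read after the clear
 *   rising_mask  - pins configured for rising-edge interrupts
 */
static uint32_t missed_rising_edges(uint32_t in_before, uint32_t in_after,
                                    uint32_t status_after, uint32_t rising_mask)
{
    uint32_t rose = ~in_before & in_after & rising_mask;
    return rose & ~status_after;
}
```

A non-zero result would tell the ISR which handlers to run even though the hardware never latched the corresponding status bit.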

The only difference between writing to the GPIO struct directly and calling the public GPIO APIs is the spinlock protection, which the GPIO driver adds around the register write. The reason we didn't offer an option to place gpio_set_intr_type in IRAM is that we don't recommend calling it from ISR context. However, it now looks like a "must" operation for working around the ESP32 lost-interrupt bug in the application layer. We will either provide a built-in workaround in IDF or consider making gpio_set_intr_type IRAM-safe.

Lastly, the peripheral interrupt sources are mapped to the CPU interrupts through the so-called Interrupt Matrix (please check the Interrupt Matrix chapter of the TRM for more information). Both the GPIO interrupt source and the CPU interrupt have their own trigger type setting.

@tommy1394

I have a system with 10 independent edge-triggered interrupt inputs, and I have been chasing this issue for weeks. None of the other "fixes" I have found have corrected this problem. I am hopeful that this approach may. I am not sure how to go about implementing it, though. Does anyone have some more specific details?

@tommy1394

Hi @pgreenland, could this method be used where you want an edge interrupt on any pin state change? I need to detect both the rising and the falling edge, as I am calculating the duty cycle of a signal. Also, how hard would this be to integrate into the Arduino ESP32 2.0.5 core library?

@pgreenland
Author

Hi @tommy1394,

I believe it should. In terms of integrating with the Arduino ESP32 core, I suspect (I haven't checked) that you'd be able to do everything with the API on offer.

The issue as I understand it is a race condition between the processor resetting the interrupt status, and a new edge triggered interrupt arriving, causing an edge triggered interrupt to be lost.

The workaround is to use level triggered interrupts to emulate an edge triggered interrupt.

In the Arduino environment you'd reconfigure your edge triggered interrupts as level triggered, looking for either a high or low level. On the next interrupt you swap the config to look for the opposite level.

In my case my hardware uses active low interrupts, so the interrupt line idles high. I configure my GPIO pin to interrupt on low level. When the interrupt is triggered, I've seen my falling edge. In that interrupt handler I perform the required action (schedule a task to handle the interrupt). Then reconfigure the interrupt to trigger on a high level. When the interrupt line is de-asserted by the interrupting device and returns to a high level, I get another interrupt which I ignore using a flag and again reconfigure for a low level.

It's slightly annoying in my case, as I'm handling two interrupts for every real one. It sounds like you're interested in both edges, though, so this wouldn't be a problem for you.
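
The level-toggling scheme described above can be sketched as a small state machine. This is a hardware-independent sketch under assumptions, not the contents of the attached gpio_isr_workaround.h: edge_emulator_t, edge_emulator_init, edge_emulator_on_interrupt and the set_trigger hook are hypothetical names, and record_trigger is a host-side stand-in for whatever actually reprograms the pin (for example the direct GPIO.pin[NUMBER].int_type write suggested earlier in this thread). The hook has to take effect before the handler returns, because a level-triggered interrupt keeps firing for as long as the armed level is present.

```c
#include <stdbool.h>

typedef enum { TRIGGER_LOW_LEVEL, TRIGGER_HIGH_LEVEL } trigger_level_t;
typedef enum { EDGE_FALLING, EDGE_RISING } emulated_edge_t;

/* Hypothetical hook: reprograms the pin's level trigger. */
typedef void (*set_trigger_fn)(int pin, trigger_level_t level);

typedef struct {
    int pin;
    trigger_level_t armed;       /* level the pin is currently armed for */
    set_trigger_fn set_trigger;
} edge_emulator_t;

/* Arm for the level opposite to the idle level, so that the first interrupt
 * corresponds to the first edge away from idle. */
static void edge_emulator_init(edge_emulator_t *e, int pin, bool idles_high,
                               set_trigger_fn set_trigger)
{
    e->pin = pin;
    e->set_trigger = set_trigger;
    e->armed = idles_high ? TRIGGER_LOW_LEVEL : TRIGGER_HIGH_LEVEL;
    e->set_trigger(e->pin, e->armed);
}

/* Call from the pin's interrupt handler. Re-arms the pin for the opposite
 * level and reports which edge this interrupt stands for. */
static emulated_edge_t edge_emulator_on_interrupt(edge_emulator_t *e)
{
    emulated_edge_t edge =
        (e->armed == TRIGGER_LOW_LEVEL) ? EDGE_FALLING : EDGE_RISING;

    e->armed = (e->armed == TRIGGER_LOW_LEVEL) ? TRIGGER_HIGH_LEVEL
                                               : TRIGGER_LOW_LEVEL;
    e->set_trigger(e->pin, e->armed);
    return edge;
}

/* Host-side stand-in for the hook: just records the last armed level. */
static trigger_level_t last_armed_level;

static void record_trigger(int pin, trigger_level_t level)
{
    (void)pin;
    last_armed_level = level;
}
```

For an active-low line only EDGE_FALLING needs handling and EDGE_RISING can be ignored; for a duty-cycle measurement both results are of interest.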

Hope that helps,

Phil
