Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling radio.available() after failed transmission when using writeFast or startWrite causes interrupt loop #929

Closed
audiophil-dev opened this issue Nov 21, 2023 · 10 comments
Labels

Comments

@audiophil-dev
Copy link

audiophil-dev commented Nov 21, 2023

I don't know if this is an expected behaviour or a library bug. When calling radio.available() after a failed transmission a new interrupt is caused, thus ending up in a never ending interrupt loop. Took me quite some time to figure out, that radio.available() is causing the extra interrupt.

This is a minimal example (based on the ping_par_irq example) which demonstrates the unexpected behaviour (using only a sender, assuming the receiver is down):

code
/*
  Copyright (C) 2011 J. Coliz <maniacbug@ymail.com>

  This program is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  version 2 as published by the Free Software Foundation.

  Update 2014 - TMRh20
*/

/**
  Example of using interrupts

  This is an example of how to user interrupts to interact with the radio, and a demonstration
  of how to use them to sleep when receiving, and not miss any payloads.
  The pingpair_sleepy example expands on sleep functionality with a timed sleep option for the transmitter.
  Sleep functionality is built directly into my fork of the RF24Network library
*/

#include <SPI.h>
//#include "nRF24L01.h"
#include "RF24.h"
#include "printf.h"

// Hardware configuration
RF24 radio(7, 8);  // Set up nRF24L01 radio on SPI bus plus pins 7 & 8
//RF24 radio = RF24(CE_PIN, CSN_PIN);

// #define CSN_PIN 9
// #define MOSI_PIN 11
// #define MISO_PIN 12
// #define SCK_PIN 13
// #define CE_PIN 14
// #define IRQ_PIN 16


// Our ACK payload will simply be 4 bytes containing the number of payloads received
static uint32_t message_count = 1;  // start counting at 1

// Demonstrates another method of setting up the addresses
byte address[][5] = { 0xCC, 0xCE, 0xCC, 0xCE, 0xCC, 0xCE, 0xCC, 0xCE, 0xCC, 0xCE };

/************************* Role management ****************************/
// Set up role.  This sketch uses the same software for all the nodes in this
// system.  Doing so greatly simplifies testing.

typedef enum { role_sender = 1,
               role_receiver } role_e;                                   // The various roles supported by this sketch
const char* role_friendly_name[] = { "invalid", "Sender", "Receiver" };  // The debug-friendly names of those roles
role_e role = role_sender;                                                             // The role of the current running sketch

void setup() {

  delay(20);                     // Just to get a solid reading on the role pin

  Serial.begin(115200);
  printf_begin();  // needed for printDetails()

  // print introduction
  Serial.print(F("\n\rRF24/examples/pingpair_irq\n\rROLE: "));
  Serial.println(role_friendly_name[role]);


  /********************** Setup and configure rf radio *********************/
  radio.begin();

  // Examples are usually run with both radios in close proximity to each other
  radio.setPALevel(RF24_PA_LOW);  // defaults to RF24_PA_MAX
  radio.enableAckPayload();       // We will be using the ACK Payload feature which is not enabled by default
  radio.enableDynamicPayloads();  // Ack payloads are dynamic payloads
  radio.setRetries(1, 3);

  // Open a writing and reading pipe on each radio, with opposite addresses
  if (role == role_sender) {
    radio.openWritingPipe(address[0]);
    radio.openReadingPipe(1, address[1]);
  } else {
    radio.openWritingPipe(address[1]);
    radio.openReadingPipe(1, address[0]);
    radio.startListening();  // First we need to start listening

    // Add an ACK payload for the first time around; 1 is the pipe number to acknowledge
    radio.writeAckPayload(1, &message_count, sizeof(message_count));
    ++message_count;  // increment counter by 1 for next ACK payload
  }

  radio.printDetails();  // Dump the configuration of the rf unit for debugging
  delay(50);

  // Attach interrupt handler to interrupt #0 (using pin 2) on BOTH the sender and receiver
  attachInterrupt(digitalPinToInterrupt(IRQ_PIN), check_radio, LOW);

  // send one package for test purpused
  unsigned long time = millis();  // Take the time
  Serial.print(F("Now sending "));
  Serial.println(time);
  radio.write(&time, sizeof(unsigned long), 0);  // Send the time

}  // setup


void loop() {

  /****************** Ping Out Role ***************************/
  if (role == role_sender) {
    // Repeatedly send the current time
    //unsigned long time = millis();  // Take the time
    // Serial.print(F("Now sending "));
    // Serial.println(time);
    //radio.startWrite(&time, sizeof(unsigned long), 0);  // Send the time
    //delay(2000);                                        // Try again soon (in 2 seconds)
  }


  /****************** Pong Back Role ***************************/
  // Receiver does nothing!  All the work is in Interrupt Handler

  if (role == role_receiver) {}

}  // loop


/********************** Interrupt Handler *********************/

void check_radio(void) {

  bool tx, fail, rx;                 // declare variables to store IRQ flags
  radio.whatHappened(tx, fail, rx);  // What happened?

  Serial.println("Interrupt");

  if (tx) {  // Have we successfully transmitted?
    if (role == role_sender)
      Serial.println(F("Send:OK"));
    if (role == role_receiver)
      Serial.println(F("Ack Payload:Sent"));
  }

  if (fail) {  // Have we failed to transmit?
    if (role == role_sender)
      Serial.println(F("Send:Failed"));
    if (role == role_receiver)
      Serial.println(F("Ack Payload:Failed"));
  }

  if (rx || radio.available()) {  // Did we receive a message?

    /**************** Ping Out Role (about received ACK payload) ************************/
    // If we're the sender, we've received an ack payload
    if (role == role_sender) {
      // Get the payload and dump it
      radio.read(&message_count, sizeof(message_count));
      Serial.print(F("Ack: "));
      Serial.println(message_count);
    }


    /****************** Pong Back Role ***************************/
    // If we're the receiver, we've received a time message
    if (role == role_receiver) {
      // Get the payload and dump it

      static unsigned long got_time;            // variable to hold the received time
      radio.read(&got_time, sizeof(got_time));  // get the payload
      Serial.print(F("Got payload "));
      Serial.println(got_time);

      // Add an ACK payload for the next time around; 1 is the pipe number to acknowledge
      radio.writeAckPayload(1, &message_count, sizeof(message_count));
      ++message_count;  // increment packet counter
    }
  }
}  // check_radio 

If you comment out radio.available, no interrupt loop is caused.

@TMRh20
Copy link
Member

TMRh20 commented Nov 21, 2023

I'm not exactly sure, I kind of think this is 'undefined' behaviour, because you are using ack payloads, but not keeping the chip in the right mode.

Technically, you should be calling radio.txStandBy() at some point, if using these methods of writing with ack-payloads.

  if (tx) {  // Have we successfully transmitted?
    if (role == role_sender){
      Serial.println(F("Send:OK"));
      radio.txStandBy();
    }
    if (role == role_receiver)
      Serial.println(F("Ack Payload:Sent"));
  }

  if (fail) {  // Have we failed to transmit?
    if (role == role_sender){
      Serial.println(F("Send:Failed"));
      radio.txStandBy();
    }
    if (role == role_receiver)
      Serial.println(F("Ack Payload:Failed"));
  }

Does adjusting this code resolve the issue?

@2bndy5
Copy link
Member

2bndy5 commented Nov 21, 2023

I don't see how calling available() could cause an extra interrupt. It only reads from registers, and you can't write to the IRQ flags (in the radio's STATUS register) that whatHappened() returns.

I would suggest using volatile bool states and handling their changes in the main loop() instead of the ISR (check_radio()) because ... per Arduino's attachinterrupt() docs:

Generally, an ISR should be as short and fast as possible. If your sketch uses multiple ISRs, only one can run at a time, other interrupts will be executed after the current one finishes in an order that depends on the priority they have. millis() relies on interrupts to count, so it will never increment inside an ISR. Since delay() requires interrupts to work, it will not work if called inside an ISR. micros() works initially but will start behaving erratically after 1-2 ms. delayMicroseconds() does not use any counter, so it will work as normal.

Calling txStandby() within the ISR will greatly increase the risk of violating the millis() concern because it uses millis() to measure the standby time.

Due to the delayMicroseconds(5) call twice per SPI transaction (when the CSN is toggled on non-Arduino platforms),

  • whatHappened() can incur 10 µs delay because it does 1 SPI transaction
  • available(void) will toggle the CSN pin 2x (10 µs)
  • available(&pipe_num) will toggle the CSN 4x (20 µs)
  • writeAckPayload() will toggle CSN 2x (10 µs) and now (since v1.3.11) returns the TX_FULL flag in the STATUS register.

This is a minimal example (based on the ping_par_irq example) which demonstrates the unexpected behavior (using only a sender, assuming the receiver is down):

The pingpair_irq example is a relic example from back when maniacbug was still involved in this library (10+ years ago). I haven't actually ran tests on those examples (now located in our examples/old_backups folder) since v1.3.11 when I rewrote the examples for simplicity (among other reasons).

@2bndy5
Copy link
Member

2bndy5 commented Nov 21, 2023

Serial.print() tends to take a lot of time when output to the Serial Monitor, especially if the board uses a chip that doesn't natively support USB (including but not limited to Arduino Nano/Uno and ESP8266 boards). I should revise the newer InterruptConfigure.ino to adhere to Arduino's "short and fast as possible" advice.

@audiophil-dev
Copy link
Author

audiophil-dev commented Nov 21, 2023

I'm not exactly sure, I kind of think this is 'undefined' behaviour, because you are using ack payloads, but not keeping the chip in the right mode.

Technically, you should be calling radio.txStandBy() at some point, if using these methods of writing with ack-payloads.

  if (tx) {  // Have we successfully transmitted?
    if (role == role_sender){
      Serial.println(F("Send:OK"));
      radio.txStandBy();
    }
    if (role == role_receiver)
      Serial.println(F("Ack Payload:Sent"));
  }

  if (fail) {  // Have we failed to transmit?
    if (role == role_sender){
      Serial.println(F("Send:Failed"));
      radio.txStandBy();
    }
    if (role == role_receiver)
      Serial.println(F("Ack Payload:Failed"));
  }

Does adjusting this code resolve the issue?

Thank you, this actually solves the problem of the interrupt loop but as @2bndy5 pointed out this might be problematic, too.

EDIT: actually adding more code brought back the problem. Still, I think your answer goes in the right direction, since disabling AutoAck seems to prevent going into the undefined state. I guess adding radio.txStandBy() just changed the timing in the IRQ a bit so that the code did not get stuck in the interrupt loop anymore...

@audiophil-dev
Copy link
Author

audiophil-dev commented Nov 21, 2023

I don't see how calling available() could cause an extra interrupt. It only reads from registers, and you can't write to the IRQ flags (in the radio's STATUS register) that whatHappened() returns.

I also have no clue, why exactly the radio.available() triggers this behaviour, but if you try the code you could verify it.

EDIT:

Actually you are right. It seems that it was just a coincidence. I tested again in a more elaborated example and leaving out radio.available() does not seem to change the behaviour. I have no clue what else could cause the interrupt loop. So far, the only way to prevent it reliable seems to be disabling AutoAck.

@audiophil-dev
Copy link
Author

I would suggest using volatile bool states and handling their changes in the main loop() instead of the ISR (check_radio()) because ... per Arduino's attachinterrupt() docs:

Generally, an ISR should be as short and fast as possible.

I know, I will try your proposal. I used interrupts because from another application where I streamed audio, I remember this was the only way of detecting fast enough without calling available() every loop cycle and thus introducing too much delay. Could you give a hint how to notice the arrival of a package without using an interrupt and without calling functions which introduce a delay?

@audiophil-dev
Copy link
Author

Serial.print() tends to take a lot of time when output to the Serial Monitor, especially if the board uses a chip that doesn't natively support USB (including but not limited to Arduino Nano/Uno and ESP8266 boards)

I know, I only put it for demonstrating reasons, I don't use it in production code. During debugging on a teensy 4.0 it did not cause troubles so far (probably because of native USB support).

@2bndy5
Copy link
Member

2bndy5 commented Nov 21, 2023

It honestly sounds like you are causing undefined behavior by doing too much in the ISR. Trying to point blame at a specific function will be little more than guesses that yield nothing conclusive.

It helps to know you're using a teensy board. According to their docs:

The simplest and most common strategy is to keep all interrupt service routines short and simple, so they execute quickly, and to minimize time the main program disables interrupts. Virtually all examples follow this model.

When the hardware calls an interrupt service routine, it clears the global interrupt flag, so that no other interrupt routine may be called. The return from an interrupt service routine automatically reenables interrupts, and if any other interrupt flags are set, the hardware will call the next pending interrupt routine rather than returning to the main program.

I strongly suggest you try the volatile bool strategy instead making the ISR more and more complex.

@audiophil-dev
Copy link
Author

It honestly sounds like you are causing undefined behavior by doing too much in the ISR. Trying to point blame at a specific function will be little more than guesses that yield nothing conclusive.

I did not want to blame any function, I just wanted to understand what's happening. Honestly, I never experienced an extra interrupt being triggered because I was doing a single Serial.print() in the ISR.

It helps to know you're using a teensy board. According to their docs:

The simplest and most common strategy is to keep all interrupt service routines short and simple, so they execute quickly, and to minimize time the main program disables interrupts. Virtually all examples follow this model.
When the hardware calls an interrupt service routine, it clears the global interrupt flag, so that no other interrupt routine may be called. The return from an interrupt service routine automatically reenables interrupts, and if any other interrupt flags are set, the hardware will call the next pending interrupt routine rather than returning to the main program.

I strongly suggest you try the volatile bool strategy instead making the ISR more and more complex.

I get your point. I read your first answer as "use volatile bool instead of the ISR". After having a closer look at InterruptConfigure.ino I understand what you meant. I will close the issue and refactor my code to use the 'volatile bool' strategy. Sorry for the noise...

@2bndy5
Copy link
Member

2bndy5 commented Nov 22, 2023

There is a good reason why ISRs should be fast and simple. Looking back on my Assembly language class, I suspect a complex ISR might cause stack overflows which would explain the undefined behavior. To fully understand what that means would take considerable knowledge about how the compiler allocates memory for functions and how functions (in binary) are executed at runtime (and that's just focused on global functions - methods scoped to a data structure have more implications).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants