Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird and unstable behavior of python wrapper #101

Closed
kormitigrov opened this issue Apr 15, 2015 · 19 comments
Closed

Weird and unstable behavior of python wrapper #101

kormitigrov opened this issue Apr 15, 2015 · 19 comments
Labels

Comments

@kormitigrov
Copy link

I've come across a weird feature of python wrapper for the library several times already. It seams to be dependent of non-radio code in completely different parts of the script. I'm unable to find out if it is only my problem or if it persists for anybody else.

Using your code I've made another pair of ping-pong scripts for Arduino Nano and Raspberry Pi 2 exchanging current time with contant-size payload. Both the Arduino and Raspberry can be pingers and pongers and talk to each other, the scripts are compatible protocol-wise (exchanging 4 bytes of arduino's unsigned long).

The scripts I'm attaching work fine for me, for Raspberry being the pinger and Arduino Nano being the ponger. They both send data, receive acks successfully, then reply back with data, and again receive acks well.

But enabling the print 'aaa' line in Raspberry script in my case breaks the normal flow of acks for the scripts, even though the script was never meant to enter that part if it is pinger! I don't see any 'aaa' on the screen, but acks to the writes in pinger-part no longer arraive. If I comment that line again, everything works 100%.

# role pong
else:
    if radio.available():
        # why enabling this line breaks everything???
        # print 'aaa'

        # while radio.available():
        data = radio.read(4)

How can this be?

In another part of the code I've at some point found I had to access radio.payloadSize in order for the following 'read' to actually read anything. In the end I had to add that access into loop condition, and even then its only stable 100% if you do not touch the code. Change anything - it might right away become 100% unstable.

    # no data - bad luck
    if timeout:
        print 'Timed out while waiting for response'
    else:
        # print radio.payloadSize, 'available'

        # for some reason, this line is required,
        # remove it, and no data will arrive
        # radio.payloadSize

        # read all available payload
        while radio.available() and radio.payloadSize > 0:
            data = radio.read(4)

I could imagine there might be an explanation for access radio.payloadSize, but that previous breaking print 'aaa' evades me completely.

attachment: pingpong_recv.py

import RPi.GPIO as GPIO
from RF24 import *
import time
import sys

# output usage and current settings
print 'usage: pingpong_recv.py [ping | pong]'
role = 1 # pong by default
if len(sys.argv) > 1 and sys.argv[1] == 'ping':
    role = 0
print 'role selected: %s' % ('ping' if role == 0 else 'pong')

# initialize radio on Raspberry PI 2 pins 15 (GPIO22) and 24 (GPIO8)
# with wiring according to table on http://tmrh20.github.io/RF24/
radio = RF24(RPI_V2_GPIO_P1_15, RPI_V2_GPIO_P1_24, BCM2835_SPI_SPEED_8MHZ)

# LED pin to blink on sending and receiving (wiring is optional)
LEDPIN = 21

# init GPIO
GPIO.setwarnings(False)
GPIO.cleanup()
GPIO.setmode(GPIO.BCM)
GPIO.setup(LEDPIN, GPIO.OUT)

# radio addresses for both nodes
pipes = [b'1Node', b'2Node']

# init radio
radio.begin()
radio.setRetries(5,15)

# setup role pong
if role == 1: 
    radio.openWritingPipe(pipes[1])
    radio.openReadingPipe(1,pipes[0])
    radio.startListening()
# setup role ping
else:         
    radio.openWritingPipe(pipes[0])
    radio.openReadingPipe(1,pipes[1])

# print debug radio output
radio.printDetails()

# packets sent
cnt_pack = 0
# acks received
cnt_acks = 0
# response received
cnt_resp = 0
# total delivery time
sum_deliv = 0

# compatibility function for millis
t = time.time();
millis = lambda: int(round((time.time() - t) * 1000));

# main cycle
while True:

    # ping
    if role == 0: 

        # stop listening and prepare tp send
        radio.stopListening()

        # stat
        cnt_pack += 1

        # get milliseconds value
        value = millis()
        # encode into byte array, with byte order compatible with
        # arduino's radio.read(&got_time, sizeof(unsigned long))
        data = str(bytearray.fromhex('{:08x}'.format(value)))[::-1]

        # write 4 bytes with acknoledgement
        if radio.write(data[:4]):
            # if ack was received, count and blink
            print 'Ping at time %d delivered' % value
            cnt_acks += 1

            GPIO.output(LEDPIN,True)
            time.sleep(0.01)
            GPIO.output(LEDPIN,False)            
        else:
            # otherwise do not blink
            print 'Ping at time %d NOT delivered' % value

        # time to wait for reply
        radio.startListening()

        # wait until radio.available() or timeout
        started_waiting = millis()
        timeout = False
        while (not radio.available()) and (not timeout):
            if millis() - started_waiting > 500:
                timeout = True

        # no data - bad luck
        if timeout:
            print 'Timed out while waiting for response'
        else:
            # print radio.payloadSize, 'available'

            # for some reason, this line is required,
            # remove it, and no data will arrive
            # radio.payloadSize

            # read all available payload
            while radio.available() and radio.payloadSize > 0:
                data = radio.read(4)

            # stat
            cnt_resp += 1

            # print data
            # print ':'.join(x.encode('hex') for x in data)

            # convert byte string back to long, with respect to byte order
            value = int(''.join(x.encode('hex') for x in reversed(data)), 16)
            print 'Response %d acquired at %d ms roundtrip time' % (value, millis() - value)

            # stat
            sum_deliv += millis() - value

            # blink on successful reception
            GPIO.output(LEDPIN,True)
            time.sleep(0.01)
            GPIO.output(LEDPIN,False)

        print 'STAT: %d packets sent, %d%% acks, %d%% responses, %d avg time' \
                % (cnt_pack, cnt_acks*100/cnt_pack, cnt_resp*100/cnt_pack, sum_deliv/(cnt_resp+0.01))

        time.sleep(1)

    # role pong
    else:

        if radio.available():
            # print radio.payloadSize, 'available'

            # why enabling this line breaks everything???
            # print 'aaa'

            # while radio.available():
            data = radio.read(4)

            # print data
            # print ':'.join(x.encode('hex') for x in data)

            # convert byte string back to long, with respect to byte order
            value = int(''.join(x.encode('hex') for x in reversed(data)), 16)
            print 'Got request %d' % value

            # blink, as we have received
            GPIO.output(LEDPIN,True)
            time.sleep(0.01)
            GPIO.output(LEDPIN,False)

            # stop listening to write back
            radio.stopListening()

            # write back 4 bytes
            if radio.write(data[:4]):
                print 'Returned the answer'

                # blink after successful answer
                GPIO.output(LEDPIN,True)
                time.sleep(0.01)
                GPIO.output(LEDPIN,False)
            else:
                print 'Answer NOT delivered'

            # return to listening
            radio.startListening()

attachment: sketch_rf24pingpong.ino

#include <SPI.h>
#include "RF24.h"
#include "printf.h"

// On Receiver Blink PIN 1
#define BLPIN1 5
// On Receiver Blink PIN 2
#define BLPIN2 4
// Role selector PIN (leave open = pong, connect to gnd = ping)
#define ROLEPIN 6

RF24 radio(9,10);

// 5-byte addresses of two radios
byte addresses[][6] = {"1Node","2Node"};

// role of this one
int role;

void setup() {
  Serial.begin(57600);
  printf_begin();

  // set blinking pin
  pinMode(BLPIN1, OUTPUT);
  pinMode(BLPIN2, OUTPUT);

  // set role pin
  pinMode(ROLEPIN, INPUT_PULLUP);
  // wait a bit
  delay(20);
  // read state
  role = digitalRead(ROLEPIN);

  printf("\n\rRF24 ping-pong example\n\r");
  printf("Role = %d (%s)\n\r\n\r", role, (role ? "pong" : "ping"));

  radio.begin();
  // Set the PA Level low to prevent power supply related issues since this is a
  // getting_started sketch, and the likelihood of close proximity of the devices. RF24_PA_MAX is default.
  //radio.setPALevel(RF24_PA_LOW);

  // Open a writing and reading pipe on each radio, with opposite addresses
  if (role)
  {
    radio.openWritingPipe(addresses[1]);
    radio.openReadingPipe(1,addresses[0]);
  }
  else
  {
    radio.openWritingPipe(addresses[0]);
    radio.openReadingPipe(1,addresses[1]);
  }

  radio.startListening(); 

  radio.printDetails();
}

// packets sent
unsigned long cnt_pack = 0;
// acks received
unsigned long cnt_acks = 0;
// response received
unsigned long cnt_resp = 0;
// total delivery time
unsigned long sum_deliv = 0;

void loop() {

  // ping role
  if (role == 0)  {

    radio.stopListening();                                    // First, stop listening so we can talk.

    cnt_pack++;
    unsigned long time = millis();                             // Take the time, and send it.  This will block until complete
    if (radio.write(&time, sizeof(unsigned long)))
    {
      printf("Ping at time %lu delivered\n\r", time);
      cnt_acks++;
      digitalWrite(BLPIN1, HIGH); 
      delay(10);
      digitalWrite(BLPIN1, LOW);
    }
    else
    {
      printf("Ping at time %lu NOT delivered\n\r", time);
    }

    radio.startListening();                                    // Now, continue listening

    unsigned long started_waiting_at = millis();               // Set up a timeout period, get the current milliseconds
    boolean timeout = false;                                   // Set up a variable to indicate if a response was received or not

    while ( ! radio.available() ){                             // While nothing is received
      if (millis() - started_waiting_at > 200 ){            // If waited longer than 200ms, indicate timeout and exit while loop
        timeout = true;
        break;
      }      
    }

    if ( timeout )                                             // Describe the results
      printf("Timed out while waiting for response\n\r");
    else
    {
      unsigned long got_time;                                  // Grab the response, compare, and send to debugging spew
      while (radio.available())
        radio.read( &got_time, sizeof(unsigned long) );

      cnt_resp++;
      unsigned long time = millis();

      printf("Response %lu acquired at %lu ms roundtrip time\n\r", got_time, time-got_time);
      sum_deliv += time-got_time;
      digitalWrite(BLPIN2, HIGH);  
      delay(10);
      digitalWrite(BLPIN2, LOW);
    }

    printf("STAT: %lu packets sent, %lu%% acks, %lu%% responses, %lu avg time \n\r", cnt_pack, cnt_acks*100/cnt_pack, cnt_resp*100/cnt_pack, sum_deliv/cnt_resp);

    // Try again 1s later
    delay(1000);
  }


  // pong role
  if ( role == 1 )
  {
    unsigned long got_time;

    if (radio.available())
    {
                                                                  // Variable for the received timestamp
      while (radio.available()) {                                   // While there is data ready
        radio.read( &got_time, sizeof(unsigned long) );               // Get the payload
      }
      printf("Got request %lu\n\r", got_time);
      digitalWrite(BLPIN1, HIGH); 
      delay(10);
      digitalWrite(BLPIN1, LOW);

      radio.stopListening();                                        // First, stop listening so we can talk   
      if (radio.write(&got_time, sizeof(unsigned long)))            // Send the final one back.      
      {
        printf("Returned the answer\n\r");
        digitalWrite(BLPIN2, HIGH); 
        delay(10);
        digitalWrite(BLPIN2, LOW);
      }
      else
      {
        printf("Answer NOT delivered\n\r");
      }
      radio.startListening();                                       // Now, resume listening so we catch the next packets.     
    }
  }

}
@TMRh20
Copy link
Member

TMRh20 commented Apr 15, 2015

Thanks for posting this here. As I mentioned I'm getting similar results, but I've looked at it and am not sure as to the cause.
This behaviour seems specific to the python wrapper, so hopefully somebody like @mz-fuzzy can add some more info.

@martin-mat
Copy link
Collaborator

yep, I've registered the issue and planning to check that... as soon as I find some time, so be patient please.

@martin-mat
Copy link
Collaborator

ok, I was successful with reproducing the issue. For me, 1st loop with radio.write succeeds, ack is received, but subsequent loops fail with MAX_RT set (verified by debugging printouts) and write method returns 0. I did not see this behavior in the same code in C, however python wrapper just returns what it gets from RF24's write. I have checked that write takes right params from the wrapper and also wrapper takes the return value correctly. It's either a dynamic memory bug or a timing bug somewhere and such are hard to catch :-(
@TMRh20 any hint what to try? Maybe you can check what's going on inside the write method in case of this error, you know better what's inside.

@martin-mat
Copy link
Collaborator

not sure if this has something to do with the Python issue, but this is what I got, playing with @kormitigrov 's scenario; this C code

#include <RF24/RF24.h>

using namespace std;

RF24 radio(RPI_BPLUS_GPIO_J8_22,RPI_BPLUS_GPIO_J8_24, BCM2835_SPI_SPEED_8MHZ);
unsigned char addresses[][6] = {"1Node","2Node"};

int main(int argc, char** argv)
{
  radio.begin();

  // Open a writing and reading pipe on each radio, with opposite addresses
  radio.openWritingPipe(addresses[0]);
  radio.openReadingPipe(1,addresses[1]);

  radio.printDetails();

  while(1)
  {
        radio.stopListening();                                    // First, stop listening so we can talk.

        unsigned long time = millis();                             // Take the time, and send it.  This will block until complete
        radio.printDetails();
        if (radio.write(&time, sizeof(unsigned long)))
          printf("Ping at time %lu delivered\n\r", time);
        else
          printf("Ping at time %lu NOT delivered\n\r", time);

        radio.startListening();                                    // Now, continue listening

        unsigned long started_waiting_at = millis();               // Set up a timeout period, get the current milliseconds
        int timeout = 0;                                   // Set up a variable to indicate if a response was received or not

        while ( ! radio.available() ){                             // While nothing is received
          if (millis() - started_waiting_at > 200 ){            // If waited longer than 200ms, indicate timeout and exit while loop
            timeout = 1;
            break;
          }      
        }

        if ( timeout )                                             // Describe the results
          printf("Timed out while waiting for response\n\r");
        else
        {
          unsigned long got_time;                                  // Grab the response, compare, and send to debugging spew
          while (radio.available())
            radio.read( &got_time, sizeof(unsigned long) );
          printf("Got response: %lu\n", got_time);
        }

        // Try again 1s later
        delay(1000);
    }
}

produces segfault in 2nd loop run after receiving 1st packet from Arduino successfully. Backtrace:

Program received signal SIGSEGV, Segmentation fault.
strlen () at ../ports/sysdeps/arm/strlen.S:29
29      ../ports/sysdeps/arm/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../ports/sysdeps/arm/strlen.S:29
#1  0xb6d55358 in _IO_vfprintf_internal (s=<optimized out>, format=<optimized out>, ap=...) at vfprintf.c:1629
#2  0xb6d5b9c0 in __printf (format=0xb6fb5ea0 "Model\t\t = %s\r\n") at printf.c:35
#3  0xb6fb3960 in RF24::printDetails() () from /usr/local/lib/librf24-bcm.so.1
#4  0x00008908 in main ()
(gdb) 

still checking, but that seems to be really weird :-(

@kormitigrov
Copy link
Author

Yes, I've also noticed that behavior, when the first ping packet is delivered with ack returned, but subsequent fail. But I haven't tried C code on RPi2. The code I used was for Arduino, it might compile differently for RPi2. For instance, unsigned long mean 4 bytes for Arduino.

@kormitigrov
Copy link
Author

Taking your c++ code, i don't get segfault. If i comment out your second printDetails, I get the first ping delivered with ack and answer received, but then: RF24 HARDWARE ERROR

TMRh20 added a commit that referenced this issue Apr 18, 2015
Fix over reading a buffer by 1 byte

#101
@TMRh20
Copy link
Member

TMRh20 commented Apr 18, 2015

Technically, the last commit comment should say issue with printdetails.. blah blah. since the python issue still seems to be happening.

In any case, this was interesting. I found the problem because p_variant was returning a value of 255 after the radio.read() function was called. Taking a closer look at read_payload, it turns out a buffer was being over-read by a single byte, and I guess the memory address was that of p_variant?

The printDetails issue on RPi should be fixed in the updates branch, but the python issues are something else it seems. I've replicated the python issue using BCM and SPIDEV on RPi, and no amount of timing seems to do anything useful.

I've gotten down to one scenario where a delay after radio.available returning true, and before radio.read, where it prevents a successful write, which makes no sense at all. I notice the wrapper uses the -pthread option when compiling, (not very familiar with python) but if it is dong some stuff with threading, it could have something to do with why the timing is all over the place, but thats just a shot-in-the-dark if you will. From all appearances, the radio is just refusing to respond correctly???

@TMRh20
Copy link
Member

TMRh20 commented Apr 18, 2015

Related, should have been caught sooner?: #46

@martin-mat
Copy link
Collaborator

additional observations:

  • removing pthread from the wrapper does not help
  • watching nrf status during the test, it seems that together with ack stopping working, RX_ADDR_P0 changes to a different/random value after radio.startListening() call. After digging deeper into the restoring sequence it seems that pipe0_reading_address is overwritten somewhere.

now just to identify where it is overwritten... what about write_payload, doesn't this suffer from the same issue as read_payload?

@martin-mat
Copy link
Collaborator

I think I got it: if listening pipe 0 is not used, then pipe0_reading_address is left uninitialized. In some circumstances (for example in Python wrapper) it's not initialized to zeros and in startListening it's restored to a random value. Initializing pipe0_reading_address in constructor to zeros should solve it, at least for me it seems to solve it.

@kormitigrov
Copy link
Author

So, eh, should I checkout another version to try it out myself?

@martin-mat
Copy link
Collaborator

Fix is not commited yet. To verify, add this line
pipe0_reading_address[0]=0;
into RF24 constructor (just a provisional fix):

/****************************************************************************/

#if defined (RF24_LINUX) && !defined (MRAA)//RPi constructor
RF24::RF24(uint8_t _cepin, uint8_t _cspin, uint32_t _spi_speed):
  ce_pin(_cepin),csn_pin(_cspin),spi_speed(_spi_speed),p_variant(false), payload_size(32), dynamic_payloads_enabled(false),addr_width(5)//,pipe0_reading_address(0)
{
   pipe0_reading_address[0]=0;
}
#endif

/****************************************************************************/

re-compile RF24 lib, install, try your scenarios and report back.

@kormitigrov
Copy link
Author

I've changed both constructors i've found in RF24.cpp, right?

Well, it seems to solve it, indeed! My python scripts mentioned earlier work beautifully, and including random pieces of code does not seem to break it so far! I've tried both the pinger- and the ponger- sides of the python script - no problem so far.

However, your c code you mentioned recently does not seem to work. I've changed back the initialization to my setup (radio = RF24(RPI_V2_GPIO_P1_15, RPI_V2_GPIO_P1_24, BCM2835_SPI_SPEED_8MHZ)), and after compiling and running it, i get:

printDetails output complete up to PA Power
again printDetails output complete up to PA Power
Ping at time 116 delivered
Got response: 116
again printDetails output BUT INcomplete only up to Data Rate line, and nothing after

the executable silently end there, with no error reported back to the command line.

@martin-mat
Copy link
Collaborator

for fixing the C code you need commit mentioned earlier by @TMRh20. Or remove printDetails :-) However I'm not sure whether my C code works fine, it was intended just for debugging.

@kormitigrov
Copy link
Author

You're right! Made that --size modification, it works here as well. Any chance of seeing this committed to master soon? :)

@martin-mat
Copy link
Collaborator

That's a question to @TMRh20. My opinion is that both fixes should go immediately to master as the bugs may anytime break functionality unexpectably.

TMRh20 added a commit that referenced this issue Apr 19, 2015
Per #101 (comment)
initialize pipe0_reading_address[0] to address "Weird and unstable
behavior of python wrapper" identified by @kormitigrov   Resolution/Root
Cause identified by mz-fuzzy
@TMRh20
Copy link
Member

TMRh20 commented Apr 19, 2015

Updated and merged into master.

Really nice troubleshooting in identifying this. Thank you both for the detailed trouble descriptions with code examples, as well as testing etc. etc.

Initial results seem to confirm a full resolution. Although I haven't done extensive testing yet, I pushed both fixes to Updates and master branches because they address such show-stopping problems. The merge also includes code for the SPI transaction API introduced in Arduino IDE 1.61 or so.

@TMRh20
Copy link
Member

TMRh20 commented Apr 20, 2015

Question for @mz-fuzzy - You might have noticed the recent integration with SPIDEV and MRAA for general Linux support. Included with that is the ability to specify pin numbers using a simpler constructor: RF24 radio(22,0); for example is pin15 (gpio_22) using CE0. RF24 radio(22,1); would use CE1, so the constructor is more consistent across Linux devices.

With the python wrapper, in order to use it once compiled with RF24+SPIDEV I had to comment out all of the BCM2835 stuff, but it worked fine. (compile with make install RF24_SPIDEV=1 )

So my question is, is there a way to keep the BCM code in while allowing use with SPIDEV and/or MRAA? Else, what are your thoughts on removing the BCM specific code and sticking to the generic Linux constructor for the python wrapper? Else is there a better way to address this?

@martin-mat
Copy link
Collaborator

What comes to my mind first is to arrange it the same way as in RF24 lib: to use a compilation parameter, and depending on that include BCM2835 or not. Removing it completely is also fine, I guess it's also a way for RF24's future, right? If there is a working generic implementation, why to keep RPi specific approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants