Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Armbian kernel 4.14 rc7 banana pi m2 #1

Closed
sergey-suloev opened this issue Oct 30, 2017 · 26 comments
Closed

Armbian kernel 4.14 rc7 banana pi m2 #1

sergey-suloev opened this issue Oct 30, 2017 · 26 comments

Comments

@sergey-suloev
Copy link
Contributor

sergey-suloev commented Oct 30, 2017

Hello,
I am testing your driver on my bananapi m2 Armbian kernel 4.14. It seems to load fine and I can see 5 devices in /dev/*. But dmesg shows a strange error, see below. My question: do you have any raspberry-specific code in your driver ? Any ideas why does this error happen ?

dmesg output

[ 7.943861] nrf24: loading out-of-tree module taints kernel.
[ 7.948638] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(0)
[ 7.948844] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(1)
[ 7.949005] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(2)
[ 7.949189] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(3)
[ 7.949345] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(4)
[ 7.949565] nrf24 nrf0: nrf24_create_device: device created: major(245), minor(5)
[ 7.956710] Unable to handle kernel NULL pointer dereference at virtual address 0000004d
[ 7.956763] nrf24 nrf0: nrf24_tx_thread: waiting for new messages
[ 7.966041] pgd = c0004000
[ 7.968749] [0000004d] *pgd=00000000
[ 7.972345] Internal error: Oops: 5 [#1] SMP THUMB2
[ 7.977220] Modules linked in: sun6i_drc nrf24(O) sun4i_codec snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer sun4i_ts snd soundcore sun4i_gpadc sun4i_ss sun4i_tcon uio_pdrv_genirq cpufreq_dt uio thermal_sys bonding brcmfmac brcmutil cfg80211 rfkill ip_tables x_tables i2c_sun6i_p2wi pwrseq_simple realtek
[ 8.004281] CPU: 3 PID: 379 Comm: nrf0_rx_thread Tainted: G O 4.14.0-rc7-sunxi #3
[ 8.012889] Hardware name: Allwinner sun6i (A31) Family
[ 8.018112] task: ee3da700 task.stack: ed89e000
[ 8.022642] PC is at __dev_printk+0x10/0x60
[ 8.026822] LR is at dev_printk+0x21/0x2c
[ 8.030825] pc : [] lr : [] psr: 20010033
[ 8.037082] sp : ed89fee0 ip : ed89e03c fp : 00000000
[ 8.042299] r10: ef401860 r9 : ef3f1860 r8 : 00000000
[ 8.047515] r7 : ed8f0218 r6 : ed8f0200 r5 : bfa3d438 r4 : ed89ff00
[ 8.054032] r3 : ed89ff1c r2 : ed89ff00 r1 : 00000001 r0 : bfa3d438
[ 8.060551] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
[ 8.067849] Control: 50c5387d Table: 6e21806a DAC: 00000051
[ 8.073587] Process nrf0_rx_thread (pid: 379, stack limit = 0xed89e210)
[ 8.080190] Stack: (0xed89fee0 to 0xed8a0000)
[ 8.084546] fee0: ed8e0000 00600011 00000006 bfa3d654 ed89fefc c0545e29 ee2fbca8 ed89ff1c
[ 8.092715] ff00: bfa3d654 ed89fefc ed8e0000 ed8e0000 00000000 bfa3a93b bfa3d654 00000000
[ 8.100883] ff20: 00000000 bfa3d668 00000000 00000003 00000000 c0149d0f ef0f0000 00000000
[ 8.109050] ff40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 8.117218] ff60: ed89ff78 ee357400 ee3573c0 00000000 ed89e000 ed8e0000 bfa3a73d ee2fbca8
[ 8.125386] ff80: ee35741c c012fa2d ffffffff ee3573c0 c012f931 00000000 00000000 00000000
[ 8.133553] ffa0: 00000000 00000000 00000000 c0106589 00000000 00000000 00000000 00000000
[ 8.141721] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 8.149889] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 6fffd861 6fffdc61
[ 8.158070] [] (__dev_printk) from [] (dev_printk+0x21/0x2c)
[ 8.165479] [] (dev_printk) from [] (nrf24_rx_thread+0x1fe/0x244 [nrf24])
[ 8.174026] [] (nrf24_rx_thread [nrf24]) from [] (kthread+0xfd/0x104)
[ 8.182201] [] (kthread) from [] (ret_from_fork+0x11/0x28)
[ 8.189419] Code: eb04 4605 4614 b1e9 (6ccb) 7840
[ 8.194301] ---[ end trace e13b3e22e6f9c21d ]---


My overlay:

/dts-v1/;
/plugin/;

/ {
compatible = "allwinner,sun6i-a31s";

fragment@0 {
            target = <&pio>;
            __overlay__ {
                    nrf1_pins: nrf1_pins {
                            pins = "PE7", "PE6";	/* pins 38,36 on 40-pin GPIO connector */
                            function = "gpio_in", "gpio_out";
                    };
            };
    };

fragment@1 {
	target = <&spi1>;
	__overlay__ {
		#address-cells = <1>;
		#size-cells = <0>;
		status = "okay";

		nrf1: nrf1@0 {
			compatible = "nordic,nrf24";
			reg = <0>; /* CS0 */
			pinctrl-names = "default";
			pinctrl-0 = <&nrf1_pins>;
			#interrupt-cells = <2>;
			interrupt-parent = <&pio>;
			interrupts = <4 7 2>;		/* PE7, falling edge */
			irq-gpio = <&pio 4 7 0>;
			ce-gpio = <&pio 4 6 0>;		/* PE6 */
			spi-max-frequency = <5000000>;
			status = "okay";
		};
	};
};
  };

No errors happen during echo'ing text into the devices.

@mciupak
Copy link
Owner

mciupak commented Nov 9, 2017

Hello Sergey,

thanks for the ticket. I did not add any raspberry pi specific code (except of using raspberry pi kernel).

First dev_printk occurrence from nrf24_rx_thread is called in nrf24_read_reg (called via nrf24_is_rx_fifo_empty)

Could you please check if spi pointer is NULL there? If it is not NULL, check what is in spi->dev (as for example the name of device is taken from there).

Thanks,
Marcin

P.S. I am sorry for late answer, but I did not receive any email notification about that.

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 9, 2017

Hello Marcin,
thanks for the reply.
I will check what you said and report. It seems like the memory is totally corrupted in those receive/transmit threads. I am just curious why you didn't use UART strategy instead of those 2 threads..

I tried a couple more drivers similar to yours but all was unsuccessful. So I am still interested in making it work.

@mciupak
Copy link
Owner

mciupak commented Nov 9, 2017

So, maybe the problem is in the layer below i.e. spi driver?

What do you mean by "the memory is totally corrupted in those receive/transmit threads" ?

I am sure there is a lot of bugs there, so any hints are welcome!

Marcin

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 9, 2017

It may be too... Those sunxi devices are not well supported. "Memory corrupted" meaning that many pointers there are pointing to some strange data. I will give you more info later.
Didn't you think about using UART driver approach? I.e. implement UART driver based on RF stuff.
I tried also this project https://github.com/melomaa/nrf24l01 , it is not working either on my hardware, but at least it is not crashing ...

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

Hello Marcin
I have some news here. I found an issue in rx_thread function.

nrf24/nrf24_if.c

Line 1083 in a9f800a

if (pipe > N_NRF24_PIPES) {

As soon as N_NRF24_PIPES == 6 the condition in this line is not working correctly. Documentation says that "pipe" value can only be 0-5 but I am getting the "pipe" == 6 on my hardware and condition therefore allows the code to step below. As a result device->pipes[pipe] will access uninitialized memory.

It seems like condition should look like the following:

if (pipe > NRF24_PIPE5) {
...
}

But when I change it the way above then my board will hang at startup.

@mciupak
Copy link
Owner

mciupak commented Nov 15, 2017

Hello

this "pipe" in this context is a value of filed RX_P_NO of STATUS register.

Documentation says:
000-101: Data Pipe Number
110: Not Used
111: RX FIFO Empty

Not sure why you are getting pipe == 6 which is "Not Used". I have not observed such behavior.

This condition:

nrf24/nrf24_if.c

Line 1083 in a9f800a

if (pipe > N_NRF24_PIPES) {

is to prohibit reading of data when actually there is not data to be read, meaning RX FIFO Empty is set in STATUS register. However it does not protect against 110: Not used.

Your change is correct, but I would write it like following:
if (pipe >= N_NRF24_PIPES) {

But when I change it the way above then my board will hang at startup.

What do you mean by that? What are the symptoms?

BTW, did you solve the issue with crashing of nrf24_rx_thread?

Marcin

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

Yes, there is no crash anymore because Linux doesn't boot completely with that change.

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

Console says "can't start journal service".. seems like the loop in rx_thread is executed intensively without waiting because NRF chip says that FIFO is not empty but "pipe" == 6 and as soon as I changed the condition there is no crash anymore and thread is executing the loop over and over.
I have no idea why NRF is returning 6 for pipe... That may mean a wrong configuration or the chip itself is broken. Here is my hardware https://www.aliexpress.com/item/Free-Shipping-NRF24L01-PA-LNA-Wireless-Module-with-Antenna-1000-Meters-Long-Distance-FZ0410-We-are/32834661886.html?spm=a2g0s.9042311.0.0.w9m9ec

@mciupak
Copy link
Owner

mciupak commented Nov 15, 2017

Comment out starting of rx and tx thread.

If your system boot up, try to read STATUS register via:
cat /sys/class/nrf0/status (or sth like that)

You will see if SPI communication with your module works. There should be output on the console and in dmesg.

Paste output here please.

Marcin

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

root@aapi-bzdvwv:~# cat /sys/class/nrf24/nrf0/status
STATUS = 0x1C

dmesg says:

[ 100.133599] nrf24 spi32766.0: nrf24_print_status: CONFIG = 0xBC
[ 100.133688] nrf24 spi32766.0: nrf24_print_status: EN_AA = 0x54
[ 100.133747] nrf24 spi32766.0: nrf24_print_status: EN_RXADDR = 0x7E
[ 100.133801] nrf24 spi32766.0: nrf24_print_status: SETUP_AW = 0x06
[ 100.133852] nrf24 spi32766.0: nrf24_print_status: SETUP_RETR = 0xFD
[ 100.133904] nrf24 spi32766.0: nrf24_print_status: RF_CH = 0x04
[ 100.133955] nrf24 spi32766.0: nrf24_print_status: RF_SETUP = 0x28
[ 100.134007] nrf24 spi32766.0: nrf24_print_status: STATUS = 0x1C
[ 100.134058] nrf24 spi32766.0: nrf24_print_status: OBSERVE_TX = 0x00
[ 100.134109] nrf24 spi32766.0: nrf24_print_status: CD = 0x00
[ 100.134160] nrf24 spi32766.0: nrf24_print_status: FIFO_STATUS = 0x22
[ 100.134212] nrf24 spi32766.0: nrf24_print_status: DYNPD = 0x54
[ 100.134263] nrf24 spi32766.0: nrf24_print_status: FEATURE = 0x08

@sergey-suloev
Copy link
Contributor Author

Here interesting thing happens

root@aapi-bzdvwv:~# cat /sys/class/nrf24/nrf0/available_address_width
3 4 5

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

I have one more question. Documentation says that NRF should respond with status byte first when you read any register. But in your code I can't see that you are skipping the 1st byte. Why ?

@mciupak
Copy link
Owner

mciupak commented Nov 15, 2017

Reading address width.
All good here. 3 4 5 means that you can configure module to use 3, 4 or 5 bytes long addresses.

Regarding STATUS register and 1st byte.
Registers are read using spi_w8r8 function which first writes 8 bits and then reads 8 bits while STATUS is being clocked out while first 8 bit are written. That is why it is skipped automatically.

Now, the most important, regarding values of STATUS and other registers. It looks that all your registers values read from module are shifted left by 1 bit.
Eg.:

  • FEATURE is 0x08 while only EN_DPL bit is set and it should be 0x04
  • RF_CH is 0x04 while 0x02 is the reset value.
  • STATUS is 0x1C while it should be 0x0E.

That is the source of your problems. Not sure how to help you now...

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

you are saying that all the registers shifted by 1 bit ?? how is that possible ?
why do you think this may happen ? is there a way to understand if this is a SPI driver issue or hardware issue ?

@mciupak
Copy link
Owner

mciupak commented Nov 15, 2017

Hard to say why it is like that. Try lowering your SPI frequency first to 1 MHz or even lower like 100 kHz and try then.

If if will not help, check spi mode config. I believe it should be SPI_MODE_0, but you could try different ones. There could be also some driver specific config missing. Really hard to guess.

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

screenshot from 2017-11-15 22-10-04

Here is STATUS request. Seems like hardware part is fine?

@mciupak
Copy link
Owner

mciupak commented Nov 15, 2017

Hmm, why clk is clocked out only 8 times? Is is write and then read or just read?

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 15, 2017

This is only reading STATUS register. There is another byte following. Looks like master is sending 0x7 and then is staying at low level. Slave is responding with 0xE and again with 0xE. Both values are STATUS register.
screenshot from 2017-11-15 22-47-00

@sergey-suloev
Copy link
Contributor Author

Here is the whole picture for reading STATUS register
screenshot from 2017-11-15 22-50-46

@mciupak
Copy link
Owner

mciupak commented Nov 15, 2017

Yes, hardware part looks fine. 0x0E id "correct" value of STATUS register. It means that RX Fifo is empty.

And you confirm that your status is not 0x0E while displaying it in driver after that read you showed?

@sergey-suloev
Copy link
Contributor Author

The SPI driver gives me 0x1C. So the problem should be in sun6i SPI driver code.

@mciupak
Copy link
Owner

mciupak commented Nov 17, 2017

One more thing. Have you tried lowering SPI frequency? Screenshots you attached are form logic state analyzer and it does not show how plots actually looks like.

Give it a try with 1 MHz or lower.

Marcin

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 17, 2017

I tried it, no difference. I did debugging in spi-sun6i.c and found something that looks too bad: readb function here http://elixir.free-electrons.com/linux/v4.13.12/source/drivers/spi/spi-sun6i.c#L155 returns already broken value... That seems like hardware issue.. but logic analyzer shows a good picture. I am really confused.

@sergey-suloev
Copy link
Contributor Author

sergey-suloev commented Nov 17, 2017

I tried another device, OrangePi PC (Alwinner H3 Soc) and got with the same problem with SPI... that's so weird!

@sergey-suloev
Copy link
Contributor Author

Ok, I have tested communication using spi-gpio driver (bitbang/software spi emulation) and found that it is working fine.
[ 240.738204] nrf24 spi32766.0: nrf24_print_status: CONFIG = 0x0F
[ 240.738252] nrf24 spi32766.0: nrf24_print_status: EN_AA = 0x3F
[ 240.738296] nrf24 spi32766.0: nrf24_print_status: EN_RXADDR = 0x3F
[ 240.738339] nrf24 spi32766.0: nrf24_print_status: SETUP_AW = 0x03
[ 240.738382] nrf24 spi32766.0: nrf24_print_status: SETUP_RETR = 0xFF
[ 240.738426] nrf24 spi32766.0: nrf24_print_status: RF_CH = 0x02
[ 240.738468] nrf24 spi32766.0: nrf24_print_status: RF_SETUP = 0x07
[ 240.738511] nrf24 spi32766.0: nrf24_print_status: STATUS = 0x0E
[ 240.738554] nrf24 spi32766.0: nrf24_print_status: OBSERVE_TX = 0x00
[ 240.738597] nrf24 spi32766.0: nrf24_print_status: CD = 0x00
[ 240.738640] nrf24 spi32766.0: nrf24_print_status: FIFO_STATUS = 0x11
[ 240.738683] nrf24 spi32766.0: nrf24_print_status: DYNPD = 0x3F
[ 240.738727] nrf24 spi32766.0: nrf24_print_status: FEATURE = 0x04

@sergey-suloev
Copy link
Contributor Author

I am closing the issue as the problem described seems to be hardware-originated. Thanks for assistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants