kexec & kdump on X96Max_Plus2_T #503

Closed
mixmid opened this issue Aug 19, 2022 · 12 comments

Comments

@mixmid

mixmid commented Aug 19, 2022

*** Preface ***

A friend of mine brought me this "X96 Max+" TV box, complaining about kernel panics that occur when the device runs pure Linux rather than Android. I have successfully used kdump on my Raspberry Pi 4 to debug kernel panics, but I still cannot get kdump and kexec to work on this Amlogic device. But first things first.

*** Main part ***

The full name of the TV box is Droidlogic X96Max_Plus2_T, and the Ethernet chip is a Realtek RTL8211F, according to the Device Info HW app on Android. The SoC is an Amlogic S905X3; the kernel is 5.15.60 (compiled with debug info).

I used meson-sm1-x96-max-plus-100m.dtb after installation, then tried these three DTBs:

* meson-sm1-hk1box-vontar-x3.dtb
* meson-sm1-x96-max-plus-100m.dtb
* meson-sm1-x96-max-plus.dtb

together with the hk1box bootloader:

dd if=/tmp/hk1box-bootloader.img of=/dev/mmcblk0 bs=1 count=442
dd if=/tmp/hk1box-bootloader.img of=/dev/mmcblk0 bs=512 skip=1 seek=1

and, finally, meson-sm1-x96-max-plus.dtb from flippy's post at:

https://forum.armbian.com/topic/15376-methods-to-fix-x96-max-pluss905x3-gigabit-ethernet-problem.

Now the device performs a kexec reboot, but only when kexec is run with the -s flag:

kexec -l /boot/vmlinuz-5.15.60-mxdbg2 --initrd=/boot/initrd.img-5.15.60-mxdbg2 --dtb=/boot/dtb/amlogic/meson-sm1-x96-max-plus.dtb --reuse-cmdline -s

arch_process_options:178: command_line: root=LABEL=ROOTFS rootflags=data=writeback rw rootfstype=ext4 console=ttyAML0,115200n8 console=tty0 no_console_suspend consoleblank=0 fsck.fix=yes fsck.repair=yes net.ifnames=0 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1 maxcpus=1 mac=9a:29:43:2d:ba:16
arch_process_options:180: initrd: /boot/initrd.img-5.15.60-mxdbg2
arch_process_options:182: dtb: /boot/dtb/amlogic/meson-sm1-x96-max-plus.dtb
arch_process_options:185: console: (null)

*** Ethernet part ***

The first issue is that Ethernet doesn't work after a kexec reboot, although Wi-Fi works fine. Ethernet on a normal boot:

root@armbian:~# dmesg | grep ethernet
[    7.905241] meson8b-dwmac ff3f0000.ethernet: IRQ eth_wake_irq not found
[    7.906442] meson8b-dwmac ff3f0000.ethernet: IRQ eth_lpi not found
[    7.912687] meson8b-dwmac ff3f0000.ethernet: PTP uses main clock
[    7.920641] meson8b-dwmac ff3f0000.ethernet: User ID: 0x11, Synopsys ID: 0x37
[    7.925539] meson8b-dwmac ff3f0000.ethernet:         DWMAC1000
[    7.931505] meson8b-dwmac ff3f0000.ethernet: DMA HW capability register supported
[    7.938112] meson8b-dwmac ff3f0000.ethernet: RX Checksum Offload Engine supported
[    7.946270] meson8b-dwmac ff3f0000.ethernet: COE Type 2
[    7.950694] meson8b-dwmac ff3f0000.ethernet: TX Checksum insertion supported
[    7.957686] meson8b-dwmac ff3f0000.ethernet: Wake-Up On Lan supported
[    7.964124] meson8b-dwmac ff3f0000.ethernet: Normal descriptors
[    7.969932] meson8b-dwmac ff3f0000.ethernet: Ring mode enabled
[    7.975706] meson8b-dwmac ff3f0000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[   23.555361] meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=40)
[   23.570472] meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[   23.595648] meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
[   23.602691] meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
[   23.619891] meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
[   25.290397] meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off

After a kexec reboot:

root@armbian:~# dmesg | grep ethernet
[    5.826678] meson8b-dwmac ff3f0000.ethernet: IRQ eth_wake_irq not found
[    5.827894] meson8b-dwmac ff3f0000.ethernet: IRQ eth_lpi not found
[    5.834149] meson8b-dwmac ff3f0000.ethernet: PTP uses main clock
[    5.841912] meson8b-dwmac ff3f0000.ethernet: User ID: 0x11, Synopsys ID: 0x37
[    5.847004] meson8b-dwmac ff3f0000.ethernet:         DWMAC1000
[    5.852981] meson8b-dwmac ff3f0000.ethernet: DMA HW capability register supported
[    5.859557] meson8b-dwmac ff3f0000.ethernet: RX Checksum Offload Engine supported
[    5.867705] meson8b-dwmac ff3f0000.ethernet: COE Type 2
[    5.872141] meson8b-dwmac ff3f0000.ethernet: TX Checksum insertion supported
[    5.879138] meson8b-dwmac ff3f0000.ethernet: Wake-Up On Lan supported
[    5.885575] meson8b-dwmac ff3f0000.ethernet: Normal descriptors
[    5.891376] meson8b-dwmac ff3f0000.ethernet: Ring mode enabled
[    5.897156] meson8b-dwmac ff3f0000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    5.905088] meson8b-dwmac ff3f0000.ethernet: device MAC address 9e:61:9f:bd:1e:aa
[   20.959437] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   20.967352] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   32.808649] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   32.819560] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   33.142282] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   33.163931] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   33.281170] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   33.287109] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   33.331279] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   33.334375] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   33.417977] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   33.427303] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
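
For anyone comparing the two boots: error -19 is -ENODEV, and the line that matters is whether stmmac reports a PHY driver binding or "Cannot attach to PHY". Below is a small sketch that classifies a saved kernel log this way. `phy_status` is a made-up helper name, and the patterns are taken only from the messages quoted above, so other kernel versions may word them differently.

```shell
#!/bin/sh
# phy_status: hypothetical helper, not part of any existing tool.
# Reads kernel log text on stdin and prints "attached", "failed" or "unknown".
phy_status() {
    awk '
        /Cannot attach to PHY/  { failed = 1 }     # stmmac_open could not find the PHY
        /PHY \[.*\] driver \[/  { attached = 1 }   # a PHY driver was bound
        END {
            if (attached)    print "attached"
            else if (failed) print "failed"
            else             print "unknown"
        }'
}

# Sample input copied from the kexec boot log above.
phy_status <<'EOF'
[   20.959437] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   20.967352] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
EOF
```

On the device itself the same function can be fed from the live log with `dmesg | phy_status`.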

The second issue is that the second kernel for kdump doesn't start; the console just stops after these lines:

armbian login:
[ 234.915496] kvm: exiting hardware virtualization
[ 234.952516] pwrseq_simple sdio-pwrseq: Turning off mmc
[ 234.955323] kexec_core: Starting new kernel
[ 234.971860] Bye!
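
Before triggering a test panic, it may help to confirm that memory was actually reserved and that a crash kernel is loaded. Here is a sketch, assuming the standard sysfs files /sys/kernel/kexec_crash_size (bytes reserved) and /sys/kernel/kexec_crash_loaded. `kdump_ready` is a hypothetical helper, and its optional directory argument exists only so it can be tried against fake files. Note that a passing check does not prove the reservation is large enough for the capture kernel to boot.

```shell
#!/bin/sh
# kdump_ready [DIR]: hypothetical helper, not part of kexec-tools.
# DIR defaults to /sys/kernel; pass another directory to test with fake files.
kdump_ready() {
    dir=${1:-/sys/kernel}
    size=$(cat "$dir/kexec_crash_size" 2>/dev/null || echo 0)
    loaded=$(cat "$dir/kexec_crash_loaded" 2>/dev/null || echo 0)
    size=${size:-0}
    # Report the reservation in MiB and whether a crash kernel is loaded.
    echo "reserved=$(( size / 1048576 ))MiB loaded=$loaded"
    # Exit status 0 only if memory is reserved and a crash kernel is loaded.
    [ "$size" -gt 0 ] && [ "$loaded" = 1 ]
}

# On the device: kdump_ready && echo "crash kernel is armed"
```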

*** KASLR part ***

I suspect this is related to the following line in the kexec -d (debug) output:

setup_2nd_dtb: no kaslr-seed found

There is no /chosen/kaslr-seed:

# dtc -I dtb -O dts /sys/firmware/fdt | grep -A 10 -i chosen
     chosen {
    ... no 'kaslr-seed' node
}

Is there any chance of rebuilding U-Boot with a patch like this one?

From fe3dde3e7b0c01d081140fcb28e317a688440fbb Mon Sep 17 00:00:00 2001
From: Chris Morgan <macromorgan@hotmail.com>
Date: Wed, 25 Aug 2021 11:22:57 -0500
Subject: [PATCH] cmd: kaslrseed: add command to generate value from hwrng

Allow the kaslr-seed value in the chosen node to be set from a hardware
rng source.

Or should I just stop tormenting this TV box and buy something else? After all, the RPi 4 works well.

Thanks in advance.

@ophub
Owner

ophub commented Aug 20, 2022

https://github.com/unifreq/amlogic-boot-fip
https://github.com/unifreq/u-boot

flippy showed me how to build u-boot, and I took notes, but I have not tried it myself and have not really learned how to do it. You can ask him in the group, or go to his repository and talk to him about how to debug a new device.

@mixmid
Author

mixmid commented Aug 20, 2022

I'd prefer to set the kdump part aside for now and look at the broken eth0 after a kexec reboot:

[   20.959437] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   20.967352] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   32.808649] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   32.819560] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   33.142282] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   33.163931] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)

Does anybody know where to dig further?

@mixmid
Author

mixmid commented Aug 21, 2022

I have built a new U-Boot; these are the resulting files:

-rw-r--r-- 1 1000 1000 1588080 Aug 21 16:31 u-boot.bin
-rw-r--r-- 1 1000 1000 1588592 Aug 21 16:31 u-boot.bin.sd.bin
-rw-r--r-- 1 1000 1000   65536 Aug 21 16:31 u-boot.bin.usb.bl2
-rw-r--r-- 1 1000 1000 1522544 Aug 21 16:31 u-boot.bin.usb.tpl

but I cannot flash u-boot.bin.sd.bin to the microSD card. I tried this:

    $ DEV=/dev/your_sd_device
    $ dd if=u-boot.bin.sd.bin of=$DEV conv=fsync,notrunc bs=512 skip=1 seek=1
    $ dd if=u-boot.bin.sd.bin of=$DEV conv=fsync,notrunc bs=1 count=444

The device still boots the same old version of U-Boot, so I must be doing something wrong.
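
For readers wondering why the write is split into two dd calls: the first copies only the first 444 bytes of sector 0, and the second copies everything from sector 1 onward, so bytes 444..511 of the card (which include the MBR partition table at 446..509 and the 0x55AA signature) are left alone. The sketch below replays the same pattern on scratch files, not on a real device. `flash_bootloader` is a hypothetical wrapper, and the 'D'/'B' filler bytes and image sizes are invented for the demonstration.

```shell
#!/bin/sh
# flash_bootloader IMG DEV: hypothetical wrapper around the two dd calls.
flash_bootloader() {
    # Sector 0: boot code only (bytes 0..443); the partition table stays intact.
    dd if="$1" of="$2" conv=notrunc bs=1 count=444 2>/dev/null
    # Sector 1 onward: the rest of the bootloader image.
    dd if="$1" of="$2" conv=notrunc bs=512 skip=1 seek=1 2>/dev/null
}

work=$(mktemp -d)
# Fake 4-sector "disk" filled with 'D' and fake 3-sector "bootloader" filled with 'B'.
dd if=/dev/zero bs=512 count=4 2>/dev/null | tr '\000' 'D' > "$work/disk.img"
dd if=/dev/zero bs=512 count=3 2>/dev/null | tr '\000' 'B' > "$work/boot.img"

flash_bootloader "$work/boot.img" "$work/disk.img"

# Count the original 'D' bytes that survive inside sector 0 (512 - 444 = 68).
kept=$(( $(dd if="$work/disk.img" bs=512 count=1 2>/dev/null | tr -cd 'D' | wc -c) ))
echo "bytes preserved in sector 0: $kept"
rm -rf "$work"
```

The first post in this thread used count=442 rather than 444; either value stops short of the partition table.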

@mixmid
Author

mixmid commented Aug 22, 2022

I have the UART connected and see only this line at the early boot stage:

U-Boot 2015.01 (Apr 09 2021 - 15:57:35)

And this one later:

U-Boot 2021.07-rc3-00183-gd6e1cdad51-dirty (May 31 2021 - 22:33:28 +0800) x96-max-plus

@mixmid
Author

mixmid commented Aug 22, 2022

The KASLR part is not the issue now. The RPi 4 takes kdumps even when the nokaslr kernel command line argument is set; using nokaslr is even recommended. My next thought was to bring /chosen/kaslr-seed into play.

I experimented with the FDT a little (press SPACE while U-Boot is loading to get a prompt):

=> load mmc 1 0x80000000 vmlinuz-5.15.59-mxdbg
=> load mmc 1 0x13000000 uInitrd
=> load mmc 1 0x88000000 dtb/amlogic/meson-sm1-x96-max-plus.dtb

=> env set bootargs root=LABEL=ROOTFS rootflags=data=writeback rw rootfstype=ext4 console=ttyAML0,115200n8 console=tty0 no_console_suspend consoleblank=0 fsck.fix=yes fsck.repair=yes net.ifnames=0 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1 crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M

=> fdt addr 0x88000000  0x100000
=> fdt set /chosen kaslr-seed <0xfeedbeef 0xc0def00d>

=> booti 0x80000000 0x13000000 0x88000000

This adds a /chosen/kaslr-seed property to the FDT. It helped only partly: the property now exists, but something still resets its value:

$ dtc -I dtb -O dts /sys/firmware/fdt | grep -A 10 -i chosen

chosen {
        kaslr-seed = <0x00 0x00>;

Even so, the Ethernet problem remains.
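
A zeroed kaslr-seed in the running system does not necessarily mean the value was lost on the way. As far as I know, arm64 kernels read the seed early in boot and then overwrite it in the FDT so that it cannot be read back later. The sketch below only distinguishes "property absent" from "property present but zero". `kaslr_seed_state` is a hypothetical helper, and the /proc/device-tree path in the usage comment assumes the usual procfs view of the device tree.

```shell
#!/bin/sh
# kaslr_seed_state FILE: hypothetical helper.
# Prints "missing" if FILE does not exist, "zeroed" if every byte is 0,
# and "set" otherwise.
kaslr_seed_state() {
    [ -f "$1" ] || { echo missing; return 0; }
    # Dump the raw property bytes as one hex string, e.g. 0000000000000000.
    hex=$(od -An -v -tx1 "$1" | tr -d ' \n')
    case $hex in
        *[!0]*) echo set ;;
        *)      echo zeroed ;;
    esac
}

# On the device: kaslr_seed_state /proc/device-tree/chosen/kaslr-seed
```

Seeing "zeroed" after boot is therefore compatible with the bootloader having passed a real seed; "missing" means no property reached the kernel at all.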

@mixmid
Author

mixmid commented Aug 23, 2022

> https://github.com/unifreq/amlogic-boot-fip
> https://github.com/unifreq/u-boot
>
> flippy showed me how to build u-boot, and I took notes, but I have not tried it myself and have not really learned how to do it. You can ask him in the group, or go to his repository and talk to him about how to debug a new device.

His sub-projects have no Issues tabs. What is the group, or which repository of his, where I can ask him?

@ophub
Owner

ophub commented Aug 23, 2022

@mixmid
Author

mixmid commented Sep 7, 2022

I've come up with a solution.

*** Ethernet part ***


[   20.959437] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   20.967352] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   32.808649] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   32.819560] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)
[   33.142282] meson8b-dwmac ff3f0000.ethernet eth0: no phy at addr -1
[   33.163931] meson8b-dwmac ff3f0000.ethernet eth0: stmmac_open: Cannot attach to PHY (error: -19)

This was fixed by setting the following compatible string in the ethernet-phy@0 node:
compatible = "ethernet-phy-id001c.c916", "ethernet-phy-ieee802.3-c22";

&ext_mdio {
	external_phy: ethernet-phy@0 {
		/* 
		 * Realtek RTL8211F (0x001cc916) 
		 * JLSemi JL2101 (0x937c4032)
		 */
		compatible = "ethernet-phy-id001c.c916", "ethernet-phy-ieee802.3-c22";
		reg = <0>;
		max-speed = <1000>;

		reset-assert-us = <30000>;
		reset-deassert-us = <80000>;
		reset-gpios = <&gpio GPIOZ_15 (GPIO_ACTIVE_LOW | GPIO_OPEN_DRAIN)>;

		interrupt-parent = <&gpio_intc>;
		/* MAC_INTR on GPIOZ_14 */
		interrupts = <26 IRQ_TYPE_LEVEL_LOW>;
	};
};
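
The compatible string follows the device tree convention "ethernet-phy-idAAAA.BBBB", where AAAA and BBBB are the upper and lower 16 bits of the 32-bit PHY identifier in hex. With an explicit ID in the device tree, the kernel can pick the PHY driver without first reading the ID over MDIO, which is presumably what fails after kexec here. Below is a minimal sketch of the conversion; `phy_compatible` is a hypothetical helper, and the two IDs are the ones from the comment in the node above.

```shell
#!/bin/sh
# phy_compatible ID: hypothetical helper, not part of any kernel tool.
# Formats a 32-bit PHY ID as an "ethernet-phy-idAAAA.BBBB" compatible string.
phy_compatible() {
    id=$(( $1 ))
    # Upper half comes first, then the lower half, each as 4 hex digits.
    printf 'ethernet-phy-id%04x.%04x\n' $(( (id >> 16) & 0xffff )) $(( id & 0xffff ))
}

phy_compatible 0x001cc916   # Realtek RTL8211F
phy_compatible 0x937c4032   # JLSemi JL2101
```

A board fitted with the JL2101 instead of the RTL8211F would need the second string, so the value has to match the chip actually soldered on the board.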

*** kdump part ***

This part was solved by increasing the reservation to crashkernel=512M on the kernel command line. Here it is in action:

[  347.695632] SMP: stopping secondary CPUs
[  347.699559] Starting crashdump kernel...
[  347.703390] Bye!
[    0.000000] Booting Linux on physical CPU 0x0000000100 [0x411fd050]
[    0.000000] Linux version 5.15.59-mxdbg (root@ed84223eb023) (clang version 14.0.0, LLD 14.0.0) #8 SMP PREEMPT Sun Aug 7 15:40:07 MSK 2022
[    0.000000] random: crng init done
[    0.000000] Machine model: AMedia X96 Max+
[    0.000000] efi: UEFI not found.
[    0.000000] Reserved memory: created CMA memory pool at 0x00000000d0400000, size 256 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] OF: fdt: elfcorehdr is overlapped

And here are the resulting dumps:

root@armbian:/var/crash/202208301753# ls -l
total 45912
-rw------- 1 root root    40843 Aug 30 17:53 dmesg.202208301753
-rw-r--r-- 1 root root 46969389 Aug 30 17:53 dump.202208301753

I hope my research helps somebody. Please close this issue in a few days.

@ophub
Owner

ophub commented Sep 7, 2022

https://github.com/unifreq/linux-5.15.y

Hello, if it is convenient, please submit your changes to the upstream kernel source repository.

@mixmid
Author

mixmid commented Sep 7, 2022

Both fixes are very specific to my config. My main aim is for this thread to be indexed by Google so that it helps people arriving from search engines.

@mixmid
Author

mixmid commented Sep 7, 2022

The most I can do is attach my device tree for the X96 Max+ model:
meson-sm1-x96-max-plus.zip

@ophub
Owner

ophub commented Sep 7, 2022

ok, thanks for sharing
