Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ar71xx: ag71xx minor fixes that help debugging #1217

Closed
wants to merge 3 commits into from
Closed

ar71xx: ag71xx minor fixes that help debugging #1217

wants to merge 3 commits into from

Conversation

danielfdickinson
Copy link

This is a trio of minor fixes/tweaks for ar71xx version of ag71xx that help debugging when working with the driver and/or modifying board definitions for ag71xx boards.

Daniel F. Dickinson added 3 commits July 29, 2018 05:56
NB: Error only appears with ag71xx debug messages and dynamic printk
enabled.  This is probably why no one has caught it before.

Previously phy probe debug messages used old (now wrong) functions
to get the phy name for printing.  There was also the chance of
a NULL pointer in the event no phy_device was found.

Signed-off-by: Daniel F. Dickinson <cshored@thecshore.com>
It's a little noisier but makes it obvious when the ar7240 switch was
connected to the MDIO bus, and to which phy device (or the failure
to do so).

Signed-off-by: Daniel F. Dickinson <cshored@thecshore.com>
The driver is written in such a way that with a board defintion that
connects a fixed phy, mdio, and switch in a certain way, a kernel oops could
result because of lack of previously probed mdio bus.

This commit allows for easier debugging in this case by casting the
correct blame with serial console messages.

Signed-off-by: Daniel F. Dickinson <cshored@thecshore.com>
@@ -97,6 +97,17 @@ static int ag71xx_phy_connect_fixed(struct ag71xx *ag)
ag->duplex = pdata->duplex;
ag->speed = pdata->speed;

if (!ret) {
dev_info(dev, "connected to fixed PHY at %s [uid=%08x, driver=%s]\n",
#if LINUX_VERSION_CODE < KERNEL_VERSION(4,5,0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the kernel rev #if. we use >=4.9 rest of the series is fine

@blogic
Copy link
Contributor

blogic commented Jul 30, 2018

manually fixed the comment and merged the series, Thanks !

@blogic blogic closed this Jul 30, 2018
@hnyman
Copy link
Contributor

hnyman commented Aug 1, 2018

@blogic @cshoredaniel

I propose that the commits from this PR get reverted, until they are fixed.
(Using the merged commit hashes below)

  • commits 0b9f4e8 and 7a3e133 break all connectivity to some ar71xx devices, at least WNDR3800 (and 3700v1 and 3700v2)
  • I already pointed out separately earlier that also the third commit 2655fbe has faulty syntax.

I have problems connecting to my ar71xx WNDR3800 (ar7161 based) after compiling the current master r7713-a2a225517d. The reason seems to be the switch changes made by this PR.

Instead of having both eth0 and eth1, kernel log has an error message "Missing PHY for ag71xx.0" from this PR, and the normal eth1 is assigned to be eth0. No eth1 is defined. That pretty much breaks all connectivity to the router.

kernel log with today's build r7713-a2a225517d

[    3.140448] rtl8366s rtl8366s: using GPIO pins 5 (SDA) and 7 (SCK)
[    3.146773] rtl8366s rtl8366s: RTL8366 ver. 1 chip found
[    3.210298] libphy: rtl8366s: probed
[    3.214482] libphy: Fixed MDIO Bus: probed
[    3.551902] Missing PHY for ag71xx.0
[    3.892725] ag71xx ag71xx.1: connected to PHY at rtl8366s:04 [uid=001cc960, driver=Generic PHY]
[    3.902051] eth0: Atheros AG71xx at 0xba000000, irq 5, mode:RGMII
[    3.909536] NET: Registered protocol family 10

kernel log with 29 July 2018 build r7581-7880a6f7fe:

[    3.140732] rtl8366s rtl8366s: using GPIO pins 5 (SDA) and 7 (SCK)
[    3.146995] rtl8366s rtl8366s: RTL8366 ver. 1 chip found
[    3.209997] libphy: rtl8366s: probed
[    3.214134] libphy: Fixed MDIO Bus: probed
[    3.552250] eth0: Atheros AG71xx at 0xb9000000, irq 4, mode:RGMII
[    3.892480] ag71xx ag71xx.1: connected to PHY at rtl8366s:04 [uid=001cc960, driver=Generic PHY]
[    3.901801] eth1: Atheros AG71xx at 0xba000000, irq 5, mode:RGMII
[    3.909248] NET: Registered protocol family 10

Build r7581-7880a6f7fe from 29 July 2018 worked ok, and that is right before this PR got merged.

Reverting both of these commits fixes things:
0b9f4e8 ar71xx: ag71xx: Add connect message: fixed phy
7a3e133 ar71xx: ag71xx: Prevent kernel oops for board def

Log after reverting matches again the expectations and the router works:

[    3.140193] rtl8366s rtl8366s: using GPIO pins 5 (SDA) and 7 (SCK)
[    3.146516] rtl8366s rtl8366s: RTL8366 ver. 1 chip found
[    3.209889] libphy: rtl8366s: probed
[    3.214063] libphy: Fixed MDIO Bus: probed
[    3.552167] eth0: Atheros AG71xx at 0xb9000000, irq 4, mode:RGMII
[    3.892389] ag71xx ag71xx.1: connected to PHY at rtl8366s:04 [uid=001cc960, driver=Generic PHY]
[    3.901706] eth1: Atheros AG71xx at 0xba000000, irq 5, mode:RGMII
[    3.909137] NET: Registered protocol family 10

If only the "Add connect message: fixed phy" commit is reverted, then a kernel OOPS follows. Log below.

I think that the "Add connect message: fixed phy" tries to give a too informative error message and some information is not found with all devices. That leads to OOPS. The second commit "Prevent kernel oops for board def" tries to mitigate that, but at the same time stops the execution of the function too early.

Log of the OOPS

[    3.134539] Realtek RTL8366S ethernet switch driver version 0.2.2
[    3.140658] rtl8366s rtl8366s: using GPIO pins 5 (SDA) and 7 (SCK)
[    3.146983] rtl8366s rtl8366s: RTL8366 ver. 1 chip found
[    3.210307] libphy: rtl8366s: probed
[    3.214492] libphy: Fixed MDIO Bus: probed
[    3.551943] CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 80286f30, ra == 80285894
[    3.562549] Oops[#1]:
[    3.564824] CPU: 0 PID: 1 Comm: swapper Not tainted 4.9.116 #0
[    3.570638] task: 87c2c000 task.stack: 87c30000
[    3.575155] $ 0   : 00000000 00000000 00000000 000003e8
[    3.580401] $ 4   : 80484cd8 00000000 80450000 0000003f
[    3.585648] $ 8   : 87c31fe0 0000d400 00000000 00000004
[    3.590896] $12   : 0018c25d 19a2c84c 00000001 00000000
[    3.596143] $16   : 87c7cc60 80484dc0 80484dc0 00000200
[    3.601391] $20   : 80484cd8 80490000 87c7cc60 87c7cd2c
[    3.606637] $24   : 8047e834 523b05d3
[    3.611885] $28   : 87c30000 87c31cd0 804f0000 80285894
[    3.617132] Hi    : 00000064
[    3.619999] Lo    : ebf00000
[    3.622882] epc   : 80286f30 ag71xx_phy_connect_fixed+0x80/0xc4
[    3.628783] ra    : 80285894 ag71xx_probe+0x3e4/0x588
[    3.633818] Status: 1000d403 KERNEL EXL IE
[    3.638019] Cause : 00800008 (ExcCode 02)
[    3.642018] BadVA : 0000002c
[    3.644885] PrId  : 00019374 (MIPS 24Kc)
[    3.648789] Modules linked in:
[    3.651842] Process swapper (pid: 1, threadinfo=87c30000, task=87c2c000, tls=00000000)
[    3.659732] Stack : 80484cc8 80484dc0 87c7c800 80484cc8 80484dc0 00000200 80484cd8 87c7c800
[    3.668132]         80484cc8 80285894 19000000 00000001 87d5fb40 00000000 00000000 87c7c800
[    3.676531]         804f0000 80183d10 803c0d04 80484cd8 8049c6e4 80484d0c 80520000 8049c6e4
[    3.684929]         00000000 804a5304 804f0000 80249e8c 87d5e900 87d5e900 87c0e4b0 80183a58
[    3.693329]         80484cd8 80484cd8 80520000 80248384 00000000 80490000 87c5de30 87c31dc0
[    3.701728]         ...
[    3.704180] Call Trace:
[    3.706617] [<80286f30>] ag71xx_phy_connect_fixed+0x80/0xc4
[    3.712179] [<80285894>] ag71xx_probe+0x3e4/0x588
[    3.716874] [<80249e8c>] platform_drv_probe+0x2c/0x70
[    3.721925] [<80248384>] really_probe+0x110/0x278
[    3.726622] [<80248590>] __driver_attach+0xa4/0xdc
[    3.731401] [<80246404>] bus_for_each_dev+0x5c/0xbc
[    3.736267] [<802478b0>] bus_add_driver+0xf4/0x214
[    3.741049] [<80248c40>] driver_register+0xac/0xfc
[    3.745844] [<804ca938>] ag71xx_module_init+0x2c/0x50
[    3.750883] [<80060a5c>] do_one_initcall+0xdc/0x1a0
[    3.755761] [<804a5d0c>] kernel_init_freeable+0x14c/0x20c
[    3.761161] [<803a8198>] kernel_init+0x10/0x104
[    3.765687] [<800668d8>] ret_from_kernel_thread+0x14/0x1c
[    3.771071] Code: ae0300e4  8e230008  ae0300e0 <8c46002c> 14c00002  00000000  8c460008  8c4300f0  8c4700f4
[    3.780861]
[    3.782373] ---[ end trace 246723ef878a10cb ]---
[    3.788017] Kernel panic - not syncing: Fatal exception
[    3.794091] Rebooting in 1 seconds..

jow- added a commit to lede-project/source that referenced this pull request Aug 1, 2018
This reverts commit 0b9f4e8.

This change reportedly breaks connectivity on some ar71xx devices, so
revert it for now.

Ref: openwrt/openwrt#1217 (comment)
Reported-by: Hannu Nyman <hannu.nyman@iki.fi>
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
jow- added a commit to lede-project/source that referenced this pull request Aug 1, 2018
This reverts commit 7a3e133.

This change reportedly breaks connectivity on some ar71xx devices, so
revert it for now.

Ref: openwrt/openwrt#1217 (comment)
Reported-by: Hannu Nyman <hannu.nyman@iki.fi>
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
@danielfdickinson
Copy link
Author

I believe the issue is devices like @hnyman WNDR that have non ag71xxx switch attached. I don't think it's worth trying to make them work; they're just tweaks that were supposed to make life easier. Sorry.

jollaman999 pushed a commit to jollaman999/openwrt that referenced this pull request Aug 2, 2018
This reverts commit 0b9f4e880807e3cfd22d12b929202e1edcdc577c.

This change reportedly breaks connectivity on some ar71xx devices, so
revert it for now.

Ref: openwrt/openwrt#1217 (comment)
Reported-by: Hannu Nyman <hannu.nyman@iki.fi>
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
jollaman999 pushed a commit to jollaman999/openwrt that referenced this pull request Aug 2, 2018
This reverts commit 7a3e133751870a3431e2430386a77a002eb24117.

This change reportedly breaks connectivity on some ar71xx devices, so
revert it for now.

Ref: openwrt/openwrt#1217 (comment)
Reported-by: Hannu Nyman <hannu.nyman@iki.fi>
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
beroid pushed a commit to beroid/openwrt that referenced this pull request Aug 20, 2018
This reverts commit 0b9f4e880807e3cfd22d12b929202e1edcdc577c.

This change reportedly breaks connectivity on some ar71xx devices, so
revert it for now.

Ref: openwrt/openwrt#1217 (comment)
Reported-by: Hannu Nyman <hannu.nyman@iki.fi>
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
beroid pushed a commit to beroid/openwrt that referenced this pull request Aug 20, 2018
This reverts commit 7a3e133751870a3431e2430386a77a002eb24117.

This change reportedly breaks connectivity on some ar71xx devices, so
revert it for now.

Ref: openwrt/openwrt#1217 (comment)
Reported-by: Hannu Nyman <hannu.nyman@iki.fi>
Signed-off-by: Jo-Philipp Wich <jo@mein.io>
@danielfdickinson danielfdickinson deleted the pr-ag71xx-fixes branch January 2, 2019 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants