
ath9k / arm64: starting / scanning with ath9k crashes kernel regularly #11275

Closed
PolynomialDivision opened this issue Nov 17, 2022 · 1 comment


PolynomialDivision commented Nov 17, 2022

I have a Banana Pi R64 with a MikroTik R11e-5HnD Mini-PCIe (AR9580) card. On both OpenWrt 22.03 and snapshot, this device crashes completely. Sometimes it crashes as soon as I enable the card with ip link set dev wlan1 up. Other times that succeeds, but a subsequent scan with iw dev wlan1 scan crashes it.

The log looks like this:

root@OpenWrt:/# ip l set wlan1 up
[   76.893465] rcu: INFO: rcu_sched self-detected stall on CPU
[   76.899050] rcu:     0-....: (6003 ticks this GP) idle=db7/1/0x4000000000000002 softirq=2316/2318 fqs=2835 
[   76.908526]  (t=6000 jiffies g=477 q=983)
[   76.912528] Task dump for CPU 0:
[   76.915746] task:ip              state:R  running task     stack:    0 pid: 2372 ppid:   673 flags:0x00000002
[   76.925657] Call trace:
[   76.928093]  dump_backtrace+0x0/0x15c
[   76.931756]  show_stack+0x14/0x30
[   76.935064]  sched_show_task+0x138/0x164
[   76.938981]  dump_cpu_task+0x40/0x50
[   76.942553]  rcu_dump_cpu_stacks+0xe4/0x128
[   76.946731]  rcu_sched_clock_irq+0x678/0x810
[   76.950996]  update_process_times+0x98/0x140
[   76.955260]  tick_sched_timer+0x54/0xcc
[   76.959091]  __hrtimer_run_queues+0x100/0x210
[   76.963441]  hrtimer_interrupt+0xe4/0x280
[   76.967444]  arch_timer_handler_phys+0x30/0x40
[   76.971881]  handle_percpu_devid_irq+0x80/0x130
[   76.976407]  handle_domain_irq+0x5c/0x8c
[   76.980325]  gic_handle_irq+0x64/0x8c
[   76.983984]  call_on_irq_stack+0x28/0x44
[   76.987900]  do_interrupt_handler+0x4c/0x54
[   76.992075]  el1_interrupt+0x2c/0x4c
[   76.995645]  el1h_64_irq_handler+0x14/0x20
[   76.999736]  el1h_64_irq+0x74/0x78
[   77.003129]  ath_start_rfkill_poll+0x7ec/0x850 [ath9k]
[   77.008267]  ath9k_hw_stopdmarecv+0x264/0x410 [ath9k_hw]
[   77.013585]  ath9k_hw_enable_interrupts+0x48/0x50 [ath9k_hw]
[   77.019243]  ath9k_calculate_summary_state+0x36c/0x730 [ath9k]
[   77.025070]  ath9k_calculate_summary_state+0x5dc/0x730 [ath9k]
[   77.030898]  ath_reset+0x50/0x70 [ath9k]
[   77.034816]  ath_chanctx_set_channel+0x1e4/0x834 [ath9k]
[   77.040124]  ath9k_set_txpower+0x318/0x670 [ath9k]
[   77.044910]  ieee80211_hw_config+0x3c/0x2f0 [mac80211]
[   77.050084]  ieee80211_do_open+0x610/0x810 [mac80211]
[   77.055148]  ieee80211_do_open+0x7cc/0x810 [mac80211]
[   77.060214]  __dev_open+0xbc/0x170
[   77.063612]  __dev_change_flags+0x134/0x190
[   77.067788]  dev_change_flags+0x20/0x60
[   77.071618]  devinet_ioctl+0x534/0x6d0
[   77.075362]  inet_ioctl+0x1f4/0x20c
[   77.078845]  sock_do_ioctl+0x44/0xf0
[   77.082417]  sock_ioctl+0x254/0x390
[   77.085899]  __arm64_sys_ioctl+0xac/0xd0
[   77.089816]  invoke_syscall+0x44/0x110
[   77.093558]  el0_svc_common.constprop.0+0x3c/0xe4
[   77.098255]  do_el0_svc+0x1c/0x2c
[   77.101562]  el0_svc+0x14/0x50
[   77.104610]  el0t_64_sync_handler+0x9c/0x120
[   77.108874]  el0t_64_sync+0x158/0x15c
root@OpenWrt:/# iw dev wl1-ap0 scan
[  108.962962] rcu: INFO: rcu_sched self-detected stall on CPU
[  108.968565] rcu:     0-....: (6011 ticks this GP) idle=a8d/1/0x4000000000000002 softirq=2510/2512 fqs=3000 
[  108.978063]  (t=6002 jiffies g=561 q=41)
[  108.981992] Task dump for CPU 0:
[  108.985218] task:kworker/u4:0    state:R  running task     stack:    0 pid:    9 ppid:     2 flags:0x0000000a
[  108.995152] Workqueue: phy1 ieee80211_scan_work [mac80211]
[  109.000756] Call trace:
[  109.003199]  dump_backtrace+0x0/0x15c
[  109.006878]  show_stack+0x14/0x30
[  109.010202]  sched_show_task+0x138/0x164
[  109.014132]  dump_cpu_task+0x40/0x50
[  109.017721]  rcu_dump_cpu_stacks+0xe4/0x128
[  109.021917]  rcu_sched_clock_irq+0x678/0x810
[  109.026197]  update_process_times+0x98/0x140
[  109.030475]  tick_sched_timer+0x54/0xcc
[  109.034323]  __hrtimer_run_queues+0x100/0x210
[  109.038687]  hrtimer_interrupt+0xe4/0x280
[  109.042705]  arch_timer_handler_phys+0x30/0x40
[  109.047157]  handle_percpu_devid_irq+0x80/0x130
[  109.051697]  handle_domain_irq+0x5c/0x8c
[  109.055633]  gic_handle_irq+0x64/0x8c
[  109.059309]  call_on_irq_stack+0x28/0x44
[  109.063239]  do_interrupt_handler+0x4c/0x54
[  109.067429]  el1_interrupt+0x2c/0x4c
[  109.071016]  el1h_64_irq_handler+0x14/0x20
[  109.075121]  el1h_64_irq+0x74/0x78
[  109.078529]  ath_start_rfkill_poll+0x7ec/0x850 [ath9k]
[  109.083687]  ath9k_hw_stopdmarecv+0x264/0x410 [ath9k_hw]
[  109.089027]  ath9k_hw_enable_interrupts+0x48/0x50 [ath9k_hw]
[  109.094710]  ath9k_calculate_summary_state+0x36c/0x730 [ath9k]
[  109.100561]  ath9k_calculate_summary_state+0x5dc/0x730 [ath9k]
[  109.106412]  ath_reset+0x50/0x70 [ath9k]
[  109.110352]  ath_chanctx_set_channel+0x1e4/0x804 [ath9k]
[  109.115682]  ath9k_set_txpower+0x318/0x670 [ath9k]
[  109.120489]  ieee80211_hw_config+0x3c/0x2f0 [mac80211]
[  109.125701]  ieee80211_scan_work+0x394/0x5a0 [mac80211]
[  109.130999]  process_one_work+0x200/0x3b4
[  109.135019]  worker_thread+0x17c/0x4dc
[  109.138776]  kthread+0x11c/0x130
[  109.142012]  ret_from_fork+0x10/0x20

There is already a full email thread about this:
https://lists.archive.carbon60.com/linux/kernel/2584449
In the last post, someone even identifies the line where the stall happens:

After further debugging, we know where it hangs.

In function:
static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
{
    disable_irq(sc->irq);
    tasklet_disable(&sc->intr_tq);
    tasklet_disable(&sc->bcon_tasklet);
    spin_lock_bh(&sc->sc_pcu_lock);
    ....
    ....
    ....
    if (!ath_complete_reset(sc, true)) /* this function enables hardware interrupts */
        r = -EIO;

out:
    enable_irq(sc->irq); /* here the IRQ line state is changed to the enabled state */
    spin_unlock_bh(&sc->sc_pcu_lock);
    tasklet_enable(&sc->bcon_tasklet);
    tasklet_enable(&sc->intr_tq);

}

static bool ath_complete_reset(struct ath_softc *sc, bool start)
{
    struct ath_hw *ah = sc->sc_ah;
    struct ath_common *common = ath9k_hw_common(ah);
    unsigned long flags;

    ath9k_calculate_summary_state(sc, sc->cur_chan);
    ath_startrecv(sc);
    ....
    ....

    sc->gtt_cnt = 0;

    ath9k_hw_set_interrupts(ah);    /* here hardware interrupts are being enabled */
    ath9k_hw_enable_interrupts(ah); /* we see the hang after this line */
    ieee80211_wake_queues(sc->hw);
    ath9k_p2p_ps_timer(sc);

    return true;
}

Hardware interrupts are enabled before the IRQ line is changed to the enabled state.
Won't this cause a race condition? If the hardware raises an interrupt within this window, while the IRQ line is still in the disabled state,
the following condition is reached and the EP handler is not invoked.

void handle_simple_irq(struct irq_desc *desc)
{
    raw_spin_lock(&desc->lock);
    ...
    if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) { /* this condition is reached and evaluates to true */
        desc->istate |= IRQS_PENDING;
        goto out_unlock;
    }

    kstat_incr_irqs_this_cpu(desc);
    handle_irq_event(desc);

out_unlock:
    raw_spin_unlock(&desc->lock);
}

We see the hang at that statement, without execution ever reaching enable_irq; it looks like the CPU is already stalled by then.

Can anyone tell why hardware interrupts are enabled before the kernel changes the IRQ line state?

However, no one responded to this mail.
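
To make the ordering question from the quoted mail concrete, here is a toy user-space model of it. This is only a sketch and not driver or kernel code: struct toy_irq_desc, toy_handle_irq and toy_interrupt_during_reset are hypothetical names, and the model assumes the card raises an interrupt immediately after its hardware interrupts are enabled. It only mirrors the single branch quoted from handle_simple_irq() and does not model what enable_irq() later does with a pending interrupt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct irq_desc: only the state the quoted check uses. */
struct toy_irq_desc {
    bool disabled; /* models irqd_irq_disabled(&desc->irq_data) */
    bool pending;  /* models IRQS_PENDING in desc->istate */
    int handled;   /* how many times the driver's handler was invoked */
};

/* Mirrors the branch quoted from handle_simple_irq(). */
static void toy_handle_irq(struct toy_irq_desc *d)
{
    if (d->disabled) {
        /* Line is disabled: only mark the interrupt pending, do not call the handler. */
        d->pending = true;
        return;
    }
    d->handled++;
}

/*
 * Models one interrupt arriving during the reset path.
 * line_enabled_first == false: order from the quoted code
 *   (hardware interrupts enabled while the IRQ line is still disabled).
 * line_enabled_first == true: hypothetical reordering, shown only for comparison.
 * Returns a snapshot of the descriptor right after the interrupt was raised.
 */
static struct toy_irq_desc toy_interrupt_during_reset(bool line_enabled_first)
{
    struct toy_irq_desc d = { .disabled = true, .pending = false, .handled = 0 };

    if (line_enabled_first)
        d.disabled = false; /* stands in for enable_irq(sc->irq) */

    /* Assumption: the device fires as soon as its interrupts are enabled. */
    toy_handle_irq(&d);

    return d;
}
```

With line_enabled_first set to false, the snapshot has handled == 0 and pending == true, which is the situation the mail describes. Whether that actually explains the stall on this board is not something the model can show.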

It may also be a limitation of the Banana Pi R64, which might not be able to deliver enough power:
https://forum.banana-pi.org/t/bpi-r64-possible-hardware-improvements/12711/6?u=schnickidischnack

With a Compex WLE200NX (AR9280), it works.

PolynomialDivision changed the title from "ath9k / arm64: starting ath9k crashes kernel sometimes" to "ath9k / arm64: starting ath9k crashes kernel regularly" on Nov 17, 2022
PolynomialDivision changed the title from "ath9k / arm64: starting ath9k crashes kernel regularly" to "ath9k / arm64: starting / scanning with ath9k crashes kernel regularly" on Nov 17, 2022
@nledovskikh

Hi, has the problem been resolved or not?
