Virtually mapped stacks #57

mpe · 2016-06-21T05:17:41Z

Looks like virtually mapped stacks are coming soon for x86:

https://lkml.org/lkml/2016/6/15/1064

There are some generic changes, but the bulk is arch-specific. We will need to enable it for power.

mikey · 2016-06-23T05:12:12Z

FWIW, nice LWN article here:
https://lwn.net/SubscriberLink/692208/2e5521b846f9a3ea/

bsingharora · 2017-02-20T03:50:56Z

Spent some time looking at what we might need to do.

The top things are:

Move allocations to vmalloc()
Guard page/fault handling. I think we should never hit a double fault, unless we get a real overflow with vmap'd stacks.
Deal with real mode (discussed with @mpe). Basically, either use the current stack pointer saved in the paca (real address) or use a small real-mode stack.
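
To make the first two items concrete, here is a rough user-space analogy only, not kernel code: `mmap()` stands in for `vmalloc()` and a `PROT_NONE` page below the stack stands in for the guard page. The `guarded_stack` struct and all function names are made up for this sketch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for this sketch; not a kernel structure. */
struct guarded_stack {
    void  *mapping;    /* start of the whole mapping (the guard page) */
    void  *base;       /* lowest usable stack address */
    size_t size;       /* usable bytes, rounded up to whole pages */
    size_t guard_size; /* bytes in the inaccessible guard region */
};

/* Map a stack with one inaccessible page below it. Returns 0 or -1. */
static int guarded_stack_alloc(struct guarded_stack *gs, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(gs != NULL);
    if (page <= 0 || size == 0)
        return -1;

    size_t pg = (size_t)page;
    size_t usable = (size + pg - 1) / pg * pg;

    void *map = mmap(NULL, usable + pg, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    /* Stacks grow down, so the guard sits at the lowest address. */
    if (mprotect(map, pg, PROT_NONE) != 0) {
        munmap(map, usable + pg);
        return -1;
    }

    gs->mapping = map;
    gs->base = (char *)map + pg;
    gs->size = usable;
    gs->guard_size = pg;
    return 0;
}

/* A fault handler could use a check like this to tell overflow apart. */
static int addr_in_guard(const struct guarded_stack *gs, const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t lo = (uintptr_t)gs->mapping;

    return a >= lo && a - lo < gs->guard_size;
}

static void guarded_stack_free(struct guarded_stack *gs)
{
    munmap(gs->mapping, gs->size + gs->guard_size);
}
```

Any access that runs off the bottom of the stack lands in the guard and faults instead of silently corrupting whatever sits below.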

The rest should be common - fork/exit bits and accounting.

The value of this feature will go down once we get FSTACK_PROTECTOR functional again. But in the meantime we could move thread_info out of the stack and do some other things.

A disadvantage of this feature on power would be increased memory utilization with 64K being used instead of 16K per stack.
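
The arithmetic behind that, as a small sketch. It assumes the footprint is simply the stack size rounded up to a whole number of pages (page-granular mapping) and that the system uses 64K pages; `vmap_stack_footprint()` is a made-up helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a stack size up to a whole number of pages.
 * page_size must be a non-zero power of two.
 */
static size_t vmap_stack_footprint(size_t stack_size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (stack_size + page_size - 1) & ~(page_size - 1);
}
```

Under those assumptions, `vmap_stack_footprint(16 * 1024, 64 * 1024)` gives 65536, i.e. four times the 16K the stack actually needs, while with 4K pages the same call gives 16384.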

chleroy · 2018-11-18T08:11:35Z

First step is the implementation of THREAD_INFO_IN_TASK, see #187.
Once the above is merged, it opens the door to implementing VMAP_STACK.

chleroy · 2018-11-18T08:12:11Z

The challenge is to access VM data from exception handler entry.

Today, the exception prolog determines the stack's physical address, then saves registers like SRR0, SRR1, DAR, ... onto the stack before switching back to MMU ON.

To use a VM stack, we need to re-activate data translation earlier. This implies being able to service DTLB misses without jeopardising the in-progress exception prolog, which means:

Using different scratch registers in the DTLB miss handler than the ones used by the exception prolog.
Saving SRR0, SRR1 and DAR outside the stack before re-activating data translation.

chleroy · 2018-11-18T08:13:06Z

On the 8xx, the HW assistance series has allowed us to reduce the number of scratch registers used in the DTLB miss handler to two. As the exception prolog uses SPRG0 and SPRG1, we can use M_TW and DAR for DTLB miss.

We should be able to store SRR0, SRR1 and DAR in the thread_struct before re-activating data translation, then copy them to the stack afterwards.

chleroy · 2018-12-06T20:55:18Z

The challenge is to identify and fix all places where the stack is used in an unsafe way.
For that, the implementation of #147 would help.

One example: the max7301 GPIO driver performs SPI transfers with spi_write() and spi_read() using a buffer on the stack. This works today because the stack is DMA-safe, but fails with virtually mapped stacks.
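
For illustration, the kind of check that could flag such a buffer is just a range test against the current stack. This is a stand-alone sketch with made-up names (`buf_overlaps_stack()` is not an existing kernel helper, and nothing here is taken from #147); it ignores address wrap-around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if [buf, buf + len) overlaps [stack_base, stack_base + stack_size).
 * A DMA mapping path could warn when this is true for the current task's stack.
 */
static bool buf_overlaps_stack(const void *buf, size_t len,
                               const void *stack_base, size_t stack_size)
{
    uintptr_t b = (uintptr_t)buf;
    uintptr_t s = (uintptr_t)stack_base;

    assert(len > 0 && stack_size > 0);
    return b < s + stack_size && s < b + len;
}
```

A driver hit by such a warning would typically move the buffer off the stack, for example into its private data structure or a heap allocation.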

chleroy · 2019-09-03T06:37:52Z

First attempt at https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=128398

For the time being, it hangs in the lkdtm EXHAUST_STACK test.

The tricky thing is that r1 is updated with stwu. But if the store fails, r1 is not updated at the time the fault exception is taken, so the handler cannot know that r1 is out of bounds.

chleroy · 2019-09-12T05:30:39Z

Working series at https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=129901

Covers 8xx and book3s/32

chleroy · 2020-10-23T15:11:54Z

Merged in v5.6 for 8xx (linuxppc/linux@99b2291) and book3s/32 (linuxppc/linux@cd08f10)

mpe transferred this issue from linuxppc/linux Jan 7, 2019

mpe added the enhancement (Addition/modification of a feature, not a bug per se) and hard (Probably hard) labels Jan 7, 2019

ajdlinux assigned ajdlinux and chleroy Oct 18, 2019

ajdlinux mentioned this issue Nov 20, 2019

Move kernel stack to vmap area (CONFIG_HAVE_ARCH_VMAP_STACK) KSPP/linux#2

Open
