Fix/workaround vchan "loopback" connection #951

Open
marmarek opened this Issue Mar 8, 2015 · 5 comments

@marmarek
Member

marmarek commented Mar 8, 2015

Reported by marmarek on 9 Feb 2015 21:15 UTC
Currently the Xen implementation of vchan in R3 crashes the kernel when a connection is made back to the source domain. This is apparently not supported by the xen-gntalloc driver.

The exact message is:

[    9.937990] BUG: Bad page map in process qrexec-agent  pte:80000000f9d41167 pmd:131c3067
[    9.938010] page:ffffea00036a6638 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
[    9.938018] page flags: 0x4000000000000c14(referenced|dirty|reserved|private)
[    9.938033] addr:00007fa856d47000 vm_flags:140400fb anon_vma:          (null) mapping:ffff880011efe940 index:11
[    9.938042] vma->vm_ops->fault:           (null)
[    9.938057] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1c0 [xen_gntalloc]
[    9.938066] CPU: 0 PID: 1108 Comm: qrexec-agent Tainted: G           O 3.12.23-1.pvops.qubes.x86_64 #1
[    9.938074]  ffff8800131f3818 ffff88001316fc78 ffffffff814db550 00007fa856d47000
[    9.938085]  ffff88001316fcb8 ffffffff81139413 ffff880011efe940 ffff8800131c3a38
[    9.938096]  ffffea00036a6638 00007fa856d47000 00007fa856d57000 ffff88001316fe18
[    9.938107] Call Trace:
[    9.938117]  [<ffffffff814db550>] dump_stack+0x45/0x56
[    9.938126]  [<ffffffff81139413>] print_bad_pte+0x1a3/0x240
[    9.938133]  [<ffffffff8113ac9e>] unmap_page_range+0x6ee/0x7d0
[    9.938142]  [<ffffffff8113adf6>] unmap_single_vma+0x76/0xa0
[    9.938149]  [<ffffffff8113be09>] unmap_vmas+0x49/0x90
[    9.938157]  [<ffffffff8114443c>] exit_mmap+0x9c/0x170
[    9.938166]  [<ffffffff8105950c>] mmput+0x5c/0x110
[    9.938175]  [<ffffffff8105d74c>] do_exit+0x27c/0xa20
[    9.938184]  [<ffffffff810908ef>] ? vtime_account_user+0x4f/0x60
[    9.938194]  [<ffffffff81116502>] ? context_tracking_user_exit+0x52/0xc0
[    9.938203]  [<ffffffff8105ed2a>] do_group_exit+0x3a/0xa0
[    9.938211]  [<ffffffff8105ed9f>] SyS_exit_group+0xf/0x10
[    9.938220]  [<ffffffff814ea907>] tracesys+0xdd/0xe2

This needs either a fix in the kernel or a special case in the vchan-xen code (use simple shm instead of Xen shared memory?).
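The special-case idea might look roughly like the sketch below: compare the peer's domain ID with the local one and pick a plain shared-memory transport for the loopback case. All names here (`vchan_transport`, `pick_transport`) are hypothetical and not part of libvchan, and how the local domain ID is obtained is left out.

```c
#include <stdint.h>

/* Hypothetical transport kinds; these are not actual libvchan identifiers. */
enum vchan_transport {
    VCHAN_TRANSPORT_XEN_GRANT, /* pages shared through Xen grant tables */
    VCHAN_TRANSPORT_LOCAL_SHM  /* ordinary shared memory inside one domain */
};

/*
 * Choose a transport for a new connection.
 * A "loopback" connection (peer is the local domain) avoids the
 * grant-based path and falls back to plain shared memory.
 */
enum vchan_transport pick_transport(uint16_t local_domid, uint16_t remote_domid)
{
    if (local_domid == remote_domid)
        return VCHAN_TRANSPORT_LOCAL_SHM;
    return VCHAN_TRANSPORT_XEN_GRANT;
}
```

This only illustrates where the decision would sit; the shm-backed ring itself and its event signalling would still have to be implemented.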

Migrated-From: https://wiki.qubes-os.org/ticket/951

@marmarek marmarek added this to the Release 3 milestone Mar 8, 2015

@marmarek

Member

marmarek commented Apr 3, 2015

Currently vchan refuses to open such a connection, so at least there is no kernel crash. But still, it would be nice to fix this properly.
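A minimal sketch of such a guard, assuming the local and remote domain IDs are both known before any shared pages are set up. The function name and the choice of `ENOTSUP` are assumptions made for illustration; the actual check in vchan-xen may differ.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical guard: refuse a connection whose peer is the local domain.
 * Returns 0 if the connection may proceed, or -1 with errno set to ENOTSUP
 * for a loopback request.
 */
int vchan_check_not_loopback(uint16_t local_domid, uint16_t remote_domid)
{
    if (local_domid == remote_domid) {
        errno = ENOTSUP; /* assumed error code, chosen for this sketch */
        return -1;
    }
    return 0;
}
```

Failing early like this keeps the process from ever mapping the problematic pages, which is why the crash no longer occurs even though the underlying problem remains.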

@marmarek marmarek modified the milestones: Release 3.1, Release 3.0 May 13, 2015

@esote

esote commented Nov 28, 2018

@marmarek Is this still an issue considering both Xen and Qubes have had newer releases since then? You say it might need a fix in the kernel, do you mean the Xen microkernel?

@marmarek

Member

marmarek commented Nov 28, 2018

@esote Yes, this is still an issue, exactly as originally described. I just checked on Xen 4.12-unstable and Linux 4.14.74. One possible fix would be in the Linux kernel, but I'm not sure that's the right thing to do.

BTW Many thanks @esote for reviewing and cleaning up old issues!

@esote

esote commented Nov 28, 2018

Yep, no problem. I usually don't have time to dive into code outside of college and work, so I figured cleaning up issues is the least I could do.

For patching the Linux kernel, how likely would it be to end up in a long term release (4.14 or 4.19) -- or would it be a patch only for Qubes' kernel?

I haven't looked at the vchan code, so I flipped a coin, assigning "kernel" as heads and "vchan" as tails, and it landed on heads. If that helps, because otherwise my input would be essentially a coin flip.

@marmarek

Member

marmarek commented Nov 29, 2018

Exact message from the 4.14.74 kernel:

[1332916.029255] BUG: unable to handle kernel paging request at ffff880850d7b008
[1332916.029290] IP: __tlb_remove_page_size+0x29/0xc0
[1332916.029306] PGD 2a75067 P4D 2a75067 PUD 0 
[1332916.029325] Oops: 0002 [#1] SMP PTI
[1332916.029339] Modules linked in: fuse ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xen_netfront intel_rapl crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr intel_rapl_perf u2mfn(O) xen_gntdev xen_gntalloc xenfs xen_blkback xen_privcmd xen_evtchn xen_blkfront
[1332916.029457] CPU: 0 PID: 23450 Comm: strace Tainted: G           O    4.14.74-1.pvops.qubes.x86_64 #1
[1332916.029484] task: ffff880033f3bc80 task.stack: ffffc90003688000
[1332916.029506] RIP: 0010:__tlb_remove_page_size+0x29/0xc0
[1332916.029523] RSP: 0018:ffffc9000368bca0 EFLAGS: 00010246
[1332916.029540] RAX: ffff880050d7b000 RBX: ffffc9000368bdd0 RCX: 0000000000000000
[1332916.029563] RDX: 00000000ffffffff RSI: ffffea0002ca7c00 RDI: ffffc9000368bdd0
[1332916.029587] RBP: ffffea0002ca7c00 R08: 00000000000247e0 R09: ffff88009a493898
[1332916.029610] R10: 00000000000fa000 R11: 0000000000000001 R12: 000058c11e2a0000
[1332916.029634] R13: 000058c11e2a1000 R14: ffffc9000368bdd0 R15: ffffea0002ca7c00
[1332916.029658] FS:  0000000000000000(0000) GS:ffff8800f9c00000(0000) knlGS:0000000000000000
[1332916.029682] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1332916.029702] CR2: ffff880850d7b008 CR3: 000000000220a006 CR4: 00000000001606f0
[1332916.029728] Call Trace:
[1332916.029742]  unmap_page_range+0x86c/0xc50
[1332916.029757]  unmap_vmas+0x4c/0xa0
[1332916.029772]  exit_mmap+0xb5/0x1c0
[1332916.029787]  mmput+0x5f/0x140
[1332916.029801]  do_exit+0x288/0xbb0
[1332916.029816]  ? __audit_syscall_entry+0xae/0x100
[1332916.029834]  ? syscall_trace_enter+0x1ae/0x2c0
[1332916.029851]  do_group_exit+0x3a/0xa0
[1332916.029865]  SyS_exit_group+0x10/0x10
[1332916.029879]  do_syscall_64+0x74/0x180
[1332916.035058]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[1332916.035077] RIP: 0033:0x7fa9a2c23a26
[1332916.035090] RSP: 002b:00007ffcbaf5dbe8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[1332916.035114] RAX: ffffffffffffffda RBX: 00007fa9a2d16740 RCX: 00007fa9a2c23a26
[1332916.035138] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[1332916.035161] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
[1332916.035185] R10: 00007ffcbaf5da74 R11: 0000000000000246 R12: 00007fa9a2d16740
[1332916.035210] R13: 0000000000000001 R14: 00007fa9a2d1f448 R15: 0000000000000000
[1332916.035234] Code: 00 00 0f 1f 44 00 00 48 83 7f 18 00 74 4a 55 53 39 97 84 00 00 00 75 42 48 8b 47 28 48 89 f5 48 89 fb 8b 50 08 8d 4a 01 89 48 08 <48> 89 74 d0 10 8b 50 0c 39 d1 74 09 31 c0 39 ca 72 21 5b 5d c3 
[1332916.035318] RIP: __tlb_remove_page_size+0x29/0xc0 RSP: ffffc9000368bca0
[1332916.035338] CR2: ffff880850d7b008
[1332916.035353] ---[ end trace b871d7772ace7b61 ]---
[1332916.035370] Kernel panic - not syncing: Fatal exception
[1332916.035576] Kernel Offset: disabled