
Drgn reads zeros for memory allocated by vm_map_ram() #217

Closed
brenns10 opened this issue Oct 26, 2022 · 4 comments

Comments

@brenns10
Contributor

I've tested this on Oracle UEK 5 and 7 (based on 4.14 and 5.15). I'll try to reproduce it on mainline as well, maybe by hacking it into the vmtest for a quick check. A coworker reported that drgn was reading all zeros for a structure. We determined that the memory address came from the vmalloc subsystem, specifically vm_map_ram(). I went ahead and created a reproducer kernel module to demonstrate the issue.

The module allocates some pages, maps them with vm_map_ram(), and then writes a pattern of data. When the kernel is opened with crash, the pattern is visible. When viewed in drgn, the same memory reads as all zeros:

# dmesg after insmod
[  962.825481] drgn_vmalloc_test: 0xffff9e43c2400000

# drgn output
>>> prog.read(0xffff9e43c2400000, 64)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

# crash output
crash> rd 0xffff9e43c2400000 8
ffff9e43c2400000:  0000000100000000 0000000300000002   ................
ffff9e43c2400010:  0000000500000004 0000000700000006   ................
ffff9e43c2400020:  0000000900000008 0000000b0000000a   ................
ffff9e43c2400030:  0000000d0000000c 0000000f0000000e   ................
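To make the two outputs easier to compare: crash's `rd` prints 64-bit words, so a pattern of consecutive 32-bit integers shows up as pairs within each word. The sketch below assumes the module writes the 32-bit values 0 through 15 (inferred from the crash dump, not from the module source) and reproduces the crash view from those bytes; `crash_words` is a hypothetical helper name.

```python
import struct

def crash_words(data: bytes) -> tuple:
    """Interpret bytes as little-endian 64-bit words, the way crash's `rd` shows them."""
    return struct.unpack(f"<{len(data) // 8}Q", data)

# Assumed pattern: consecutive 32-bit integers 0..15 (inferred from the crash dump).
expected = struct.pack("<16I", *range(16))

words = crash_words(expected)
for i in range(0, len(words), 2):
    print(f"{i * 8:02x}:  {words[i]:016x} {words[i + 1]:016x}")
# 00:  0000000100000000 0000000300000002
# 10:  0000000500000004 0000000700000006
# 20:  0000000900000008 0000000b0000000a
# 30:  0000000d0000000c 0000000f0000000e

drgn_result = bytes(64)  # what prog.read() returned in the report: 64 zero bytes
print(drgn_result == expected)  # False
```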

@osandov, if you have an idea of the root cause without too much digging, by all means let me know. Otherwise, I thought this might be a useful way for me to explore the memory reader subsystem and learn more about drgn internals.

@osandov
Owner

osandov commented Oct 26, 2022

This is on the live kernel, right? My wild guess is that this is a bug in /proc/kcore, because drgn isn't super fancy about reading memory. Do you get the correct results if you read with access_remote_vm(prog["init_mm"].address_of_(), address, size) instead of prog.read()? That translates the virtual address to a physical address using the kernel page table and then reads from the physical address. If that works, it's very likely a bug in /proc/kcore.
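The check suggested above can be wrapped in a small comparison. This is a hedged sketch: `diagnose` and its result strings are hypothetical, and the two readers are passed in as callables so the logic can be exercised without a live kernel. In a drgn session, the first reader would be `prog.read` (which goes through /proc/kcore on a live system) and the second would wrap `access_remote_vm()` from `drgn.helpers.linux.mm`.

```python
from typing import Callable

# A reader takes (address, size) and returns that many bytes.
Reader = Callable[[int, int], bytes]

def diagnose(read_virt: Reader, read_via_pgtable: Reader, address: int, size: int) -> str:
    """Compare a direct virtual read with a read translated through the page table."""
    virt = read_virt(address, size)
    phys = read_via_pgtable(address, size)
    if virt == phys:
        return "consistent"
    if not any(virt) and any(phys):
        # Heuristic only: zeros from the virtual read but real data via the
        # page-table walk points at the /proc/kcore path.
        return "virtual read returned zeros: suspect /proc/kcore"
    return "reads differ"

# Possible use inside drgn (not runnable outside a drgn session):
#   from drgn.helpers.linux.mm import access_remote_vm
#   mm = prog["init_mm"].address_of_()
#   diagnose(prog.read, lambda a, n: access_remote_vm(mm, a, n),
#            0xffff9e43c2400000, 64)
```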

@brenns10
Contributor Author

Yes, live kernel. I traced it through the /proc/kcore reads with some printf debugging and was coming to the same conclusion myself.

access_remote_vm() worked! So I guess it's time to look into the kernel for the bug.

I'm rather impressed that crash does the virtual-to-physical translation manually, given that /proc/kcore seems to have an ELF segment for this region of memory. Maybe the original /dev/mem did not, and crash was architected around that?
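For context on what a manual translation involves, the first step of a page-table walk is splitting the virtual address into per-level table indices. The sketch below assumes x86-64 with 4-level paging and 4 KiB pages (5-level paging adds a P4D level); `pgtable_indices` is a hypothetical helper. It only computes indices; an actual walk, as crash or access_remote_vm() performs, would read each table entry from memory and follow it to the next level.

```python
# Assumption: x86-64, 4-level paging, 4 KiB pages. Each level indexes a
# 512-entry table (9 bits); the low 12 bits are the offset within the page.
LEVELS = (("pgd", 39), ("pud", 30), ("pmd", 21), ("pte", 12))

def pgtable_indices(vaddr: int) -> dict:
    """Return the table index used at each paging level, plus the page offset."""
    idx = {name: (vaddr >> shift) & 0x1FF for name, shift in LEVELS}
    idx["offset"] = vaddr & 0xFFF
    return idx

print(pgtable_indices(0xffff9e43c2400000))
# {'pgd': 316, 'pud': 271, 'pmd': 18, 'pte': 0, 'offset': 0}
```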

@brenns10
Contributor Author

The issue must be that vread() does not support memory that comes from vm_map_ram(). When the allocation is small, vm_map_ram() gets it from vb_alloc().
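Why the result is zeros rather than an error can be shown with a simplified model. It assumes (our reading of mm/vmalloc.c before the fix; check your own tree) that vread() only copies from vmap areas that have an attached vm_struct and zero-fills every other byte in the requested range. Memory handed out by vb_alloc() lives inside a vmap_block and has no such vm_struct, so it lands in the zero-filled case. `VmapArea` and `model_vread` are hypothetical names; the real code walks the kernel's vmap_area_list.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VmapArea:
    start: int
    data: bytes
    has_vm_struct: bool  # False for vb_alloc()-backed memory in this model

def model_vread(areas: List[VmapArea], addr: int, count: int) -> bytes:
    """Copy overlapping bytes from areas with a vm_struct; leave holes zeroed."""
    buf = bytearray(count)  # holes stay zero, mirroring vread()'s zero-fill
    for va in areas:
        if not va.has_vm_struct:
            continue  # skipped entirely, so its contents read back as zeros
        lo = max(addr, va.start)
        hi = min(addr + count, va.start + len(va.data))
        if lo < hi:
            buf[lo - addr:hi - addr] = va.data[lo - va.start:hi - va.start]
    return bytes(buf)
```

In this model, the same 64 bytes read back as the written pattern when the area has a vm_struct and as 64 zeros when it does not, matching the drgn versus crash discrepancy in the report.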

@brenns10
Contributor Author

FYI, it looks like there will be an upstream fix for this!
https://lore.kernel.org/linux-mm/87ilk6gos2.fsf@oracle.com/T/#u
