Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STRICT RWX breaks live patching because of late relocations #375

Open
iamjpn opened this issue Aug 14, 2021 · 17 comments
Open

STRICT RWX breaks live patching because of late relocations #375

iamjpn opened this issue Aug 14, 2021 · 17 comments
Assignees
Labels
in-progress Some work has been done on this issue

Comments

@iamjpn
Copy link

iamjpn commented Aug 14, 2021

Commit linuxppc/linux@88fc078 ("x86/module: Use text_poke() for late relocations"):

Because of late module patching, a livepatch module needs to be able to
    apply some of its relocations well after it has been loaded.  Instead of
    playing games with module_{dis,en}able_ro(), use existing text poking
    mechanisms to apply relocations after module loading.
    
    So far only x86, s390 and Power have HAVE_LIVEPATCH but only the first
    two also have STRICT_MODULE_RWX.
    
    This will allow removal of the last module_disable_ro() usage in
    livepatch.  The ultimate goal is to completely disallow making
    executable mappings writable.

Now that we have STRICT_MODULE_RWX, I think arch/powerpc/kernel/module_64.c, etc need to be updated to use code patching like in commit linuxppc/linux@be24226 ("s390/module: Use s390_kernel_write() for late relocations").

@mpe
Copy link
Member

mpe commented Nov 1, 2021

@mpe mpe added the bug It's a bug label Nov 1, 2021
@ruscur
Copy link

ruscur commented Nov 2, 2021

Using the poke area is working but we'll need to implement a memcpy-that-patches like x86 and s390 since patch_instruction() is too inflexible.

@mpe
Copy link
Member

mpe commented Nov 2, 2021

More info in dynup/kpatch#1228

@mpe mpe added the in-progress Some work has been done on this issue label Nov 2, 2021
@mpe mpe added this to the v5.16 milestone Nov 2, 2021
@mpe mpe changed the title Use code patching for late relocations? STRICT RWX breaks live patching because of late relocations Nov 4, 2021
@ruscur
Copy link

ruscur commented Nov 16, 2021

WIP tree https://github.com/ruscur/linux/commits/poke_write_klp-convert if anyone wants to take a look.

LIVEPATCH depends on HAVE_DYNAMIC_FTRACE_WITH_REGS depends on MPROFILE_KERNEL depends on PPC64 && CPU_LITTLE_ENDIAN && FUNCTION_TRACER - so we shouldn't have to care about 32bit, right?

Is there anything else that would apply_relocate_add() on an already initialised module?

@chleroy
Copy link

chleroy commented Nov 16, 2021

For 32 bits we have a proposed series, so I think you should also handle 32 bits.

See https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=269339&state=*

@mpe
Copy link
Member

mpe commented Nov 16, 2021

We need a minimal (as possible) backportable fix, so that should be 64-bit only. We can do 32-bit as a subsequent patch.

@ruscur
Copy link

ruscur commented Nov 16, 2021

I can make a fix way smaller (only changing the paths that I can get livepatching to touch on active modules - which thus far I think is just R_PPC_REL24) but I'll need to spin up more livepatch testing stuff to make sure.

Happy enough to do module_32 as well, but may need help testing it.

@chleroy
Copy link

chleroy commented Nov 17, 2021

No worry, I can also do the 32 bits later, the only thing I'd like is that if some code is meant to be common to both PPC32 and PPC64, please put it in module.c right now to avoid having to move it later.

By the way, if we are looking for a minimal backport, do we need that poke_write() stuff ? Isn't it possible to just use patch_instruction(), allthough that's less performant ? After all it is only now and then that we have to load a livepatch module, so it's probably not a big issue if it takes hundred more milliseconds.

@ruscur
Copy link

ruscur commented Nov 20, 2021

Yes, I'll just use patch_instruction() for a minimal backport.

poke_write() lives in lib/code-patching.cso it can be used by both, everything in module_64.c is just replacing direct writes. We could potentially have a generic apply_relocate_add() but that's another story.

@joe-lawrence
Copy link

I can make a fix way smaller (only changing the paths that I can get livepatching to touch on active modules - which thus far I think is just R_PPC_REL24) but I'll need to spin up more livepatch testing stuff to make sure.

Hi @ruscur, let me know if you need to see any more livepatch examples or tests, but see the attached module that includes a few klp-relocation types:

% readelf --wide --relocs livepatch-data-read-mostly.ko | awk '/.klp.sym./' RS="\n\n" ORS="\n\n"
...
Relocation section '.klp.rela.vmlinux..text.__netif_receive_skb_core.constprop.0' at offset 0x2502f0 contains 5 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
000000000000023c  0000009b0000000a R_PPC64_REL24          0000000000000000 .klp.sym.vmlinux.do_xdp_generic.part.0,1 + 0
...

Relocation section '.klp.rela.vmlinux..toc' at offset 0x250368 contains 64 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
00000000000000a0  0000011600000026 R_PPC64_ADDR64         0000000000000000 .klp.sym.vmlinux.ptype_all,0 + 0
...

Relocation section '.klp.rela.vmlinux.__jump_table' at offset 0x250968 contains 6 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000008  000000a50000002c R_PPC64_REL64          0000000000000000 .klp.sym.vmlinux.netstamp_needed_key,1 + 0
...

(This one was generated by the kpatch build tools and not the klp-convert selftests in my kernel tree.)
livepatch-data-read-mostly.ko.tar.gz

@ruscur
Copy link

ruscur commented Nov 24, 2021

I don't think putting everything through patch_instruction()is a great solution. The code is gross (I'm sure I could've written it nicer, but it's always going to be messy), extending that to cover more cases (more relocation types as above, what about create_ftrace_stub(), etc) gets to the point where it's of similar complexity to a full solution while being slower and harder to understand.

@chleroy
Copy link

chleroy commented Nov 24, 2021

Ok, let see what we can get with that poke_write(), limited to the strict necessary for fixing the current issue and backporting.

By the way, I don't really like the name poke_write(), which is too close to the text_poke() we have on x86 and far from the patch_instruction() we have on powerpc. What about patch_data() or patch_memory() or patch_raw() instead ?

@ruscur
Copy link

ruscur commented Nov 24, 2021

Sure, not married to the name.

I still don't know what the smallest viable scope is, i.e. which relocation types are impossible for kpatch to generate, or the ftrace stuff.

@joe-lawrence
Copy link

I still don't know what the smallest viable scope is, i.e. which relocation types are impossible for kpatch to generate, or the ftrace stuff.

Updated testing results with the [PATCH] powerpc/module_64: Fix livepatching for RO modules patch:

Test1 - add to klp-convert patchset

  • all livepatching selftests pass 👍

Test2 - backport on top of rhel-9's 5.14.0-16.el9

  • kpatch-build of read-data-mostly.patch (as in previous comment)
  • livepatch-data-read-mostly.ko loads without issue 👍

I was initially surprised that the second test passed after posting those additional relocation types above. However, after it did successfully load, I considered that the other relocation types (R_PPC64_ADDR64 and R_PPC64_REL64) must not be locked down by STRICT_MODULE_RWX. I looked at a few other big ppc64le kpatches that we have recently shipped and noticed the same pattern:

  1. R_PPC64_REL24 to symbols in other .text sections
  2. R_PPC64_ADDR64 to symbols thru .TOC
  3. R_PPC64_REL64 to static key symbols

The kpatch-build code that converts relocations to klp-relocations lives in need_dynrela() ("dynarela" is kpatch parlance for "klp-relocation") and it returns false for a bunch of other ppc64-specifc relocation types at the beginning. I'm not really sure how to determine the range of relocation types that gcc may generate for various scenarios, so perhaps the minimal fix for R_PPC64_REL24 is the way to go until we find out otherwise?

Adding @jpoimboe and @kamalesh-babulal, who implemented most of the ppc64le kpatch-build code, in case I missed anything obvious.

@mpe
Copy link
Member

mpe commented Dec 20, 2021

v2 minimal patch with error checking: https://lore.kernel.org/all/20211214121248.777249-1-mpe@ellerman.id.au

Merged as torvalds/linux@8734b41b3efe0fc6082

I'll leave the issue open until we get the more full featured solution merged.

@mpe mpe modified the milestones: v5.16, v5.19 Mar 9, 2022
@chleroy
Copy link

chleroy commented Mar 10, 2022

PPC32 fix merged as linuxppc/linux@0c85096

@mpe mpe added blocked Blocked due to something backlog Not being actively worked, needs some work to revive. and removed in-progress Some work has been done on this issue bug It's a bug labels Aug 15, 2022
@mpe mpe modified the milestones: v5.19, v6.1 Aug 15, 2022
@mpe
Copy link
Member

mpe commented Sep 1, 2022

@joe-lawrence mentioned that this is probably causing more issues: https://lore.kernel.org/linuxppc-dev/YxAc87dTmclHGCUy@redhat.com/

We probably need to do a full solution.

@ruscur ruscur added in-progress Some work has been done on this issue and removed backlog Not being actively worked, needs some work to revive. labels Sep 8, 2022
@mpe mpe modified the milestones: v6.1, v6.4 Mar 1, 2023
@mpe mpe removed this from the v6.4 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in-progress Some work has been done on this issue
Projects
Status: 🏗 In progress
Development

No branches or pull requests

5 participants