Conversation

@clairechingching (Contributor) commented Nov 7, 2025

This proposal adds an allows-misaligned-mem-access target feature to the BPF target that lets users opt in to allowing misaligned memory accesses.

The motivation is that user-space eBPF VMs (interpreters or JITs running in user space) typically run on real CPUs where unaligned memory accesses are acceptable (or handled efficiently), so permitting them can simplify lowering and improve performance. In contrast, kernel eBPF must obey verifier constraints and platform-specific alignment restrictions.

The new opt-in keeps kernel behavior unchanged while giving user-space VMs an explicit way to enable more permissive codegen. It supports both use cases without diverging codebases.
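For example, a user-space consumer could opt in at compile time (hypothetical invocation; the feature name is taken from this PR's final title, and prog.ll is an illustrative input):

# Hypothetical opt-in for a user-space BPF VM: enable the target feature via -mattr.
llc -march=bpf -mcpu=v3 -mattr=+allows-misaligned-mem-access \
    -filetype=obj prog.ll -o prog.o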

github-actions bot commented Nov 7, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@alexrp (Member) commented Nov 7, 2025

Most backends use a subtarget feature to indicate whether unaligned access is allowed. See AArch64, AMDGPU, ARM, LoongArch, etc. Besides consistency, this is also significantly more convenient for frontends, especially those that produce bitcode directly (e.g. Zig), and less problematic for library users too as it avoids global state.


On a more general note (not directed at you @clairechingching!): I don't know what the community consensus is on this (if there is one), but IMHO, backends severely over(ab)use cl::opt for options that should be subtarget features, and putting on my frontend maintainer hat for a second, I'd really prefer if all backend maintainers made an effort to avoid this going forward.
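To illustrate the shape being suggested, here is a minimal sketch of what the subtarget feature definition could look like (names illustrative, not necessarily what this PR ends up using):

// Illustrative sketch only; the actual BPF.td entry may use different names.
def FeatureAllowMisalignedMemAccess
    : SubtargetFeature<"allows-misaligned-mem-access",
                       "AllowsMisalignedMemAccess", "true",
                       "Allow misaligned memory accesses">;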

@clairechingching (Contributor, Author):

Hey @yonghong-song, I tried moving the -bpf-allow-misaligned-mem-access option into BPFSubtarget, but realized that the parameters managed there are mostly about CPU instruction-set features. It seems more consistent to keep -bpf-allow-misaligned-mem-access where it currently is, next to -bpf-expand-memcpy-in-order, but I'm curious what you think.

@yonghong-song (Contributor):

Misaligned memory access is bad for performance and may cause issues for verification (or make verification more complex). Do you have concrete C code to illustrate this? Can the C code be easily converted to aligned memory access?

cc @4ast

@clairechingching (Contributor, Author):

@yonghong-song The kernel verifier is indeed very restrictive, and for good reason. This feature is intended for user-space eBPF, where the decision to allow misaligned access is up to the implementer. In such environments, allowing misaligned accesses is far more performant, as it drastically reduces the number of instructions required for common memory operations (see the sketch below). By making it optional, we leave kernel BPF behavior unchanged while allowing the implementer to improve user-space performance when the platform supports it.
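As a hedged illustration (not code from the PR): reading a u32 through a pointer of unknown alignment is a common case where strict alignment forces the backend to split the access.

/* Illustrative only: a 4-byte read through a pointer of unknown alignment.
 * The memcpy idiom is alignment-safe C; under strict alignment the backend
 * may lower it to four byte loads plus shifts, while with misaligned
 * access allowed it can emit a single 32-bit load. */
#include <stdint.h>
#include <string.h>

static inline uint32_t read_u32(const uint8_t *p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}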

@alexrp (Member) commented Nov 10, 2025

Most backends use a subtarget feature to indicate whether unaligned access is allowed.

Just to back up this point:

❯ git grep -E 'SubtargetFeature<".*(align|ual).*"' */*.td
AArch64/AArch64Features.td:696:26:def FeatureStrictAlign : SubtargetFeature<"strict-align",
AArch64/AArch64Features.td:738:37:def FeatureSlowMisaligned128Store : SubtargetFeature<"slow-misaligned-128store",
AArch64/AArch64Features.td:874:29:def FeatureLdpAlignedOnly : SubtargetFeature<"ldp-aligned-only", "HasLdpAlignedOnly",
AArch64/AArch64Features.td:877:29:def FeatureStpAlignedOnly : SubtargetFeature<"stp-aligned-only", "HasStpAlignedOnly",
AArch64/AArch64Features.td:880:46:def FeatureUseFixedOverScalableIfEqualCost : SubtargetFeature<"use-fixed-over-scalable-if-equal-cost",
AMDGPU/AMDGPU.td:107:36:def FeatureUnalignedBufferAccess : SubtargetFeature<"unaligned-buffer-access",
AMDGPU/AMDGPU.td:119:37:def FeatureUnalignedScratchAccess : SubtargetFeature<"unaligned-scratch-access",
AMDGPU/AMDGPU.td:125:32:def FeatureUnalignedDSAccess : SubtargetFeature<"unaligned-ds-access",
AMDGPU/AMDGPU.td:238:31:def FeatureLdsMisalignedBug : SubtargetFeature<"lds-misaligned-bug",
AMDGPU/AMDGPU.td:343:48:def FeatureNegativeUnalignedScratchOffsetBug : SubtargetFeature<"negative-unaligned-scratch-offset-bug",
AMDGPU/AMDGPU.td:419:35:def FeatureRequiresAlignedVGPRs : SubtargetFeature<"vgpr-align2",
AMDGPU/AMDGPU.td:1201:34:def FeatureBVHDualAndBVH8Insts : SubtargetFeature<"bvh-dual-bvh-8-insts",
AMDGPU/AMDGPU.td:1357:34:def FeatureUnalignedAccessMode : SubtargetFeature<"unaligned-access-mode",
ARM/ARMFeatures.td:339:29:def FeatureCheckVLDnAlign : SubtargetFeature<"vldn-align", "CheckVLDnAccessAlignment",
ARM/ARMFeatures.td:375:34:def FeaturePreferBranchAlign32 : SubtargetFeature<"loop-align", "PreferBranchLogAlignment","2",
ARM/ARMFeatures.td:378:34:def FeaturePreferBranchAlign64 : SubtargetFeature<"branch-align-64", "PreferBranchLogAlignment","3",
ARM/ARMFeatures.td:449:29:def FeatureVirtualization : SubtargetFeature<"virtualization",
ARM/ARMFeatures.td:457:29:def FeatureStrictAlign    : SubtargetFeature<"strict-align",
LoongArch/LoongArch.td:111:7:    : SubtargetFeature<"ual", "HasUAL", "true",
Mips/Mips.td:212:7:    : SubtargetFeature<"strict-align", "StrictAlign", "true",
PowerPC/PPC.td:241:3:  SubtargetFeature<"allow-unaligned-fp-access", "AllowsUnalignedFPAccess",
RISCV/RISCVFeatures.td:1781:6:   : SubtargetFeature<"unaligned-scalar-mem", "EnableUnalignedScalarMem",
RISCV/RISCVFeatures.td:1786:6:   : SubtargetFeature<"unaligned-vector-mem", "EnableUnalignedVectorMem",
X86/X86.td:198:30:def FeatureSSEUnalignedMem : SubtargetFeature<"sse-unaligned-mem",
X86/X86.td:493:25:def TuningSlowUAMem16 : SubtargetFeature<"slow-unaligned-mem-16",
X86/X86.td:497:25:def TuningSlowUAMem32 : SubtargetFeature<"slow-unaligned-mem-32",

@clairechingching (Contributor, Author):

Apologies for the rebase noise! I realized I initially submitted the PR against the release branch I'm targeting rather than main. I've just rebased on main!

github-actions bot commented Nov 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@yonghong-song (Contributor) commented Nov 11, 2025

In the Linux kernel, the BPF_F_ANY_ALIGNMENT flag indicates that the verifier will tolerate misalignment. It is not on by default and needs user space to enable it during program load. I did a hack in the kernel to enable BPF_F_ANY_ALIGNMENT by default, like below:

diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 42ae8d595c2c..0902d6dd9cb6 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -235,7 +235,7 @@ int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr)
            attr->prog_type != BPF_PROG_TYPE_XDP)
                return -EINVAL;
 
-       if (attr->prog_flags & ~(BPF_F_XDP_DEV_BOUND_ONLY | BPF_F_XDP_HAS_FRAGS))
+       if (attr->prog_flags & ~(BPF_F_XDP_DEV_BOUND_ONLY | BPF_F_XDP_HAS_FRAGS | BPF_F_ANY_ALIGNMENT))
                return -EINVAL;
 
        /* Frags are allowed only if program is dev-bound-only, but not
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f62d61b6730a..85b6f0123032 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2898,6 +2898,9 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
                                 BPF_F_TOKEN_FD))
                return -EINVAL;
 
+       if (!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT))
+               attr->prog_flags |= BPF_F_ANY_ALIGNMENT;
+
        bpf_prog_load_fixup_attach_type(attr);
 
        if (attr->prog_flags & BPF_F_TOKEN_FD) {

With the above kernel hack, I enabled the unaligned memory access by default like below

diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index 3c61216cd932..5cab9967b2e8 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -274,6 +274,18 @@ BPFTargetLowering::getConstraintType(StringRef Constraint) const {
   return TargetLowering::getConstraintType(Constraint);
 }
 
+bool BPFTargetLowering::allowsMisalignedMemoryAccesses(EVT VT,
+                                                       unsigned AddrSpace,
+                                                       Align A,
+                                                       MachineMemOperand::Flags,
+                                                       unsigned *Fast) const {
+  if (Fast) {
+    // Misaligned accesses are always treated as fast on this target.
+    *Fast = 1;
+  }
+  return true;
+}
+
 std::pair<unsigned, const TargetRegisterClass *>
 BPFTargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI,
                                                 StringRef Constraint,
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h b/llvm/lib/Target/BPF/BPFISelLowering.h
index 3d6e7c70df28..322ac798804d 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -54,6 +54,10 @@ public:
 
   unsigned getJumpTableEncoding() const override;
 
+  bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AS, Align A,
+                                      MachineMemOperand::Flags Flags,
+                                      unsigned *Fast) const override;
+
 private:
   // Control Instruction Selection Features
   bool HasAlu32;

With the combination of the above LLVM and kernel changes, I still hit some selftest failures. Either some kernel changes will be needed, or the compiler needs to be more selective about allowing misaligned memory access.

Since the kernel already supports misalignment through BPF_F_ANY_ALIGNMENT, I discussed this offline with @4ast, and we think it may be a good idea to turn kernel misalignment support on by default.

In that sense, LLVM support for misalignment sounds like a good idea. Please check the BPF.td file, which already has a few feature examples.
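(For context, paraphrased from memory rather than quoted verbatim: BPF.td defines features along these lines; the HasAlu32 flag also appears in the BPFISelLowering.h diff above.)

// Paraphrased from memory, not verbatim BPF.td contents:
def ALU32 : SubtargetFeature<"alu32", "HasAlu32", "true",
                             "Enable ALU32 instructions">;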

@4ast (Member) commented Nov 11, 2025

With the combination of the above llvm and kernel changes, all selftests (at cpu v3 level) passed.

What tests failed without the kernel hack?
If we enable it by default, we will still need a subtarget flag to disable it.

@yonghong-song (Contributor):

With the combination of the above llvm and kernel changes, all selftests (at cpu v3 level) passed.

What tests failed without the kernel hack?

A bunch of failures like below:

...
11: (b7) r5 = 0                       ; R5=0
12: (7b) *(u64 *)(r10 -20) = r5
misaligned stack access off 0+0+-20 size 8
processed 13 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0                      -- END PROG LOAD LOG --
libbpf: prog '_xdp_tx_iptunnel': failed to load: -EACCES

or

...
24: (16) if w1 == 0x1000000 goto pc+1 26: R0=0 R1=0x1000000 R6=0 R7=ctx() R10=fp0 fp-8=mmmm????
; if (ctx->msg_src_ip6[3] == bpf_htonl(1) || @ sendmsg6_prog.c:40
26: (18) r1 = 0x600000000000000       ; R1=0x600000000000000
; ctx->msg_src_ip6[2] = bpf_htonl(SRC_REWRITE_IP6_2); @ sendmsg6_prog.c:44
28: (7b) *(u64 *)(r7 +52) = r1
invalid bpf_context access off=52 size=8
processed 30 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'sendmsg_v6_prog': failed to load: -EACCES

etc.

If we enable it by default, we will still need a subtarget flag to disable it.

Agree.

@clairechingching (Contributor, Author):

@4ast @yonghong-song @alexrp Hey, I've moved it to a subtarget feature, as that seems to be the preference, but I haven't made allowing misalignment the default. I'm happy to do that, but the tests below will fail because they hardcode specific alignments, etc. (a sketch of the gated hook follows the test list).

Failed Tests (10):
LLVM :: CodeGen/BPF/cc_args.ll
LLVM :: CodeGen/BPF/cc_args_be.ll
LLVM :: CodeGen/BPF/ex1.ll
LLVM :: CodeGen/BPF/memcmp.ll
LLVM :: CodeGen/BPF/pr57872.ll
LLVM :: CodeGen/BPF/rodata_1.ll
LLVM :: CodeGen/BPF/rodata_2.ll
LLVM :: CodeGen/BPF/rodata_4.ll
LLVM :: CodeGen/BPF/store_imm.ll
LLVM :: CodeGen/BPF/undef.ll
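For reference, a hedged sketch of how the subtarget-gated hook could look (the member name is an assumption modeled on HasAlu32 above, not necessarily the PR's actual code):

// Hypothetical shape only; the flag name is assumed, not taken from the PR.
bool BPFTargetLowering::allowsMisalignedMemoryAccesses(
    EVT VT, unsigned AddrSpace, Align A, MachineMemOperand::Flags Flags,
    unsigned *Fast) const {
  // Default (kernel BPF): keep strict alignment so verifier checks pass.
  if (!HasAllowMisalignedMemAccess)
    return false;
  if (Fast)
    *Fast = 1; // opted-in user-space VMs treat misaligned access as fast
  return true;
}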

@clairechingching changed the title from "[BPF] Add CLI option to enable misaligned memory access" to "[BPF] add allows-misaligned-mem-access target feature" on Nov 12, 2025
@yonghong-song (Contributor):

Let us keep misaligned memory access off by default for now, which your pull request already does. I will need to sort things out on the kernel side. If the kernel eventually allows misaligned memory access by default, we can then flip LLVM to enable it by default as well.

BTW, there is a formatting issue flagged by the LLVM CI. Please fix.

@yonghong-song (Contributor):

For the failed tests you mentioned above, once misaligned access is on by default we can add a flag to disable it for those tests. This will be future work. A RUN-line sketch is below.
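(Hypothetical lit RUN line, using the -mattr name from this PR's feature, to pin strict-alignment codegen in a test:)

; Hypothetical RUN line; disables the feature explicitly for one test.
; RUN: llc -march=bpf -mattr=-allows-misaligned-mem-access < %s | FileCheck %s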

@yonghong-song (Contributor):

My previous experiments with LLVM misalignment support plus the kernel change actually did not solve the verification failures. (Maybe I accidentally used the wrong compiler.)

As one example, in kernel/bpf/verifier.c we have:

static int check_ptr_alignment(struct bpf_verifier_env *env,
                               const struct bpf_reg_state *reg, int off,
                               int size, bool strict_alignment_once)
{
        bool strict = env->strict_alignment || strict_alignment_once;
        const char *pointer_desc = "";

        switch (reg->type) {
        case PTR_TO_PACKET:
        case PTR_TO_PACKET_META:
                /* Special case, because of NET_IP_ALIGN. Given metadata sits
                 * right in front, treat it the very same way.
                 */
                return check_pkt_ptr_alignment(env, reg, off, size, strict);
        case PTR_TO_FLOW_KEYS:
                pointer_desc = "flow keys ";
                break;
        case PTR_TO_MAP_KEY:
                pointer_desc = "key ";
                break;
        case PTR_TO_MAP_VALUE:
                pointer_desc = "value ";
                break;
        case PTR_TO_CTX:
                pointer_desc = "context ";
                break;
        case PTR_TO_STACK:
                pointer_desc = "stack ";
                /* The stack spill tracking logic in check_stack_write_fixed_off()
                 * and check_stack_read_fixed_off() relies on stack accesses being
                 * aligned.
                 */
                strict = true;
                break;
        case PTR_TO_SOCKET:
                pointer_desc = "sock ";
                break;  
        case PTR_TO_SOCK_COMMON:
                pointer_desc = "sock_common ";
                break;
        case PTR_TO_TCP_SOCK:
                pointer_desc = "tcp_sock ";
                break;
        case PTR_TO_XDP_SOCK:
                pointer_desc = "xdp_sock ";
                break;
        case PTR_TO_ARENA:
                return 0;
        default:
                break;
        }
        return check_generic_ptr_alignment(env, reg, pointer_desc, off, size,
                                           strict);
}

Note the PTR_TO_STACK case, where 'strict' is forced to true to keep stack accesses aligned, regardless of BPF_F_ANY_ALIGNMENT. If LLVM generates misaligned stack accesses, verification will fail. This needs more investigation.
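As a hedged illustration of how such a store can arise (not taken from the failing selftests), an 8-byte store to a packed field at a non-8-byte-aligned stack offset, much like the *(u64 *)(r10 -20) store in the log above:

/* Illustrative only: with misaligned access allowed, the backend can emit
 * a single 8-byte store for h.val at a stack offset that is not 8-byte
 * aligned, which the verifier rejects for PTR_TO_STACK (strict = true). */
struct hdr {
    char tag[4];
    unsigned long long val;
} __attribute__((packed));

void touch(volatile struct hdr *out) {
    struct hdr h = {0};
    h.val = 42; /* 8-byte store at struct offset 4 */
    *out = h;
}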

Commit pushed: "This enables misaligned memory access when the feature is enabled"

@clairechingching (Contributor, Author):

All feedback addressed.

@yonghong-song merged commit fb2563d into llvm:main on Nov 13, 2025
10 checks passed
github-actions bot:

@clairechingching Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
