[BPF] add cast_{user,kern} instructions #79902

Closed
wants to merge 4 commits from the bpf-addr-space-insn branch

Conversation

eddyz87 (Contributor) commented Jan 29, 2024

This commit aims to support the BPF arena kernel-side feature:

  • arena is a memory region accessible from both BPF program and userspace;
  • base pointers for this memory region differ between kernel and user spaces;
  • dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space) translates src_reg, a pointer in src_addr_space, to dst_reg, an equivalent pointer in dst_addr_space; {src,dst}_addr_space are immediate constants;
  • number 0 is assigned to kernel address space;
  • number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on arena pointers "transparent" for BPF programs:

  • assume that pointers with non-zero address space are pointers to
    arena memory;
  • assume that arena is identified by address space number;
  • assume that address space zero corresponds to kernel address space;
  • assume that every BPF-side load or store from arena is done via a pointer in the user address space, thus convert base pointers using addr_space_cast(src_reg, 0, 1).

Only load, store, cmpxchg and atomicrmw IR instructions are handled by this transformation.

For example, the following C code:

   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }

Is compiled to the following IR:

    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }

Is transformed to:

    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void

And compiled as:

    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit
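
For reference, here is a minimal sketch of how a BPF program might rely on this transformation. The __arena macro name is an assumption for illustration; the __BPF_FEATURE_ARENA_CAST macro and the placement of address-space-1 globals into a .arena.1 section follow the changes in this patch:

    #if !defined(__BPF_FEATURE_ARENA_CAST)
    #error "this compiler does not support arena casts"
    #endif

    #define __arena __attribute__((address_space(1)))

    /* Address-space-1 globals are moved into the .arena.1 section
       by the BPFCheckAndAdjustIR changes in this patch. */
    int __arena counter;

    int bump(void)
    {
      /* The load and store through the arena pointer get an
         addr_space_cast(..., 0, 1) on the base pointer inserted
         automatically, so the access itself looks like a plain
         kernel-side memory access. */
      counter += 1;
      return counter;
    }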

github-actions bot commented Jan 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

eddyz87 force-pushed the bpf-addr-space-insn branch 2 times, most recently from 4d55f49 to bd30d9e on February 1, 2024 at 01:35
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Feb 6, 2024
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed anonymous
   region, like memcached or any key/value storage. The bpf program implements an
   in-kernel accelerator. XDP prog can search for a key in bpf_arena and return a
   value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash tables,
   rb-trees, sparse arrays), while user space occasionally consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view. It is
   not shared with user space.

Initially, the kernel vm_area and user vma are not populated. User space can
fault in pages within the range. While servicing a page fault, bpf_arena logic
will insert a new page into the kernel and user vmas. The bpf program can
allocate pages from that region via bpf_arena_alloc_pages(). This kernel
function will insert pages into the kernel vm_area. The subsequent fault-in
from user space will populate that page into the user vma. The
BPF_F_SEGV_ON_FAULT flag at arena creation time can be used to prevent fault-in
from user space. In such a case, if a page is not allocated by the bpf program
and not present in the kernel vm_area, the user process will segfault. This is
useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees pages
and removes the range from the kernel vm_area and from user process vmas.
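
As a concrete (and hedged) sketch of that analogy, BPF-side usage could look as follows; the exact kfunc signatures and the NUMA_NO_NODE constant are assumptions derived from the description above, not verbatim kernel API, and the __arena macro follows the earlier sketch:

    /* Hypothetical usage sketch; signatures are assumed. */
    void __arena *pages;

    /* NULL address: let the arena (maple tree) pick a free range. */
    pages = bpf_arena_alloc_pages(&arena, NULL, 2 /* page_cnt */,
                                  NUMA_NO_NODE, 0 /* flags */);
    if (pages)
            bpf_arena_free_pages(&arena, pages, 2 /* page_cnt */);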

bpf_arena can be used as a bpf program "heap" of up to 4GB. The memory is not
shared with user space. This is use case 3. In such a case, the
BPF_F_NO_USER_CONV flag is recommended. It will tell the verifier to treat the
rX = bpf_arena_cast_user(rY) instruction as a 32-bit move wX = wY, which will
improve bpf prog performance. Otherwise, bpf_arena_cast_user is translated by
JIT to conditionally add the upper 32 bits of user vm_start (if the pointer is
not NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff llvm/llvm-project#79902 taught the LLVM BPF backend to generate the
bpf_cast_kern() instruction before a dereference of an arena pointer and the
bpf_cast_user() instruction when an arena pointer is formed. In a typical bpf
program there will be very few bpf_cast_user() instructions.

From LLVM's point of view, arena pointers are tagged as
__attribute__((address_space(1))). Hence, clang provides helpful diagnostics
when pointers cross address space. Libbpf and the kernel support only
address_space == 1. All other address space identifiers are reserved.
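
To illustrate those diagnostics, assigning across address spaces without an explicit cast is rejected at compile time; the error wording below is approximate and may vary between clang versions:

    #define __arena __attribute__((address_space(1)))

    void f(int __arena *p)
    {
      int *q = p; /* error: initializing 'int *' with an expression of
                     type 'int __arena *' changes address space of pointer */
    }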

rX = bpf_cast_kern(rY, addr_space) tells the verifier that
rX->type = PTR_TO_ARENA. Any further operations on PTR_TO_ARENA register have
to be in the 32-bit domain. The verifier will mark load/store through
PTR_TO_ARENA with PROBE_MEM32. JIT will generate them as
kern_vm_start + 32bit_addr memory accesses. The behavior is similar to
copy_from_kernel_nofault() except that no address checks are necessary. The
address is guaranteed to be in the 4GB range. If the page is not present, the
destination register is zeroed on read, and the operation is ignored on write.

rX = bpf_cast_user(rY, addr_space) tells the verifier that
rX->type = unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then
the verifier converts cast_user to mov32. Otherwise, JIT will emit native code
equivalent to:
rX = (u32)rY;
if (rX)
  rX |= arena->user_vm_start & ~(u64)~0U;

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list built
by a bpf program can be walked natively by user space.
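
A hedged user-space sketch of that last point; arena_map_fd, arena_size, the node layout, and use() are illustrative assumptions, while the mmap-based access follows the description above:

    #include <sys/mman.h>

    /* Map the arena and walk a list whose next pointers were
       converted by cast_user into full 64-bit user pointers. */
    struct node { struct node *next; long value; };

    void *base = mmap(NULL, arena_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, arena_map_fd, 0);
    struct node *head = base; /* assume the list head sits at offset 0 */
    for (struct node *n = head; n; n = n->next)
            use(n->value);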

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this pull request Feb 6, 2024
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this pull request Feb 6, 2024
eddyz87 force-pushed the bpf-addr-space-insn branch 2 times, most recently from 42d0c09 to 1dc855e on February 7, 2024 at 22:50
eddyz87 requested a review from yonghong-song February 7, 2024 22:55
eddyz87 force-pushed the bpf-addr-space-insn branch from 1dc855e to 90c9846 on February 7, 2024 at 23:01
eddyz87 marked this pull request as ready for review February 7, 2024 23:03
llvmbot added the clang and clang:frontend labels Feb 7, 2024
llvmbot (Member) commented Feb 7, 2024

@llvm/pr-subscribers-clang

Author: eddyz87

Changes

This commit aims to support the BPF arena kernel-side feature [0]:

  • arena is a memory region accessible from both BPF program and userspace;
  • base pointers for this memory region differ between kernel and user spaces;
  • dst_reg = cast_kern(src_reg, addr_space_no) translates src_reg, a user space pointer within an arena, to dst_reg, an equivalent kernel space pointer within the arena (both pointers have identical offsets from the arena start); addr_space_no is an immediate constant used to identify the particular arena;
  • dst_reg = cast_user(src_reg, addr_space_no) is similar but in the opposite direction: it converts a kernel space arena pointer to a user space arena pointer.

On the LLVM side, the goal is to make load and store operations on arena pointers "transparent" for BPF programs:

  • assume that pointers with non-zero address space are pointers to
    arena memory;
  • assume that arena is identified by address space number;
  • assume that address space zero corresponds to kernel address space;
  • assume that every BPF-side load or store from arena is done via a pointer in the user address space, thus convert base pointers using cast_kern;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by this transformation.

For example, the following C code:

#define __as __attribute__((address_space(1)))
void copy(int __as *from, int __as *to) { *to = *from; }

Is compiled to the following IR:

define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
entry:
  %0 = load i32, ptr addrspace(1) %from, align 4
  store i32 %0, ptr addrspace(1) %to, align 4
  ret void
}

Is transformed to:

  %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
  %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
  %0 = load i32, ptr %from1, align 4, !tbaa !3
  store i32 %0, ptr %to2, align 4, !tbaa !3
  ret void

And compiled as:

  r2 = cast_kern(r2, 1)
  r1 = cast_kern(r1, 1)
  r1 = *(u32 *)(r1 + 0)
  *(u32 *)(r2 + 0) = r1
  exit

Patch is 33.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79902.diff

21 Files Affected:

  • (modified) clang/lib/Basic/Targets/BPF.cpp (+3)
  • (modified) clang/test/Preprocessor/bpf-predefined-macros.c (+8)
  • (modified) llvm/lib/Target/BPF/BPF.h (+8)
  • (added) llvm/lib/Target/BPF/BPFASpaceCastSimplifyPass.cpp (+92)
  • (modified) llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp (+116)
  • (modified) llvm/lib/Target/BPF/BPFISelLowering.cpp (+46)
  • (modified) llvm/lib/Target/BPF/BPFISelLowering.h (+3-1)
  • (modified) llvm/lib/Target/BPF/BPFInstrInfo.td (+25)
  • (modified) llvm/lib/Target/BPF/BPFTargetMachine.cpp (+5)
  • (modified) llvm/lib/Target/BPF/CMakeLists.txt (+1)
  • (added) llvm/test/CodeGen/BPF/addr-space-auto-casts.ll (+78)
  • (added) llvm/test/CodeGen/BPF/addr-space-cast.ll (+22)
  • (added) llvm/test/CodeGen/BPF/addr-space-gep-chain.ll (+25)
  • (added) llvm/test/CodeGen/BPF/addr-space-globals.ll (+30)
  • (added) llvm/test/CodeGen/BPF/addr-space-globals2.ll (+25)
  • (added) llvm/test/CodeGen/BPF/addr-space-phi.ll (+52)
  • (added) llvm/test/CodeGen/BPF/addr-space-simplify-1.ll (+19)
  • (added) llvm/test/CodeGen/BPF/addr-space-simplify-2.ll (+21)
  • (added) llvm/test/CodeGen/BPF/addr-space-simplify-3.ll (+26)
  • (added) llvm/test/CodeGen/BPF/addr-space-simplify-4.ll (+21)
  • (added) llvm/test/CodeGen/BPF/addr-space-simplify-5.ll (+25)
diff --git a/clang/lib/Basic/Targets/BPF.cpp b/clang/lib/Basic/Targets/BPF.cpp
index e3fbbb720d0694..26a54f631fcfc4 100644
--- a/clang/lib/Basic/Targets/BPF.cpp
+++ b/clang/lib/Basic/Targets/BPF.cpp
@@ -35,6 +35,9 @@ void BPFTargetInfo::getTargetDefines(const LangOptions &Opts,
     Builder.defineMacro("__BPF_CPU_VERSION__", "0");
     return;
   }
+
+  Builder.defineMacro("__BPF_FEATURE_ARENA_CAST");
+
   if (CPU.empty() || CPU == "generic" || CPU == "v1") {
     Builder.defineMacro("__BPF_CPU_VERSION__", "1");
     return;
diff --git a/clang/test/Preprocessor/bpf-predefined-macros.c b/clang/test/Preprocessor/bpf-predefined-macros.c
index ff4d00ac3bcfcc..fea24d1ea0ff7b 100644
--- a/clang/test/Preprocessor/bpf-predefined-macros.c
+++ b/clang/test/Preprocessor/bpf-predefined-macros.c
@@ -61,6 +61,9 @@ int r;
 #ifdef __BPF_FEATURE_ST
 int s;
 #endif
+#ifdef __BPF_FEATURE_ARENA_CAST
+int t;
+#endif
 
 // CHECK: int b;
 // CHECK: int c;
@@ -90,6 +93,11 @@ int s;
 // CPU_V4: int r;
 // CPU_V4: int s;
 
+// CPU_V1: int t;
+// CPU_V2: int t;
+// CPU_V3: int t;
+// CPU_V4: int t;
+
 // CPU_GENERIC: int g;
 
 // CPU_PROBE: int f;
diff --git a/llvm/lib/Target/BPF/BPF.h b/llvm/lib/Target/BPF/BPF.h
index 5c77d183e1ef3d..bbdbdbbde53228 100644
--- a/llvm/lib/Target/BPF/BPF.h
+++ b/llvm/lib/Target/BPF/BPF.h
@@ -66,6 +66,14 @@ class BPFIRPeepholePass : public PassInfoMixin<BPFIRPeepholePass> {
   static bool isRequired() { return true; }
 };
 
+class BPFASpaceCastSimplifyPass
+    : public PassInfoMixin<BPFASpaceCastSimplifyPass> {
+public:
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+
+  static bool isRequired() { return true; }
+};
+
 class BPFAdjustOptPass : public PassInfoMixin<BPFAdjustOptPass> {
 public:
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
diff --git a/llvm/lib/Target/BPF/BPFASpaceCastSimplifyPass.cpp b/llvm/lib/Target/BPF/BPFASpaceCastSimplifyPass.cpp
new file mode 100644
index 00000000000000..d32cecc073fab9
--- /dev/null
+++ b/llvm/lib/Target/BPF/BPFASpaceCastSimplifyPass.cpp
@@ -0,0 +1,92 @@
+//===------------ BPFIRPeephole.cpp - IR Peephole Transformation ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BPF.h"
+#include <optional>
+
+#define DEBUG_TYPE "bpf-aspace-simplify"
+
+using namespace llvm;
+
+namespace {
+
+struct CastGEPCast {
+  AddrSpaceCastInst *OuterCast;
+
+  // Match chain of instructions:
+  //   %inner = addrspacecast N->M
+  //   %gep   = getelementptr %inner, ...
+  //   %outer = addrspacecast M->N %gep
+  // Where I is %outer.
+  static std::optional<CastGEPCast> match(Value *I) {
+    auto *OuterCast = dyn_cast<AddrSpaceCastInst>(I);
+    if (!OuterCast)
+      return std::nullopt;
+    auto *GEP = dyn_cast<GetElementPtrInst>(OuterCast->getPointerOperand());
+    if (!GEP)
+      return std::nullopt;
+    auto *InnerCast = dyn_cast<AddrSpaceCastInst>(GEP->getPointerOperand());
+    if (!InnerCast)
+      return std::nullopt;
+    if (InnerCast->getSrcAddressSpace() != OuterCast->getDestAddressSpace())
+      return std::nullopt;
+    if (InnerCast->getDestAddressSpace() != OuterCast->getSrcAddressSpace())
+      return std::nullopt;
+    return CastGEPCast{OuterCast};
+  }
+
+  static PointerType *changeAddressSpace(PointerType *Ty, unsigned AS) {
+    return Ty->get(Ty->getContext(), AS);
+  }
+
+  // Assuming match(this->OuterCast) is true, convert:
+  //   (addrspacecast M->N (getelementptr (addrspacecast N->M ptr) ...))
+  // To:
+  //   (getelementptr ptr ...)
+  GetElementPtrInst *rewrite() {
+    auto *GEP = cast<GetElementPtrInst>(OuterCast->getPointerOperand());
+    auto *InnerCast = cast<AddrSpaceCastInst>(GEP->getPointerOperand());
+    unsigned AS = OuterCast->getDestAddressSpace();
+    auto *NewGEP = cast<GetElementPtrInst>(GEP->clone());
+    NewGEP->setName(GEP->getName());
+    NewGEP->insertAfter(OuterCast);
+    NewGEP->setOperand(0, InnerCast->getPointerOperand());
+    auto *GEPTy = cast<PointerType>(GEP->getType());
+    NewGEP->mutateType(changeAddressSpace(GEPTy, AS));
+    OuterCast->replaceAllUsesWith(NewGEP);
+    OuterCast->eraseFromParent();
+    if (GEP->use_empty())
+      GEP->eraseFromParent();
+    if (InnerCast->use_empty())
+      InnerCast->eraseFromParent();
+    return NewGEP;
+  }
+};
+
+} // anonymous namespace
+
+PreservedAnalyses BPFASpaceCastSimplifyPass::run(Function &F,
+                                                 FunctionAnalysisManager &AM) {
+  SmallVector<CastGEPCast, 16> WorkList;
+  bool Changed = false;
+  for (BasicBlock &BB : F) {
+    for (Instruction &I : BB)
+      if (auto It = CastGEPCast::match(&I))
+        WorkList.push_back(It.value());
+    Changed |= !WorkList.empty();
+
+    while (!WorkList.empty()) {
+      CastGEPCast InsnChain = WorkList.pop_back_val();
+      GetElementPtrInst *NewGEP = InsnChain.rewrite();
+      for (User *U : NewGEP->users())
+        if (auto It = CastGEPCast::match(U))
+          WorkList.push_back(It.value());
+    }
+  }
+  return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
+}
diff --git a/llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp b/llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp
index 81effc9b1db46c..edd59aaa6d01d2 100644
--- a/llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp
+++ b/llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp
@@ -14,6 +14,8 @@
 //     optimizations are done and those builtins can be removed.
 //   - remove llvm.bpf.getelementptr.and.load builtins.
 //   - remove llvm.bpf.getelementptr.and.store builtins.
+//   - for loads and stores with base addresses from non-zero address space
+//     cast base address to zero address space (support for BPF arenas).
 //
 //===----------------------------------------------------------------------===//
 
@@ -55,6 +57,7 @@ class BPFCheckAndAdjustIR final : public ModulePass {
   bool removeCompareBuiltin(Module &M);
   bool sinkMinMax(Module &M);
   bool removeGEPBuiltins(Module &M);
+  bool insertASpaceCasts(Module &M);
 };
 } // End anonymous namespace
 
@@ -416,11 +419,124 @@ bool BPFCheckAndAdjustIR::removeGEPBuiltins(Module &M) {
   return Changed;
 }
 
+// Wrap ToWrap with cast to address space zero:
+// - if ToWrap is a getelementptr,
+//   wrap it's base pointer instead and return a copy;
+// - if ToWrap is Instruction, insert address space cast
+//   immediately after ToWrap;
+// - if ToWrap is not an Instruction (function parameter
+//   or a global value), insert address space cast at the
+//   beginning of the Function F;
+// - use Cache to avoid inserting too many casts;
+static Value *aspaceWrapValue(DenseMap<Value *, Value *> &Cache, Function *F,
+                              Value *ToWrap) {
+  auto It = Cache.find(ToWrap);
+  if (It != Cache.end())
+    return It->getSecond();
+
+  if (auto *GEP = dyn_cast<GetElementPtrInst>(ToWrap)) {
+    Value *Ptr = GEP->getPointerOperand();
+    Value *WrappedPtr = aspaceWrapValue(Cache, F, Ptr);
+    auto *GEPTy = cast<PointerType>(GEP->getType());
+    auto *NewGEP = GEP->clone();
+    NewGEP->insertAfter(GEP);
+    NewGEP->mutateType(GEPTy->getPointerTo(0));
+    NewGEP->setOperand(GEP->getPointerOperandIndex(), WrappedPtr);
+    NewGEP->setName(GEP->getName());
+    Cache[ToWrap] = NewGEP;
+    return NewGEP;
+  }
+
+  IRBuilder IB(F->getContext());
+  if (Instruction *InsnPtr = dyn_cast<Instruction>(ToWrap))
+    IB.SetInsertPoint(*InsnPtr->getInsertionPointAfterDef());
+  else
+    IB.SetInsertPoint(F->getEntryBlock().getFirstInsertionPt());
+  auto *PtrTy = cast<PointerType>(ToWrap->getType());
+  auto *ASZeroPtrTy = PtrTy->getPointerTo(0);
+  auto *ACast = IB.CreateAddrSpaceCast(ToWrap, ASZeroPtrTy, ToWrap->getName());
+  Cache[ToWrap] = ACast;
+  return ACast;
+}
+
+// Wrap a pointer operand OpNum of instruction I
+// with cast to address space zero
+static void aspaceWrapOperand(DenseMap<Value *, Value *> &Cache, Instruction *I,
+                              unsigned OpNum) {
+  Value *OldOp = I->getOperand(OpNum);
+  if (OldOp->getType()->getPointerAddressSpace() == 0)
+    return;
+
+  Value *NewOp = aspaceWrapValue(Cache, I->getFunction(), OldOp);
+  I->setOperand(OpNum, NewOp);
+  // Check if there are any remaining users of old GEP,
+  // delete those w/o users
+  for (;;) {
+    auto *OldGEP = dyn_cast<GetElementPtrInst>(OldOp);
+    if (!OldGEP)
+      break;
+    if (!OldGEP->use_empty())
+      break;
+    OldOp = OldGEP->getPointerOperand();
+    OldGEP->eraseFromParent();
+  }
+}
+
+// Support for BPF arenas:
+// - for each function in the module M, update pointer operand of
+//   each memory access instruction (load/store/cmpxchg/atomicrmw)
+//   by casting it from non-zero address space to zero address space, e.g:
+//
+//   (load (ptr addrspace (N) %p) ...)
+//     -> (load (addrspacecast ptr addrspace (N) %p to ptr))
+//
+// - assign section with name .arena.N for globals defined in
+//   non-zero address space N
+bool BPFCheckAndAdjustIR::insertASpaceCasts(Module &M) {
+  bool Changed = false;
+  for (Function &F : M) {
+    DenseMap<Value *, Value *> CastsCache;
+    for (BasicBlock &BB : F) {
+      for (Instruction &I : BB) {
+        unsigned PtrOpNum;
+
+        if (auto *LD = dyn_cast<LoadInst>(&I))
+          PtrOpNum = LD->getPointerOperandIndex();
+        else if (auto *ST = dyn_cast<StoreInst>(&I))
+          PtrOpNum = ST->getPointerOperandIndex();
+        else if (auto *CmpXchg = dyn_cast<AtomicCmpXchgInst>(&I))
+          PtrOpNum = CmpXchg->getPointerOperandIndex();
+        else if (auto *RMW = dyn_cast<AtomicRMWInst>(&I))
+          PtrOpNum = RMW->getPointerOperandIndex();
+        else
+          continue;
+
+        aspaceWrapOperand(CastsCache, &I, PtrOpNum);
+      }
+    }
+    Changed |= !CastsCache.empty();
+  }
+  // Merge all globals within same address space into single
+  // .arena.<addr space no> section
+  for (GlobalVariable &G : M.globals()) {
+    if (G.getAddressSpace() == 0 || G.hasSection())
+      continue;
+    SmallString<16> SecName;
+    raw_svector_ostream OS(SecName);
+    OS << ".arena." << G.getAddressSpace();
+    G.setSection(SecName);
+    // Prevent having separate section for constants
+    G.setConstant(false);
+  }
+  return Changed;
+}
+
 bool BPFCheckAndAdjustIR::adjustIR(Module &M) {
   bool Changed = removePassThroughBuiltin(M);
   Changed = removeCompareBuiltin(M) || Changed;
   Changed = sinkMinMax(M) || Changed;
   Changed = removeGEPBuiltins(M) || Changed;
+  Changed = insertASpaceCasts(M) || Changed;
   return Changed;
 }
 
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index 4d8ace7c1ece02..0b6d4eccba1b86 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -24,6 +24,7 @@
 #include "llvm/CodeGen/ValueTypes.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/DiagnosticPrinter.h"
+#include "llvm/IR/IntrinsicsBPF.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
@@ -75,6 +76,8 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::STACKSAVE, MVT::Other, Expand);
   setOperationAction(ISD::STACKRESTORE, MVT::Other, Expand);
 
+  setOperationAction(ISD::ADDRSPACECAST, MVT::i64, Custom);
+
   // Set unsupported atomic operations as Custom so
   // we can emit better error messages than fatal error
   // from selectiondag.
@@ -315,6 +318,8 @@ SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
     return LowerSDIVSREM(Op, DAG);
   case ISD::DYNAMIC_STACKALLOC:
     return LowerDYNAMIC_STACKALLOC(Op, DAG);
+  case ISD::ADDRSPACECAST:
+    return LowerADDRSPACECAST(Op, DAG);
   }
 }
 
@@ -638,6 +643,45 @@ SDValue BPFTargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op,
   return DAG.getMergeValues(Ops, SDLoc());
 }
 
+enum {
+  CAST_TO_KERNEL = 1,
+  CAST_TO_USER = 2,
+};
+
+SDValue BPFTargetLowering::LowerADDRSPACECAST(SDValue Op,
+                                              SelectionDAG &DAG) const {
+  auto *ACast = dyn_cast<AddrSpaceCastSDNode>(Op);
+  const SDValue &Ptr = ACast->getOperand(0);
+  unsigned SrcAS = ACast->getSrcAddressSpace();
+  unsigned DstAS = ACast->getDestAddressSpace();
+  SDLoc DL(Op);
+
+  if (SrcAS > 0 && DstAS > 0) {
+    SmallString<64> Msg;
+    raw_svector_ostream OS(Msg);
+    OS << "Can't cast address space " << SrcAS << " to " << DstAS
+       << ": either source or destination address space has to be zero";
+    fail(DL, DAG, Msg);
+    return Op.getOperand(0);
+  }
+
+  if (SrcAS == 0 && DstAS == 0)
+    return Op.getOperand(0);
+
+  unsigned Code;
+  unsigned ASpace;
+  if (SrcAS > 0) {
+    Code = CAST_TO_KERNEL;
+    ASpace = SrcAS;
+  } else {
+    Code = CAST_TO_USER;
+    ASpace = DstAS;
+  }
+  return DAG.getNode(BPFISD::ADDR_SPACE_CAST, DL, Op.getValueType(), Ptr,
+                     DAG.getTargetConstant(Code, DL, MVT::i64),
+                     DAG.getTargetConstant(ASpace, DL, MVT::i64));
+}
+
 SDValue BPFTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
   SDValue Chain = Op.getOperand(0);
   ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
@@ -687,6 +731,8 @@ const char *BPFTargetLowering::getTargetNodeName(unsigned Opcode) const {
     return "BPFISD::Wrapper";
   case BPFISD::MEMCPY:
     return "BPFISD::MEMCPY";
+  case BPFISD::ADDR_SPACE_CAST:
+    return "BPFISD::ADDR_SPACE_CAST";
   }
   return nullptr;
 }
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h b/llvm/lib/Target/BPF/BPFISelLowering.h
index 819711b650c15f..437f0e89af56b8 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -28,7 +28,8 @@ enum NodeType : unsigned {
   SELECT_CC,
   BR_CC,
   Wrapper,
-  MEMCPY
+  MEMCPY,
+  ADDR_SPACE_CAST,
 };
 }
 
@@ -75,6 +76,7 @@ class BPFTargetLowering : public TargetLowering {
 
   SDValue LowerSDIVSREM(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerADDRSPACECAST(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index 7d443a34490146..62e285dbc4c31e 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -31,6 +31,9 @@ def SDT_BPFMEMCPY       : SDTypeProfile<0, 4, [SDTCisVT<0, i64>,
                                                SDTCisVT<1, i64>,
                                                SDTCisVT<2, i64>,
                                                SDTCisVT<3, i64>]>;
+def SDT_BPFAddrSpaceCast : SDTypeProfile<0, 3, [SDTCisPtrTy<0>,
+                                                SDTCisVT<1, i64>,
+                                                SDTCisVT<2, i64>]>;
 
 def BPFcall         : SDNode<"BPFISD::CALL", SDT_BPFCall,
                              [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue,
@@ -49,6 +52,7 @@ def BPFWrapper      : SDNode<"BPFISD::Wrapper", SDT_BPFWrapper>;
 def BPFmemcpy       : SDNode<"BPFISD::MEMCPY", SDT_BPFMEMCPY,
                              [SDNPHasChain, SDNPInGlue, SDNPOutGlue,
                               SDNPMayStore, SDNPMayLoad]>;
+def BPFAddrSpaceCast  : SDNode<"BPFISD::ADDR_SPACE_CAST", SDT_BPFAddrSpaceCast>;
 def BPFIsLittleEndian : Predicate<"Subtarget->isLittleEndian()">;
 def BPFIsBigEndian    : Predicate<"!Subtarget->isLittleEndian()">;
 def BPFHasALU32 : Predicate<"Subtarget->getHasAlu32()">;
@@ -420,6 +424,27 @@ let Predicates = [BPFHasMovsx] in {
 }
 }
 
+def CAST_DIR {
+    int TO_KERN = 1;
+    int TO_USER = 2;
+}
+
+class ADDR_SPACE_CAST<int Code, string AsmPattern>
+    : ALU_RR<BPF_ALU64, BPF_MOV, 64,
+             (outs GPR:$dst),
+             (ins GPR:$src, i64imm:$imm),
+             AsmPattern,
+             []> {
+  bits<64> imm;
+  let Inst{47-32} = Code;
+  let Inst{31-0} = imm{31-0};
+}
+def CAST_KERN : ADDR_SPACE_CAST<CAST_DIR.TO_KERN, "$dst = cast_kern($src, $imm)">;
+def CAST_USER : ADDR_SPACE_CAST<CAST_DIR.TO_USER, "$dst = cast_user($src, $imm)">;
+
+def : Pat<(BPFAddrSpaceCast GPR:$src, CAST_DIR.TO_KERN, i64:$as), (CAST_KERN $src, $as)>;
+def : Pat<(BPFAddrSpaceCast GPR:$src, CAST_DIR.TO_USER, i64:$as), (CAST_USER $src, $as)>;
+
 def FI_ri
     : TYPE_LD_ST<BPF_IMM.Value, BPF_DW.Value,
                  (outs GPR:$dst),
diff --git a/llvm/lib/Target/BPF/BPFTargetMachine.cpp b/llvm/lib/Target/BPF/BPFTargetMachine.cpp
index 8a6e7ae3663e0d..2e956e6c25b1b1 100644
--- a/llvm/lib/Target/BPF/BPFTargetMachine.cpp
+++ b/llvm/lib/Target/BPF/BPFTargetMachine.cpp
@@ -121,6 +121,10 @@ void BPFTargetMachine::registerPassBuilderCallbacks(
           FPM.addPass(BPFPreserveStaticOffsetPass(false));
           return true;
         }
+        if (PassName == "bpf-aspace-simplify") {
+          FPM.addPass(BPFASpaceCastSimplifyPass());
+          return true;
+        }
         return false;
       });
   PB.registerPipelineStartEPCallback(
@@ -135,6 +139,7 @@ void BPFTargetMachine::registerPassBuilderCallbacks(
   PB.registerPeepholeEPCallback([=](FunctionPassManager &FPM,
                                     OptimizationLevel Level) {
     FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().hoistCommonInsts(true)));
+    FPM.addPass(BPFASpaceCastSimplifyPass());
   });
   PB.registerScalarOptimizerLateEPCallback(
       [=](FunctionPassManager &FPM, OptimizationLevel Level) {
diff --git a/llvm/lib/Target/BPF/CMakeLists.txt b/llvm/lib/Target/BPF/CMakeLists.txt
index d88e7ade40b9a0..cb21ed03a86c1e 100644
--- a/llvm/lib/Target/BPF/CMakeLists.txt
+++ b/llvm/lib/Target/BPF/CMakeLists.txt
@@ -24,6 +24,7 @@ add_llvm_target(BPFCodeGen
   BPFAbstractMemberAccess.cpp
   BPFAdjustOpt.cpp
   BPFAsmPrinter.cpp
+  BPFASpaceCastSimplifyPass.cpp
   BPFCheckAndAdjustIR.cpp
   BPFFrameLowering.cpp
   BPFInstrInfo.cpp
diff --git a/llvm/test/CodeGen/BPF/addr-space-auto-casts.ll b/llvm/test/CodeGen/BPF/addr-space-auto-casts.ll
new file mode 100644
index 00000000000000..08e11e861c71cb
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/addr-space-auto-casts.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt --bpf-check-and-opt-ir -S -mtriple=bpf-pc-linux < %s | FileCheck %s
+
+define void @simple_store(ptr addrspace(272) %foo) {
+; CHECK-LABEL: define void @simple_store(
+; CHECK-SAME: ptr addrspace(272) [[FOO:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[FOO1:%.*]] = addrspacecast ptr addrspace(272) [[FOO]] to ptr
+; CHECK-NEXT:    [[ADD_PTR2:%.*]] = getelementptr inbounds i8, ptr [[FOO1]], i64 16
+; CHECK-NEXT:    store volatile i32 57005, ptr [[ADD_PTR2]], align 4
+; CHECK-NEXT:    [[ADD_PTR13:%.*]] = getelementptr inbounds i8, ptr [[FOO1]], i64 12
+; CHECK-NEXT:    store volatile i32 48879, ptr [[ADD_PTR13]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %add.ptr = getelementptr inbounds i8, ptr addrspace(272) %foo, i64 16
+  store volatile i32 57005, ptr addrspace(272) %add.ptr, align 4
+  %add.ptr1 = getelementptr inbounds i8, ptr addrspace(272) %foo, i64 12
+  store volatile i32 48879, ptr addrspace(272) %add.ptr1, align 4
+  ret void
+}
+
+define void @separate_addr_store(ptr addrspace(272) %foo, ptr addrspace(272) %bar) {
+; CHECK-LABEL: define void @separate_addr_store(
+; CHECK-SAME: ptr addrspace(272) [[FOO:%.*]], ptr addrspace(272) [[BAR:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[BAR3:%.*]] = addrspacecast ptr addrspace(272) [[BAR]] to ptr
+; CHECK-NEXT:    [[FOO1:%.*]] = addrspacecast ptr addrspace(272) [[FOO]] to ptr
+; CHECK-NEXT:    [[ADD_PTR2:%.*]] = getelementptr inbounds i8, ptr [[FOO1]], i64 16
+; CHECK-NEXT:    store volatile i32 57005, ptr [[ADD_PTR2]], align 4
+; CHECK-NEXT:    [[ADD_PTR14:%.*]] = getelementptr inbounds i8, ptr [[BAR3]], i64 12
+; CHECK-NEXT:    store volatile i32 48879, ptr [[ADD_PTR14]], align 4
+; CHECK-NEXT:    ret void
+;
+entry:
+  %add.ptr = getelementptr inbounds i8, ptr addrspace(272) %foo, i64 16
+  store volatile i32 57005, ptr addrspace...
[truncated]

kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this pull request Feb 8, 2024
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this pull request Feb 8, 2024
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this pull request Feb 8, 2024
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this pull request Feb 8, 2024
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this pull request Feb 8, 2024
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this pull request Feb 8, 2024
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this pull request Feb 9, 2024
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Feb 9, 2024
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this pull request Feb 9, 2024
@eddyz87 eddyz87 force-pushed the bpf-addr-space-insn branch from 90c9846 to a08a25e Compare February 9, 2024 20:32
eddyz87 pushed a commit to eddyz87/bpf that referenced this pull request Feb 13, 2024
@eddyz87 eddyz87 requested a review from 4ast February 13, 2024 16:50
bool BPFCheckAndAdjustIR::insertASpaceCasts(Module &M) {
  bool Changed = false;
  for (Function &F : M) {
    DenseMap<Value *, Value *> CastsCache;
Member

Overall looks great. This cache of ACast is per function. Is it correct to reuse that operand out of every BB? It won't break 'volatile' pointer derefs?

Contributor Author

I don't think this would be a problem:

  • the cache is keyed by the wrapped Value, and each Value is defined only once;
  • the cast instruction put into the cache is placed right after the Value's definition (if the Value is an instruction; otherwise the cast is placed at the beginning of the function), so it dominates the same set of instructions as the wrapped value did before the transformation.

4ast pushed a commit to 4ast/bpf that referenced this pull request Mar 7, 2024
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   User space may mmap it, but the bpf program does not convert pointers
   to the user base at run time, keeping the bpf program fast.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.

bpf_arena can be used as a bpf program "heap" of up to 4GB where the speed of
the bpf program matters more than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff llvm/llvm-project#79902 enables the LLVM BPF
backend to generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1), which is reserved for bpf_arena pointers, and the
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
    rX = (u32)rY;
    if (rY)
        rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4ast pushed a commit to 4ast/bpf that referenced this pull request Mar 7, 2024
@eddyz87 eddyz87 force-pushed the bpf-addr-space-insn branch from a08a25e to 81422d1 Compare March 7, 2024 18:11
4ast pushed a commit to 4ast/bpf that referenced this pull request Mar 7, 2024
eddyz87 added 4 commits March 8, 2024 18:11
This commit aims to support BPF arena kernel side feature [0]:
- arena is a memory region accessible from both
  BPF program and userspace;
- base pointers for this memory region differ between
  kernel and user spaces;
- `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)`
  translates src_reg, a pointer in src_addr_space
  to dst_reg, equivalent pointer in dst_addr_space,
  {src,dst}_addr_space are immediate constants;
- number 0 is assigned to kernel address space;
- number 1 is assigned to user address space.

On the LLVM side, the goal is to make load and store operations on
arena pointers "transparent" for BPF programs:
- assume that pointers with non-zero address space are pointers to
  arena memory;
- assume that arena is identified by address space number;
- assume that address space zero corresponds to kernel address space;
- assume that every BPF-side load or store from arena is done via
  pointer in user address space, thus convert base pointers using
  `addr_space_cast(src_reg, 0, 1)`;

Only load, store, cmpxchg and atomicrmw IR instructions are handled by
this transformation.

For example, the following C code:

   #define __as __attribute__((address_space(1)))
   void copy(int __as *from, int __as *to) { *to = *from; }

Compiled to the following IR:

    define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) {
    entry:
      %0 = load i32, ptr addrspace(1) %from, align 4
      store i32 %0, ptr addrspace(1) %to, align 4
      ret void
    }

Is transformed to:

    %to2 = addrspacecast ptr addrspace(1) %to to ptr     ;; !
    %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; !
    %0 = load i32, ptr %from1, align 4, !tbaa !3
    store i32 %0, ptr %to2, align 4, !tbaa !3
    ret void

And compiled as:

    r2 = addr_space_cast(r2, 0, 1)
    r1 = addr_space_cast(r1, 0, 1)
    r1 = *(u32 *)(r1 + 0)
    *(u32 *)(r2 + 0) = r1
    exit

Internally:
- piggy-back on the `BPFCheckAndAdjustIR` pass to insert address space
  casts for base pointers of memory access instructions, when the base
  pointer has a non-zero address space;
- modify `BPFInstrInfo.td` and `BPFISelLowering.cpp` to allow
  translation of `addrspacecast` instruction:
  - define new SDNode type `BPFAddrSpaceCast`;
  - lower `addrspacecast` as such SDNode;
  - define new machine instruction: `ADDR_SPACE_CAST`;
  - define pattern to select `ADDR_SPACE_CAST` for `BPFAddrSpaceCast` nodes.

[0] https://lore.kernel.org/bpf/20240206220441.38311-1-alexei.starovoitov@gmail.com/
Make all globals within the same address space reside in a section
named ".arena.N", where N is the number of the address space.

E.g. for the following C program:

```c
/* here __as expands to __attribute__((address_space(272))),
   matching the .arena.272 section below */
__as const char a[2] = {1,2};
__as char b[2] = {3,4};
__as char c[2];
...
```

Generate the following layout:

```
$ clang -O2 --target=bpf t.c -c -o - \
  | llvm-readelf --sections --symbols -
...
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  ...
  [ 4] .arena.272        PROGBITS        0000000000000000 0000e8 000018 00  WA  0   0  4
  ...

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     ...
     3: 0000000000000000     8 OBJECT  GLOBAL DEFAULT     4 a
     4: 0000000000000008     8 OBJECT  GLOBAL DEFAULT     4 b
     5: 0000000000000010     8 OBJECT  GLOBAL DEFAULT     4 c
     ...                                                 ^^^
                                                  Note section index
```
For BPF, GEP adjusts a pointer by the same offset in any address
space, thus a transformation of the form:

    %inner = addrspacecast N->M %ptr
    %gep   = getelementptr %inner, ...
    %outer = addrspacecast M->N %gep

to just:

   %gep = getelementptr %ptr, ...

is valid.

Applying such a transformation helps with C patterns that use e.g.
(void *) casts for pointer arithmetic without an actual memory access:

    #define container_of(ptr, type, member)                 \
        ({                                                  \
            void __arena *__mptr = (void *)(ptr);           \
            ((type *)(__mptr - offsetof(type, member)));    \
        })

(Note the address space cast on the first body line.)
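
A hedged usage sketch of the pattern this fold targets (the struct and
helper below are made up for illustration): the (void *) cast hops from
address space 1 to 0 and immediately back, and the pass folds the round
trip into a single GEP:

```c
#define __arena __attribute__((address_space(1)))

struct elem {
	long key;
	long val;
};

static struct elem __arena *elem_from_val(long __arena *val_ptr)
{
	/* AS1 -> AS0 -> AS1 round trip, now folded away */
	void __arena *p = (void *)val_ptr;

	return (struct elem __arena *)(p - __builtin_offsetof(struct elem, val));
}
```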
The `__BPF_FEATURE_ARENA_CAST` macro is defined if the compiler supports
emission of `cast_kern` and `cast_user` BPF instructions.
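
For instance, arena-aware sources can feature-detect compiler support; a
minimal sketch in the style of the kernel selftests:

```c
#if defined(__BPF_FEATURE_ARENA_CAST)
#define __arena __attribute__((address_space(1)))
#else
/* older compilers: fall back to plain pointers, no cast instructions */
#define __arena
#endif
```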
@eddyz87 eddyz87 force-pushed the bpf-addr-space-insn branch from 81422d1 to 750b347 Compare March 8, 2024 20:09
@eddyz87
Contributor Author

eddyz87 commented Mar 8, 2024

Closing this in favor of #84410

@eddyz87 eddyz87 closed this Mar 8, 2024
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category