
Const integer bounds approach to bounds checking #194

Closed
jon-chuang opened this issue Jul 12, 2021 · 6 comments

jon-chuang commented Jul 12, 2021

Related: #193
Here is the x86 generated by rust.godbolt.org with the -O flag from the following source:

source
const MM_INPUT_START: i64 = 0x400000000;
const MM_HEAP_START: i64 = 0x300000000;

const INPUT_HOST_ADDR: i64 = 500_000;
const HEAP_HOST_ADDR: i64 = 1_000_000;

const OFFSET: i16 = 0;

const INPUT_LEN: i64 = 16;
// 100 KB
const HEAP_LEN: i64 = 100_000;

const TYPE_LEN: i64 = 0x8;

pub fn emit_addr_translation(vm_addr: u64) -> u64 {
    if vm_addr >= (MM_INPUT_START - OFFSET as i64) as u64 {
        if vm_addr < (MM_INPUT_START + INPUT_LEN - OFFSET as i64 - TYPE_LEN) as u64 {
            return (INPUT_HOST_ADDR + OFFSET as i64 - MM_INPUT_START) as u64 + vm_addr;
        } else {
            return u64::MAX;
        }
    } else if vm_addr >= (MM_HEAP_START - OFFSET as i64) as u64 {
        if vm_addr < (MM_HEAP_START + HEAP_LEN - OFFSET as i64 - TYPE_LEN) as u64 {
            return (HEAP_HOST_ADDR + OFFSET as i64 - MM_HEAP_START) as u64 + vm_addr;
        } else {
            return u64::MAX;
        }
    } else {
        return u64::MAX;
    }
}

pub fn main(){
    println!("{:?}", emit_addr_translation(MM_INPUT_START as u64 + 5));
    println!("{:?}", emit_addr_translation(MM_HEAP_START as u64 + 5));
    println!("{:?}", emit_addr_translation(MM_HEAP_START as u64 + 1_000_000));
    println!("{:?}", emit_addr_translation(MM_HEAP_START as u64 + 90_000));
    println!("{:?}", emit_addr_translation(90_000));
}
example::emit_addr_translation:
        mov     rax, rdi
        shr     rax, 34 ; vm_addr >> 34 is nonzero iff vm_addr >= (MM_INPUT_START - OFFSET)
        jne     .LBB1_3 ; jump to the input-section check; fall through to the heap check
        movabs  rax, -12884901888 ; rax <- -(lower bound: MM_HEAP_START - OFFSET)
        lea     rcx, [rdi + rax] ; rcx <- vm_addr - lower bound

        ; check lower_bound <= vm_addr < upper bound (MM_HEAP_START + HEAP_LEN - OFFSET - TYPE_LEN),
        ; i.e. 0 <= vm_addr - lower_bound < HEAP_LEN - TYPE_LEN = 99992 (the unsigned cmp covers both)
        cmp     rcx, 99992
        lea     rcx, [rdi + rax + 1000000] ; rcx <- host_ptr (vm_addr - MM_HEAP_START + HEAP_HOST_ADDR)
        jmp     .LBB1_2
.LBB1_3:
        movabs  rax, 17179869184 ; 0x400000000 = MM_INPUT_START
        add     rax, 8 ; rax <- upper bound: MM_INPUT_START + INPUT_LEN - OFFSET - TYPE_LEN
        movabs  rcx, -17179369184 ; INPUT_HOST_ADDR + OFFSET - MM_INPUT_START
        add     rcx, rdi ; rcx <- host_ptr
        cmp     rdi, rax ; check vm_addr against the upper bound
.LBB1_2:
        mov     rax, -1 ; error return address (u64::MAX)
        cmovb   rax, rcx ; if (condition) rax <- host_ptr
        ret

Here is the equivalent with offset = -4:

example::emit_addr_translation:
        movabs  rax, 17179869188
        cmp     rdi, rax
        jae     .LBB1_3
        movabs  rax, -12884901892
        lea     rcx, [rdi + rax]
        cmp     rcx, 99992
        lea     rcx, [rdi + rax + 1000000]
        jmp     .LBB1_2
.LBB1_3:
        add     rax, 8
        movabs  rcx, -17179369188
        add     rcx, rdi
        cmp     rdi, rax
.LBB1_2:
        mov     rax, -1
        cmovb   rax, rcx
        ret

offset = 4:

example::emit_addr_translation:
        movabs  rax, 17179869179
        cmp     rdi, rax
        jbe     .LBB3_1
        add     rax, 9
        movabs  rcx, -17179369180
        add     rcx, rdi
        cmp     rdi, rax
        jmp     .LBB3_2
.LBB3_1:
        movabs  rax, -12884901884
        lea     rcx, [rdi + rax]
        cmp     rcx, 99992
        lea     rcx, [rdi + rax + 1000000]
.LBB3_2:
        mov     rax, -1
        cmovb   rax, rcx
        ret

Note that you will get a different result with -C opt-level=2 and -C opt-level=3.

This is just to give a basic idea of how to inline the bounds checks as x86.


As an idea of the overhead: this is about 10 instructions, which at IPC = 1 on a 4GHz core is about 2.5ns. Compare that to the current measured overhead of addr translation, about 9-12ns.

So the potential savings here are about 3.6x per addr translation, which doesn't seem that great.

It does seem like bounds-check elimination (BCE) might be necessary here...

I believe that in order to achieve WASM-like performance, you really do need linear memory. The VM address is an offset from zero, so it requires no lower-bound check and can be translated to a host address directly by adding the base pointer. This makes a WASM bounds check just a single cmp and jmp, which takes maybe 1 or 2 cycles, and makes it easier for the processor to do speculative prefetching.
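For comparison, such a linear-memory check can be written out in Rust (a minimal sketch, not taken from any actual VM; the function name and signature are made up for illustration):

```rust
// Sketch of a WASM-style linear-memory bounds check: the guest address
// is an offset into one contiguous region starting at zero, so only an
// upper-bound comparison is needed before adding the host base pointer.
fn translate(base: *mut u8, mem_len: u64, vm_addr: u64, access_len: u64) -> Option<*mut u8> {
    // checked_add guards against wraparound of vm_addr + access_len
    if vm_addr.checked_add(access_len)? <= mem_len {
        // single compare passed: rebase onto the host pointer
        Some(unsafe { base.add(vm_addr as usize) })
    } else {
        None // out of bounds
    }
}
```

Compiled, this is essentially one add, one cmp, and one branch, in line with the 1-2 cycle estimate above.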

@jon-chuang jon-chuang changed the title Documentation for one approach to bounds checking Documentation for const integer bounds approach to bounds checking Jul 12, 2021
@jon-chuang jon-chuang changed the title Documentation for const integer bounds approach to bounds checking Const integer bounds approach to bounds checking Jul 12, 2021

jon-chuang commented Jul 13, 2021

@Lichtso
Here is how to do dynamic memory mapping:

Maintain the following metadata in EbpfElf and JitProgram:

// fields should always be positive
struct SectionInfo {
    bp_host_addr: i64,
    len: i64,
}

struct EbpfElf {
  ..
  // sections as indexed from 1 to 4
  section_infos: [SectionInfo; 4],
}

// memory remapping ops
const SUBTRACT_HOST_ADDR: u8 = 0;
const ADD_HOST_ADDR: u8 = 1;
const SUBTRACT_LEN: u8 = 2;
const ADD_LEN: u8 = 3;

struct OffsetEntry {
  offset: isize,
  op: u8,
}

struct JitProgram {
  ..
  // sections as indexed from 1 to 4
  section_infos: [SectionInfo; 4],
  remap_offsets: [Vec<OffsetEntry>; 4],
}

impl JitProgram {
  fn remap_memory(&mut self, new_section_infos: [SectionInfo; 4]) {
    let text_ptr_i64_loc = self._sections.text_section as *mut i64;
    for ((info, new_info), offsets) in self.section_infos.iter().zip(new_section_infos.iter()).zip(self.remap_offsets.iter()) {
      for entry in offsets {
        let modifier = match entry.op {
          SUBTRACT_HOST_ADDR => info.bp_host_addr - new_info.bp_host_addr,
          ADD_HOST_ADDR => new_info.bp_host_addr - info.bp_host_addr,
          SUBTRACT_LEN => info.len - new_info.len,
          ADD_LEN => new_info.len - info.len,
          _ => unreachable!(),
        };
        unsafe { *text_ptr_i64_loc.offset(entry.offset) += modifier; }
      }
    }
    self.section_infos = new_section_infos;
  }
}

impl JitCompiler {
..
  fn emit_memory_translation(..) {
    // subtract input section's host base ptr addr from x86 i64 immediate.
    jit.add_offset_entry(
      section_index::INPUT, 
      remap_ops::SUBTRACT_HOST_ADDR,
      (jit.offset_in_text_section + opcode_leading_size) as isize, 
    );
    ..
  }
}
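The core mechanism here — record where each i64 immediate lands in the emitted text, then patch those locations in place on remap — can be shown standalone (a sketch with a Vec of i64 standing in for the JIT text section; the names are made up):

```rust
// Sketch of the patching mechanism: immediates embedded in the "text
// section" are rewritten in place at the recorded offsets when the
// host base address changes.
fn patch_immediates(text: &mut [i64], offsets: &[usize], old_base: i64, new_base: i64) {
    // same delta as SUBTRACT_HOST_ADDR/ADD_HOST_ADDR above, applied per patch site
    let modifier = new_base - old_base;
    for &off in offsets {
        text[off] += modifier;
    }
}
```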


Lichtso commented Jul 13, 2021

I think this part here won't work:

struct EbpfElf {
  ..
  // sections as indexed from 1 to 4
  section_infos: [SectionInfo; 4],
}

The MemoryMapping must be in the VM, not the Executable, because the same Executable can be instanced and run in different VMs across multiple threads simultaneously (each having a different set of host addresses). Also, this is only intended for the "input" memory region, right?


jon-chuang commented Jul 13, 2021

@Lichtso

Hmm, I believe this approach requires making a local copy of the executable every time a VM is created.

That's because each instance should have its own memory mapping, so each instance needs its own remapped executable.

So maybe I ought to first do a local copy into the VM, i.e. the VM should have a local copy of the JitProgram.


jon-chuang commented Jul 14, 2021

@Lichtso , maybe in order to save on copies, I should make the semantics copy-on-write if shared, else write in place.

Well, one executor could be about 1MB or so, so copy-on-write can save on copies. A 1MB copy will take about 50-120us, which is already somewhat long. It's not clear the remapping operation will be quick either, since it needs to write to many scattered memory locations.
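The copy-on-write-if-shared semantics could be expressed with Arc::make_mut, which clones the inner value only when the Arc is still shared (a sketch; the Program struct here is a stand-in, not the real JitProgram):

```rust
use std::sync::Arc;

// Stand-in for a JIT program; Clone is required by Arc::make_mut.
#[derive(Clone)]
struct Program {
    text: Vec<i64>, // stands in for the JIT text section
}

// Copy-on-write remap: clones the program only if another VM still
// holds a reference, otherwise patches in place.
fn remap(shared: &mut Arc<Program>, offset: usize, modifier: i64) {
    let program = Arc::make_mut(shared);
    program.text[offset] += modifier;
}
```

With this, the first remap after a VM is spawned pays the copy, and later remaps of an unshared program write in place.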


jon-chuang commented Jul 14, 2021

Here is a simplified strategy for bounds checking:

  1. First, add the offset to the register.
  2. Check via the 33rd bit which section the address points to, and jmp to that section's bounds logic.
  3. Do the upper-bound check; if it fails, jmp to the OOB error. Otherwise add the required constant (host_addr_bp - section_addr_bp) to the vm_addr, store it in the register, and jmp to return.
  4. Perform the operation.
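These steps can be sketched in plain Rust, using the constants from the first comment (a sketch only, not emitted x86; with MM_INPUT_START = 0x400000000 the distinguishing bit is bit 34 zero-indexed, which is what the sketch tests):

```rust
const MM_HEAP_START: u64 = 0x300000000;
const MM_INPUT_START: u64 = 0x400000000; // bit 34 is set here, clear below it
const HEAP_HOST_ADDR: u64 = 1_000_000;
const INPUT_HOST_ADDR: u64 = 500_000;
const HEAP_LEN: u64 = 100_000;
const INPUT_LEN: u64 = 16;

fn translate(vm_addr: u64, offset: i64) -> u64 {
    // 1. add the offset to the register
    let addr = vm_addr.wrapping_add(offset as u64);
    // 2. select the section via a single high bit
    if addr & (1 << 34) != 0 {
        // 3. input section: the bit check already implies the lower bound,
        // so only the upper-bound check remains before rebasing to host
        if addr < MM_INPUT_START + INPUT_LEN {
            addr - MM_INPUT_START + INPUT_HOST_ADDR
        } else {
            u64::MAX // OOB error
        }
    } else if addr >= MM_HEAP_START && addr < MM_HEAP_START + HEAP_LEN {
        // 3. heap section: both bounds checked, then rebase to host
        addr - MM_HEAP_START + HEAP_HOST_ADDR
    } else {
        u64::MAX // OOB error
    }
}
```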


Lichtso commented Jul 23, 2021

I will close this for now, as #196 was merged.
The next thing to approach is solana-labs/solana#14523 and redesigning the entrypoint / ABI.
We might come back to another redesign of the address translation, but for now I think we have a reasonably good solution while maintaining backwards compatibility.

Lichtso closed this as completed Jul 23, 2021