-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] API to build BPF programs #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @alex-diez, thanks a lot for this RFC, the API looks really promising! I had in mind to make an assembler in the same way as uBPF does, but I had not really thought about any particular Rust API to build a program so far, and the one you started looks just fine.
I take it you would eventually merge it in the crate, and not as a separate crate as in this PR, is this correct?
I added inline comments about my thoughts on the implementation of the API (keep in mind I am no Rust expert, I may be wrong). Overall I have two issues with this first version:
- Some naming choices for structs or functions may be misleading and could be improved in my opinion;
- I do not like so much that
struct Operations
should have a program as an attribute. See details in comments.
Anyway, great work, thanks for working on this!
instructions/src/lib.rs
Outdated
pub enum MemSize { | ||
Double, | ||
Half | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Missing Word and Byte here, but not a problem since it's work in progress)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add other two enum's variants some day with TDD 😉
instructions/src/lib.rs
Outdated
|
||
pub fn bind_register(&'p mut self) -> Operation<'p> { | ||
Operation::new(self, 0xbf) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming for bind_register()
and bind_value()
does not really sound familiar for BPF operations. I think it would be easier for users if we have name closer to what is used already in the BPF ecosystem, something like move_value()
(or immediate), or move_from_reg()
, or anything closer to mov
than bind
. At least I feel it would help me. Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change to move
, it is more domain specific name
instructions/src/lib.rs
Outdated
MemSize::Half => 0x69 | ||
}; | ||
Operation::new(self, opcode) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have a single load_mem()
function for all memory sizes!
instructions/src/lib.rs
Outdated
} | ||
self.program | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So most of the functions in struct Operation
could probably be moved to ebpf::Insn
, or to another struct Instruction
that has a ebpf::Insn
attribute maybe, or something like that. Not the push()
function, though. I think it would make sense to move it to the ProgramCodeBuilder
? Would that work? What's your opinion about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about that yet. I wanted to write a BPF program by calling chain of functions.
For instance:
program.bind_value().dst(0x00).value(0x00_00_11_22).push().bind_value().dst(0x01).value(0x11_22_00_00).push().add_registers().dst(0x00).src(0x01).push() ... /* and so forth */.exit()
Here I called function push
referring push on stack, however I am not sure that it is a good name for it.
However, I have not considered to write it in this way:
program.add(Instruction::new(Class::Move).dst(0x00).src(0x01));
instructions/src/lib.rs
Outdated
Half | ||
} | ||
|
||
pub struct ProgramCodeBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just ProgramBuilder
, or ProgBuilder
? The Code
makes it longer to type/read, and I am not sure it brings additional information?
instructions/src/lib.rs
Outdated
ins.push((self.immediate & 0x00_00_00_FF) as u8); | ||
ins.push(((self.immediate & 0x00_00_FF_00) >> 8) as u8); | ||
ins.push(((self.immediate & 0x00_FF_00_00) >> 16) as u8); | ||
ins.push(((self.immediate & 0xFF_00_00_00) >> 24) as u8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking for style consistency: I used lowercase letters for other hexadecimal values in the crate. Anyway, do keep the underscores, I did not know we could use them—thanks!—and I'll add them in the rest of the crate to make it more readable.
instructions/src/lib.rs
Outdated
} | ||
|
||
#[cfg(test)] | ||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have unit tests, thanks for that!
instructions/src/lib.rs
Outdated
self | ||
} | ||
|
||
pub fn value(&'p mut self, immediate: u32) -> &'p mut Operation<'p> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
immediate()
or imm()
instead of value()
?
Yes, it is just faster to run
Me either, I just write tests for each functionality that I want to implement and run I read through you comments ... Anyway I am developing in TDD so names and missed Long names don't bother me a lot for now. Auto-complition does its job well. |
Yeah, the new version of the test is way more readable than the slice of hexa code… I'd be glad to merge your API once it's ready.
Sure, but then there's no need to clutter the doc with long names if you gain nothing by making them long ;-). In the case of |
Writing tests for load instructions I noticed something interesting.
according LD/LDX/ST/STX opcode structure
have Instructions in the above table, are they some kind of special? Does their class have special name in terms of BPF? Edit: I got it. |
That's it, the difference is the source of the load operand, immediate or register. As you surely noticed already, it uses these definitions from
(If I remember correctly the previous BPF version had a register called |
Pushed second version of Rust API. I have a couple of questions:
|
instructions/src/lib.rs
Outdated
} | ||
|
||
impl Load { | ||
pub fn new(mem_size: MemSize, addressing: Addressing, address_source: AddressSource) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this Addressing
and AddressSource
enums, though. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably easier to decouple the different kinds of load into several operation type instead of using this Adressing
everywhere? Maybe also for the AddressSource
I prefer what you had in your v1?
Hey thanks for the hard work, that's impressive! Please can you give me a few days so that I can study and play with it? I don't have too much time right now to dive into the details. Please do not start any rewriting of the tests coming from uBPF ( |
OK so, sorry for the delay, here's some feedback. I don't really know how to say it with tact, but the thing is, I preferred your first version with chaining the commands. I know I told you having a program as an attribute in the Regarding v2, I tend to find it quite verbose, and too “low-level”. I realize you've made a great effort to reconstruct the operation codes with the different class / memory / etc. codes, but I don't believe the user is interested in this level of details, and should have so many steps to build an instruction (for example, using The idea to have a different struct for each operation type is good, though. Maybe we could push this further and restrict the operand-related methods to fill only the operands that are relevant for the current operation, raising an error/panic!ing otherwise? So again, sorry for changing my mind, I realize you've done a lot on this v2. What's your opinion on the matter, do you think one version is specifically better than the other? |
instructions/src/lib.rs
Outdated
} | ||
|
||
impl Load { | ||
pub fn new(mem_size: MemSize, addressing: Addressing, address_source: AddressSource) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably easier to decouple the different kinds of load into several operation type instead of using this Adressing
everywhere? Maybe also for the AddressSource
I prefer what you had in your v1?
instructions/src/lib.rs
Outdated
Addressing::Ind => 0x40 | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we have to use a match for extracting values from enums. Apparently you can give each item a discriminant value, and then cast as an integer type, see https://doc.rust-lang.org/reference.html#enumerations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Didn't know that.
instructions/src/lib.rs
Outdated
src: u8, | ||
offset: u16, | ||
imm: u32 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imm
and offset
are considered as signed. We'll have to fix it at some point if we use ebpf::Insn
.
Thanks for the review. First of all v2 was just a try 😄 as v1. The more Rust code I write the better Rust developer I become. That's one of a reason why I start contributing to this project 😉 When I finished v2 I thougth that I would have to merge it with |
Hi @alex-diez, and sorry for being so slow to answer, I'm on a business trip and the WiFi is awful… It might be nice to unify the instructions for API/assembler/disassembler, I haven't given a thought into that so far. But I'm not sure I see what you have in mind with this I'm being very cautious with the instructions in |
* add Assemble trait * add Disassemble trait
Pushed last changes. I left some todos.
For example: let bpf_code = RawBpfCode::from_elf("/path/to/elf_file");
let verified_code = bpf_code.verify().unwrap();
let vm = rbpf::EbpfVmNoData::new(verified_code.assemble());
vm.prog_exec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alex-diez!
I like the current shape of the API used to build the programs, but I still have several comments.
-
The PR is getting hard to review, since commits are stacked on one another. I forked on my side and squashed your commit to have a clean view of the changes. That's a detail. But also, you push a lot of changes in a single PR;
-
In particular, I am not convinced having this
Dissemble
trait and implementation in this API is a good thing (and not in the same PR at any rate). We already have a disassembler, so you'll have to convince me we need another here :). Plus I already feel concerned that the assembler and disassembler we have do not share the same text strings and we'll have to keep them in sync if any of them moves, so I'm not very keen on adding a third similar component. Thoughts? -
By the way,
Assemble
/Asm
names are really close to what we have in the assembler.rs module, may lead to confusion. MaybeBuild
? (Or more figurative,Shape
,Craft
? I don't know.) -
Same remark for
parse
,from_elf
andverify
: please do not implement them in this PR. I am not sure we really need any of this, so it might be worth talking about it first before implementing?
This being said, I do like the current basic API to build the programs (without Disassemble
trait I mean), you could consider moving this code to its own module so that you can access ebpf::Insn
and finalize a version we could merge.
instructions/src/lib.rs
Outdated
} | ||
|
||
#[cfg(test)] | ||
mod tests { | ||
#[cfg(test)] | ||
mod disassembling { | ||
use super::super::*; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespaces 1/3
instructions/src/lib.rs
Outdated
#[test] | ||
fn exit() { | ||
let mut program = BpfCode::new(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespaces 2/3
instructions/src/lib.rs
Outdated
|
||
assert_eq!(call.disassemble(), "call"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespaces 3/3
instructions/src/lib.rs
Outdated
pub fn new() -> Self { | ||
BpfCode { instructions: vec![] } | ||
} | ||
|
||
pub fn as_bytes(&self) -> &[u8] { | ||
self.instructions.as_slice() | ||
pub fn mov_add(&mut self, source: Source, arch: Arch) -> Move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mov_
prefix, here and in the following functions, is misleading to new eBPF users. I understand you refer to the operation class that is used to form the operation code, but I don't think the user should have to care about that. Thoughts?
First of all, I don't want to have two disassemble APIs. 😄 I can remove PS: working on this API I start thinking about this let bpf_vm = ... // create VM of any type
RawBpfCode::new().load(...).push().store(...).push()
// or RawBpfCode::parse(...)
// or RawBpfCode::from_elf(...)
.verify().map(|verified| verified.execute_with(bpf_vm)).err().map(...); and PS2: Maybe it is better to create an |
Sure, thanks!
Well, we most probably could. But is this something we should do? This is a more OOP approach indeed, but does it make the lib easier to use? I mean, we already have the “procedural-like” approach that works well, why change it (real question, I'm open to listening to arguments)? Opening an issue about may be a good idea indeed. I am not sure anyone else will immediately take part in the discussion, but this is something we can leave open a while and see if other people express interest in it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks once more @alex-diez! This is getting better. I still have (not-so-)minor comments. In particular, it's great to have those per-instruction-class structs (they add some constraints to keep users from setting wrong fields), but this makes the code lengthy and we could probably refactor it. I believe we can put a default definition of the functions for the trait in the trait declaration, might be worth to try?
src/rust_api.rs
Outdated
@@ -0,0 +1,1561 @@ | |||
// Copyright 2016 6WIND S.A. <quentin.monnet@6wind.com> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the boilerplate, but I'm not the author of this file! ;) Just set the copyright to your own name (or company if need be).
src/lib.rs
Outdated
@@ -28,6 +28,7 @@ pub mod ebpf; | |||
pub mod helpers; | |||
mod jit; | |||
mod verifier; | |||
pub mod rust_api; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably find a better name. Of course it's Rust, all the project is ;). Not sure also we need to specify it is an API. The VM and JIT offer an API also. Maybe something like builder
or prog_build
or whatever?
Once you have picked a name, could you please:
- insert the line in the list of public module inclusions, in the alphabetical order,
- and make sure this compiles on top of the master branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code_build
, insn_build
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I don't know. I don't like code_build
much, we're not building “code”. I would accept insn_build
I think. We can keep searching for a name until we merge. Should ideally be short, for people that do not include <module>::*
but will prefer to call <module>::<function>()
a lot.
src/rust_api.rs
Outdated
BpfCode { instructions: vec![] } | ||
} | ||
|
||
pub fn mov_add(&mut self, source: Source, arch: Arch) -> Move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm not sure the mov_
prefix will help end users understanding what the function does.
src/rust_api.rs
Outdated
SwapBytes { | ||
bpf_code: self, | ||
endian: endian, | ||
insn: Insn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I should implement or derive the Default
trait for ebpf::Insn
…
src/rust_api.rs
Outdated
|
||
#[derive(Copy, Clone)] | ||
enum OpBits { | ||
Add = 0x00, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the values from ebpf
module instead: ebpf::BPF_ADD
and so on.
src/rust_api.rs
Outdated
|
||
#[derive(Copy, Clone, PartialEq)] | ||
pub enum Cond { | ||
Abs = 0x00, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…
src/rust_api.rs
Outdated
fn immediate(self, imm: i32) -> Self; | ||
} | ||
|
||
pub trait IntoBytes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, this name is way better than assemble
here!
src/rust_api.rs
Outdated
} | ||
|
||
#[inline] | ||
fn mov_internal(&mut self, source: Source, arch_bits: Arch, op_bits: OpBits) -> Move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No objection to keep mov_
for this function, though)
tests/misc.rs
Outdated
@@ -253,6 +253,8 @@ fn test_jit_block_port() { | |||
} | |||
} | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You somehow left two additional empty lines here, please remove.
src/rust_api.rs
Outdated
|
||
/// sets immediate value | ||
fn immediate(self, imm: i32) -> Self; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the names for the getters and setters are quite similar, and I find them confusing (dst
to read the value, dst_reg
to set it? And offset_bytes
seems uselessly long). Could you find something else? Maybe adding set_
prefix to the setters?
src/rust_api.rs
Outdated
} | ||
|
||
pub fn load_x(&mut self, mem_size: MemSize) -> Load { | ||
self.load_internal(mem_size, Addressing::Undef, 0x61) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BPF_MEM = 0x60 ... but I can find the meaning of 0x01 ... only BPF_LDX 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's BPF_LDX
here I guess. In ebpf
module you have for example
pub const LD_B_REG : u8 = BPF_LDX | BPF_MEM | BPF_B;
(for “load byte from register” operation), or from Linux (linux/kernel/bpf/core.c):
[BPF_LDX | BPF_MEM | BPF_B] = &&LDX_MEM_B,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had better make last changes on weekends ... Yesterday evening I saw store_x
instead of load_x
. Hallucinations 😄
src/rust_api.rs
Outdated
} | ||
|
||
pub fn store_x(&mut self, mem_size: MemSize) -> Store { | ||
self.store_internal(mem_size, 0x61) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I didn't have hallucinations yesterday 😄. Here, I didn't know what constant I should use. BPF_MEM | BPF_STX
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, maybe you simply should not have 0x61 there? I think it should be BPF_STX | BPF_MEM
, so 0x63 maybe? I haven't checked whether the tests pass if we change this value.
Forget to add changed files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you manage to implement the functions in the trait after all, that's great!
We're nearly good, I only suggest a small change for the code this time. But then we have all the warnings because of the missing documentation, could you add it for all the items that are publicly exported? I know this is not the funniest part, but that's quite mandatory for an API if we want people to use it.
src/insn_builder.rs
Outdated
buffer.push(((self.get_imm() & 0xff_00_00_00) >> 24) as u8); | ||
buffer | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two warnings are raised by this function, regarding the int values that are used for the masks on lines 79 and 83. In fact, all those masks are useless, and you can do as follows:
fn into_bytes(self) -> Self::Bytes {
let mut buffer = Vec::with_capacity(8);
buffer.push( self.opt_code_byte());
buffer.push( self.get_src() << 4 | self.get_dst());
buffer.push( self.get_off() as u8);
buffer.push((self.get_off() >> 8) as u8);
buffer.push( self.get_imm() as u8);
buffer.push((self.get_imm() >> 8) as u8);
buffer.push((self.get_imm() >> 16) as u8);
buffer.push((self.get_imm() >> 24) as u8);
buffer
}
Casting to u8
after the division automatically trims the unwanted part on the “left”.
(Of course you don't have to keep the same formating, aligning stuff is just my personal preference in such case).
src/insn_builder.rs
Outdated
@@ -0,0 +1,1389 @@ | |||
// Copyright 2017 <alex.dukhno@icloud.com> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe with your name in addition to the email?
// Copyright 2017 Alex Dukhno <alex.dukhno@icloud.com>
Ok, warnings are gone, I think we're good this time! :). The commit history on this PR is a bit chaotic though, so I intend to squash commits. Is is ok for you, or do you prefer to split your changes yourself into a new set of (logical) commits? |
Just squash it |
Done! |
100th commit, by the way |
Anniversary 😄 |
Hi, I want to learn BPF too 😄
I decided to start from instruction set. This PR was created just to share my idea how API of generating BPF instruction could look like.
I look forward to hearing your opinion about it.