BPF VM stack frame size limit is too limiting for programs #13391
Comments
The BPF stack consists of fixed stack frames rather than a mutable stack pointer. The unfortunate side effect of this is that each function call uses its own stack frame. Increasing the stack frame size to 10MB would be a very large burden given that we support call depths 64 deep. There have been discussions recently about doubling the stack frame size to 8KB, since most of the reports we've gotten from developers and internally fit within 8KB. We could make it larger than 8KB to be on the safe side, but any increase in the stack frame size is applied to every call in a call chain. There are optimizations we can make to help minimize the cost, like allocating chunks of frames as needed or using a heap pool, but the sheer amount of memory being allocated per program instantiation should be taken into consideration for performance. We've also considered breaking away further from standard eBPF by changing from stack frames to stack pointers... @jstarry from your experience, is there a stack frame size that you think would meet most folks' needs?
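To put rough numbers on the per-instantiation cost described above (a back-of-the-envelope calculation, assuming the full stack region is reserved up front for every frame in the call chain): at the current 4KB frames, 64 frames × 4KB = 256KB per VM instantiation; at 8KB frames that doubles to 512KB; and a hypothetical 10MB frame would require 64 × 10MB = 640MB.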
Doubling to 8k would certainly help! I did a bit of digging and it seems the biggest hits come from rather unexpected places, like when LLVM inlines functions and when local variables are copied onto the stack again when passed in a function call.
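To illustrate that second point, a minimal sketch (`OrderBook` is a hypothetical type, sized like the Serum order book state mentioned in the issue body below): passing a large value by value typically materializes another copy of it on the stack for the call, while passing by reference costs only a pointer.

```rust
struct OrderBook {
    data: [u8; 3228], // roughly the Serum order book state size
}

// The argument is passed by value: at the machine level this typically means
// another ~3.2 KB copy of `book` on the stack for the call.
fn process_by_value(book: OrderBook) -> u8 {
    book.data[0]
}

// Only a pointer crosses the call boundary; no extra ~3.2 KB copy.
fn process_by_ref(book: &OrderBook) -> u8 {
    book.data[0]
}

fn main() {
    let book = OrderBook { data: [0u8; 3228] };
    let _ = process_by_ref(&book);
    let _ = process_by_value(book);
}
```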
@jstarry why can't you use `Box<>` to allocate on the heap?
You can, but Rust doesn't make it easy to avoid an allocation on the stack when allocating to the heap. You have to use a library or do some unsafe stuff:

```rust
use std::alloc::{alloc_zeroed, Layout};

let boxed = unsafe {
    let layout = Layout::new::<BigStruct>();
    let raw_allocation = alloc_zeroed(layout) as *mut BigStruct;
    Box::from_raw(raw_allocation)
};
```
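For contrast, a sketch of the naive approach that the unsafe code above works around (`BigStruct` is a stand-in type): `Box::new` takes its argument by value, so the value is built on the stack first and then moved into the heap allocation, unless the optimizer happens to elide the copy.

```rust
struct BigStruct {
    data: [u8; 8192],
}

fn main() {
    // Looks heap-only, but the 8 KiB value is typically constructed on the
    // stack and then moved into the Box; in a 4 KiB BPF frame this overflows
    // before the heap allocation ever helps.
    let big = Box::new(BigStruct { data: [0u8; 8192] });
    let _ = big.data[0];
}
```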
And I stopped short of recommending the unsafe approach to devs in our docs because I feel like it's an anti-pattern that will affect performance. Stack allocation should be faster if the BPF VM allows it, right?
@jstarry it will make the programs larger and more expensive to load, since it affects every frame. Can we write a macro for heap alloc?
Yeah we can, and I laid out a few other workarounds too. I think zero-copy deserialization could be helpful; I'm not sure why we haven't been doing that in SPL.
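For the record, a sketch of what such a macro could look like (hypothetical, not part of any SDK), wrapping the unsafe pattern shown earlier:

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

// Hypothetical macro: zero-initializes a value of the given type directly on
// the heap, skipping any stack-resident temporary.
macro_rules! boxed_zeroed {
    ($ty:ty) => {{
        let layout = Layout::new::<$ty>();
        // SAFETY: only sound for types where all-zero bytes are a valid value.
        unsafe {
            let raw = alloc_zeroed(layout) as *mut $ty;
            if raw.is_null() {
                handle_alloc_error(layout);
            }
            Box::from_raw(raw)
        }
    }};
}

struct BigStruct {
    data: [u8; 8192],
}

fn main() {
    let state: Box<BigStruct> = boxed_zeroed!(BigStruct);
    let _ = state.data[0];
}
```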
Porting borsh to zero-copy over account data would help a lot with stack sizes, and I guess with performance too; borsh looks like a simple serialization format. Also, some cool third-party crates like enum_dispatch may not work with scoped references out of the box, so the pattern (a match instead of dyn) would have to be implemented manually.
Experimenting with https://github.com/djkoloski/rkyv would be cool. @dzmitry-lahoda are you interested in trying it out to see what the stack usage improvements would be?
I strongly believe that zero-copy is the best approach for writing Solana programs properly. Ideally, borsh would provide the option to either deserialize onto the stack, deserialize onto the heap, or generate a zero-copy view. But if they don't go that route, then yes, rkyv is an option to research. What would be the minimal research output needed to consider rkyv accepted for use in SPL?
A working demo that deserializes a large account struct without blowing up the stack or using heap allocation would be cool.
There is no default rkyv code that works over a slice of account data... It works well only with an allocated vector. It panics (hangs the VM) or lacks trait implementations for doing serde against sliced serializers: https://github.com/dzmitry-lahoda/solana-zero-copy-serde-example/blob/290d2c36a6460e6d23dd3b4b4d889acb0c14361d/src/processor.rs#L41 So on the first attempt it does not work in the VM, and something will surely need to be implemented to make it work with a slice of account data. As of now, rkyv generates wrappers around the raw data (hopefully without copying it; why is a zero-copy API so insistent on an allocator?). That wrapper provides plain struct field access, not a view via functions. So if it is zero-copy, then data on storage = data in memory (because no view is generated). Not sure I like …
Hey @djkoloski 👋🏻 we are exploring the use of rkyv and it looks really promising, but it's not clear how to leverage zero-copy deserialization over a slice of data. Can you provide some pointers?
Hey all, thanks for the interest in rkyv. Let me know if I missed addressing anything:

Responses

Yeah, this is annoying for stuff like …

I'm the first to admit that …

All of the work rkyv does is at serialization time, which makes sense as it's a zero-copy deserialization framework. 🙂 You can read more about how it works in the book. As a quick summary: regular types aren't suitable for zero-copying if they contain any data, like pointers, that may change from execution to execution. rkyv generates "archived" variants of types that are made of types that act the same but are zero-copy safe. Then when you serialize a regular type, rkyv converts it to that archived variant and writes that to your output stream. The process of getting data back out is as simple as a pointer cast to the archived type.

There are a few situations that can cause this to happen, especially surrounding buffer alignment. As of 0.4.3 there should be some checks verifying that, but they're behind a …

Sample code

Here's an example of serializing a slice:

```rust
let my_slice = // ...
let writer = // ...
let mut serializer = WriteSerializer::new(writer);
let pos = serializer.serialize_unsized_value(my_slice)?;
```

And accessing it:

```rust
// Make sure that your data buffer is properly aligned for the types inside
let data = // ...
let archived = unsafe { archived_unsized_value::<[MySliceType]>(data, pos) };
```

If you have any other questions, feel free to ping me or hop in the discord and ask some questions.
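On the alignment caveat in the snippet above, a small plain-Rust sketch (not an rkyv API; rkyv's optional validation does far more thorough checking) of the kind of guard a program could run before the unsafe cast:

```rust
/// Returns true if `data` is suitably aligned for zero-copy access to `T`.
/// Plain Rust, not part of rkyv.
fn is_aligned_for<T>(data: &[u8]) -> bool {
    (data.as_ptr() as usize) % std::mem::align_of::<T>() == 0
}

fn main() {
    let buf = vec![0u8; 64];
    // Bail out before any unsafe pointer cast if the buffer is misaligned.
    assert!(is_aligned_for::<u64>(&buf));
}
```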
Produce zero copy view

How to mutate state?

Views are immutable. As I understand from https://davidkoloski.me/rkyv/architecture/deserialize.html , mutating works like this: …

This looks a little complicated and low level (for something that should be verified by auditors of smart contract code, maybe).

Mutation ideal

Simplified versions of the steps above (some wrappers, perhaps written by Solana) could make this easier. How easy is it to do with … But I thought of some traits generated for each …
For mutation, you can also use the pinned mutation API:

```rust
use std::pin::Pin;
use rkyv::{archived_value_mut, Archive, Archived, Deserialize, Serialize};
use solana_program::account_info::AccountInfo;

#[derive(Archive, Serialize, Deserialize)]
pub struct AccountState {
    pub number: i32,
    pub string: String,
}

impl ArchivedAccountState {
    // i32 is Unpin, so we can choose `number` to have non-structural pinning
    // We could choose to have structural pinning instead if we wanted
    pub fn number_pin(self: Pin<&mut Self>) -> &mut i32 {
        unsafe { &mut self.get_unchecked_mut().number }
    }

    // ArchivedString is not Unpin, so `string` must have structural pinning
    pub fn string_pin(self: Pin<&mut Self>) -> Pin<&mut Archived<String>> {
        unsafe { self.map_unchecked_mut(|s| &mut s.string) }
    }
}

fn execute(account_info: &AccountInfo) {
    let mut data = account_info.data.try_borrow_mut().unwrap();
    let buf = &mut data[..];
    // Not sure where you'd store the root object position, but there are some
    // alternative ways to do this that don't require storing the data pos
    let pos = account_info.data.pos();
    let mut value = unsafe { archived_value_mut::<AccountState>(Pin::new(buf), pos) };
    *value.as_mut().number_pin() = 42;
    println!("archived string: {}", value.string_pin());
}
```

I haven't tested this 100%, so my apologies if anything doesn't work out of the box. If you use the mutable API for rkyv, you should always have structural pinning for fields that are not Unpin.
@djkoloski Hi, glad to see the maintainer of rkyv here. :) rkyv has been on my watch list for a while. If we can adapt rkyv in some way, it'll be huge. :)
I'm wondering if we could realize this by combining Rust's unstable allocator_api (rust-lang/rust#32838). My idea is that Archived could be type-parameterized, with something like an ArchiveAllocator. It could then lend out an (mmap-backed) rkyv-controlled memory region that is basically convertible to rkyv's RelativePtr on collection re-allocation, etc. Some space inefficiency/fragmentation would be acceptable. Can this wild idea come true, in the eyes of its actual author? It would allow for a zero-copy growable …
@djkoloski thanks. @jstarry it works for deeply layered structures: https://github.com/dzmitry-lahoda/solana-zero-copy-serde-example/tree/dz/zerocopy . borsh overflows (see the comment). It may seem possible to wrap … I dislike the need to align to 8 bytes (it wastes 7 bytes per instruction input). Also, because the structs are generated, my rust-analyzer in VS Code is not very friendly while typing, but I hope those issues will be fixed in future Rust tooling. Another issue: how do we provide the memory layout to external (non-Rust) consumers (could C headers be generated from the generated structs?)
Minor update: you should be able to use the …
Other concerns: …
@dmakarov Would you work up what would be necessary to increase the stack size?
Getting some requests for an 8KB stack size, think we could prioritize this?
Yes, I'll work on this on Monday.
How about we drop the fixed stack frame size altogether, so that the reserved stack area can be utilized better? IIRC the only reason we use aligned stack frames is for easier debugging.
Fixed BPF stack frames are part of the BPF arch spec but not ideal for our use case for many reasons. We've agreed that a single stack region with a mutable stack pointer, like more current architectures use, is the way we want to go. The decision on when that happens comes down to how far we want to diverge from BPF, and when. @dmakarov do you have an idea of what it would take to modify the BPF backend to make the switch from fixed-size stack frames to a mutable stack pointer?
We'd have to implement the emitPrologue/emitEpilogue methods in the BPFFrameLowering class (currently these are empty functions). Maybe some other minor changes are needed. Other than that I don't anticipate big changes.
This kind of architectural change should coincide with our move from BPF to SBF as a target architecture. @dmakarov any objection to adding this to the list of changes that are needed when we make that move?
No objections. I think you're right.
See parent issue #20323: "Variable stack pointer instead of fixed stack frames" is done.
@alessandrod can we close this as you have enabled variable size stacks?
This is done but not available yet, since it's part of SBFv2 which hasn't been rolled out yet. Up to you if you want to close or leave open until rollout!
This is resolved by solana-labs/rbpf#274 Closing this for now.
Problem
The BPF VM limits stack frames to 4KB, which can easily be exceeded by non-trivial programs. There are workarounds for this limit, but they sacrifice developer ergonomics and program efficiency. Since accounts can be up to 10MB in size, it's not unreasonable to think devs will want stack-allocated objects that use a moderate portion of that limit. For instance, the Serum order book state is 3228 bytes. Processing more than one order book at the same time in a program becomes impossible without heap allocations or breaking the state up into smaller views.
Workarounds:

1. Heap-allocate large objects. This is a big headache for Rust programs, since Rust allocates on the stack by default and doesn't yet provide an easy way to avoid stack allocations when allocating on the heap (see the sketch after this list):
https://docs.rs/bytemuck/1.4.1/bytemuck/allocation/fn.try_zeroed_box.html
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c78da94118e5e09925138548cd7093cb
2. Deserialize accounts partially. Rather than deserializing the full account state, programs can be more efficient about the data they deserialize to reduce stack allocations and instruction count. There isn't a convention in the Rust SDK yet that makes this easy.
3. Use zero-copy deserialization. Same as 2) above, this would probably be a good pattern to promote, but there isn't a convention or tooling that makes this easy yet.
4. Restructure program logic. There are probably lots of small tricks that could be done to break up program logic to minimize stack allocations. Rust programs can eagerly drop stack-allocated objects by using nested scopes, or split up logic into smaller functions to avoid accumulating too many allocations.
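A minimal sketch of the first workaround using the bytemuck helper linked above (`BigState` is a hypothetical type; `bytemuck::allocation` requires the crate's extern_crate_alloc feature):

```rust
use bytemuck::{allocation::try_zeroed_box, Zeroable};

#[repr(C)]
struct BigState {
    data: [u8; 3228], // e.g. Serum order-book sized account state
}

// SAFETY: an all-zero byte pattern is a valid BigState.
unsafe impl Zeroable for BigState {}

fn main() {
    // Zero-initializes directly on the heap; the 3228-byte value is never
    // materialized on the stack first.
    let state: Box<BigState> = try_zeroed_box().expect("allocation failed");
    assert_eq!(state.data[0], 0);
}
```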
Proposed Solution
Increase stack frame size considerably
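For context, a sketch of where such a change would land, assuming the rbpf VM's `Config` exposes `stack_frame_size` and `max_call_depth` as in recent solana_rbpf versions (field names and defaults may differ by version):

```rust
use solana_rbpf::vm::Config;

fn main() {
    // Hypothetical: doubling the per-frame size from 4 KiB to 8 KiB.
    // The total reserved stack is stack_frame_size * max_call_depth.
    let config = Config {
        stack_frame_size: 8 * 1024,
        ..Config::default()
    };
    assert_eq!(config.stack_frame_size, 8192);
}
```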
tag @jackcmay