0.4.0 stack usage increase? #484
Comments
It's kind of expected, as there hasn't been as much optimization done after the large changes in 0.4.0. But a 60% increase does sound like a bit much.
Similar issue. We switched last week from 0.3.6 to the latest nanopb 0.4.0 from master. We were defining PB_FIELD_16BIT in the 0.3.6 code.
That's commit 580d7c8, the last build before the merge of the new field descriptor format.
Ok, I've now looked into this a bit and the stack usage is indeed ridiculous. I'm not yet exactly sure why, but it seems to boil down to how compilers consider stack space essentially free. I'll try to get some basic fixes into 0.4.1, which should go out in February, and will look later into bringing the stack usage down further.
PetteriAimonen, thank you for the feedback.
Moving the message initialization to pb_decode_inner() avoids allocating a second pb_field_iter_t on the stack.
Gives about 15% stack usage reduction.
I have a similar question about stack usage. I am using GCC 9.2.1 with -Os and nanopb 0.4.4. When decoding a (fairly large) protobuf struct (target size ~925 bytes) into a statically allocated buffer, I see much higher stack usage than expected. This is the GDB backtrace: (gdb) bt
#0 0x10059896 in load_descriptor_values (iter=iter@entry=0x20000700 <snt_task_stack+148>) at ../../../libs/nanopb/pb_common.c:14
#1 0x100599e8 in pb_field_iter_begin (iter=iter@entry=0x20000700 <snt_task_stack+148>, desc=0x10067c44 <firmware_version_t_msg>, message=0x20001168 <profile+312>) at ../../../libs/nanopb/pb_common.c:163
#2 0x10058eb4 in pb_decode_inner (stream=stream@entry=0x20000778 <snt_task_stack+268>, fields=<optimized out>, dest_struct=<optimized out>, flags=0) at ../../../libs/nanopb/pb_decode.c:986
#3 0x10058a6c in pb_dec_submessage (field=0x20000810 <snt_task_stack+420>, stream=0x20000888 <snt_task_stack+540>) at ../../../libs/nanopb/pb_decode.c:1594
#4 decode_basic_field (stream=stream@entry=0x20000888 <snt_task_stack+540>, wire_type=wire_type@entry=PB_WT_STRING, field=field@entry=0x20000810 <snt_task_stack+420>) at ../../../libs/nanopb/pb_decode.c:451
#5 0x10058bb2 in decode_static_field (field=<optimized out>, wire_type=<optimized out>, stream=0x20000888 <snt_task_stack+540>) at ../../../libs/nanopb/pb_decode.c:538
#6 decode_field (stream=0x20000888 <snt_task_stack+540>, wire_type=<optimized out>, field=0x20000810 <snt_task_stack+420>) at ../../../libs/nanopb/pb_decode.c:792
#7 0x10059072 in pb_decode_inner (stream=stream@entry=0x20000888 <snt_task_stack+540>, fields=<optimized out>, dest_struct=<optimized out>, flags=0) at ../../../libs/nanopb/pb_decode.c:1089
#8 0x10058a6c in pb_dec_submessage (field=0x20000920 <snt_task_stack+692>, stream=0x20000998 <snt_task_stack+812>) at ../../../libs/nanopb/pb_decode.c:1594
#9 decode_basic_field (stream=stream@entry=0x20000998 <snt_task_stack+812>, wire_type=wire_type@entry=PB_WT_STRING, field=field@entry=0x20000920 <snt_task_stack+692>) at ../../../libs/nanopb/pb_decode.c:451
#10 0x10058bb2 in decode_static_field (field=<optimized out>, wire_type=<optimized out>, stream=0x20000998 <snt_task_stack+812>) at ../../../libs/nanopb/pb_decode.c:538
#11 decode_field (stream=0x20000998 <snt_task_stack+812>, wire_type=<optimized out>, field=0x20000920 <snt_task_stack+692>) at ../../../libs/nanopb/pb_decode.c:792
#12 0x10059072 in pb_decode_inner (stream=stream@entry=0x20000998 <snt_task_stack+812>, fields=<optimized out>, dest_struct=<optimized out>, flags=1) at ../../../libs/nanopb/pb_decode.c:1089
#13 0x10058a6c in pb_dec_submessage (field=0x20000a30 <snt_task_stack+964>, stream=0x20000ab0 <snt_task_stack+1092>) at ../../../libs/nanopb/pb_decode.c:1594
#14 decode_basic_field (stream=stream@entry=0x20000ab0 <snt_task_stack+1092>, wire_type=wire_type@entry=PB_WT_STRING, field=field@entry=0x20000a30 <snt_task_stack+964>) at ../../../libs/nanopb/pb_decode.c:451
#15 0x10058bb2 in decode_static_field (field=<optimized out>, wire_type=<optimized out>, stream=0x20000ab0 <snt_task_stack+1092>) at ../../../libs/nanopb/pb_decode.c:538
#16 decode_field (stream=0x20000ab0 <snt_task_stack+1092>, wire_type=<optimized out>, field=0x20000a30 <snt_task_stack+964>) at ../../../libs/nanopb/pb_decode.c:792
#17 0x10059072 in pb_decode_inner (stream=stream@entry=0x20000ab0 <snt_task_stack+1092>, fields=<optimized out>, dest_struct=dest_struct@entry=0x20001030 <profile>, flags=flags@entry=0) at ../../../libs/nanopb/pb_decode.c:1089
#18 0x100590e8 in pb_decode (stream=stream@entry=0x20000ab0 <snt_task_stack+1092>, fields=<optimized out>, dest_struct=dest_struct@entry=0x20001030 <profile>) at ../../../libs/nanopb/pb_decode.c:1165
#19 0x1005d5b0 in pb_profile_decode (dst_profile=0x20001030 <profile>, src_pb=src_pb@entry=0x2000182c <smalloc_pool> "\b\356\237\360\326\v\022\006\066\065\061\063\067e\032\004\b\001\020\t\"P\n\020\b\254\002\020(\030\310\001 \002(\024\060\003H\001\022\022\b<\020",
len=len@entry=578) at ../../../libs/libsnt/src/protobuf/profile_protobuf.c:79
<...>
Is this to be expected? Is there a way to work around this issue? Thanks!
@maximevince Yeah, it is to be expected that stack usage grows in relation to the number of submessage levels. Callbacks don't directly make a difference. In the long term it would be better to make nanopb store only the required information (buffer pointer etc.) on each recursion level, instead of the full stack frame needed by recursive C functions, but that is quite a large change. I'm not aware of any particularly effective workaround, other than changing the .proto structure to have fewer submessage levels.
@PetteriAimonen thanks for your swift reply. Now that we know the limitations, we can work with/around them. |
Recently I upgraded some code to use nanopb 0.4.0, but I'm having some problems with significantly higher stack usage with the new version. We're building this for an embedded ARM-based microcontroller (STM32F4). We're compiling with GCC, optimization level -O2.
Our compile options for nanopb are:
One example is a schema with the following definition:
We specify a maximum length in the options file:
For that reason, decoding the ItemList pb is very simple:
Here's the 5-byte buffer that is being decoded:
In my experiments, nanopb 0.4.0 used about 60% more stack than the old version we were using, going from 236 to 376 bytes of stack to decode the buffer above. The old version was a pull of master from late 2018, I believe; git describe says nanopb-0.4.0-dev-32-g580d7c8. I also ran a similar test with a much more complicated schema (see below) and got similar results: an increase from 884 bytes to 1445 bytes to decode the message.
Is this change intentional/expected? Are there any options I can set to reduce stack usage?
Schema for the more complicated message:
Options: