Keeping data small

When many applets are compiled into busybox, the rw data and bss of
all applets are concatenated, including those from libc if a static
busybox is built. When busybox is started, _all_ of this data is
allocated, not just the part belonging to the selected applet.

What "allocated" means exactly depends on the arch.
On NOMMU it probably bites the most, actually using real
RAM for rwdata and bss. On i386, bss is lazily allocated
by COWed zero pages. Not sure about rwdata - also COW?

In order to keep busybox friendly to NOMMU and small-memory systems,
we should avoid large global data in our applets, and should
minimize usage of libc functions which implicitly use
such structures.

Small experiment to measure "parasitic" bbox memory consumption:
here we start 1000 "busybox sleep 10" in parallel.
The busybox binary is practically an allyesconfig static one,
built against uclibc. Run on an x86-64 machine with a 64-bit kernel:

	bash-3.2# nmeter '%t %c %m %p %[pn]'
	23:17:28 .......... 168M    0  147
	23:17:29 .......... 168M    0  147
	23:17:30 U......... 168M    1  147
	23:17:31 SU........ 181M  244  391
	23:17:32 SSSSUUU... 223M  757 1147
	23:17:33 UUU....... 223M    0 1147
	23:17:34 U......... 223M    1 1147
	23:17:35 .......... 223M    0 1147
	23:17:36 .......... 223M    0 1147
	23:17:37 S......... 223M    0 1147
	23:17:38 .......... 223M    1 1147
	23:17:39 .......... 223M    0 1147
	23:17:40 .......... 223M    0 1147
	23:17:41 .......... 210M    0  906
	23:17:42 .......... 168M    1  147
	23:17:43 .......... 168M    0  147

This requires 55M of memory (usage rises from 168M to 223M). Thus one
trivial busybox applet takes 55k of memory on a 64-bit x86 kernel.

On a 32-bit kernel we need ~26k per applet.

Script:

	i=1000; while test $i != 0; do
		echo -n .
		busybox sleep 30 &
		i=$((i - 1))
	done
	echo
	wait

(Data from NOMMU arches are sought. Provide 'size busybox' output too.)


Example 1

One example of how to reduce global data usage is in
archival/libunarchive/decompress_unzip.c:

	/* This is somewhat complex-looking arrangement, but it allows
	 * to place decompressor state either in bss or in
	 * malloc'ed space simply by changing #defines below.
	 * Sizes on i386:
	 * text    data     bss     dec     hex
	 * 5256       0     108    5364    14f4 - bss
	 * 4915       0       0    4915    1333 - malloc
	 */
	#define STATE_IN_BSS 0
	#define STATE_IN_MALLOC 1

(see the rest of the file to get the idea)

This example completely eliminates globals in that module.
The required memory is allocated in unpack_gz_stream() [its main module]
and then passed down as a parameter to all subroutines which need
to access the 'globals'.

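The parameter-passing approach can be sketched roughly as follows.
This is only an illustration, not the code from decompress_unzip.c:
state_t, emit_byte() and unpack_demo() are hypothetical names, and
plain calloc() stands in for whatever allocation helper the real
code uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical decompressor state. In a "globals" design these
 * fields would be static variables living in bss. */
typedef struct state_t {
	unsigned bytes_out;        /* running count of bytes produced */
	unsigned char window[64];  /* small scratch buffer */
} state_t;

/* Subroutines get the state as an explicit parameter
 * instead of touching global variables. */
static void emit_byte(state_t *state, unsigned char c)
{
	state->window[state->bytes_out % sizeof(state->window)] = c;
	state->bytes_out++;
}

/* Entry point: allocates the state, passes it down, frees it.
 * Returns the number of bytes processed, or -1 if allocation fails. */
int unpack_demo(const char *src)
{
	state_t *state = calloc(1, sizeof(*state));
	int total;

	assert(src != NULL);
	if (!state)
		return -1;
	while (*src)
		emit_byte(state, (unsigned char)*src++);
	total = (int)state->bytes_out;
	free(state);
	return total;
}
```

Nothing in this sketch occupies bss or rwdata; the state exists only
while unpack_demo() is running.
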

Example 2

In case you don't want to pass this additional parameter everywhere,
take a look at archival/gzip.c. Here all global data is replaced by
a single global pointer (ptr_to_globals) to allocated storage.

In order to not duplicate ptr_to_globals in every applet, you can
reuse a single common one. It is defined in libbb/messages.c
as struct globals *const ptr_to_globals, but the struct globals is
NOT defined in libbb.h. You first define your own struct:

	struct globals { int a; char buf[1000]; };

and then define G as the struct which ptr_to_globals points to:

	#define G (*ptr_to_globals)

ptr_to_globals is declared as a constant pointer.
This helps gcc understand that it won't change, resulting in noticeably
smaller code. In order to assign it, use the SET_PTR_TO_GLOBALS macro:

	SET_PTR_TO_GLOBALS(xzalloc(sizeof(G)));

Typically it is done in <applet>_main().

Now you can reference "globals" by G.a, G.buf and so on, in any function.

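To make the moving parts visible in one place, here is a
self-contained sketch of the 'G.' trick. It makes simplifying
assumptions: ptr_to_globals is a plain, non-const static pointer, the
SET_PTR_TO_GLOBALS defined here is a trivial stand-in rather than
busybox's real macro, calloc() stands in for xzalloc(), and
fill_buf() and demo_main() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct globals {
	int a;
	char buf[1000];
};

/* Simplified stand-ins (assumption: NOT the libbb definitions). */
static struct globals *ptr_to_globals;
#define G (*ptr_to_globals)
#define SET_PTR_TO_GLOBALS(x) do { ptr_to_globals = (x); } while (0)

/* Any function can reach the "globals" through G. */
static void fill_buf(const char *s)
{
	/* Last byte of buf stays 0 (storage was zeroed), so buf is
	 * always NUL-terminated. */
	strncpy(G.buf, s, sizeof(G.buf) - 1);
	G.a++;
}

/* Plays the role of <applet>_main(): allocate zeroed storage first,
 * then use G everywhere. Returns -1 if allocation fails. */
int demo_main(void)
{
	struct globals *g = calloc(1, sizeof(G));
	int result;

	if (!g)
		return -1;
	SET_PTR_TO_GLOBALS(g);
	assert(G.a == 0);
	fill_buf("abc");
	fill_buf("hello");
	result = G.a + (int)strlen(G.buf);  /* 2 calls + strlen("hello") */
	free(ptr_to_globals);
	ptr_to_globals = NULL;
	return result;
}
```

The only object left in bss is the pointer itself; the 1004 bytes of
struct globals are allocated when the applet actually runs.
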

bb_common_bufsiz1

There is one big common buffer in bss - bb_common_bufsiz1. It is a much
earlier mechanism to reduce bss usage. Each applet can use it for
its needs. Library functions are prohibited from using it.

The 'G.' trick can be done using bb_common_bufsiz1 instead of a malloced
buffer:

	#define G (*(struct globals*)&bb_common_bufsiz1)

Be careful, though, and use it only if globals fit into bb_common_bufsiz1.
Since bb_common_bufsiz1 is BUFSIZ + 1 bytes long and BUFSIZ can change
from one libc to another, you have to add a compile-time check for it:

	if (sizeof(struct globals) > sizeof(bb_common_bufsiz1))
		BUG_<applet>_globals_too_big();

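A standalone sketch of the same idea is shown here. It is not busybox
code: common_buf is a hypothetical local stand-in for
bb_common_bufsiz1, the struct fields and helper functions are made up,
and the size check uses C11 _Static_assert in place of the
BUG_<applet>_globals_too_big() call shown above. Overlaying a struct
on a char buffer also assumes the compiler tolerates this kind of
type punning (for gcc, building with -fno-strict-aliasing is the
conservative choice).

```c
#include <assert.h>
#include <stdio.h>	/* BUFSIZ */
#include <string.h>

/* Hypothetical stand-in for bb_common_bufsiz1, aligned so that
 * a struct can be overlaid on it. */
static _Alignas(long long) char common_buf[BUFSIZ + 1];

struct globals {
	int dev_fd;
	unsigned sector;
	char name[64];
};
#define G (*(struct globals *)&common_buf)

/* Compile-time check (C11): the build fails if struct globals
 * ever outgrows the buffer, e.g. with a libc whose BUFSIZ is small. */
_Static_assert(sizeof(struct globals) <= sizeof(common_buf),
	"struct globals does not fit into common_buf");

/* The buffer is shared, so clear our part before first use. */
void applet_init(int fd)
{
	assert(fd >= 0);
	memset(&G, 0, sizeof(G));
	G.dev_fd = fd;
}

unsigned next_sector(void)
{
	return ++G.sector;
}

int current_fd(void)
{
	return G.dev_fd;
}
```

Unlike the malloc variant, nothing is allocated at run time, but the
applet must not call anything else that also uses the common buffer.
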

Drawbacks

You have to initialize the globals by hand. xzalloc() can be helpful in
clearing allocated storage to 0, but anything more must be done by hand.

All global variables are prefixed by 'G.' now. If this makes code
less readable, use #defines:

	#define dev_fd (G.dev_fd)
	#define sector (G.sector)

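Both points can be seen in a small sketch. It assumes a plain static
pointer as a stand-in for ptr_to_globals and calloc() in place of
xzalloc(); init_globals(), advance() and get_dev_fd() are hypothetical
helpers. Note that the alias #defines have to come after the struct
definition, otherwise the member names themselves would be
macro-expanded.

```c
#include <assert.h>
#include <stdlib.h>

struct globals {
	int dev_fd;
	unsigned sector;
};

/* Plain-pointer stand-in (assumption: not the libbb definition). */
static struct globals *ptr_to_globals;
#define G (*ptr_to_globals)

/* Aliases: defined only after struct globals is complete. */
#define dev_fd (G.dev_fd)
#define sector (G.sector)

/* Zeroed storage covers "sector = 0"; any other initial value,
 * such as dev_fd = -1, has to be assigned by hand. */
int init_globals(void)
{
	ptr_to_globals = calloc(1, sizeof(G));
	if (!ptr_to_globals)
		return -1;
	assert(sector == 0);
	dev_fd = -1;
	return 0;
}

/* The rest of the code reads as if these were ordinary globals. */
unsigned advance(unsigned n)
{
	sector += n;
	return sector;
}

int get_dev_fd(void)
{
	return dev_fd;
}
```
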

Word of caution

If an applet doesn't use much global data, converting it to use
one of the above methods is not worth the resulting code obfuscation.
If you have less than ~300 bytes of global data - don't bother.


gcc's data alignment problem

The following attributes added in vi.c:

	static int tabstop;
	static struct termios term_orig __attribute__ ((aligned (4)));
	static struct termios term_vi __attribute__ ((aligned (4)));

reduce bss size by 32 bytes, because gcc sometimes aligns structures to
ridiculously large values. The asm output diff for the above example:

	 tabstop:
	        .zero   4
	        .section        .bss.term_orig,"aw",@nobits
	-       .align 32
	+       .align 4
	        .type   term_orig, @object
	        .size   term_orig, 60
	 term_orig:
	        .zero   60
	        .section        .bss.term_vi,"aw",@nobits
	-       .align 32
	+       .align 4
	        .type   term_vi, @object
	        .size   term_vi, 60

gcc doesn't seem to have options for altering this behaviour.

gcc 3.4.3 and 4.1.1 tested:

	char c = 1;
	// gcc aligns to 32 bytes if sizeof(struct) >= 32
	struct {
		int a,b,c,d;
		int i1,i2,i3;
	} s28 = { 1 };    // struct will be aligned to 4 bytes
	struct {
		int a,b,c,d;
		int i1,i2,i3,i4;
	} s32 = { 1 };    // struct will be aligned to 32 bytes
	// same for arrays
	char vc31[31] = { 1 }; // unaligned
	char vc32[32] = { 1 }; // aligned to 32 bytes

-fpack-struct=1 reduces alignment of s28 to 1 (but probably
will break layout of many libc structs), but s32 and vc32
are still aligned to 32 bytes.

I will try to cook up a patch to add a gcc option for disabling it.
Meanwhile, this is where it can be disabled in gcc source:

gcc/config/i386/i386.c

	int
	ix86_data_alignment (tree type, int align)
	{
	#if 0
	  if (AGGREGATE_TYPE_P (type)
	       && TYPE_SIZE (type)
	       && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
	       && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 256
	           || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 256)
	    return 256;
	#endif

Result (non-static busybox built against glibc):

	# size /usr/srcdevel/bbox/fix/busybox.t0/busybox busybox
	   text    data     bss     dec     hex filename
	 634416    2736   23856  661008   a1610 busybox
	 632580    2672   22944  658196   a0b14 busybox_noalign



Keeping code small

Set CONFIG_EXTRA_CFLAGS="-fno-inline-functions-called-once",
run "make bloatcheck", and look at the biggest auto-inlined functions.
Then set CONFIG_EXTRA_CFLAGS back to "", but add NOINLINE
to some of these functions. In the 1.16.x timeframe, the results were
(annotated "make bloatcheck" output):

	function             old     new   delta
	expand_vars_to_list    -    1712   +1712 win
	lzo1x_optimize         -    1429   +1429 win
	arith_apply            -    1326   +1326 win
	read_interfaces        -    1163   +1163 loss, leave w/o NOINLINE
	logdir_open            -    1148   +1148 win
	check_deps             -    1148   +1148 loss
	rewrite                -    1039   +1039 win
	run_pipe             358    1396   +1038 win
	write_status_file      -    1029   +1029 almost the same, leave w/o NOINLINE
	dump_identity          -     987    +987 win
	mainQSort3             -     921    +921 win
	parse_one_line         -     916    +916 loss
	summarize              -     897    +897 almost the same
	do_shm                 -     884    +884 win
	cpio_o                 -     863    +863 win
	subCommand             -     841    +841 loss
	receive                -     834    +834 loss

855 bytes saved in total.

scripts/mkdiff_obj_bloat may be useful to automate this process: run
"scripts/mkdiff_obj_bloat NORMALLY_BUILT_TREE FORCED_NOINLINE_TREE"
and select modules which shrank.

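
At source level, "adding NOINLINE" amounts to something like the
sketch below. The NOINLINE macro is defined locally here for
illustration, on the assumption that the compiler is gcc or a
compatible one; checksum() and checksum_str() are hypothetical
functions, not busybox code. Whether a given function becomes a "win"
or a "loss" still has to be measured as described above.

```c
#include <assert.h>

/* Local definition for this sketch only (assumes a gcc-compatible
 * compiler; expands to nothing elsewhere). */
#if defined(__GNUC__)
# define NOINLINE __attribute__((__noinline__))
#else
# define NOINLINE
#endif

/* A static helper with a single call site: the kind of function
 * gcc may inline automatically. NOINLINE keeps it out of line. */
static NOINLINE unsigned checksum(const unsigned char *p, unsigned len)
{
	unsigned sum = 0;

	while (len--)
		sum = sum * 31 + *p++;
	return sum;
}

/* The only caller of checksum(). */
unsigned checksum_str(const char *s)
{
	unsigned len = 0;

	assert(s != 0);
	while (s[len])
		len++;
	return checksum((const unsigned char *)s, len);
}
```
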