Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firmware too large and constant strings #574

Closed
vpelletier opened this issue Oct 20, 2022 · 18 comments
Closed

Firmware too large and constant strings #574

vpelletier opened this issue Oct 20, 2022 · 18 comments

Comments

@vpelletier
Copy link
Collaborator

vpelletier commented Oct 20, 2022

Note: I am completely new to arduino in general, so I may be missing something.

I cannot get the latest source (as of 9d80d24) to fit with all modules enabled. On its own this may not be a big issue (I guess this is why they are optional and selectable to begin with). The linker complains that the code is 2444 bytes too large.

Then I took a look at the produced binary (after disabling the NES module just to have a file to analyse) to check what is going on. And here is an extract of what I found:

$ strings Cart_Reader.ino.bin | sort | uniq -c | sort -nr | head -n30
     70 Press Button...
     44 SD Error
     36 Reset
     30 did not verify.
     28 m/|/
     28  bytes
     27 Error:
     27 A,Q,2
     26 /...
     23 (_?O
     22 /)/3'
     21 Saving to
     21 Can't create file on SD
     18 Can't open file on SD
     18 Can't open file
     17 File size exceeds flash size.
     17 Done
     16 Press left to Change
     16 and right to Select
     15 ROM Size:
     13 Verifying...
     13 Saved to
     12 Flashing file
     11 Verified OK
     11 Read ROM
     11 /_?O
     11 Erasing...
     10 ATTENTION 3.3V
      9 Writing...
      9 This will erase your

The first obvious finding is that -fmerge-constants (which is supposed to be enabled in -Os, which is used in the build) is somehow doing a terrible job. I tried editing my platform.txt to explicitly add -fmerge-constants, without any improvement.

So, how much space is used by these ? Ignoring the few strings above which are likely accidental finds by strings and not "true" strings, I get:

>>> 70 * len("Press Button... ") + 44 * len("SD Error ") + 36 * len("Reset ") + 30 * len("did not verify. ") + 28 * len(" bytes ") + 27 * len("Error: ") + 26 * len("/... ") + 21 * len("Saving to ") + 21 * len("Can't create file on SD ") + 18 * len("Can't open file on SD ") + 18 * len("Can't open file ") + 17 * len("File size exceeds flash size. ") + 17 * len("Done ") + 16 * len("Press left to Change ") + 16 * len("and right to Select ") + 15 * len("ROM Size: ") + 13 * len("Verifying... ") + 13 * len("Saved to ") + 12 * len("Flashing file ") + 11 * len("Verified OK ") + 11 * len("Read ROM ") + 11 * len("Erasing... ") + 10 * len("ATTENTION 3.3V ") +  9 * len("Writing... ") +  9 * len("This will erase your ")
6770

Of course, one copy of each strings would still be necessary, and this ignores the (low ?) probability that some of these would actually be non-constant but actually local variable initial values, so the final gain would not be strictly this much. Still, this is 3 times larger than the extra space needed to fit all modules in the available program space. Also, it obviously ignores any possible duplication from the NES module.

I do not have a magic solution. The best idea I can come up with is terrible: put all constant (including inline) strings in some header file and reference them everywhere. But at least this should not leave gcc the chance to mess things up. This will not help if these duplicate strings come from libraries, but the strings visible above seem to be rather specific to this project, so the losses from libraries seems likely to be low.

@sanni
Copy link
Owner

sanni commented Oct 21, 2022

I'm not a programmer, I only read the first ~70 pages of Kernighan's "The C programming language". Stopped reading at pointers. So the whole code base is certainly missing some optimization.

I guess the strings are not getting merged by the compiler because they are stored in the program memory instead of the SRAM since we are even lower on SRAM than on ROM.
There is a solution posted here: https://www.arduino.cc/reference/en/language/variables/utilities/progmem/
Could just use an array of strings in progmem for the most common expressions.

I'll try it out and report back.

@vpelletier
Copy link
Collaborator Author

I only read the first ~70 pages of Kernighan's "The C programming language".

That's still 70 more than I did :) .

<off-topic>BTW, thank you very much for this project. I had original N64 batteries which were magically still hanging on, keeping game saves alive after all this time. I could dump the saves, replace the batteries and write them back.</off-topic>

I've tried disabling the -f{function,data}-sections arguments to avr-gcc and avr-g++, with no change to the output.

I realised that avr-gcc is version 5.4.0, which is getting quite old now. I have no idea if constant merging has been improved since, but it makes it unlikely that anyone on the gcc side would be willing to take a look at why it does not work as expected.

Such automated de-duplication would be significantly more convenient than having to maintain a "list" (at least in the general sense, if not in the technical "array" sense) of strings in order to save space, so it is unfortunate it does not work.

@sanni
Copy link
Owner

sanni commented Oct 21, 2022

It does seem to work, just with the default modules enabled and two strings replaced I already get results:
Default -> Sketch uses 230230 bytes (90%) of program storage space.
"Reset" replaced -> Sketch uses 229518 bytes (90%) of program storage space.
"Reset" + "Press Button..." replaced -> Sketch uses 228386 bytes (89%) of program storage space.
ten most used strings replaced -> Sketch uses 225798 bytes (88%) of program storage space.

I just replace

println_Msg(F("Press Button..."));

with new function

print_STR(press_button_STR, 1);
/******************************************
  Common Strings
 *****************************************/
#define press_button_STR 0
#define sd_error_STR 1
#define reset_STR 2
#define did_not_verify_STR 3
#define _bytes_STR 4
#define error_STR 5
#define create_file_STR 6
#define open_file_STR 7
#define file_too_big_STR 8
#define done_STR 9

// This arrays holds the most often uses strings
static const char string_press_button0[] PROGMEM = "Press Button...";
static const char string_sd_error1[] PROGMEM = "SD Error";
static const char string_reset2[] PROGMEM = "Reset";
static const char string_did_not_verify3[] PROGMEM = "did not verify";
static const char string_bytes4[] PROGMEM = " bytes";
static const char string_error5[] PROGMEM = "Error:";
static const char string_create_file6[] PROGMEM = "Can't create file";
static const char string_open_file7[] PROGMEM = "Can't open file";
static const char string_file_too_big8[] PROGMEM = "File too big";
static const char string_done9[] PROGMEM = "Done";
static const char* const string_table[] PROGMEM = { string_press_button0, string_sd_error1, string_reset2, string_did_not_verify3, string_bytes4, string_error5, string_create_file6, string_open_file7, string_file_too_big8, string_done9 };

void print_STR(byte string_number, boolean newline) {
  char string_buffer[18];
  strcpy_P(string_buffer, (char*)pgm_read_word(&(string_table[string_number])));
  if (newline)
    println_Msg(string_buffer);
  else
    print_Msg(string_buffer);
}

@vpelletier
Copy link
Collaborator Author

It does seem to work, just with the default modules enabled and two strings replaced I already get results:

Cool !

For reference, here is the updated top-30 most duplicated strings with default modules (I am on HW3, which I believe affects the menu strings, also I am still on the same commit as above):

$ avr-objcopy -O binary -R .eeprom  Cart_Reader.ino.elf Cart_Reader.ino.bin
$ strings Cart_Reader.ino.bin | sort | uniq -c | sort -nr | head -n30
     62 Press Button...
     40 SD Error
     28 Reset
     28 m/|/
     27 (_?O
     26 did not verify.
     25 A,Q,2
     24  bytes
     23 Error:
     22 0x`/
     18 Can't open file on SD
     18 Can't open file
     17 File size exceeds flash size.
     17 /...
     15 Saving to
     13 Verifying...
     12 Flashing file
     11 Verified OK
     11 /_?O
     11 Erasing...
     11 Done
     11 Can't create file on SD
     11 /)/3'
     10 Press right to select
     10 Press left to change
     10 Flash ID:
     10 ATTENTION 3.3V
      9 V2H/
      9 This will erase your
      9 Select file

Default -> Sketch uses 230230 bytes (90%) of program storage space.
"Reset" replaced -> Sketch uses 229518 bytes (90%) of program storage space.

So 712 bytes saved.

>>> 27 * len("Reset ")
162

This is a much larger saving than I would have expected (keeping one copy of the string, the trailing space is just to represent the trailing NUL).

"Reset" + "Press Button..." replaced -> Sketch uses 228386 bytes (89%) of program storage space.

1120 further bytes saved

>>> 61 * len("Press Button... ")
976

The gain is again exceeding expectations. Not sure why, but I'm not complaining :) .

ten most used strings replaced -> Sketch uses 225798 bytes (88%) of program storage space.

4432 bytes saved ! This is great.

@sanni
Copy link
Owner

sanni commented Oct 21, 2022

Now 5330 bytes with default modules, see attachment.
Cart_Reader_test.zip

@vpelletier
Copy link
Collaborator Author

New top-30 with HW3 and default modules:

$ strings Cart_Reader.ino.bin | sort | uniq -c | sort -nr | head -n30
     28 m/|/
     24 A,Q,2
     23 (_?O
     17 /...
     11 Verified OK
     11 /_?O
     11 Erasing...
     11 /)/3'
     10 Press right to select
     10 Flash ID:
     10 ATTENTION 3.3V
      9 This will erase your
      9 Select file
      9 O__OoO
      9 Blankcheck
      8 Savetype Error
      8 Saved to
      8 Can't open file
      7 Writing...
      7 #Q1     #01
      7 O]_O
      7 File doesnt exist
      7 Error: Wrong Flash ID
      7 BPQ     a       q
      7 Banks:
      6 `/op
      6 K-h-
      6 g+h+i+
      6 Error
      6 Erase failed

This looks a lot cleaner: now the false-positives are taking a significant portion of the output.

I tried enabling all modules, but it still goes over the limit. And I am a bit puzzled: it goes over the limit by 4k... More than before. Maybe this is just because I was a bit behind the master branch, and something would have increased the footprint since ?

vpelletier added a commit to vpelletier/cartreader that referenced this issue Oct 22, 2022
@vpelletier
Copy link
Collaborator Author

vpelletier commented Oct 22, 2022

I have identified another way to gain ~500 bytes of program space (when all modules are enabled): make flashid an integer instead of a 5-bytes string. Still not enough to fit them all, but getting closer.

This saves program memory in 2 ways:

  • the value it is compared against can be a plain integer instead of another 5-bytes string
  • the comparison can be done as a regular integer comparison, removing the need to call strcmp. I expect this part is the biggest source of space gain.

This also saves 200 bytes of ram, which I think come from the many 5-bytes strings which used to be ram-based (so they had to be copied from their initialiser value to ram during boot, so they took 200 bytes of global space).

I pushed the change to my fork (linking to the branch as I will likely push some more). EDIT: Important note: I am not testing these changes on-hardware yet, rather getting a feeling of what impact they have on memory use.

This is applied on top of the changes you posted a few comments above.

[edit] rewording

@vpelletier
Copy link
Collaborator Author

Quick progress report before logging off: I gained about 1kB of program space. 3kB (at least) to go. My current targets are duplicated code:

  • sometimes literal duplicated code, like the same few lines repeated
  • other times code which duplicates features of the libc, or of a library (ex: in NES.ino, there are repeated calls to read() instead of using read(buffer, length))

I target the largest functions using information from objdump on the produced .elf file (so no need to disable modules to get a .bin to analyse): avr-objdump -tC Cart_Reader.ino.elf | grep -v '\.bss' | sort -rk 1.24,1.31 | less.

I exclude .bss just because it fails at being sorted properly, making the output a bit confusing. Anyway these do not take program space, only ram space, and ram is fine: I went below 80% use.

But again, I have not tested this against real hardware yet, so I probably broke some things along the way. I am worried about how I can test the changes I am accumulating, as I only have N64, GB (+ GBC) and GBA gamecarts available to dump, and no flashcart.

@sanni
Copy link
Owner

sanni commented Oct 22, 2022

I can test NES, SMS, MD and have GBA, GBC, SNES and N64 flashcarts. If there is anything I can test before you invest the time changing the code just tell me. Otherwise just do the changes and we test and fix afterwards.

@vpelletier
Copy link
Collaborator Author

Today's news (so far): I've focussed on NES.ino, and reduced the size by abou 1450 bytes. 300 bytes to go before a perfect fit with HW3 (but freeing a few more kB would be more than welcome). I pushed everything so far to my branch.

I would like you to start reviewing the changes, to at least point out mistakes I am making (breakages, coding style, bad directions, pointers to other places where similar changes would be applicable, ...), and maybe even pick a few commits for inclusion.

Should I open a merge request ? I will likely be force-pushing to it (hopefully only the few topmost commits), so this may cause unwanted notification noise.

@sanni
Copy link
Owner

sanni commented Oct 23, 2022

I found another 848 bytes by removing all 5 optional features from the display library like explained here: https://github.com/olikraus/u8g2/wiki/u8g2optimization#u8g2-feature-selection

You have to edit the library header file as setting the "without" defines in Cart_Reader.ino doesn't have the same effect.

sybdbsdyybdyb

I have no clue what these extra features do but I didn't notice anything acting different now. The font I'm using is restricted to characters from 32 to 127 so I think we are fine.

Anyway with your trim branch and the lib edit I can enable all modules on HW5:
Sketch uses 251282 bytes (98%) of program storage space. Maximum is 253952 bytes.
Global variables use 6543 bytes (79%) of dynamic memory, leaving 1649 bytes for local variables. Maximum is 8192 bytes.

@vpelletier
Copy link
Collaborator Author

Anyway with your trim branch and the lib edit I can enable all modules on HW5:

HW3 fits as well.

Sketch uses 253586 bytes (99%) of program storage space. Maximum is 253952 bytes.
Global variables use 6532 bytes (79%) of dynamic memory, leaving 1660 bytes for local variables. Maximum is 8192 bytes.

@sanni
Copy link
Owner

sanni commented Oct 23, 2022

I'm gonna be off for a little while but will test your fork later today. Can you enable "Issues" on your repo then we can track them there better.

I found one with a short test, the NES database browsing screen is missing the first letter of the name and also doesn't display the mapper data correctly

IMG_1665

The first letter appears again when browsing right/forward, disappears when browsing left/back. You can test without a NES cart inserted by just selecting any letter in the manual selection screen.

@vpelletier
Copy link
Collaborator Author

Can you enable "Issues" on your repo then we can track them there better.

Done.

the NES database browsing screen is missing the first letter of the name

Nice catch, looking into this.

@vpelletier
Copy link
Collaborator Author

vpelletier commented Oct 23, 2022

Nice catch, looking into this.

Found it, and very likely fixed it, but I found another issue: I cannot go back on the rom list on HW3 (double-click with left button). ...issue which was caused by my broken fix to the first issue. Hurray, first fix-on-fix.

@vpelletier
Copy link
Collaborator Author

Update: with my current trim branch, I am at this level, with the lcd driver options disabled:

Sketch uses 252210 bytes (99%) of program storage space. Maximum is 253952 bytes.
Global variables use 6250 bytes (76%) of dynamic memory, leaving 1942 bytes for local variables. Maximum is 8192 bytes.

This means that there is now enough headroom to re-enable these LCD driver options. While the code does not use them, I believe this is hurting ease of distribution. Especially, it makes updating dependencies cumbersome.

@vpelletier
Copy link
Collaborator Author

vpelletier commented Oct 28, 2022

Here is an update on the global ram space usage (.bss for globals without initialiser, .data for globals with initialiser, descending size down to 16 bytes as an arbitrary cutoff, some N64 stuff missing as this is from my development environment where I already got rid of it):

$ avr-objdump -tC Cart_Reader.ino.elf | grep '\.bss' | sort -rk 1.23,1.3
008014e1 l     O .bss   00000400 buf.9722
0080126a g     O .bss   00000277 .hidden sd
00800d8a g     O .bss   00000200 .hidden sdBuffer
00801918 g     O .bss   0000008c .hidden menuOptions
008010e0 g     O .bss   00000087 .hidden clockgen
00800c6a g     O .bss   00000084 .hidden filePath
00801167 g     O .bss   00000083 .hidden display
00800cee g     O .bss   00000064 .hidden fileName
0080122a g     O .bss   00000040 .hidden myFile
008011ea g     O .bss   00000040 .hidden myLog
00800c30 g     O .bss   00000030 .hidden signature
008019ab g     O .bss   00000026 .hidden folder
008010a6 g     O .bss   00000020 .hidden TwoWire::txBuffer
0080107a g     O .bss   00000020 .hidden twi_masterBuffer.lto_priv.254
00801058 g     O .bss   00000020 .hidden TwoWire::rxBuffer
00801031 l     O .bss   00000020 twi_rxBuffer
0080100f l     O .bss   00000020 twi_txBuffer
00800d60 g     O .bss   00000016 .hidden romName
00800d7a g     O .bss   00000010 .hidden iNES_HEADER
[...]
$ avr-objdump -tC Cart_Reader.ino.elf | grep '\.data' | sort -rk 1.24,1.31
00800204 l     O .data  0000008c menuOptionspceCart
00800325 l     O .data  00000035 u8x8_d_ssd1306_128x64_noname_init_seq
0080035a l     O .data  00000018 u8x8_ssd1306_128x64_noname_display_info
008003cb l     O .data  00000012 vtable for TwoWire
008003b9 g     O .data  00000012 .hidden vtable for Stream
008003a7 g     O .data  00000012 .hidden vtable for StreamFile<FsBaseFile, unsigned long long>
008003a7 g     O .data  00000012 .hidden vtable for FsFile
00800384 l     O .data  00000012 CHR
00800372 l     O .data  00000012 PRG
[...]

Descriptions grouped by theme.

Display:

  • buf.9722 is the OLED screen framebuffer, so I doubt anything can be gained here: it has to be around to be drawn on.
  • display: likewise
  • u8x8_* (HW5 likely has different names): likewise
  • menuOptions could probably be gotten rid off, but I doubt it would bring much gain in practice.

SD-card:

  • I think sd is the SdFs object from Card_Reader.ino, so similarly I do not think there is anything to gain here.
  • sdBuffer should IMHO be gotten rid of in favour of local buffers allocated at the level where block IO is actually happening. But it is used in so many places, there is still a long way to go
  • myFile: likewise
  • filePath: likewise
  • fileName: likewise
  • folder: likewise
  • myLog: if logging is infrequent and away from performance bottlenecks, maybe the logfile could be opened only while printing to the log.

Clock gen:

  • clockgen: this I have not looked in details. If this is only used to configure the clock generator board (as opposed to being needed to keep the clock running) then I think it should only be instanciated when a configuration change is being done, and this would free a significant amount of global ram.
  • TwoWire and twi_*: likewise, these are involved in talking to the clockgen

Cart IO & cart-type-specific stuff:

  • signature: Comes from GBS.ino. I'm not sure how to get rid of this, but I think it should be removed, possibly by forcing a re-read every time gbSmartGetGames is called.
  • romName: Comes from Cart_Reader.ino, used in many places. I guess this can be gotten rid of by letting each adapter manage its own rom name. But like sdBuffer & friends, it is used all over the place.
  • iNES_HEADER: Not many users, but in different parts of the source code. Still, probably easier to get rid of than most of the other globals.
  • menuOptionspceCart: This should be possible to get rid of, and is on the large side of things. EDIT: removed in Free more global ram space #599
  • CHR and PRG: Come from NES.ino. At a glance it should be possible to move to PROGMEM. EDIT: removed in Free more global ram space #599

@vpelletier
Copy link
Collaborator Author

While I think there is still more that can be done on this general flash-and-ram-usage-optimisation topic, I think this specific bug can be closed: since 11.1, I see that all modules are enabled again by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants