Constant strings (and other data) moved to flash memory #697
Comments
There were some good ideas here: SuperHouse/esp-open-rtos#11 I came across a semi-similar thing when looking at porting to Arduino, and in the end I gave up. IMO it's just too much effort to wrap every function call this way (not to mention it's a massive source of potential bugs). It's kind of frustrating the toolchain can't deal with it in a better way. Since Espruino has its own implementation of the string functions (on properly embedded systems quite a few of the built-in functions pull in stuff that really bloats the binary) perhaps you could just reimplement those functions so that they do aligned 32 bit reads, then all strings could go into Flash and you'd be fine? |
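A minimal sketch of the "aligned 32 bit reads" idea, assuming a little-endian target and flash that is readable past the end of the string up to the next word boundary. The names `flash_read_byte` and `flash_strlen` are hypothetical and are not taken from the Espruino source.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one byte using only an aligned 32-bit load.
 * Assumption: little-endian byte order, as on the ESP8266. */
static uint8_t flash_read_byte(const void *addr) {
  uintptr_t a = (uintptr_t)addr;
  /* Round the address down to the containing 32-bit word. */
  const uint32_t *word = (const uint32_t *)(a & ~(uintptr_t)3);
  /* Shift the wanted byte lane down and truncate to 8 bits. */
  return (uint8_t)(*word >> ((a & 3u) * 8u));
}

/* strlen() replacement that never issues a byte-wide load. */
static size_t flash_strlen(const char *s) {
  size_t n = 0;
  while (flash_read_byte(s + n) != 0) n++;
  return n;
}
```

Every byte costs a full word load plus a shift, which is the "slow and a bit bloated" trade-off mentioned in this thread.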
... just to add that as far as I can gather, since the ESP8266 isn't harvard architecture you don't have to worry about code meant for strings being used for something else because it will 'just work'. It actually makes things a lot easier. As that post you found says, you could just make every read an aligned 32 bit read and everything would be fine, if slow and a bit bloated. For now, there's some 'low hanging fruit':
Those could be put in flash pretty easily and aligned reads could be performed on them, since the only stuff that reads them is in those files. |
Are we sure that the ESP8266 isn't a Harvard Architecture? I believe that the processor core in an ESP8266 is an Xtensa-106 ... and this business wire about its announcement says: Using a traditional Harvard architecture, it features separate local, tightly coupled, instruction and data RAMs to eliminate memory contention and provide fast performance on performance-critical code and interrupt handling routines. RAM size is user selectable up to 128K bytes. |
Interesting... I only said that because that's what @tve had said in Gitter IIRC? Some processors (like ARM) effectively have separate memories (Flash + RAM) on (I think) different busses, but have a kind of 'bus matrix' that allows them both to be accessed in the same address space. I guess the Xtensa may do something similar. It's actually pretty impressive that Espruino works as well as it does since it accesses flash memory so much for the symbol tables - I guess the extra RAM really helps. |
At this point I am labeling this as an optimization because the amount of free RAM is quite fine and not a top-priority. The open work items I believe are:
|
I'd been vaguely wondering about moving all the error messages/warnings to a separate file: #50 It'd help with localisation, might help to reduce duplication (the compiler may merge the same things, but some differ only by a character or two), and as these things don't have to be accessed quickly, we could potentially store them compressed, decompress the whole thing on demand and just pull out the relevant bits. Not an issue on ESP8266, but if you've got 128kB of flash and 20kB of it is strings, it could help a lot. |
@tve targets/esp8266/esp8266_board_utils.c These strings are used in macros that use os_printf. Am I correct in assuming that anything that uses os_printf is ok to use from FLASH_STR, so it does not need to be copied to ram first? If this is the case, I would like to work with the following list and see if I can increase the available heap space. The change above yields 144 bytes; it does not seem like much, but if I can do a few modules it will add up... Next target was the string in ota.c. Here is the output from topstrings: 765 ./src/jsinteractive.o (in the makefile I had to change .rodata.str1.4 to .rodata.str1.1) - should I push this as an update - I'm not sure of the cause of this |
This ought to crash 'cause os_printf doesn't copy every arg to ram, only the first one. |
Make sure you compile with I'd really recommend using the exception handling code that got linked somewhere else recently. It'd make life a million times easier, especially for the os_printf case where speed is not an issue. It'd also avoid completely screwing up the Espruino codebase for the sake of ESP8266 - I won't accept PRs that make code unreadable for the sake of one platform :( By the way: I'm still not convinced about topstrings' accuracy - |
Thanks for pointing out jswrapper.o. Taking a look at it, some strings end up in Overall I see four options for reducing the size:
I have to admit I like a combination of 1 and 3 the best, although 1 and 4 are also appealing. I have a hard time believing that 2 can work reasonably (hey, but someone can prove me wrong :-). |
I think #1 is a really good start (maybe with #2 as a fallback for non-critical places) - #3 is more difficult - |
Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:
This brings rodata from 17312 to 14492 bytes, so that's how much room is left for improvement :-) |
Ah, I hadn't seen e.setBootCode, I'll have to give that a try. The 2800 bytes potentially saved by jswSymbols are also very attractive... |
Mhh, another approach for jswrapper.c could be to change JswSymPtr and JswSymList to only have word-size & aligned fields and then move all the static data in jswrapper.c into flash. That could easily yield 5KB of total savings, if not more... It might also reduce the changeset. |
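A sketch of what "only word-size & aligned fields" could look like. The struct below is a hypothetical layout for illustration, not the real `JswSymPtr` definition; the field names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry in which every field is a full word,
 * so each field can be fetched with one aligned 32-bit load from flash. */
typedef struct {
  uint32_t strOffset;         /* offset of the symbol name in the string table */
  uint32_t functionSpec;      /* encoded argument and return types */
  void (*functionPtr)(void);  /* native implementation */
} WordSymPtr;

/* Compile-time checks that no field straddles or misses a word boundary. */
_Static_assert(offsetof(WordSymPtr, functionSpec) % 4 == 0, "field not word aligned");
_Static_assert(offsetof(WordSymPtr, functionPtr) % 4 == 0, "field not word aligned");
_Static_assert(sizeof(WordSymPtr) % 4 == 0, "entry size not a multiple of 4");
```

Widening 16-bit fields to 32 bits makes each entry larger in flash, so this trades flash space for RAM and for simpler access code.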
It's pretty recent (this week I think?). There is some room for re-using substrings in functions but that's a bit dodgy if you can't ensure that the memory area will stay around...
Sounds good :) Most of the changes should be inside jswrapper.c... Since most stuff that accesses the tables is in jswrapper, you could almost skip the alignment too. |
Won't the code that accesses the strings also need to be modified? These could also go into flash:
Are the alignments of the |
This kind of function call is used throughout the code base:
I've made a wrapper for jsvObjectSetChildAndUnLock that declares the string literal in flash memory, and then copies to a static buffer before the call to the real jsvObjectSetChildAndUnLock.
@gfwilliams it doesn't corrupt the code base too much... you might be able to suggest something that will clean this up further:
This is using the __COUNTER__ macro, so for the code example: becomes:
and the next line would This takes away the headache of having to declare all the strings beforehand. Where this doesn't work is when it's part of an if statement, or the 2nd arg is not a string literal - so I had to introduce a macro to stop substitution from occurring:
This is the part that you might not like Gordon! Anyway, I've applied through the code base: The saving is (without graphics with crypto): Before: After: So 608 bytes of heap freed up. |
@tve this looks very promising.... What changes need to be made in jswrap_object.c to access the strings in flash? Sorry - I don't know the code well enough to know where to start looking ;-( |
Line 66 in 2d30d81
Line 248 in db470b6
Will this be sufficient? Will the Will the I tried above (Without the aligned(4), however the elf is now too big:
|
taking out the PACKED_FLAGS, the structures above, and changing in jswrapper:
So still no closer ;-( |
Yeah, it's not ideal - and honestly I doubt most people adding code in the future will fully appreciate why they need In fact, I don't understand why it's in places like this? master...wilberforce:master#diff-1e3672c0223d1668a1afdbb00fffe1f2R137 I really think it'd be worth trying to add the exception handler, and then changing the linker to just dump all strings into flash. My guess is performance wouldn't be hit anywhere near as badly as you think, and the few places where it was hit could be fixed, rather than having to change all code (and even then, we're still missing loads of strings that aren't used via
That'd be a really bad idea :) I'm not sure why changing it like you did above wouldn't work though - but the fact it doesn't reset is a good sign :) My guess is that |
It is places like this:
Because the first part of the macro expands to Which declares the literal, and the second part is the function call. The declaration does not work because the if statement is preceding. If the statement was rewritten to use a block, I guess it would work:
.. So if I put the { } in the macro expansion, then it would be transparent. |
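A sketch of the wrapper with the braces inside the macro expansion, so that it also works as the body of an unbraced `if`. Everything here is a stand-in: `set_child_real` plays the role of `jsvObjectSetChildAndUnLock`, `FLASH_STR_ATTR` is left empty for a host build (on the ESP8266 it would be a section and alignment attribute), and the plain `memcpy` would need to be an aligned word-wise copy on the real target.

```c
#include <stddef.h>
#include <string.h>

#ifndef FLASH_STR_ATTR
#define FLASH_STR_ATTR /* assumption: flash section attribute on the real target */
#endif

#define FLASH_CAT2(a, b) a##b
#define FLASH_CAT(a, b) FLASH_CAT2(a, b)

static char ram_copy[32];   /* static buffer the literal is copied into */
static char last_name[32];  /* records what the stand-in callee received */

/* Hypothetical stand-in for the real function being wrapped. */
static void set_child_real(const char *name, int value) {
  (void)value;
  strncpy(last_name, name, sizeof(last_name) - 1);
  last_name[sizeof(last_name) - 1] = '\0';
}

/* Copy a literal (including its terminator) into the static RAM buffer. */
static const char *flash_to_ram(const char *src, size_t len) {
  if (len > sizeof(ram_copy)) len = sizeof(ram_copy);
  memcpy(ram_copy, src, len);
  ram_copy[sizeof(ram_copy) - 1] = '\0';
  return ram_copy;
}

/* do { } while (0) keeps the declaration and the call in one statement. */
#define SET_CHILD_IMPL(name_lit, value, id)                                   \
  do {                                                                        \
    static const char FLASH_CAT(flash_str_, id)[] FLASH_STR_ATTR = name_lit;  \
    set_child_real(flash_to_ram(FLASH_CAT(flash_str_, id),                    \
                                sizeof(FLASH_CAT(flash_str_, id))), (value)); \
  } while (0)
#define SET_CHILD(name_lit, value) SET_CHILD_IMPL(name_lit, value, __COUNTER__)

/* Usage inside an unbraced if/else. */
static void report_link(int up) {
  if (up) SET_CHILD("online", 1);
  else SET_CHILD("offline", 0);
}
```

Because the declaration now lives in its own block, the `__COUNTER__` suffix is no longer strictly needed for uniqueness; it is kept only to mirror the approach described above.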
Oh wow, that's super nasty :) You can always do stuff like But to me this just seems like madness - the code's being made a lot less readable just to save 600 bytes - and over time, |
Here is the reference to the exception handler. I'm not sure how to add this into the code - is there a vector in the link script to add this function to? From what I have read it can have a performance hit of approx 6x. |
The trouble was that until I tried this, I didn't know what the saving would be! I think the jswrapper strings and other constants are an easier target! |
Started to check precompiled stuff. There are many defines that add the needed attributes. Let's have a look at the jsiConsolePrintf define
source
precompiled:
So Gordon provided many defines and macros to move strings into irom; they just have to be used correctly in the ESP8266 code ;-) |
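For readers who have not seen such a define, here is a rough sketch of how a macro can place a string in a flash section. It is modelled loosely on the `FLASH_STR` idea mentioned earlier in the thread but is not copied from the Espruino headers: `DECLARE_FLASH_STR` and `FLASH_SECTION` are hypothetical names, the `.irom.literal` section name is the one discussed in this issue, and the `__XTENSA__` check is an assumption about the cross compiler's predefined macros.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__XTENSA__)
/* Target build: put the data in a flash-mapped section, word aligned. */
#define FLASH_SECTION __attribute__((section(".irom.literal"))) __attribute__((aligned(4)))
#else
/* Host build: keep only the alignment so the sketch still compiles. */
#define FLASH_SECTION __attribute__((aligned(4)))
#endif

#define DECLARE_FLASH_STR(name, text) static const char name[] FLASH_SECTION = text

DECLARE_FLASH_STR(msg_ready, "ready");

/* Size of the stored string, including the terminating NUL. */
static size_t msg_ready_size(void) { return sizeof(msg_ready); }

/* True when the string starts on a 4-byte boundary. */
static int msg_ready_is_word_aligned(void) {
  return ((uintptr_t)msg_ready % 4u) == 0;
}
```

Code that consumes `msg_ready` on the target still has to read it with aligned word accesses or copy it to RAM first.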
There is another way to do this - and that is at linking time. This is how moddable do it:
They have different make targets that auto-move stuff.... This way you don't need to change the source files, which will make @gfwilliams happy. I had added something similar for the crypto lookup tables: Espruino/make/targets/ESP8266.make Lines 45 to 52 in 4c7f781
This moves the tables into flash. The move will put the data on a 4 byte alignment, however when this is accessed, it needs to be read on a word boundary. |
Yes exactly, that is what Gordon did in the RODATA branch. |
So let’s try the rename section file by file to identify the cause for the reboot with branch RODATA. |
Well this is frustrating. @wilberforce linked an exception handler 3 years ago, but that file is gone now and I can't see it anywhere on the internet! #807 (comment) But if we could find it, that'd be ideal. Rather than just adding hacky patches whenever we find a crash in the ESP8266 build, the exception handler would just handle the problem in the rare cases it occurs. I really don't want to add more ESP8266 specific memory #defines into Espruino (in an ideal world we'd be able to remove all of them), so linker based options are definitely preferable. |
I totally agree! Will try to find this handler, cesanta is still on github: |
Is this what you are looking for? |
Thanks, got it |
If mongoose are no longer using this - it was 3 years ago - this suggests they have found a better way. |
as far as I can see, this is their new way https://github.com/cesanta/mongoose-os/blob/master/platforms/esp8266/src/esp_exc_vectors.S Edit: And they now use a different SDK |
Found some rules from Daniel Casner https://www.danielcasner.org/guidelines-for-writing-code-for-the-esp8266/
-> objcopy with --rename-section will do.
-> how to avoid this? Any ideas? |
this is exactly what |
use decorator CALLED_FROM_INTERRUPT |
@gfwilliams Are there any Espruino-specific functions that need the
decoration? Because those need to stay in |
Most of the handlers will be in esp8266 specific code - so you have control. Only if these call other espruino code will you have an issue. [EDIT] https://github.com/espruino/Espruino/search?q=CALLED_FROM_INTERRUPT&unscoped_q=CALLED_FROM_INTERRUPT Line 350 in 5b447a6
|
Thanks! there was this change done by tve: rename ICACHE_RAM_ATTR to CALLED_FROM_INTERRUPT |
Any idea why adding \0 to a string moves that string from section .rodata.str1.1 to section .rodata?
because then the objcopy with --rename-section in make/targets/ESP8266.make moves it to section .irom.text, which means it is stored in flash.
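One possible explanation, offered as an understanding of GCC's usual behaviour on ELF targets rather than something confirmed in this thread: `.rodata.str1.1` is a mergeable string section, and GCC only places a literal there when its single NUL is the terminator. A literal with an embedded NUL cannot be merged as a C string, so it falls back to plain `.rodata`. The small example below can be compiled and inspected with `objdump -s` to check this on a given toolchain; the function names are made up.

```c
#include <stddef.h>

/* Ordinary literal: typically emitted into a mergeable .rodata.str1.N section. */
static const char *plain_literal(void) { return "abc"; }

/* Literal with an explicit "\0": the data now holds two NULs, one embedded. */
static const char *literal_with_nul(void) { return "abc\0"; }

/* The embedded NUL is part of the array, so the literal occupies 5 bytes. */
static size_t literal_with_nul_size(void) { return sizeof("abc\0"); }
```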
|
No idea at all - perhaps by having two I wouldn't be happy adding |
will use objcopy and --rename-section to do the job ;-) |
Replacing sprintf with espruino_snprintf and added an espruino_snprintf_flash version so all added this to jsutils.h
and this to jsutils.c
sample code before
and after:
Will create a PR for this after replacing all @wilberforce might this be helpful for ESP32 as well? |
got compile errors for libs/network/esp8266/ota.c len = espruino_snprintf(buf, sizeof(buf), responseFmt, code, status, os_strlen(text), text);
Espruino/libs/network/esp8266/ota.c Lines 67 to 83 in 9714718
|
It's because you modified the definition of I'd leave the definition of espruino_snprintf alone, but just whenever you use it with a constant string, do:
Also I should add that I emailed you that example line with |
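A sketch of the "leave espruino_snprintf alone and add a flash-aware wrapper" idea. Assumptions are labelled in the code: the standard `vsnprintf` stands in for Espruino's own formatter, `snprintf_flash` is a hypothetical name, and the byte-wise copy loop would have to use aligned word reads on the real target.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Copy the format string into a RAM buffer first, then format as usual.
 * Assumption: format strings are shorter than 64 bytes; longer ones are truncated. */
static int snprintf_flash(char *out, size_t out_len, const char *flash_fmt, ...) {
  char fmt[64];
  size_t i = 0;
  /* Stand-in copy loop; on the ESP8266 this must not use byte-wide loads. */
  while (i < sizeof(fmt) - 1 && flash_fmt[i] != '\0') {
    fmt[i] = flash_fmt[i];
    i++;
  }
  fmt[i] = '\0';

  va_list ap;
  va_start(ap, flash_fmt);
  int n = vsnprintf(out, out_len, fmt, ap); /* stand-in for the Espruino formatter */
  va_end(ap);
  return n;
}
```

Only the format string is copied here. As noted earlier in the thread for os_printf, a `%s` argument that itself points into flash would still need separate handling.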
HI @wilberforce, Do you know why we use different sections? Espruino/make/targets/ESP8266.make Lines 45 to 53 in 8744671
Lines 53 to 54 in 8744671
In the test I switched to .irom0.text for both. |
.irom.literal vs .irom0.text ? I don't know why - but "if it ain't broke, why fix it"? You would need to look in the .ld linker script to see what the difference is - perhaps it is the order in which they go into flash, one before the other. What the real world impact would be - I don't know. |
ESP8266: replace os_sprintf with espruino_snprintf and flash space round about 2200 bytes
to
|
That's a nice gain |
#1677 replace sprintf() with more stable espruino_snprintf() not implemented, too many changes to jsutils with too many side effects for other boards |
#1679 add one more section to flash with ld option --rename, only ESP8266_4MB |
close it for now and reopen if someone likes to add more optimizations. |
With the port of Espruino to the ESP8266, an interesting characteristic has been uncovered. It appears that on some architectures, constant strings (which are strings that are defined as constants ... usually between double quotes) are placed in precious data RAM storage as opposed to a potential alternative which is that they be held in flash memory. There have been a number of discussions in this topic area in general. This issue is being raised to examine what opportunities might exist within the Espruino code base to take advantage of this.
Here are some links to potentially related materials:
From a notional perspective, it would be ideal if the Espruino code could be compiled or linked in some fashion that would "just" place the constant data in flash. However, that may not be possible. If it weren't possible, the next question would be "If a preparation could be performed on String constants in the code base, would this issue warrant the rework of the code base to achieve that?".
For example, if instead of coding:
we might have to code:
would such an undertaking to work through as much of the source code as we could be worth it? Obviously a default macro would result in zero change to the data ... so the cost would be in the dimensions of "time" and "benefit". It is suspected that one will trade app execution speed for reduction in RAM usage.
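To illustrate the "default macro would result in zero change" point, here is a minimal sketch. The macro name `CONST_TEXT` and the function `banner_length` are hypothetical and do not reproduce the code originally shown in this post.

```c
#include <stddef.h>
#include <string.h>

/* Default definition: the wrapped literal is used unchanged, so platforms
 * that do not need flash placement compile exactly as before.
 * A port such as ESP8266 could define CONST_TEXT differently before this point. */
#ifndef CONST_TEXT
#define CONST_TEXT(s) (s)
#endif

static size_t banner_length(void) {
  return strlen(CONST_TEXT("Espruino"));
}
```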