Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant strings (and other data) moved to flash memory #697

Closed
nkolban opened this issue Nov 8, 2015 · 121 comments
Closed

Constant strings (and other data) moved to flash memory #697

nkolban opened this issue Nov 8, 2015 · 121 comments
Labels
ESP8266 This is only a problem on ESP8266 devices optimisation

Comments

@nkolban
Copy link
Contributor

nkolban commented Nov 8, 2015

With the port of Espruino to the ESP8266, an interesting characteristic has been uncovered. It appears that on some architectures, constant strings (which are strings that are defined as constants ... usually between double quotes) are placed in precious data RAM storage as opposed to a potential alternative which is that they be held in flash memory. There have been a number of discussions in this topic area in general. This issue is being raised to examine what opportunities might exist within the Espruino code base to take advantage of this.

Here are some links to potentially related materials:

From a notional perspective, it would be ideal if the Espruino code could be compiled or linked in some fashion that would "just" place the constant data in flash. However, that may not be possible. If it weren't possible, the next question would be "If a preparation could be performed on String constants in the code base, would this issue warrant the rework of the code base to achieve that?".

For example, if instead of coding:

if (strcmp("Hello", myData) == 0)

we might have to code:

if (strcmp(F("Hello"), myData) == 0)

would such an undertaking to work through as much of the source code as we could be worth it? Obviously a default macro would result in zero change to the data ... so the cost would be in the dimensions of "time" and "benefit". It is suspected that one will trade app execution speed for reduction in RAM usage.

@gfwilliams
Copy link
Member

There were some good ideas here: SuperHouse/esp-open-rtos#11

I can across a semi-similar thing when looking at porting to Arduino, and in the end I gave up. IMO it's just too much effort to wrap every function call this way (not to mention it's a massive source of potential bugs). It's kind of frustrating the toolchain can't deal with it in a better way.

Since Espruino has its own implementation of the string functions (on properly embedded systems quite a few of the built-in functions pull in stuff that really bloats the binary) perhaps you could just reimplement those functions so that they do aligned 32 bit reads, then all strings could go into Flash and you'd be fine?

@gfwilliams
Copy link
Member

... just to add that as far as I can gather, since the ESP8266 isn't harvard architecture you don't have to worry about code meant for strings being used for something else because it will 'just work'. It actually makes things a lot easier.

As that post you found says, you could just make every read an aligned 32 bit read and everything would be fine, if slow and a bit bloated.

For now, there's some 'low hanging fruit':

  • Symbol tables in jswrapper.c
  • jslex.c string table

Those could be put in flash pretty easily and aligned reads could be performed on them, since the only stuff that reads them is in those files.

@nkolban
Copy link
Contributor Author

nkolban commented Nov 8, 2015

Are we sure that the ESP8266 isn't a Harvard Architecture? I believe that the processor core in an ESP8266 is an Xtensa-106 ... and this business wire about its announcement says:


Using a traditional Harvard architecture, it features separate local, tightly coupled, instruction and data RAMs to eliminate memory contention and provide fast performance on performance-critical code and interrupt handling routines. RAM size is user selectable up to 128K bytes.


@gfwilliams
Copy link
Member

Interesting... I only said that because that's what @tve had said in Gitter IIRC?

Some processors (like ARM) effectively have separate memories (Flash + RAM) on (I think) different busses, but have a kind of 'bus matrix' that allows them both to be accessed in the same address space. I guess the Xtensa may do something similar.

It's actually pretty impressive that Espruino works as well as it does since it accesses flash memory so much for the symbol tables - I guess the extra RAM really helps.

@gfwilliams
Copy link
Member

@tve already did a whole bunch here: 294f3ff

IMO jswrapper.c (the majority of which isn't actually strings but is still const data) could maybe be moved over too.

@gfwilliams gfwilliams added the ESP8266 This is only a problem on ESP8266 devices label Nov 26, 2015
@tve
Copy link
Contributor

tve commented Nov 26, 2015

At this point I am labeling this as an optimization because the amount of free RAM is quite fine and not a top-priority. The open work items I believe are:

  • move constants in jswrapper to flash
  • find more string constants that can be moved to flash, the second argument of jsExceptionHere may be a good candidate

@gfwilliams
Copy link
Member

I'd been vaguely wondering about moving all the error messages/warnings to a separate file: #50

It'd help with localisation, might help to reduce duplication (the compiler may merge the same things, but some differ only by a character or two), and as these things don't have to be accessed quickly, we could potentially store them compressed, decompress the whole thing on demand and just pull out the relevant bits.

Not an issue on ESP8266, but if you've got 128kB of flash and 20kB of it is strings, it could help a lot.

@wilberforce
Copy link
Member

@tve
I'm looking to free up more heap space and want to ensure I'm on the right track here:

master...wilberforce:master

targets/esp8266/esp8266_board_utils.c

These strings are using in macros that use os_printf

Am I correct in assuming that anything that uses os_printf is ok to from FLASH_STR, so it does not need to be copied to ram first?

If this is the case, I would like to work with the following list and see if I can increase the available heap space.

The change above yields 144 bytes, it does not seem like much but I can do a few modules it will add up...

Next target was the string in ota.c.

Here is the output from topstrings:

765 ./src/jsinteractive.o
542 ./libs/network/socketserver.o
519 ./gen/jswrapper.o
438 ./libs/network/esp8266/jswrap_esp8266_network.o
387 ./libs/network/esp8266/ota.o
295 ./src/jslex.o
245 ./src/jsvar.o
225 ./src/jsparse.o
216 ./libs/network/telnet/jswrap_telnet.o
214 ./src/jswrap_process.o
207 ./targets/esp8266/esp8266_board_utils.o
207 ./src/jswrap_json.o
196 ./src/jswrap_io.o
196 ./libs/network/socketerrors.o
176 ./libs/network/esp8266/network_esp8266.o

(in the makefile I had to change .rodata.str1.4 to .rodata.str1.1) - should I push this as an update - I'm not sure of the cause of this

@tve
Copy link
Contributor

tve commented Apr 9, 2016

This ought to crash 'cause os_printf doesn't copy every arg to ram, only the first one.

@gfwilliams
Copy link
Member

Make sure you compile with RELEASE=1, or you'll be wasting effort moving stuff that won't even be in releases.

I'd really recommend using the exception handling code that got linked somewhere else recently. It'd make life a million times easier, especially for the os_printf case where speed is not an issue.

It's also avoid completely screwing up the Espruino codebase for the sake of ESP8266 - I won't accept PRs that make code unreadable for the sake of one platform :(

By the way: I'm still not convinced about topstrings' accuracy - jswrapper.c contrains LOADS of const data that is presumably put into RAM - I'm pretty sure it's many kilobytes rather than just 542B. I'm not really sure why it's not getting reported.

@tve
Copy link
Contributor

tve commented Apr 9, 2016

Thanks for pointing out jswrapper.o. Taking a look at it, some strings end up in .rodata.str1.1 and some in .rodata. I believe this has to do with string constant merging. So far I don't understand why strings end up in one or the other section, for example const char[] and const char* seem to behave differently...

Overall I see four options for reducing the size:

  • 1 target specific places, such as jswBinarySearch which consumes lots of strings and add esp8266 specific macros (I understand about not making the code a mess)
  • 2 use the exception handler and hope performance doesn't become abysmal
  • 3 provide a simple way to load JS libraries into flash and execute them from flash
  • 4 use RAM as cache for JS code in flash (the variable_cache branch)

I have to admit I like a combination of 1 and 3 the best, although 1 and 4 are also appealing. I have a hard time believing that 2 can work reasonably (hey, but someone can prove me wrong :-).

@gfwilliams
Copy link
Member

I think #1 is a really good start (maybe with #2 as a fallback for non-critical places) - Object.getOwnProperties uses it, but that could be handled via the exception handler without a huge hit. Given jswrapper is auto-generated anyway, doing it in a non-messy (or not any more messy than it is already!) way should be pretty easy.

#3 is more difficult - E.setBootCode will cause everything to be run from flash already (and the Web IDE will use it with an option), but it doesn't help you much. The worst offenders (unless you have huge code) are the variable names themselves, which would be very difficult to move into flash (at that stage, #4 looks more appealing to me at least)

@tve
Copy link
Contributor

tve commented Apr 9, 2016

Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:

@@ -313,10 +313,13 @@ for b in builtins:
 codeOut('');
 codeOut('');

+for b in builtins:
+  builtin = builtins[b]
+  codeOut("FLASH_STR(jswSymbols_"+builtin["name"]+"_str, " + builtin["symbolTableChars"] +");");
 codeOut('const JswSymList jswSymbolTables[] = {');
 for b in builtins:
   builtin = builtins[b]
-  codeOut("  {"+", ".join(["jswSymbols_"+builtin["name"], builtin["symbolTableCount"], builtin["sym
+  codeOut("  {"+", ".join(["jswSymbols_"+builtin["name"], builtin["symbolTableCount"], "jswSymbols_
 codeOut('};');

 codeOut('');

This brings rodata from 17312 to 14492 bytes, so that's how much room is left for improvement :-)

@tve
Copy link
Contributor

tve commented Apr 9, 2016

Ah, I hadn't seen e.setBootCode, I'll have to give that a try. The 2800 bytes potentially saved by jswSymbols are also very attractive...

@tve
Copy link
Contributor

tve commented Apr 9, 2016

Mhh, another approach for jswrapper.c could be to change JswSymPtr and JswSymList to only have word-size & aligned fields and then move all the static data in jswrapper.c into flash. That could easily yield 5KB of total savings, if not more... It might also reduce the changeset.

@gfwilliams
Copy link
Member

Ah, I hadn't seen e.setBootCode

It's pretty recent (this week I think?). There is some room for re-using substrings in functions but that's a bit dodgy if you can't ensure that the memory area will stay around...

That could easily yield 5KB of total savings, if not more...

Sounds good :) Most of the changes should be inside jswrapper.c... Since most stuff that accesses the tables is in jswrapper, you could almost skip the alignment too.

@wilberforce
Copy link
Member

Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:

Won't the code that accesses the strings also need to be modified?

These could also go into flash:

static const JswSymPtr jswSymbols_Object[] = {
  {0, (void (*)(void))jswrap_object_create, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1))},
  {7, (void (*)(void))jswrap_object_defineProperties, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1)) | (JSWAT_JSVAR << (JSWAT_BITS*2))},
  {24, (void (*)(void))jswrap_object_defineProperty, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1)) | (JSWAT_JSVAR << (JSWAT_BITS*2)) | (JSWAT_JSVAR << (JSWAT_BITS*3))},
  {39, (void (*)(void))jswrap_object_getOwnPropertyDescriptor, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1)) | (JSWAT_JSVAR << (JSWAT_BITS*2))},
  {64, (void (*)(void))gen_jswrap_Object_getOwnPropertyNames, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1))},
  {84, (void (*)(void))gen_jswrap_Object_keys, JSWAT_JSVAR | (JSWAT_JSVAR << (JSWAT_BITS*1))}
};

Are the alignments of the JswSymPtr structure all be on 4 byte boundaries and are the elements 4 bytes long?

@wilberforce
Copy link
Member

This kind of function call is used although the code base:

  jsvObjectSetChildAndUnLock(jsDetails, "authMode", jsvNewFromString(wifiAuth[config.authmode]));
  jsvObjectSetChildAndUnLock(jsDetails, "hidden", jsvNewFromBool(config.ssid_hidden));
  jsvObjectSetChildAndUnLock(jsDetails, "maxConn", jsvNewFromInteger(config.max_connection));

I've made a wrapper for jsvObjectSetChildAndUnLock that declares the string literal in flash memory, and then copies to a static buffer before the call to the real jsvObjectSetChildAndUnLock.

It's also avoid completely screwing up the Espruino codebase for the sake of ESP8266 - I won't accept PRs that make code unreadable for the sake of one platform :(

@gfwilliams it doesn't corrupt the code base too much... you might be able to suggest something that will clean this up further:

#define READ_FLASH_UINT8(ptr) ({ uint32_t __p = (uint32_t)(char*)(ptr); volatile uint32_t __d = *(uint32_t*)(__p & (uint32_t)~3); ((uint8_t*)&__d)[__p & 3]; })

#define __concat(o,l) __##o##l
#define jsvObjectSetChildAndUnLock(obj,name,child) jsvObjectSetChildAndUnLock_FLASH_CTR(obj,__COUNTER__,name,child)
#define jsvObjectSetChildAndUnLock_FLASH_CTR(obj,lbl,name,child) FLASH_STR(__concat(obj,lbl),name);\
jsvObjectSetChildAndUnLock_flash(obj,__concat(obj,lbl), child )

#define NOT_FLASH_LITERAL /* no flash literal */

#else

/** Read a uint8_t from this pointer, which could be in RAM or Flash.
    On ARM this is just a standard read, it's different on ESP8266 */
#define READ_FLASH_UINT8(ptr) (*(uint8_t*)(ptr))

#define NOT_FLASH_LITERAL

#endif

This is using the COUNTER_ macro, so for the code example:
jsvObjectSetChildAndUnLock(jsDetails, "authMode", jsvNewFromString(wifiAuth[config.authmode]));

becomes:

FLASH_STR( __jsDetails0,"authMode" )
jsvObjectSetChildAndUnLock_flash(jsDetails, __jsDetails0, jsvNewFromString(wifiAuth[config.authmode]));

and the next line would __jsDetails1 etc...

This take the headache of having to declare all the strings before hand.

Where this doesn't work is when it's part of an if statement, or the 2nd arg is not a string literal - so I had to introduce a macro to stop substitution from occurring:

jsvObjectSetChildAndUnLock NOT_FLASH_LITERAL(funcs, buf, func);

if (arrayBuffer2) jsvObjectSetChildAndUnLock NOT_FLASH_LITERAL(waveform, "buffer2", arrayBuffer2);

This is the part that you might not like Gordon!

Anyway, I've applied through the code base:
master...wilberforce:master

The saving is (without graphics with crypto):

Before:
"freeHeap": 4576,

After:
"freeHeap": 5184

So 608 bytes of heap freed up.

@wilberforce
Copy link
Member

Looks like moving the string constants in jswSymbolTables to flash can save 2820 bytes of RAM :-). It's a pretty simple change in build_jswrapper.py and jswrap_object.c:

@tve this looks very promising....

What changes need to be made in jswrap_object.c to access the strings in flash?

Sorry - I don't know the code well enough to know where to start looking ;-(

@wilberforce
Copy link
Member

@tve

Mhh, another approach for jswrapper.c could be to change JswSymPtr and JswSymList to only have word-size & aligned fields and then move all the static data in jswrapper.c into flash. That could easily yield 5KB of total savings, if not more... It might also reduce the changeset.

} PACKED_FLAGS JswSymPtr;

/// Structure for each symbol in the list of built-in symbols
typedef struct {
  unsigned short strOffset;
  void (*functionPtr)(void);
  unsigned short functionSpec; // JsnArgumentType
} PACKED_FLAGS JswSymPtr;

/// Information for each list of built-in symbols
typedef struct {
  const JswSymPtr *symbols;
  unsigned char symbolCount;
  const char *symbolChars;
} PACKED_FLAGS JswSymList;

#define PACKED_FLAGS __attribute__ ((__packed__))

#define PACKED_FLAGS  __attribute__ ((__packed__))
#if defined(ESP8266)
// Don't pack
#define PACKED_FLAGS
#else
/// Used when we have enums we want to squash down
#define PACKED_FLAGS  __attribute__ ((__packed__))
#endif

Will this be sufficient? Will the unsigned short strOffset;by padded to 4 bytes if the declaration uses attribute ((__aligned(4)))?

Will the unsigned short strOffset;by padded to 4 bytes?

I tried above (Without the aligned(4), however the elf is now too big:

LD espruino_esp8266_partial.o
LD espruino_esp8266_user2.elf
LD espruino_esp8266_user1.elf
/home/esp8266/esp-open-sdk/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/4.8.2/../../../../xtensa-lx106-elf/bin/ld: address 0x3fffd0a8 of espruino_esp8266_user1.elf section `.bss' is not within region `dram0_0_seg'
/home/esp8266/esp-open-sdk/xtensa-lx106-elf/lib/gcc/xtensa-lx106-elf/4.8.2/../../../../xtensa-lx106-elf/bin/ld: address 0x3fffd0a8 of espruino_esp8266_user1.elf section `.bss' is not within region `dram0_0_seg'

@wilberforce
Copy link
Member

taking out the PACKED_FLAGS, the structures above, and changing in jswrapper:

static const JswSymPtr jswSymbols_Wifi[] = {
to
static JswSymPtr jswSymbols_Wifi[] __attribute__((section(".irom"))) __attribute__((aligned(4))) = {

>require("Wifi").scan(function(x){console.log(x)});
Uncaught Error: Function "scan" not found!
 at line 1 col 17
require("Wifi").scan(function(x){console.log(x)});

So still no closer ;-(

@gfwilliams
Copy link
Member

master...wilberforce:master

Yeah, it's not ideal - and honestly I doubt most people adding code in the future will fully appreciate why they need NOT_FLASH_LITERAL.

In fact, I don't understand why it's in places like this? master...wilberforce:master#diff-1e3672c0223d1668a1afdbb00fffe1f2R137

I really think it'd be worth trying to add the exception handler, and then changing the linker to just dump all strings into flash. My guess is performance wouldn't be hit anywhere near as badly as you think, and the few places where it was hit could be fixed, rather than having to change all code (and even then, we're still missing loads of strings that aren't used via jsvObjectSetChildAndUnLock).

Will #define PACKED_FLAGS this be sufficient?

That'd be a really bad idea :) PACKED_FLAGS is used all over the place and would break LOADS of stuff if it was changed. You'd have to tweak just jswrapper.c (as you did)

I'm not sure why changing it like you did above wouldn't work though - but the fact it doesn't reset is a good sign :) My guess is that unsigned char symbolCount/etc will still force a non-word read, even if it is aligned. try changing that (and others) to uint32_t.

@wilberforce
Copy link
Member

It is places like this:

 if (addrStart>0)
-    jsvObjectSetChildAndUnLock(obj, "protocol", jsvNewFromStringVar(url, 0, (size_t)addrStart-1));
+    jsvObjectSetChildAndUnLock NOT_FLASH_LITERAL(obj, "protocol", jsvNewFromStringVar(url, 0, (size_t)addrStart-1));

Because the first part of the macro expands to
FLASH_STR

Which declares the literal, and the second part is the function call.

The declaration does not work because the if statement is preceding. If the statement was rewritten to use a block, I guess it would work:

 if (addrStart>0) {
    jsvObjectSetChildAndUnLock(obj, "protocol", jsvNewFromStringVar(url, 0, (size_t)addrStart-1));
}

.. So if I put the { } in the macro expansion, then it would be transparent.

@gfwilliams
Copy link
Member

Oh wow, that's super nasty :) You can always do stuff like do { ... } while (0) to avoid that I guess.

But to me this just seems like madness - the code's being made a lot less readable just to save 600 bytes - and over time, jsvObjectGetChild gets added, jsvObjectSetChild does, and more and more, until it's just this insane mess.

@wilberforce
Copy link
Member

Here is the reference to the exception handler
#807 (comment)

I'm not sure how to add this into the code - is there a vector in the link script to add this function too?

From what I have read it can have performance hit of approx 6x.

@wilberforce
Copy link
Member

  • the code's being made a lot less readable just to save 600 bytes

The trouble was that until I tried this, I didn't know what the saving would be!

I think the jswrapper strings and other constants are an easier target!

@MaBecker
Copy link
Contributor

Started to check precompiled stuff. There are many defines that add the need attributes.

Lets have a look at a jsiConsolePrintf define

#define jsiConsolePrintf(fmt, ...) do { \
    FLASH_STR(flash_str, fmt); \
    jsiConsolePrintf_int(flash_str, ##__VA_ARGS__); \
} while(0)

source

  jsiConsolePrintf(
    "Flash map %s, manuf 0x%x chip 0x%x\n",
    ( map == 2  && espFlashKB == 1024  && strcmp(PC_BOARD_ID,"ESP8266_4MB") == 0) ? flash_maps_alt[map] : flash_maps[map], 
    fid & 0xff, chip);

precompiled:

  do { static const char flash_str[] __attribute__((section(".irom.literal"))) __attribute__((aligned(4))) = "Flash map %s, manuf 0x%x chip 0x%x\n"; jsiConsolePrintf_int(flash_str, ( map == 2 && espFlashKB == 1024 && strcmp("ESP8266_4MB","ESP8266_4MB") == 0) ? flash_maps_alt[map] : flash_maps[map], fid & 0xff, chip); } while(0)

So Gordon provided many defines and macros to move Strings into irom, they just have to be use correct in the ESP8266 code ;-)

@wilberforce
Copy link
Member

wilberforce commented May 28, 2019

There is another way to do this - and that is at linking time.

This is how moddable do it:

https://github.com/Moddable-OpenSource/moddable/blob/public/tools/mcconfig/make.esp.strings-in-flash.mk#L1-L3

	@echo "# cc" $(<F) "(strings in flash)"
	$(CC) $(C_DEFINES) $(C_INCLUDES) $(C_FLAGS) $< -o $@.unmapped
	$(TOOLS_BIN)/xtensa-lx106-elf-objcopy --rename-section .rodata.str1.1=.irom0.str.1 $@.unmapped $@

They have different make targets that auto-move stuff....

This way you don't need to change the source files which will make @gfwilliams happy.

I had added a similar this for the crypto look up tables:

ifdef USE_CRYPTO
$(Q)$(OBJCOPY) --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha1.o
ifdef USE_SHA256
$(Q)$(OBJCOPY) --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha256.o
endif
ifdef USE_SHA512
$(Q)$(OBJCOPY) --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha512.o
endif

This moves the tables into flash. The move will put the data on a 4 byte alignment, however when this is accessed, it needs to be read on a word boundary.

@MaBecker
Copy link
Contributor

Yes exactly, that is what Gordon did in the RODATA branch.

@MaBecker
Copy link
Contributor

So let’s try the rename section file by file to identify the cause for the reboot with branch RODATA.

@gfwilliams
Copy link
Member

Well this is frustrating. @wilberforce linked an exception handler 3 years ago, but that file is gone now and I can't see it anywhere on the internet! #807 (comment)

But if we could find it, that'd be ideal. Rather than just adding hacky patches whenever we find a crash in the ESP8266 build, the exception handler would just handle the problem in the rare cases it occurs.

I really don't want to add more ESP8266 specific memory #defines into Espruino (in an ideal world we'd be able to remove all of them), so linker based options are definitely preferable.

@MaBecker
Copy link
Contributor

I really don't want to add more ESP8266 specific memory #defines into Espruino (in an ideal world we'd be able to remove all of them), so linker based options are definitely preferable.

I totally agree!

Will try to find this handler, cesanta is still on github:
https://github.com/cesanta

@Frida854
Copy link

Is it that your are looking for?
https://github.com/cesanta/mongoose-os/releases/tag/beta3
I found them in mongoose-os-beta3.tar.gz

Skærmbillede fra 2019-05-28 12-17-54

@MaBecker
Copy link
Contributor

MaBecker commented May 28, 2019

I found them in mongoose-os-beta3.tar.gz

Thanks, got it

@wilberforce
Copy link
Member

If mongoose are no longer using this - it was 3 years ago - this suggests they have found a better way.

@MaBecker
Copy link
Contributor

MaBecker commented May 29, 2019

as far as I can see, this is their new way

https://github.com/cesanta/mongoose-os/blob/master/platforms/esp8266/src/esp_exc_vectors.S

Edit: And they now uses a different SDK

@MaBecker
Copy link
Contributor

Found some rules from Daniel Casten https://www.danielcasner.org/guidelines-for-writing-code-for-the-esp8266/

  1. Application code should have the ICACHE_FLASH_ATTR decorator unless it is executed very often.

-> objcopy with --rename-section will do.

  1. All interrupt handlers must not have the ICACHE_FLASH_ATTR decorator and any code which executes very often should not have the decorator.

-> how to avoid this?

Any ideas?

@MaBecker
Copy link
Contributor

A compiler flag to enable handling of unaligned accesses

this is exactly what CFLAGS += -mforce-l32does, but it does not handle issue 2.

@MaBecker
Copy link
Contributor

  1. All interrupt handlers must not have the ICACHE_FLASH_ATTR decorator and any code which executes very often should not have the decorator.

use decorator CALLED_FROM_INTERRUPT

@MaBecker
Copy link
Contributor

MaBecker commented May 30, 2019

@gfwilliams Are there any Espruino specific function that needs the

LOCAL ... CALLED_FROM_INTERRUPT

decoration ?

Because those need to stay in .iram1.text

@wilberforce
Copy link
Member

wilberforce commented May 30, 2019

Most of the handlers will be in esp8266 specific code - so you have control. Only if these call other espruino code will you have an issue.

[EDIT]
Looking here - it's only used in a few places:

https://github.com/espruino/Espruino/search?q=CALLED_FROM_INTERRUPT&unscoped_q=CALLED_FROM_INTERRUPT

void CALLED_FROM_INTERRUPT jshIOEventOverflowed() {

@MaBecker
Copy link
Contributor

Thanks!

there was this change done by tve:

rename ICACHE_RAM_ATTR to CALLED_FROM_INTERRUPT

dcea5d3#diff-0da56b4ff90ef37d01c86b4ce97b6232

@MaBecker
Copy link
Contributor

MaBecker commented Jun 4, 2019

Any idea why adding \0 to a string moves that string from section .rodata.str1.1 to section .rodata?

xtensa-lx106-elf-objdump -j .rodata.str1.1 -s libs/hello/jswrap_hello.o

libs/hello/jswrap_hello.o:     file format elf32-xtensa-le

Contents of section .rodata.str1.1:
 0000 48656c6c 6f20576f 726c6420 286a7369  Hello World (jsi
 0010 436f6e73 6f6c6550 72696e74 53747269  ConsolePrintStri
 0020 6e672921 0d0a00                      ng)!...     
xtensa-lx106-elf-objdump -j .rodata -s libs/hello/jswrap_hello.o

libs/hello/jswrap_hello.o:     file format elf32-xtensa-le

Contents of section .rodata:
 0000 48656c6c 6f20576f 726c6420 286a7369  Hello World (jsi
 0010 436f6e73 6f6c6550 72696e74 53747269  ConsolePrintStri
 0020 6e672921 0d0a0000                    ng)!....        

because than the objdump with --rename-section in make/targets/ESP8266.make moves it to section .irom.text which means it is stored in flash.

$(Q)$(OBJCOPY) --rename-section .rodata=.irom.literal --rename-section .text=.irom.text --rename-section .literal=.irom.literal  $@

@gfwilliams
Copy link
Member

No idea at all - perhaps by having two \0 at the end (one is inserted by the compiler automatically) it decides that it is no longer dealing with a string.

I wouldn't be happy adding \0 to literally every string in Espruino to take advantage of this quirk though :)

@MaBecker
Copy link
Contributor

MaBecker commented Jun 5, 2019

will use objdump and --rename-section to do the job ;-)

@MaBecker
Copy link
Contributor

Replacing sprintf with espruino_snprintf and added a espruino_snprintf_flash version so all fmtwill go to .irom.literal

added this to jsutils.h

#ifndef USE_FLASH_MEMORY
  int espruino_snprintf( char * s, size_t n, const char * fmt, ... );
#else
  #define espruino_snprintf(s ,n, fmt, ...) (__extension__({ \
    FLASH_STR(flash_str, fmt); \
    int __i = 0; \
    __i = espruino_snprintf_flash(s, n, flash_str, ##__VA_ARGS__); \
    __i; }))
  int espruino_snprintf_flash( char * s, size_t n, const char * fmt, ... );
#endif

and this to jsutils.c

#ifndef USE_FLASH_MEMORY
/// a snprintf replacement so mbedtls doesn't try and pull in the whole stdlib to cat two strings together
int espruino_snprintf( char * s, size_t n, const char * fmt, ... ) {
  espruino_snprintf_data d;
  d.outPtr = s;
  d.idx = 0;
  d.len = n;

  va_list argp;
  va_start(argp, fmt);
  vcbprintf(espruino_snprintf_cb,&d, fmt, argp);
  va_end(argp);

  if (d.idx < d.len) d.outPtr[d.idx] = 0;
  else d.outPtr[d.len-1] = 0;

  return (int)d.idx;
}
#else
int espruino_snprintf_flash( char * s, size_t n, const char * ffmt, ... ) {
  espruino_snprintf_data d;
  d.outPtr = s;
  d.idx = 0;
  d.len = n;

  size_t len = flash_strlen(ffmt);
  char fmt[len+1];
  flash_strncpy(fmt, ffmt, len+1);

  va_list argp;
  va_start(argp, fmt);
  vcbprintf(espruino_snprintf_cb,&d, fmt, argp);
  va_end(argp);

  if (d.idx < d.len) d.outPtr[d.idx] = 0;
  else d.outPtr[d.len-1] = 0;

  return (int)d.idx;
}
#endif

sample code before

os_sprintf(buff, "0x%02lx 0x%04lx", (unsigned int) (fid & 0xff), (lunsigned int) chip);

and after:

espruino_snprintf(buff, sizeof(buff),"0x%02x 0x%04x", (long unsigned int) (fid & 0xff), (long unsigned int) chip);

Will create a PR for this after replacing all os_sprintf and tests are successfully.

@wilberforce might this be helpful for ESP32 as well?

@MaBecker
Copy link
Contributor

got compile errors for libs/network/esp8266/ota.c

len = espruino_snprintf(buf, sizeof(buf), responseFmt, code, status, os_strlen(text), text);

CC libs/network/esp8266/ota.o
libs/network/esp8266/ota.c: In function 'sendResponse':
libs/network/esp8266/ota.c:76:3: error: invalid initializer
   len = espruino_snprintf(buf, sizeof(buf), responseFmt, code, status, os_strlen(text), text);
   ^
make: *** [libs/network/esp8266/ota.o] Error 1

static void sendResponse(OtaConn *oc, uint16_t code, char *text) {
char *status = code < 400 ? "OK" : "ERROR"; // hacky but it works
// allocate buffer to print the response into
uint16_t len = os_strlen(responseFmt)+os_strlen(status)+os_strlen(text);
char buf[len];
// print the response and send it
len = os_sprintf(buf, responseFmt, code, status, os_strlen(text), text);
if (code < 400) os_printf("OTA: %d %s\n", code, status);
else os_printf("OTA: %d %s <<%s>>\n", code, status, text);
int8_t err;
if ((err=espconn_send(oc->conn, (uint8_t*)buf, os_strlen(buf))) != 0) {
os_printf("OTA: send failed err=%d\n", err);
abortConn(oc);
}
}

@gfwilliams
Copy link
Member

It's because you modified the definition of espruino_snprintf so that you can now only put constant strings directly into the 3rd argument, but then you're passing in a variable.

I'd leave the definition of espruino_snprintf alone, but just whenever you use it with a constant string, do:

espruino_snprintf(buff, sizeof(buff),FLASH_STR("0x%02x 0x%04x"), (unsigned int) (fid & 0xff), (unsigned int) chip);

Also I should add that I emailed you that example line with (long unsigned int) changed to (unsigned int) - I'd really keep it that way rather than adding the long in there, since the long was used with the l modifier which Espruino's snprintf doesn't support.

@MaBecker
Copy link
Contributor

HI @wilberforce,

Do you know why we use different sections?

ifdef USE_CRYPTO
$(Q)$(OBJCOPY) --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha1.o
ifdef USE_SHA256
$(Q)$(OBJCOPY) --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha256.o
endif
ifdef USE_SHA512
$(Q)$(OBJCOPY) --rename-section .rodata=.irom0.text libs/crypto/mbedtls/library/sha512.o
endif
endif

Espruino/src/jsutils.h

Lines 53 to 54 in 8744671

// For the esp8266 we need the posibility to store arrays in flash, because mem is so small
#define IN_FLASH_MEMORY __attribute__((section(".irom.literal"))) __attribute__((aligned(4)))

In the test I switched to .irom0.text for both.

@wilberforce
Copy link
Member

.irom.literal vs .irom0.text ?

I don't know why - but "if it an't broke why fix it" ? You would ned to look in the .ld linking script to see what the different is - perhaps the order of how they go into flash one will be before the other. What the real world impact would be - I don't know.

@MaBecker
Copy link
Contributor

ESP8266: replace os_sprintf with espruino_snprintf and flash space
ESP8266: optimise linker

round about 2200 bytes


>require('ESP8266').getState();
={
  sdkVersion: "2.2.1(6ab97e9)",
  cpuFrequency: 160, freeHeap: 12008, maxCon: 10,
  flashMap: "4MB:1024/1024",
  flashKB: 4096,
  flashChip: "0xef 0x4016"
 }

to

>require('ESP8266').getState();
={
  sdkVersion: "2.2.1(6ab97e9)",
  cpuFrequency: 160, freeHeap: 14280, maxCon: 10,
  flashMap: "4MB:1024/1024",
  flashKB: 4096,
  flashChip: "0xef 0x4016"
 }

@wilberforce
Copy link
Member

That's a nice gain

@MaBecker
Copy link
Contributor

MaBecker commented Jun 24, 2019

#1677 replace sprintf() with more stable espruino_snprintf()

not implemented, to much changes to jsutils with to much side effects for other boards

@MaBecker
Copy link
Contributor

#1679 add one more section to flash with ld option --rename, only ESP8266_4MB

@MaBecker
Copy link
Contributor

MaBecker commented Jun 8, 2022

close it for now and reopen if someone likes to add more optimizations.

@MaBecker MaBecker closed this as completed Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ESP8266 This is only a problem on ESP8266 devices optimisation
Projects
None yet
Development

No branches or pull requests

6 participants