RAM optimizations: pseudo RO strings, functions in Flash
This patch adds more RAM optimizations to eLua:

- direct file memory mapping: files in ROMFS will be read directly from Flash,
  without allocating any additional buffers. This doesn't help with RAM
  consumption in itself, but enables the set of optimizations below.

- pseudo read-only strings. These are still TStrings, but the actual string
  content can point directly to Flash. Original Lua strings are kept in
  TStrings structures (lobject.h):

  typedef union TString {
    L_Umaxalign dummy;  /* ensures maximum alignment for strings */
    struct {
      CommonHeader;
      lu_byte reserved;
      unsigned int hash;
      size_t len;
    } tsv;
  } TString;

  The actual string content comes right after the union TString above.
  Pseudo RO strings have the same header, but instead of having the string
  content after TString, they have a pointer that points to the actual
  string content. That content must live in read-only memory that is directly
  accessible from the MCU bus, such as its internal Flash. luaS_newlstr
  detects automatically if it should create a regular string or a pseudo RO
  string by checking if the string pointer comes from the Flash region of the
  MCU. This optimization works for both precompiled (.lc) files that exist in
  ROMFS and for internal Lua strings (C code).

- functions in Flash: for precompiled (.lc) files that exist in ROMFS, the code
  of the functions and a part of the debug information will be read directly
  from Flash.

- ROMFS was changed to support files that are larger than 2**16 bytes and it
  aligns all its files to an offset which is a multiple of 4 in order to prevent
  data alignment issues with precompiled Lua code.

- the Lua bytecode dumper was changed to align all the instructions in a Lua
  function and a part of the debug information to an offset which is a multiple
  of 4. This might slightly increase the size of the precompiled Lua file.
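
The ROMFS record layout implied by the two items above can be sketched as follows. This is a minimal Python 3 model, not the project's mkfs.py: `build_romfs` and `parse_romfs` are hypothetical names, and the layout (ASCIIZ name, 4-byte little-endian size, zero padding up to a multiple of 4, then the file data) is inferred from the mkfs.py changes in this commit.

```python
import struct

ALIGNMENT = 4  # file data starts at an offset that is a multiple of this


def build_romfs(files):
    """Pack (name, data) pairs into one image; hypothetical helper."""
    image = bytearray()
    for name, data in files:
        image += name.encode("ascii") + b"\0"   # ASCIIZ file name
        image += struct.pack("<I", len(data))   # 4-byte little-endian size
        while len(image) % ALIGNMENT:           # pad so the data is aligned
            image.append(0)
        image += data
    image.append(0)                             # empty name terminates the image
    return bytes(image)


def parse_romfs(image):
    """Return {name: (data_offset, data)} by walking the records."""
    entries = {}
    pos = 0
    while image[pos] != 0:                      # empty name means "no more files"
        end = image.index(0, pos)
        name = image[pos:end].decode("ascii")
        pos = end + 1
        (size,) = struct.unpack_from("<I", image, pos)
        pos += 4
        pos = (pos + ALIGNMENT - 1) & ~(ALIGNMENT - 1)  # skip the padding
        entries[name] = (pos, image[pos:pos + size])
        pos += size
    return entries
```

With a 4-byte size field a file larger than 2**16 bytes round-trips, and every data offset is a multiple of 4 as long as the image itself is placed at an aligned address.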

These changes were successfully checked against the Lua 5.1 test suite.
These changes were tested in eLua on LM3S and AVR32.
bogdanm committed May 9, 2012
1 parent 1a5b04e commit d54659b
Showing 27 changed files with 260 additions and 59 deletions.
@@ -51,7 +51,8 @@ typedef struct
off_t ( *p_lseek_r )( struct _reent *r, int fd, off_t off, int whence );
void* ( *p_opendir_r )( struct _reent *r, const char* name );
struct dm_dirent* ( *p_readdir_r )( struct _reent *r, void *dir );
-int ( *p_closedir_r )( struct _reent *r, void* dir );
+int ( *p_closedir_r )( struct _reent *r, void* dir );
+const char* ( *p_getaddr_r )( struct _reent *r, int fd );
} DM_DEVICE;

// Errors
@@ -77,6 +78,7 @@ int dm_init();
DM_DIR *dm_opendir( const char* dirname );
struct dm_dirent* dm_readdir( DM_DIR *d );
int dm_closedir( DM_DIR *d );
const char* dm_getaddr( int fd );

#endif

@@ -11,7 +11,7 @@ The Read-Only "filesystem" resides in a contiguous zone of memory, with the
following structure, repeated for each file:
Filename: ASCIIZ, max length is DM_MAX_FNAME_LENGTH defined here, empty if last file
-File size: (2 bytes)
+File size: (4 bytes)
File data: (file size bytes)
*******************************************************************************/
@@ -31,7 +31,7 @@ typedef struct
{
u32 baseaddr;
u32 offset;
-u16 size;
+u32 size;
p_read_fs_byte p_read_func;
} FS;

28 mkfs.py
@@ -6,13 +6,15 @@
_crtline = ' '
_numdata = 0
_bytecnt = 0

_fcnt = 0
maxlen = 30
alignment = 4

# Line output function
def _add_data( data, outfile, moredata = True ):
-global _crtline, _numdata, _bytecnt
+global _crtline, _numdata, _bytecnt, _fcnt
_bytecnt = _bytecnt + 1
+_fcnt = _fcnt + 1
if moredata:
_crtline = _crtline + "0x%02X, " % data
else:
@@ -41,7 +43,7 @@ def mkfs( dirname, outname, flist, mode, compcmd ):
print "Unable to create output file"
return False

-global _crtline, _numdata, _bytecnt
+global _crtline, _numdata, _bytecnt, _fcnt
_crtline = ' '
_numdata = 0
_bytecnt = 0
@@ -107,19 +109,29 @@ def mkfs( dirname, outname, flist, mode, compcmd ):
os.remove( newname )

# Write name, size, id, numpars
_fcnt = 0
for c in fname:
_add_data( ord( c ), outfile )
_add_data( 0, outfile ) # ASCIIZ
-size_l = len( filedata ) & 0xFF
-size_h = ( len( filedata ) >> 8 ) & 0xFF
-_add_data( size_l, outfile )
-_add_data( size_h, outfile )
+size_ll = len( filedata ) & 0xFF
+size_lh = ( len( filedata ) >> 8 ) & 0xFF
+size_hl = ( len( filedata ) >> 16 ) & 0xFF
+size_hh = ( len( filedata ) >> 24 ) & 0xFF
+_add_data( size_ll, outfile )
+_add_data( size_lh, outfile )
+_add_data( size_hl, outfile )
+_add_data( size_hh, outfile )
+# Round to a multiple of 4
+actual = len( filedata )
+while _bytecnt & ( alignment - 1 ) != 0:
+_add_data( 0, outfile )
+actual = actual + 1
# Then write the rest of the file
for c in filedata:
_add_data( ord( c ), outfile )

# Report
-print "Encoded file %s (%d bytes)" % ( fname, len( filedata ) )
+print "Encoded file %s (%d bytes real size, %d bytes after rounding, %d bytes total)" % ( fname, len( filedata ), actual, _fcnt )

# All done, write the final "0" (terminator)
_add_data( 0, outfile, False )
@@ -460,6 +460,15 @@ LUA_API void lua_pushlstring (lua_State *L, const char *s, size_t len) {
}


LUA_API void lua_pushrolstring (lua_State *L, const char *s, size_t len) {
lua_lock(L);
luaC_checkGC(L);
setsvalue2s(L, L->top, luaS_newrolstr(L, s, len));
api_incr_top(L);
lua_unlock(L);
}


LUA_API void lua_pushstring (lua_State *L, const char *s) {
if (s == NULL)
lua_pushnil(L);
@@ -30,6 +30,9 @@
#include "lobject.h"
#include "lstate.h"
#include "legc.h"
#ifndef LUA_CROSS_COMPILER
#include "devman.h"
#endif

#define FREELIST_REF 0 /* free list of references */

@@ -577,20 +580,33 @@ typedef struct LoadF {
int extraline;
FILE *f;
char buff[LUAL_BUFFERSIZE];
const char *srcp;
size_t totsize;
} LoadF;


static const char *getF (lua_State *L, void *ud, size_t *size) {
LoadF *lf = (LoadF *)ud;
(void)L;
if (L == NULL && size == NULL) // special request: detect 'direct mode'
return lf->srcp;
if (lf->extraline) {
lf->extraline = 0;
*size = 1;
return "\n";
}
-if (feof(lf->f)) return NULL;
-*size = fread(lf->buff, 1, sizeof(lf->buff), lf->f);
-return (*size > 0) ? lf->buff : NULL;
+if (lf->srcp == NULL) { // no direct access
+if (feof(lf->f)) return NULL;
+*size = fread(lf->buff, 1, sizeof(lf->buff), lf->f);
+return (*size > 0) ? lf->buff : NULL;
+} else { // direct access, return the whole file as a single buffer
+if (lf->totsize) {
+*size = lf->totsize;
+lf->totsize = 0;
+return lf->srcp;
+} else
+return NULL;
+}
}


@@ -607,8 +623,9 @@ LUALIB_API int luaL_loadfile (lua_State *L, const char *filename) {
LoadF lf;
int status, readstatus;
int c;
const char *srcp = NULL;
int fnameindex = lua_gettop(L) + 1; /* index of filename on the stack */
-lf.extraline = 0;
+lf.extraline = lf.totsize = 0;
if (filename == NULL) {
lua_pushliteral(L, "=stdin");
lf.f = stdin;
@@ -617,6 +634,14 @@ LUALIB_API int luaL_loadfile (lua_State *L, const char *filename) {
lua_pushfstring(L, "@%s", filename);
lf.f = fopen(filename, "r");
if (lf.f == NULL) return errfile(L, "open", fnameindex);
#ifndef LUA_CROSS_COMPILER
srcp = dm_getaddr(fileno(lf.f));
if (srcp) {
fseek(lf.f, 0, SEEK_END);
lf.totsize = ftell(lf.f);
fseek(lf.f, 0, SEEK_SET);
}
#endif
}
c = getc(lf.f);
if (c == '#') { /* Unix exec. file? */
@@ -632,6 +657,11 @@ LUALIB_API int luaL_loadfile (lua_State *L, const char *filename) {
lf.extraline = 0;
}
ungetc(c, lf.f);
if (srcp) {
lf.srcp = srcp + ftell(lf.f);
lf.totsize -= ftell(lf.f);
} else
lf.srcp = NULL;
status = lua_load(L, getF, &lf, lua_tostring(L, -1));
readstatus = ferror(lf.f);
if (filename) fclose(lf.f); /* close file (even in case of errors) */
@@ -653,6 +683,8 @@ typedef struct LoadS {
static const char *getS (lua_State *L, void *ud, size_t *size) {
LoadS *ls = (LoadS *)ud;
(void)L;
if (L == NULL && size == NULL) // direct mode check
return NULL;
if (ls->size == 0) return NULL;
*size = ls->size;
ls->size = 0;
@@ -298,6 +298,8 @@ static int luaB_loadfile (lua_State *L) {
*/
static const char *generic_reader (lua_State *L, void *ud, size_t *size) {
(void)ud; /* to avoid warnings */
if (L == NULL && size == NULL) // direct mode check, doesn't happen
return NULL;
luaL_checkstack(L, 2, "too many nested functions");
lua_pushvalue(L, 1); /* get function */
lua_call(L, 0, 1); /* call it */
@@ -24,6 +24,7 @@ typedef struct {
int strip;
int status;
DumpTargetInfo target;
size_t wrote;
} DumpState;

#define DumpMem(b,n,size,D) DumpBlock(b,(n)*(size),D)
@@ -35,6 +36,7 @@ static void DumpBlock(const void* b, size_t size, DumpState* D)
{
lua_unlock(D->L);
D->status=(*D->writer)(D->L,b,size,D->data);
D->wrote+=size;
lua_lock(D->L);
}
}
@@ -45,6 +47,12 @@ static void DumpChar(int y, DumpState* D)
DumpVar(x,D);
}

static void Align4(DumpState *D)
{
while(D->wrote&3)
DumpChar(0,D);
}

static void MaybeByteSwap(char *number, size_t numbersize, DumpState *D)
{
int x=1;
@@ -162,6 +170,7 @@ static void DumpCode(const Proto *f, DumpState* D)
DumpInt(f->sizecode,D);
char buf[10];
int i;
Align4(D);
for (i=0; i<f->sizecode; i++)
{
memcpy(buf,&f->code[i],sizeof(Instruction));
@@ -223,6 +232,7 @@ static void DumpDebug(const Proto* f, DumpState* D)
int i,n;
n= (D->strip) ? 0 : f->sizelineinfo;
DumpInt(n,D);
Align4(D);
for (i=0; i<n; i++)
{
DumpInt(f->lineinfo[i],D);
@@ -288,6 +298,7 @@ int luaU_dump_crosscompile (lua_State* L, const Proto* f, lua_Writer w, void* da
D.strip=strip;
D.status=0;
D.target=target;
D.wrote=0;
DumpHeader(&D);
DumpFunction(f,NULL,&D);
return D.status;
@@ -139,12 +139,14 @@ Proto *luaF_newproto (lua_State *L) {


void luaF_freeproto (lua_State *L, Proto *f) {
-luaM_freearray(L, f->code, f->sizecode, Instruction);
luaM_freearray(L, f->p, f->sizep, Proto *);
luaM_freearray(L, f->k, f->sizek, TValue);
-luaM_freearray(L, f->lineinfo, f->sizelineinfo, int);
luaM_freearray(L, f->locvars, f->sizelocvars, struct LocVar);
luaM_freearray(L, f->upvalues, f->sizeupvalues, TString *);
+if (!proto_is_readonly(f)) {
+luaM_freearray(L, f->code, f->sizecode, Instruction);
+luaM_freearray(L, f->lineinfo, f->sizelineinfo, int);
+}
luaM_free(L, f);
}

@@ -10,13 +10,16 @@

#include "lobject.h"

#include "lgc.h"

#define sizeCclosure(n) (cast(int, sizeof(CClosure)) + \
cast(int, sizeof(TValue)*((n)-1)))

#define sizeLclosure(n) (cast(int, sizeof(LClosure)) + \
cast(int, sizeof(TValue *)*((n)-1)))

#define proto_readonly(p) l_setbit((p)->marked, READONLYBIT)
#define proto_is_readonly(p) testbit((p)->marked, READONLYBIT)

LUAI_FUNC Proto *luaF_newproto (lua_State *L);
LUAI_FUNC Closure *luaF_newCclosure (lua_State *L, int nelems, Table *e);
@@ -312,12 +312,12 @@ static l_mem propagatemark (global_State *g) {
Proto *p = gco2p(o);
g->gray = p->gclist;
traverseproto(g, p);
-return sizeof(Proto) + sizeof(Instruction) * p->sizecode +
-sizeof(Proto *) * p->sizep +
+return sizeof(Proto) + sizeof(Proto *) * p->sizep +
sizeof(TValue) * p->sizek +
-sizeof(int) * p->sizelineinfo +
sizeof(LocVar) * p->sizelocvars +
-sizeof(TString *) * p->sizeupvalues;
+sizeof(TString *) * p->sizeupvalues +
+(proto_is_readonly(p) ? 0 : sizeof(Instruction) * p->sizecode +
+sizeof(int) * p->sizelineinfo);
}
default: lua_assert(0); return 0;
}
@@ -66,6 +66,7 @@
** bit 4 - for tables: has weak values
** bit 5 - object is fixed (should not be collected)
** bit 6 - object is "super" fixed (only the main thread)
** bit 7 - object is (partially) stored in read-only memory
*/


@@ -78,6 +79,7 @@
#define VALUEWEAKBIT 4
#define FIXEDBIT 5
#define SFIXEDBIT 6
#define READONLYBIT 7
#define WHITEBITS bit2mask(WHITE0BIT, WHITE1BIT)


@@ -21,6 +21,9 @@

#define NUM_TAGS (LAST_TAG+1)

/* mask for 'read-only' objects. must match READONLYBIT in lgc.h' */
#define READONLYMASK 128


/*
** Extra tags for non-values
@@ -364,7 +367,7 @@ typedef union TString {
} TString;


-#define getstr(ts) cast(const char *, (ts) + 1)
+#define getstr(ts) (((ts)->tsv.marked & READONLYMASK) ? cast(const char *, *(const char**)((ts) + 1)) : cast(const char *, (ts) + 1))
#define svalue(o) getstr(rawtsvalue(o))


@@ -84,7 +84,7 @@ static void luaR_next_helper(lua_State *L, const luaR_entry *pentries, int pos,
if (pentries[pos].key.type != LUA_TNIL) {
/* Found an entry */
if (pentries[pos].key.type == LUA_TSTRING)
-setsvalue(L, key, luaS_new(L, pentries[pos].key.id.strkey))
+setsvalue(L, key, luaS_newro(L, pentries[pos].key.id.strkey))
else
setnvalue(key, (lua_Number)pentries[pos].key.id.numkey)
setobj2s(L, val, &pentries[pos].value);

2 comments on commit d54659b

@jsnyder
Member


Sorry for the delay in any response here. I'll be doing some testing of this code shortly, including with LuaRPC to make sure there shouldn't be any problems there with the code dumper etc.

So, if I understand correctly, this optimization applies to compiled Lua scripts on ROMFS, not to uncompiled code? Obviously it can't do anything for subsequent bytecode, but does it prevent copying of the raw Lua source into SRAM for compilation? It looks like you've hooked the loading functions, but I haven't traced through exactly what happens for source.

I'm a little less clear on the implications for strings otherwise. Certainly a string needs to be in Flash for the optimization to apply. Does this mean that, at least when using Lua API functions, existing strings in Flash that get pushed onto the stack stay in Flash?

Otherwise, thanks for getting this one in. These SRAM optimizations have been great :-)

@bogdanm
Member Author


Hi,

> Sorry for the delay in any response here. I'll be doing some testing of this code shortly, including with LuaRPC to make sure there shouldn't be any problems there with the code dumper etc.

Thanks, this is exactly where I need help from you.

> So, if I understand correctly, this optimization applies to compiled Lua scripts on ROMFS, not to uncompiled code? Obviously it can't do anything for subsequent bytecode, but does it prevent copying of the raw Lua source into SRAM for compilation? It looks like you've hooked the loading functions, but I haven't traced through exactly what happens for source.

I don't think it does anything at all for uncompiled code. I'm quite sure that uncompiled code doesn't get loaded fully to RAM while compiling anyway, it's read sequentially from a small buffer instead.
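
For reference, the two reader paths that this commit gives `getF` (copying through a small buffer versus handing back the whole mapped file at once) can be modelled roughly as below. This is a Python 3 illustration under assumptions, not the C code from the diff: `make_reader`, `drain`, `BUFSIZE` and the `direct` flag are hypothetical stand-ins for `getF`, the caller's read loop, `LUAL_BUFFERSIZE` and the `srcp != NULL` check.

```python
import io

BUFSIZE = 16  # stand-in for LUAL_BUFFERSIZE; the real value is platform-defined


def make_reader(data, direct):
    """Return a function that yields chunks of `data`, or None at the end.

    direct=False: copy through a small buffer, one chunk per call.
    direct=True:  hand back a view of the whole file in a single chunk,
                  which models returning a pointer into Flash.
    """
    if direct:
        remaining = [memoryview(data)]

        def read_direct():
            # The first call returns everything, later calls signal the end.
            return remaining.pop() if remaining else None

        return read_direct

    stream = io.BytesIO(data)

    def read_buffered():
        chunk = stream.read(BUFSIZE)
        return chunk if chunk else None

    return read_buffered


def drain(reader):
    """Collect every chunk a reader produces until it returns None."""
    return [bytes(chunk) for chunk in iter(reader, None)]
```

In this model the direct reader produces exactly one chunk regardless of file size, while the buffered reader produces one chunk per `BUFSIZE` bytes; both yield the same content.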

> I'm a little less clear on the implications for strings otherwise. Certainly a string needs to be in FLASH for the optimization to apply.

Yes.

> Does this mean that, at least when using Lua API functions, existing strings in Flash that get pushed onto the stack stay in Flash?

Yes. The interesting piece of code is here (src/lua/lstring.c):

  TString *luaS_newlstr (lua_State *L, const char *str, size_t l) {
    // If the pointer is in a read-only memory and the string is at least 4 chars in length,
    // create it as a read-only string instead
    if(lua_is_ptr_in_ro_area(str) && l+1 > sizeof(char**))
      return luaS_newlstr_helper(L, str, l, 1);
    else
      return luaS_newlstr_helper(L, str, l, 0);
  }

As you can see, the code automatically decides whether to create a regular string or a "rostring" based on the physical location of the pointer. If the string is in Flash, and if the string plus its terminator is larger than a pointer (otherwise you'd actually increase memory usage, not decrease it), it will be automagically constructed as a "rostring". There is also a luaS_newrolstr that creates a rostring directly, but that's mainly for later use, when we finally get to implement these pesky loadable modules and the "is the string in Flash" test won't be good enough anymore.
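
The size check is easier to see with numbers. The following Python 3 sketch models the decision and the resulting RAM cost; it is an illustration only. `POINTER_SIZE`, `HEADER_SIZE` and the Flash address window are assumed values for a generic 32-bit MCU, the function names are hypothetical, and allocator overhead and alignment are ignored.

```python
POINTER_SIZE = 4          # sizeof(char *) on a 32-bit MCU (assumption)
HEADER_SIZE = 16          # hypothetical sizeof(TString); depends on the target
FLASH_START = 0x00000000  # hypothetical Flash address window
FLASH_END = 0x00040000


def is_ptr_in_ro_area(addr):
    """Model of the 'does this pointer come from Flash' test."""
    return FLASH_START <= addr < FLASH_END


def use_ro_string(addr, length):
    """Mirror the quoted condition: in Flash and l + 1 > pointer size."""
    return is_ptr_in_ro_area(addr) and length + 1 > POINTER_SIZE


def ram_cost(addr, length):
    """RAM bytes taken by the string object under this model."""
    if use_ro_string(addr, length):
        return HEADER_SIZE + POINTER_SIZE   # header plus a pointer into Flash
    return HEADER_SIZE + length + 1         # header plus content and '\0'
```

Under these assumed sizes, a 10-character string located in Flash costs 20 bytes of RAM instead of 27, while a 3-character string stays a regular string because replacing 4 bytes of content with a 4-byte pointer would save nothing.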

> Otherwise, thanks for getting this one in. These SRAM optimizations have been great :-)

You're welcome. I actually got a bit crazy on this one and I have some code that dumps the complete TString structure in Flash and reads it from there, saving quite a bit of memory in the process. It's far from being complete or tested properly (it runs life and hangman though :), but I was finally able to anchor "foreign" TStrings inside eLua's data structures.
