
Reducing Code Size

Tom Sherman edited this page Mar 7, 2017 · 30 revisions

The main limiting factor when writing SSBM mods is the amount of memory available for injecting your code. If you have ever seen the error "can't find allocation of code with given memory regions", you know how painful it is to run out of memory.

Determining Minimum Size of the Heap

To use malloc, calloc, or realloc, you must first call initHeap, located in system.h. This is typically done once, at the beginning of your program. A typical example:

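static char heap[5000];          // backing storage for the heap
static bool init_run = false;    // has init() run yet?

void init()
{
    initHeap(heap, heap + sizeof(heap));  // heap spans all 5,000 bytes
    // other one-time setup goes here
}

void _main()
{
    if (!init_run)
    {
        init();                  // run the setup exactly once
        init_run = true;
    }

    // rest of the program goes here
}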

This would create a 5,000 byte heap for the rest of the program to use. Any call to malloc, calloc, realloc will allocate memory somewhere in the memory region that heap[5000] occupies. If initHeap is called again, the new heap will be used and the previous one will be essentially forgotten about.

When the heap is initialized this way, the heap is contained in one of the memory regions supplied in the config file for wiimake. This means that an excessively large heap will limit how much code you can inject. On the other hand, a heap that is too small will result in many failed calls to malloc, calloc, realloc. (Note: in order to write stable code you should always handle the possibility of memory allocation failing). Thus, it is extremely important to choose the 'right' size of your heap. Here are some steps you can take to determine how large the heap should be.
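Handling a failed allocation usually just means checking for NULL before using the result. The sketch below uses the standard C calloc and free from stdlib.h as stand-ins; the same check applies to the library's malloc, calloc, and realloc. The function name sumFreshCounters is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical example: total of `count` zero-initialized counters.
   Returns -1 instead of crashing if the heap cannot satisfy the request. */
static int sumFreshCounters(size_t count)
{
    int *counters = calloc(count, sizeof *counters);
    if (counters == NULL)
    {
        return -1; /* heap exhausted or too fragmented: report failure */
    }

    int total = 0;
    for (size_t i = 0; i < count; ++i)
    {
        total += counters[i];
    }
    free(counters); /* give the block back so later allocations can reuse it */
    return total;
}
```

The caller can then treat -1 as "not enough memory this frame" and try again later rather than dereferencing a NULL pointer.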

First, you must account for the memory that is allocated internally by the MeleeModdingLibrary.

print(), located in print.h - The stream that print writes to is allocated on the heap. No memory is allocated until print() is called (calling error() does not allocate memory). Each displayed line takes up 96 bytes. The number of lines is capped at 27, or at however many lines fit in 1/5 of the heap, whichever is smaller. The heap size needed for a given number of lines is therefore 96 * num_lines * 5 = size_of_heap, so having all 27 lines available requires a heap of 96 * 27 * 5 = 12960 bytes. This does not account for the fragmentation that occurs when malloc and free are called frequently, so make the heap larger than the bare minimum. More exact figures will be provided once testing is done; for now, assume you will need a heap twice as large as the total memory allocated. print() already builds in this margin: because its stream is limited to 1/5 of the heap, its allocations are unlikely to fail.
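The sizing rule for print() can be written as two small helpers. This is a sketch that models only the arithmetic described above (96 bytes per line, a 27-line cap, at most 1/5 of the heap); the macro and function names are hypothetical and not part of print.h, and fragmentation is not modeled.

```c
/* Figures taken from the description of print() above. */
#define PRINT_LINE_BYTES 96  /* bytes per displayed line */
#define PRINT_MAX_LINES  27  /* hard cap on lines */
#define PRINT_HEAP_SHARE 5   /* stream may use at most 1/5 of the heap */

/* Smallest heap that lets print() show `lines` lines. */
static unsigned heapForPrintLines(unsigned lines)
{
    return PRINT_LINE_BYTES * lines * PRINT_HEAP_SHARE;
}

/* How many lines print() can show with a heap of `heapSize` bytes. */
static unsigned printLinesForHeap(unsigned heapSize)
{
    unsigned lines = heapSize / PRINT_HEAP_SHARE / PRINT_LINE_BYTES;
    return lines < PRINT_MAX_LINES ? lines : PRINT_MAX_LINES;
}
```

For the 5,000-byte heap from the first example, printLinesForHeap(5000) gives 10 lines (5000 / 5 = 1000 bytes, and 1000 / 96 rounds down to 10), while heapForPrintLines(27) gives the 12960 bytes computed above.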

addLogic and addMove, located in AI.h - The logic rules and inputs stored in an AI struct are kept on the heap. They take up a considerably smaller portion of the heap than print() but still need to be accounted for. Each Logic struct is 24 bytes. The array holding the AI's logic grows through the sizes 1, 3, 7, 15, 31, ..., so if your AI has to hold 17 logic rules at one time, the array will have room for 31 and occupy 31 * 24 = 744 bytes on the heap. Each ControllerInput struct is 8 bytes, and the input array will be at most twice as long as the longest move. Thus, if the largest move used has 8 distinct inputs, the input array will be at most 16 * 8 = 128 bytes. These example sizes are more than enough for most programs. Doubling their sum to allow for fragmentation gives a recommended size of 2 * (744 + 128) = 1744 bytes.
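The same estimate for the AI arrays, as a sketch: the struct sizes and the 1, 3, 7, 15, 31, ... growth pattern come from the text above, while the macro and function names are hypothetical and not part of AI.h.

```c
#define LOGIC_BYTES 24  /* size of a Logic struct, per the text above */
#define INPUT_BYTES 8   /* size of a ControllerInput struct */

/* Logic array capacities grow 1, 3, 7, 15, 31, ... (2^k - 1);
   return the first capacity that holds `rules` entries. */
static unsigned logicCapacity(unsigned rules)
{
    unsigned capacity = 1;
    while (capacity < rules)
    {
        capacity = 2 * capacity + 1;
    }
    return capacity;
}

/* Recommended heap share for the AI: the logic array plus an input array
   up to twice the longest move, doubled to allow for fragmentation. */
static unsigned aiHeapEstimate(unsigned rules, unsigned longestMove)
{
    unsigned logicBytes = logicCapacity(rules) * LOGIC_BYTES;
    unsigned inputBytes = 2 * longestMove * INPUT_BYTES;
    return 2 * (logicBytes + inputBytes);
}
```

aiHeapEstimate(17, 8) reproduces the 1744 bytes from the worked example.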

If your code never calls print, 1744 bytes is the recommended minimum size of your heap. If print is called, you will need a bigger heap unless you display only a few lines (by the 1/5 rule, a 1744-byte heap fits just 3 lines, since 1744 / 5 / 96 is about 3.6). Remember, though, that print never uses more than 1/5 of the heap, and the remaining 4/5 should be enough for the AI in most cases.

After accounting for the library's own allocations, do similar calculations for the allocations your code makes. Remember: as long as a larger heap doesn't prevent your code from injecting, making it larger can only make your code more stable.

One possible way to avoid this whole problem is to shrink Melee's heap and use the freed space; for more information, see the next section, "Shrinking Melee's Heap".

Shrinking Melee's Heap

This strategy should only be used as a last resort, since it is not guaranteed to be stable. Rather than using up precious space in the available memory regions, you can create your heap in the region of memory that Melee uses for its own heap. Currently we cannot share Melee's heap without causing errors, but we can shrink Melee's heap and use the leftover space for ours. Here's how:

void boot(void)
{
    limitGameMemory((void*) 0x81780000);
}

limitGameMemory sets the top address of Melee's heap to 0x81780000 instead of 0x817f8ab0. This frees 0x817f8ab0 - 0x81780000 = 0x78ab0 = 494,256 bytes that we can use for our heap. Using more than this is not recommended; in fact, keep your heap as small as possible when using this method.
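The 494,256-byte figure is simply the distance between the two addresses. A quick check in C using the addresses quoted above (the macro and function names are illustrative, and nothing here calls the library):

```c
#include <stdint.h>

/* Addresses quoted above for Melee's heap. */
#define MELEE_HEAP_TOP_DEFAULT 0x817f8ab0u /* original top of Melee's heap */
#define MELEE_HEAP_TOP_LIMITED 0x81780000u /* value passed to limitGameMemory */

/* Bytes freed for our own heap by lowering the top of Melee's heap. */
static uint32_t freedHeapBytes(uint32_t originalTop, uint32_t newTop)
{
    return originalTop - newTop;
}
```

freedHeapBytes(MELEE_HEAP_TOP_DEFAULT, MELEE_HEAP_TOP_LIMITED) evaluates to 0x78ab0, which is 494,256 in decimal; these are also the two bounds passed to initHeap below.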

boot must be run at the startup of the game, so use this as the FIXED_SYMBOL value in your config file:

boot 80005358 4e800020

When it comes time to initialize your heap:

void init(void)
{
    initHeap((void*) 0x81780000, (void*) 0x817f8ab0);
    ...
}

Compiler Optimization

Perhaps the easiest way to shrink your code size is to enable compiler optimization with -O1, -O2, -O3, or -Os (-O0, GCC's default, disables optimization, while -Os specifically optimizes for size). Simply add one of these flags to CFLAGS in the *.ini file you use with wiimake. In addition to optimizing your own code, you can link against optimized versions of libmml: the MeleeModdingLibrary distribution comes with a separate library for each optimization level. For example, libmml_O1.a was compiled with the -O1 flag, which reduces the library size by almost 20%. The main potential issue with optimization is less stable code. If your code is not stable when compiled with optimization, the source of the error can be tricky to track down. If the library is not stable when you link against an optimized version, please report the error (see Reporting Errors). Compiling with optimization can also sometimes produce compile-time errors. These can be surprisingly tricky to diagnose, but they are a smaller problem than runtime errors.

Writing Fragmented Code

On one hand, unnecessary functions add code, since it takes a few assembly instructions to set up each function call. On the other hand, it is much easier to inject many small pieces of code than a single large one. Every function must be injected into a contiguous region of memory, but two smaller functions can be injected into separate regions. The two smaller functions take more total space but may be easier to fit. So when should you make this trade-off? If your code fails to inject even though its total size is smaller than the total size of your memory regions, look for ways to fragment your code further by breaking up functions and large variables (arrays declared at global scope).
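As an illustration of breaking up a large global variable, the sketch below splits one 400-entry table into two 200-entry halves behind a small accessor. The names and sizes are hypothetical, and whether wiimake actually places the halves in different regions depends on how much room each region has left. The accessor itself adds a little code, which is exactly the trade-off described above.

```c
/* Before: one 400-entry table that needs a single contiguous region.
       static short lookup[400];
   After: two halves that can be placed in separate regions.
   Names and sizes are illustrative only. */
#define LOOKUP_HALF 200

static short lookupLow[LOOKUP_HALF];
static short lookupHigh[LOOKUP_HALF];

/* Accessor hides the split from the rest of the program.
   Assumes index < 2 * LOOKUP_HALF. */
static short *lookupEntry(unsigned index)
{
    return index < LOOKUP_HALF ? &lookupLow[index]
                               : &lookupHigh[index - LOOKUP_HALF];
}
```

Code that used lookup[i] now uses *lookupEntry(i), so only the declaration and the access sites change.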

Data is Smaller than Code

This general principle applies in most cases. Let's start with an example from SimpleProgram in the tutorial:

static Logic respawnLogic = 
{
    {&actionStateEq, .arg1.u = 2, .arg2.u = _AS_RebirthWait},
    {&addMove, .arg1.p = &cpuPlayer, .arg2.p = &_mv_shortHop}
};

void main()
{
    ...
}

This is how respawnLogic is declared. A Logic struct is 24 bytes, and since respawnLogic is created at compile time, it takes up exactly 24 bytes. What if we changed it to:

void main()
{
    Logic respawnLogic = 
    {
        {&actionStateEq, .arg1.u = 2, .arg2.u = _AS_RebirthWait},
        {&addMove, .arg1.p = &cpuPlayer, .arg2.p = &_mv_shortHop}
    };

    ...
}

This version is 56 bytes larger. Why? When respawnLogic is declared this way, it is not created at compile time; instead, the code that builds the struct at run time must be injected, and that code takes up more space than the 24-byte struct itself.

That being said, if a variable is particularly large and its initialization is particularly simple, it may be more efficient to declare it inside a function (as long as it isn't needed at global scope). Finding the break-even point takes some experimenting, but keep this trade-off in mind.
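A sketch of that case, assuming a 32-bit target with 4-byte ints: a global table holding the first 64 multiples of 3 would store all 256 bytes of values in the injected image, whereas the function below builds the same values with a short loop. The names are hypothetical, and whether the loop really comes out smaller depends on the compiler and optimization flags, so measure both versions. A local array also uses stack space while the function runs.

```c
#define TABLE_SIZE 64

/* A global initialized table such as
       static int table[TABLE_SIZE] = {0, 3, 6, 9, ...};
   stores every entry in the injected image. When the pattern is this
   simple, building it inside the function that needs it can cost
   fewer bytes than the data itself. */
static int sumOfMultiples(void)
{
    int table[TABLE_SIZE]; /* built at run time on the stack */
    for (int i = 0; i < TABLE_SIZE; ++i)
    {
        table[i] = 3 * i;
    }

    int total = 0;
    for (int i = 0; i < TABLE_SIZE; ++i)
    {
        total += table[i];
    }
    return total;
}
```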