
Introduction to code translating

Rajko Horvat edited this page Jul 21, 2023 · 23 revisions

Introduction to translating assembly code to C# counterpart

To start, let's introduce a few common objects and assumptions used in assembly code decompiled directly to C#:

  • The 'CPU' object in 'CPU.cs' (most commonly referred to as 'this.oCPU' or 'this.oParent.CPU' in code), together with 'CPURegister.cs', 'CPUFlags.cs' and 'CPUMemory.cs', encapsulates the virtual CPU state: registers, flags, assembly instructions, the memory interface and I/O instructions.
  • Details about CPU instructions and flags can be found in the '[[https://css.csail.mit.edu/6.858/2019/readings/i386.pdf][INTEL 80386 PROGRAMMER'S REFERENCE MANUAL]]'.
  • The 'Civilization' object in 'Civilization.cs' encapsulates the CIV graphic driver, code segments, overlays, shared variables and CPU state. Segments and overlays communicate with each other through the shared 'Civilization' object (most commonly referred to as 'this.oParent' in code).
  • The graphic driver is composed of three parts: 'VGADriver.cs', 'VGABitmap.cs' and 'VGAForm.cs'. The code communicates directly with the VGADriver object (most commonly referred to as 'this.oParent.VGADriver' in code), which, together with VGABitmap, implements the bitmaps, sprites, palettes and screens the game uses.
  • The 'MSCAPI' object encapsulates all C compiler library functions. These will eventually be replaced by C# counterparts, so they don't need to be translated, except for the sake of functionality.
  • Code starts in the 'Segment_3045.Start()' method. This is the MSC C compiler startup method; it will eventually be replaced completely and does not need translating.
  • The real entry point is in 'Segment_11a8.F0_11a8_0008_Main()' which is really a simple C 'int main(int argc, char **argv, char **envp)' function.
  • The Segment(s) have their names based on segment address (useful if you want to compare direct disassembly, from IDA for example).
  • Function names are based on overlay, segment and offset (again, useful if you want to compare disassembly), like so: $"F{Overlay}_{Segment}_{Offset}", as in 'F0_11a8_0008_Main' above. The loadable overlays like mgraphic.exe, misc.exe, nsound.cvl... have overlay number 0.
  • The MSC medium memory model shares one segment between stack and data, so DS=SS. All pointers to data variables and to the stack are offsets within that 64K window: the DS segment is assumed for data and SS for the stack. Sometimes the compiler uses the CS segment to access data variables located in the code segment.
  • Values are returned from a function as uint16 (includes int16) in register AX, or as uint32 (includes int32) in the register pair DX:AX.
  • The assembly code uses the CDECL calling convention, which means that parameters are pushed to the stack from last to first (not first to last!), and the caller removes the parameters from the stack.
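
As a small illustration of the naming scheme, here is a sketch that builds a function name from its overlay, segment and offset. The 'FunctionNaming' class and its 'Build' method are hypothetical, and the lowercase four-digit hex formatting with underscores is an assumption inferred from the name 'F0_11a8_0008_Main'.

```csharp
using System;

public static class FunctionNaming
{
    // Hypothetical helper, not part of the decompiler.
    // Builds a name such as "F0_11a8_0008_Main" from its address parts.
    public static string Build(int overlay, ushort segment, ushort offset, string suffix = "")
    {
        // "x4" formats the value as lowercase hex, padded to four digits (assumed format)
        string name = $"F{overlay}_{segment:x4}_{offset:x4}";

        return suffix.Length > 0 ? $"{name}_{suffix}" : name;
    }
}
```

For example, 'FunctionNaming.Build(0, 0x11a8, 0x0008, "Main")' yields the entry point name mentioned above, which makes it easy to look up the same address in a disassembler.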

Let's translate a simple 'void *memcpy(void *dest, const void *src, int n)' function:

public void memcpy()
{
    this.oCPU.Log.EnterBlock("'memcpy'(Cdecl) at 0x3045:0x2a08");
    this.oCPU.CS.Word = 0x3045; // set this function segment, we don't need this if this function does not use register CS for static variables

    // function body
    this.oCPU.PushWord(this.oCPU.BP.Word); // compiler generated
    this.oCPU.BP.Word = this.oCPU.SP.Word; // compiler generated
    this.oCPU.SP.Word = this.oCPU.SUBWord(this.oCPU.SP.Word, 0x4); // compiler generated, reserve space for local stack values
    this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x2), this.oCPU.DI.Word); // store register DI on a local stack
    this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4), this.oCPU.SI.Word); // store register SI on a local stack
    this.oCPU.AX.Word = this.oCPU.DS.Word; // register AX = DS
    this.oCPU.ES.Word = this.oCPU.AX.Word; // register ES = AX
    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x8)); // read second parameter and store it to SI register
    this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x6)); // read first parameter and store it to DI register
    this.oCPU.AX.Word = this.oCPU.DI.Word; // register AX = DI (the destination pointer, which is also the return value)
    this.oCPU.CX.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0xa)); // read third parameter, and store it to CX register
    this.oCPU.CMPWord(this.oCPU.CX.Word, 0x0); // Compare two values (analog of value1 - value2) and set appropriate flags
    if (this.oCPU.Flags.E) goto L2a2e; // goto label L2a2e if result is equal or zero
    this.oCPU.TESTByte(this.oCPU.AX.Low, 0x1); // Test two values (analog of value1 & value2) and set appropriate flags
    if (this.oCPU.Flags.E) goto L2a26; // goto label L2a26 if result is equal or zero
    this.oCPU.MOVSByte(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI); // move byte at memory address DS:SI to ES:DI (DS/ES are segments, and SI, DI are offsets), and increment registers SI and DI
    this.oCPU.CX.Word = this.oCPU.DECWord(this.oCPU.CX.Word); // decrement CX register value and set appropriate flags
 
L2a26:
    this.oCPU.CX.Word = this.oCPU.SHRWord(this.oCPU.CX.Word, 0x1); // shift right CX register value by one (CX = CX >> 1), the bit shifted out goes to the carry flag
    this.oCPU.REPEMOVSWord(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI, this.oCPU.CX); // move word at memory address DS:SI to ES:DI, increment registers SI and DI by 2, decrement register CX, and repeat if CX != 0
    this.oCPU.CX.Word = this.oCPU.ADCWord(this.oCPU.CX.Word, this.oCPU.CX.Word); // CX is 0 after the repeat, so CX = CX + CX + carry becomes 1 if the byte count was odd and one byte remains
    this.oCPU.REPEMOVSByte(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI, this.oCPU.CX); // move byte at memory address DS:SI to ES:DI, increment registers SI and DI by 1, decrement register CX, and repeat if CX != 0
 
L2a2e:
    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4)); // restore value of register SI from local stack
    this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x2)); // restore value of register DI from local stack
    this.oCPU.SP.Word = this.oCPU.BP.Word; // compiler generated, restore previous SP value (if local stack variables are present)
    this.oCPU.BP.Word = this.oCPU.PopWord(); // compiler generated
 
    // Far return
    this.oCPU.Log.ExitBlock("'memcpy'");
}
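
The copy strategy in the function body above may be easier to follow in plain C#. The following is only a sketch of the same order of operations on a flat byte array; the 'CopySketch' class is hypothetical, and it ignores segments and 16-bit offset wraparound.

```csharp
using System;

public static class CopySketch
{
    // Copies n bytes inside one flat buffer in the same order as the assembly:
    // an optional alignment byte, then whole words, then an optional trailing byte.
    public static void Copy(byte[] memory, int dest, int src, int n)
    {
        if (n == 0) return;                 // CMP CX, 0 and jump to L2a2e

        if ((dest & 1) != 0)                // TEST AL, 1: is the destination offset odd?
        {
            memory[dest++] = memory[src++]; // MOVSB: copy one byte to align the destination
            n--;                            // DEC CX
        }

        int oddByte = n & 1;                // SHR CX, 1 moves bit 0 into the carry flag
        int words = n >> 1;                 // CX now holds the word count

        for (int i = 0; i < words; i++)     // REP MOVSW: two bytes per iteration
        {
            memory[dest++] = memory[src++];
            memory[dest++] = memory[src++];
        }

        for (int i = 0; i < oddByte; i++)   // ADC CX, CX then REP MOVSB: at most one byte
        {
            memory[dest++] = memory[src++];
        }
    }
}
```

The result is the same as a simple byte-by-byte loop; the word copy only exists because it was faster on the original hardware.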

The code:

this.oCPU.Log.EnterBlock("'memcpy'(Cdecl) at 0x3045:0x2a08");

and then (at the end of function)

// Far return
this.oCPU.Log.ExitBlock("'memcpy'");

is just debug code inserted by my decompiler. It writes to the log file and is not important for the final code, but it is very useful for debugging.

The MSC C compiler always emits the following code at the beginning of a function:

this.oCPU.PushWord(this.oCPU.BP.Word);
this.oCPU.BP.Word = this.oCPU.SP.Word;

and then (at the end of function)

this.oCPU.BP.Word = this.oCPU.PopWord();

So this is not important information for the final code. Sometimes the compiler also pushes the SI and/or DI registers, and/or the segment registers DS and/or ES, on the stack. These saves matter when the function changes those registers, because their values must be preserved for the caller. Like so:

this.oCPU.PushWord(this.oCPU.SI.Word);

and then (at the end of function)

this.oCPU.SI.Word = this.oCPU.PopWord();

Again, this is informative, but not too important for the final code.

How stack space is reserved for local variables:

this.oCPU.SP.Word = this.oCPU.SUBWord(this.oCPU.SP.Word, 0x4);

and then (at the end of function)

this.oCPU.SP.Word = this.oCPU.BP.Word;

This reserves 4 bytes on the stack for local variables and releases them again at the end. The stack is always aligned to words (2 bytes).

To read a local variable and store its value in the AX register, the code does this:

this.oCPU.AX.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4));

To write a value (or a register value) to a local stack variable, the code does this:

this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4), 0x1234);

To access the function parameters, the code does this:

this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x6));

This reads the first parameter. To access the second one, add 8 (instead of 6) to the BP register, and so on.

So our stack frame looks as follows (the segment register is always SS, although the compiler sometimes uses DS instead, since SS=DS in the medium memory model):
...and other local variables (always at a negative offset)
BP - 4 = local variable 2
BP - 2 = local variable 1
BP + 0 = the value of the BP register the compiler pushed at the beginning of the function
BP + 2 = return IP (offset)
BP + 4 = return CS (segment)
BP + 6 = first parameter
BP + 8 = second parameter
And so on... (always at a positive offset starting from 6)
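
To see why the first parameter ends up at BP + 6, the pushes of a far call can be replayed on a tiny simulated stack. This is a sketch only: 'TinyStack' and 'StackFrameSketch' are hypothetical helpers, and the return segment and the saved BP are made-up values.

```csharp
using System;

public sealed class TinyStack
{
    private readonly byte[] memory = new byte[0x100];

    public int SP = 0x100; // the stack grows downwards from the top of the buffer

    public void PushWord(ushort value)
    {
        SP -= 2;
        memory[SP] = (byte)(value & 0xff);   // low byte first (little endian)
        memory[SP + 1] = (byte)(value >> 8);
    }

    public ushort ReadWord(int offset)
    {
        return (ushort)(memory[offset] | (memory[offset + 1] << 8));
    }
}

public static class StackFrameSketch
{
    // Returns the six words found at BP + 0, BP + 2, ... BP + 0xa after the function prolog
    public static ushort[] FrameAfterProlog()
    {
        var stack = new TinyStack();

        // caller: parameters are pushed from last to first
        stack.PushWord(0x0010); // third parameter (byte count)
        stack.PushWord(0xa343); // second parameter (source)
        stack.PushWord(0xdb44); // first parameter (destination)

        // far call: return segment, then return offset
        stack.PushWord(0x1000); // return CS (made-up value)
        stack.PushWord(0x11a0); // return IP

        // callee prolog: push BP, then BP = SP
        stack.PushWord(0x7ffe); // the caller's BP (made-up value)
        int bp = stack.SP;

        var frame = new ushort[6];
        for (int i = 0; i < frame.Length; i++)
        {
            frame[i] = stack.ReadWord(bp + i * 2);
        }

        return frame;
    }
}
```

The returned array holds the saved BP, the return offset, the return segment and then the three parameters in order, so index 3 (BP + 6) is the first parameter.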

So, we can simplify our function to this:

public ushort memcpy(ushort destination, ushort source, ushort n)
{
    this.oCPU.Log.EnterBlock($"memcpy(0x{destination:x4}, 0x{source:x4}, {n})");

    // function body
    uint uiDestination = CPUMemory.ToLinearAddress(this.oCPU.DS.Word, destination);
    uint uiSource = CPUMemory.ToLinearAddress(this.oCPU.DS.Word, source);
    int iCount = n;

    while (iCount > 0)
    {
        this.oCPU.Memory.WriteByte(uiDestination, this.oCPU.Memory.ReadByte(uiSource)); // we will not transfer words first and then bytes as this complicates things too much
        uiDestination++;
        uiSource++;
        iCount--;
    }

    this.oCPU.Log.ExitBlock("memcpy");
    return destination;
}

Before our transformation, the call to our 'memcpy' function looked like this (remember, the parameters are pushed to the stack from last to first):

    this.oCPU.AX.Word = 0x10; // byte count to copy
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.AX.Word = 0xa343; // our source pointer
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.AX.Word = 0xdb44; // our destination pointer
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.PushWord(this.oCPU.CS.Word); // decompiler generated, stack management - push return segment
    this.oCPU.PushWord(0x11a0); // decompiler generated, stack management - push return offset
    // Instruction address 0x0000:0x119b, size: 5
    this.oParent.MSCAPI.memcpy(); // call our function
    this.oCPU.PopDWord(); // decompiler generated, stack management - pop return offset and segment
    this.oCPU.CS.Word = this.usSegment; // decompiler generated, restore this function segment
    this.oCPU.SP.Word = this.oCPU.ADDWord(this.oCPU.SP.Word, 0x6); // decompiler generated, remove our parameters from the stack

And with the transformed code, the call to our function simply looks like:

    this.oParent.MSCAPI.memcpy(0xdb44, 0xa343, 0x10); // call our function


How to access global variables:

    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.DS.Word, 0x2); // transfer word from memory location 2 to a SI register


How to store global variables:

    this.oCPU.WriteWord(this.oCPU.DS.Word, 0x2, this.oCPU.SI.Word); // store value of SI register (word) to a memory location 2

Of course, more complex forms are possible, for example:

    this.oCPU.WriteByte(this.oCPU.DS.Word, (ushort)(this.oCPU.SI.Word + 0x10), this.oCPU.AX.Low); // store byte value of register AX.Low to a memory location indexed by 0x10 + SI
    this.oCPU.SI.Word++; // increment register SI to store a next value to an array


The goal is to replace all global variables with C# counterparts that are shared through the common 'Civilization' object.

The helper methods in the 'CPU' object:
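
One possible intermediate step, shown here only as a sketch, is to give a memory location a named C# property while untranslated code still reads and writes the raw offset. The 'GlobalsSketch' class and the name 'Var_2' are hypothetical and are not taken from the project; the byte array stands in for the DS data segment.

```csharp
using System;

public sealed class GlobalsSketch
{
    // Stand-in for the 64K data segment addressed through DS
    private readonly byte[] dataSegment = new byte[0x10000];

    // Raw access, as the untranslated code would do through ReadWord / WriteWord
    public ushort ReadWord(ushort offset)
    {
        return (ushort)(dataSegment[offset] | (dataSegment[offset + 1] << 8));
    }

    public void WriteWord(ushort offset, ushort value)
    {
        dataSegment[offset] = (byte)(value & 0xff);
        dataSegment[offset + 1] = (byte)(value >> 8);
    }

    // Hypothetical named counterpart of the word at DS:0x2.
    // It is backed by the same memory, so translated and untranslated code stay in sync.
    public ushort Var_2
    {
        get { return ReadWord(0x2); }
        set { WriteWord(0x2, value); }
    }
}
```

Once every access to the location goes through the property, the backing memory could be swapped for an ordinary field without touching the callers.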

  • DWordToWords(CPURegister regLow, CPURegister regHigh, uint value) - assigns the low word of value to regLow and the high word to regHigh
  • uint WordsToDWord(ushort lowValue, ushort highValue) - combines the lowValue and highValue words into a dword value
  • string ReadString(uint address) - returns the string located at a linear memory address and terminated with 0
  • string ReadDosString(uint address) - returns the string located at a linear memory address and terminated with '$'
  • WriteString(uint address, string text, int maxLength) - writes a string of at most maxLength characters to a linear memory address
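
The behavior described for these helpers can be sketched as standalone functions. This is not the code from 'CPU.cs': 'HelperSketch' is hypothetical, the split function returns a tuple instead of assigning 'CPURegister' objects, and the string reader assumes one character per byte over a plain byte array.

```csharp
using System;
using System.Text;

public static class HelperSketch
{
    // Combines two words into one dword, for example a value returned in DX:AX
    public static uint WordsToDWord(ushort lowValue, ushort highValue)
    {
        return ((uint)highValue << 16) | lowValue;
    }

    // Splits a dword into its low and high words
    public static (ushort Low, ushort High) DWordToWords(uint value)
    {
        return ((ushort)(value & 0xffff), (ushort)(value >> 16));
    }

    // Reads bytes starting at address until the terminator (0 for C strings, '$' for DOS strings)
    public static string ReadTerminated(byte[] memory, uint address, byte terminator)
    {
        var text = new StringBuilder();

        for (uint i = address; i < memory.Length && memory[i] != terminator; i++)
        {
            text.Append((char)memory[i]); // assumption: single-byte characters
        }

        return text.ToString();
    }
}
```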

The helper method in the 'CPUMemory' object:

  • uint ToLinearAddress(ushort segment, ushort offset) - Converts the segment and offset location to an absolute memory address
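
In x86 real mode a linear address is the segment multiplied by 16 plus the offset. The sketch below shows that arithmetic under the assumption that 'CPUMemory.ToLinearAddress' follows the standard rule; 'AddressSketch' is a hypothetical stand-in, and the real method may differ in details such as address wraparound.

```csharp
using System;

public static class AddressSketch
{
    // Standard real-mode translation: linear = segment * 16 + offset
    public static uint ToLinearAddress(ushort segment, ushort offset)
    {
        return ((uint)segment << 4) + offset;
    }
}
```

For the 'memcpy' function at 0x3045:0x2a08 this gives 0x30450 + 0x2a08 = 0x32e58.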