
Introduction to code translating

Rajko Horvat edited this page Jan 30, 2024 · 23 revisions

Introduction to translating assembly code to its C# counterpart

Why haven't you used some sort of off-the-shelf decompiler?

Decompilers for 16-bit DOS code are virtually nonexistent. Segment:offset pairs cause a lot of problems, and the code itself mixes a lot of assembly with C code. Producing such a decompiler would take too long, would be tedious, and would probably produce code that isn't runnable and needs a lot of modifications to become usable.
With the approach I took, 'assembly code directly decompiled to C#', I got runnable code that works pretty quickly. Translating it further to native C# is also simpler, because you can immediately introduce function parameters, fix errors and test your work.

Common objects and assumptions used in assembly code decompiled directly to C#

  • The 'CPU' object in 'CPU.cs' (most commonly referred to as 'this.oCPU' or 'this.oParent.CPU' in the code), together with 'CPURegister.cs', 'CPUFlags.cs' and 'CPUMemory.cs', encapsulates the virtual CPU state: registers, flags, assembly instructions, the memory interface and I/O instructions.
  • Details about CPU instructions and flags can be found in the INTEL 80386 PROGRAMMER'S REFERENCE MANUAL.
  • Good documentation about CPU interrupts, reserved memory locations and hardware ports can be found in the HelpPC Reference Library.
  • The 'OpenCiv1' object in 'OpenCiv1.cs' encapsulates the CIV graphics driver, code segments, overlays, shared variables and the virtual CPU state. Segments and overlays communicate with each other through the shared 'OpenCiv1' object (most commonly referred to as 'this.oParent' in the code).
  • The graphics driver consists of two parts, 'VGADriver.cs' and 'VGABitmap.cs'. The code communicates directly with the VGADriver object (most commonly referred to as 'this.oParent.VGADriver' in the code), and VGADriver, together with VGABitmap, implements the bitmaps, sprites, palettes and screens the game uses.
  • The 'MSCAPI' object encapsulates all C compiler library functions. They will eventually be replaced by C# counterparts, so they don't need to be translated, except where their functionality requires it.
  • Code execution starts in 'Segment_11a8.F0_11a8_0008_Main()', which is really the standard C 'int main(int argc, char **argv, char **envp)' function.
  • Function names are based on the overlay, segment and offset (useful if you want to compare with a direct disassembly, from IDA for example), like so: $"F{Overlay}_{Segment}_{Offset}". Loadable overlays such as mgraphic.exe, misc.exe, nsound.cvl... have overlay and segment number 0.
  • In the MSC medium memory model the stack shares its segment with the data, so DS=SS. All pointers to data variables and to the stack are 16-bit offsets within that 64K window; the DS segment is assumed for data and SS for the stack. Sometimes the CS segment is used to access data variables located in the code (assembly) space.
  • The most common integral types the MSC compiler uses are (signed and unsigned values are sometimes hard to tell apart, except from the context of the surrounding code):
    • char (int8),
    • unsigned char (uint8),
    • int (int16),
    • unsigned int (uint16),
    • long (int32),
    • unsigned long (uint32).
  • Functions return a value as Int16 or UInt16 in register AX, or as Int32 or UInt32 in the register pair DX:AX (DX holds the high word). The returned value can be a simple value or a pointer (UInt16 value) to more complex data. C functions don't return flags (unlike DOS or BIOS interrupt functions), only a 16-bit (AX) or 32-bit (DX:AX) value.
  • The registers (in this model) are 16-bit, but their 8-bit high and low halves can be accessed separately. You can view registers as local variables, except for the segment registers ES, DS, SS and CS, which are used when accessing memory locations.
  • The assembly code uses the CDECL calling convention: parameters are pushed onto the stack from last to first (not first to last!), and the caller removes them from the stack after the call.
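To make the register and return-value conventions above concrete, here is a small standalone C# sketch. The helper names (LowByte, HighByte, FromBytes, SplitDxAx, JoinDxAx) are hypothetical and not part of the OpenCiv1 'CPURegister' API. They only show the bit arithmetic behind the 8-bit halves of a register and a 32-bit value returned in DX:AX.

```csharp
using System;

// Hypothetical helpers (not the OpenCiv1 CPURegister API).
static byte LowByte(ushort reg) => (byte)(reg & 0xff);   // AL from AX, BL from BX, ...
static byte HighByte(ushort reg) => (byte)(reg >> 8);    // AH from AX, BH from BX, ...
static ushort FromBytes(byte high, byte low) => (ushort)((high << 8) | low);

// A 32-bit return value: DX holds the high word, AX the low word.
static (ushort dx, ushort ax) SplitDxAx(uint value) => ((ushort)(value >> 16), (ushort)(value & 0xffff));
static uint JoinDxAx(ushort dx, ushort ax) => ((uint)dx << 16) | ax;

// The same 16 bits read as unsigned int (uint16) or int (int16); only the context tells which.
ushort axValue = 0xfffe;
Console.WriteLine($"AX=0x{axValue:x4} AH=0x{HighByte(axValue):x2} AL=0x{LowByte(axValue):x2} uint16={axValue} int16={unchecked((short)axValue)}");

var (dxWord, axWord) = SplitDxAx(0x12345678);
Console.WriteLine($"DX=0x{dxWord:x4} AX=0x{axWord:x4} DX:AX=0x{JoinDxAx(dxWord, axWord):x8}");
```

The `(short)` cast reinterprets the same 16 bits as a signed value. This is exactly the signed/unsigned ambiguity mentioned above for int and unsigned int.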

Translating a simple 'void *memcpy(void *dest, const void *src, int n)' function

public void memcpy()
{
    this.oCPU.Log.EnterBlock("'memcpy'(Cdecl) at 0x3045:0x2a08");
    this.oCPU.CS.Word = 0x3045; // set this function segment, we don't need this if this function does not use register CS for static variables

    // function body
    this.oCPU.PushWord(this.oCPU.BP.Word); // compiler generated
    this.oCPU.BP.Word = this.oCPU.SP.Word; // compiler generated
    this.oCPU.SP.Word = this.oCPU.SUBWord(this.oCPU.SP.Word, 0x4); // compiler generated, reserve space for local stack values
    this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x2), this.oCPU.DI.Word); // store register DI on a local stack
    this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4), this.oCPU.SI.Word); // store register SI on a local stack
    this.oCPU.AX.Word = this.oCPU.DS.Word; // register AX = DS
    this.oCPU.ES.Word = this.oCPU.AX.Word; // register ES = AX
    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x8)); // read second parameter and store it to SI register
    this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x6)); // read first parameter and store it to DI register
    this.oCPU.AX.Word = this.oCPU.DI.Word; // register AX = DI (the destination pointer, which is also the return value)
    this.oCPU.CX.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0xa)); // read third parameter, and store it to CX register
    this.oCPU.CMPWord(this.oCPU.CX.Word, 0x0); // Compare two values (analog of value1 - value2) and set appropriate flags
    if (this.oCPU.Flags.E) goto L2a2e; // goto label L2a2e if result is equal or zero
    this.oCPU.TESTByte(this.oCPU.AX.Low, 0x1); // Test two values (analog of value1 & value2) and set appropriate flags
    if (this.oCPU.Flags.E) goto L2a26; // goto label L2a26 if the result is zero, i.e. the destination address is even (word aligned)
    this.oCPU.MOVSByte(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI); // move byte at memory address DS:SI to ES:DI (DS/ES are segments, and SI, DI are offsets), and increment registers SI and DI
    this.oCPU.CX.Word = this.oCPU.DECWord(this.oCPU.CX.Word); // decrement CX register value and set appropriate flags
 
L2a26:
    this.oCPU.CX.Word = this.oCPU.SHRWord(this.oCPU.CX.Word, 0x1); // shift the CX register value right by one (CX = CX >> 1); the bit shifted out goes to the carry flag
    this.oCPU.REPEMOVSWord(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI, this.oCPU.CX); // move word at memory address DS:SI to ES:DI, increment registers SI and DI by 2, decrement register CX, and repeat if CX != 0
    this.oCPU.CX.Word = this.oCPU.ADCWord(this.oCPU.CX.Word, this.oCPU.CX.Word); // CX = CX + CX + carry; CX is 0 after REP MOVSW and MOVSW doesn't change flags, so CX becomes the carry from SHR (1 if one byte is left over, otherwise 0)
    this.oCPU.REPEMOVSByte(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI, this.oCPU.CX); // move byte at memory address DS:SI to ES:DI, increment registers SI and DI by 1, decrement register CX, and repeat if CX != 0
 
L2a2e:
    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4)); // restore value of register SI from local stack
    this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x2)); // restore value of register DI from local stack
    this.oCPU.SP.Word = this.oCPU.BP.Word; // compiler generated, restore previous SP value (if local stack variables are present)
    this.oCPU.BP.Word = this.oCPU.PopWord(); // compiler generated
 
    // Far return
    this.oCPU.Log.ExitBlock("'memcpy'");
}

The code:

this.oCPU.Log.EnterBlock("'memcpy'(Cdecl) at 0x3045:0x2a08");

and then (at the end of the function)

// Far return
this.oCPU.Log.ExitBlock("'memcpy'");

These lines are just debug code inserted by my decompiler. Their output goes to a log file; they are not important for the final code, but are very useful for debugging.

The MSC C compiler always generates the following code at the beginning of a function:

this.oCPU.PushWord(this.oCPU.BP.Word);
this.oCPU.BP.Word = this.oCPU.SP.Word;

and then (at the end of the function)

this.oCPU.BP.Word = this.oCPU.PopWord();

So this is not important information for the final code. Sometimes the compiler also pushes the SI and/or DI registers, and/or the DS and/or ES segment registers, onto the stack; these are important if the function changes them, because they must be preserved. For example:

this.oCPU.PushWord(this.oCPU.SI.Word);

and then (at the end of the function)

this.oCPU.SI.Word = this.oCPU.PopWord();

Again, this is informative, but not very important for the final code.

How stack space is reserved for local variables

At the beginning of the function:

this.oCPU.SP.Word = this.oCPU.SUBWord(this.oCPU.SP.Word, 0x4);

and then (at the end of the function)

this.oCPU.SP.Word = this.oCPU.BP.Word;

This reserves 4 bytes on the stack for local variables; assigning BP back to SP at the end releases them. The stack is always aligned to words (2 bytes).

To read a local variable into the AX register, the code does:

this.oCPU.AX.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4));

To write a value (or a register value) to a local variable on the stack, the code does this:

this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4), 0x1234);

Accessing the function parameters

this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x6));

This reads the first parameter. To access the second one, add 8 (instead of 6) to the BP register, and so on.
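The rule generalizes to any parameter list, because each parameter sits right after the previous one. The following is a minimal sketch that handles only word-sized (int, pointer) and dword-sized (long) arguments. ParameterOffsets is a hypothetical helper, not OpenCiv1 code.

```csharp
using System;
using System.Collections.Generic;

// Sketch: BP-relative offsets of CDECL parameters in a far call (medium model).
// After "push bp; mov bp, sp": BP+0 = saved BP, BP+2 = return IP, BP+4 = return CS,
// so the first parameter starts at BP+6. Arguments are at least word sized (char is
// promoted to int); a 4-byte long occupies two words, low word at the lower offset.
static List<int> ParameterOffsets(params int[] sizesInBytes)
{
    var offsets = new List<int>();
    int offset = 6; // first parameter, just above the far return address
    foreach (int size in sizesInBytes)
    {
        if (size != 2 && size != 4)
            throw new ArgumentException("Only 2-byte (int) and 4-byte (long) parameters are handled in this sketch.");
        offsets.Add(offset);
        offset += size;
    }
    return offsets;
}

// memcpy(dest, src, n): three word parameters
Console.WriteLine(string.Join(", ", ParameterOffsets(2, 2, 2))); // prints "6, 8, 10"
// a hypothetical f(int a, long b, int c)
Console.WriteLine(string.Join(", ", ParameterOffsets(2, 4, 2))); // prints "6, 8, 12"
```

A near call pushes only the return IP, so for a near function the first parameter would be at BP+4 instead.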

What does the stack space look like in a common function?

The stack space of our function looks as follows (the segment register is always SS, but the compiler sometimes uses DS instead; remember that SS=DS in the medium memory model):
...and other local variables (always at negative offsets)
BP - 4 = local variable 2
BP - 2 = local variable 1
BP + 0 = the value of BP register compiler pushed at the beginning of function
BP + 2 = return IP (offset)
BP + 4 = return CS (segment)
BP + 6 = first parameter
BP + 8 = second parameter
And so on (parameters are always at positive offsets, starting from 6)
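The layout above can be reproduced with a tiny stack simulation. This sketch makes several assumptions. A 64K byte array stands in for the SS segment. The initial SP of 0xfffe and the return CS:IP values are made up. Push, Pop, ReadWord and WriteWord are hypothetical stand-ins for the CPU object's PushWord, PopWord, ReadWord and WriteWord. The simulation replays the memcpy call from this page: the caller pushes the parameters from last to first, the far call pushes CS and IP, and the callee runs the MSC prologue and epilogue.

```csharp
using System;

// SS segment modelled as a 64K byte array (an assumption for this sketch).
byte[] stack = new byte[0x10000];
ushort sp = 0xfffe, bp = 0;

void WriteWord(ushort offset, ushort value)
{
    stack[offset] = (byte)(value & 0xff);              // little-endian: low byte first
    stack[(ushort)(offset + 1)] = (byte)(value >> 8);
}
ushort ReadWord(ushort offset) => (ushort)(stack[offset] | (stack[(ushort)(offset + 1)] << 8));
void Push(ushort value) { sp -= 2; WriteWord(sp, value); }
ushort Pop() { ushort value = ReadWord(sp); sp += 2; return value; }

// Caller: memcpy(0xdb44, 0xa343, 0x10), parameters pushed from last to first
Push(0x10);
Push(0xa343);
Push(0xdb44);
// Far call: return CS, then return IP (made-up values)
Push(0x2000);
Push(0x11a0);

// Callee prologue: push bp; mov bp, sp; sub sp, 4
Push(bp);
bp = sp;
sp -= 4;
WriteWord((ushort)(bp - 2), 0x1111); // local variable 1
WriteWord((ushort)(bp - 4), 0x2222); // local variable 2

for (int disp = -4; disp <= 10; disp += 2)
    Console.WriteLine($"BP{(disp < 0 ? "-" : "+")}{Math.Abs(disp)} = 0x{ReadWord((ushort)(bp + disp)):x4}");
ushort firstParameter = ReadWord((ushort)(bp + 6));

// Epilogue: mov sp, bp; pop bp; retf (pops IP and CS); the caller then does add sp, 6
sp = bp;
bp = Pop();
Pop();
Pop();
sp += 6;
Console.WriteLine($"SP back to 0x{sp:x4}, BP back to 0x{bp:x4}");
```

The printed table matches the layout above: the locals are at BP-4 and BP-2, the saved BP is at BP+0, the return IP and CS are at BP+2 and BP+4, and dest, src and n are at BP+6, BP+8 and BP+10.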

Which CPU registers get preserved between function calls

When a function is called the following register rules are followed:

  • registers AX, BX, CX, DX and the segment register ES are not preserved/restored,
  • registers SI, DI, BP and SP, along with the segment registers DS and SS, are preserved/restored,
  • functions use register AX (for 16-bit values) or the register pair DX:AX (for 32-bit values) for the return value.
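When translating a function by hand, it can help to check that it still honours these rules. The sketch below is a hypothetical test helper, not OpenCiv1 code. The dictionary-based register file and the names NewRegisters and ClobberedRegisters are assumptions. The helper snapshots the preserved registers, runs a function and reports which of them changed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Registers a callee must preserve under the MSC convention described above.
string[] calleeSaved = { "SI", "DI", "BP", "SP", "DS", "SS" };

// Hypothetical register file with arbitrary starting values.
Dictionary<string, ushort> NewRegisters() => new()
{
    ["AX"] = 0, ["BX"] = 0, ["CX"] = 0, ["DX"] = 0,
    ["SI"] = 0x0100, ["DI"] = 0x0200, ["BP"] = 0xfff0, ["SP"] = 0xffe0,
    ["ES"] = 0x3045, ["DS"] = 0x3045, ["SS"] = 0x3045
};

// Runs fn on a fresh register file and returns the preserved registers it changed.
List<string> ClobberedRegisters(Action<Dictionary<string, ushort>> fn)
{
    var regs = NewRegisters();
    var before = calleeSaved.ToDictionary(name => name, name => regs[name]);
    fn(regs);
    return calleeSaved.Where(name => regs[name] != before[name]).ToList();
}

// Allowed: scratch registers change, SI is saved and restored.
var ok = ClobberedRegisters(r =>
{
    ushort savedSi = r["SI"];
    r["SI"] = 0x1234;
    r["AX"] = 1; r["CX"] = 2; r["ES"] = r["DS"];
    r["SI"] = savedSi;
});
// Violation: SI is advanced and never restored.
var bad = ClobberedRegisters(r => { r["SI"] += 2; });

Console.WriteLine($"ok: [{string.Join(", ", ok)}], bad: [{string.Join(", ", bad)}]"); // prints "ok: [], bad: [SI]"
```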

Function memcpy after translation

public ushort memcpy(ushort destination, ushort source, ushort n)
{
    this.oCPU.Log.EnterBlock($"memcpy(0x{destination:x4}, 0x{source:x4}, {n})");

    // function body
    uint uiDestination = CPUMemory.ToLinearAddress(this.oCPU.DS.Word, destination);
    uint uiSource = CPUMemory.ToLinearAddress(this.oCPU.DS.Word, source);
    int iCount = n;

    while (iCount > 0)
    {
        this.oCPU.Memory.WriteByte(uiDestination, this.oCPU.Memory.ReadByte(uiSource)); // we will not transfer words first and then bytes as this complicates things too much
        uiDestination++;
        uiSource++;
        iCount--;
    }

    this.oCPU.Log.ExitBlock("memcpy");
    return destination;
}
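For reference, the word-then-byte strategy that the translation above deliberately drops can also be written in plain C#. This sketch works on a flat byte array, assuming no segments and no 64K wrap-around. MemcpyLikeMsc is a hypothetical name. The function mirrors the original listing step by step: it copies one byte if the destination is odd, copies CX/2 words, and then copies the byte indicated by the carry flag.

```csharp
using System;

// Forward copy in the style of the MSC memcpy listing, on a flat byte array
// (no segments, no 64K wrap-around). MemcpyLikeMsc is a hypothetical name.
static ushort MemcpyLikeMsc(byte[] memory, ushort dest, ushort src, ushort n)
{
    int di = dest, si = src, cx = n;
    if (cx == 0) return dest;             // CMP CX,0 / JE L2a2e
    if ((di & 1) != 0)                    // TEST AL,1: destination is odd
    {
        memory[di++] = memory[si++];      // MOVSB
        cx--;                             // DEC CX
    }
    int carry = cx & 1;                   // SHR CX,1 moves the lowest bit into the carry flag
    cx >>= 1;
    for (; cx > 0; cx--)                  // REP MOVSW
    {
        memory[di] = memory[si];
        memory[di + 1] = memory[si + 1];
        di += 2;
        si += 2;
    }
    if (carry != 0)                       // ADC CX,CX gives CX = carry; REP MOVSB copies that byte
        memory[di] = memory[si];
    return dest;                          // AX = original DI
}

byte[] mem = new byte[32];
for (int i = 0; i < 8; i++) mem[i] = (byte)(i + 1);
MemcpyLikeMsc(mem, 0x11, 0x00, 7);                      // odd destination: 1 byte + 3 words
Console.WriteLine(BitConverter.ToString(mem, 0x11, 7)); // prints "01-02-03-04-05-06-07"
```

Like the original, this version copies forward. As a result, overlapping regions behave like memmove only when the destination lies below the source.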

Calling a function from code

Before our transformation, the code that called our 'memcpy' function looked like this (remember, the parameters are pushed onto the stack from last to first):

    this.oCPU.AX.Word = 0x10; // byte count to copy
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.AX.Word = 0xa343; // our source pointer
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.AX.Word = 0xdb44; // our destination pointer
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.PushWord(this.oCPU.CS.Word); // decompiler generated, stack management - push return segment
    this.oCPU.PushWord(0x11a0); // decompiler generated, stack management - push return offset
    // Instruction address 0x0000:0x119b, size: 5
    this.oParent.MSCAPI.memcpy(); // call our function
    this.oCPU.PopDWord(); // decompiler generated, stack management - pop return offset and segment
    this.oCPU.CS.Word = this.usSegment; // decompiler generated, restore this function segment
    this.oCPU.SP.Word = this.oCPU.ADDWord(this.oCPU.SP.Word, 0x6); // compiler generated, remove our parameters from the stack

With the transformed code, the call to our function simply looks like this:

    this.oParent.MSCAPI.memcpy(0xdb44, 0xa343, 0x10); // call our function


How to access global variables

    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.DS.Word, 0x2); // read the word at DS:0x0002 into the SI register

More complex forms are, of course, allowed, for example:

    this.oCPU.AX.Word = this.oCPU.ReadWord(this.oCPU.DS.Word, (ushort)(this.oCPU.BX.Word + 0x2)); // read the word at DS:BX+2 into the AX register
    this.oCPU.BX.Word = this.oCPU.ADDWord(this.oCPU.BX.Word, 0x2); // add 2 to the index register BX to read the next value in the array
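The address arithmetic behind these reads is plain real-mode x86. The linear address is segment * 16 + offset, and words are stored little-endian (low byte first). The following self-contained sketch walks a word array the same way as the BX-indexed code above. The data segment value, the array contents and the helper names are made up for illustration.

```csharp
using System;

// Flat 1 MB real-mode memory image (an assumption for this sketch).
byte[] memory = new byte[0x100000];

// Linear address = segment * 16 + offset, wrapped at 1 MB like an 8086.
uint ToLinear(ushort segment, ushort offset) => (((uint)segment << 4) + offset) & 0xFFFFF;

// Words are little-endian: low byte at the lower address.
ushort ReadWord(ushort segment, ushort offset) =>
    (ushort)(memory[ToLinear(segment, offset)] | (memory[ToLinear(segment, (ushort)(offset + 1))] << 8));

ushort ds = 0x3b8d;                        // made-up data segment value
ushort[] table = { 10, 20, 300, 0xffff };  // made-up word array stored at DS:0x0002
for (int i = 0; i < table.Length; i++)
{
    uint a = ToLinear(ds, (ushort)(0x2 + i * 2));
    memory[a] = (byte)(table[i] & 0xff);
    memory[a + 1] = (byte)(table[i] >> 8);
}

// SI = [DS:0x0002]
ushort si = ReadWord(ds, 0x2);

// AX = [DS:BX+2]; BX += 2, repeated over the whole array
ushort bx = 0;
int sum = 0;
for (int i = 0; i < table.Length; i++)
{
    ushort ax = ReadWord(ds, (ushort)(bx + 0x2));
    sum += ax;
    bx += 2;
}
Console.WriteLine($"SI = {si}, sum = {sum}"); // prints "SI = 10, sum = 65865"
```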


How to store global variables

    this.oCPU.WriteWord(this.oCPU.DS.Word, 0x2, this.oCPU.SI.Word); // store the word value of the SI register at DS:0x0002

More complex forms are, of course, allowed, for example:

    this.oCPU.WriteByte(this.oCPU.DS.Word, (ushort)(this.oCPU.SI.Word + 0x10), this.oCPU.AX.Low); // store the low byte of register AX (AL) at DS:SI+0x10
    this.oCPU.SI.Word++; // increment register SI to store the next value in the array
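Stores are the mirror image of reads. This sketch again uses a made-up segment value and hypothetical helper names. It writes a word low byte first, then stores a zero-terminated string byte by byte at DS:SI+0x10 while incrementing SI, as in the code above.

```csharp
using System;
using System.Text;

// Flat 1 MB real-mode memory image (an assumption for this sketch).
byte[] memory = new byte[0x100000];
uint ToLinear(ushort segment, ushort offset) => (((uint)segment << 4) + offset) & 0xFFFFF;

void WriteByte(ushort segment, ushort offset, byte value) => memory[ToLinear(segment, offset)] = value;
void WriteWord(ushort segment, ushort offset, ushort value)
{
    WriteByte(segment, offset, (byte)(value & 0xff));             // low byte at the lower address
    WriteByte(segment, (ushort)(offset + 1), (byte)(value >> 8)); // high byte after it
}

ushort ds = 0x3b8d;           // made-up data segment value
WriteWord(ds, 0x2, 0x1234);   // [DS:0x0002] = SI, with SI = 0x1234 here

// [DS:SI+0x10] = AL; SI++ for every byte of a zero-terminated string
ushort si = 0;
foreach (byte al in Encoding.ASCII.GetBytes("CIV\0"))
{
    WriteByte(ds, (ushort)(si + 0x10), al);
    si++;
}

uint start = ToLinear(ds, 0x10);
Console.WriteLine(Encoding.ASCII.GetString(memory, (int)start, 3)); // prints "CIV"
```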


The goal is to replace all global variables with C# counterparts shared through the common 'OpenCiv1' object.

Converting a signed word (short) value to its absolute value (Math.Abs())

    this.oCPU.AX.Word = 0xffff; // -1
    this.oCPU.CWD(this.oCPU.AX, this.oCPU.DX);
    this.oCPU.AX.Word = this.oCPU.XORWord(this.oCPU.AX.Word, this.oCPU.DX.Word);
    this.oCPU.AX.Word = this.oCPU.SUBWord(this.oCPU.AX.Word, this.oCPU.DX.Word);
    // this.oCPU.AX.Word now contains 0x1

This short assembly sequence converts negative short values to positive ones and leaves non-negative values unchanged.
For example, all values in the range [-1, -32767] are converted to [1, 32767] (the only exception is -32768 [0x8000], which stays the same because it has no positive counterpart).
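The idiom can be checked exhaustively in C#. The sketch below emulates CWD, XOR and SUB on 16-bit values; AbsCwd is a hypothetical name. When translating, keep in mind that Math.Abs(short) throws an OverflowException for short.MinValue, whereas the assembly version silently returns 0x8000.

```csharp
using System;

// Emulates "cwd; xor ax, dx; sub ax, dx" on 16-bit values.
static ushort AbsCwd(ushort ax)
{
    // CWD: DX = 0xffff if the sign bit of AX is set, otherwise 0x0000
    ushort dx = (ushort)((ax & 0x8000) != 0 ? 0xffff : 0x0000);
    ax = (ushort)(ax ^ dx);            // XOR AX, DX: one's complement when negative, unchanged otherwise
    ax = unchecked((ushort)(ax - dx)); // SUB AX, DX: subtracting 0xffff (-1) adds 1, completing the negation
    return ax;
}

foreach (short value in new short[] { -1, 5, 0, -32767, short.MinValue })
    Console.WriteLine($"{value} -> {unchecked((short)AbsCwd((ushort)value))}");
```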

Helper methods in the 'CPU' object

  • DWordToWords(CPURegister regLow, CPURegister regHigh, uint value) - regLow is assigned the low word of value, and regHigh is assigned the high word
  • uint WordsToDWord(ushort lowValue, ushort highValue) - combines the lowValue and highValue words into a dword value
  • string ReadString(uint address) - returns the 0-terminated string located at the linear memory address
  • string ReadDosString(uint address) - returns the '$'-terminated string located at the linear memory address
  • WriteString(uint address, string text, int maxLength) - writes a string, no longer than maxLength, at the linear memory address
  • uint ToLinearAddress(ushort segment, ushort offset) - converts a segment:offset location to an absolute (linear) memory address
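The semantics of these helpers can be approximated in a few lines. This is an illustrative re-implementation over a flat 1 MB byte array, not the actual OpenCiv1 code. WriteString is left out because its exact handling of maxLength and the terminator is not described here. Bytes are mapped directly to chars, which is only correct for ASCII text.

```csharp
using System;
using System.Text;

// Flat 1 MB real-mode memory image (an assumption for this sketch).
byte[] memory = new byte[0x100000];

uint ToLinearAddress(ushort segment, ushort offset) => (((uint)segment << 4) + offset) & 0xFFFFF;
uint WordsToDWord(ushort lowValue, ushort highValue) => ((uint)highValue << 16) | lowValue;
(ushort low, ushort high) DWordToWords(uint value) => ((ushort)(value & 0xffff), (ushort)(value >> 16));

// Reads bytes until the terminator; each byte is mapped directly to a char (ASCII only).
string ReadTerminated(uint address, byte terminator)
{
    var sb = new StringBuilder();
    while (memory[address] != terminator)
        sb.Append((char)memory[address++]);
    return sb.ToString();
}
string ReadString(uint address) => ReadTerminated(address, 0);            // C string
string ReadDosString(uint address) => ReadTerminated(address, (byte)'$'); // DOS string

uint addr = ToLinearAddress(0x3b8d, 0x0100); // made-up segment:offset
Encoding.ASCII.GetBytes("Hello\0World$").CopyTo(memory, (int)addr);

var (low, high) = DWordToWords(0x12345678);
Console.WriteLine($"{ReadString(addr)} {ReadDosString(addr + 6)}");             // prints "Hello World"
Console.WriteLine($"0x{high:x4}:0x{low:x4} -> 0x{WordsToDWord(low, high):x8}"); // prints "0x1234:0x5678 -> 0x12345678"
```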