
Introduction to code translating

Rajko Horvat edited this page Jul 22, 2023 · 23 revisions

Introduction to translating assembly code to C# counterpart

Why haven't you used some sort of off-the-shelf decompiler?

Decompilers for 16-bit DOS code are virtually nonexistent. There are a lot of problems with segment:offset pairs, and the code itself mixes a lot of assembly with C code. Producing such a decompiler would be long and tedious, and it would probably produce code that isn't runnable and requires a lot of modifications to become usable.
With the approach I took, 'Assembly code directly decompiled to C#', I got runnable, working code pretty quickly. The effort of translating it to native C# is also smaller, because you can immediately introduce function parameters, fix errors and test your work.

Introduction of common objects and assumptions used in assembly code decompiled directly to C#

  • The 'CPU' object, located in 'CPU.cs' (most commonly referred to as 'this.oCPU' or 'this.oParent.CPU' in code) together with the additional files 'CPURegister.cs', 'CPUFlags.cs' and 'CPUMemory.cs', encapsulates the virtual CPU state: registers, flags, assembly instructions, memory interface and I/O instructions.
  • Details about the CPU instructions and flags can be found in the book INTEL 80386 PROGRAMMER'S REFERENCE MANUAL.
  • The 'Civilization' object, located in 'Civilization.cs', encapsulates the CIV graphic driver, code segments, overlays, shared variables and CPU state. The segments and overlays communicate with each other through the shared 'Civilization' object (most commonly referred to as 'this.oParent' in code).
  • The graphic driver is composed of three parts: 'VGADriver.cs', 'VGABitmap.cs' and 'VGAForm.cs'. The code communicates directly with the VGADriver object (most commonly referred to as 'this.oParent.VGADriver' in code), and VGADriver, in conjunction with VGABitmap, implements the bitmaps, sprites, palettes and screens the game uses.
  • The 'MSCAPI' object encapsulates all C compiler library functions. These will eventually be replaced by their C# counterparts, so they don't need to be translated, except for the sake of functionality.
  • Code starts in the 'Segment_3045.Start()' method. It is the MSC C compiler startup method, which will eventually be completely replaced and does not need translating.
  • The real entry point is in 'Segment_11a8.F0_11a8_0008_Main()' which is really a simple C 'int main(int argc, char **argv, char **envp)' function.
  • The Segment(s) have their names based on segment address (useful if you want to compare direct disassembly, from IDA for example).
  • The function names are based on overlay, segment and offset (again, useful if you want to compare disassembly) like so: $"F{Overlay}_{Segment}_{Offset}". The loadable overlays like mgraphic.exe, misc.exe, nsound.cvl... have overlay and segment number 0.
  • The MSC medium memory model shares stack space with data space, so DS=SS. All pointers to data variables and to the stack fit in a single 64K window; the DS segment is assumed for data and SS for the stack. Sometimes the compiler uses the CS segment to access data variables from assembly space.
  • The most common integral values the MSC compiler uses are (signed and unsigned values are sometimes hard to tell apart, except from the context of the surrounding code):
    • char (int8),
    • unsigned char (uint8),
    • int (int16),
    • unsigned int (uint16),
    • long (int32),
    • unsigned long (uint32).
  • Functions return value as uint16 (includes int16) in register AX, or as uint32 (includes int32) in register pair DX:AX. The returned value can be a simple value, or a pointer to some more complex data.
  • The assembly code uses the CDECL calling convention, which means that the parameters are pushed to the stack from last to first (not first to last!), and the caller removes the parameters from the stack.
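
The return-value and integer-size conventions in the list above can be sketched in plain C#. The class and method names below ('ReturnValueSketch', 'FromDxAx', 'AsSigned16', 'AsSigned32') are hypothetical and not part of the project; the sketch only assumes that DX holds the high word and AX the low word of a 32-bit result.

```csharp
using System;

public static class ReturnValueSketch
{
    // Combine the DX:AX register pair into one 32-bit value (DX = high word, AX = low word).
    public static uint FromDxAx(ushort dx, ushort ax)
    {
        return ((uint)dx << 16) | ax;
    }

    // Reinterpret the 16 bits returned in AX as a signed value (int16).
    public static short AsSigned16(ushort ax)
    {
        return unchecked((short)ax);
    }

    // Reinterpret the DX:AX pair as a signed 32-bit value (int32).
    public static int AsSigned32(ushort dx, ushort ax)
    {
        return unchecked((int)FromDxAx(dx, ax));
    }
}
```

The same bit pattern serves both signed and unsigned results: AX = 0xFFFF is 65535 as uint16 and -1 as int16, which is why the intended type usually has to be inferred from how the calling code uses the value.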

Translating a simple 'void *memcpy(void *dest, const void *src, int n)' function

public void memcpy()
{
    this.oCPU.Log.EnterBlock("'memcpy'(Cdecl) at 0x3045:0x2a08");
    this.oCPU.CS.Word = 0x3045; // set this function segment, we don't need this if this function does not use register CS for static variables

    // function body
    this.oCPU.PushWord(this.oCPU.BP.Word); // compiler generated
    this.oCPU.BP.Word = this.oCPU.SP.Word; // compiler generated
    this.oCPU.SP.Word = this.oCPU.SUBWord(this.oCPU.SP.Word, 0x4); // compiler generated, reserve space for local stack values
    this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x2), this.oCPU.DI.Word); // store register DI on a local stack
    this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4), this.oCPU.SI.Word); // store register SI on a local stack
    this.oCPU.AX.Word = this.oCPU.DS.Word; // register AX = DS
    this.oCPU.ES.Word = this.oCPU.AX.Word; // register ES = AX
    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x8)); // read second parameter and store it to SI register
    this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x6)); // read first parameter and store it to DI register
    this.oCPU.AX.Word = this.oCPU.DI.Word; // register AX = DI (the destination pointer, which is also the return value)
    this.oCPU.CX.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0xa)); // read third parameter, and store it to CX register
    this.oCPU.CMPWord(this.oCPU.CX.Word, 0x0); // Compare two values (analog of value1 - value2) and set appropriate flags
    if (this.oCPU.Flags.E) goto L2a2e; // goto label L2a2e if the values are equal (the count is zero, nothing to copy)
    this.oCPU.TESTByte(this.oCPU.AX.Low, 0x1); // Test two values (analog of value1 & value2) and set appropriate flags; here it checks whether the destination offset is odd
    if (this.oCPU.Flags.E) goto L2a26; // goto label L2a26 if the result is zero (the destination offset is even)
    this.oCPU.MOVSByte(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI); // move byte at memory address DS:SI to ES:DI (DS/ES are segments, and SI, DI are offsets), and increment registers SI and DI
    this.oCPU.CX.Word = this.oCPU.DECWord(this.oCPU.CX.Word); // decrement CX register value and set appropriate flags
 
L2a26:
    this.oCPU.CX.Word = this.oCPU.SHRWord(this.oCPU.CX.Word, 0x1); // shift right CX register value by one (CX = CX >> 1), the byte count becomes a word count and the shifted-out bit goes to the carry flag
    this.oCPU.REPEMOVSWord(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI, this.oCPU.CX); // move word at memory address DS:SI to ES:DI, increment registers SI and DI by 2, decrement register CX, and repeat if CX != 0
    this.oCPU.CX.Word = this.oCPU.ADCWord(this.oCPU.CX.Word, this.oCPU.CX.Word); // CX = CX + CX + carry; CX is 0 after the repeat, so CX becomes 1 if an odd byte remained after moving the words
    this.oCPU.REPEMOVSByte(this.oCPU.DS, this.oCPU.SI, this.oCPU.ES, this.oCPU.DI, this.oCPU.CX); // move byte at memory address DS:SI to ES:DI, increment registers SI and DI by 1, decrement register CX, and repeat if CX != 0
 
L2a2e:
    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4)); // restore value of register SI from local stack
    this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x2)); // restore value of register DI from local stack
    this.oCPU.SP.Word = this.oCPU.BP.Word; // compiler generated, restore previous SP value (if local stack variables are present)
    this.oCPU.BP.Word = this.oCPU.PopWord(); // compiler generated
 
    // Far return
    this.oCPU.Log.ExitBlock("'memcpy'");
}

The code:

this.oCPU.Log.EnterBlock("'memcpy'(Cdecl) at 0x3045:0x2a08");

and then (at the end of function)

// Far return
this.oCPU.Log.ExitBlock("'memcpy'");

This is just my decompiler inserting debug code that goes to a log file. It is not important for the final code, but very useful for debugging.

The MSC C compiler always uses the following code at the beginning of function:

this.oCPU.PushWord(this.oCPU.BP.Word);
this.oCPU.BP.Word = this.oCPU.SP.Word;

and then (at the end of function)

this.oCPU.BP.Word = this.oCPU.PopWord();

So this is not important information for the final code. Sometimes the compiler also pushes the SI and/or DI registers, and/or the segment registers DS and/or ES, on the stack. This matters when the function changes them and they must be preserved. Like so:

this.oCPU.PushWord(this.oCPU.SI.Word);

and then (at the end of function)

this.oCPU.SI.Word = this.oCPU.PopWord();

Again, this is informative, but not too important for the final code.

How is stack space reserved for local variables

this.oCPU.SP.Word = this.oCPU.SUBWord(this.oCPU.SP.Word, 0x4);

and then (at the end of function)

this.oCPU.SP.Word = this.oCPU.BP.Word;

This reserves 4 bytes on the stack for local variables. The stack is always aligned to words (2 bytes).

To read a local variable and store its value to the AX register, the code does:

this.oCPU.AX.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4));

To write a value (or a register value) to a local stack variable, the code does:

this.oCPU.WriteWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word - 0x4), 0x1234);

Accessing the function parameters

this.oCPU.DI.Word = this.oCPU.ReadWord(this.oCPU.SS.Word, (ushort)(this.oCPU.BP.Word + 0x6));

This reads the first parameter. To access the second one, add 8 (instead of 6) to the BP register, and so on.

So our stack space looks as follows (the segment register is always SS, but the compiler sometimes uses SS and DS interchangeably, since SS=DS in the medium memory model):
...and other local variables (always at a negative offset)
BP - 4 = local variable 2
BP - 2 = local variable 1
BP + 0 = the value of BP register compiler pushed at the beginning of function
BP + 2 = return IP (offset)
BP + 4 = return CS (segment)
BP + 6 = first parameter
BP + 8 = second parameter
And so on... (always at a positive offset starting from 6)
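
The parameter offsets can be checked with a small simulation of a far Cdecl call. This is only a sketch: 'StackFrameSketch' and its members are hypothetical stand-ins, not the project's 'CPU' or 'CPUMemory' classes, and the return address and initial register values are arbitrary.

```csharp
using System;

public static class StackFrameSketch
{
    // A tiny stand-in for the 64K stack segment (assumption, not the real CPUMemory).
    private static readonly byte[] stack = new byte[0x10000];
    private static ushort sp;
    private static ushort bp;

    private static void PushWord(ushort value)
    {
        sp -= 2; // the stack grows towards lower addresses
        stack[sp] = (byte)(value & 0xff);
        stack[sp + 1] = (byte)(value >> 8);
    }

    private static ushort ReadWord(int offset)
    {
        return (ushort)(stack[offset] | (stack[offset + 1] << 8));
    }

    // Simulates the pushes of a far call with three word parameters
    // and returns the values the callee reads at BP + 6, BP + 8 and BP + 0xa.
    public static ushort[] CallWithThreeParameters(ushort first, ushort second, ushort third)
    {
        sp = 0x1000; // arbitrary initial stack pointer
        bp = 0x0ffe; // arbitrary BP value of the caller

        // caller: parameters are pushed from last to first
        PushWord(third);
        PushWord(second);
        PushWord(first);
        // far call: return segment first, then return offset
        PushWord(0x11a8);
        PushWord(0x11a0);
        // callee prologue: save BP, set up the frame, reserve two local words
        PushWord(bp);
        bp = sp;
        sp -= 4;

        return new ushort[] { ReadWord(bp + 6), ReadWord(bp + 8), ReadWord(bp + 0xa) };
    }
}
```

Calling 'StackFrameSketch.CallWithThreeParameters(0xdb44, 0xa343, 0x10)' yields the three values in the same order, which matches how 'memcpy' reads its destination, source and count.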

Function memcpy after translation

public ushort memcpy(ushort destination, ushort source, ushort n)
{
    this.oCPU.Log.EnterBlock($"memcpy(0x{destination:x4}, 0x{source:x4}, {n})");

    // function body
    uint uiDestination = CPUMemory.ToLinearAddress(this.oCPU.DS.Word, destination);
    uint uiSource = CPUMemory.ToLinearAddress(this.oCPU.DS.Word, source);
    int iCount = n;

    while (iCount > 0)
    {
        this.oCPU.Memory.WriteByte(uiDestination, this.oCPU.Memory.ReadByte(uiSource)); // we will not transfer words first and then bytes as this complicates things too much
        uiDestination++;
        uiSource++;
        iCount--;
    }

    this.oCPU.Log.ExitBlock("memcpy");
    return destination;
}
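
For comparison, the copy strategy of the original decompiled listing (align the destination to an even offset, move words, then move one trailing byte if needed) can be sketched on a plain byte array. 'CopySketch' is a hypothetical name, and the flat 'memory' array stands in for the emulated memory; the translated function above deliberately does not do this.

```csharp
using System;

public static class CopySketch
{
    // Copies 'count' bytes inside one flat buffer, in the same order of operations as the assembly.
    public static void Copy(byte[] memory, int destination, int source, int count)
    {
        if (count == 0)
            return; // CMP CX, 0 followed by the jump to L2a2e

        if ((destination & 1) != 0)
        {
            // TEST AL, 1: the destination is odd, so move one byte first (MOVSB, DEC CX)
            memory[destination++] = memory[source++];
            count--;
        }

        int trailing = count & 1; // the bit SHR CX, 1 shifts out into the carry flag
        int words = count >> 1;   // number of 16-bit moves done by REP MOVSW

        for (int i = 0; i < words; i++)
        {
            // one MOVSW: read a whole word, then write it
            byte low = memory[source];
            byte high = memory[source + 1];
            memory[destination] = low;
            memory[destination + 1] = high;
            source += 2;
            destination += 2;
        }

        if (trailing != 0)
        {
            // ADC CX, CX leaves CX = carry, so REP MOVSB moves the last odd byte
            memory[destination] = memory[source];
        }
    }
}
```

For non-overlapping ranges the result is the same as the byte-by-byte loop, which is why the simpler loop is a reasonable replacement.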

Calling a function from code

Before our transformation, the code that calls our 'memcpy' function looked like this (remember, the parameters are pushed to the stack from last to first):

    this.oCPU.AX.Word = 0x10; // byte count to copy
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.AX.Word = 0xa343; // our source pointer
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.AX.Word = 0xdb44; // our destination pointer
    this.oCPU.PushWord(this.oCPU.AX.Word);
    this.oCPU.PushWord(this.oCPU.CS.Word); // decompiler generated, stack management - push return segment
    this.oCPU.PushWord(0x11a0); // decompiler generated, stack management - push return offset
    // Instruction address 0x0000:0x119b, size: 5
    this.oParent.MSCAPI.memcpy(); // call our function
    this.oCPU.PopDWord(); // decompiler generated, stack management - pop return offset and segment
    this.oCPU.CS.Word = this.usSegment; // decompiler generated, restore this function segment
    this.oCPU.SP.Word = this.oCPU.ADDWord(this.oCPU.SP.Word, 0x6); // decompiler generated, remove our parameters from the stack

And with transformed code, the call to our function simply looks like:

    this.oParent.MSCAPI.memcpy(0xdb44, 0xa343, 0x10); // call our function


How to access global variables

    this.oCPU.SI.Word = this.oCPU.ReadWord(this.oCPU.DS.Word, 0x2); // transfer word from memory location 2 to a SI register


How to store global variables

    this.oCPU.WriteWord(this.oCPU.DS.Word, 0x2, this.oCPU.SI.Word); // store value of SI register (word) to a memory location 2

Of course, more complex forms are allowed, for example:

    this.oCPU.WriteByte(this.oCPU.DS.Word, (ushort)(this.oCPU.SI.Word + 0x10), this.oCPU.AX.Low); // store byte value of register AX.Low to a memory location indexed by 0x10 + SI
    this.oCPU.SI.Word++; // increment register SI to store a next value to an array


The goal is to replace all global variables with C# counterparts that are shared through the common 'Civilization' object.
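
One possible intermediate step, shown here only as a sketch and not as the project's actual approach, is to give a memory location a named C# property while it still lives in emulated memory, so translated and untranslated code see the same value. 'SharedVariablesSketch', 'Var_0002' and the 'dataSegment' array are all hypothetical.

```csharp
using System;

public class SharedVariablesSketch
{
    // Stand-in for the emulated 64K data segment (assumption, not the real CPUMemory).
    private readonly byte[] dataSegment = new byte[0x10000];

    // Raw access, the way untranslated code reads a word at DS:offset.
    public ushort ReadWord(ushort offset)
    {
        return (ushort)(dataSegment[offset] | (dataSegment[offset + 1] << 8));
    }

    // Raw access, the way untranslated code writes a word at DS:offset.
    public void WriteWord(ushort offset, ushort value)
    {
        dataSegment[offset] = (byte)(value & 0xff);
        dataSegment[offset + 1] = (byte)(value >> 8);
    }

    // Hypothetical named variable for the word at DS:0x2.
    public ushort Var_0002
    {
        get { return ReadWord(0x2); }
        set { WriteWord(0x2, value); }
    }
}
```

Once every access goes through the property, the backing storage could be switched to an ordinary field without touching the callers.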

The helper methods in 'CPU' object

  • DWordToWords(CPURegister regLow, CPURegister regHigh, uint value) - regLow is assigned the low word of value, and regHigh is assigned the high word
  • uint WordsToDWord(ushort lowValue, ushort highValue) - the lowValue and highValue words are combined into a dword value
  • string ReadString(uint address) - Returns the string located at the linear memory address and terminated with 0
  • string ReadDosString(uint address) - Returns the string located at the linear memory address and terminated with '$'
  • WriteString(uint address, string text, int maxLength) - Writes a string, no longer than maxLength, at the linear memory address
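
What these helpers do can be sketched with standalone functions. The code below is an assumption about their behaviour based only on the descriptions above: it uses 'out' parameters and a byte array instead of the project's 'CPURegister' and memory classes, and 'HelperSketch' and 'ReadTerminated' are hypothetical names.

```csharp
using System;
using System.Text;

public static class HelperSketch
{
    // Split a 32-bit value into its low and high 16-bit words.
    public static void DWordToWords(uint value, out ushort low, out ushort high)
    {
        low = (ushort)(value & 0xffff);
        high = (ushort)(value >> 16);
    }

    // Read single-byte characters starting at 'address' until the terminator byte
    // (0 for C strings, '$' for DOS strings) or the end of the buffer is reached.
    public static string ReadTerminated(byte[] memory, uint address, byte terminator)
    {
        StringBuilder text = new StringBuilder();

        for (uint i = address; i < memory.Length && memory[i] != terminator; i++)
        {
            text.Append((char)memory[i]);
        }

        return text.ToString();
    }
}
```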

The helper method in 'CPUMemory' object

  • uint ToLinearAddress(ushort segment, ushort offset) - Converts the segment and offset location to an absolute memory address
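
In x86 real mode a segment:offset pair maps to a linear address as segment * 16 + offset. The sketch below shows that arithmetic; 'LinearAddressSketch.ToLinear' is a hypothetical standalone function, and whether the project's 'ToLinearAddress' applies any additional wrapping is not stated here.

```csharp
using System;

public static class LinearAddressSketch
{
    // Real-mode address arithmetic: shift the segment left by 4 bits and add the offset.
    public static uint ToLinear(ushort segment, ushort offset)
    {
        return ((uint)segment << 4) + offset;
    }
}
```

For example, the 'memcpy' entry point 0x3045:0x2a08 maps to 0x30450 + 0x2a08 = 0x32e58. Different pairs can name the same byte: 0x3000:0x0010 and 0x3001:0x0000 both map to 0x30010.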