8088 V1

Current version: 1.2.0

This is a set of 8088 CPU tests produced by Daniel Balsom and Folkert van Heusden using the Arduino8088 interface and the MartyPC emulator.

These tests are produced with a Harris 80C88 running in Maximum Mode, with bus signals generated by an Intel 8288 Bus Controller.

10,000 tests are provided per opcode, with the following exceptions:

  • String instructions are limited to 2,000 tests due to their large size, even when masking CX to 7 bits.
  • Shift and rotate instructions that use CL (D2, D3) are limited to 4,000 tests, again due to size constraints, even when masking CL to 6 bits.

All tests assume a full 1MB of RAM is mapped to the processor and writable. Bear in mind that on the 8088, the address space wraps around at FFFFF.
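As a quick illustration of the wrap-around, here is a minimal sketch of segment:offset translation. The function name `physical_address` is made up for this example and is not part of the test suite.

```python
ADDRESS_MASK = 0xFFFFF  # the 8088 has 20 address lines

def physical_address(segment: int, offset: int) -> int:
    """Translate segment:offset to a 20-bit address, wrapping past FFFFF."""
    return ((segment << 4) + (offset & 0xFFFF)) & ADDRESS_MASK

# CS:IP of the sample test in the Test Format section (46716:13514)
print(physical_address(46716, 13514))    # 760970
# FFFF:0010 is one byte past FFFFF, so it wraps to address 0
print(physical_address(0xFFFF, 0x0010))  # 0
```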

No wait states are incurred during any of the tests. The interrupt and trap flags are not exercised.

Using the Tests

All tests begin after a CPU reset. The simplest way to run each test is to override your emulated CPU's normal reset vector (FFFF:0000) to the CS:IP of the test's initial state, then reset the CPU. Optionally, you can simulate the far jump typically found at the reset vector to jump to the test's CS:IP. The resulting cycle states should align using either method.

The initial state's 'ram' list defines address and byte pairs, including the instruction bytes and subsequent bytes fetched during program execution, and should be written to memory at the start of the test.
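A minimal sketch of setting up a test from its initial state might look like the following. The function `load_initial_state` and the flat `bytearray` memory model are assumptions for illustration; your emulator will have its own interfaces. Leaving unlisted addresses at zero is also an assumption of this sketch.

```python
def load_initial_state(test: dict):
    """Return (memory, regs, queue) built from a test's 'initial' object."""
    initial = test["initial"]
    memory = bytearray(1 << 20)        # full 1MB, all of it writable
    for address, value in initial["ram"]:
        memory[address & 0xFFFFF] = value
    regs = dict(initial["regs"])       # copy so the test data is not mutated
    queue = list(initial["queue"])
    return memory, regs, queue

# Abbreviated stand-in for a real test object
sample = {"initial": {"regs": {"cs": 46716, "ip": 13514},
                      "ram": [[760970, 0], [760971, 138]],
                      "queue": []}}
memory, regs, queue = load_initial_state(sample)
```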

Instruction cycles begin from the cycle in which the QS0 and QS1 status lines indicate an instruction "First Byte" has been fetched - this may be an optional instruction prefix, in which case there will be multiple First Byte statuses.

Random segment override prefixes have been prepended to a percentage of instructions, even if they may not do anything. This isn't completely useless - a few bugs have been found where segment overrides had an effect when they should not have.

Instruction cycles end when the first byte of the next instruction is read from the queue. If the byte was already present in the queue, this represents a line of microcode flagged with NXT or RNI. If the queue was empty, the extra cycles spent fetching it lengthen the instruction from documented timings.

All bytes after the initial instruction bytes are set to 0x90 (144) (NOP). Therefore, the queue contents at the end of all tests will contain only NOPs, with a maximum of 3 (since one has been read out).

String instructions may randomly be preceded by a REP, REPE or REPNE instruction prefix. In this event, CX is masked to 7 bits to produce reasonably sized tests (a string instruction with CX==65535 would take over a million cycles to execute).

Test Format

Sample test:

{
    "name": "add byte [ss:bp+si-2259h], cl",
    "bytes": [0, 138, 167, 221],
    "initial": {
        "regs": {
            "ax": 53139,
            "bx": 10002,
            "cx": 25563,
            "dx": 1580,
            "cs": 46716,
            "ss": 56284,
            "ds": 33035,
            "es": 63307,
            "sp": 3860,
            "bp": 34194,
            "si": 31963,
            "di": 35563,
            "ip": 13514,
            "flags": 62531
        },
        "ram": [
            [760970, 0],
            [760971, 138],
            [760972, 167],
            [760973, 221],
            [957908, 244]
        ],
        "queue": []
    },
    "final": {
        "regs": {
            "ip": 13518,
            "flags": 62599
        },
        "ram": [
            [957908, 207]
        ],
        "queue": [144]
    },
    "cycles": [
        [0, 760971, "CS", "R--", "---", 0, 0, "CODE", "T2", "F", 0],
        [0, 760971, "CS", "R--", "---", 0, 138, "PASV", "T3", "-", 0],
        [0, 760971, "CS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
        [1, 760972, "--", "---", "---", 0, 0, "CODE", "T1", "-", 0],
        [0, 760972, "CS", "R--", "---", 0, 0, "CODE", "T2", "S", 138],
        [0, 760972, "CS", "R--", "---", 0, 167, "PASV", "T3", "-", 0],
        [0, 760972, "CS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
        [1, 760973, "--", "---", "---", 0, 0, "CODE", "T1", "-", 0],
        [0, 760973, "CS", "R--", "---", 0, 0, "CODE", "T2", "S", 167],
        [0, 760973, "CS", "R--", "---", 0, 221, "PASV", "T3", "-", 0],
        [0, 760973, "CS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
        [1, 760974, "--", "---", "---", 0, 0, "CODE", "T1", "-", 0],
        [0, 760974, "CS", "R--", "---", 0, 0, "CODE", "T2", "-", 0],
        [0, 760974, "CS", "R--", "---", 0, 144, "PASV", "T3", "S", 221],
        [0, 760974, "CS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
        [0, 760974, "--", "---", "---", 0, 0, "PASV", "Ti", "-", 0],
        [0, 760974, "--", "---", "---", 0, 0, "PASV", "Ti", "-", 0],
        [1, 957908, "--", "---", "---", 0, 0, "MEMR", "T1", "-", 0],
        [0, 957908, "SS", "R--", "---", 0, 0, "MEMR", "T2", "-", 0],
        [0, 957908, "SS", "R--", "---", 0, 244, "PASV", "T3", "-", 0],
        [0, 957908, "SS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
        [1, 760975, "--", "---", "---", 0, 0, "CODE", "T1", "-", 0],
        [0, 760975, "CS", "R--", "---", 0, 0, "CODE", "T2", "-", 0],
        [0, 760975, "CS", "R--", "---", 0, 144, "PASV", "T3", "-", 0],
        [0, 760975, "CS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
        [1, 957908, "--", "---", "---", 0, 0, "MEMW", "T1", "-", 0],
        [0, 957908, "SS", "-A-", "---", 0, 0, "MEMW", "T2", "-", 0],
        [0, 957908, "SS", "-AW", "---", 0, 207, "PASV", "T3", "-", 0],
        [0, 957908, "SS", "---", "---", 0, 0, "PASV", "T4", "-", 0]
    ],
    "hash": "d32f3371444cd1d30d05aec40f65284c9d6c85ec",
    "idx": 3
},

The 'name' field is a user-readable disassembly of the instruction. The 'bytes' list contains the instruction bytes that make up the full instruction. The 'initial' keys contain the register, memory and queue states before instruction execution. The 'final' keys contain changes to registers and memory, and the state of the queue after instruction execution.

  • Registers and memory locations that are unchanged from the initial state are not included in the final state.
  • The entire value of 'flags' is provided if any flag has changed.
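Because the final state only lists changes, the complete expected state has to be rebuilt by overlaying 'final' on 'initial'. A minimal sketch, with `expected_final` as a hypothetical helper name:

```python
def expected_final(test: dict):
    """Overlay the 'final' changes on the 'initial' state."""
    regs = {**test["initial"]["regs"], **test["final"]["regs"]}
    ram = {addr: value for addr, value in test["initial"]["ram"]}
    ram.update({addr: value for addr, value in test["final"]["ram"]})
    return regs, ram

# Abbreviated version of the sample test above
sample = {
    "initial": {"regs": {"cx": 25563, "ip": 13514, "flags": 62531},
                "ram": [[760970, 0], [957908, 244]]},
    "final": {"regs": {"ip": 13518, "flags": 62599},
              "ram": [[957908, 207]]},
}
regs, ram = expected_final(sample)
print(regs)  # {'cx': 25563, 'ip': 13518, 'flags': 62599}
print(ram)   # {760970: 0, 957908: 207}
```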

The 'hash' key is a SHA1 hash of the test JSON. It should uniquely identify any test in the suite. The 'idx' key is the numerical index of the test within the test file.

Cycle Format

If you are not interested in writing a cycle-accurate emulator, you can ignore this section.

The 'cycles' list contains sublists, each corresponding to a single CPU cycle. Each contains several fields. From left to right, the cycle fields are:

  • Pin bitfield
  • address latch
  • segment status
  • memory status
  • IO status
  • BHE (Byte high enable) status
  • data bus
  • bus status
  • T-state
  • queue operation status
  • queue byte read

The first column is a bitfield representing certain chip pin states.

  • Bit #0 of this field represents the ALE (Address Latch Enable) pin output, which in Maximum Mode is output by the i8288. This signal is asserted on T1 to instruct the PC's address latches to store the current address. This is necessary since the address and data lines of the 8088 are multiplexed, and a full, valid address is only on the bus while ALE is asserted. Thus the second column represents the value of the address latch, and not the address bus itself (which may not be valid in a given cycle).
  • Bit #1 of this field represents the INTR pin input. This is not currently exercised, but may be in future test releases.
  • Bit #2 of this field represents the NMI pin input. This is not currently exercised, but may be in future test releases.
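The eleven fields and the pin bits described above can be unpacked into a small record type. This is only a sketch; the `Cycle` class and its attribute names are invented for this example and do not appear in the test data.

```python
from typing import NamedTuple

class Cycle(NamedTuple):
    pins: int
    address_latch: int
    segment: str
    memory: str
    io: str
    bhe: int
    data_bus: int
    bus_status: str
    t_state: str
    queue_op: str
    queue_byte: int

    @property
    def ale(self) -> bool:   # bit 0: Address Latch Enable
        return bool(self.pins & 0b001)

    @property
    def intr(self) -> bool:  # bit 1: INTR input
        return bool(self.pins & 0b010)

    @property
    def nmi(self) -> bool:   # bit 2: NMI input
        return bool(self.pins & 0b100)

# A T1 cycle taken from the sample test
cycle = Cycle(*[1, 760972, "--", "---", "---", 0, 0, "CODE", "T1", "-", 0])
print(cycle.ale, cycle.t_state, cycle.address_latch)  # True T1 760972
```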

The segment status indicates which segment is in use to calculate addresses by the CPU, using segment-offset addressing. This field represents the S3 and S4 status lines of the 8088.

The memory status field represents outputs of the attached i8288 Bus Controller. From left to right, this field will contain RAW or ---. R represents the MRDC status line, A represents the AMWC status line, and W represents the MWTC status line. These status lines are active-low. A memory read will occur on T3 or the last Tw t-state when MRDC is active. A memory write will occur on T3 or the last Tw t-state when AMWC is active. At this point, the value of the data bus field will be valid and will represent the byte read or written.
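As a rough sketch of how these rules can be applied, the following walks a cycle list and collects completed memory transfers. It only looks at T3, relying on the statement above that no wait states occur in these tests; `memory_transfers` is a hypothetical helper name.

```python
def memory_transfers(cycles):
    """Yield (kind, address, value) for each memory transfer seen on T3."""
    for pins, address, seg, mem, io, bhe, data, bus, t_state, q_op, q_byte in cycles:
        if t_state != "T3":
            continue                      # data bus is only sampled on T3 here
        if mem[0] == "R":                 # MRDC active
            yield ("read", address, data)
        elif mem[1] == "A":               # AMWC active
            yield ("write", address, data)

# The operand read and write cycles from the sample test
cycles = [
    [1, 957908, "--", "---", "---", 0, 0, "MEMR", "T1", "-", 0],
    [0, 957908, "SS", "R--", "---", 0, 0, "MEMR", "T2", "-", 0],
    [0, 957908, "SS", "R--", "---", 0, 244, "PASV", "T3", "-", 0],
    [0, 957908, "SS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
    [1, 957908, "--", "---", "---", 0, 0, "MEMW", "T1", "-", 0],
    [0, 957908, "SS", "-A-", "---", 0, 0, "MEMW", "T2", "-", 0],
    [0, 957908, "SS", "-AW", "---", 0, 207, "PASV", "T3", "-", 0],
    [0, 957908, "SS", "---", "---", 0, 0, "PASV", "T4", "-", 0],
]
transfers = list(memory_transfers(cycles))
print(transfers)  # [('read', 957908, 244), ('write', 957908, 207)]
```

Note that code fetches also assert MRDC, so on a full trace they show up as reads alongside operand accesses.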

The IO status field represents outputs of the attached i8288 Bus Controller. From left to right, this field will contain RAW or ---. R represents the IORC status line. A represents the AIOWC status line. W represents the IOWC status line. These status lines are active-low. An IO read will occur on T3 or the last Tw t-state when IORC is active. An IO write will occur on T3 or the last Tw t-state when AIOWC is active. At this point, the value of the data bus field will be valid and will represent the byte read or written.

The BHE status indicates whether a 16-bit data transfer is occurring. This pin does not exist on the 8088 but is provided to make the test set compatible with any set that may be produced for the 8086 in the future.

The data bus indicates the value of the low 8 bits of the multiplexed address/data bus. It is typically only valid on T3.

The bus status lines indicate the type of bus m-cycle currently in operation: one of INTA, IOR, IOW, MEMR, MEMW, HALT, CODE, or PASV. These states represent the S0-S2 status lines of the 8088.

The T-state is the current T-state of the CPU. Since this state is not exposed by the CPU, it is calculated based on bus activity.

The queue operation status will contain either F, S, E or -. F indicates that a "First Byte" of an instruction or instruction prefix has been read. S indicates that a "Subsequent" byte of an instruction has been read: either a ModR/M byte, displacement, or operand. E indicates that the instruction queue has been Emptied/Flushed. All queue operation statuses reflect an operation that actually occurred on the previous cycle. This field represents the QS0 and QS1 status lines of the 8088.

When the queue operation status is not '-', then the value of the queue byte read field is valid and represents the byte read from the queue.
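One way to use these two fields is to reconstruct the bytes the CPU actually consumed from the queue. The sketch below keeps only F and S statuses, since those are the ones that read a byte; `queue_reads` is a name invented for this example.

```python
def queue_reads(cycles):
    """Return (status, byte) pairs for every byte read out of the queue."""
    return [(c[9], c[10]) for c in cycles if c[9] in ("F", "S")]

# Selected cycles from the sample test
cycles = [
    [0, 760971, "CS", "R--", "---", 0, 0, "CODE", "T2", "F", 0],
    [0, 760971, "CS", "R--", "---", 0, 138, "PASV", "T3", "-", 0],
    [0, 760972, "CS", "R--", "---", 0, 0, "CODE", "T2", "S", 138],
    [0, 760973, "CS", "R--", "---", 0, 0, "CODE", "T2", "S", 167],
    [0, 760974, "CS", "R--", "---", 0, 144, "PASV", "T3", "S", 221],
]
reads = queue_reads(cycles)
print([byte for _, byte in reads])  # [0, 138, 167, 221]
```

For the sample test, the bytes read match its 'bytes' list, with the opcode marked F and the remaining three bytes marked S.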

For more information on the 8088 and 8288 status lines, see their respective white papers.

Undefined Instructions

Note that these tests include many undocumented/undefined opcodes and instruction forms. The 8088 has no concept of an invalid instruction, and will perform some task for any provided sequence of instruction bytes. Additionally, flags may be changed by documented instructions in ways that are officially undefined.

Per-Instruction Notes

  • 8F: The behavior of 8F with reg != 0 is undefined. If you can figure out the rules governing its behavior, please let us know.
  • 9B: WAIT is not included in this test set.
  • 8C,8E: These instructions are only defined for a reg value of 0-3. However, only the low two bits of reg are checked, so the test set contains random values for reg.
  • 8D,C4,C5: 'r, r' forms of LEA, LES, LDS are undefined. These forms are not included in this test set due to disruption of the last calculated EA by the CPU set up routine.
  • A4-A7,AA-AF: CX is masked to 7 bits. This provides a reasonable test length, as the full 65535 value in CX with a REP prefix could result in over one million cycles.
  • C6,C7: Although the reg != 0 forms of these instructions are officially undefined, this field is ignored. Therefore, the test set contains random values for reg.
  • D2,D3: CL is masked to 6 bits. This shortens the possible test length, while still hopefully catching the case where CL is improperly masked to 5 bits (186+ behavior).
  • E4,E5,EC,ED: All forms of the IN instruction should return 0xFF on IO read.
  • F0, F1: The LOCK prefix is not exercised in this test set.
  • F4: HALT is not included in this test set.
  • F6.6, F6.7, F7.6, F7.7: These instructions can generate a divide exception (more accurately, a Type-0 interrupt). When this occurs, cycle traces continue until the first byte of the exception handler is fetched and read from the queue. The IVT entry for INT0 is set up to point to 1024 (0400h).
    • NOTE: On the 8088 specifically, the return address pushed to the stack on a divide exception is the address of the next instruction. This differs from the behavior of later CPUs and generic Intel IA-32 emulators.
  • F6.7, F7.7: The presence of a REP prefix preceding IDIV inverts the sign of the quotient; therefore, REP prefixes are prepended to 10% of IDIV tests. This was only recently discovered by reenigne.
  • FE: The forms with reg field 2-7 are undefined and are not included in this initial release.

metadata.json

If you are not interested in emulating the undefined behavior of the 8088, you can use the included metadata.json file, which identifies the undocumented and undefined instructions and provides masks for undefined flags.

{
    "url": "https://github.com/SingleStepTests/8088/",
    "version": "1.2.0",
    "syntax_version": 2,
    "cpu": "8088",
    "cpu_detail": "Harris 80C88",
    "generator": "arduino8088",
    "date": "2024",
    "opcodes": {
        "00": {
            "status": "normal"
        },
        "01": {
            "status": "normal"
        },
        "02": {
            "status": "normal"
        },
        "03": {
            "status": "normal"
        },
        ...

In metadata.json, opcodes are listed as object keys under the 'opcodes' field, each key being the opcode hexadecimal string representation padded to two digits. Each opcode entry has a 'status' field which may be 'normal', 'prefix', 'alias', 'undocumented', 'undefined', or 'fpu'.

An opcode marked 'prefix' is an instruction prefix. These opcodes will not have individual tests. An opcode marked 'alias' is simply an alias for another instruction. These exist because the mask that determines which microcode address maps to which opcode is not always perfectly specific. An opcode marked 'undocumented' has well-defined and potentially useful behavior, such as SETMO and SETMOC. An opcode marked 'undefined' likely has unusual or unpredictable behavior of limited usefulness. An opcode marked 'fpu' is an FPU instruction (ESC opcode).

If present, the 'flags' field indicates which flags are undefined after the instruction has executed. Each position is either a letter from the pattern odiszapc, indicating that the flag is undefined, or a period, indicating that it is defined. The 'flags-mask' field is a 16-bit value that can be ANDed with the flags register after instruction execution to clear any flags left undefined.
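A minimal sketch of using 'flags-mask' when comparing results: apply the same mask to both the emulator's flags and the test's expected flags before comparing. The function name and the mask value 0xFFEF below are invented for illustration and are not taken from metadata.json.

```python
def flags_match(actual: int, expected: int, flags_mask: int = 0xFFFF) -> bool:
    """Compare flags registers, ignoring bits cleared in flags_mask."""
    return (actual & flags_mask) == (expected & flags_mask)

expected = 62599          # 'flags' from the sample test's final state
actual = expected ^ 0x10  # differs only in bit 4
hypothetical_mask = 0xFFEF  # made-up mask that clears bit 4

print(flags_match(actual, expected))                     # False
print(flags_match(actual, expected, hypothetical_mask))  # True
```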

An opcode may have a 'reg' field which will be an object of opcode extensions/register specifiers represented by single digit string keys - this is the 'reg' field of the modrm byte. Certain opcodes may be defined or undefined depending on their register specifier or opcode extension. Therefore, each entry within this 'reg' object will have the same fields as a top-level opcode object.
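The lookup described above could be sketched as follows. The helper `opcode_entry` and the small metadata fragment are hypothetical; in particular, the 'reg' entries shown are placeholders, not values copied from the real file. Since the excerpt does not show whether hex letters in keys are upper or lower case, the sketch tries both.

```python
def opcode_entry(metadata: dict, opcode: int, reg=None):
    """Return the metadata entry for an opcode, or for opcode.reg if listed."""
    opcodes = metadata["opcodes"]
    key = f"{opcode:02X}"
    entry = opcodes.get(key) or opcodes.get(key.lower())
    if entry is None:
        return None
    if reg is not None and "reg" in entry:
        # Fall back to the top-level entry if this reg value is not listed
        return entry["reg"].get(str(reg), entry)
    return entry

# Made-up fragment in the shape described above
metadata = {"opcodes": {
    "00": {"status": "normal"},
    "FE": {"status": "normal",
           "reg": {"0": {"status": "normal"}, "2": {"status": "undefined"}}},
}}
print(opcode_entry(metadata, 0x00)["status"])         # normal
print(opcode_entry(metadata, 0xFE, reg=2)["status"])  # undefined
```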