
scancodes

Mohiuddin Khan Inamdar edited this page Jun 21, 2026 · 3 revisions


Scancodes & Translation

Turning a raw keyboard byte into a character: make codes, break codes, the 0x80 release bit, and two lookup tables.

The byte you read from the keyboard's 8042 controller at port 0x60 is a scancode — a number that identifies a physical key and whether it was pressed or released. It is not ASCII. The letter q is not byte 0x71 ('q'); it is scancode 0x10. Turning that number into the character 'q' (or 'Q', depending on modifiers) is the job described here.

Scancode Set 1

The keyboard sends Set 1 scancodes — the original IBM XT keyboard set. Each key has a fixed make code; the kernel's translation tables are indexed directly by that code.

💡 Tidbit: Set 1 is ancient — it dates to the 1981 IBM PC — yet it is still what bare-metal code sees today. Modern USB keyboards send entirely different codes, but the controller (or BIOS, in legacy/translation mode) converts them back to Set 1 for compatibility. Writing to Set 1 means your decoder works on hardware from 1981 to 2026.

Make codes and break codes

Every key press generates a make code. Releasing the same key generates a break code, which is just the make code with the high bit (0x80) set:

make code   q  = 0x10   (0001 0000)
break code  q  = 0x90   (1001 0000)   <- bit 7 set

So the decoder tests bit 0x80 to tell press from release, then masks it off to recover the original key:

#define KEY_RELEASE_BIT 0x80
...
// Handle key release
if (scancode & KEY_RELEASE_BIT) {
    scancode &= ~KEY_RELEASE_BIT;
    ...
    return 0; // Return 0 for key releases
}

keyboard.h:90 and kernel.c:130-142.

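Both key readers (the kernel's in kernel.c and the shell's get_key in shell.c) return 0 on a release so the input loop ignores it. The one exception is a modifier key (shift/ctrl/alt): its release first clears that modifier's flag in KeyboardState (kernel.c:133-140), and only then is 0 returned.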
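The press/release split described above can be sketched as a small pure function. This is an illustration only: key_event_t and split_scancode are hypothetical names, not part of MyOS-Simple; only KEY_RELEASE_BIT comes from keyboard.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEY_RELEASE_BIT 0x80  /* same value as the macro in keyboard.h */

/* Hypothetical helper type, introduced only for this sketch. */
typedef struct {
    uint8_t make_code;  /* scancode with the release bit cleared */
    bool    released;   /* true for a break code, false for a make code */
} key_event_t;

/* Split a raw Set 1 byte into "which key" and "pressed or released". */
static key_event_t split_scancode(uint8_t raw)
{
    key_event_t ev;
    ev.released  = (raw & KEY_RELEASE_BIT) != 0;
    ev.make_code = (uint8_t)(raw & ~KEY_RELEASE_BIT);
    return ev;
}
```

Feeding it 0x90 yields make code 0x10 with released set, which is the q release from the example above.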

The translation tables

Two static tables in keyboard.h map a make code to an ASCII character — one for the unshifted key, one for the shifted key:

static const char scancode_to_ascii[] = {
    0,  27, '1', '2', '3', '4', '5', '6', '7', '8',  // 0-9
    '9', '0', '-', '=', '\b', '\t', 'q', 'w', 'e', 'r',  // 10-19
    ...
};
static const char scancode_to_ascii_shift[] = {
    0,  27, '!', '@', '#', '$', '%', '^', '&', '*',  // 0-9
    '(', ')', '_', '+', '\b', '\t', 'Q', 'W', 'E', 'R',  // 10-19
    ...
};

keyboard.h:32-55.

The index is the make code. Scancode 0x10 (decimal 16) lands on 'q' / 'Q'; scancode 0x02 lands on '1' / '!'. Entries that have no printable character (modifier keys, unused codes) are 0. The full byte-by-byte tables are in the scancode tables reference.

⚠️ Caveat: The tables only cover indices 0–89, which is why both readers guard the lookup with if (scancode < 90) (kernel.c:161, shell.c:168). A scancode past the end of the table would read out of bounds, but Set 1's printable keys all fall within this range, and special keys are handled separately by name.
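One way to keep the guard and the table from drifting apart is to derive the bound from the array itself rather than hard-coding 90. The sketch below is an assumption about how that could look, not what kernel.c does; demo_table holds only the 20 entries quoted above, and TABLE_LEN and lookup_scancode are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the first 20 entries quoted above; the real tables are longer. */
static const char demo_table[] = {
    0,  27, '1', '2', '3', '4', '5', '6', '7', '8',      /* 0-9   */
    '9', '0', '-', '=', '\b', '\t', 'q', 'w', 'e', 'r',  /* 10-19 */
};

/* Number of elements in a true array (not a pointer). */
#define TABLE_LEN(t) (sizeof(t) / sizeof((t)[0]))

/* Return the table entry, or 0 when the make code is past the end. */
static char lookup_scancode(uint8_t make_code)
{
    if (make_code >= TABLE_LEN(demo_table)) {
        return 0;  /* treated like "no printable character" */
    }
    return demo_table[make_code];
}
```

With this shape, adding or removing table entries changes the bound automatically, so the lookup cannot read past the end.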

Choosing which table

The decoder picks the shifted table when shift (or, in the full kernel, caps lock) is active:

if (scancode < 90) {
    if (kbd_state.shift_pressed || kbd_state.caps_lock) {
        ascii_char = scancode_to_ascii_shift[scancode];
    } else {
        ascii_char = scancode_to_ascii[scancode];
    }
    // Handle caps lock for letters
    if (kbd_state.caps_lock && !kbd_state.shift_pressed) {
        if (ascii_char >= 'a' && ascii_char <= 'z') ascii_char -= 32;
    } else if (kbd_state.caps_lock && kbd_state.shift_pressed) {
        if (ascii_char >= 'A' && ascii_char <= 'Z') ascii_char += 32;
    }
}

kernel.c:161-178.

The extra caps_lock logic addresses a subtlety: caps lock should affect letters only, but the shifted table also turns 1 into !, 2 into @, and so on. When caps lock alone is on, the code selects the shifted table, which already yields uppercase letters, so the -= 32 branch has nothing left to change. The range checks only ever touch letters, so a non-letter that came from the shifted table stays shifted (as quoted, 1 still produces ! while caps lock is on). When caps lock and shift are both active, letters are flipped back to lowercase (+= 32), matching how a real keyboard behaves.

💡 Tidbit: The -= 32 / += 32 trick exploits the ASCII layout: uppercase and lowercase letters are exactly 32 apart ('A' = 65, 'a' = 97). Subtracting 32 uppercases; adding 32 lowercases. No table needed.

The shell's simpler get_key skips caps entirely — shift only (shell.c:168-173).
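As quoted, the kernel.c excerpt appears to select the shifted table whenever caps lock is on, which would also shift digits and punctuation. A minimal sketch of an alternative, assuming the goal is for caps lock to touch letters only: pick the table from shift alone, then let caps lock invert the case of whatever letter comes out. The names plain, shifted, and translate are hypothetical, and the tables hold only the 20 entries quoted earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First 20 entries of each table, as quoted from keyboard.h. */
static const char plain[] = {
    0,  27, '1', '2', '3', '4', '5', '6', '7', '8',
    '9', '0', '-', '=', '\b', '\t', 'q', 'w', 'e', 'r',
};
static const char shifted[] = {
    0,  27, '!', '@', '#', '$', '%', '^', '&', '*',
    '(', ')', '_', '+', '\b', '\t', 'Q', 'W', 'E', 'R',
};

/* Shift chooses the table; caps lock only inverts the case of letters. */
static char translate(uint8_t make_code, bool shift, bool caps)
{
    if (make_code >= sizeof(plain) / sizeof(plain[0])) {
        return 0;
    }
    char c = shift ? shifted[make_code] : plain[make_code];
    if (caps) {
        if (c >= 'a' && c <= 'z') {
            c = (char)(c - 32);  /* caps alone: lowercase becomes uppercase */
        } else if (c >= 'A' && c <= 'Z') {
            c = (char)(c + 32);  /* caps + shift: uppercase becomes lowercase */
        }
    }
    return c;
}
```

Under these assumptions, caps lock alone turns q into Q but leaves 1 as 1, and caps lock plus shift gives q and !.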

Modifier scancodes

Modifier keys are recognized by their make codes and update KeyboardState instead of producing a character:

Modifier     Scancode  Macro
Left Shift   0x2A      SCANCODE_LEFT_SHIFT
Right Shift  0x36      SCANCODE_RIGHT_SHIFT
Left Ctrl    0x1D      SCANCODE_LEFT_CTRL
Left Alt     0x38      SCANCODE_LEFT_ALT
Caps Lock    0x3A      SCANCODE_CAPS_LOCK

keyboard.h:58-62. Shift/ctrl/alt set their flag on the make code and clear it on the break code; caps lock toggles on each press (kernel.c:155-157):

} else if (scancode == SCANCODE_CAPS_LOCK) {
    kbd_state.caps_lock = !kbd_state.caps_lock;
    return 0;
}
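The set-on-make, clear-on-break, toggle-on-press rules above can be sketched in one function. This is a simplified illustration, not the kernel's code: demo_kbd_state_t and update_modifiers are hypothetical, and the ctrl_pressed and alt_pressed field names are assumptions (only shift_pressed and caps_lock appear in the excerpts).

```c
#include <stdbool.h>
#include <stdint.h>

#define KEY_RELEASE_BIT      0x80
#define SCANCODE_LEFT_SHIFT  0x2A
#define SCANCODE_RIGHT_SHIFT 0x36
#define SCANCODE_LEFT_CTRL   0x1D
#define SCANCODE_LEFT_ALT    0x38
#define SCANCODE_CAPS_LOCK   0x3A

/* Stand-in for KeyboardState; ctrl/alt field names are assumed. */
typedef struct {
    bool shift_pressed;
    bool ctrl_pressed;
    bool alt_pressed;
    bool caps_lock;
} demo_kbd_state_t;

/* Returns true if the byte belonged to a modifier and was consumed. */
static bool update_modifiers(demo_kbd_state_t *st, uint8_t raw)
{
    bool    pressed = (raw & KEY_RELEASE_BIT) == 0;
    uint8_t code    = (uint8_t)(raw & ~KEY_RELEASE_BIT);

    switch (code) {
    case SCANCODE_LEFT_SHIFT:
    case SCANCODE_RIGHT_SHIFT:
        st->shift_pressed = pressed;  /* set on make, clear on break */
        return true;
    case SCANCODE_LEFT_CTRL:
        st->ctrl_pressed = pressed;
        return true;
    case SCANCODE_LEFT_ALT:
        st->alt_pressed = pressed;
        return true;
    case SCANCODE_CAPS_LOCK:
        if (pressed) {
            st->caps_lock = !st->caps_lock;  /* toggle on press only */
        }
        return true;
    default:
        return false;
    }
}
```

Because both shift keys share one flag here, releasing one shift while the other is still held clears the flag; tracking left and right separately would avoid that.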

Special keys

Non-character keys are handled by name after the table lookup. The full kernel maps arrows and function keys to sentinel return values, plus the universal Enter/Backspace/Tab/Esc:

if (scancode == SCANCODE_UP)    return 'U';
if (scancode == SCANCODE_DOWN)  return 'D';
if (scancode == SCANCODE_LEFT)  return 'L';
if (scancode == SCANCODE_RIGHT) return 'R';
if (scancode == SCANCODE_F1)    return '1';
...
if (scancode == SCANCODE_ESC)       return 27;
if (scancode == SCANCODE_ENTER)     return '\n';
if (scancode == SCANCODE_BACKSPACE) return '\b';
if (scancode == SCANCODE_TAB)       return '\t';

kernel.c:181-190. Their codes (keyboard.h:63-87): ESC 0x01, ENTER 0x1C, BACKSPACE 0x0E, TAB 0x0F, SPACE 0x39, F1–F10 0x3B–0x44, UP 0x48, LEFT 0x4B, RIGHT 0x4D, DOWN 0x50, HOME 0x47, END 0x4F, PAGE_UP 0x49, PAGE_DOWN 0x51, INSERT 0x52, DELETE 0x53.
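The sentinels in the excerpt are ordinary characters ('U', 'L', '1'), so a caller cannot tell Up from a typed capital U by the return value alone. One possible alternative, which is not what kernel.c does, is to return an int and place the special keys above 0xFF. The enum and function names below are hypothetical; the scancode values are the ones listed above.

```c
#include <stdint.h>

#define SCANCODE_F1    0x3B
#define SCANCODE_UP    0x48
#define SCANCODE_LEFT  0x4B
#define SCANCODE_RIGHT 0x4D
#define SCANCODE_DOWN  0x50

/* Hypothetical key codes: they start above 0xFF, so none can equal a char. */
enum demo_key {
    DEMO_KEY_NONE = 0,
    DEMO_KEY_UP   = 0x100,
    DEMO_KEY_DOWN,
    DEMO_KEY_LEFT,
    DEMO_KEY_RIGHT,
    DEMO_KEY_F1
};

/* Map a make code to a special-key value, or DEMO_KEY_NONE if it is not one. */
static int special_key(uint8_t make_code)
{
    switch (make_code) {
    case SCANCODE_UP:    return DEMO_KEY_UP;
    case SCANCODE_DOWN:  return DEMO_KEY_DOWN;
    case SCANCODE_LEFT:  return DEMO_KEY_LEFT;
    case SCANCODE_RIGHT: return DEMO_KEY_RIGHT;
    case SCANCODE_F1:    return DEMO_KEY_F1;
    default:             return DEMO_KEY_NONE;
    }
}
```

A caller can then treat any value below 0x100 as a character and anything at or above it as a navigation or function key.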

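⚠️ Caveat: On real hardware, the extended keys (arrows, Home/End, etc.) are sent as a two-byte sequence prefixed with 0xE0: Up is 0xE0 0x48, not bare 0x48. MyOS-Simple does not track the 0xE0 prefix; it reads one byte at a time and matches 0x48 directly. Judging by the release check shown earlier, the prefix byte itself has bit 7 set, so it is discarded like a break code and the 0x48 that follows still matches. That is enough for the arrows to work under QEMU, but the decoder cannot tell an extended key from the numeric-keypad key that shares its second byte (keypad 8 is also 0x48), so these keys may be confused with one another on physical hardware.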
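Handling the prefix takes one bit of state carried between bytes. The following is a minimal sketch of that idea, not code from MyOS-Simple: SCANCODE_EXTENDED_PREFIX, demo_key_t, and feed_byte are hypothetical names, and the caller is assumed to pass each byte read from port 0x60 in order.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCANCODE_EXTENDED_PREFIX 0xE0  /* assumed macro name, not from keyboard.h */
#define KEY_RELEASE_BIT          0x80

/* Hypothetical result type for this sketch. */
typedef struct {
    bool    complete;   /* false while waiting for the byte after 0xE0 */
    bool    extended;   /* true if the key was announced by 0xE0 */
    bool    released;   /* true for a break code */
    uint8_t make_code;  /* scancode with the release bit cleared */
} demo_key_t;

/* Feed one raw byte at a time; *pending_e0 carries state between calls. */
static demo_key_t feed_byte(bool *pending_e0, uint8_t raw)
{
    demo_key_t k = { false, false, false, 0 };

    if (raw == SCANCODE_EXTENDED_PREFIX) {
        *pending_e0 = true;  /* remember the prefix, emit nothing yet */
        return k;
    }

    k.complete  = true;
    k.extended  = *pending_e0;
    k.released  = (raw & KEY_RELEASE_BIT) != 0;
    k.make_code = (uint8_t)(raw & ~KEY_RELEASE_BIT);
    *pending_e0 = false;     /* the prefix applies to one byte only */
    return k;
}
```

With the extended flag available, 0xE0 0x48 can be reported as the Up arrow while a bare 0x48 is reported as keypad 8. The sketch ignores the longer 0xE1 sequence used by the Pause key.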
