scancodes
Turning a raw keyboard byte into a character: make codes, break codes, the 0x80 release bit, and two lookup tables.
The byte you read from the keyboard's 8042 controller at port 0x60 is a scancode — a number that identifies a physical key and whether it was pressed or released. It is not ASCII. The letter q is not byte 0x71 ('q'); it is scancode 0x10. Turning that number into the character 'q' (or 'Q', depending on modifiers) is the job described here.
The keyboard sends Set 1 scancodes — the original IBM XT keyboard set. Each key has a fixed make code; the kernel's translation tables are indexed directly by that code.
💡 Tidbit: Set 1 is ancient — it dates to the 1981 IBM PC — yet it is still what bare-metal code sees today. Modern USB keyboards send entirely different codes, but the controller (or BIOS, in legacy/translation mode) converts them back to Set 1 for compatibility. Writing to Set 1 means your decoder works on hardware from 1981 to 2026.
Every key press generates a make code. Releasing the same key generates a break code, which is just the make code with the high bit (0x80) set:
make code q = 0x10 (0001 0000)
break code q = 0x90 (1001 0000) <- bit 7 set
So the decoder tests bit 0x80 to tell press from release, then masks it off to recover the original key:
#define KEY_RELEASE_BIT 0x80
...
// Handle key release
if (scancode & KEY_RELEASE_BIT) {
scancode &= ~KEY_RELEASE_BIT;
...
return 0; // Return 0 for key releases
}
Source: keyboard.h:90 and kernel.c:130-142.
Both readers (the full decoder get_key_advanced and the shell's simpler get_key) return 0 on a release so the input loop ignores it. Before returning, the release of a modifier key (shift/ctrl/alt) clears that modifier's flag in KeyboardState (kernel.c:133-140).
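The press/release test can be isolated into a small helper. This is a minimal sketch, not code from the repository: `KeyEvent` and `split_scancode` are hypothetical names introduced for illustration, and only `KEY_RELEASE_BIT` comes from keyboard.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEY_RELEASE_BIT 0x80

/* Hypothetical helper type: not part of keyboard.h. */
typedef struct {
    uint8_t make_code; /* scancode with bit 7 cleared */
    bool    released;  /* true if the raw byte was a break code */
} KeyEvent;

/* Split a raw Set 1 byte into its make code and a press/release flag. */
static KeyEvent split_scancode(uint8_t raw)
{
    KeyEvent ev;
    ev.released  = (raw & KEY_RELEASE_BIT) != 0;
    ev.make_code = (uint8_t)(raw & ~KEY_RELEASE_BIT);
    return ev;
}
```

Feeding it 0x10 and then 0x90 yields the same make code (0x10) both times, first with `released` false and then true, which is exactly the q press/release pair shown above.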
Two static tables in keyboard.h map a make code to an ASCII character — one for the unshifted key, one for the shifted key:
static const char scancode_to_ascii[] = {
0, 27, '1', '2', '3', '4', '5', '6', '7', '8', // 0-9
'9', '0', '-', '=', '\b', '\t', 'q', 'w', 'e', 'r', // 10-19
...
};
static const char scancode_to_ascii_shift[] = {
0, 27, '!', '@', '#', '$', '%', '^', '&', '*', // 0-9
'(', ')', '_', '+', '\b', '\t', 'Q', 'W', 'E', 'R', // 10-19
...
};
Source: keyboard.h:32-55.
The index is the make code. Scancode 0x10 (decimal 16) lands on 'q' / 'Q'; scancode 0x02 lands on '1' / '!'. Entries that have no printable character (modifier keys, unused codes) are 0. The full byte-by-byte tables are in the scancode tables reference.
⚠️ Caveat: The tables only cover indices 0–89, which is why both readers guard the lookup with if (scancode < 90) (kernel.c:161, shell.c:168). A scancode past the end of the table would read out of bounds, but Set 1's printable keys all fall within this range, and special keys are handled separately by name.
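One way to keep the guard and the table in step is to derive the bound from the table itself instead of hard-coding 90. The sketch below uses a deliberately truncated demo table (only the first 20 entries shown above); `demo_scancode_to_ascii`, `DEMO_TABLE_LEN`, and `demo_lookup` are hypothetical names, not identifiers from keyboard.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Truncated illustrative table: only the first 20 Set 1 entries. */
static const char demo_scancode_to_ascii[] = {
    0,   27,  '1', '2', '3',  '4',  '5', '6', '7', '8',  /* 0-9   */
    '9', '0', '-', '=', '\b', '\t', 'q', 'w', 'e', 'r',  /* 10-19 */
};

/* Number of entries, computed by the compiler from the initializer. */
#define DEMO_TABLE_LEN \
    (sizeof demo_scancode_to_ascii / sizeof demo_scancode_to_ascii[0])

/* Return the table entry for a make code, or 0 if it is out of range. */
static char demo_lookup(uint8_t make_code)
{
    if (make_code >= DEMO_TABLE_LEN) {
        return 0; /* same "no character" value the tables use */
    }
    return demo_scancode_to_ascii[make_code];
}
```

With this shape, adding or removing table entries cannot leave a stale bound behind; make code 0x10 still returns 'q', and anything past the last entry returns 0.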
The decoder picks the shifted table when shift (or, in the full kernel, caps lock) is active:
if (scancode < 90) {
if (kbd_state.shift_pressed || kbd_state.caps_lock) {
ascii_char = scancode_to_ascii_shift[scancode];
} else {
ascii_char = scancode_to_ascii[scancode];
}
// Handle caps lock for letters
if (kbd_state.caps_lock && !kbd_state.shift_pressed) {
if (ascii_char >= 'a' && ascii_char <= 'z') ascii_char -= 32;
} else if (kbd_state.caps_lock && kbd_state.shift_pressed) {
if (ascii_char >= 'A' && ascii_char <= 'Z') ascii_char += 32;
}
}
Source: kernel.c:161-178.
The extra caps_lock logic targets a subtlety: caps lock should affect letters only, but the shifted table also turns 1→!, 2→@, and so on. When caps lock and shift are both active, the second branch flips letters back to lowercase (+= 32) while digits and punctuation stay shifted, matching how a real keyboard behaves. When caps lock alone is on, however, the excerpt has already taken every character from the shifted table, so the 'a'..'z' range check in the first branch never matches and non-letters are left shifted as well: 1 comes out as !, which a real keyboard would not do.
💡 Tidbit: The -= 32 / += 32 trick exploits the ASCII layout: uppercase and lowercase letters are exactly 32 apart ('A' = 65, 'a' = 97). Subtracting 32 uppercases; adding 32 lowercases. No table needed.
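A possible alternative to the excerpt above (a sketch, not the code in kernel.c) is to select the table by shift alone and let caps lock invert the case of letters only, so digits and punctuation are never touched by caps lock. `apply_modifiers` is a hypothetical helper; it takes the two table entries for one key rather than the tables themselves so it can be read in isolation.

```c
#include <stdbool.h>

/* Hypothetical helper: `normal` and `shifted` are the two table entries
 * for a single make code. Shift picks the entry; caps lock then inverts
 * the case of letters only. */
static char apply_modifiers(char normal, char shifted,
                            bool shift, bool caps_lock)
{
    char c = shift ? shifted : normal;

    if (caps_lock) {
        if (c >= 'a' && c <= 'z') {
            c = (char)(c - 32);      /* caps alone: lowercase -> uppercase */
        } else if (c >= 'A' && c <= 'Z') {
            c = (char)(c + 32);      /* caps + shift: uppercase -> lowercase */
        }
    }
    return c;
}
```

Under these assumptions, the q key gives q, Q, Q, q for (no modifiers, shift, caps, shift + caps), while the 1 key gives ! only when shift is held, regardless of caps lock.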
The shell's simpler get_key skips caps entirely — shift only (shell.c:168-173).
Modifier keys are recognized by their make codes and update KeyboardState instead of producing a character:
| Modifier | Scancode | Macro |
|---|---|---|
| Left Shift | 0x2A | SCANCODE_LEFT_SHIFT |
| Right Shift | 0x36 | SCANCODE_RIGHT_SHIFT |
| Left Ctrl | 0x1D | SCANCODE_LEFT_CTRL |
| Left Alt | 0x38 | SCANCODE_LEFT_ALT |
| Caps Lock | 0x3A | SCANCODE_CAPS_LOCK |

Source: keyboard.h:58-62. Shift/ctrl/alt set their flag on the make code and clear it on the break code; caps lock toggles on each press (kernel.c:155-157):
} else if (scancode == SCANCODE_CAPS_LOCK) {
kbd_state.caps_lock = !kbd_state.caps_lock;
return 0;
}
Non-character keys are handled by name after the table lookup. The full kernel maps arrows and function keys to sentinel return values, plus the universal Enter/Backspace/Tab/Esc:
if (scancode == SCANCODE_UP) return 'U';
if (scancode == SCANCODE_DOWN) return 'D';
if (scancode == SCANCODE_LEFT) return 'L';
if (scancode == SCANCODE_RIGHT) return 'R';
if (scancode == SCANCODE_F1) return '1';
...
if (scancode == SCANCODE_ESC) return 27;
if (scancode == SCANCODE_ENTER) return '\n';
if (scancode == SCANCODE_BACKSPACE) return '\b';
if (scancode == SCANCODE_TAB) return '\t';
Source: kernel.c:181-190. Their codes (keyboard.h:63-87): ESC 0x01, ENTER 0x1C, BACKSPACE 0x0E, TAB 0x0F, SPACE 0x39, F1–F10 0x3B–0x44, UP 0x48, LEFT 0x4B, RIGHT 0x4D, DOWN 0x50, HOME 0x47, END 0x4F, PAGE_UP 0x49, PAGE_DOWN 0x51, INSERT 0x52, DELETE 0x53.
⚠️ Caveat: On real hardware, the extended keys (arrows, Home/End, etc.) are sent as a two-byte sequence prefixed with 0xE0. For example, Up is 0xE0 0x48, not bare 0x48. MyOS-Simple does not handle the 0xE0 prefix; it reads a single byte and matches 0x48 directly. This works under QEMU, which delivers the simpler single-byte codes, but the arrow keys may misbehave on physical hardware that sends the full extended sequence.
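Handling the prefix takes one bit of state carried between reads. The sketch below is an assumption about how such a decoder could look, not something MyOS-Simple contains: `SCANCODE_EXTENDED_PREFIX`, `PrefixState`, `DecodedKey`, and `feed_byte` are all hypothetical names. It is also worth noting that, going by the release check excerpted earlier, a 0xE0 byte has bit 7 set, so a single-byte reader would most likely treat it as a release and return 0 before matching the following 0x48 on the next read.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCANCODE_EXTENDED_PREFIX 0xE0 /* assumed name; not in keyboard.h */
#define KEY_RELEASE_BIT          0x80

/* State carried from one byte to the next. */
typedef struct {
    bool pending_e0; /* true if the previous byte was the 0xE0 prefix */
} PrefixState;

/* Result of feeding one byte to the decoder. */
typedef struct {
    bool    complete;  /* false while only the prefix has been seen */
    bool    extended;  /* true if the key was preceded by 0xE0 */
    bool    released;  /* true for a break code */
    uint8_t make_code; /* scancode with bit 7 cleared */
} DecodedKey;

/* Feed one raw byte; returns a complete key once the whole sequence is in. */
static DecodedKey feed_byte(PrefixState *st, uint8_t raw)
{
    DecodedKey out = { false, false, false, 0 };

    if (raw == SCANCODE_EXTENDED_PREFIX) {
        st->pending_e0 = true; /* remember the prefix, wait for next byte */
        return out;
    }

    out.complete   = true;
    out.extended   = st->pending_e0;
    out.released   = (raw & KEY_RELEASE_BIT) != 0;
    out.make_code  = (uint8_t)(raw & ~KEY_RELEASE_BIT);
    st->pending_e0 = false;    /* the prefix applies to one key only */
    return out;
}
```

With the `extended` flag available, a caller can tell the Up arrow (0xE0 0x48) apart from a bare 0x48. The sketch deliberately ignores the longer 0xE1 sequence used by Pause.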
- PS/2 Keyboard & the 8042 Controller - how the scancode byte arrives at port 0x60
- VGA Text Mode - where the decoded character is finally drawn
- Scancode Tables - the complete Set 1 byte tables, normal and shifted
- Stage 2: C in Protected Mode - the full modifier/arrow decoder (get_key_advanced)
- Stage 3: Interactive Shell - the simpler shift-only get_key