A serializer/deserializer for working directly with binary data in a form that stays human-readable (ASCII -> ASCII, sensitive chars -> similar-looking UTF-8, control chars -> semantically relevant glyphs) without breaking your terminal or editor. Includes C, LuaJIT and Node.js implementations. Full test coverage. Disassembly option.

pmarreck/printable-binary

PrintableBinary

A cross-platform utility (LuaJIT, C, and JavaScript implementations) for encoding arbitrary binary data into human-readable UTF-8 text, and then decoding it back to the original binary data.

Overview

PrintableBinary is designed to [de]serialize binary data to/from a visually distinct, human-readable format that is also copy-pastable and embeddable in any UTF-8-aware context. It's an alternative to hexadecimal encoding that offers better visual density and makes embedded ASCII text immediately recognizable, while also making it possible to incorporate binary data into text-based formats (such as JSON, TOML, XML, YAML, etc.) without escaping issues.

This implementation allows you to view binary data directly in a terminal (it even has a pipe inspection mode with --passthrough) without breaking the display, making it particularly useful for debugging, logging, sharing binary data in human-readable form, and even dragging files into a web UI for instant encode/decode.

Features

  • Triple Implementations: Available as LuaJIT script, compiled C binary, and JavaScript module (shared by the browser UI and Node.js tooling) for maximum flexibility
  • Web & Node.js Tooling: Drag-and-drop browser interface and a Node-based CLI wrapper share the same encode/decode core for cross-platform workflows
  • Visually Distinct Characters: Each of the 256 possible byte values maps to a unique, visually distinct UTF-8 character
  • ASCII Passthrough: Standard printable ASCII characters (32-126) largely remain themselves for immediate recognition
  • Shell-Safe Encoding: Special characters that could cause shell issues are encoded with safe Unicode alternatives
  • Single Character Width: Each encoded representation renders as a single character wide in a monospace terminal
  • Compactness: Uses 1-3 byte UTF-8 characters for optimal space efficiency
  • Usability: Encoded strings are easily copyable, pastable, and printable
  • Smart Disassembly: Format-aware disassembly using objdump that understands binary file structures (Mach-O, ELF, PE)
  • Raw Disassembly: Direct byte-to-instruction disassembly using Capstone with auto-architecture detection or manual selection
  • Formatting: Customizable output formatting with group size and line width options
  • Universal Binary Support: Detects and clearly identifies macOS universal binaries with multiple architectures
  • Intelligent Pattern Recognition: Recognizes common byte patterns (NUL, NOP, INT3) and provides context-aware analysis to distinguish between code and data
  • Binary Safety: Preserves all binary data, including NUL bytes, when encoding and decoding
  • Passthrough Mode: Simultaneously outputs original binary data to stdout and encoded text to stderr for flexible processing pipelines

Compared to Hexadecimal Encodings

  • Higher on-screen density: Hex consumes two glyphs per byte; PrintableBinary maps each byte to a single visible character, so you see roughly twice as much data per line while still preserving UTF-8 safety.
  • ASCII stands out: Printable ASCII bytes are left untouched (except for shell-hostile symbols, which use look-alike substitutes), so embedded text is immediately readable instead of needing to mentally decode hex pairs.
  • Control characters are labeled: Bytes 0–31 and DEL render as mnemonic symbols (⇥, ⇩, ⏎, etc.), making structure and control flow obvious without extra tooling.
  • Trade-off: Hex expands data by exactly 2× in bytes. PrintableBinary averages about 1.8–1.9× on real-world binaries (thanks to the many 1- and 2-byte UTF-8 mappings) and only approaches 3× in the worst case. The small extra cost buys markedly better readability and paste safety.
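
The size trade-off can be checked against the mapping table later in this README. The sketch below is not part of the project: `glyphLength` and `encodedSize` are hypothetical helpers, and the byte classes are read off the UTF-8 column of that table, so they would need updating if character_map.txt changes.

```javascript
// Bytes whose glyph takes 3 UTF-8 bytes, per the mapping table in this README.
const THREE_BYTE = new Set([
  0, 8, 9, 10, 11, 13, 27, 32, 35, 36, 37, 38, 40, 41, 42, 43,
  47, 58, 61, 91, 92, 93, 123, 124, 125, 127, 171, 172, 240,
]);

// Bytes that keep their literal 1-byte ASCII glyph, per the same table.
function isLiteralAscii(b) {
  return (b >= 48 && b <= 57) || (b >= 65 && b <= 90) || (b >= 97 && b <= 122) ||
    [44, 46, 60, 62, 64, 94, 95].includes(b);
}

// Hypothetical helper: UTF-8 length of the glyph for one byte value.
function glyphLength(b) {
  if (isLiteralAscii(b)) return 1;
  return THREE_BYTE.has(b) ? 3 : 2; // every remaining byte maps to a 2-byte glyph
}

// Hypothetical helper: encoded size in bytes for a whole input.
function encodedSize(bytes) {
  let total = 0;
  for (const b of bytes) total += glyphLength(b);
  return total;
}

const sample = new TextEncoder().encode("Hello, World!");
console.log(encodedSize(sample), sample.length * 2); // 16 26

// Average expansion if every byte value were equally likely.
let sum = 0;
for (let b = 0; b < 256; b++) sum += glyphLength(b);
console.log(sum / 256); // 1.84375
```

For uniformly distributed bytes this works out to about 1.84×, which sits inside the 1.8–1.9× range quoted above; text-heavy inputs come in lower, and an input made only of 3-byte glyphs (for example all NUL bytes) hits the 3× worst case.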

Usage

As a Command Line Tool

# Use any implementation:
# LuaJIT version: ./printable_binary
# C version:     ./bin/printable_binary_c
# Node.js CLI:   ./printable_binary_node.js
# (Examples below use the LuaJIT version; the others accept the same flags.)

# Encode binary data
echo -n "Hello, World!" | ./printable_binary
# Output: Hello,␣Worldǃ

# Note: Direct encoding of binary data as command-line arguments is not supported
# because shell environments cannot represent all binary data (such as NUL bytes).
# Always pipe input or specify a file to encode.

# Encode a file
./printable_binary somefile.bin > encoded.txt

# Encode with formatting (groups of 8 characters, 10 groups per line)
./printable_binary -f somefile.bin > formatted_encoded.txt

# Encode with custom formatting (groups of 4 characters, 16 groups per line)
./printable_binary -f=4x16 somefile.bin > custom_formatted.txt

# Encode with raw disassembly (auto-detects architecture)
./printable_binary -a executable.bin > disassembled.txt

# Encode with smart disassembly (format-aware)
./printable_binary --smart-asm executable.bin > smart_disassembled.txt

# Encode with both formatting and disassembly
./printable_binary -a -f=8x8 executable.bin > formatted_disassembly.txt

# Encode with specific architecture (useful for universal binaries)
./printable_binary -a --arch x64 universal_binary.bin > x64_disassembly.txt

# NOTE: Disassembly only processes a portion of the binary
# Decoding from disassembly will not reconstruct the full binary
# For universal binaries, it will only show one architecture
./printable_binary universal_binary.bin > full_binary.txt  # Use this for full binary preservation

# Decode data (spaces and newlines are automatically ignored during decoding)
echo -n "Hello,␣Worldǃ" | ./printable_binary -d
# Output: Hello, World!

# Decode formatted data (formatting is ignored)
cat formatted_encoded.txt | ./printable_binary -d > original.bin

# Decode disassembled data (disassembly info is ignored; as noted above,
# this recovers only the portion of the binary that was disassembled)
cat disassembled.txt | ./printable_binary -d > original_executable.bin

# Use passthrough mode to output both original binary (stdout) and encoded text (stderr)
# This is useful for binary data processing pipelines that need both representations
echo -n "Hello, World!" | ./printable_binary --passthrough 2>encoded.txt | wc -c
# Binary data goes to stdout, encoded text to stderr

# Use the C implementation for better performance on large files
./bin/printable_binary_c large_file.bin > encoded_large.txt
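
The -f=NxM layout (groups of N characters, M groups per line) and the decoder's whitespace handling can be pictured with a short sketch. `formatEncoded` and `stripFormatting` are hypothetical names, and the separators (one space between groups, a newline between lines) are an assumption; the actual tools may lay out their output differently.

```javascript
// Hypothetical helper: split encoded text into groups of `groupSize` glyphs,
// with `groupsPerLine` groups on each line.
function formatEncoded(encoded, groupSize, groupsPerLine) {
  const glyphs = Array.from(encoded); // iterate by code point, not UTF-16 unit
  const groups = [];
  for (let i = 0; i < glyphs.length; i += groupSize) {
    groups.push(glyphs.slice(i, i + groupSize).join(""));
  }
  const lines = [];
  for (let i = 0; i < groups.length; i += groupsPerLine) {
    lines.push(groups.slice(i, i + groupsPerLine).join(" "));
  }
  return lines.join("\n");
}

// Undo the layout before decoding. A literal space or newline never appears in
// encoded output (byte 32 becomes ␣, byte 10 becomes ⇩), so removing them is lossless.
function stripFormatting(text) {
  return text.replace(/[ \n]/g, "");
}

const encodedText = "Hello,␣Worldǃ";
const formatted = formatEncoded(encodedText, 4, 2);
console.log(formatted);
// Hell o,␣W
// orld ǃ
console.log(stripFormatting(formatted) === encodedText); // true
```

This is why formatted output can be piped straight back into -d: the layout characters are outside the encoded alphabet.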

Web Interface

  • Live demo: https://pmarreck.github.io/printable-binary/
  • Drag-and-drop or browse to encode any file; .pbt uploads are automatically decoded back to their original binary.
  • Large outputs (>1 MB) skip the textarea to avoid browser jank—use the Download button to grab the UTF-8 text.
  • Default wrapping is 75 characters per line to balance readability and density; copy/download buttons reuse the exact bytes produced by the CLI and Node implementations.
  • To hack locally, open docs/index.html (or index.html) in any modern browser; the page loads the shared printable_binary.js module with no build step required.

As a Lua Library

local PrintableBinary = require("printable_binary")

-- Encode binary data
local binary_data = "Hello, World!"
local encoded = PrintableBinary.encode(binary_data)
print(encoded)  -- Output: Hello,␣Worldǃ

-- Decode back to binary
local decoded = PrintableBinary.decode(encoded)
print(decoded)  -- Output: Hello, World!

As a JavaScript Module

import PrintableBinary from './printable_binary.js';

const pb = new PrintableBinary();
const input = new Uint8Array([0x00, 0xFF, 0x41]);

// Encode to printable UTF-8
const encoded = pb.encode(input, { format: '75x1' });
console.log(encoded);

// Decode back to bytes
const decoded = pb.decode(encoded);
console.log(Array.from(decoded)); // [0, 255, 65]

The same module powers the browser UI and can be run in Node.js (ESM) or bundled for other environments.

JavaScript CLI

For command-line parity with the LuaJIT/C tools, use the Node-based wrapper:

# Encode (auto-detects stdin vs. file)
./printable_binary_node.js input.bin > encoded.pbt

# Decode (whitespace is ignored automatically)
./printable_binary_node.js --decode encoded.pbt > restored.bin

# Apply formatting (e.g., 75 characters per line)
./printable_binary_node.js --format 75x1 input.bin > formatted.pbt

# Pipe data through stdin
cat input.bin | ./printable_binary_node.js -f=8x10 > encoded.txt

Supported flags: -d/--decode, -f/--format NxM, -h/--help. The CLI shares the exact encode/decode implementation with the browser UI. Disassembly options (-a, --smart-asm, etc.) are not available in the Node wrapper; use the LuaJIT or C binaries when you need Capstone/objdump features.

Character Map

All implementations share the same mapping table stored in character_map.txt (256 lines, one glyph per byte). The binaries look for this file in the following order:

  • PRINTABLE_BINARY_MAP environment variable (path to the file)
  • alongside the executable/module (printable_binary, printable_binary.js, printable_binary_c)
  • the current working directory

Edit the file to experiment with alternative glyphs and the LuaJIT, Node.js, and C CLIs will all pick up the changes automatically.
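
The lookup order above can be sketched as a small pure function. This is an illustration, not the project's code: `resolveMapPath` and its parameters are hypothetical, the `exists` predicate is injected so the sketch needs no filesystem access, and falling through when the environment variable points at a missing file is an assumption.

```javascript
// Hypothetical sketch of the documented search order for character_map.txt.
function resolveMapPath({ env, moduleDir, cwd, exists }) {
  const candidates = [
    env.PRINTABLE_BINARY_MAP,          // 1. explicit override via environment
    `${moduleDir}/character_map.txt`,  // 2. alongside the executable/module
    `${cwd}/character_map.txt`,        // 3. current working directory
  ];
  for (const candidate of candidates) {
    if (candidate && exists(candidate)) return candidate;
  }
  return null; // caller decides how to report a missing map
}

// Demo against an in-memory stand-in for the filesystem.
const files = new Set(["/opt/pb/character_map.txt", "/tmp/custom_map.txt"]);
const exists = (p) => files.has(p);

console.log(resolveMapPath({ env: {}, moduleDir: "/opt/pb", cwd: "/work", exists }));
// /opt/pb/character_map.txt
console.log(resolveMapPath({
  env: { PRINTABLE_BINARY_MAP: "/tmp/custom_map.txt" },
  moduleDir: "/opt/pb", cwd: "/work", exists,
}));
// /tmp/custom_map.txt
```

In a real Node.js caller, `exists` could be `fs.existsSync` and `env` could be `process.env`.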

Inspecting Streams (Passthrough Mode)

One powerful trick is to drop PrintableBinary into a pipeline so you can watch the encoded stream on stderr while the raw bytes continue downstream untouched:

# Monitor traffic but keep the pipeline lossless
tcpdump -i en0 -w - | \
  ./printable_binary --passthrough > capture.raw 2> capture.pbt

# Alternatively inspect a decompression stream:
gzip -c bigfile > /tmp/data.gz
gzip -dc /tmp/data.gz | \
  ./printable_binary --passthrough | md5sum
# stdout (original bytes) flows into md5sum; stderr shows the printable view.

Because --passthrough sends the original binary to stdout, you can insert PrintableBinary anywhere in a Unix pipeline for observability without modifying the data flow.
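
Since each byte is encoded independently, a stream can be encoded chunk by chunk and the printable view comes out the same as encoding the whole input at once. The sketch below illustrates that property with a five-entry sample map copied from the table in this README; `encodeChunk` and `passthrough` are hypothetical stand-ins, not the tool's internals.

```javascript
// Sample rows from the mapping table: NUL, LF, space, 'A', 0xFF.
const SAMPLE_MAP = new Map([
  [0x00, "∅"], [0x0a, "⇩"], [0x20, "␣"], [0x41, "A"], [0xff, "Ż"],
]);

// Encode one chunk; bytes outside the sample map are rejected in this sketch.
function encodeChunk(chunk) {
  let out = "";
  for (const b of chunk) {
    if (!SAMPLE_MAP.has(b)) throw new RangeError(`byte ${b} is not in the sample map`);
    out += SAMPLE_MAP.get(b);
  }
  return out;
}

// Hypothetical passthrough: forward every chunk untouched and build the
// printable view on the side.
function passthrough(chunks) {
  const forwarded = [];
  let view = "";
  for (const chunk of chunks) {
    forwarded.push(chunk);       // what would be written to stdout
    view += encodeChunk(chunk);  // what would be written to stderr
  }
  return { forwarded, view };
}

const chunks = [Uint8Array.of(0x41, 0x00), Uint8Array.of(0x20, 0xff, 0x0a)];
const { forwarded, view } = passthrough(chunks);
console.log(view); // A∅␣Ż⇩
console.log(forwarded[0] === chunks[0]); // true
```

No state is carried between chunks, so chunk boundaries chosen by the pipe do not affect the result.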

Disassembly Features

PrintableBinary offers two modes for disassembling binary files, each with different strengths:

Smart Disassembly (--smart-asm)

Uses objdump for format-aware disassembly that understands binary file structures:

# Smart disassembly - recommended for most use cases
./printable_binary --smart-asm /usr/bin/ls
./printable_binary --smart-asm -f=4x8 binary_file.exe

Advantages:

  • ✅ Format-aware (understands Mach-O, ELF, PE formats)
  • ✅ Only disassembles actual executable code sections
  • ✅ Accurate disassembly with proper architecture detection
  • ✅ Includes section headers and file format information
  • ✅ Best for analyzing complete, well-formed binaries

Requirements: objdump (usually part of binutils)

Raw Disassembly (-a, --asm)

Uses cstool (Capstone) for direct byte-to-instruction disassembly:

# Raw disassembly with auto-detection
./printable_binary -a binary_file

# Force specific architecture
./printable_binary -a --arch=arm64 data_file.bin
./printable_binary -a --arch=x64 shellcode.bin

Advantages:

  • ✅ Works on any binary data, including fragments
  • ✅ Faster performance
  • ✅ Good for shellcode, raw code fragments, or data analysis
  • ✅ Useful for seeing "what would this data look like as code"
  • ✅ Cross-architecture analysis

Requirements: cstool (part of Capstone framework)

When to Use Each Mode

| Use Case | Recommended Mode | Reason |
| --- | --- | --- |
| Analyzing executables/libraries | --smart-asm | Format-aware, shows only real code |
| Raw shellcode analysis | -a, --asm | Works on code fragments |
| Memory dumps | -a, --asm | No file format structure |
| Cross-architecture analysis | -a, --asm | Force interpretation as different arch |
| Data section analysis | -a, --asm | See what data looks like as code |
| Quick analysis | --smart-asm | More accurate results |
| Research/debugging | -a, --asm | Raw interpretation without format intelligence |

Examples

Smart disassembly of a macOS binary:

./printable_binary --smart-asm /usr/libexec/rosetta/runtime
# Output includes proper ARM64 disassembly with section information

Raw disassembly for shellcode analysis:

# Analyze potential shellcode
echo -n "4889e5" | xxd -r -p | ./printable_binary -a --arch=x64

Cross-architecture analysis:

# See what ARM code looks like when interpreted as x86
./printable_binary -a --arch=x32 /usr/bin/arm_binary

Format Compatibility

The PrintableBinary character set is specifically designed to be highly compatible with common text formats:

Excellent Compatibility With:

  • JSON - Perfect in quoted strings (we re-encode " as ˵)
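  • XML/HTML - Good in text content and attributes: & is re-encoded as ⅋ and no substitute glyph is <, >, or &, but literal < and > bytes pass through unchanged (see the note below)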
  • TOML - Perfect in quoted strings
  • YAML - Perfect in quoted strings, good in unquoted context
  • C/C++/Java/etc. - Perfect in string literals (we re-encode \ as ⧷)
  • Shell scripts - Perfect in quoted strings (we re-encode ' as ʼ)
  • SQL - Perfect in quoted strings
  • Most UTF-8 aware text formats

🎯 Key Design Decisions for Compatibility:

  • Double quotes (34) → ˵ (U+02F5) - Avoids JSON/XML attribute conflicts
  • Single quotes (39) → ʼ (U+02BC) - Avoids shell/SQL conflicts
  • Backslashes (92) → ⧷ (U+29F7) - Avoids escape sequence issues
  • Control characters → Safe Unicode symbols (∅, ⇩, ⏎, etc.)
  • No problematic delimiters in our special encodings

📝 Usage Recommendations:

# JSON
echo '{"binary_data": "'$(./printable_binary file.bin)'"}'

# XML/HTML
echo '<data>'$(./printable_binary file.bin)'</data>'

# YAML
echo 'data: "'$(./printable_binary file.bin)'"'

# Shell variable
DATA="$(./printable_binary file.bin)"

# C string literal
printf 'char data[] = "%s";\n' "$(./printable_binary file.bin)"

Note: Printable ASCII bytes that keep their literal glyphs (such as < and >) appear as-is in the output. Use quoted contexts, or escape those characters, when embedding in structured formats.
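
As a small check of the JSON case, the sketch below embeds an already-encoded string in a JSON document and confirms that no escaping is needed. The encoded value is written by hand from the mapping table in this README (space → ␣, double quote → ˵, backslash → ⧷); it is not produced by the library.

```javascript
// Hand-encoded form of the 11 bytes:  say "hi"\n
// (letters pass through; space, quotes and backslash use their substitutes).
const encoded = "say␣˵hi˵⧷n";

const json = JSON.stringify({ binary_data: encoded });
console.log(json); // {"binary_data":"say␣˵hi˵⧷n"}

// The serializer added no escape sequences, and parsing returns the same string.
console.log(json.includes("\\"));                      // false
console.log(JSON.parse(json).binary_data === encoded); // true
```

The same reasoning covers TOML and YAML double-quoted strings, since the characters those formats treat specially inside quotes are the quote and the backslash.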

Character Encoding

  • Control Characters (0-31): Mapped to visually distinct symbols like ∅, ¯, «, », µ, etc.
  • Space (32): Encoded as ␣ for visibility
  • Shell-unsafe ASCII characters: Mapped to safe Unicode alternatives:
    • Exclamation mark (33) → ǃ (U+01C3) Latin Letter Retroflex Click
    • Double quote (34) → ˵ (U+02F5) Modifier Letter Middle Double Grave Accent
    • Hash (35) → ♯ (U+266F) Music Sharp Sign
    • Dollar sign (36) → Ꞩ (U+A7A8) Latin Capital Letter S With Oblique Stroke
    • Percent (37) → ‰ (U+2030) Per Mille Sign
    • Ampersand (38) → ⅋ (U+214B) Turned Ampersand
    • Single quote (39) → ʼ (U+02BC) Modifier Letter Apostrophe
    • Parentheses (40-41) → ❨❩ (U+2768-2769) Medium Parenthesis Ornaments
    • Asterisk (42) → ⁎ (U+204E) Low Asterisk
    • Plus (43) → ⨦ (U+2A26) Plus Sign With Tilde Below
    • Minus (45) → ˗ (U+02D7) Modifier Letter Minus Sign
    • Slash (47) → ⁄ (U+2044) Fraction Slash
    • Colon (58) → ꞉ (U+A789) Modifier Letter Colon
    • Semicolon (59) → ; (U+037E) Greek Question Mark
    • Equals (61) → ꞊ (U+A78A) Modifier Letter Short Equals Sign
    • Question mark (63) → Ɂ (U+0241) Latin Capital Letter Glottal Stop
    • At sign (64) → @ (U+0040) Commercial At (unchanged; listed for completeness)
    • Backslash (92) → ⧷ (U+29F7) Reverse Solidus with Horizontal Stroke
    • Brackets (91, 93) → ⟦⟧ (U+27E6-27E7) Mathematical White Square Brackets
    • Backtick (96) → ˋ (U+02CB) Modifier Letter Grave Accent
    • Braces and vertical bar (123-125) → ❴∣❵ (U+2774, U+2223, U+2775) Ornament and mathematical variants
    • Tilde (126) → ˜ (U+02DC) Small Tilde
  • DEL (127): Encoded as ⌦
  • Extended Bytes (128-255): Pulled directly from character_map.txt and grouped alphabetically so adjacent bytes share related glyphs

Complete Character Mapping Reference

This table is generated from character_map.txt so every implementation stays in sync:

Byte Char Unicode UTF-8 Name
0 ∅ U+2205 E2 88 85 Empty Set
1 ¯ U+00AF C2 AF Macron
2 « U+00AB C2 AB Left-Pointing Double Angle Quotation Mark
3 » U+00BB C2 BB Right-Pointing Double Angle Quotation Mark
4 ϟ U+03DF CF 9F Greek Small Letter Koppa
5 ¿ U+00BF C2 BF Inverted Question Mark
6 ¡ U+00A1 C2 A1 Inverted Exclamation Mark
7 ª U+00AA C2 AA Feminine Ordinal Indicator
8 ⌫ U+232B E2 8C AB Erase To The Left
9 ⇥ U+21E5 E2 87 A5 Rightwards Arrow To Bar
10 ⇩ U+21E9 E2 87 A9 Downwards White Arrow
11 ↧ U+21A7 E2 86 A7 Downwards Arrow From Bar
12 § U+00A7 C2 A7 Section Sign
13 ⏎ U+23CE E2 8F 8E Return Symbol
14 ȯ U+022F C8 AF Latin Small Letter O With Dot Above
15 ʘ U+0298 CA 98 Latin Letter Bilabial Click
16 Ɣ U+0194 C6 94 Latin Capital Letter Gamma
17 ¹ U+00B9 C2 B9 Superscript One
18 ² U+00B2 C2 B2 Superscript Two
19 º U+00BA C2 BA Masculine Ordinal Indicator
20 ³ U+00B3 C2 B3 Superscript Three
21 µ U+00B5 C2 B5 Micro Sign
22 ɨ U+0268 C9 A8 Latin Small Letter I With Stroke
23 ¬ U+00AC C2 AC Not Sign
24 © U+00A9 C2 A9 Copyright Sign
25 ¦ U+00A6 C2 A6 Broken Bar
26 Ƶ U+01B5 C6 B5 Latin Capital Letter Z With Stroke
27 ⎋ U+238B E2 8E 8B Broken Circle With Northwest Arrow
28 Ξ U+039E CE 9E Greek Capital Letter Xi
29 ǁ U+01C1 C7 81 Latin Letter Lateral Click
30 ǀ U+01C0 C7 80 Latin Letter Dental Click
31 ¶ U+00B6 C2 B6 Pilcrow Sign
32 ␣ U+2423 E2 90 A3 Open Box
33 ǃ U+01C3 C7 83 Latin Letter Retroflex Click
34 ˵ U+02F5 CB B5 Modifier Letter Middle Double Grave Accent
35 ♯ U+266F E2 99 AF Music Sharp Sign
36 Ꞩ U+A7A8 EA 9E A8 Latin Capital Letter S With Oblique Stroke
37 ‰ U+2030 E2 80 B0 Per Mille Sign
38 ⅋ U+214B E2 85 8B Turned Ampersand
39 ʼ U+02BC CA BC Modifier Letter Apostrophe
40 ❨ U+2768 E2 9D A8 Medium Left Parenthesis Ornament
41 ❩ U+2769 E2 9D A9 Medium Right Parenthesis Ornament
42 ⁎ U+204E E2 81 8E Low Asterisk
43 ⨦ U+2A26 E2 A8 A6 Plus Sign With Tilde Below
44 , U+002C 2C Comma
45 ˗ U+02D7 CB 97 Modifier Letter Minus Sign
46 . U+002E 2E Full Stop
47 ⁄ U+2044 E2 81 84 Fraction Slash
48 0 U+0030 30 Digit Zero
49 1 U+0031 31 Digit One
50 2 U+0032 32 Digit Two
51 3 U+0033 33 Digit Three
52 4 U+0034 34 Digit Four
53 5 U+0035 35 Digit Five
54 6 U+0036 36 Digit Six
55 7 U+0037 37 Digit Seven
56 8 U+0038 38 Digit Eight
57 9 U+0039 39 Digit Nine
58 ꞉ U+A789 EA 9E 89 Modifier Letter Colon
59 ; U+037E CD BE Greek Question Mark
60 < U+003C 3C Less-Than Sign
61 ꞊ U+A78A EA 9E 8A Modifier Letter Short Equals Sign
62 > U+003E 3E Greater-Than Sign
63 Ɂ U+0241 C9 81 Latin Capital Letter Glottal Stop
64 @ U+0040 40 Commercial At
65 A U+0041 41 Latin Capital Letter A
66 B U+0042 42 Latin Capital Letter B
67 C U+0043 43 Latin Capital Letter C
68 D U+0044 44 Latin Capital Letter D
69 E U+0045 45 Latin Capital Letter E
70 F U+0046 46 Latin Capital Letter F
71 G U+0047 47 Latin Capital Letter G
72 H U+0048 48 Latin Capital Letter H
73 I U+0049 49 Latin Capital Letter I
74 J U+004A 4A Latin Capital Letter J
75 K U+004B 4B Latin Capital Letter K
76 L U+004C 4C Latin Capital Letter L
77 M U+004D 4D Latin Capital Letter M
78 N U+004E 4E Latin Capital Letter N
79 O U+004F 4F Latin Capital Letter O
80 P U+0050 50 Latin Capital Letter P
81 Q U+0051 51 Latin Capital Letter Q
82 R U+0052 52 Latin Capital Letter R
83 S U+0053 53 Latin Capital Letter S
84 T U+0054 54 Latin Capital Letter T
85 U U+0055 55 Latin Capital Letter U
86 V U+0056 56 Latin Capital Letter V
87 W U+0057 57 Latin Capital Letter W
88 X U+0058 58 Latin Capital Letter X
89 Y U+0059 59 Latin Capital Letter Y
90 Z U+005A 5A Latin Capital Letter Z
91 ⟦ U+27E6 E2 9F A6 Mathematical Left White Square Bracket
92 ⧷ U+29F7 E2 A7 B7 Reverse Solidus With Horizontal Stroke
93 ⟧ U+27E7 E2 9F A7 Mathematical Right White Square Bracket
94 ^ U+005E 5E Circumflex Accent
95 _ U+005F 5F Low Line
96 ˋ U+02CB CB 8B Modifier Letter Grave Accent
97 a U+0061 61 Latin Small Letter A
98 b U+0062 62 Latin Small Letter B
99 c U+0063 63 Latin Small Letter C
100 d U+0064 64 Latin Small Letter D
101 e U+0065 65 Latin Small Letter E
102 f U+0066 66 Latin Small Letter F
103 g U+0067 67 Latin Small Letter G
104 h U+0068 68 Latin Small Letter H
105 i U+0069 69 Latin Small Letter I
106 j U+006A 6A Latin Small Letter J
107 k U+006B 6B Latin Small Letter K
108 l U+006C 6C Latin Small Letter L
109 m U+006D 6D Latin Small Letter M
110 n U+006E 6E Latin Small Letter N
111 o U+006F 6F Latin Small Letter O
112 p U+0070 70 Latin Small Letter P
113 q U+0071 71 Latin Small Letter Q
114 r U+0072 72 Latin Small Letter R
115 s U+0073 73 Latin Small Letter S
116 t U+0074 74 Latin Small Letter T
117 u U+0075 75 Latin Small Letter U
118 v U+0076 76 Latin Small Letter V
119 w U+0077 77 Latin Small Letter W
120 x U+0078 78 Latin Small Letter X
121 y U+0079 79 Latin Small Letter Y
122 z U+007A 7A Latin Small Letter Z
123 ❴ U+2774 E2 9D B4 Medium Left Curly Bracket Ornament
124 ∣ U+2223 E2 88 A3 Divides
125 ❵ U+2775 E2 9D B5 Medium Right Curly Bracket Ornament
126 ˜ U+02DC CB 9C Small Tilde
127 ⌦ U+2326 E2 8C A6 Erase To The Right
128 ă U+0103 C4 83 Latin Small Letter A With Breve
129 Ă U+0102 C4 82 Latin Capital Letter A With Breve
130 Ǎ U+01CD C7 8D Latin Capital Letter A With Caron
131 ǟ U+01DF C7 9F Latin Small Letter A With Diaeresis And Macron
132 Ǟ U+01DE C7 9E Latin Capital Letter A With Diaeresis And Macron
133 ȧ U+0227 C8 A7 Latin Small Letter A With Dot Above
134 Ȧ U+0226 C8 A6 Latin Capital Letter A With Dot Above
135 ǡ U+01E1 C7 A1 Latin Small Letter A With Dot Above And Macron
136 ƀ U+0180 C6 80 Latin Small Letter B With Stroke
137 Ƀ U+0243 C9 83 Latin Capital Letter B With Stroke
138 Ɓ U+0181 C6 81 Latin Capital Letter B With Hook
139 ƃ U+0183 C6 83 Latin Small Letter B With Topbar
140 Ƃ U+0182 C6 82 Latin Capital Letter B With Topbar
141 ć U+0107 C4 87 Latin Small Letter C With Acute
142 Ć U+0106 C4 86 Latin Capital Letter C With Acute
143 ĉ U+0109 C4 89 Latin Small Letter C With Circumflex
144 Ĉ U+0108 C4 88 Latin Capital Letter C With Circumflex
145 č U+010D C4 8D Latin Small Letter C With Caron
146 Č U+010C C4 8C Latin Capital Letter C With Caron
147 ċ U+010B C4 8B Latin Small Letter C With Dot Above
148 Ċ U+010A C4 8A Latin Capital Letter C With Dot Above
149 ď U+010F C4 8F Latin Small Letter D With Caron
150 Ď U+010E C4 8E Latin Capital Letter D With Caron
151 Đ U+0110 C4 90 Latin Capital Letter D With Stroke
152 ȸ U+0238 C8 B8 Latin Small Letter Db Digraph
153 Ɗ U+018A C6 8A Latin Capital Letter D With Hook
154 ƌ U+018C C6 8C Latin Small Letter D With Topbar
155 Ƌ U+018B C6 8B Latin Capital Letter D With Topbar
156 ȡ U+0221 C8 A1 Latin Small Letter D With Curl
157 ĕ U+0115 C4 95 Latin Small Letter E With Breve
158 Ĕ U+0114 C4 94 Latin Capital Letter E With Breve
159 Ě U+011A C4 9A Latin Capital Letter E With Caron
160 ė U+0117 C4 97 Latin Small Letter E With Dot Above
161 ȩ U+0229 C8 A9 Latin Small Letter E With Cedilla
162 Ȩ U+0228 C8 A8 Latin Capital Letter E With Cedilla
163 ƒ U+0192 C6 92 Latin Small Letter F With Hook
164 Ƒ U+0191 C6 91 Latin Capital Letter F With Hook
165 ǵ U+01F5 C7 B5 Latin Small Letter G With Acute
166 Ǵ U+01F4 C7 B4 Latin Capital Letter G With Acute
167 ğ U+011F C4 9F Latin Small Letter G With Breve
168 Ğ U+011E C4 9E Latin Capital Letter G With Breve
169 ǧ U+01E7 C7 A7 Latin Small Letter G With Caron
170 Ǧ U+01E6 C7 A6 Latin Capital Letter G With Caron
171 ḡ U+1E21 E1 B8 A1 Latin Small Letter G With Macron
172 Ḡ U+1E20 E1 B8 A0 Latin Capital Letter G With Macron
173 ĥ U+0125 C4 A5 Latin Small Letter H With Circumflex
174 Ĥ U+0124 C4 A4 Latin Capital Letter H With Circumflex
175 ȟ U+021F C8 9F Latin Small Letter H With Caron
176 Ȟ U+021E C8 9E Latin Capital Letter H With Caron
177 ƕ U+0195 C6 95 Latin Small Letter Hv
178 Ƕ U+01F6 C7 B6 Latin Capital Letter Hwair
179 ĭ U+012D C4 AD Latin Small Letter I With Breve
180 Ĭ U+012C C4 AC Latin Capital Letter I With Breve
181 Ǐ U+01CF C7 8F Latin Capital Letter I With Caron
182 İ U+0130 C4 B0 Latin Capital Letter I With Dot Above
183 ȉ U+0209 C8 89 Latin Small Letter I With Double Grave
184 ȋ U+020B C8 8B Latin Small Letter I With Inverted Breve
185 ĵ U+0135 C4 B5 Latin Small Letter J With Circumflex
186 Ĵ U+0134 C4 B4 Latin Capital Letter J With Circumflex
187 ǰ U+01F0 C7 B0 Latin Small Letter J With Caron
188 ǩ U+01E9 C7 A9 Latin Small Letter K With Caron
189 Ǩ U+01E8 C7 A8 Latin Capital Letter K With Caron
190 ķ U+0137 C4 B7 Latin Small Letter K With Cedilla
191 Ķ U+0136 C4 B6 Latin Capital Letter K With Cedilla
192 ƙ U+0199 C6 99 Latin Small Letter K With Hook
193 Ƙ U+0198 C6 98 Latin Capital Letter K With Hook
194 ĺ U+013A C4 BA Latin Small Letter L With Acute
195 Ĺ U+0139 C4 B9 Latin Capital Letter L With Acute
196 ľ U+013E C4 BE Latin Small Letter L With Caron
197 Ľ U+013D C4 BD Latin Capital Letter L With Caron
198 ƚ U+019A C6 9A Latin Small Letter L With Bar
199 Ƚ U+023D C8 BD Latin Capital Letter L With Bar
200 Ń U+0143 C5 83 Latin Capital Letter N With Acute
201 ǹ U+01F9 C7 B9 Latin Small Letter N With Grave
202 Ň U+0147 C5 87 Latin Capital Letter N With Caron
203 ņ U+0146 C5 86 Latin Small Letter N With Cedilla
204 Ņ U+0145 C5 85 Latin Capital Letter N With Cedilla
205 ȵ U+0235 C8 B5 Latin Small Letter N With Curl
206 ŏ U+014F C5 8F Latin Small Letter O With Breve
207 Ŏ U+014E C5 8E Latin Capital Letter O With Breve
208 Ǒ U+01D1 C7 91 Latin Capital Letter O With Caron
209 ȫ U+022B C8 AB Latin Small Letter O With Diaeresis And Macron
210 Ȫ U+022A C8 AA Latin Capital Letter O With Diaeresis And Macron
211 ȱ U+0231 C8 B1 Latin Small Letter O With Dot Above And Macron
212 ƥ U+01A5 C6 A5 Latin Small Letter P With Hook
213 Ƥ U+01A4 C6 A4 Latin Capital Letter P With Hook
214 ȹ U+0239 C8 B9 Latin Small Letter Qp Digraph
215 ɋ U+024B C9 8B Latin Small Letter Q With Hook Tail
216 ŕ U+0155 C5 95 Latin Small Letter R With Acute
217 Ŕ U+0154 C5 94 Latin Capital Letter R With Acute
218 ř U+0159 C5 99 Latin Small Letter R With Caron
219 Ř U+0158 C5 98 Latin Capital Letter R With Caron
220 ŗ U+0157 C5 97 Latin Small Letter R With Cedilla
221 Ŗ U+0156 C5 96 Latin Capital Letter R With Cedilla
222 ś U+015B C5 9B Latin Small Letter S With Acute
223 Ś U+015A C5 9A Latin Capital Letter S With Acute
224 š U+0161 C5 A1 Latin Small Letter S With Caron
225 Š U+0160 C5 A0 Latin Capital Letter S With Caron
226 ş U+015F C5 9F Latin Small Letter S With Cedilla
227 Ş U+015E C5 9E Latin Capital Letter S With Cedilla
228 ť U+0165 C5 A5 Latin Small Letter T With Caron
229 Ť U+0164 C5 A4 Latin Capital Letter T With Caron
230 ţ U+0163 C5 A3 Latin Small Letter T With Cedilla
231 Ţ U+0162 C5 A2 Latin Capital Letter T With Cedilla
232 ț U+021B C8 9B Latin Small Letter T With Comma Below
233 Ț U+021A C8 9A Latin Capital Letter T With Comma Below
234 ŭ U+016D C5 AD Latin Small Letter U With Breve
235 Ŭ U+016C C5 AC Latin Capital Letter U With Breve
236 Ǔ U+01D3 C7 93 Latin Capital Letter U With Caron
237 ű U+0171 C5 B1 Latin Small Letter U With Double Acute
238 ȕ U+0215 C8 95 Latin Small Letter U With Double Grave
239 Ʉ U+0244 C9 84 Latin Capital Letter U Bar
240 Ṿ U+1E7E E1 B9 BE Latin Capital Letter V With Dot Below
241 Ʋ U+01B2 C6 B2 Latin Capital Letter V With Hook
242 ŵ U+0175 C5 B5 Latin Small Letter W With Circumflex
243 Ŵ U+0174 C5 B4 Latin Capital Letter W With Circumflex
244 ŷ U+0177 C5 B7 Latin Small Letter Y With Circumflex
245 Ŷ U+0176 C5 B6 Latin Capital Letter Y With Circumflex
246 Ÿ U+0178 C5 B8 Latin Capital Letter Y With Diaeresis
247 ȳ U+0233 C8 B3 Latin Small Letter Y With Macron
248 ƴ U+01B4 C6 B4 Latin Small Letter Y With Hook
249 Ƴ U+01B3 C6 B3 Latin Capital Letter Y With Hook
250 ź U+017A C5 BA Latin Small Letter Z With Acute
251 Ź U+0179 C5 B9 Latin Capital Letter Z With Acute
252 ž U+017E C5 BE Latin Small Letter Z With Caron
253 Ž U+017D C5 BD Latin Capital Letter Z With Caron
254 ż U+017C C5 BC Latin Small Letter Z With Dot Above
255 Ż U+017B C5 BB Latin Capital Letter Z With Dot Above

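Only letters, digits, and the characters , . < > @ ^ _ (bytes 44, 46, 48-57, 60, 62, 64, 65-90, 94, 95, and 97-122) reuse their literal ASCII glyphs; every other byte is remapped as shown in the table.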

The canonical list for bytes 128-255 lives in character_map.txt; the web UI mirrors it in docs/character_map.txt. We order the high bytes alphabetically (all A/a glyphs, then B/b, and so on) so neighbouring values are visually related. After editing the map, run python3 utils/audit_character_map.py character_map.txt and regenerate CHARACTER_WIDTHS.md to keep these docs fresh.
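
The UTF-8 column of the table follows mechanically from the Unicode column, which gives a quick way to spot-check an edited row. The helper below is a hypothetical sketch (it is not the project's audit script) built on the standard `TextEncoder` API.

```javascript
// Hypothetical helper: render the UTF-8 bytes of one code point the way the
// table's "UTF-8" column does (uppercase hex, space separated).
function utf8Hex(codePoint) {
  const bytes = new TextEncoder().encode(String.fromCodePoint(codePoint));
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(utf8Hex(0x2205)); // E2 88 85  (byte 0, Empty Set)
console.log(utf8Hex(0x00af)); // C2 AF     (byte 1, Macron)
console.log(utf8Hex(0x0041)); // 41        (byte 65, Latin Capital Letter A)
```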

Running Tests

The project includes three types of test suites:

Deterministic Unit Tests

These tests validate basic functionality and expected behavior:

./test

Non-deterministic Fuzz Tests

These tests run randomized inputs to verify robustness:

./fuzz_test

Performance Benchmark Tests

These tests measure encoding and decoding performance:

./benchmark_test

Running All Tests

To run all test suites at once:

./test_all

Utilities

The project includes several utility scripts in the utils/ directory:

  • xxhash32: Standard XXH32 hash utility (supports binary/hex/encoded output)
  • prng: Deterministic pseudo-random number generator using XXH32 (supports seeded and auto-seeded generation)

Requirements

LuaJIT Implementation

  • LuaJIT (tested with LuaJIT 2.0.5)

C Implementation

  • C99-compatible compiler (GCC, Clang)
  • Standard C library

Optional Dependencies (for disassembly features)

  • cstool (Capstone disassembly engine) for raw disassembly (-a/--asm)
  • objdump for smart disassembly (--smart-asm)

Build

# Build C implementation
make

# Both implementations are included:
# ./printable_binary (LuaJIT script)
# ./bin/printable_binary_c (compiled C binary)

Nix Development Environment

If you're using Nix, the included flake.nix provides a full development shell:

nix develop        # drops you into a shell with gcc/clang, LuaJIT, Deno, etc.
nix build          # builds the optimized C binary via the default package output

The shell hook lists the major tools (compilers, debuggers, benchmarking utilities) that are available. This is the easiest way to ensure all optional dependencies—such as LuaJIT for the script version and Deno/Node tooling for the JS implementation—are present.

Implementation Details

Algorithm Overview

For encoding:

  1. Each byte of the input binary data is processed individually
  2. The byte value (0-255) is used as a key to look up the corresponding UTF-8 representation
  3. The encoded representations are concatenated to form the output string
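
The three encoding steps can be sketched as follows. This is a minimal illustration with a six-entry map copied from the table in this README, not the library's implementation; `ENCODE_SAMPLE` and `encodeSample` are hypothetical names, and rejecting unmapped bytes only applies because the sample map is partial.

```javascript
// Partial encode map: a few rows copied from the mapping table.
const ENCODE_SAMPLE = { 0: "∅", 32: "␣", 33: "ǃ", 72: "H", 105: "i", 255: "Ż" };

function encodeSample(bytes) {
  const parts = [];
  for (const b of bytes) {            // step 1: process each byte individually
    const glyph = ENCODE_SAMPLE[b];   // step 2: look up its UTF-8 representation
    if (glyph === undefined) {
      throw new RangeError(`byte ${b} is not in the sample map`);
    }
    parts.push(glyph);
  }
  return parts.join("");              // step 3: concatenate
}

console.log(encodeSample(Uint8Array.of(72, 105, 32, 0, 255, 33))); // Hi␣∅Żǃ
```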

For decoding:

  1. The input string is processed from left to right
  2. At each position, the decoder attempts to match the longest possible UTF-8 sequence (3, 2, or 1 bytes)
  3. When a match is found, the corresponding byte value is output
  4. This continues until the entire input is processed
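
The longest-match decoding loop can be sketched at the UTF-8 byte level. The map below is keyed by each glyph's UTF-8 bytes in hex and holds six rows copied from the table in this README; `DECODE_SAMPLE` and `decodeSample` are hypothetical names, and throwing on an unmatched sequence is an assumption about error handling, not documented behavior.

```javascript
// Partial decode map: UTF-8 bytes of the glyph (hex) -> original byte value.
const DECODE_SAMPLE = new Map([
  ["e28885", 0x00], // ∅
  ["e290a3", 0x20], // ␣
  ["c783", 0x21],   // ǃ
  ["48", 0x48],     // H
  ["69", 0x69],     // i
  ["c5bb", 0xff],   // Ż
]);

function decodeSample(text) {
  const utf8 = new TextEncoder().encode(text);
  const out = [];
  let pos = 0;
  while (pos < utf8.length) {                 // left to right
    let matched = false;
    for (const len of [3, 2, 1]) {            // longest candidate first
      if (pos + len > utf8.length) continue;
      const key = Array.from(utf8.subarray(pos, pos + len),
        (b) => b.toString(16).padStart(2, "0")).join("");
      if (DECODE_SAMPLE.has(key)) {
        out.push(DECODE_SAMPLE.get(key));     // emit the original byte
        pos += len;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`no mapping at UTF-8 offset ${pos}`);
  }
  return Uint8Array.from(out);
}

console.log(Array.from(decodeSample("Hi␣∅Żǃ"))); // [72, 105, 32, 0, 255, 33]
```

Because UTF-8 is self-synchronizing and every glyph in the map is a complete character, a shorter match can never be a prefix of a longer valid one, so trying 3, 2, then 1 bytes is unambiguous.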

UTF-8 Encoding Strategy

This implementation uses a carefully chosen set of UTF-8 characters to represent each possible byte value:

  • Control characters (0-31) use visually distinct symbols, primarily from Unicode blocks like Mathematical Symbols, Arrows, and Latin Extended
  • Letters, digits, and a handful of punctuation marks (comma, period, <, >, @, ^, _) remain themselves
  • All other printable ASCII characters (space, quotes, backslash, and the remaining punctuation) get safe look-alike representations
  • Extended bytes (128-255) are driven by character_map.txt and ordered alphabetically to keep neighbouring glyphs visually related

Encoding/Decoding Maps

The implementation builds two lookup tables at initialization:

  • encode_map: Maps byte values (0-255) to their UTF-8 string representations
  • decode_map: Maps UTF-8 string representations back to byte values

These bidirectional maps ensure efficient and accurate conversion in both directions.
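
One way to build both tables from a single glyph list, while checking that the mapping really is one-to-one, is sketched below. `buildMaps` is a hypothetical helper, and the four-entry list in the demo stands in for the 256 lines of character_map.txt.

```javascript
// Hypothetical builder: `glyphs[i]` is the glyph for byte value i.
function buildMaps(glyphs, expectedSize = 256) {
  if (glyphs.length !== expectedSize) {
    throw new Error(`expected ${expectedSize} glyphs, got ${glyphs.length}`);
  }
  const encodeMap = glyphs.slice();   // byte value -> glyph
  const decodeMap = new Map();        // glyph -> byte value
  glyphs.forEach((glyph, byte) => {
    if (decodeMap.has(glyph)) {
      // A repeated glyph would make decoding ambiguous, so refuse to build.
      throw new Error(`glyph ${glyph} is used for bytes ${decodeMap.get(glyph)} and ${byte}`);
    }
    decodeMap.set(glyph, byte);
  });
  return { encodeMap, decodeMap };
}

// Demo with the first four rows of the table (bytes 0-3).
const { encodeMap, decodeMap } = buildMaps(["∅", "¯", "«", "»"], 4);
console.log(encodeMap[2]);       // «
console.log(decodeMap.get("»")); // 3
```

Deriving the decode table from the encode table, rather than maintaining two lists, keeps the two directions from drifting apart when the map file is edited.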

License

This project is licensed under the MIT License - see the LICENSE file for details.
