bytesight

What you see is not what's there. A hidden carriage return overwrites the screen, but the bytes still execute.

Purpose:

bytesight scans files at the byte level for invisible, misleading, or deceptive Unicode content. If it reports a file as clean, every byte in that file is a tab, a newline, or printable ASCII (0x20-0x7E); with --windows, \r\n line endings are accepted as well.

Built against Unicode 17.0.0 (https://www.unicode.org/versions/Unicode17.0.0/)

Usage

Scan files:

bytesight src/lib.rs src/main.rs

Scan a directory

bytesight -r src/

Scan Clipboard

Linux

wl-paste -n | bytesight -

macOS

Coming soon

Windows

Coming soon

bytesight development info

See dev-info.md

grep verification (Linux):

grep -Pna '[^\x09\x0a\x20-\x7e]'
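The allow-list in the grep pattern above can also be written in a few lines of Rust. This is a minimal sketch of the same byte check, not bytesight's actual source; the function names `is_allowed` and `first_disallowed` are hypothetical.

```rust
/// Mirrors the grep pattern: tab (0x09), newline (0x0A),
/// or printable ASCII (0x20-0x7E).
fn is_allowed(byte: u8) -> bool {
    matches!(byte, 0x09 | 0x0A | 0x20..=0x7E)
}

/// Returns the offset and value of the first byte outside the allow-list.
fn first_disallowed(data: &[u8]) -> Option<(usize, u8)> {
    data.iter()
        .copied()
        .enumerate()
        .find(|&(_, b)| !is_allowed(b))
}

fn main() {
    let clean = b"let x = 1;\n";
    let dirty = b"let x\r = 1;\n"; // carriage return hidden mid-line
    println!("{:?}", first_disallowed(clean)); // None
    println!("{:?}", first_disallowed(dirty)); // Some((5, 13))
}
```

Working on raw bytes rather than decoded characters means invalid UTF-8 is rejected by the same test, since every byte of a multi-byte or malformed sequence is 0x80 or above.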

Build

Download

GitHub

git clone https://github.com/mrname5/bytesight.git
cd bytesight
cargo build --release

crates.io (coming soon)

Linux (Coming soon)

Sha256sum: Install:

Windows (Coming soon)

Sha256sum: Install:

Options

--windows        Allow end-of-line \r\n (Windows line endings)
--tab-width N    Set tab expansion width, 1-16 (default: 8)
--wide-line N    Set wide line threshold in columns, 1-10000 (default: 500)
-q, --quiet      Suppress output; exit code only (0=clean, 1=issues, 2=error)
-r, --recursive  Recursively scan directories
-V, --version    Print version and exit
-h, --help       Show help
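To illustrate how --tab-width and --wide-line interact, here is a rough sketch of a column count with tab expansion. It assumes tabs advance to the next multiple of the tab width and that a line is "wide" when its width exceeds the threshold; whether bytesight counts columns exactly this way is an assumption, and `display_width` and `is_wide` are hypothetical names.

```rust
/// Display width of a line when each tab advances to the next multiple
/// of `tab_width`. Every other character is assumed to occupy one column,
/// which holds for printable ASCII. `tab_width` must be at least 1.
fn display_width(line: &str, tab_width: usize) -> usize {
    assert!(tab_width >= 1, "tab width must be at least 1");
    let mut col = 0;
    for ch in line.chars() {
        if ch == '\t' {
            col += tab_width - (col % tab_width);
        } else {
            col += 1;
        }
    }
    col
}

/// Assumption: a line is wide when its width is strictly above the threshold.
fn is_wide(line: &str, tab_width: usize, threshold: usize) -> bool {
    display_width(line, tab_width) > threshold
}

fn main() {
    println!("{}", display_width("ab\tc", 8)); // 9
    println!("{}", display_width("ab\tc", 4)); // 5
    let padded = format!("ok();{}evil();", " ".repeat(600));
    println!("{}", is_wide(&padded, 8, 500)); // true
}
```

The last case is the scenario the wide-line check targets: a payload pushed far to the right by padding, out of view in an editor that does not wrap lines.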

Exit codes

0    All files clean
1    Issues found
2    Error (cannot read file, bad arguments, etc.)
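The exit codes make bytesight easy to call from another program. The following is a sketch under stated assumptions: it assumes a `bytesight` binary is on PATH and that `src/main.rs` exists; `describe_exit` is a hypothetical helper, not part of bytesight.

```rust
use std::process::Command;

/// Maps the documented exit codes to a short description.
/// `None` means the process ended without an exit code (for example, a signal).
fn describe_exit(code: Option<i32>) -> &'static str {
    match code {
        Some(0) => "clean",
        Some(1) => "issues found",
        Some(2) => "error",
        Some(_) => "unexpected exit code",
        None => "terminated without an exit code",
    }
}

fn main() {
    // -q suppresses output so only the exit code carries the result.
    match Command::new("bytesight").args(["-q", "src/main.rs"]).status() {
        Ok(status) => println!("{}", describe_exit(status.code())),
        Err(e) => eprintln!("could not run bytesight: {}", e),
    }
}
```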

What it catches

bytesight flags anything outside the printable ASCII set plus tab and newline. Specific categories get specific warnings:

Invisible characters (zero-width spaces, bidi overrides, soft hyphens)
Trojan Source attack vectors (CVE-2021-42574, CVE-2021-42694)
Fake spaces (NBSP, em space, ideographic space, 13 others)
Homoglyphs (Cyrillic, Greek, fullwidth ASCII, math symbols)
Dangerous control characters (NUL, ESC, backspace)
Mid-line carriage returns (hide the preceding text in a terminal)
C1 terminal control codes (U+0080-U+009F)
Combining marks and variation selectors
Invalid UTF-8 byte sequences
Unicode noncharacters
Overly wide lines (content pushed past the edge of the editor, hidden when text wrapping is off)
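As an illustration of how a few of these categories map onto code point ranges, here is a small sketch. It covers only a handful of well-known code points per category and is not bytesight's actual table; the `Category` enum and `classify` function are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Category {
    Allowed,      // tab, newline, printable ASCII
    Control,      // other C0 controls and DEL (NUL, ESC, backspace, CR, ...)
    C1Control,    // U+0080-U+009F
    Invisible,    // soft hyphen, zero-width space, bidi overrides and isolates
    FakeSpace,    // NBSP, em space, ideographic space (sample only)
    Noncharacter, // U+FDD0-U+FDEF and the last two code points of each plane
    Other,        // any other non-ASCII character
}

fn classify(ch: char) -> Category {
    let cp = ch as u32;
    match cp {
        0x09 | 0x0A | 0x20..=0x7E => Category::Allowed,
        0x00..=0x1F | 0x7F => Category::Control,
        0x80..=0x9F => Category::C1Control,
        0x00AD | 0x200B | 0x202A..=0x202E | 0x2066..=0x2069 => Category::Invisible,
        0x00A0 | 0x2003 | 0x3000 => Category::FakeSpace,
        0xFDD0..=0xFDEF => Category::Noncharacter,
        _ if cp & 0xFFFE == 0xFFFE => Category::Noncharacter,
        _ => Category::Other,
    }
}

fn main() {
    for ch in ['a', '\u{1B}', '\u{0085}', '\u{202E}', '\u{00A0}', '\u{FFFF}'] {
        println!("U+{:04X}: {:?}", ch as u32, classify(ch));
    }
}
```

The arms are ordered so that the allowed set is matched first; the broader control range below it then only catches what is left over.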

bytesight demos

Files in the demos directory demonstrate real attack vectors that bytesight detects. All files are demonstrations only -- no actual malicious payloads.

Files

cr-attack.js -- Mid-line carriage return

The file contains a hidden command. When the file is displayed in a terminal with cat, the carriage return (0x0D) moves the cursor back to the start of the line, so the text after the CR is drawn over the text before it. As long as the text after the CR is at least as long as the text before it, nothing of the hidden command remains on screen. The hidden command still executes.
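The overwrite effect can be simulated without a terminal. The sketch below models a single screen line in which a carriage return resets the cursor to column 0; the payload name `stealSecrets` is made up for illustration and is not taken from the demo file.

```rust
/// Simulates how a terminal draws one line: '\r' moves the cursor to
/// column 0, and later characters overwrite what was already drawn.
fn render_line(raw: &str) -> String {
    let mut screen: Vec<char> = Vec::new();
    let mut cursor = 0;
    for ch in raw.chars() {
        if ch == '\r' {
            cursor = 0;
        } else {
            if cursor < screen.len() {
                screen[cursor] = ch;
            } else {
                screen.push(ch);
            }
            cursor += 1;
        }
    }
    screen.into_iter().collect()
}

fn main() {
    // The bytes contain both statements; the screen shows only the second.
    let raw = "stealSecrets();\rconsole.log('ok');";
    println!("{}", render_line(raw)); // console.log('ok');

    // If the cover text is too short, part of the hidden text stays visible.
    println!("{}", render_line("abcdef\rXY")); // XYcdef
}
```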

homoglyph.rs -- Cyrillic lookalike variable

The file contains two variables that look identical: admin (Latin) and a second one starting with Cyrillic 'a' (U+0430). To a reviewer they are the same word. To the compiler they are different variables. The function returns the wrong one.
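A short sketch shows why the two names differ even though they look alike. The helper `non_ascii_chars` is hypothetical and only reports where non-ASCII characters sit in a string; it is not how bytesight itself reports homoglyphs.

```rust
/// Lists the byte offset and value of every non-ASCII character in `s`.
fn non_ascii_chars(s: &str) -> Vec<(usize, char)> {
    s.char_indices().filter(|(_, c)| !c.is_ascii()).collect()
}

fn main() {
    let latin = "admin";
    let mixed = "\u{0430}dmin"; // first letter is Cyrillic U+0430

    println!("same string: {}", latin == mixed); // same string: false
    println!("byte lengths: {} vs {}", latin.len(), mixed.len()); // 5 vs 6
    for (offset, ch) in non_ascii_chars(mixed) {
        println!("offset {}: U+{:04X}", offset, ch as u32); // offset 0: U+0430
    }
}
```

The Cyrillic letter takes two bytes in UTF-8, so the byte lengths differ even though both names render as five glyphs.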

invisible.js -- Zero-width space in function name

The file contains two functions with visually identical names. One is validateInput and the other contains a zero-width space (U+200B) making it validate[invisible]Input. A reviewer sees one function. The code calls the malicious copy.
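The same comparison works for the zero-width space. This is a minimal sketch using the two names from the description above; `strip_zwsp` is a hypothetical helper.

```rust
const ZWSP: char = '\u{200B}'; // zero-width space

/// Returns `s` with every zero-width space removed.
fn strip_zwsp(s: &str) -> String {
    s.chars().filter(|&c| c != ZWSP).collect()
}

fn main() {
    let honest = "validateInput";
    let sneaky = "validate\u{200B}Input";

    println!("same string: {}", honest == sneaky); // same string: false
    println!("byte lengths: {} vs {}", honest.len(), sneaky.len()); // 13 vs 16
    println!("ZWSP at byte offset: {:?}", sneaky.find(ZWSP)); // Some(8)
    println!("equal after stripping: {}", strip_zwsp(sneaky) == honest); // true
}
```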

clean.rs -- Normal file

A normal Rust file with no hidden content. bytesight reports it clean.

Useful tools

Parsing Unicode in Node.js

Prints the character for a given code point (decimal 97 here; a hex literal such as 0x61 also works)

String.fromCodePoint(97)

Prints the Unicode code point of a character

"a".codePointAt(0)

AI Use

Claude was used to generate most of the code, with extensive input and guidance from the user.

ChatGPT and Google Gemini were used to verify it.

The whole codebase has had a general human review. The code logic, argument handling, and general behaviour have been verified by human review. The specifics of the ASCII and Unicode ranges have only been generally reviewed and are not yet fully verified.

License

See LICENSE file.

Sources

ASCII:

ANSI X3.4-1986 (the original American standard)
ECMA-6 (international equivalent): https://www.ecma-international.org/publications-and-standards/standards/ecma-6/

Unicode:

The Unicode Standard, full text: https://www.unicode.org/releases/
Unicode Character Database (every codepoint's properties): https://www.unicode.org/Public/UCD/latest/ucd/
UnicodeData.txt (the machine-readable master list): https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
Code charts (visual PDF per block): https://www.unicode.org/charts/
PropList.txt (properties like Noncharacter): https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt

UTF-8 encoding:

RFC 3629: https://www.rfc-editor.org/rfc/rfc3629

C1 control characters:

ECMA-48 (defines all 32 C1 codes): https://ecma-international.org/publications-and-standards/standards/ecma-48/

For verifying combining marks specifically:

UnicodeData.txt, column 3 (General_Category): values Mn (nonspacing mark), Mc (spacing mark), Me (enclosing mark) are the combining marks

https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
