
mrname5/bytesight

bytesight

bytesight demo

What you see is not what's there. A hidden carriage return overwrites the screen but the bytes still execute.

Purpose:

bytesight scans files at the byte level for invisible, misleading, or deceptive Unicode content. If it reports a file as clean, every byte in that file is a tab, a newline, or printable ASCII.
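That definition of "clean" is a byte allowlist. Below is a minimal Rust sketch of the check, for illustration only. It is not taken from the bytesight source, and `is_clean_byte` and `find_unclean_bytes` are hypothetical names. It uses the same byte set as the grep verification command in this README and counts columns in bytes.

```rust
/// True for the bytes this README calls clean: tab (0x09), newline (0x0A),
/// or printable ASCII (0x20..=0x7E). Hypothetical helper, not bytesight's code.
fn is_clean_byte(b: u8) -> bool {
    b == b'\t' || b == b'\n' || (0x20..=0x7E).contains(&b)
}

/// Lists every byte outside the allowlist as (line, column, byte).
/// Lines and columns are 1-based, and columns are counted in bytes.
fn find_unclean_bytes(data: &[u8]) -> Vec<(usize, usize, u8)> {
    let mut hits = Vec::new();
    let (mut line, mut col) = (1, 1);
    for &b in data {
        if !is_clean_byte(b) {
            hits.push((line, col, b));
        }
        if b == b'\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    hits
}

fn main() {
    // A carriage return hidden in the middle of a line.
    let sample = b"echo hi\rls\n";
    for (line, col, byte) in find_unclean_bytes(sample) {
        println!("line {line}, col {col}: byte 0x{byte:02X}"); // line 1, col 8: byte 0x0D
    }
}
```

A non-ASCII character is reported once per byte. For example, a zero-width space (UTF-8 `E2 80 8B`) yields three hits.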

Built against Unicode 17.0.0 (https://www.unicode.org/versions/Unicode17.0.0/)

Usage

Scan files:

bytesight src/lib.rs src/main.rs

Scan a directory recursively:

bytesight -r src/

Scan the clipboard (pass - to read from standard input):

Linux (Wayland, using wl-paste)

wl-paste -n | bytesight -

macOS

Coming soon

Windows

Coming soon

bytesight development info

See dev-info.md

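grep verification (Linux, GNU grep). This prints every line, with its line number, that contains anything other than tab, newline, or printable ASCII:

grep -Pna '[^\x09\x0a\x20-\x7e]' FILE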

Build

Download

GitHub

git clone https://github.com/mrname5/bytesight.git
cd bytesight
cargo build --release

crates.io (coming soon)

Linux (Coming soon)

Sha256sum: Install:

Windows (Coming soon)

Sha256sum: Install:

Options

--windows        Allow end-of-line \r\n (Windows line endings)
--tab-width N    Set tab expansion width, 1-16 (default: 8)
--wide-line N    Set wide line threshold in columns, 1-10000 (default: 500)
-q, --quiet      Suppress output; exit code only (0=clean, 1=issues, 2=error)
-r, --recursive  Recursively scan directories
-V, --version    Print version and exit
-h, --help       Show help
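One plausible reading of --tab-width and --wide-line is that tab expansion determines how wide a line is in columns, and that width is then compared against the threshold. The Rust sketch below makes three assumptions: each tab advances to the next multiple of the tab width, every other byte takes one column, and a line counts as wide only when it exceeds the threshold. bytesight's actual rules may differ, and `display_width` and `is_wide_line` are hypothetical names.

```rust
/// Width of one line in columns. Assumes each tab advances to the next
/// multiple of `tab_width` (1-16, per the documented range, so never 0)
/// and every other byte takes one column, which only holds for printable ASCII.
fn display_width(line: &[u8], tab_width: usize) -> usize {
    let mut col = 0;
    for &b in line {
        if b == b'\t' {
            col += tab_width - col % tab_width;
        } else {
            col += 1;
        }
    }
    col
}

/// Assumed rule: a line is "wide" when its width exceeds the threshold.
fn is_wide_line(line: &[u8], tab_width: usize, threshold: usize) -> bool {
    display_width(line, tab_width) > threshold
}

fn main() {
    let line = b"\tlet x = 1;";
    println!("{}", display_width(line, 8)); // 18
    println!("{}", display_width(line, 4)); // 14
    // With the documented defaults (tab width 8, threshold 500):
    println!("{}", is_wide_line(line, 8, 500)); // false
}
```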

Exit codes

0    All files clean
1    Issues found
2    Error (cannot read file, bad arguments, etc.)
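The README does not say which exit code wins when one file has issues and another cannot be read. The Rust sketch below assumes errors take precedence, checking for 2 first, then 1, then 0. `ScanResult` and `exit_status` are hypothetical names, not bytesight's types. `std::process::ExitCode::from(u8)` is the standard way to return such a code from `main`.

```rust
use std::process::ExitCode;

/// Outcome of scanning one file (hypothetical type for illustration).
enum ScanResult {
    Clean,
    Issues,
    Error,
}

/// Folds per-file results into the documented codes.
/// Assumption: any error (2) outranks any issue (1); otherwise 0.
fn exit_status(results: &[ScanResult]) -> u8 {
    if results.iter().any(|r| matches!(r, ScanResult::Error)) {
        2
    } else if results.iter().any(|r| matches!(r, ScanResult::Issues)) {
        1
    } else {
        0
    }
}

fn main() -> ExitCode {
    // One clean file and one file with issues: the process exits with 1.
    let results = [ScanResult::Clean, ScanResult::Issues];
    ExitCode::from(exit_status(&results))
}
```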

What it catches

bytesight flags anything outside the printable ASCII set plus tab and newline. Specific categories get specific warnings:

  • Invisible characters (zero-width spaces, bidi overrides, soft hyphens)
  • Trojan Source attack vectors (CVE-2021-42574, CVE-2021-42694)
  • Fake spaces (NBSP, em space, ideographic space, and 13 others)
  • Homoglyphs (Cyrillic, Greek, fullwidth ASCII, math symbols)
  • Dangerous control characters (NUL, ESC, backspace)
  • Mid-line carriage returns (which hide the preceding text in a terminal)
  • C1 terminal control codes (U+0080-U+009F)
  • Combining marks and variation selectors
  • Invalid UTF-8 byte sequences
  • Unicode noncharacters
  • Lines long enough to hide content past the edge of the editor when line wrapping is off (see --wide-line)
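The following is a rough Rust sketch of how some of these categories could be detected. It decodes the input with `std::str::from_utf8`, records invalid sequences using `Utf8Error::valid_up_to` and `Utf8Error::error_len`, and classifies each decoded character. The code point ranges cover only a small illustrative subset: C1 controls, a few zero-width and bidi characters, and noncharacters. They are not bytesight's actual tables, and `Finding`, `classify`, and `scan` are hypothetical names. Anything else outside tab, newline, and printable ASCII falls into a catch-all.

```rust
/// Kinds of findings in this sketch (illustrative subset only).
#[derive(Debug, PartialEq)]
enum Finding {
    InvalidUtf8 { offset: usize },
    C1Control(char),
    ZeroWidth(char),
    BidiControl(char),
    Noncharacter(char),
    Other(char),
}

/// Classifies one decoded character; None means tab, newline or printable ASCII.
fn classify(c: char) -> Option<Finding> {
    let cp = c as u32;
    match cp {
        0x09 | 0x0A | 0x20..=0x7E => None,
        0x80..=0x9F => Some(Finding::C1Control(c)),
        0x200B..=0x200D | 0x2060 | 0xFEFF => Some(Finding::ZeroWidth(c)),
        0x061C | 0x200E | 0x200F | 0x202A..=0x202E | 0x2066..=0x2069 => {
            Some(Finding::BidiControl(c))
        }
        0xFDD0..=0xFDEF => Some(Finding::Noncharacter(c)),
        // U+FFFE and U+FFFF in every plane (U+1FFFE, U+1FFFF, ...).
        _ if (cp & 0xFFFE) == 0xFFFE => Some(Finding::Noncharacter(c)),
        _ => Some(Finding::Other(c)),
    }
}

/// Decodes `bytes` as UTF-8, reporting invalid sequences by byte offset
/// and classifying every valid character.
fn scan(bytes: &[u8]) -> Vec<Finding> {
    let mut findings = Vec::new();
    let mut rest = bytes;
    let mut base = 0; // byte offset of `rest` within `bytes`
    loop {
        match std::str::from_utf8(rest) {
            Ok(text) => {
                findings.extend(text.chars().filter_map(classify));
                return findings;
            }
            Err(e) => {
                let good = e.valid_up_to();
                // Everything before valid_up_to() is guaranteed to be valid UTF-8.
                let text = std::str::from_utf8(&rest[..good]).unwrap();
                findings.extend(text.chars().filter_map(classify));
                findings.push(Finding::InvalidUtf8 { offset: base + good });
                // error_len() is None when the input ends mid-sequence: skip the tail.
                let skip = e.error_len().unwrap_or(rest.len() - good);
                base += good + skip;
                rest = &rest[good + skip..];
            }
        }
    }
}

fn main() {
    // "ok", a zero-width space, "x", an invalid 0xFF byte, then "y".
    let input = b"ok\xE2\x80\x8Bx\xFFy";
    for finding in scan(input) {
        println!("{finding:?}");
    }
}
```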

bytesight demos

Files in the demo directory demonstrate real attack vectors that bytesight detects. All files are demonstrations only -- no actual malicious payloads.

Files

cr-attack.js -- Mid-line carriage return

The file contains a hidden command. When displayed in a terminal with cat, the carriage return (0x0D) moves the cursor back to the start of the line. Everything before the CR is overwritten on screen by everything after it. The hidden command still executes.
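The check this demo motivates fits in a few lines of Rust. `suspicious_crs` is a hypothetical helper, not bytesight's code. Its `allow_crlf` flag is modelled on the --windows option, which permits \r\n at the end of a line. Every other CR is reported by byte offset.

```rust
/// Byte offsets of carriage returns that could overwrite text on a terminal.
/// With `allow_crlf` (modelled on --windows), a CR immediately followed by LF
/// is treated as a normal Windows line ending; every other CR is reported.
fn suspicious_crs(data: &[u8], allow_crlf: bool) -> Vec<usize> {
    data.iter()
        .enumerate()
        .filter(|&(_, &b)| b == b'\r')
        .filter(|&(i, _)| !(allow_crlf && data.get(i + 1) == Some(&b'\n')))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // A terminal shows only "shown();", which is longer than "run();" and covers it.
    let attack = b"run();\rshown();\n";
    let windows = b"line one\r\nline two\r\n";
    println!("{:?}", suspicious_crs(attack, true));   // [6]
    println!("{:?}", suspicious_crs(windows, true));  // []
    println!("{:?}", suspicious_crs(windows, false)); // [8, 18]
}
```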

homoglyph.rs -- Cyrillic lookalike variable

The file contains two variables that look identical: admin (Latin) and a second one starting with Cyrillic 'a' (U+0430). To a reviewer they are the same word. To the compiler they are different variables. The function returns the wrong one.
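The following Rust sketch shows why the two names differ: the Cyrillic 'а' is a different code point with a different UTF-8 encoding. `cyrillic_chars` is a hypothetical helper that checks only the basic Cyrillic block (U+0400 to U+04FF). It is an illustration, not bytesight's homoglyph detection, which per this README also covers Greek, fullwidth ASCII, and math symbols.

```rust
/// Returns each character of `ident` in the basic Cyrillic block
/// (U+0400..=U+04FF) with its byte offset. One script only; real
/// confusable detection is much broader.
fn cyrillic_chars(ident: &str) -> Vec<(usize, char)> {
    ident
        .char_indices()
        .filter(|&(_, c)| ('\u{0400}'..='\u{04FF}').contains(&c))
        .collect()
}

fn main() {
    let latin = "admin";
    let lookalike = "\u{0430}dmin"; // Cyrillic small a (U+0430), then Latin "dmin"
    // They look identical on screen but are different strings.
    println!("{}", latin == lookalike);        // false
    println!("{:02X?}", latin.as_bytes());     // [61, 64, 6D, 69, 6E]
    println!("{:02X?}", lookalike.as_bytes()); // [D0, B0, 64, 6D, 69, 6E]
    println!("{:?}", cyrillic_chars(lookalike).len()); // 1
}
```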

invisible.js -- Zero-width space in function name

The file contains two functions with visually identical names. One is validateInput and the other contains a zero-width space (U+200B) making it validate[invisible]Input. A reviewer sees one function. The code calls the malicious copy.
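Here is the same effect in Rust. The zero-width space adds three bytes (its UTF-8 encoding is `E2 80 8B`) but nothing visible. `zwsp_offsets` is a hypothetical helper, not part of bytesight.

```rust
/// Byte offsets of every zero-width space (U+200B) in `s`.
fn zwsp_offsets(s: &str) -> Vec<usize> {
    s.match_indices('\u{200B}').map(|(i, _)| i).collect()
}

fn main() {
    let plain = "validateInput";
    let hidden = "validate\u{200B}Input";
    println!("{} vs {} bytes", plain.len(), hidden.len()); // 13 vs 16 bytes
    println!("{:?}", zwsp_offsets(hidden));                // [8]
    // Make the invisible character visible for review.
    println!("{}", hidden.replace('\u{200B}', "[U+200B]")); // validate[U+200B]Input
}
```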

clean.rs -- Normal file

A normal Rust file with no hidden content. bytesight reports it clean.

Useful tools

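Inspecting Unicode in Node.js

Get the character for a code point (the argument is a number, so a hex literal such as 0x61 works too):

String.fromCodePoint(97)   // returns "a"

Get the code point of a character, as a decimal number:

"a".codePointAt(0)   // returns 97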

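Since bytesight itself is written in Rust, the Rust standard library equivalents may also be handy. This is a small illustrative snippet, not part of the tool. `char::from_u32` plays the role of `String.fromCodePoint` and returns `None` for surrogates and values above U+10FFFF. Casting a `char` to `u32` plays the role of `codePointAt(0)`.

```rust
fn main() {
    // Code point -> character, like String.fromCodePoint(97).
    println!("{:?}", char::from_u32(97)); // Some('a')

    // A hex literal works as well, like String.fromCodePoint(0x61).
    println!("{:?}", char::from_u32(0x61)); // Some('a')

    // Character -> code point, like "a".codePointAt(0).
    let cp = "a".chars().next().map(|ch| ch as u32);
    println!("{cp:?}"); // Some(97)

    // Surrogates are not valid Rust chars.
    println!("{:?}", char::from_u32(0xD800)); // None
}
```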
AI Use

Claude was used to generate most of the code, with substantial input and guidance from the author.

ChatGPT and Google Gemini were used to verify it.

The whole codebase has had a general human review. The specific ASCII and Unicode ranges have only been reviewed in general terms and are not yet fully verified. The code logic, argument handling, and general behaviour have been verified by human review.

License

See LICENSE file.

Sources

ASCII:

Unicode:

UTF-8 encoding:

C1 control characters:

For verifying combining marks specifically:

  • UnicodeData.txt, column 3 (General_Category): values Mn (nonspacing mark), Mc (spacing mark), Me (enclosing mark) are the combining marks

https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
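Below is a small Rust sketch of that check. It assumes lines in the UnicodeData.txt format: semicolon-separated fields, starting with the code point in hex, then the name, then General_Category. `parse_combining` is a hypothetical helper, not bytesight's code. It does not expand the `<..., First>` / `<..., Last>` range records that the file uses for large blocks.

```rust
/// Parses one UnicodeData.txt record into (code point, is_combining_mark).
/// The third field (General_Category) decides: Mn, Mc and Me are combining marks.
/// Returns None for lines that are not in this format.
fn parse_combining(line: &str) -> Option<(u32, bool)> {
    let mut fields = line.split(';');
    let cp = u32::from_str_radix(fields.next()?, 16).ok()?;
    let _name = fields.next()?;
    let category = fields.next()?;
    Some((cp, matches!(category, "Mn" | "Mc" | "Me")))
}

fn main() {
    let sample = "0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;\n\
                  0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;";
    for line in sample.lines() {
        if let Some((cp, combining)) = parse_combining(line) {
            println!("U+{cp:04X} combining={combining}");
        }
    }
}
```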
