ella-to/sherlock


░██████╗██╗░░██╗███████╗██████╗░██╗░░░░░░█████╗░░█████╗░██╗░░██╗
██╔════╝██║░░██║██╔════╝██╔══██╗██║░░░░░██╔══██╗██╔══██╗██║░██╔╝
╚█████╗░███████║█████╗░░██████╔╝██║░░░░░██║░░██║██║░░╚═╝█████═╝░
░╚═══██╗██╔══██║██╔══╝░░██╔══██╗██║░░░░░██║░░██║██║░░██╗██╔═██╗░
██████╔╝██║░░██║███████╗██║░░██║███████╗╚█████╔╝╚█████╔╝██║░╚██╗
╚═════╝░╚═╝░░╚═╝╚══════╝╚═╝░░╚═╝╚══════╝░╚════╝░░╚════╝░╚═╝░░╚═╝


sherlock detects file types and extracts metadata from raw bytes — images, documents, archives, video, executables, and more.

Installation

go get ella.to/sherlock

Quick Start

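// Read the whole file into memory
data, err := os.ReadFile("photo.jpg")
if err != nil {
    log.Fatal(err)
}

// Detect the file type and extract metadata
meta, err := sherlock.BytesDetect(data)
if err != nil {
    log.Fatal(err)
}

// Each value is a []string, so Println prints it in brackets
fmt.Println(meta["mime"])   // [image/jpeg]
fmt.Println(meta["width"])  // [4032]
fmt.Println(meta["height"]) // [3024]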

API

There are two detection functions. Both return a map[string][]string holding the extracted metadata, along with an error.

// From a byte slice
meta, err := sherlock.BytesDetect(data)

// From any io.Reader
meta, err := sherlock.ReaderDetect(reader)
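
Because the result is a map[string][]string, every key maps to a slice, even when only one value is expected. The following is a minimal sketch of two helpers for reading such a result. The names firstValue and intValue are hypothetical and not part of sherlock, the sample map is hand-written to mirror the Quick Start values, and the sketch assumes numeric fields such as width are stored as decimal strings.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// firstValue returns the first value stored under key.
// ok is false when the key is absent or holds no values.
func firstValue(meta map[string][]string, key string) (value string, ok bool) {
	vals := meta[key]
	if len(vals) == 0 {
		return "", false
	}
	return vals[0], true
}

// intValue parses the first value under key as a base-10 integer.
// Assumption: numeric metadata is stored as a decimal string.
func intValue(meta map[string][]string, key string) (int, error) {
	s, ok := firstValue(meta, key)
	if !ok {
		return 0, fmt.Errorf("metadata key %q not present", key)
	}
	return strconv.Atoi(s)
}

func main() {
	// Hand-written sample, not produced by sherlock.
	meta := map[string][]string{
		"mime":   {"image/jpeg"},
		"width":  {"4032"},
		"height": {"3024"},
	}

	mime, _ := firstValue(meta, "mime")
	width, err := intValue(meta, "width")
	if err != nil {
		log.Fatal(err)
	}
	height, err := intValue(meta, "height")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s is %d x %d pixels\n", mime, width, height)
}
```

Returning an explicit ok flag or error keeps a missing key distinguishable from an empty value, which matters because the type-specific keys are only present for some formats.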

Streaming Detection

For cases where data arrives in chunks (uploads, network streams), use StreamDetector. It implements io.WriteCloser, so you can write data to it incrementally.

detector := sherlock.NewStreamDetector()

// Write data as it arrives
detector.Write(chunk1)
detector.Write(chunk2)
detector.Write(chunk3)

// Close the detector and get the result in one call
meta, err := detector.CloseAndResult()
if err != nil {
    log.Fatal(err)
}
fmt.Println(meta["mime"]) // [text/csv]
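
Since the detector is described as an io.WriteCloser, it should be usable anywhere the standard library expects a writer. The sketch below shows one such pattern: copying an upload to storage and to an inspector in a single pass with io.MultiWriter. It does not import sherlock. countingDetector is a hypothetical stand-in that only counts bytes, and storeAndInspect and demo are illustrative names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
)

// countingDetector is a hypothetical stand-in for a streaming detector.
// It only counts the bytes it receives and performs no detection.
type countingDetector struct {
	n      int64
	closed bool
}

func (d *countingDetector) Write(p []byte) (int, error) {
	if d.closed {
		return 0, io.ErrClosedPipe
	}
	d.n += int64(len(p))
	return len(p), nil
}

func (d *countingDetector) Close() error {
	d.closed = true
	return nil
}

// storeAndInspect copies src to both store and inspector in one pass,
// then closes the inspector. A copy error takes precedence over a close error.
func storeAndInspect(store io.Writer, inspector io.WriteCloser, src io.Reader) (int64, error) {
	n, err := io.Copy(io.MultiWriter(store, inspector), src)
	if cerr := inspector.Close(); err == nil {
		err = cerr
	}
	return n, err
}

// demo runs storeAndInspect over an in-memory string and reports what each side saw.
func demo(s string) (copied, seen int64, stored string, err error) {
	var buf bytes.Buffer
	det := &countingDetector{}
	copied, err = storeAndInspect(&buf, det, strings.NewReader(s))
	return copied, det.n, buf.String(), err
}

func main() {
	copied, seen, stored, err := demo("id,name\n1,ada\n")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(copied, seen, len(stored))
}
```

With a real detector in place of the stand-in, the metadata would be read after the copy finishes, for example through the CloseAndResult call shown above instead of a plain Close.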

Results are cached — calling Result() multiple times after closing returns the same value without re-processing.

detector.Close()
meta1, _ := detector.Result()
meta2, _ := detector.Result() // same object, no extra work

Supported Formats

Images

PNG, JPEG, GIF, BMP, TIFF, WebP, HEIC/HEIF, PSD, SVG, ICO

Image metadata includes dimensions (width, height). For JPEG/HEIC files with EXIF data, it also includes camera make/model, GPS coordinates, and timestamps.

Documents

PDF, DOCX, XLSX, PPTX, EPUB, OLE-based Office files (DOC, XLS, PPT)

PDF metadata includes version, encryption status, linearization, and approximate page count.

Archives

ZIP, GZIP, TAR

Archive metadata includes entry names, entry counts, and compressed/uncompressed sizes.

Video / Audio

MP4, QuickTime, HEIF containers, FLV, Ogg, FLAC, WebM

Video metadata includes container format, major brand, and compatible brands.

Executables

ELF (Linux), PE (Windows), Mach-O (macOS), DMG (Apple disk images), DOS COM

Executable metadata includes architecture bits, endianness, and format-specific details.

Other

CSV, shell scripts (via shebang detection), BitTorrent files

Metadata Keys

Every detection result includes these baseline fields:

| Key | Description |
| --- | --- |
| detector_version | Version of the detection engine |
| size_bytes | Size of the input data |
| mime | Detected MIME type |
| type | Primary type (e.g. image, video, application) |
| subtype | Subtype (e.g. png, pdf, zip) |
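
A caller that depends on the baseline fields may want to verify them before use. The following sketch checks a result map against the five keys listed above. missingBaseline is a hypothetical helper, not part of sherlock, and the sample map is hand-written and deliberately incomplete.

```go
package main

import (
	"fmt"
	"sort"
)

// baselineKeys lists the fields the table above says every result includes.
var baselineKeys = []string{"detector_version", "size_bytes", "mime", "type", "subtype"}

// missingBaseline returns the baseline keys that are absent or empty in meta,
// sorted alphabetically. It returns nil when nothing is missing.
func missingBaseline(meta map[string][]string) []string {
	var missing []string
	for _, k := range baselineKeys {
		if len(meta[k]) == 0 {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hand-written sample with two baseline keys left out on purpose.
	meta := map[string][]string{
		"mime":    {"application/pdf"},
		"type":    {"application"},
		"subtype": {"pdf"},
	}
	fmt.Println(missingBaseline(meta)) // [detector_version size_bytes]
}
```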

Additional keys depend on the file type:

Images: width, height, image_format, resolution, datetime, camera_make, camera_model, location_latitude, location_longitude

CSV: csv_rows, csv_columns, csv_consistent_columns, csv_header

PDF: pdf_version, pdf_encrypted, pdf_linearized, pdf_pages_approx

Archives: zip_entries, zip_entry_name, tar_entries_sampled, gzip_name

Executables: executable_format, architecture_bits, endianness

Video: video_major_brand, video_compatible_brand, video_container
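
Since the additional keys vary by file type, consuming code usually branches on type or subtype first. The sketch below builds a one-line summary from a few of the keys listed above. first and describe are hypothetical helpers, not part of sherlock. The value formats, the subtype value "csv", and the idea that zip_entry_name holds one value per entry are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// first returns the first value under key, or fallback when the key is absent or empty.
func first(meta map[string][]string, key, fallback string) string {
	if vals := meta[key]; len(vals) > 0 {
		return vals[0]
	}
	return fallback
}

// describe builds a short summary using type-specific keys when they apply.
func describe(meta map[string][]string) string {
	mime := first(meta, "mime", "unknown")
	switch {
	case first(meta, "type", "") == "image":
		return fmt.Sprintf("%s, %sx%s px", mime,
			first(meta, "width", "?"), first(meta, "height", "?"))
	case first(meta, "subtype", "") == "pdf":
		return fmt.Sprintf("%s, about %s pages", mime,
			first(meta, "pdf_pages_approx", "?"))
	case first(meta, "subtype", "") == "csv":
		return fmt.Sprintf("%s, %s rows x %s columns", mime,
			first(meta, "csv_rows", "?"), first(meta, "csv_columns", "?"))
	case len(meta["zip_entry_name"]) > 0:
		// Assumption: one slice element per archive entry.
		return fmt.Sprintf("%s, entries: %s", mime,
			strings.Join(meta["zip_entry_name"], ", "))
	default:
		return mime
	}
}

func main() {
	// Hand-written sample mirroring the Quick Start values.
	meta := map[string][]string{
		"mime":   {"image/jpeg"},
		"type":   {"image"},
		"width":  {"4032"},
		"height": {"3024"},
	}
	fmt.Println(describe(meta)) // image/jpeg, 4032x3024 px
}
```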

WASM Support

Sherlock can be compiled to WebAssembly for use in browsers or WASI runtimes. See the examples directory for:

  • WASI — CLI tool that reads a file and outputs JSON metadata. Build with GOOS=wasip1 GOARCH=wasm.
  • Browser WASM — Exposes a sherlockDetectBase64() function to JavaScript. Build with GOOS=js GOARCH=wasm.
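
For the JSON output mentioned in the WASI bullet, the standard encoding/json package can serialize a metadata map directly. The following is only a sketch of that step, not the code from the examples directory. toJSON is a hypothetical helper and the sample map is hand-written.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// toJSON renders a metadata map as compact JSON.
// encoding/json sorts map keys, so the output order is deterministic.
func toJSON(meta map[string][]string) (string, error) {
	b, err := json.Marshal(meta)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hand-written sample, not produced by sherlock.
	meta := map[string][]string{
		"type": {"image"},
		"mime": {"image/png"},
	}
	out, err := toJSON(meta)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out) // {"mime":["image/png"],"type":["image"]}
}
```

Since the code uses only the standard library, it should build for the GOOS=wasip1 GOARCH=wasm target named above as well as for native targets.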

License

MIT — see LICENSE for details.
