@halfmoonai/cleanfile

Lossless file metadata parser & cleaner for the browser.

Strip EXIF, GPS, author info and other privacy-leaking metadata from images, PDFs, audio, video, Office documents (DOCX/XLSX/PPTX), and ZIP archives — entirely client-side, no server upload.

Supported Formats

Category	Parse	Clean	Formats
Image	✅	✅	JPEG, PNG, WebP, HEIC, AVIF, SVG, GIF, TIFF, BMP, ICO
PDF	✅	✅	PDF
Audio	✅	✅	MP3 (ID3v1/ID3v2), WAV, FLAC, OGG Vorbis/Opus, M4A
Video	✅	✅	MP4 (ISO BMFF), MOV (QuickTime)
Office	✅	✅	DOCX, XLSX, PPTX
ZIP	✅	✅	ZIP

Install

npm install @halfmoonai/cleanfile
# or
yarn add @halfmoonai/cleanfile
# or
pnpm add @halfmoonai/cleanfile

Usage

import {
  detectFile,
  parseImageMetadata,
  cleanImage,
  downloadBlob,
  cleanFilename,
} from '@halfmoonai/cleanfile'

// Detect file type
declare const file: File // supplied by an <input> or a drag & drop handler
const { category } = detectFile(file) // 'image' | 'pdf' | 'audio' | ...

// Parse metadata
const meta = await parseImageMetadata(file)
console.log(meta.hasGps, meta.latitude, meta.longitude)
console.log(meta.make, meta.model, meta.camera)

// Clean (strip all metadata, lossless)
const cleanedBlob = await cleanImage(file)
downloadBlob(cleanedBlob, cleanFilename(file.name))
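
The example above covers images only. One way to route any supported file to the matching cleaner is a lookup table keyed by the category that detectFile returns. The sketch below is an illustration, not part of the package: `Cleaner` and `pickCleaner` are hypothetical names, and the cleaners are passed in so the snippet runs without the library.

```typescript
// Hypothetical helper, not part of @halfmoonai/cleanfile.
// A cleaner maps an input file to a result; for the real package this
// would be (file: File) => Promise<Blob>, as documented for cleanImage etc.
type Cleaner<In, Out> = (file: In) => Out

// Look up the cleaner registered for a category, failing loudly when
// the category has no cleaner instead of silently returning undefined.
function pickCleaner<In, Out>(
  category: string,
  cleaners: Map<string, Cleaner<In, Out>>,
): Cleaner<In, Out> {
  const cleaner = cleaners.get(category)
  if (!cleaner) {
    throw new Error(`No cleaner registered for category "${category}"`)
  }
  return cleaner
}
```

In an application the map might hold entries such as `['image', cleanImage]`, `['pdf', cleanPdf]` and `['audio', cleanAudio]`. Category names other than 'image', 'pdf' and 'audio' are not shown in this README, so check the package typings before relying on them.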

API

File Detection

detectFile(file: File) → { file, category, mimeType, extension }
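
To make the `category` and `extension` fields concrete, here is a minimal sketch of extension-based categorization built from the Supported Formats table. It is an assumption about one possible approach: the real detectFile may also consult the MIME type or file contents, `categoryFromName` is a hypothetical name, and the category strings 'video', 'office' and 'zip' are guesses.

```typescript
// Hypothetical sketch; not the library's detectFile.
// Extensions are taken from the Supported Formats table.
const FORMATS: Array<[string, string[]]> = [
  ['image', ['jpg', 'jpeg', 'png', 'webp', 'heic', 'avif', 'svg', 'gif', 'tif', 'tiff', 'bmp', 'ico']],
  ['pdf', ['pdf']],
  ['audio', ['mp3', 'wav', 'flac', 'ogg', 'opus', 'm4a']],
  ['video', ['mp4', 'mov']], // category name is an assumption
  ['office', ['docx', 'xlsx', 'pptx']], // category name is an assumption
  ['zip', ['zip']], // category name is an assumption
]

// Flatten the table into an extension -> category lookup.
const CATEGORY_BY_EXTENSION = new Map<string, string>()
for (const [category, extensions] of FORMATS) {
  for (const ext of extensions) {
    CATEGORY_BY_EXTENSION.set(ext, category)
  }
}

function categoryFromName(filename: string): { extension: string; category: string | undefined } {
  const dot = filename.lastIndexOf('.')
  // A leading dot (".gitignore") or no dot at all means no extension.
  const extension = dot > 0 ? filename.slice(dot + 1).toLowerCase() : ''
  return { extension, category: CATEGORY_BY_EXTENSION.get(extension) }
}
```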

Image

parseImageMetadata(file: File) → Promise<ImageMetadata>
cleanImage(file: File) → Promise<Blob>

PDF

parsePdfMetadata(file: File) → Promise<PdfMetadata>
cleanPdf(file: File) → Promise<Blob>

Audio

parseAudioMetadata(file: File) → Promise<AudioMetadata>
cleanAudio(file: File) → Promise<Blob>

Video

parseVideoMetadata(file: File) → Promise<VideoMetadata>
cleanVideo(file: File) → Promise<Blob>

Office (DOCX / XLSX / PPTX)

parseWordMetadata(file: File) → Promise<WordMetadata>
cleanWord(file: File) → Promise<Blob>

ZIP

parseZipMetadata(file: File) → Promise<ZipMetadata>
cleanZip(file: File) → Promise<Blob>

Utilities

downloadBlob(blob: Blob, filename: string) — trigger browser download
cleanFilename(name: string) → string — prefix with clean_
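
As an illustration of the prefixing behaviour, here is a minimal stand-in. `cleanFilenameSketch` is a hypothetical name, and the real cleanFilename may handle edge cases differently.

```typescript
// Hypothetical stand-in for cleanFilename, for illustration only.
// The prefix goes in front of the whole name, so the extension is kept.
function cleanFilenameSketch(name: string): string {
  return `clean_${name}`
}
```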

How It Works

All cleaning is lossless — no re-encoding, no quality loss:

JPEG: strips APP1/APP2/APP13 marker segments (EXIF, XMP, ICC, IPTC)
PNG: removes tEXt, iTXt, zTXt, eXIf, tIME chunks
WebP: removes EXIF/XMP RIFF chunks
HEIC/AVIF: neutralizes Exif items in ISO BMFF container
SVG: removes <metadata> elements and XML comments
PDF: clears Info dictionary (title, author, creator, producer, dates)
MP3: strips the ID3v2 tag at the start of the file and the 128-byte ID3v1 tag at the end
WAV: removes LIST/INFO RIFF chunks
FLAC: replaces Vorbis Comment with empty block
OGG: replaces comment packet with empty Vorbis Comment
M4A: removes udta/meta atoms, zeroes mvhd timestamps
MP4/MOV: removes udta/meta atoms, zeroes mvhd/tkhd/mdhd timestamps
DOCX/XLSX/PPTX: clears docProps/core.xml and docProps/app.xml metadata, removes comments
ZIP: re-archives without comments and with normalized timestamps
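
To show why this kind of cleaning needs no re-encoding, here is a minimal sketch of the PNG case from the list above. A PNG file is an 8-byte signature followed by chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC), so metadata chunks can be dropped whole while image chunks are copied byte for byte. This is an illustration under those format rules, not the library's actual code; `stripPngMetadata` is a hypothetical name.

```typescript
// Sketch only: not the library's implementation.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]
// Chunk types named in the list above.
const METADATA_CHUNKS = new Set(['tEXt', 'iTXt', 'zTXt', 'eXIf', 'tIME'])

function stripPngMetadata(png: Uint8Array): Uint8Array {
  if (png.length < 8 || PNG_SIGNATURE.some((b, i) => png[i] !== b)) {
    throw new Error('Not a PNG file')
  }
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength)
  const kept: Uint8Array[] = [png.subarray(0, 8)] // keep the signature
  let offset = 8
  // Each chunk occupies 12 bytes of framing plus its data.
  while (offset + 12 <= png.length) {
    const dataLength = view.getUint32(offset) // big-endian by default
    let type = ''
    for (let i = 4; i < 8; i++) {
      type += String.fromCharCode(view.getUint8(offset + i))
    }
    const end = offset + 12 + dataLength
    if (end > png.length) {
      throw new Error('Truncated PNG chunk')
    }
    // Copy non-metadata chunks untouched, CRC included.
    if (!METADATA_CHUNKS.has(type)) {
      kept.push(png.subarray(offset, end))
    }
    offset = end
  }
  // Concatenate the kept pieces; bytes after the last full chunk are dropped.
  const out = new Uint8Array(kept.reduce((n, c) => n + c.length, 0))
  let pos = 0
  for (const c of kept) {
    out.set(c, pos)
    pos += c.length
  }
  return out
}
```

Because kept chunks are copied verbatim, their CRCs stay valid and the pixel data is never decoded.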

Development

# Install dependencies
yarn install

# Run tests
yarn test

# Build
yarn build
