spencermountain/out-of-character

Remove invisible Unicode characters from text
npm install out-of-character

Unicode has a few dozen characters that, on purpose, do not render anything.

This is cool for cultural idiosyncrasies in historical languages. More often, though, their use is unintentional (or nefarious!), and these characters end up causing problems when parsing text formats.

• these are sometimes called 'zero-width', 'ignorable', or 'tag-characters'
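
As a rough illustration of why these characters cause trouble, here is a minimal sketch in plain JavaScript that does not use this library. The ZERO WIDTH SPACE (U+200B) and the `codePoints` helper are choices made for this example only, not something taken from the library.

```javascript
// Two strings that render the same, but one hides a ZERO WIDTH SPACE (U+200B).
const clean = 'admin'
const sneaky = 'ad\u200Bmin'

console.log(clean === sneaky) // false
console.log(clean.length, sneaky.length) // 5 6

// Hypothetical helper: list each code point so the hidden character shows up.
const codePoints = (str) =>
  Array.from(str, (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'))

console.log(codePoints(sneaky).join(' '))
// U+0061 U+0064 U+200B U+006D U+0069 U+006E
```

A comparison, lookup, or parser that expects `'admin'` will quietly fail on the second string, even though both look identical on screen.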

This library helps spot and remove these funboys before they cause trouble.

Please remember that some text is meant to contain Khmer vowels or Kaithi-alphabet characters.

CLI

npm install -g out-of-character

detect invisible characters in all files in a directory

out-of-character ./path/to/dir

remove them from all files in a directory

out-of-character ./path/to/dir --replace

detect invisible characters in a file

out-of-character ./path/to/file.txt

remove invisible characters from a file

out-of-character ./path/to/file.txt --replace

JavaScript API

import {detect, replace} from 'out-of-character'

let str='noth­ing s͏neak឵y h᠎ere' //actually, there is.
console.log(detect(str))
/*  😮  😮  😮
[
  {
    name: 'KHMER VOWEL INHERENT AA',
    code: 'U+17B5',
    offset: 15,
    replacement: ''
  },
  {
    name: 'MONGOLIAN VOWEL SEPARATOR',
    code: 'U+180E',
    offset: 19,
    replacement: ''
  }
]*/

// get rid of them!
let after = replace(str)
console.log(str !== after)
// true
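
The array returned by `detect` can be turned into a readable report. The sketch below assumes each result has the `name`, `code`, and `offset` fields shown in the sample output above. It uses a hard-coded copy of that output so it runs without the library, and `formatFindings` is a hypothetical helper, not part of out-of-character.

```javascript
// Sample findings, copied from the detect() output shown above.
const findings = [
  { name: 'KHMER VOWEL INHERENT AA', code: 'U+17B5', offset: 15, replacement: '' },
  { name: 'MONGOLIAN VOWEL SEPARATOR', code: 'U+180E', offset: 19, replacement: '' },
]

// Hypothetical helper (not part of out-of-character): one report line per finding.
const formatFindings = (list) =>
  list.map((f) => `offset ${f.offset}: ${f.name} (${f.code})`)

const report = formatFindings(findings)
console.log(report.join('\n'))
// offset 15: KHMER VOWEL INHERENT AA (U+17B5)
// offset 19: MONGOLIAN VOWEL SEPARATOR (U+180E)
```

In real use, `findings` would come from `detect(str)` rather than a literal array.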

Detecting and fixing invisible characters in files can be done like this:

const fs = require('fs')
const {detect, replace} = require('out-of-character')

let text = fs.readFileSync('./some-file.txt').toString()
console.log(detect(text))
// yikes.

// ok, fix it
fs.writeFileSync('./some-file.txt', replace(text))

// ok, double-check it.
let goodNow = fs.readFileSync('./some-file.txt').toString()
console.log(detect(goodNow))
// phew.

Thank you to character.construction/blanks by Jan Lelis

and a tale of characters in Unicode by Stefan Judis

MIT