detect-character-encoding

Detect character encoding using ICU

Tip: If you don’t need ICU in particular, consider using ced, which is based on Google’s lighter compact_enc_det library.

Installation

$ npm install detect-character-encoding

detect-character-encoding is a C++ addon, so installing it may require build tools such as a C++ compiler, Python, and make. See node-gyp’s readme for details.

Usage

const fs = require('fs');
const detectCharacterEncoding = require('detect-character-encoding');

const fileBuffer = fs.readFileSync('file.txt');
const charsetMatch = detectCharacterEncoding(fileBuffer);

console.log(charsetMatch);
// {
//   encoding: 'UTF-8',
//   confidence: 60
// }

detect-character-encoding returns null if no charset matches.
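Because the result can be null and the confidence varies, callers usually want a fallback. The following is a minimal sketch, not part of this package: `pickEncoding`, the `minConfidence` threshold of 50, and the `'UTF-8'` fallback are hypothetical choices made for illustration. It assumes only the return shape shown above (`{ encoding, confidence }` or null).

```javascript
// Hypothetical helper (not provided by detect-character-encoding).
// `detect` is any function with the same return shape as
// detectCharacterEncoding: { encoding, confidence } or null.
function pickEncoding(detect, buffer, { minConfidence = 50, fallback = 'UTF-8' } = {}) {
  const match = detect(buffer);

  // No charset matched, or the match is too weak to trust:
  // use the caller-supplied fallback instead.
  if (match === null || match.confidence < minConfidence) {
    return fallback;
  }

  return match.encoding;
}

// Usage with the real module (requires the package to be installed):
// const fs = require('fs');
// const detectCharacterEncoding = require('detect-character-encoding');
// const encoding = pickEncoding(detectCharacterEncoding, fs.readFileSync('file.txt'));
```

Passing the detector in as an argument keeps the helper easy to test with a stub; the threshold is arbitrary and should be tuned to your data.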

Supported operating systems

macOS Sonoma
Ubuntu 22.04 and 20.04
Debian 12, 11, and 10

detect-character-encoding does not support 32-bit operating systems.

Supported character sets

As listed in ICU’s user guide:

UTF-8
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
Shift_JIS
ISO-2022-JP
ISO-2022-CN
ISO-2022-KR
GB18030
Big5
EUC-JP
EUC-KR
ISO-8859-1
ISO-8859-2
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
KOI8-R
IBM420
IBM424
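Once an encoding is detected, a common next step is to decode the buffer into a string. The sketch below uses Node’s built-in `TextDecoder`, which is an assumption about your setup rather than something this package provides. `decodeBuffer` is a hypothetical helper. `TextDecoder` does not accept every name in the list above (for example, IBM420 is not a label it recognizes), and which labels it does accept can depend on how Node was built, so the helper returns null for unsupported names.

```javascript
// Hypothetical helper (not provided by detect-character-encoding).
// Decodes `buffer` using the given encoding name, or returns null
// when TextDecoder does not support that name.
function decodeBuffer(buffer, encoding) {
  let decoder;

  try {
    decoder = new TextDecoder(encoding);
  } catch (error) {
    // TextDecoder throws a RangeError for labels it does not recognize.
    if (error instanceof RangeError) {
      return null;
    }
    throw error;
  }

  return decoder.decode(buffer);
}

// Usage with a detected match (requires the package to be installed):
// const charsetMatch = detectCharacterEncoding(fileBuffer);
// const text = charsetMatch ? decodeBuffer(fileBuffer, charsetMatch.encoding) : null;
```

For encodings that `TextDecoder` rejects, a dedicated conversion library would be needed; which one to use is outside the scope of this readme.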

License

detect-character-encoding is licensed under the BSD 2-clause license but includes third-party software under different licenses. See LICENSE.md for the full license text.
