Character encoding auto-detection in JavaScript (port of python's chardet)
JavaScript Python
Switch branches/tags
Nothing to show
Pull request Compare This branch is 1 commit ahead, 50 commits behind aadsm:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
builder
dist
src
tests
README

README

JsChardet
=========

Port of python's charset (http://chardet.feedparser.org/).

How To Use It
-------------

    >// "àíàçã" in UTF-8
    >jschardet.detect("\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3")
    {encoding: "utf-8", confidence: 0.9690625}

    >// "次常用國字標準字體表" in Big5 
    >jschardet.detect("\xa6\xb8\xb1\x60\xa5\xce\xb0\xea\xa6\x72\xbc\xd0\xb7\xc7\xa6\x72\xc5\xe9\xaa\xed")
    {encoding: "Big5", confidence: 0.99}

Supported Charsets
------------------

* Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
* EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
* EUC-KR and ISO-2022-KR (Korean)
* KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
* ISO-8859-2 and windows-1250 (Hungarian)
* ISO-8859-5 and windows-1251 (Bulgarian)
* windows-1252
* ISO-8859-7 and windows-1253 (Greek)
* ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
* TIS-620 (Thai)
* UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
* UTF-16 BE or LE (with a BOM)
* UTF-8 (with or without a BOM)
* ASCII

Technical Information
---------------------

I haven't been able to create tests to correctly detect:

* ISO-2022-CN
* windows-1250 in Hungarian
* windows-1251 in Bulgarian
* windows-1253 in Greek
* EUC-CN

A one-file minimized version is missing.

Authors
-------

Ported from python to JavaScript by António Afonso (antonio.afonso gmail.com)