Forked version of chardet
Python Perl
Switch branches/tags
Nothing to show
Pull request Compare This branch is 15 commits ahead of dcramer:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
bin
chardet
docs
tests
.gitignore
COPYING
MANIFEST.in
README.rst
setup.cfg
setup.py
test.py

README.rst

chardet

chardet guesses the encoding of text files.

Detects...

  • ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
  • Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
  • EUC-KR, ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
  • ISO-8859-2, windows-1250 (Hungarian)
  • ISO-8859-5, windows-1251 (Bulgarian)
  • windows-1252 (English)
  • ISO-8859-7, windows-1253 (Greek)
  • ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)

Requires Python 2.3 or later.

Command-line Tool

chardet comes with a command-line script which reports on the encodings of one or more files:

% chardetect.py somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0