BSD-licensed charset/encoding converter library with more functionality than libiconv

Online DEMO:
http://security-hole.info/~buganini/bsdconv.php

--
Table format:
from<tab>to
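A minimal sketch of reading such a table, assuming (this README does not say) that both fields are hex strings with an optional `0x` prefix and that blank lines and `#` lines are comments. The names `parse_table` and `_hex_field` are hypothetical, not part of bsdconv.

```python
def _hex_field(field: str) -> bytes:
    """Turn one hex field such as '0xA440' or 'A440' into raw bytes."""
    field = field.strip()
    if field[:2].lower() == "0x":
        field = field[2:]
    return bytes.fromhex(field)


def parse_table(text: str) -> dict:
    """Parse 'from<tab>to' lines into a {from_bytes: to_bytes} dict.

    Assumed conventions (not stated in the README): fields are hex
    strings with an optional '0x' prefix; blank lines and lines
    starting with '#' are skipped.
    """
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'from<tab>to', got {line!r}")
        src, dst = fields
        table[_hex_field(src)] = _hex_field(dst)
    return table


# Big5 A440 is the character U+4E00; the target uses the 01 (UNICODE) prefix.
print(parse_table("# Big5 -> internal\n0xA440\t0x014E00\n"))
```

A dict keyed by the source byte sequence keeps lookups simple; a real converter would more likely use a trie so it can match multi-byte inputs incrementally.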

Internal encoding:
* Should be as Unicode-based as possible
* BSDCONV special chars are prefixed with 00
  * BOM is 0000
* UNICODE is prefixed with 01
* CNS11643 is prefixed with 02
* BYTE is prefixed with 03
* Chinese components are prefixed with 04
* ANSI control sequences are prefixed with 1b
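The prefix scheme can be illustrated with a short sketch. It assumes, beyond what this README states, that a UNICODE entry is the `01` prefix followed by the code point in big-endian order with leading zero bytes dropped (so "A" becomes `0141`). `unicode_to_internal` and `describe` are hypothetical helpers, not bsdconv functions.

```python
# Prefix bytes as listed in the README.
PREFIXES = {
    0x00: "BSDCONV special",
    0x01: "UNICODE",
    0x02: "CNS11643",
    0x03: "BYTE",
    0x04: "Chinese component",
    0x1B: "ANSI control sequence",
}


def unicode_to_internal(ch: str) -> bytes:
    """Encode one character as 01 + code point (assumed big-endian, no leading zero bytes)."""
    cp = ord(ch)
    length = max(1, (cp.bit_length() + 7) // 8)
    return bytes([0x01]) + cp.to_bytes(length, "big")


def describe(data: bytes) -> str:
    """Name the kind of an internal sequence; for UNICODE, show the code point."""
    kind = PREFIXES.get(data[0], "unknown")
    if data[0] == 0x01:
        return f"{kind} U+{int.from_bytes(data[1:], 'big'):04X}"
    return f"{kind} {data[1:].hex().upper()}"


for ch in "A一":
    seq = unicode_to_internal(ch)
    print(seq.hex().upper(), "->", describe(seq))

# The BOM, per the list above, is the special sequence 0000.
print(describe(b"\x00\x00"))
```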

NOTICE:
Though many charset encodings are ASCII-compatible,
ASCII is excluded from their codecs to provide flexibility,
so list ascii explicitly when you need it (e.g. big5,ascii).

--
Compiling & Installation (FreeBSD):
make
sudo make install

--
Run:
Convert Traditional Chinese Big5 to Simplified Chinese UTF-8
> bsdconv big5,ascii:zhcn:utf-8,ascii in.txt out.txt
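Reading only the examples in this README, a conversion string seems to be made of phases separated by `:` (the first decodes, the last encodes, any in between are inter filters such as zhcn), codecs within a phase separated by `,` (presumably tried in order), and whole conversions chained with `|`. The sketch below splits a string along those lines; it is our interpretation, not bsdconv's actual parser, and `Stage`/`parse_conversion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One 'from:inter...:to' conversion; each phase is a list of codec names."""
    decoders: List[str]
    inter: List[List[str]]
    encoders: List[str]


def parse_conversion(spec: str) -> List[Stage]:
    """Split a conversion string into stages ('|') and phases (':')."""
    stages = []
    for part in spec.split("|"):
        phases = [phase.split(",") for phase in part.split(":")]
        if len(phases) < 2:
            raise ValueError(f"expected at least 'from:to' in {part!r}")
        stages.append(Stage(phases[0], phases[1:-1], phases[-1]))
    return stages


for stage in parse_conversion("big5,ascii:zhcn:utf-8,ascii"):
    print(stage)
```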

Convert Traditional Chinese UTF-8 to Simplified Chinese GB2312 with transliteration
> bsdconv utf-8,ascii:zhcn:cp936,cp936_trans,ascii in.txt out.txt

Convert Simplified Chinese to Traditional Chinese
> bsdconv ascii,utf-8:zhtw:zhtw_words:utf-8,ascii
Same, but ignoring whitespace mixed into words
> bsdconv utf-8,ascii:whitespace-derail:zhtw:zhtw_words:whitespace-rerail:utf-8,ascii
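The names whitespace-derail and whitespace-rerail suggest a remove-then-restore idea: take whitespace out so a word split by a space (e.g. "软 件") can still match the word table, then put the whitespace back. A minimal sketch of that idea follows; the one-entry `WORDS` dict is a hypothetical stand-in for zhtw_words, and the sketch assumes every replacement keeps the same length so the saved positions stay valid.

```python
from typing import List, Tuple

# Toy word table standing in for zhtw_words (hypothetical; the real table is larger).
WORDS = {"软件": "軟體"}


def derail(text: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Remove whitespace and remember each (original index, char)."""
    rails = [(i, ch) for i, ch in enumerate(text) if ch.isspace()]
    stripped = "".join(ch for ch in text if not ch.isspace())
    return stripped, rails


def convert_words(text: str) -> str:
    """Plain word replacement; rerail below assumes every replacement keeps the same length."""
    for src, dst in WORDS.items():
        text = text.replace(src, dst)
    return text


def rerail(text: str, rails: List[Tuple[int, str]]) -> str:
    """Put whitespace back; ascending indices refer to positions in the original string."""
    out = list(text)
    for i, ch in rails:
        out.insert(i, ch)
    return "".join(out)


stripped, rails = derail("软 件")
print(rerail(convert_words(stripped), rails))
```

Without the derail step, "软 件" would never match the table entry "软件".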

Convert Big5 data from Traditional Chinese to Simplified Chinese, normalize
CRLF/CR/LF line endings to CRLF, and write Big5 output, emitting Simplified
Chinese characters that do not exist in Big5 as HTML entities and uppercasing
ASCII characters.
> bsdconv big5,ascii:zhcn:win:upper:cp950,ascii,htmlentity in.txt out.txt
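Two of the inter filters above have simple, well-defined effects according to this README: win turns CRLF, CR and LF line endings into CRLF, and upper uppercases ASCII characters. The sketch below reproduces those effects on Python strings; it illustrates the behavior described here, not bsdconv's code, and the function names are hypothetical.

```python
import re


def to_crlf(text: str) -> str:
    """Normalize CRLF, lone CR and lone LF line endings to CRLF."""
    # CRLF is listed first so it is matched as one break, not CR followed by LF.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def upper_ascii(text: str) -> str:
    """Uppercase ASCII letters only; other characters pass through unchanged."""
    return "".join(ch.upper() if ch.isascii() else ch for ch in text)


print(repr(upper_ascii(to_crlf("abc\n中文\rxyz\r\n"))))
```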

Very useful for migrating a MySQL database from Big5 to UTF-8
> bsdconv htmlentity,big5-5c,big5,ascii:utf-8,ascii in.sql out.sql
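This README does not describe the big5-5c codec. Its name suggests it deals with Big5 characters whose second byte is 0x5C, the ASCII backslash (for example 功 is A5 5C), which a Big5-unaware SQL dump may escape by doubling. The sketch below assumes that reading: it shows the corruption and a hypothetical decoding step that drops the extra backslash. Both function names are illustrative only.

```python
def naive_escape(data: bytes) -> bytes:
    """Double every 0x5C byte, as a Big5-unaware SQL dump might."""
    return data.replace(b"\\", b"\\\\")


def undo_escaped_trail_bytes(data: bytes) -> bytes:
    """Hypothetical step: drop the extra backslash after a Big5 trail byte of 0x5C.

    Bytes 0x81-0xFE are treated as Big5 lead bytes; ASCII, including
    genuinely escaped backslashes, is passed through unchanged.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            out += data[i:i + 2]
            if data[i + 1] == 0x5C and data[i + 2:i + 3] == b"\\":
                i += 3  # skip the escaping backslash
            else:
                i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


original = "功能".encode("big5")
dumped = naive_escape(original)
print(undo_escaped_trail_bytes(dumped).decode("big5"))
```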

Recover from mis-decoding/encoding (ISO-8859-2 text mistakenly decoded as ISO-8859-1 and then converted to UTF-8)
> bsdconv 'ascii,utf-8:iso-8859-1,ascii|ascii,iso-8859-2:utf-8,ascii'
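The two `|`-chained stages above can be mirrored with Python's standard codecs to show why the recovery works: decoding UTF-8 and re-encoding as ISO-8859-1 gives back the original bytes, which are then decoded with the correct charset. This is a sketch of the idea using Python, not a call into bsdconv; `recover` is a hypothetical name.

```python
def recover(data: bytes) -> bytes:
    """Mirror 'ascii,utf-8:iso-8859-1,ascii|ascii,iso-8859-2:utf-8,ascii'."""
    # Stage 1: decode the UTF-8 and re-encode as ISO-8859-1, recovering the original bytes.
    original_bytes = data.decode("utf-8").encode("iso-8859-1")
    # Stage 2: decode those bytes with the correct charset and emit UTF-8.
    return original_bytes.decode("iso-8859-2").encode("utf-8")


# Reproduce the mistake: ISO-8859-2 bytes read as ISO-8859-1, then saved as UTF-8.
mangled = "Łódź".encode("iso-8859-2").decode("iso-8859-1").encode("utf-8")
print(mangled.decode("utf-8"), "->", recover(mangled).decode("utf-8"))
```

This round trip is lossless because ISO-8859-1 maps every byte value to a character, so no information is lost in the wrong decode.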

Decode JavaScript-escaped data (mixed byte/Unicode escapes) such as %u9644%20
> bsdconv 'escape,byte:unicode,byte|skip,ascii:utf-8,ascii'
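For comparison, the sketch below decodes the same kind of input the way JavaScript's `unescape()` does: `%uXXXX` becomes that UTF-16 code unit and `%XX` becomes U+00XX. For ASCII bytes such as `%20` this matches the intent of the command above; for `%XX` values above 0x7F bsdconv's byte handling may differ, so treat this as an illustration only. `js_unescape` is a hypothetical name.

```python
import re

# %uXXXX is a UTF-16 code unit; %XX is a single byte value (as in JavaScript's unescape()).
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(text: str) -> str:
    """Decode %uXXXX and %XX escapes; anything else is left as-is."""
    def repl(m):
        return chr(int(m.group(1) or m.group(2), 16))
    # Note: surrogate pairs such as %uD83D%uDE00 are not combined here.
    return _ESCAPE.sub(repl, text)


print(repr(js_unescape("%u9644%20")))
```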


More examples:
> bsdconv big5,ascii:nl2br:ascii,html-img in.txt out.htm
#html-img is actually ASCII-HTML-UNICODE-IMG

> bsdconv ascii,utf-8:ascii,ascii-html-cns11643-img in.txt out.htm
#if you prefer to use glyph image from http://www.cns11643.gov.tw

Maintain the inter map:
> bsdconv bsdconv_keyword,bsdconv:bsdconv_keyword,utf-8,ascii MAP.txt edit.tmp
> vi edit.tmp
> bsdconv bsdconv_keyword,ascii,utf-8:bsdconv_keyword,bsdconv edit.tmp MAP.txt

--
WINDOWS:
http://github.com/buganini/bsdconv/downloads
Download and extract it, then go to build/ and run mk_table.bat,
then copy everything in build/ to c:\bsdconv\

--
To install to a directory other than the default path,
set the BSDCONV_PATH environment variable to that path.