An encoding detection utility. A juniversalchardet wrapper for Clojure.

clj-det-enc

clj-det-enc is an encoding detector built on the juniversalchardet Java library.

Usage

(require '[det-enc.core :as det])

Usage: (det/detect target)

(det/detect "utf8.txt")
;=> "UTF-8"
(det/detect "unknown.txt")
;=> nil

Usage: (det/detect target encodingname-when-unknown)

(det/detect "unknown.txt" "EUC-JP")
;=> "EUC-JP"
(det/detect "unknown.txt" :default)
;=> "SHIFT_JIS"
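
A common next step is to read the file with whatever encoding was detected. The following is a minimal sketch, assuming clj-det-enc is on the classpath and that "utf8.txt" (a placeholder path borrowed from the examples above) exists. It uses only `det/detect` as documented here plus Clojure's built-in `slurp`.

```clojure
(require '[det-enc.core :as det])

;; "utf8.txt" is a placeholder path taken from the examples above.
(let [path     "utf8.txt"
      ;; Fall back to the JVM default charset when detection fails.
      encoding (det/detect path :default)]
  ;; slurp opens its own reader, so it does not matter that
  ;; det/detect has already closed the stream it read from.
  (slurp path :encoding encoding))
```

Passing a filename or File works well here because each call opens a fresh stream. An InputStream that is already open would be closed by `det/detect` and could not be read again afterwards.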

return:
The detected encoding name, or nil when the target's encoding cannot be detected and no fallback is given.
target:
Anything clojure.java.io/input-stream accepts
(File, filename as a String, InputStream, BufferedInputStream, etc.).
The target stream is closed automatically.
encodingname-when-unknown:
The value to return when the target's encoding cannot be detected.

  • :default means the default charset of your Java virtual machine.
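
To see which charset `:default` would presumably resolve to on your machine, you can ask the JVM directly. This sketch uses only the standard `java.nio.charset.Charset` class. The name `jvm-default-charset` is made up for illustration, and the exact spelling the library returns (for example "SHIFT_JIS" versus "Shift_JIS") may differ from the canonical name shown by the JVM.

```clojure
(import 'java.nio.charset.Charset)

;; Canonical name of the JVM's default charset.
;; The value depends on the platform and on JVM settings such as
;; the file.encoding system property.
(def jvm-default-charset
  (.name (Charset/defaultCharset)))

(println jvm-default-charset)
```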

Which encodings can be detected? See juniversalchardet.
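
For readers curious about what a wrapper like this does, here is a rough sketch of driving juniversalchardet's `UniversalDetector` from Clojure. It is not taken from clj-det-enc's source and may differ from the actual implementation. The function name `detect-sketch` and the 4096-byte buffer size are assumptions, and the juniversalchardet jar must be on the classpath.

```clojure
(require '[clojure.java.io :as io])
(import 'org.mozilla.universalchardet.UniversalDetector)

(defn detect-sketch
  "Hypothetical helper: returns the detected charset name for `target`,
  or nil when the detector cannot decide."
  [target]
  (with-open [in (io/input-stream target)]
    (let [detector (UniversalDetector. nil) ; nil = no CharsetListener
          buf      (byte-array 4096)]
      ;; Feed chunks to the detector until it is confident or input ends.
      (loop []
        (let [n (.read in buf)]
          (when (and (pos? n) (not (.isDone detector)))
            (.handleData detector buf 0 n)
            (recur))))
      ;; Signal end of input, then ask for the result (may be nil).
      (.dataEnd detector)
      (.getDetectedCharset detector))))
```

`with-open` closes the stream when detection finishes, which mirrors the "closed automatically" behavior described above.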

Installation

Leiningen

[clj-det-enc "1.0.0"]
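
In context, the dependency vector goes into the `:dependencies` list of your project.clj. The project name `my-app`, its version, and the Clojure version below are placeholders chosen for illustration, so adjust them to your own project.

```clojure
;; project.clj (sketch; my-app and the Clojure version are placeholders)
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.3"]
                 [clj-det-enc "1.0.0"]])
```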