Edited via GitHub

commit d3d0d94fb2499fa6a97752954f340459e9bee8c7 1 parent 7effa71
@edouard authored
Showing with 24 additions and 1 deletion.
  1. +24 −1
@@ -3,7 +3,7 @@ rchardet
rchardet is an encoding auto-detection library in Ruby. This library is a port of the auto-detection code in Mozilla. It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key.
-This fork is compatible with ruby 1.9, and runs in production at []( Here’s an [introductory blog post to our encoding detection strategy](
+This fork is compatible with ruby 1.9, and runs in production at [](
@@ -17,6 +17,29 @@ encoding = cd['encoding']
confidence = cd['confidence'] # 0.0 <= confidence <= 1.0
+Encoding Detection Strategy
+rchardet isn’t a very reliable tool for determining a file’s encoding and should be used as a last resort. There are plenty of ways to detect a file’s encoding before resorting to rchardet: for instance, by reading and detecting the [BOM](, or by looking for hints in the text you’re working on (for instance, don’t the headers or footers have `charset="utf-8"` somewhere?).
+You can read an [introductory blog post to our encoding detection strategy](
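The BOM check mentioned above can be sketched in a few lines of Ruby. This is a minimal illustration, not part of rchardet: `BOMS` and `encoding_from_bom` are hypothetical names, the table covers only the UTF-8, UTF-16 and UTF-32 byte order marks, and `String#b` assumes Ruby 2.0 or later.

```ruby
# Hypothetical lookup table (not part of rchardet): byte order marks and the
# encodings they announce. Longer marks come first, because the UTF-32LE mark
# begins with the same two bytes as the UTF-16LE mark.
BOMS = [
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\xEF\xBB\xBF".b,     "UTF-8"],
  ["\xFE\xFF".b,         "UTF-16BE"],
  ["\xFF\xFE".b,         "UTF-16LE"]
].freeze

# Returns the encoding name announced by a leading BOM, or nil if there is none.
def encoding_from_bom(bytes)
  # Compare raw bytes only, so the string's current encoding label is ignored.
  head = bytes.byteslice(0, 4).to_s.b
  match = BOMS.find { |bom, _name| head.start_with?(bom) }
  match && match[1]
end
```

A `nil` result only means that no BOM was found; many UTF-8 files carry none, so the other hints (and, last, rchardet) still apply. A UTF-16LE file whose first character is NUL would be reported as UTF-32LE by this sketch.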
+I suggest you open the file whose encoding you want to detect in `ASCII-8BIT`.
+``` ruby
+# Read the raw bytes; the block must return the content, hence f.read.
+file_content = open(self.file_path, external_encoding: 'ASCII-8BIT') { |f| f.read }
+encoding = CharDet.detect(file_content)
+```
+You don’t know your file’s encoding just yet, so which encoding should you open it with? Ruby defines the encoding `ASCII-8BIT`, with an alias of `BINARY`, which does not correspond to any known encoding. It is intended for binary data or for text of unknown encoding.
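A short sketch of what the binary label means in practice, using only core Ruby: the label can be swapped without touching the bytes, which gives a cheap UTF-8 plausibility check before any detector runs. The helper name `plausible_utf8?` is hypothetical, and `String#b` assumes Ruby 2.0 or later.

```ruby
# ASCII-8BIT and BINARY are two names for the same Encoding object.
SAME_ENCODING = Encoding::BINARY.equal?(Encoding::ASCII_8BIT)

# Hypothetical helper: would these raw bytes be well-formed UTF-8?
# force_encoding changes only the label on the copy, never the bytes.
def plausible_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

utf8_bytes   = "caf\xC3\xA9".b # "café" encoded as UTF-8, tagged as binary
latin1_bytes = "caf\xE9".b     # "café" encoded as ISO-8859-1, tagged as binary

utf8_ok   = plausible_utf8?(utf8_bytes)
latin1_ok = plausible_utf8?(latin1_bytes)
```

Here `utf8_ok` is true and `latin1_ok` is false. A positive result is only a hint, since pure ASCII text also passes the check.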
+Once you’ve detected the encoding, you can convert the file content to UTF-8:
+``` ruby
+converter =[:encoding].name.upcase, "UTF-8")
+```
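As an alternative sketch of the conversion step, core Ruby’s `String#encode` can transcode the raw bytes once a source encoding name is known. This is not the README’s converter code: `to_utf8` and `detected_name` are hypothetical names, and the sketch assumes the detected name is one that Ruby’s `Encoding` recognizes.

```ruby
# Hypothetical helper: transcode raw bytes to UTF-8.
# `detected_name` stands in for the encoding name reported by detection.
def to_utf8(bytes, detected_name)
  # Interpret the bytes as `detected_name` and convert them to UTF-8.
  # Invalid or unmappable sequences are replaced instead of raising.
  bytes.encode("UTF-8", detected_name, invalid: :replace, undef: :replace)
end

latin1_bytes = "caf\xE9".b # "café" in ISO-8859-1, read as binary
converted = to_utf8(latin1_bytes, "ISO-8859-1")
```

Replacing bad sequences keeps the conversion from failing on a wrong guess, at the cost of silently losing characters. Drop the two options if you would rather get an exception.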
Running tests
