Add 56: "US-ASCII-8BIT"

janlelis · May 25, 2016 · a5cb71a
1 parent 9438699

Showing 2 changed files with 42 additions and 1 deletion.
diff --git a/source/categories/index.html.erb b/source/categories/index.html.erb
@@ -78,9 +78,10 @@ title: Idiosyncratic Ruby - Browse by Category
         <li><a href="/43-new-ruby-startup.html">Ruby's Initial State</a></li>
         <li><a href="/45-constant-shuffle">Constant Re-Assignment</a></li>
         <li><a href="/50-naming-too-good.html">Identifiers to Avoid</a><li>
-        <li><a href="/52-constant-visibility.html">Private & Deprecated Constants</a></li>
+        <li><a href="/52-constant-visibility.html">Private &amp; Deprecated Constants</a></li>
         <li><a href="/54-try-converting.html"><code>.try_convert</code></a></li>
         <li><a href="/55-struggling-four-equality.html"><code>.equal?</code>, <code>eql?</code>, <code>==</code>, <code>===</code></a></li>
+        <li><a href="/56-us-ascii-8bit.html"><code>ASCII-8BIT</code> vs. <code>US-ASCII</code></a></li>
       </ul>
 
       <h2>Miscellaneous</h2>

diff --git a/source/posts/56-us-ascii-8bit.html.md b/source/posts/56-us-ascii-8bit.html.md
@@ -0,0 +1,40 @@
+---
+title: US-ASCII-8BIT
+date: 2016-05-26
+tags: string, encoding, ascii
+---
+
+How come Ruby has two [ASCII](https://en.wikipedia.org/wiki/ASCII) encodings?
+
+ARTICLE
+
+    Encoding.list.map(&:name).grep(/ASCII/)
+    # => ["ASCII-8BIT", "US-ASCII"]
+
+Which one is the *normal* one you should use for ASCII?
+
+## Aliases
+
+ ASCII-8BIT | US-ASCII
+------------|----------
+ BINARY     | ASCII
+            | ANSI_X3.4-1968
+            | 646
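
The aliases in the table can be checked from Ruby itself. A minimal sketch, assuming a reasonably recent CRuby (MRI) and using only `Encoding.find` and `Encoding#names` from the core `Encoding` class; the variable names are made up for illustration:

```ruby
# Encoding.find resolves an alias to the very same Encoding object
binary_aliases = %w[BINARY]
ascii_aliases  = %w[ASCII ANSI_X3.4-1968 646]

binary_ok = binary_aliases.all? { |name| Encoding.find(name) == Encoding::ASCII_8BIT }
ascii_ok  = ascii_aliases.all?  { |name| Encoding.find(name) == Encoding::US_ASCII }

p binary_ok # prints true
p ascii_ok  # prints true

# Encoding#names returns the canonical name together with its aliases
p Encoding::ASCII_8BIT.names.include?("BINARY") # prints true
p Encoding::US_ASCII.names.include?("646")      # prints true
```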
+
+So **ASCII** is merely an alias of **US-ASCII**, but then what is **ASCII-8BIT** for? The [RDoc for `Encoding`](http://ruby-doc.org/core-2.3.1/Encoding.html) offers some help:
+
+    Encoding::ASCII_8BIT is a special encoding that is usually
+    used for a byte string, not a character string. But as the name insists,
+    its characters in the range of ASCII are considered as ASCII characters.
+    This is useful when you use ASCII-8BIT characters with other ASCII
+    compatible characters.
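
The last sentence of the quote is about mixing strings. A small sketch of what it means in practice, assuming CRuby's usual compatibility rules for concatenation: an ASCII-8BIT string that only contains bytes in the ASCII range can be combined with a UTF-8 string, while one containing a byte above 127 cannot. The variable names are hypothetical:

```ruby
utf8 = "\u00FC" # "ü" in UTF-8

# ASCII-8BIT string whose bytes all lie in the ASCII range (0..127)
ascii_bytes = "abc".b
mixed = ascii_bytes + utf8
p mixed.encoding == Encoding::UTF_8 # prints true

# ASCII-8BIT string containing the byte 0xFF (outside the ASCII range)
high_bytes = [0xFF].pack("C")
error_class =
  begin
    high_bytes + utf8
    nil
  rescue Encoding::CompatibilityError => e
    e.class
  end
p error_class # prints Encoding::CompatibilityError
```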
+
+So basically, ASCII-8BIT is not a real character encoding: it represents an arbitrary stream of bytes (each with a value between 0 and 255). It is used for raw byte streams, or when you want to make clear that you do not know a string's encoding!
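
Ruby hands out ASCII-8BIT strings whenever it deals with bytes rather than characters. As one illustration, here is a sketch using only core methods (`Array#pack` with the `"C*"` directive and `String#b`); the variable names are made up:

```ruby
# Array#pack("C*") turns integers 0..255 into raw bytes;
# the resulting string is tagged as ASCII-8BIT
raw = [0x00, 0x7F, 0x80, 0xFF].pack("C*")
p raw.encoding == Encoding::ASCII_8BIT # prints true
p raw.valid_encoding?                  # prints true: any byte value is allowed
p raw.bytesize                         # prints 4

# String#b returns a copy of a string re-tagged as ASCII-8BIT,
# so it is treated as plain bytes instead of characters
word = "\u00FC" # "ü": one character, two bytes in UTF-8
p word.length   # prints 1
p word.b.length # prints 2
p word.b        # prints "\xC3\xBC"
```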
+
+The ASCII charset only needs 7 bits, so in strict ASCII the 8th bit is never set and the allowed byte values range from 0 to 127. This is what the **US-ASCII** encoding is all about: it is used for strings that really are ASCII encoded. Think: **"ASCII-7BIT"**
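
The 7-bit rule can be checked byte by byte. The following sketch, assuming only core Ruby, tags every possible byte value as US-ASCII and confirms that exactly the values without the 8th bit (`0x80`) set are valid. It also shows that transcoding a non-ASCII character into US-ASCII fails unless a replacement is requested:

```ruby
# For each byte value 0..255, check whether it is valid US-ASCII
valid_values = (0..255).select do |value|
  [value].pack("C").force_encoding("US-ASCII").valid_encoding?
end

p valid_values == (0..127).to_a                     # prints true
p valid_values.all? { |value| (value & 0x80).zero? } # prints true: 8th bit never set

# Transcoding a character outside the ASCII range into US-ASCII fails...
begin
  "\u00FC".encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  p e.class # prints Encoding::UndefinedConversionError
end

# ...unless undefined characters are explicitly replaced (default: "?")
p "\u00FC".encode("US-ASCII", undef: :replace) # prints "?"
```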
+
+A simple example illustrating the difference:
+
+    out_of_ascii_range = 128.chr # => "\x80"
+    out_of_ascii_range.force_encoding("US-ASCII").valid_encoding? # => false
+    out_of_ascii_range.force_encoding("ASCII-8BIT").valid_encoding? # => true