Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 6 changed files with 106 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# The 9th Bit: Encodings in Ruby 1.9 | ||
|
||
By [Norman Clarke](http://twitter.com/compay) | ||
|
||
I hope you enjoyed my presentation at [RubyConf Brazil 2010](http://www.rubyconf.com.br/)! | ||
|
||
This repository has my slides, some code demos you can run, and links to
resources for learning more about encodings and Ruby. | ||
|
||
Comments? Feel free to send me an email at norman@njclarke.com. | ||
|
||
## Encoding Resources | ||
|
||
### Basic Information | ||
|
||
* Fabio Akita - [Convertendo meu Banco de Latin1 para UTF-8](http://akitaonrails.com/2010/01/01/convertendo-meu-banco-de-latin1-para-utf-8) | ||
* Ilya Grigorik - [Secure UTF-8 Input in Rails](http://www.igvita.com/2007/04/11/secure-utf-8-input-in-rails/) | ||
* Yehuda Katz - [Encodings, Unabridged](http://yehudakatz.com/2010/05/17/encodings-unabridged/) | ||
|
||
### More Advanced | ||
|
||
* James Edward Gray II - [Understanding M17N](http://blog.grayproductions.net/articles/understanding_m17n) | ||
* Yui Naruse - [The Design and Implementation of Ruby M17N](http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html) | ||
* Ben Peterson - [Unicode in Japan](http://web.archive.org/web/20080122094511/http://www.jbrowse.com/text/unij.html) | ||
* Brian Candler - [String19](http://github.com/candlerb/string19) | ||
* Otfried Cheong - [Han Unification in Unicode](http://tclab.kaist.ac.kr/~otfried/Mule/unihan.html) | ||
* Ken Lunde - [CJKV Information Processing](http://oreilly.com/catalog/9780596514471) (Book) | ||
|
||
### Libraries | ||
|
||
* [Unicode](http://github.com/blackwinter/unicode) | ||
* [Babosa](http://github.com/norman/babosa) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
DROP TABLE IF EXISTS example; | ||
|
||
CREATE TABLE example ( | ||
note VARCHAR(20), | ||
value CHAR(1) | ||
); | ||
|
||
-- MySQL: FAIL | ||
INSERT INTO example VALUES ('one byte:', 'a'); | ||
INSERT INTO example VALUES ('two bytes:', 'ã'); | ||
INSERT INTO example VALUES ('three bytes:', 'の'); | ||
INSERT INTO example VALUES ('four bytes:', '沿'); | ||
SELECT * FROM example; | ||
DELETE FROM example; | ||
|
||
-- WIN | ||
SET NAMES 'utf8'; | ||
ALTER TABLE example CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin; | ||
|
||
INSERT INTO example VALUES ('one byte:', 'a'); | ||
INSERT INTO example VALUES ('two bytes:', 'ã'); | ||
INSERT INTO example VALUES ('three bytes:', 'の'); | ||
INSERT INTO example VALUES ('four bytes:', '沿'); | ||
SELECT * FROM example; | ||
SELECT * FROM example WHERE value = 'ã'; |
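The labels in the inserts above refer to how many bytes each character takes in UTF-8. A minimal Ruby sketch, assuming the script itself is saved as UTF-8, can check those counts. As the character appears in this listing, "沿" lies in the Basic Multilingual Plane and encodes to three bytes, so the sketch adds U+20000 as a sample of its own (not from the original demo) that really does need four. Note also that MySQL's `utf8` character set stores at most three bytes per character; four-byte characters need `utf8mb4`, available in newer MySQL versions.

```ruby
# encoding: utf-8
# Minimal sketch: count the UTF-8 bytes behind each character in the SQL demo.
SAMPLES = ["a", "ã", "の", "沿"]

# Assumption: this character is NOT in the original demo. It is added here
# only as an example that lies outside the Basic Multilingual Plane.
FOUR_BYTE_SAMPLE = "\u{20000}"

# Pair each character with its UTF-8 byte count.
BYTE_SIZES = SAMPLES.map { |char| [char, char.encode("UTF-8").bytesize] }

BYTE_SIZES.each do |char, size|
  puts format("U+%04X %s: %d byte(s)", char.ord, char, size)
end

puts format("U+%04X: %d byte(s)", FOUR_BYTE_SAMPLE.ord, FOUR_BYTE_SAMPLE.bytesize)
```

Running a check like this before loading data makes it easier to see which column character set a given value needs.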
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# encoding: utf-8 | ||
# Yes, this is valid Ruby 1.9 - even though your text editor's | ||
# syntax highlighting will probably not think so. | ||
class Canção | ||
GÊNEROS = [:forró, :carimbó, :afoxé] | ||
attr_accessor :gênero | ||
end | ||
asa_branca = Canção.new | ||
asa_branca.gênero = :forró | ||
p asa_branca.gênero |
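The magic comment on the first line is what tells Ruby 1.9 to read the source as UTF-8, which in turn makes the accented identifiers legal. The following is a minimal sketch, not part of the original demo, that inspects the encodings involved; the constant names `SOURCE_ENCODING`, `SYMBOL_ENCODING`, and `ASCII_SYMBOL_ENCODING` are made up for illustration.

```ruby
# encoding: utf-8
# Minimal sketch: inspect the encodings behind non-ASCII identifiers.
class Canção
  attr_accessor :gênero
end

# The encoding Ruby used to read this source file.
SOURCE_ENCODING = __ENCODING__

# A symbol containing non-ASCII characters carries the source encoding.
SYMBOL_ENCODING = :gênero.encoding

# A symbol made only of ASCII characters is tagged US-ASCII instead.
ASCII_SYMBOL_ENCODING = :genero.encoding

p SOURCE_ENCODING
p SYMBOL_ENCODING
p ASCII_SYMBOL_ENCODING

# Class names are ordinary strings, so characters and bytes can differ.
p Canção.name.length
p Canção.name.bytesize
```

The last two lines show why character counts and byte counts need to be kept apart: the class name has six characters but occupies eight bytes in UTF-8.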
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# encoding: utf-8 | ||
require "active_support" | ||
require "active_support/inflector" | ||
require "unicode" | ||
|
||
strings = ["ã", "ç", "ê", "ó"] | ||
strings2 = ["ø", "ß", "œ"] | ||
|
||
class String | ||
|
||
def to_ascii1 | ||
# You'll often see this recommended as a way to "asciify" characters by | ||
# stripping off accent marks. It works ok for Portuguese, but isn't a good | ||
# general solution because many common Latin characters don't decompose. | ||
Unicode.normalize_D(self).gsub(/[^\x00-\x7F]/, '') | ||
end | ||
|
||
def to_ascii2 | ||
# Instead, use a library that has transliteration tables to map the | ||
# characters to a reasonable ASCII representation. | ||
ActiveSupport::Inflector.transliterate(self).to_s | ||
end | ||
end | ||
|
||
# FAIL: fine for the Portuguese strings, but strings2 comes back as empty strings | ||
p strings.map(&:to_ascii1) | ||
p strings2.map(&:to_ascii1) | ||
|
||
# OK | ||
p strings.map(&:to_ascii2) | ||
p strings2.map(&:to_ascii2) |
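The difference between the two approaches can be reproduced without any gems. This is a minimal sketch, assuming Ruby 2.2 or later, where the standard library's `String#unicode_normalize(:nfd)` stands in for `Unicode.normalize_D`. The `FALLBACKS` table and the method names `strip_marks` and `transliterate_sketch` are hypothetical and far smaller than what a real transliteration library ships; they are not how ActiveSupport is implemented.

```ruby
# encoding: utf-8
# Minimal sketch using only the standard library (Ruby 2.2+).

# Hypothetical, deliberately tiny lookup table for characters that have
# no canonical decomposition into a base letter plus combining marks.
FALLBACKS = { "ø" => "o", "ß" => "ss", "œ" => "oe" }

# Decompose, then drop every non-ASCII character (the combining marks).
def strip_marks(string)
  string.unicode_normalize(:nfd).gsub(/[^\x00-\x7F]/, "")
end

# Apply the lookup table first, then strip whatever marks remain.
def transliterate_sketch(string)
  strip_marks(string.gsub(/[øßœ]/, FALLBACKS))
end

p %w[ã ç ê ó].map { |s| strip_marks(s) }
p %w[ø ß œ].map { |s| strip_marks(s) }
p %w[ø ß œ].map { |s| transliterate_sketch(s) }
```

The second line prints three empty strings because "ø", "ß", and "œ" do not decompose, so stripping non-ASCII characters removes them entirely. A lookup table avoids that loss, which is the reason to prefer a library with real transliteration tables.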