
Parsing UTF-8 input

jgarber edited this page · 1 revision

It’s not difficult to parse UTF-8 input in Treetop.

Ruby 1.8 and older

If you’re running Ruby 1.8 or older, require 'active_support' and pass input.mb_chars to the parser instead of the raw string. String#mb_chars wraps the string in a multibyte-safe proxy, so string methods that would normally choke on multibyte characters work correctly. The proxy is not free, of course: expect your parser to be about 10% slower.

Ruby 1.9

If you have Ruby 1.9, you don’t have to do anything special. Strings in 1.9 are (mostly) encoding-aware. If you do require active_support, String#mb_chars just returns self, so calling it is harmless. Requiring active_support and always passing input.mb_chars is therefore the easy way to run one version of your parser on multiple Ruby versions.
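As a rough illustration of the "one parser, multiple Ruby versions" idea, the sketch below wraps the input only when String#mb_chars is available. The helper name multibyte_safe and the parser class MyGrammarParser are hypothetical and not part of Treetop or ActiveSupport. The sketch assumes that the caller has already loaded active_support where it is needed (Ruby 1.8).

```ruby
# Hypothetical helper, not part of Treetop or ActiveSupport.
# Returns the input in whatever form is available for multibyte-safe parsing.
def multibyte_safe(input)
  # String#mb_chars is only defined once ActiveSupport's string
  # extensions have been loaded; without them, fall back to the
  # plain string, which is encoding-aware on Ruby 1.9 and later.
  input.respond_to?(:mb_chars) ? input.mb_chars : input
end

text = multibyte_safe("héllo wörld")

# Assumed usage with a Treetop-generated parser (class name is illustrative):
#   parser = MyGrammarParser.new
#   tree   = parser.parse(text)
```

Checking respond_to? rather than the Ruby version keeps the call site identical whether or not active_support is loaded.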
