It’s not difficult to parse UTF-8 input in Treetop.
Ruby 1.8 and older
If you’re running Ruby 1.8 or older, you can just require 'active_support' and pass input.mb_chars to the parser. String#mb_chars creates a multibyte-safe proxy for string methods that would normally choke on multibyte characters. It’s not free, of course. Expect your parser to be about 10% slower.
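The approach above could look roughly like the sketch below. The helper name multibyte_input and the MyGrammarParser class are hypothetical, and both the LoadError guard and the $KCODE line are my assumptions: older ActiveSupport releases appear to wrap strings on 1.8 only when $KCODE is set to UTF-8, so check that against the version you have installed.

```ruby
# encoding: utf-8
# Sketch only: the helper name and the parser class are illustrative.
begin
  require 'rubygems'
  require 'active_support'
  require 'active_support/core_ext/string/multibyte'
rescue LoadError
  # ActiveSupport is not installed; the helper below falls back to the raw string.
end

# Assumption: on Ruby 1.8 the multibyte proxy is only used when $KCODE is UTF-8.
# Strings that respond to #encoding mean Ruby 1.9 or newer, where $KCODE is obsolete.
$KCODE = 'u' unless ''.respond_to?(:encoding)

# Returns a multibyte-safe view of the input when String#mb_chars is available,
# and the untouched string otherwise.
def multibyte_input(input)
  input.respond_to?(:mb_chars) ? input.mb_chars : input
end

# With a Treetop-generated parser (hypothetical class name):
#   tree = MyGrammarParser.new.parse(multibyte_input(text))
```

Keeping the mb_chars call in one helper means the rest of the program never needs to know which Ruby version it is running on.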
Ruby 1.9
If you’re running Ruby 1.9, you don’t have to do anything special. Strings in 1.9 are (mostly) encoding aware. If you do require active_support, String#mb_chars just returns self. Thus, requiring active_support is the easy way to run one version of your parser on multiple Ruby versions.
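To see what "encoding aware" means in practice, here is a small illustration that uses only core String methods available since Ruby 1.9. The sample strings and the encoding_report helper are made up for the example.

```ruby
# encoding: utf-8
# Illustration only: the helper name and sample text are not from any library.

# Collects the facts that show whether Ruby treats a string as characters
# or as raw bytes.
def encoding_report(text)
  {
    :encoding => text.encoding.name,
    :chars    => text.length,
    :bytes    => text.bytesize,
    :valid    => text.valid_encoding?
  }
end

text = "naïve café"
report = encoding_report(text)
# 10 characters but 12 bytes, because ï and é take two bytes each in UTF-8.

# Input read as binary counts bytes until it is tagged as UTF-8.
raw = "caf\xC3\xA9".dup.force_encoding('BINARY')
raw_length = raw.length                          # counts bytes
utf8_length = raw.force_encoding('UTF-8').length # counts characters
```

If your input arrives tagged as binary (from a socket, say), tagging it with force_encoding before handing it to the parser is usually enough.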