UTF-8 sucks! I tried to make this work with https://github.com/tmcw/geo-codepoints, but it was too slow (more than 12 hours to run). I wanted to make it work with name_unicode_geo, but the UTF-8 problem unravels forever.
Exercise for the reader: count Unicode code points in a string in C++.
osm-rune does this.
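For a taste of the exercise, here is a minimal sketch, not osm-rune's actual code. It assumes the input is already valid UTF-8 and counts every byte that is not a continuation byte (one of the form 10xxxxxx). The function name count_code_points is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded string.
// Assumption: `s` is valid UTF-8. Malformed input is not detected, so the
// result is only meaningful for well-formed strings.
std::size_t count_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes look like 10xxxxxx. Every other byte is either
        // ASCII or the lead byte of a multi-byte sequence, so it starts a
        // new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

This counts code points, not what a reader sees as characters: combining marks and emoji sequences still count as several code points each, which is part of why the problem unravels.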