Charsets #17

jbt opened this Issue Aug 1, 2011 · 5 comments



jbt commented Aug 1, 2011

The latest GeoIP API has native UTF-8 string conversion. Perhaps use the GeoIP_set_charset method with GEOIP_CHARSET_UTF8 to avoid charset issues (node uses UTF-8, so non-ASCII characters in the ISO-8859-1 data come out broken).

kuno commented Aug 1, 2011

I was not aware of this kind of problem.
Could you give some details about it, or even some samples?
Forgive my ignorance.

jbt commented Aug 1, 2011

The GeoIP data files are stored in the ISO-8859-1 character set, which causes problems when the strings are treated as UTF-8. When used directly, every non-ASCII character is converted to the same invalid-character symbol.

Examples:
M�rida (should be Mérida)
Orl�ans (should be Orléans)
S�o Paulo (should be São Paulo)
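To illustrate why these names break, here is a minimal sketch in C. It assumes the strings reach node as raw ISO-8859-1 bytes; `looks_like_utf8` is a hypothetical helper written for this comment, not part of libGeoIP, and the check is simplified (it does not reject overlong forms). In ISO-8859-1, é is the single byte 0xE9, which a UTF-8 decoder reads as the start of a three-byte sequence that never arrives.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the NUL-terminated string has the
   byte structure of UTF-8, 0 otherwise. Simplified on purpose: it only
   checks lead bytes and continuation bytes. */
int looks_like_utf8(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        int extra;
        if (*p < 0x80) extra = 0;                 /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0) extra = 1;  /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0) extra = 2;  /* 3-byte sequence */
        else if ((*p & 0xF8) == 0xF0) extra = 3;  /* 4-byte sequence */
        else return 0;                            /* stray or invalid lead byte */
        p++;
        while (extra-- > 0) {
            /* every continuation byte must look like 10xxxxxx */
            if ((*p & 0xC0) != 0x80) return 0;
            p++;
        }
    }
    return 1;
}
```

With this check, the ISO-8859-1 bytes `"M\xE9rida"` are rejected, while the UTF-8 bytes `"M\xC3\xA9rida"` pass.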

kuno commented Aug 2, 2011

Hey, I remember that someone called "TheDeveloper" told me they had already solved this problem by using the GNU iconv library.
It seems the repository they pointed to is yours?

jbt commented Aug 3, 2011

Ah yes, TheDeveloper's actually a friend of mine - I guess he reported that while I was working on the problem a while back.
I'm not terribly experienced with C++, so I'm not sure if using iconv is the best way to go about converting the charsets. It's probably better to use a native implementation (like copying _GeoIP_iso_8859_1__utf8) rather than depending on iconv - you probably have a better idea about that than me.
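For reference, a native conversion without iconv can be quite small. This is only a sketch of the general technique, not the code from libGeoIP; `latin1_to_utf8` is a hypothetical name. It relies on the fact that every ISO-8859-1 byte maps to the Unicode code point with the same value, so bytes from 0x80 up become a two-byte UTF-8 sequence.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to a
   newly allocated UTF-8 string. The caller must free() the result.
   Returns NULL if allocation fails. */
char *latin1_to_utf8(const char *iso) {
    const unsigned char *in = (const unsigned char *)iso;
    /* worst case: every input byte expands to two output bytes */
    char *out = malloc(2 * strlen(iso) + 1);
    unsigned char *p;

    if (out == NULL) return NULL;
    p = (unsigned char *)out;
    for (; *in; in++) {
        if (*in < 0x80) {
            *p++ = *in;                  /* ASCII is unchanged */
        } else {
            *p++ = 0xC0 | (*in >> 6);    /* lead byte: 110000xx */
            *p++ = 0x80 | (*in & 0x3F);  /* continuation: 10xxxxxx */
        }
    }
    *p = '\0';
    return out;
}
```

For example, the ISO-8859-1 byte 0xE9 (é) becomes the two bytes 0xC3 0xA9, so `"M\xE9rida"` comes back as valid UTF-8 for Mérida.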

kuno commented Aug 3, 2011

OK, I've picked up the _GeoIP_iso_8859_1__utf8 function to handle this issue.

kuno closed this Mar 13, 2013