Change text encoding in UTFGrid driver to work with Windows #5342

Closed

@geographika
Contributor

Using the iconv encoding "UCS-4LE" in maputfgrid.cpp causes junk output on Windows.

utf8 = msConvertWideStringToUTF8 (string, "UCS-4LE");

Changing this to "UCS-2LE" produces the correct output on Windows. I'm not sure if this also works
without issues on Linux. Hopefully there can be one encoding that works on both.

The only other place the encoding is specified is in the Windows-only MS SQL driver which uses "UCS-2LE"
https://github.com/mapserver/mapserver/blob/branch-7-0/mapmssql2008.c#L1741

I'm not sure of the exact difference between the two encodings. The most detailed descriptions I could find were:

"UCS2LE is a direct byte encoding of the first plane in which the low byte comes first"
http://interscript.sourceforge.net/interscript/doc/en_iscr_0279.html

"UCS4LE is a four byte direct encodings of of ISO-10646. UCS4LE puts the low byte first. "
http://interscript.sourceforge.net/interscript/doc/en_iscr_0281.html
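
To make the difference concrete, here is a minimal sketch of the byte layout each encoding uses for a single code point. The function names `encodeUcs2Le` and `encodeUcs4Le` are made up for illustration and are not part of MapServer or iconv.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// UCS-2LE: one 16-bit unit per character, low byte first.
// Only code points in the first plane (U+0000..U+FFFF) fit.
std::vector<unsigned char> encodeUcs2Le(std::uint32_t codePoint) {
    assert(codePoint <= 0xFFFF);
    return {
        static_cast<unsigned char>(codePoint & 0xFF),
        static_cast<unsigned char>((codePoint >> 8) & 0xFF),
    };
}

// UCS-4LE: one 32-bit unit per character, low byte first.
std::vector<unsigned char> encodeUcs4Le(std::uint32_t codePoint) {
    return {
        static_cast<unsigned char>(codePoint & 0xFF),
        static_cast<unsigned char>((codePoint >> 8) & 0xFF),
        static_cast<unsigned char>((codePoint >> 16) & 0xFF),
        static_cast<unsigned char>((codePoint >> 24) & 0xFF),
    };
}
```

For U+00E9 the first function yields the bytes E9 00 and the second yields E9 00 00 00. A decoder that expects 4-byte units but receives 2-byte units would presumably fuse neighbouring characters into a single value, which would be consistent with the junk output seen on Windows.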

geographika Change text encoding to work with Windows
65e2c34
@rouault
Contributor
rouault commented Oct 30, 2016

Looking at https://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html, I'd suggest trying "WCHAR_T" although I'm not sure. I found elsewhere that wchar_t on Windows was only 2 bytes wide, whereas it is 4 bytes on Unix systems, hence "UCS-4LE" is indeed inappropriate for Windows and wchar_t.

@geographika geographika Update maputfgrid.cpp
Try "WCHAR_T" encoding
c9776ac
@geographika
Contributor

@rouault thanks for the comment. I have tried the "WCHAR_T" encoding, as UCS-2LE broke the Linux tests on Travis.

A couple more links I found that might be relevant: http://stackoverflow.com/a/40150716/179520

UTF-32LE = UCS-4LE : UCS-4 in little endian flavour, without BOM

This link says WCHAR_T doesn't work correctly on OSX, and the best approach is to use macros.

http://www.firstobject.com/wchar_t-string-on-linux-osx-windows.htm

@geographika geographika Set encoding based on operating system
b51dcfd
@geographika
Contributor
geographika commented Nov 9, 2016 edited

I found several examples in the MapServer codebase where different code paths are used for Windows and Linux. The final commit works fine on Windows. Apologies for the multiple commits in one pull request; I am trying to resolve this now.

@tbonfort tbonfort added a commit that referenced this pull request Dec 5, 2016
@tbonfort geographika + tbonfort Fix utfgrid text encoding to work with Windows (#5342) 2ab0dc0
@tbonfort
Member
tbonfort commented Dec 5, 2016

backported and applied to branch-7-0 in 2ab0dc0

@tbonfort tbonfort closed this Dec 5, 2016