Change text encoding in UTFGrid driver to work with Windows #5342

Closed
wants to merge 3 commits into from


geographika
Member

Using the iconv encoding "UCS-4LE" in maputfgrid.cpp causes junk output on Windows.

utf8 = msConvertWideStringToUTF8 (string, "UCS-4LE");

Changing this to "UCS-2LE" produces the correct output on Windows. I'm not sure if this also works
without issues on Linux. Hopefully there can be one encoding that works on both.

The only other place the encoding is specified is in the Windows-only MS SQL driver, which uses "UCS-2LE":
https://github.com/mapserver/mapserver/blob/branch-7-0/mapmssql2008.c#L1741

I'm not sure of the exact difference between the two encodings. The most detailed descriptions I could find were:

"UCS2LE is a direct byte encoding of the first plane in which the low byte comes first"
http://interscript.sourceforge.net/interscript/doc/en_iscr_0279.html

"UCS4LE is a four byte direct encodings of of ISO-10646. UCS4LE puts the low byte first. "
http://interscript.sourceforge.net/interscript/doc/en_iscr_0281.html
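To make the two quoted descriptions concrete, here is a minimal sketch of both byte layouts. The helper names (`encode_ucs2le`, `encode_ucs4le`, `decode_ucs4le`) are hypothetical and written only for illustration; they are not MapServer or iconv functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one first-plane code point (U+0000..U+FFFF)
 * as UCS-2LE, i.e. two bytes with the low byte first. */
size_t encode_ucs2le(uint32_t cp, unsigned char *out)
{
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    return 2;
}

/* Hypothetical helper: write one code point as UCS-4LE, i.e. four bytes
 * with the low byte first. */
size_t encode_ucs4le(uint32_t cp, unsigned char *out)
{
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)((cp >> 24) & 0xFF);
    return 4;
}

/* Hypothetical helper: read four bytes as a single UCS-4LE unit. */
uint32_t decode_ucs4le(const unsigned char *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

With these helpers, "AB" stored as UCS-2LE is the byte sequence 41 00 42 00. Reading those same four bytes as one UCS-4LE unit gives 0x00420041, which is above the last valid code point U+10FFFF. That is one plausible way a 2-byte wide string decoded as "UCS-4LE" would turn into junk.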

@rouault
Contributor

rouault commented Oct 30, 2016

Looking at https://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html, I'd suggest trying "WCHAR_T" although I'm not sure. I found elsewhere that wchar_t on Windows was only 2 bytes wide, whereas it is 4 bytes on Unix systems, hence "UCS-4LE" is indeed inappropriate for Windows and wchar_t.
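As a rough sketch of that reasoning, the encoding name could be derived from the width of `wchar_t` instead of being hard-coded. The function names below are hypothetical, and the sketch assumes a little-endian platform, matching the "LE" encodings discussed in this thread.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: map a wchar_t width in bytes to an iconv encoding
 * name. Assumes little-endian byte order; returns NULL for other widths. */
const char *wide_encoding_for_size(size_t wchar_size)
{
    switch (wchar_size) {
    case 2:
        return "UCS-2LE"; /* 2-byte wchar_t, as reported for Windows */
    case 4:
        return "UCS-4LE"; /* 4-byte wchar_t, as reported for Unix systems */
    default:
        return NULL;
    }
}

/* Hypothetical helper: encoding name for this platform's wchar_t. */
const char *native_wide_encoding(void)
{
    return wide_encoding_for_size(sizeof(wchar_t));
}
```

Printing `sizeof(wchar_t)` on each target is also a quick way to confirm the 2-byte versus 4-byte difference before changing the encoding string.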

Try "WCHAR_T" encoding
@geographika
Member Author

@rouault thanks for the comment. I have tried the "WCHAR_T" encoding, as UCS-2LE broke the Linux tests on Travis.

A couple more links I found that might be relevant: http://stackoverflow.com/a/40150716/179520

UTF-32LE = UCS-4LE : UCS-4 in little endian flavour, without BOM

This link says WCHAR_T doesn't work correctly on OSX, and the best approach is to use macros.

http://www.firstobject.com/wchar_t-string-on-linux-osx-windows.htm

@geographika
Member Author

geographika commented Nov 9, 2016

I found several examples in the MapServer codebase where different code paths are used for Windows and Linux. The final commit works fine on Windows. Apologies for the multiple commits in one pull request; I am trying to resolve this now.

@tbonfort
Member

tbonfort commented Dec 5, 2016

backported and applied to branch-7-0 in 2ab0dc0

@tbonfort tbonfort closed this Dec 5, 2016
@geographika geographika mentioned this pull request Mar 8, 2017
3 participants