unicode path names for plugin parameters #1177

Closed
springmeyer opened this Issue Apr 14, 2012 · 25 comments

Comments

Projects
None yet
3 participants
Owner

springmeyer commented Apr 14, 2012

we use std::string for mapnik::parameters, used to configure datasources. It looks like this may not work on windows - need more tests for where things break down.

See https://github.com/mapbox/tilemill/issues/1280 for details, which is closed now and tracked instead here.

Owner

springmeyer commented Apr 14, 2012

/cc @avlee

Owner

springmeyer commented Apr 16, 2012

see also #976 - if we used mapnik::value inside mapnik::parameters then the icu string type could be used to help with this.

Contributor

avlee commented Apr 17, 2012

windows file api can not handle UTF8 path string, so in any way, we must convert UTF8 path string to window native multi byte.

Owner

springmeyer commented Feb 20, 2013

excellent writeup on this: http://www.nubaria.com/en/blog/?p=289

@ghost ghost assigned artemp May 9, 2013

Owner

artemp commented May 20, 2013

^^ yes, useful write-up!

Just to summarise:

Problem

Posix supports UTF8 which means std::string can be used for unicode
Windows is the only major OS which for historical reasons uses wide strings (2-byte).
Using std::wstring as internal storage for windows builds would really suck.

Solution

As outlined in the article above, it's possible to use UTF8 encoded strings (std::string) throughout (works on posix) and only convert to UTF16 encoded wide-strings at the point of using windows APIs. Windows has it's own non-standard :

std::ifstream file( wchar_t const* name, <mode>)
...

15e059a adds utf8 to/from utf16 in mapnik/utils.hpp and it's possible now:

#ifdef _WINDOWS
std::ifstream file( utf8_to_utf16(filename), std::ios::binary);
#endif 

So far so good, but boost::interprocess::file_mapping doesn't support wide chars:

http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/file_mapping.html <-- BOOM

Anyways, I think the outlined approach is what we should aim at, but this would require refactoring all I/O code where wide paths names might occur.

@springmeyer - lets discuss if this is at all feasible for upcoming 2.2 ?

Owner

artemp commented May 20, 2013

4c05d3a adds support for converting PyUnicode object into UTF-8 encoded std::strings

Owner

artemp commented May 21, 2013

871ac5b - adds support for utf16 encoded paths on windows
(shape.input only)

Owner

springmeyer commented May 21, 2013

okay, pushed support for sqlite, csv, and geojson. Confirmed that ogr works out of the box, so I presume gdal does as well. Only remaining would be raster, but punting on that for the moment since most people use gdal. Also, the attachdb stuff is likely not fully working but that is an edge case I'll need to set up better tests to validate later.

Contributor

avlee commented Jun 1, 2013

This issue is not fully fixed.

If the file name is 区县级行政区划 and other more, the file can not be load.

I check the codes and find that: before we convert the utf8 path string to utf16, we use boost::filesystem::absolute to modify the path in map_parser::ensure_relative_to_xml method. But the boost::filesystem::absolute can not handle utf8 path string in right way on windows.

avlee added a commit to avlee/mapnik that referenced this issue Jun 1, 2013

Revert "support for handling unicode paths on windows in sqlite, csv,…
… and geojson plugins - refs #1177"

This reverts commit 20076e5.

Conflicts:
	plugins/input/sqlite/sqlite_datasource.cpp
	plugins/input/sqlite/sqlite_utils.hpp

@springmeyer springmeyer reopened this Jun 2, 2013

Owner

springmeyer commented Jun 2, 2013

Thanks for testing. Reopened. We need to support both direct datasource access and loading from XML. Does loading from files in XML with absolute paths work or are both relative and absolute paths in XML broken?

Owner

springmeyer commented Jun 2, 2013

@artemp - I'm going to start taking a look at this. Will begin by adding tests to node-mapnik for XML loading, since I already have node-mapnik tests for datasources (because its easier for me to run node-mapnik tests on windows than the python tests, and then all pass currently)

springmeyer pushed a commit to mapnik/node-mapnik that referenced this issue Jun 2, 2013

springmeyer pushed a commit to mapnik/node-mapnik that referenced this issue Jun 3, 2013

Owner

springmeyer commented Jun 3, 2013

@avlee - I am unable to replicate a problem within ensure_relative_to_xml. Before getting set up to try to replicate I fixed #1886. Not sure if you were running into that - it was a weird - hard to make sense of - bug.

In testing more with node-mapnik I did hit a problem when trying to load an XML file with 区县级行政区划 in the path and will fix that shortly.

Can you try running the node-mapnik tests and see if you still see a case where this is broken? (hint: you need mapnik-config.bat from mapnik-packaging to build node-mapnik now).

Contributor

avlee commented Jun 3, 2013

I will test it with the latest mapnik and node-mapnik.

Yes, the patch I provided may be have problems with the direct datasource access.

I agree with @artemp's comments: "As outlined in the article above, it's possible to use UTF8 encoded strings (std::string) throughout (works on posix) and only convert to UTF16 encoded wide-strings at the point of using windows APIs."
But the ensure_relative_to_xml will modify the path and return the string by call boost::filesystem::path::string() is not UTF8 encoded string on windows. So to keep the ensure_relative_to_xml always returns utf 8 encoded string, we need using boost::filesystem::path::wstring() and convert it to utf8 on windows like this:

return mapnik::utf16_to_ut8(full.wstring());

/cc @artemp What do you think about it?

@avlee - sounds good to me. @springmeyer - do we have time to push this into 2.2 ?

Owner

springmeyer commented Jun 3, 2013

@avlee - thanks. sounds correct. I just need a way to replicate a problem to know how to fix.

@avlee - also see 0365d3e where I have centralized almost all filesystem access - this should help with making the fixes quicker and easier to understand/test.

Owner

springmeyer commented Jun 3, 2013

Okay, I am signing off for the night. All tests are passing for me with master node-mapnik except the "Handling unicode paths, filenames, and data register font...". It appears it is failing for two reasons on windows: 1) global fonts are registered before the test font and 2) Even after working around 1) I see that FT_New_Face is not working with the test path.

Owner

springmeyer commented Jun 3, 2013

okay, actually worked on this a bit more and got the font loading working on windows with this patch: https://gist.github.com/springmeyer/5696103

@artemp - could you take a look and apply if you think it is decent? I'm also happy to work on getting this right in a future 2.2.1 (likely a few more ifstream places we need to catch for windows).

Owner

artemp commented Jun 3, 2013

@springmeyer - applied in f4d2fba

But,yes, we should revisit it in 2.2.x !

Owner

springmeyer commented Jun 3, 2013

@artemp - did you forget to push (I don't see that hash)

@avlee - any progress testing on your end with latest code?

Contributor

avlee commented Jun 3, 2013

@springmeyer I tested the latest mapnik to load map files that contains unicode file paths. It passed.

I can not run the mocha test in node-mapnik now, with the error:

> mocha


  npm ERR! weird error -1073741819
npm ERR! not ok code 0
Owner

springmeyer commented Jun 3, 2013

@avlee - thanks, great that things work now - did you so a python test?

Regarding the mocha error - not sure what would cause that. Can you instead try npm test ? Also you might try upgrading node to v0.10.7 and running npm cache clean.

artemp added a commit that referenced this issue Jun 3, 2013

+ #1177 via @springmeyer
 (TODO: check if loading the whole font file is required for registering)
Owner

artemp commented Jun 3, 2013

@springmeyer - pushed

Owner

springmeyer commented Jun 3, 2013

okay, closing. Let' re-open new tickets for further issues that need fixing.

@springmeyer springmeyer closed this Jun 3, 2013

springmeyer pushed a commit that referenced this issue Jun 4, 2013

Owner

springmeyer commented Jun 4, 2013

reverted font fix due to extreme lag now in loading many fonts: #1896

Owner

springmeyer commented Jun 4, 2013

will track fixing in future revisions at #1897

onepremise added a commit to onepremise/mapnik that referenced this issue Jun 14, 2013

onepremise added a commit to onepremise/mapnik that referenced this issue Jun 14, 2013

+ #1177 via @springmeyer
 (TODO: check if loading the whole font file is required for registering)

onepremise added a commit to onepremise/mapnik that referenced this issue Jun 14, 2013

PetrDlouhy added a commit to PetrDlouhy/mapnik that referenced this issue Aug 22, 2013

Owner

springmeyer commented Jun 18, 2014

this is now working better after #1897. It may still be that relative/absolute path making in load_map does not work, but we don't yet have a failing test for this. Tracking adding more tests at mapnik/node-mapnik#275 because its easier to write and test windows issue from node than python tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment