Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

unicode path names for plugin parameters #1177

Closed
springmeyer opened this Issue · 25 comments

3 participants

@springmeyer
Owner

we use std::string for mapnik::parameters, used to configure datasources. It looks like this may not work on windows - need more tests for where things break down.

See mapbox/tilemill#1280 for details, which is closed now and tracked instead here.

@springmeyer
Owner

/cc @avlee

@springmeyer
Owner

see also #976 - if we used mapnik::value inside mapnik::parameters then the icu string type could be used to help with this.

@avlee

windows file api can not handle UTF8 path string, so in any way, we must convert UTF8 path string to window native multi byte.

@springmeyer
Owner

excellent writeup on this: http://www.nubaria.com/en/blog/?p=289

@artemp artemp was assigned
@artemp
Owner

^^ yes, useful write-up!

Just to summarise:

Problem

Posix supports UTF8 which means std::string can be used for unicode
Windows is the only major OS which for historical reasons uses wide strings (2-byte).
Using std::wstring as internal storage for windows builds would really suck.

Solution

As outlined in the article above, it's possible to use UTF8 encoded strings (std::string) throughout (works on posix) and only convert to UTF16 encoded wide-strings at the point of using windows APIs. Windows has it's own non-standard :

std::ifstream file( wchar_t const* name, <mode>)
...

15e059a adds utf8 to/from utf16 in mapnik/utils.hpp and it's possible now:

#ifdef _WINDOWS
std::ifstream file( utf8_to_utf16(filename), std::ios::binary);
#endif 

So far so good, but boost::interprocess::file_mapping doesn't support wide chars:

http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/file_mapping.html <-- BOOM

Anyways, I think the outlined approach is what we should aim at, but this would require refactoring all I/O code where wide paths names might occur.

@springmeyer - lets discuss if this is at all feasible for upcoming 2.2 ?

@artemp
Owner

4c05d3a adds support for converting PyUnicode object into UTF-8 encoded std::strings

@artemp
Owner

871ac5b - adds support for utf16 encoded paths on windows
(shape.input only)

@springmeyer
Owner

okay, pushed support for sqlite, csv, and geojson. Confirmed that ogr works out of the box, so I presume gdal does as well. Only remaining would be raster, but punting on that for the moment since most people use gdal. Also, the attachdb stuff is likely not fully working but that is an edge case I'll need to set up better tests to validate later.

@avlee

This issue is not fully fixed.

If the file name is 区县级行政区划 and other more, the file can not be load.

I check the codes and find that: before we convert the utf8 path string to utf16, we use boost::filesystem::absolute to modify the path in map_parser::ensure_relative_to_xml method. But the boost::filesystem::absolute can not handle utf8 path string in right way on windows.

@avlee avlee referenced this issue from a commit in avlee/mapnik
@avlee avlee Revert "support for handling unicode paths on windows in sqlite, csv,…
… and geojson plugins - refs #1177"

This reverts commit 20076e5.

Conflicts:
	plugins/input/sqlite/sqlite_datasource.cpp
	plugins/input/sqlite/sqlite_utils.hpp
5ce54a7
@springmeyer springmeyer reopened this
@springmeyer
Owner

Thanks for testing. Reopened. We need to support both direct datasource access and loading from XML. Does loading from files in XML with absolute paths work or are both relative and absolute paths in XML broken?

@springmeyer
Owner

@artemp - I'm going to start taking a look at this. Will begin by adding tests to node-mapnik for XML loading, since I already have node-mapnik tests for datasources (because its easier for me to run node-mapnik tests on windows than the python tests, and then all pass currently)

@springmeyer springmeyer referenced this issue from a commit in mapnik/node-mapnik
@springmeyer springmeyer add another test for mapnik/mapnik#1177 dac02cc
@springmeyer
Owner

@avlee - I am unable to replicate a problem within ensure_relative_to_xml. Before getting set up to try to replicate I fixed #1886. Not sure if you were running into that - it was a weird - hard to make sense of - bug.

In testing more with node-mapnik I did hit a problem when trying to load an XML file with 区县级行政区划 in the path and will fix that shortly.

Can you try running the node-mapnik tests and see if you still see a case where this is broken? (hint: you need mapnik-config.bat from mapnik-packaging to build node-mapnik now).

@avlee

I will test it with the latest mapnik and node-mapnik.

Yes, the patch I provided may be have problems with the direct datasource access.

I agree with @artemp's comments: "As outlined in the article above, it's possible to use UTF8 encoded strings (std::string) throughout (works on posix) and only convert to UTF16 encoded wide-strings at the point of using windows APIs."
But the ensure_relative_to_xml will modify the path and return the string by call boost::filesystem::path::string() is not UTF8 encoded string on windows. So to keep the ensure_relative_to_xml always returns utf 8 encoded string, we need using boost::filesystem::path::wstring() and convert it to utf8 on windows like this:

return mapnik::utf16_to_ut8(full.wstring());

/cc @artemp What do you think about it?

@avlee - sounds good to me. @springmeyer - do we have time to push this into 2.2 ?

@springmeyer
Owner

@avlee - thanks. sounds correct. I just need a way to replicate a problem to know how to fix.

@avlee - also see 0365d3e where I have centralized almost all filesystem access - this should help with making the fixes quicker and easier to understand/test.

@springmeyer
Owner

Okay, I am signing off for the night. All tests are passing for me with master node-mapnik except the "Handling unicode paths, filenames, and data register font...". It appears it is failing for two reasons on windows: 1) global fonts are registered before the test font and 2) Even after working around 1) I see that FT_New_Face is not working with the test path.

@springmeyer
Owner

okay, actually worked on this a bit more and got the font loading working on windows with this patch: https://gist.github.com/springmeyer/5696103

@artemp - could you take a look and apply if you think it is decent? I'm also happy to work on getting this right in a future 2.2.1 (likely a few more ifstream places we need to catch for windows).

@artemp
Owner

@springmeyer - applied in f4d2fba

But,yes, we should revisit it in 2.2.x !

@springmeyer
Owner

@artemp - did you forget to push (I don't see that hash)

@avlee - any progress testing on your end with latest code?

@avlee

@springmeyer I tested the latest mapnik to load map files that contains unicode file paths. It passed.

I can not run the mocha test in node-mapnik now, with the error:

> mocha


  npm ERR! weird error -1073741819
npm ERR! not ok code 0
@springmeyer
Owner

@avlee - thanks, great that things work now - did you so a python test?

Regarding the mocha error - not sure what would cause that. Can you instead try npm test ? Also you might try upgrading node to v0.10.7 and running npm cache clean.

@artemp artemp referenced this issue from a commit
@artemp artemp + #1177 via @springmeyer
 (TODO: check if loading the whole font file is required for registering)
f4d2fba
@artemp
Owner

@springmeyer - pushed

@springmeyer
Owner

okay, closing. Let' re-open new tickets for further issues that need fixing.

@springmeyer springmeyer closed this
@springmeyer
Owner

reverted font fix due to extreme lag now in loading many fonts: #1896

@springmeyer
Owner

will track fixing in future revisions at #1897

@onepremise onepremise referenced this issue from a commit in onepremise/mapnik
@artemp artemp + #1177 via @springmeyer
 (TODO: check if loading the whole font file is required for registering)
5ad4895
@onepremise onepremise referenced this issue from a commit in onepremise/mapnik
@springmeyer springmeyer revert f4d2fba - refs #1177 096aca1
@springmeyer
Owner

this is now working better after #1897. It may still be that relative/absolute path making in load_map does not work, but we don't yet have a failing test for this. Tracking adding more tests at mapnik/node-mapnik#275 because its easier to write and test windows issue from node than python tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.