Can't read Number columns from Natural Earth shapefiles? #1605

Closed
gravitystorm opened this Issue Nov 28, 2012 · 17 comments


The easiest way to reproduce this is:

  • Grab the 10m populated places shapefile from here
  • Create a new project in TileMill, add the shapefile as a new layer

When you open the attributes table, various columns are "undefined" including SCALERANK. If you open the shapefile in QGIS, the numerical columns are fine. If you open the .dbf file in LibreOffice, again everything is fine.

It's not a TileMill problem, because mapnik can't render anything from these columns, nor use them as filters.

It looks to me like mapnik can't read column type 'N' properly?

Owner

springmeyer commented Nov 28, 2012

@gravitystorm - thanks for the report. I can replicate.

Basically, the problem is that when we parse numbers from shapefiles we treat a parse as successful only if two things hold: 1) we got a valid value from the parse, and 2) we consumed the whole length of the char array. The latter is important so that we don't parse a number like 120 as 12. So, for 120 we'd only consider it a successful parse if we parsed all 3 characters.
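The two conditions above can be sketched as follows. This is a minimal illustration, not Mapnik's actual parser: `parse_int_strict` is a hypothetical name, and `std::from_chars` (C++17) stands in for whatever parsing routine the shape plugin really uses.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (assumption, not Mapnik code): parse an integer field
// and report success only if the whole buffer was consumed.
std::optional<long long> parse_int_strict(std::string_view field)
{
    long long value = 0;
    const char* begin = field.data();
    const char* end = begin + field.size();
    auto [ptr, ec] = std::from_chars(begin, end, value);
    if (ec != std::errc{}) return std::nullopt;  // condition 1: no valid value
    if (ptr != end) return std::nullopt;         // condition 2: input left over
    return value;
}
```

With this rule, any field whose declared width is larger than the digits it holds is rejected, because padding or other leftover bytes keep the parser from reaching the end of the buffer.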

Digging into one particular failing field, POP_MIN: in this case Natural Earth gives an integer like 21714 that is clearly 5 characters long, but Mapnik is reading a binary record that reports a length of 9. Thus, in this case itr != end right here: https://github.com/mapnik/mapnik/blob/master/plugins/input/shape/dbfile.cpp#L171

So, either this is a bug in natural earth data, or a bug in mapnik's reading of the length of a field, or perhaps some other binary data in the shapefile that is causing memory corruption and thus later reads of data to be messed up.
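One way to check which length the file itself declares is to read the field descriptors straight from the .dbf. The sketch below assumes the standard dBASE III layout (a 32-byte header followed by 32-byte descriptors, ended by a 0x0D byte, with the name in bytes 0-10, the type in byte 11, the length in byte 16 and the decimal count in byte 17). `FieldInfo` and `read_field_descriptors` are hypothetical names, not part of Mapnik or shapelib.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct FieldInfo
{
    std::string name;       // NUL-padded in the file, at most 11 bytes
    char type;              // 'N' numeric, 'C' character, ...
    std::uint8_t length;    // declared field width in bytes
    std::uint8_t decimals;  // declared number of decimal places
};

// Hypothetical helper: walk the descriptor array of a .dbf image held in memory.
std::vector<FieldInfo> read_field_descriptors(const std::vector<unsigned char>& dbf)
{
    std::vector<FieldInfo> fields;
    const std::size_t header_size = 32;
    const std::size_t descriptor_size = 32;
    // Stop at the 0x0D terminator or when no complete descriptor remains.
    for (std::size_t pos = header_size;
         pos + descriptor_size <= dbf.size() && dbf[pos] != 0x0D;
         pos += descriptor_size)
    {
        FieldInfo f;
        for (std::size_t i = 0; i < 11 && dbf[pos + i] != 0; ++i)
            f.name.push_back(static_cast<char>(dbf[pos + i]));
        f.type = static_cast<char>(dbf[pos + 11]);
        f.length = dbf[pos + 16];
        f.decimals = dbf[pos + 17];
        fields.push_back(f);
    }
    return fields;
}
```

Comparing the declared `length` with the raw bytes stored for a record shows whether the extra characters are padding inside the field or a sign that the descriptor is being misread.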

I notice that copying the shapefile with ogr2ogr immediately fixes the issue - a pretty strong indication that the source natural earth data is bogus (or maybe just has some ARCGIS-ism that Mapnik has never encountered and that OGR is capable of cleaning up):

So, doing:

ogr2ogr ne_10m_populated_places_fixed.shp ne_10m_populated_places.shp

Then try reading the copied file with Mapnik. It works for me after that.

Owner

springmeyer commented Nov 28, 2012

@Andrey-VI - you are smart to spot the relationship between this ticket and #1314, but in this case the shapefile dbf parser does not use the same code.

Owner

springmeyer commented Nov 28, 2012

/cc @nvkelso just so he is aware that other programs may hit this.

Contributor

nvkelso commented Nov 28, 2012

I'm assuming this is 2.0 Natural Earth files, and specifically the "adm_0"
and "adm_1" variants? Those are auto-generated in ArcMap model builder now
doing some Excel magic that might be having this side effect.


Member

rcoup commented Nov 29, 2012

Interestingly, I discovered yesterday that GDAL/OGR:

  • trims leading and trailing whitespace from Shapefile fields (except double/ints) before returning them
  • runs doubles/ints through atof(), which trims leading whitespace and ignores any non-number suffixes.

See DBFReadAttribute() if you're interested
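The two behaviours in the list above can be imitated with the standard library. This is only a sketch: `trim_spaces` and `numeric_value` are hypothetical names, and plain `std::atof` is used here as a stand-in for the Atof hook that shapelib calls.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper for text fields: strip leading and trailing spaces.
std::string trim_spaces(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return std::string();  // all spaces or empty
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Hypothetical helper for numeric fields: std::atof skips leading whitespace
// and stops at the first character that cannot be part of a number.
double numeric_value(const std::string& field)
{
    return std::atof(field.c_str());
}
```

Note that `std::atof` returns 0.0 when no number can be read at all, so this lenient approach cannot tell an unparsable field from a genuine zero.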

I wonder if leading/trailing whitespace is a common side-effect of DBF writers? Haven't been able to ask Frank W. for any insight on it yet.

Contributor

nvkelso commented Nov 29, 2012

Can you paste the Mapnik error here, please?

Owner

springmeyer commented Nov 29, 2012

@nvkelso - there is no error. Mapnik is simply refusing to parse some of the dbf fields, so the symptom is that features will be returned with blank values for some number fields (they are never set).

This is an example: https://gist.github.com/4165970

Contributor

nvkelso commented Nov 29, 2012

Hmm, that file should have the same setup in 2.0 as the 1.4 series.

Can you try with the 1.4 file here:
https://github.com/nvkelso/natural-earth-vector/raw/master/archive/ne_10m_populated_places_1.4.0.zip

To find out if this is an old bug or a new bug?

_Nathaniel


Owner

springmeyer commented Nov 29, 2012

looks like Mapnik has problems with the 1.4 file as well. But many fewer fields are blank, so I'd guess nobody has noticed this before: https://gist.github.com/4165970#file_result_1.4.0.txt

Contributor

nvkelso commented Nov 29, 2012

I'm scratching my head. Will take a closer look next couple days.

Owner

springmeyer commented Nov 29, 2012

@nvkelso - thanks very much for checking on this. I would not rule out a mapnik bug here. I'll also try to learn more when I have time.

artemp added a commit that referenced this issue Nov 29, 2012

+ don't expect we _must_ consume all input when parsing numbers
  some DBF can have some junk appended to records #1605
913e1d0

Owner

artemp commented Nov 29, 2012

Fixed in 913e1d0 - relax parsing constraints (don't assume we have to parse all input for numeric fields).
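The idea of the relaxed constraint can be sketched like this. It is not the code from the commit: `parse_number_relaxed` is a hypothetical name and `std::strtod` is a stand-in for Mapnik's own number parser.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (assumption, not the committed code): accept a numeric
// field as soon as a valid leading number is found, ignoring trailing bytes.
std::optional<double> parse_number_relaxed(const std::string& field)
{
    const char* begin = field.c_str();
    char* end = nullptr;
    const double value = std::strtod(begin, &end);  // skips leading whitespace
    if (end == begin) return std::nullopt;          // nothing numeric at the start
    return value;                                   // leftover input is tolerated
}
```

The trade-off is the one described earlier in the thread: tolerating leftover input makes the reader robust against junk after the number, at the cost of no longer detecting a field that was only partly numeric.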

@nvkelso - wondering why population fields are real numbers in NE ?

POP1950(19,11):1682.00000000000 

artemp closed this Nov 29, 2012

Owner

artemp commented Nov 29, 2012

@rcoup - just looking at DBFReadAttribute() :

....
if( chReqType == 'N' )
{
    psDBF->dfDoubleField = psDBF->sHooks.Atof(psDBF->pszWorkField);

    pReturnField = &(psDBF->dfDoubleField);
}
...

^^ looks like GDAL treats all numeric fields as doubles, which is not quite right.
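A reader that wants to keep integer fields as integers could branch on the decimal count declared for the field (the second number in a notation like `(19,11)` above). The following is a sketch under that assumption: `NumericValue` and `parse_numeric_field` are hypothetical names, not GDAL or Mapnik API.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <variant>

using NumericValue = std::variant<long long, double>;

// Hypothetical helper: choose the value type from the declared decimal count.
std::optional<NumericValue> parse_numeric_field(const std::string& field, int decimals)
{
    const char* begin = field.c_str();
    char* end = nullptr;
    if (decimals == 0)
    {
        // No decimal places declared: keep the value as an integer.
        const long long i = std::strtoll(begin, &end, 10);
        if (end == begin) return std::nullopt;
        return NumericValue(i);
    }
    // Decimal places declared: read the value as a double.
    const double d = std::strtod(begin, &end);
    if (end == begin) return std::nullopt;
    return NumericValue(d);
}
```

Keeping integers separate matters for wide fields, since a double represents integers exactly only up to 2^53.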

nvkelso referenced this issue in nvkelso/natural-earth-vector Dec 1, 2012

Open

funky number field formats in Natural Earth (populated places) #5

dkerkow commented Apr 30, 2013

I have a similar issue when loading a simple world-covering shapefile, created in QGIS, into TileMill. The difference is that the attribute table in TileMill shows valid entries, but always the content of the first object.

Thankfully, the ogr2ogr workaround mentioned above fixed it.

Owner

springmeyer commented Aug 7, 2013

@dkerkow - just seeing your comment now. Do you still have that shapefile lying around to share?

dkerkow commented Aug 8, 2013

I'm not sure anymore, but I think a simple ogr2ogr copy fixed it. Some
problem with the shapefile formatting, I think.

PetrDlouhy added a commit to PetrDlouhy/mapnik that referenced this issue Aug 22, 2013

artemp + PetrDlouhy: + don't expect we _must_ consume all input when parsing numbers
  some DBF can have some junk appended to records #1605
cec1835

Quintus referenced this issue in gravitystorm/openstreetmap-carto Apr 17, 2014

Closed

Add information to INSTALL.md on ogr2ogr #482

imagico pushed a commit to imagico/openstreetmap-carto-german that referenced this issue Sep 9, 2014

gravitystorm: Add notes on where to get the shapefiles, and document the workaround… e4acdde

almccon referenced this issue in tilemill-project/tilemill Aug 25, 2015

Open

Trying to get backup of OSM tiles working #2525
