Wrong codepage of shapefile #14989

Closed
qgib opened this issue Mar 29, 2012 · 23 comments
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter!

Comments

@qgib
Contributor

qgib commented Mar 29, 2012

Author Name: Stanislaw Kapustka (Stanislaw Kapustka)
Original Redmine Issue: 5255
Affected QGIS version: 1.7.4


When opening shapefiles in QGIS 1.7.4, it doesn't matter which codepage you choose: the attributes are always read as UTF-8, so Polish letters are displayed incorrectly (when the shapefile was saved in a codepage other than UTF-8, of course). Other encodings are on the list, but selecting them has no effect. In QGIS 1.7.3 it works perfectly. The same problem is present in the master version.
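
A minimal sketch of the symptom described above, using only the Python standard library. It assumes the shapefile's attributes were written in Windows-1250 (CP1250), a common choice for Polish text; the sample string is hypothetical and not taken from the reporter's data.

```python
# Hypothetical Polish sample text containing letters outside ASCII.
text = "Zażółć gęślą jaźń"

# The bytes as a DBF written in CP1250 would store them (assumed code page).
raw = text.encode("cp1250")

# Reading those bytes as UTF-8 regardless of the chosen code page corrupts the text.
as_utf8 = raw.decode("utf-8", errors="replace")

# Reading them with the code page they were written in restores the text.
as_cp1250 = raw.decode("cp1250")

print(as_utf8)
print(as_cp1250)
```

The second line printed matches the original text; the first contains replacement characters where the Polish letters should be.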



Related issue(s): #15040 (relates), #15349 (relates), #15355 (relates), #21264 (relates)
Redmine related issue(s): 5340, 5900, 5911, 13203


@qgib
Contributor Author

qgib commented Mar 29, 2012

Author Name: Alexander Bruy (@alexbruy)


This is because 1.7.4 and master are now compiled against GDAL 1.9.0.

@qgib
Contributor Author

qgib commented May 14, 2012

Author Name: zirneklitis - (zirneklitis -)


When the *.dbf file is re-saved with OpenOffice Calc, QGIS shows the correct characters with any given code page, but only until edits are saved within QGIS. After that, question marks are saved in place of any non-Latin characters. It's impossible to switch the code page for any shape files created by QGIS.

@qgib
Contributor Author

qgib commented May 14, 2012

Author Name: Giovanni Manghi (@gioman)


zirneklitis - wrote:

It's impossible to switch the code page for any shape files created by QGIS.

It is not QGIS's fault, it is GDAL's. See:

http://ssrebelious.wordpress.com/2012/03/11/qgis-and-gdal1-9-encoding-issue-a-workaround/

This is why 1.7.3 works: it is compiled against an older release of GDAL.

@qgib
Contributor Author

qgib commented May 14, 2012

Author Name: Alexander Bruy (@alexbruy)


The bug in GDAL is already fixed, see http://trac.osgeo.org/gdal/ticket/4650

@qgib
Contributor Author

qgib commented May 14, 2012

Author Name: Giovanni Manghi (@gioman)


  • resolution was changed from to upstream
  • status_id was changed from Open to Closed

@qgib
Contributor Author

qgib commented May 15, 2012

Author Name: zirneklitis - (zirneklitis -)


Recompiled GDAL and QGIS:

QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.

Nothing has changed. The problem still remains.

OS: Fedora 14 x64.

@qgib
Contributor Author

qgib commented May 15, 2012

Author Name: Alexander Bruy (@alexbruy)


You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.


  • status_id was changed from Closed to Reopened

@qgib
Contributor Author

qgib commented May 30, 2012

Author Name: Giovanni Manghi (@gioman)


zirneklitis - wrote:

Recompiled GDAL and QGIS:

QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.

Nothing has changed. The problem still remains.

OS: Fedora 14 x64.

Still a GDAL issue, not a QGIS one.


  • status_id was changed from Reopened to Closed

@qgib
Contributor Author

qgib commented Jun 9, 2012

Author Name: zirneklitis - (zirneklitis -)


I insist that this is a QGIS issue.

GDAL 1.9.0 (and newer) tries to interpret the encoding setting from the shapefile itself. When creating a new shapefile, “ENCODING” should be passed as an option, which, evidently, is not done.

Calling qgis from a terminal makes it possible to track down the warning messages. Saving non-Latin characters in a shapefile generates the following warning message: “Warning 1: One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1.
This warning will not be emitted anymore”.
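
The question marks reported earlier in this thread are what a lossy conversion to ISO-8859-1 typically produces. The sketch below only mimics that symptom with Python's own codecs; it is not GDAL's recoding code, and the function name and sample value are hypothetical.

```python
def to_latin1_lossy(value: str) -> bytes:
    """Encode text as ISO-8859-1, substituting '?' for unrepresentable characters."""
    return value.encode("iso-8859-1", errors="replace")


# 'ī' (U+012B) has no ISO-8859-1 code point, so it is replaced by '?'.
print(to_latin1_lossy("Rīga"))  # b'R?ga'
```

Once the bytes are written this way, the original character cannot be recovered by choosing a different code page when reading.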

On the other hand, most of the shapefiles in use lack the character encoding byte, so QGIS has to rely on the environment variable “SHAPE_ENCODING”. At present the only solution is to use the same character encoding for the whole QGIS session, e.g.:

SHAPE_ENCODING=UTF-8
export SHAPE_ENCODING
qgis

The example above makes it possible to create and edit shapefiles with UTF-8 as the character encoding (example for Linux users; Windows users must use “SET SHAPE_ENCODING=UTF-8”).
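
The same per-session setting can be sketched in a platform-independent way with the Python standard library. This assumes an executable named `qgis` is on the PATH; both function names are hypothetical helpers, not part of QGIS or GDAL.

```python
import os
import shutil
import subprocess


def env_with_shape_encoding(encoding="UTF-8"):
    """Return a copy of the current environment with SHAPE_ENCODING set."""
    env = dict(os.environ)
    env["SHAPE_ENCODING"] = encoding
    return env


def launch_qgis(encoding="UTF-8"):
    """Start QGIS with SHAPE_ENCODING set; return None if qgis is not on PATH."""
    exe = shutil.which("qgis")  # assumed executable name
    if exe is None:
        return None
    return subprocess.Popen([exe], env=env_with_shape_encoding(encoding))
```

Calling `launch_qgis("UTF-8")` affects only the child process, so the parent shell's environment is left untouched.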


Excerpt from

http://trac.osgeo.org/gdal/wiki/ConfigOptions

In C/C++ configuration switches can be set programmatically like this:

#include "cpl_conv.h"
...
CPLSetConfigOption( "GDAL_CACHEMAX", "64" );

Normally a configuration option applies to all threads active in a program, but they can be limited to only the current thread this way:

CPLSetThreadLocalConfigOption( "GDAL_CACHEMAX", "64" );
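
For scripts that use the GDAL Python bindings, the counterpart of the C call above is `gdal.SetConfigOption`. The sketch below is a hedged illustration: the helper name is hypothetical, and the fallback branch relies on GDAL also reading configuration options from environment variables.

```python
import os


def set_shape_encoding(encoding: str) -> str:
    """Set SHAPE_ENCODING for this process and report which mechanism was used."""
    try:
        from osgeo import gdal  # GDAL Python bindings, if installed
    except ImportError:
        # Without the bindings, fall back to the environment variable,
        # which GDAL consults for configuration options.
        os.environ["SHAPE_ENCODING"] = encoding
        return "environment"
    gdal.SetConfigOption("SHAPE_ENCODING", encoding)
    return "gdal"
```

The option should be set before any shapefile is opened or created.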

@qgib
Contributor Author

qgib commented Jun 9, 2012

Author Name: zirneklitis - (zirneklitis -)


The Linux example above should be as follows:

$ SHAPE_ENCODING=UTF-8
$ export SHAPE_ENCODING
$ qgis

@qgib
Contributor Author

qgib commented Jun 9, 2012

Author Name: Alexander Bruy (@alexbruy)


zirneklitis - wrote:

I insist that this is a QGIS issue.

This is a GDAL issue. GDAL always reports that the attributes it returns are UTF-8, even when the attributes have a different encoding. The SHAPE_ENCODING environment variable doesn't work in most cases. This bug was partially fixed (see http://trac.osgeo.org/gdal/ticket/4650), but some more fixes are needed.

@qgib
Contributor Author

qgib commented Jun 10, 2012

Author Name: Jürgen Fischer (@jef-n)


Alexander Bruy wrote:

You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.

how?

@qgib
Contributor Author

qgib commented Jun 10, 2012

Author Name: Alexander Bruy (@alexbruy)


Jürgen Fischer wrote:

how?

This is only a workaround, not a real fix. We simply reverted some parts of 2d0edcd (related to OLCStringsAsUTF8). With GDAL 2.0, in most cases everything works fine without this workaround, and we are working on a final fix for GDAL.

@qgib
Contributor Author

qgib commented Jun 10, 2012

Author Name: Even Rouault (@rouault)


Note that I've just pushed additional fixes in GDAL (see http://trac.osgeo.org/gdal/ticket/4650) that should make OLCStringsAsUTF8 more reliable.

@qgib
Contributor Author

qgib commented Jun 10, 2012

Author Name: Tim Sutton (Tim Sutton)


Hi

Could you please provide a free, minimal test dataset so that we can add a test to our test suite, along with an idea of how we can evaluate the test as passing?

@qgib
Contributor Author

qgib commented Jun 10, 2012

Author Name: Even Rouault (@rouault)


I'm attaching a small shapefile generated by the following OGR Python script (it needs the latest GDAL trunk to support recoding of the field name from UTF-8 to CP936; reading should be OK with GDAL 1.9):

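from osgeo import ogr

# Create a shapefile whose DBF header declares LDID/77 (CP936).
ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource('chinese.dbf')
lyr = ds.CreateLayer('chinese', options=['ENCODING=LDID/77'])
# Six UTF-8 bytes forming a two-character Chinese string, decoded to str for Python 3.
chinese_str = bytes([229, 144, 141, 231, 167, 176]).decode('utf-8')
# Use the same string as both the field name and the field value.
lyr.CreateField(ogr.FieldDefn(chinese_str, ogr.OFTString))
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetField(0, chinese_str)
lyr.CreateFeature(feat)
# Dereference the data source so the files are flushed and closed.
ds = None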



  • 4570 was configured as chinese.zip

@qgib
Contributor Author

qgib commented Jun 14, 2012

Author Name: zirneklitis - (zirneklitis -)


Who should create the .cpg files: GDAL or QGIS? A shapefile with a .cpg file present works as expected (partly: QGIS has no idea of the existence of this file). The attribute values are not garbled any more. More about *.cpg files:

http://support.esri.com/en/knowledgebase/techarticles/detail/21106
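
A .cpg file is a small plain-text sidecar that names the encoding of the attribute table. As a rough sketch, one can be written by hand with the Python standard library; the helper name and the file name in the usage note are hypothetical, and whether a given GDAL or QGIS version honours the file is not guaranteed by this snippet.

```python
from pathlib import Path


def write_cpg(shapefile, encoding="UTF-8"):
    """Write a .cpg sidecar next to the shapefile naming its attribute encoding."""
    cpg_path = Path(shapefile).with_suffix(".cpg")
    # The file holds just the encoding name as plain text.
    cpg_path.write_text(encoding, encoding="ascii")
    return cpg_path
```

For example, `write_cpg("roads.shp", "UTF-8")` would create `roads.cpg` alongside `roads.shp` and `roads.dbf`.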

@qgib
Contributor Author

qgib commented Jun 22, 2012

Author Name: Minoru Akagi (@minorua)


I installed GDAL 1.9.1 by using OSGeo4W.

When I use ogr2ogr to convert a shapefile whose dbf file has the value "19" (meaning "CP932") in the LDID field to KML format, the following message is shown.

Warning1: Recode from CP932 to UTF-8 not supported, treated as ISO8859-1 to UTF-8

The Japanese characters in the generated KML file are incorrect. This will also result in character corruption in QGIS.

I think that recoding in GDAL with the iconv library is not enabled at the moment.
For testing, I built GDAL 1.9.1 with the HAVE_ICONV constant defined and linked against the iconv library.
With my build of ogr2ogr, the warning does not appear and a KML file with readable Japanese characters is generated.

As a Japanese user of this great software, I would like QGIS to use GDAL linked with the iconv library.
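
To check which LDID value a given dbf file carries, the header can be inspected directly: the language driver ID is stored in byte 29 of the DBF header. The sketch below uses only the standard library; the helper names are hypothetical, and the lookup table is deliberately partial, listing only the two values mentioned in this thread.

```python
# Partial LDID table: only the values mentioned in this thread.
LDID_TO_CODEPAGE = {19: "CP932", 77: "CP936"}


def read_ldid(dbf_path):
    """Return the language driver ID stored at byte 29 of a DBF header."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file too short to be a DBF")
    return header[29]


def codepage_for(dbf_path):
    """Map the file's LDID to a code page name, or None if it is not in the table."""
    return LDID_TO_CODEPAGE.get(read_ldid(dbf_path))
```

A result of `None` from `codepage_for` only means the value is missing from this small table, not that the file lacks an encoding.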

@qgib
Contributor Author

qgib commented Jun 25, 2012

Author Name: Minoru Akagi (@minorua)


I've also reported this recoding issue to OSGeo4W Trac.
http://trac.osgeo.org/osgeo4w/ticket/294

@qgib
Contributor Author

qgib commented Jul 1, 2012

Author Name: Minoru Akagi (@minorua)


Sorry, I noticed that the problem I had has already been solved in the latest GDAL trunk. There is no problem converting CP932 to UTF-8.

@qgib
Contributor Author

qgib commented Jul 13, 2018

Author Name: Jürgen Fischer (@jef-n)


@relume

relume commented Feb 28, 2023

MacOS : 12.6.3
QGIS : 3.23.3

Hello

It seems that in QGIS 3.23.3, and also in LTR 3.22, the encoding pull-down in the vector layer import dialog for shapefiles has no influence on how the dbf file is imported into QGIS. The attribute values are displayed correctly only after the import, by manually selecting the correct codepage/encoding in the layer properties (UTF-8 seems to be the default). That is very confusing. Codepage/encoding selection in the import dialog worked in other versions, but I cannot say which.

best

@nyalldawson
Collaborator

@relume see #52056
