Encoding CP852 missing in Windows installation #36871

Closed
AlzbetaGardonova opened this issue Jun 1, 2020 · 6 comments · Fixed by #37147

@AlzbetaGardonova

**There is no way to set the CP852 encoding on a layer in the Windows installation.**
CP852 is not in the list of encodings on Windows, but it is available in the Ubuntu installation. If a shapefile has this encoding declared in its .cpg file, the layer encoding is still preset to System, so QGIS seems to ignore the .cpg file. In this version it is also no longer possible to work around this via Settings -> Data Sources -> "Ignore shapefile encoding declaration".

QGIS and OS versions
QGIS 3.10.6 on Windows 10 and Ubuntu 18.04

@AlzbetaGardonova AlzbetaGardonova added the Bug Either a bug report, or a bug fix. Let's hope for the latter! label Jun 1, 2020
@nyalldawson
Collaborator

This setting was moved to Layer Properties - Source tab.

@gioman gioman added the Feedback Waiting on the submitter for answers label Jun 1, 2020
@kadarivan

@nyalldawson The character encoding setting was moved, but CP852 is still missing on Windows 10.

In previous QGIS versions, OGR decoded CP852 correctly with the "ignore shapefile encoding declaration" option. Without this option QGIS can't decode CP852 on Win10.

@gioman gioman removed the Feedback Waiting on the submitter for answers label Jun 3, 2020
@agiudiceandrea
Contributor

Hi @AlzbetaGardonova @kadarivan, it seems the QGIS encoding list is taken from Qt's codec list, and CP852 seems to be missing from the Qt version (5.11.2) currently used by OSGeo4W, at least on Windows with an Italian locale.
Anyway, if your system is in CP852, shouldn't "System" mean CP852?

@kadarivan

CP852/IBM852 is a very old code page from the DOS era. Only some legacy programs still export data in it (for example, the Hungarian Public Road Company's databank). No one uses it as the system default encoding.
Before a recent patch, OGR converted all data to UTF-8 on the fly, so the user did not have to select the code page manually. Now that this option is gone, QGIS does not support CP852 (and lots of other code pages, I guess).
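
For illustration, the on-the-fly conversion described above amounts to decoding the raw attribute bytes with the declared code page and re-encoding them as UTF-8. The sketch below uses Python's built-in `cp852` codec rather than OGR; the function name `recode_to_utf8` and the sample value are assumptions made for this example.

```python
def recode_to_utf8(raw: bytes, source_encoding: str = "cp852") -> bytes:
    """Decode raw bytes from a legacy code page and re-encode them as UTF-8."""
    # The default error mode is "strict": undecodable input raises
    # UnicodeDecodeError instead of silently producing wrong characters.
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical attribute value as a legacy exporter might have written it.
legacy_bytes = "Győr".encode("cp852")
utf8_bytes = recode_to_utf8(legacy_bytes)
print(utf8_bytes.decode("utf-8"))
```

The point is only that the source code page must be known; picking the wrong one still "succeeds" but yields different characters.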

@agiudiceandrea
Contributor

agiudiceandrea commented Jun 7, 2020

Hi @nyalldawson, I did some testing, and it seems that it is currently impossible to correctly read a CP852-encoded shapefile with QGIS 3.10.6 on Windows, while it is possible with QGIS 3.10.1.

test_852.zip is a zipped point shapefile, test_852.shp, with a single text field, "test852", and a single record whose field value consists of the following four bytes (in hex): 0x61, 0x62, 0x63, 0x8A.

The test_852.cpg file contains the string "852".

In code page 852, the bytes 0x61, 0x62, 0x63, 0x8A decode to abcŐ (the last character is LATIN CAPITAL LETTER O WITH DOUBLE ACUTE).
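
As a rough sketch of what honoring the .cpg declaration means, the snippet below maps the declared string "852" to a codec and decodes the four test bytes. It uses Python's standard `codecs` module, not GDAL; the helper `codec_from_cpg` and its "numeric value means code page" rule are simplifying assumptions, not the logic GDAL or QGIS actually use.

```python
import codecs


def codec_from_cpg(declaration: str) -> str:
    """Map the text of a .cpg file to a Python codec name (simplified)."""
    value = declaration.strip()
    # Assumption: a purely numeric declaration such as "852" names a code page.
    candidate = f"cp{value}" if value.isdigit() else value
    # codecs.lookup raises LookupError if Python has no such codec.
    return codecs.lookup(candidate).name


field_bytes = bytes([0x61, 0x62, 0x63, 0x8A])
encoding = codec_from_cpg("852")
print(encoding, field_bytes.decode(encoding))
```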

On Windows 7 64-bit (Italian language):

  • Using QGIS 3.10.1/GDAL 3.0.2 (with "Ignore shapefile encoding declaration" off, the default), the characters are decoded correctly as IBM/CP852:
    (screenshot)
    The Layer Properties dialog shows Data source encoding: UTF-8, grayed out.
    (screenshot)

  • Using QGIS 3.10.6/GDAL 3.0.4 (which has no "Ignore shapefile encoding declaration" setting), the characters are decoded incorrectly with the Windows-1252 code page (the System code page on Italian-language Windows) as abcŠ (the last character is LATIN CAPITAL LETTER S WITH CARON):
    (screenshot)
    The Layer Properties dialog shows Data source encoding: System.
    (screenshot)
    It is possible to change the code page manually, e.g. to CP850:
    (screenshot)
    However, the correct decoding cannot be obtained, because the encoding list lacks an IBM/CP852 codec: the QGIS encoding list is taken from Qt's codec list, and CP852 seems to be missing from the Qt version (5.11.2) currently used by OSGeo4W.
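
The difference between the two results can be reproduced outside QGIS. This small sketch decodes the same four bytes with Python's standard codecs for the code pages mentioned above; the CP850 result is included only for comparison and was not reported in the test.

```python
field_bytes = bytes([0x61, 0x62, 0x63, 0x8A])

# Code pages mentioned in this thread, by their Python codec names.
decoded = {
    codec: field_bytes.decode(codec)
    for codec in ("cp852", "cp1252", "cp850")
}

for codec, text in decoded.items():
    print(f"{codec:>7}: {text}")
```

Only the last byte differs between the three results, which is why a wrong code page is easy to miss in mostly ASCII data.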

So it seems this regression may have been introduced while fixing other shapefile encoding issues, such as in PR #34381 / #34607.

@agiudiceandrea
Contributor

> so QGIS does not support CP852 (and lots of other code pages, I guess).

You are right, @kadarivan:
on Windows 7 / 10, QgsVectorDataProvider.availableEncodings() returns 135 codec names with QGIS 3.4 / 3.10 / 3.12 running against Qt 5.11.2, and 134 codec names with QGIS 2.18.23 running against Qt 4.8.5,
while on Ubuntu 20.04, QgsVectorDataProvider.availableEncodings() returns 812 codec names with QGIS 3.12 running against Qt 5.12.8.
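
A minimal sketch of how such a check could be run from the QGIS Python console. The helper `has_codec` and the alias spellings are assumptions (the exact names Qt reports may differ), and the fallback list is a placeholder so the snippet also runs outside QGIS; it is not real QGIS output.

```python
def has_codec(available, aliases=("CP852", "IBM852", "IBM 852")):
    """Return True if any alias appears in the list, ignoring case."""
    names = {name.casefold() for name in available}
    return any(alias.casefold() in names for alias in aliases)


try:
    # Only importable inside a QGIS Python environment.
    from qgis.core import QgsVectorDataProvider
    encodings = QgsVectorDataProvider.availableEncodings()
except ImportError:
    # Placeholder list so the sketch runs outside QGIS; not real QGIS output.
    encodings = ["System", "UTF-8", "CP850", "windows-1252"]

print(len(encodings), "codecs listed; CP852 present:", has_codec(encodings))
```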

@nyalldawson nyalldawson self-assigned this Jun 8, 2020
nyalldawson added a commit to nyalldawson/QGIS that referenced this issue Jun 12, 2020
the conversion over to GDAL's API

Resolves missing text codecs like CP852 on windows builds. These
were previously available, but then Qt upstream dropped the ICU
library from their windows builds, and accordingly a whole bunch
of older text codecs are no longer available by default on the
windows builds.

Fixes qgis#36871
nyalldawson added a commit to nyalldawson/QGIS that referenced this issue Oct 20, 2020
nyalldawson added a commit that referenced this issue Oct 20, 2020