When opening Shapefile the .cpg file is ignored in Windows 8.1 #21264
Comments
Author Name: Adrian Klink (@aklink) I have found the reason for my "bug" which seems to be a "feature". It is due to default Setting Ignore Shapefile Encoding in Quantum GIS Options that seems to have been introduced here:
I fixed my problem by disabling Ignore Shapefile Encoding in Options (see screenshot attached). However, I find it irritating or confusing that this Option is enabled by default (I would have expected this Option to be disabled, in fact I didn't even know about this option before reading old bug reports and forum entries). Why isn't it at least mentioned in QGIS documentation? e.g. here: http://docs.qgis.org/2.8/de/docs/user_manual/working_with_vector/supported_data.html#esri-shapefiles
|
Author Name: Adrian Klink (@aklink) Wrong screenshot... replaced.
|
Author Name: Adrian Klink (@aklink) I understand that Ignore Shapefile Encoding has been set to true for newly created vector layers. But why has it been generally set to true since QGIS 2.0? This default-true option leads to corrupted encoding when doing the following:
Surprise: special chars in file are now corrupted! The same happens when saving the file to KML (always using UTF-8!). In short: IMHO the default option is not compatible with drag-and-drop usage. Please explain and/or add comments to the documentation, especially in the shapefile part mentioned above. I know there is a really short description in the options part, but this doesn't really help if someone does not expect such behaviour. It still seems to be a bug to me (according to what a user expects), although I am pretty sure this option was not set to true just by accident. But I do not understand why it was. In QGIS 1.9 this option was originally set to false, which makes more sense to me. |
Author Name: Adrian Klink (@aklink) Well, after more investigating into the issue, I found the following discussion going back to QGIS 1.8, why this was changed #15720
My question: Is this still true for current QGIS versions? As I mentioned above, there are downsides...
Again: IMHO it's no longer a win-win situation since it interferes with drag-and-drop usage of QGIS 2.x. Could anyone who was in this discussion 3 years ago please check whether this default option still makes sense? I could not see any problems when disabling this option (Ignore Shapefile Encoding), but if the option is active by default it interferes with drag-and-drop usage. |
Author Name: Minoru Akagi (@minorua) Hi, -In Japan, we usually get shapefiles that encoding is CP932 (the code page in Japanese windows). Bad thing is that they sometimes have LDID/87 in the LDID field of dbf file. OGR shapefile driver handles LDID/87 as ISO-8859-1 so character corruption occurs.- When the "Ignore Shapefile Encoding" option is not checked and a shapefile to be loaded has non-zero LDID or a cpg file, the encoding selection on the open vector layer dialog is not applied to the layer. In this case, the user selection is ignored. Experienced users might be able to avoid the corruption by creating a cpg file or checking the option, but I am afraid of beginners' (or general users') confusion. So I think the option should be checked by default. IMHO, a more flexible way to deal with this issue:
|
Author Name: Adrian Klink (@aklink) I am not sure if I get it right, but if I understand it correctly the automatic shapefile encoding detection is blocking manual selection of encoding in dialog (if not ignored). Therefore, if detection fails like e.g. for Japanese CP932 Encoding (or other non ASCII compatible encodings), user has to select encoding manually, but can not do so if "Ignore Shapefile Encoding" is not true. IMHO, same flexible way as mentioned by Minoru Akagi, but from different view:
|
Author Name: Minoru Akagi (@minorua) Adrian Klink wrote:
What the layer encoding is set to and whether characters in the attribute table are right or not when a CP932 shapefile is opened by drag & drop:
-*** Characters are wrong, and user cannot correct the encoding setting on the GUI.- LDID/19 means CP932. Edited: 2015-09-05 |
Author Name: Minoru Akagi (@minorua) @adrian Klink, I'm sorry. I had a misunderstanding. I do not get CP932 shapefiles with LDID/87. I get shapefiles with LDID/0 or LDID/19. I've edited my above comments. And there is a nice plugin to fix the encoding declaration on GUI: "Shapefile Encoding Fixer":https://plugins.qgis.org/plugins/shapefile_encoding_fixer/ |
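For readers who want to check which LDID a .dbf actually declares, here is a minimal sketch in Python. It assumes the usual dBASE header layout, in which the language driver ID is the single byte at offset 29 of the 32-byte file header. The function name `read_ldid` and the small lookup table are illustrative only; the table lists just the two values discussed in this thread and is not a complete mapping.

```python
# Partial, illustrative mapping of dBASE language driver IDs to Python codecs.
# Only the values mentioned in this thread are listed; 0 means "not declared".
LDID_TO_CODEC = {
    19: "cp932",       # LDID/19: Japanese (CP932), as stated above
    87: "iso-8859-1",  # LDID/87: treated as ISO-8859-1 by OGR, per the comment above
}


def read_ldid(dbf_header: bytes) -> int:
    """Return the language driver ID stored at offset 29 of a .dbf header."""
    if len(dbf_header) < 32:
        raise ValueError("a .dbf file header is at least 32 bytes long")
    return dbf_header[29]


def declared_codec(dbf_header: bytes):
    """Return a Python codec name for the declared LDID, or None if unknown."""
    return LDID_TO_CODEC.get(read_ldid(dbf_header))


# Build a fake 32-byte header that declares LDID/19 (no real file needed).
fake_header = bytearray(32)
fake_header[29] = 19
print(read_ldid(bytes(fake_header)), declared_codec(bytes(fake_header)))
```

With a real file, the header can be obtained with `open(path, "rb").read(32)`.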
Author Name: Adrian Klink (@aklink) @Minoru Akagi: Thank you very much for investigating how Japanese CP932 shapefiles are opened by drag & drop with and without the Ignore Shapefile Encoding option. And thank you for the link to the Shapefile Encoding Fixer plugin. It is very useful. So I think we agree that the option "Ignore Shapefile Encoding" should be handled differently for drag & drop (disabled) and opening via the dialog (enabled), unlike the current implementation. The remaining question is how best to do this without any side effects. |
Author Name: Peter Drexel (Peter Drexel) I just ran into the same issue... Opening Shapefiles using Drag and Drop should definitely use .cpg-File-Settings. So as mentioned above So Thanks Peter |
Author Name: Giovanni Manghi (@gioman)
|
Author Name: Johannes Kroeger (Johannes Kroeger) This is really unexpected behaviour. Ignoring the .cpg file seems weird to me. Users can override the encoding in the open dialog, so if they must, the option exists. The option is not clear: what exactly does it mean if one disables "on-the-fly conversion to UTF-8"? Our files might be declared as UTF-8 in the .cpg, so what happens if we untick the option? Please at least notify the user in a message that the declaration is being ignored. If magic must happen, maybe use something to try to detect the encoding? .cpg files are there for a reason, and it is not a good idea to assume that the system encoding is a sane, modern one (Hi Windows!). This just gave me flashbacks to ArcGIS... |
Author Name: Quan Tum (Quan Tum) +1, same problem here on Windows... |
Author Name: Johannes Kroeger (Johannes Kroeger) As this continues to be a common source of frustration and student aversion against QGIS, I spent some time documenting what is happening and how unpredictable things are in this current state. Pleeeeeaase:
I installed QGIS 3.2.0 on a fresh Windows 10 (free by Microsoft at https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/).
PS C:\Users\IEUser> [System.Text.Encoding]::Default
QGIS' options menu shows "Ignore shapefile encoding declaration" checked as active. I created a new Shapefile; "System encoding" was chosen in the dialog and I did not change it. I added a text field "text". I then added one single feature, setting its "text" to "äöü". Drawing that as a label was no problem. I then used right-click → Export → Save Features to save the data to new shapefiles, manually setting the encoding to:
This led to all files but the initial one ("System") having a .cpg file: | File | Content of CPG | The files are attached to this comment. The files were automatically loaded after creation and their labels are fine. I saved the project. Text was fine after loading.
I made a new project and used drag and drop from the Windows Explorer to add the files. For the UTF-8 file the text field contents are now shown as garbage: "Ã¤Ã¶Ã¼". The others are fine.

I made a new project and used the Data Source Manager to add the files. Encoding "ISO-8859-1" was pre-selected in the dialog (not "System", interestingly). Of course loading the files with that forced encoding leads to the UTF-8 one being garbled again. Out of interest I tried setting the dialog to "UTF-8". As expected this made the UTF-8 file load fine and the others become garbage (���).

I made a new project and used double-clicking in the Browser to load the files. They were magically loaded in UTF-8 mode, rendering all but the UTF-8 file garbage. I looked around in the Browser to find its settings and found none. I guess this forced override comes from my change in the Data Source Manager?

I enabled the Browser's Information Panel. Selecting the ISO-8859-1 file yielded the line "Encoding UTF-8" in it. Hell no, that file is NOT UTF-8! Still, probably from the forced override in the Data Source Manager. I opened the Data Source Manager again, changed the Encoding to "System" and ... how do I save this ... clicked Close. No change in the Information Panel. I tried again, this time loading any random file. Now the Information Panel shows "System" for all my differently encoded files.
Next I unticked the "Ignore shapefile encoding declaration" option in QGIS' options menu. Checking the Information Panel again, it now shows "UTF-8" for all my files except the one with "System" encoding (the first one I created and used as base for the others). OK, I guess that is what was to be expected as the "Ignore shapefile encoding declaration" mentions some automatic conversion (not override!) from the original encoding to UTF-8 that OGR does. No idea why the one file is still shown as "System" though...

Loading the files via the Browser works fine for all of them. Drag and drop from a Windows Explorer works fine for all of them. Using the Data Source Manager (where the Encoding was set to "System" again) works fine for all of them. Wat.

Using the Data Source Manager where the Encoding was set to "UTF-8" works fine for all of them except for the "System" file which gives ���. Using the Data Source Manager where the Encoding was set to "latin1" works fine for all of them. Wat².
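The two kinds of garbage reported in this comment can be reproduced outside QGIS. The following is a small Python sketch, not QGIS code: it encodes the test string "äöü" once as UTF-8 and once as ISO-8859-1, then decodes each byte string with the other encoding, which is what a forced, wrong encoding choice amounts to.

```python
text = "äöü"

utf8_bytes = text.encode("utf-8")         # what a UTF-8 shapefile stores
latin1_bytes = text.encode("iso-8859-1")  # what an ISO-8859-1 shapefile stores

# UTF-8 data read with a forced ISO-8859-1 setting: every character turns
# into two wrong characters.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 data read with a forced UTF-8 setting: the bytes are not valid
# UTF-8, so each one becomes the U+FFFD replacement character.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_read_as_latin1)  # Ã¤Ã¶Ã¼
print(latin1_read_as_utf8)  # ���
```

Neither direction loses the original bytes on disk; only the interpretation is wrong, which is why picking the right encoding afterwards restores the text.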
|
Author Name: Adrian Klink (@aklink) I agree to it. The default "IgnoreShapefileEncoding" causes way too much confusion! It should be off by default and can be enabled if necessary. @Minoru Akagi: Can you please make a Pull request to revert this commit ddb5117 for further qgis versions? Commit ddb5117: src/providers/ogr/qgsogrprovider.cpp
src/app/qgsoptions.cpp
|
Author Name: Jürgen Fischer (@jef-n)
|
Author Name: Jürgen Fischer (@jef-n)
|
Author Name: Jürgen Fischer (@jef-n)
|
Author Name: Jérôme Seigneuret (Jérôme Seigneuret) Hi, |
Author Name: Jérôme Seigneuret (Jérôme Seigneuret) This is related to ignoreShapeEncoding, because its default value is true. So there is no autodetection on drag & drop. But if you set ignoreShapeEncoding=false in QGIS.ini, the encoding is UTF-8 and, in the layer properties, the encoding is not editable (the combobox is greyed out). I edited this directly in the QGIS.ini text file because there is an error in the options dialog: #27566 |
Author Name: Giovanni Manghi (@gioman)
that error is caused by a 3rd party plugin, not qgis itself. If you remove/disable the plugin it works as expected?
|
Author Name: Jérôme Seigneuret (Jérôme Seigneuret) Giovanni Manghi wrote:
I have deactivated all 3rd-party plugins and tested activating and deactivating the ignore encoding checkbox. There is no crash, but there is no change either... The encoding is greyed out all the time. My test:
That is not ignore shapefile encoding but use shapefile encoding |
Author Name: Giovanni Manghi (@gioman)
what qgis version? |
Author Name: Giovanni Manghi (@gioman)
|
Author Name: Jérôme Seigneuret (Jérôme Seigneuret) QGIS version 3.2.2-Bonn |
Author Name: Giovanni Manghi (@gioman)
|
Author Name: Jürgen Fischer (@jef-n) Bulk closing 82 tickets in feedback state for more than 90 days affecting an old version. Feel free to reopen if it still applies to a current version and you have more information that clarify the issue.
|
Author Name: Johannes Kroeger (Johannes Kroeger) Related #29131 |
Author Name: Jürgen Fischer (@jef-n)
|
This is still happening and still a major issue in acceptance of QGIS as fully qualified GIS software ("it can't even handle Umlauts?!") Please re-open. |
@kannes can you elaborate? Attach a sample I can use to check? |
Sure, try #21264 (comment) |
…o unchecked If this setting is enabled by default, then we are bypassing OGR's internal logic for determining the correct shapefile encoding declaration and ignoring information embedded in shapefiles themselves indicating the correct encoding. This results in greater likelihood of encoding issues when opening shapefiles. We should instead default to trusting the original data creator and OGR to get this right, and offer the "ignore" option as a non-default backup setting ONLY. Fixes qgis#21264, user frustration on mailing lists e.g. http://osgeo-org.1560.x6.nabble.com/Shapefile-with-file-cpg-codepage-td5275106.html http://osgeo-org.1560.x6.nabble.com/QGIS-ignore-the-cpg-files-when-loading-shapefiles-td5348021.html
And instead always do the decoding on QGIS' side.

This unifies the encoding handling whether or not we are using the underlying shapefile declared encoding (e.g. via LDID or .cpg file) OR are overriding it manually by a user-set encoding.

Why?
- if we DON'T disable GDAL side encoding support, then there's NO way to change the encoding used when reading shapefiles. And unfortunately the embedded encoding (which is read by GDAL) is sometimes wrong (because shapefiles!), so we need to expose support for users to be able to change and correct this
- we can't change this setting on-the-fly. If we don't set it upfront, we can't reverse this decision later when a user does want/need to manually specify the encoding

This also removes a lot of confusing code logic in the provider!

Fixes qgis#21264, user frustration on mailing lists e.g.
http://osgeo-org.1560.x6.nabble.com/Shapefile-with-file-cpg-codepage-td5275106.html
http://osgeo-org.1560.x6.nabble.com/QGIS-ignore-the-cpg-files-when-loading-shapefiles-td5348021.html
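The policy described in this commit message can be sketched roughly as follows. This is an illustration in Python, not the QGIS provider code: the function name `decode_attribute`, its parameters and the UTF-8 fallback are all assumptions made for the example. The point is only that raw bytes are kept undecoded by the reader, and a single place picks the encoding: a user override if one is set, otherwise the declared encoding (LDID or .cpg), otherwise a fallback.

```python
from typing import Optional


def decode_attribute(
    raw: bytes,
    declared: Optional[str],
    override: Optional[str] = None,
    fallback: str = "utf-8",  # assumption for this sketch only
) -> str:
    """Decode raw attribute bytes on the application side.

    Precedence: explicit user override, then the encoding declared by the
    file, then the fallback. Undecodable bytes are replaced, not raised.
    """
    encoding = override or declared or fallback
    return raw.decode(encoding, errors="replace")


# A file whose declaration is correct needs no user action.
good = "äöü".encode("utf-8")
print(decode_attribute(good, declared="utf-8"))

# A file whose declaration is wrong can still be corrected by the user,
# because the raw bytes were never recoded by the reader.
mislabelled = "äöü".encode("cp1252")
print(decode_attribute(mislabelled, declared="utf-8"))
print(decode_attribute(mislabelled, declared="utf-8", override="cp1252"))
```

Because the bytes are never recoded upstream, changing the override later simply means decoding the same bytes again with a different codec.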
And instead always do the decoding on QGIS' side. This unifies the encoding handling whether or not we are using the underlying shapefile declared encoding (e.g. via LDID or .cpg file) OR are overriding it manually by a user-set encoding. Why? - if we DON'T disable GDAL side encoding support, then there's NO way to change the encoding used when reading shapefiles. And unfortunately the embedded encoding (which is read by GDAL) is sometimes wrong (because shapefiles!), so we need to expose support for users to be able to change and correct this - we can't change this setting on-the-fly. If we don't set it upfront, we can't reverse this decision later when a user does want/need to manually specify the encoding This also removes a lot of confusing code logic in the provider! Fixes qgis#21264, user frustration on mailing lists e.g. http://osgeo-org.1560.x6.nabble.com/Shapefile-with-file-cpg-codepage-td5275106.html http://osgeo-org.1560.x6.nabble.com/QGIS-ignore-the-cpg-files-when-loading-shapefiles-td5348021.html (cherry picked from commit f36bd8f)
Author Name: Adrian Klink (@aklink)
Original Redmine Issue: 13203
Affected QGIS version: 3.2.2
Redmine category:vectors
When opening a shapefile the .cpg file is ignored and default is used instead (ISO8859-1 in my case when using Drag-and-Drop, UTF-8 when using file open dialog).
I saved a shapefile (point, line, or polygon) with UTF-8 encoding (a .cpg containing UTF-8 was created). When opening via drag-and-drop, the .cpg file is ignored and the file is opened with the wrong encoding (ISO8859-1 instead of UTF-8), resulting in broken chars. When opening via Add Vector Layer (Ctrl + Shift + V) using the open file dialog, UTF-8 is used as the default (which can be changed), but the .cpg file is ignored as well. I have to pick the proper encoding manually if the shapefile has a different encoding than UTF-8.
tested using:
Windows 8.1 64bit
Windows 7 64bit (different machine)
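To check what a shapefile's .cpg sidecar actually declares, independent of QGIS, something like the following Python sketch can be used. The helper name `declared_cpg_encoding` is made up for this example, and the sketch assumes the .cpg sits next to the .shp with a lowercase extension and contains a single encoding label such as UTF-8. Labels that Python's codec registry does not recognise are reported as None rather than guessed.

```python
import codecs
import tempfile
from pathlib import Path
from typing import Optional


def declared_cpg_encoding(shp_path: str) -> Optional[str]:
    """Return the normalised codec name declared in the .cpg sidecar, if any."""
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.is_file():
        return None
    label = cpg.read_text(encoding="ascii", errors="ignore").strip()
    if not label:
        return None
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None  # label unknown to Python; do not guess


# Self-contained demo with a throwaway directory instead of a real shapefile.
with tempfile.TemporaryDirectory() as tmp:
    shp = Path(tmp) / "points.shp"
    shp.with_suffix(".cpg").write_text("UTF-8", encoding="ascii")
    with_cpg = declared_cpg_encoding(str(shp))
    without_cpg = declared_cpg_encoding(str(Path(tmp) / "other.shp"))

print(with_cpg, without_cpg)
```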
Related issue(s): #14989 (relates), #15355 (relates), #26669 (relates), #29131 (relates)
Redmine related issue(s): 5255, 5911, 18782, 21313