-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shapefile data corruption #23493
Comments
Author Name: Even Rouault (@rouault) @Nyall When this happens,
The cause of this is that somethings locks the xxx.shp and/or xxx.shx files which prevents them from being deleted and their _packed versions (the new version that has been packed) to be renamed as xxx.shp/shx. But in that event, you should still be able to open the original xxx.shp/shx files (the ones before packing). So I'm perhaps missing something. The issue might be instead that something (the browser? an antivirus ?) locks the _packed files, which prevents them from being renaming (but in that case the xxx.shp/shx should be deleted, unless they are locked too). The related logic in the packing method in GDAL is there https://github.com/OSGeo/gdal/blob/ce6c6098ac4c9e6805a4c9d5c8ee0acace09191a/gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp#L2560 ( oTempFile is the basename of the _packed files). I've just committed https://trac.osgeo.org/gdal/changeset/35467 in GDAL trunk and 2.1 branch to increase the verbosity when things go wrong (adding errors for failed VSIUnlink() and changing the debug traces into loud errors). Hum seeing https://github.com/qgis/QGIS/blob/master/src/providers/ogr/qgsogrprovider.cpp#L2951 , which is the only function to call repack(), I'm wondering if the QgsOgrConnPool::instance()->invalidateConnections( dataSourceUri() ) call done at line 2969 shouldn't be added as well to the top of repack(), around https://github.com/qgis/QGIS/blob/master/src/providers/ogr/qgsogrprovider.cpp#L144 . The issue might be that a reader (the attribute table e.g) has a connection from the connection pool on the shapefile, which could interfere with deletion/renaming. |
Author Name: Jukka Rahkonen (Jukka Rahkonen) I have a feeling that this issue is related #23323. Perhaps the "read only" copies keep some files locked. |
Author Name: Nyall Dawson (@nyalldawson) Even - thanks for the hints! Here's some answers:
Yes, both those files are still present
Yes, correct. If I also rename the _packed.shp then the file cannot be opened.
Hmmm - it's possibly related, but the first time I encountered this I had the same project opened in two QGIS instances, so the shapefile would have been opened in both but being edited only in one. Maybe this situation could cause the rename to fail? Unfortunately I tried to reproduce this scenario immediately after but could not reproduce.
When I hit this issue on Friday I definitely had the attribute table window open for the layer - so maybe that is related. Here's the exact steps I did leading up to the issue:
This was the 3rd time I've hit this - the first 2 were a month or so ago (on 2.16), and I couldn't reproduce. This most recent occurance was using the master_2 builds from osgeo4w. Same machine though - using Windows 7 64 bit. I'll do some more tests on monday and try to narrow it down to a reproducable workflow. |
Author Name: Even Rouault (@rouault) I can't make sense of the fact that if you rename both the _packed.shp and _packed.shx as non packed, you get a corrupted dataset. If both exist, they should be consistant between them. Would be potentially helpful that you attach the .shp, .shx, _packed.shp, _packed.shx, etc... files when a corruption occurs. Have you noticed remains of _packed.dbf files as well ? If you delete records at some steps, which seem to be the case in your last case, there should be compaction of the dbf and a temporary _packed.dbf being creating as well |
Author Name: Nyall Dawson (@nyalldawson) Even - I've got a copy of the corrupted data here (as it was immediately afterwards, with the _packed files) but cannot share publicly. I'll email it to you. |
Author Name: Even Rouault (@rouault) In the zip sent by Nyall, there's :
The fact that there's no _packed.dbf file and the .dbf file has a hole would make think that DBF compaction didn't occur at all, which would be a different bug in itself, for which I don't have any explanation (or that for some weird reason the _packed.dbf file disappeared during the failed renaming to .dbf, but that doesn't make much sense). Regarding the .shp,.shx, given that the order of operations is :
my theory is the following one :
|
Author Name: Antoine de Beaurepaire (Antoine de Beaurepaire) Having the same issue here on QGIS 2.14.6. I have noticed a few weeks ago the same bug and managed to avoid it by waiting carfully during saving and not clicking at all in the QGIS interface. Now, even if being very carful, the issue is still happening. Can't found any explanation. I can give some more infos on demand if needed. |
Author Name: Even Rouault (@rouault) I spent the last hours replicating the issue, and was somehow successful in doing so. There are 2 different reliable scenarios I found: A) First scenario :
Approximatively half of time, in the Log messages panel, the following error will be reported: "Possible corruption after REPACK detected. %1 still exists. This may point to a permission or locking problem of the original DBF.". But even when it is not reported, looking with ogrinfo shows that the .dbf file wasn't succesfully compacted (holes in feature identifiers). B) Second scenario :
You'll get a message "Cannot reopen datasource xxx.shp|layerid=0 in read-only mode" and "Data source is invalid (Unable to open xxx.shx or xxx.SHX.Try --config SHAPE_RESTORE_SHX true to restore or create it)" in the log. This is with released versions of GDAL. After the fix I did in https://trac.osgeo.org/gdal/changeset/35467, the failure to delete the old .shp file make it early exit, but the .shp isn't effectively compacted on inspection. For the same reason as the .dbf case: the other QGIS process holds a lock on the .shp file. I guess that people can also reproduce those issues with a single process, but in a far less reliable way, if other processes transiently lock the files. Besides antivirus and other Windows services, the QGIS Browser panel and its background tasks probing for new files could also be a culprit. One option I've tried is to add the FILE_SHARE_DELETE share permission (https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx) to the permissions used by GDAL when calling CreateFile(). With that, you can effectively delete the file while still being opened elsewhere, but as the file is still reported as present for the above mentionned reason, you can not rename the _packed files over it. So we're stuck. I've tried the ReplaceFile() Win32 API rather than the delete + rename approach and it seems to work, even if the file to be replaced is opened, but that requires that all handles on it have FILE_SHARE_DELETE set, which we cannot control easily for external code. I come to think that the only reliable solution is to modify the PACK implementation in the OGR Shapefile driver to :
And modify QGIS code to actually detect GDAL errors emitted by the repack function. |
Author Name: Even Rouault (@rouault)
|
Author Name: Even Rouault (@rouault) The fix is in GDAL https://trac.osgeo.org/gdal/ticket/6672. This will land in GDAL 2.1.2 I've committed an improvement in QGIS to raise a QGIS error when GDAL emits a GDAL error + test.
|
Author Name: Jamie Portman (Jamie Portman) Any idea when GDAL 2.1.2 will be released? |
Author Name: Jürgen Fischer (@jef-n)
|
Author Name: Nyall Dawson (@nyalldawson)
Original Redmine Issue: 15570
Affected QGIS version: 2.16.2
Redmine category:unknown
Assignee: Even Rouault
Really nasty bug: QGIS 2.16 (+master) can corrupt shapefiles in certain circumstances. I haven't been able to exactly track down a reproducible workflow for replicating this, but I every time I've encountered this has been doing operations like:
This is a very nasty regression - if users are unaware they can recover the file with the shx rename then they will assume the data is unrecoverable.
The text was updated successfully, but these errors were encountered: