Deleted/edited features within SHAPEFILE are still recognized in other software packages #19349

Closed
qgib opened this issue Aug 5, 2014 · 111 comments
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter! Crash/Data Corruption Data Provider Related to specific vector, raster or mesh data providers
Milestone

Comments

@qgib
Contributor

qgib commented Aug 5, 2014

Author Name: Saber Razmjooei (@saberraz)
Original Redmine Issue: 11007
Affected QGIS version: 2.8.5
Redmine category:data_provider/ogr
Assignee: Marco Hugentobler


When working with SAGA modules, it ignores "Delete flag" in shapefile and uses those deleted features within the processes.

To replicate the issue:
1- Open a shapefile
2- Delete a feature
3- Save the shapefile
4- Run a SAGA process on the shapefile
5- Result of the process takes into account the deleted feature, despite the fact that it has been deleted and saved.



Related issue(s): #17110 (relates), #17515 (relates), #18895 (relates), #18967 (relates), #19680 (relates), #20772 (relates), #21471 (relates), #21478 (duplicates), #21968 (duplicates), #22095 (relates), #22485 (duplicates), #23337 (relates)
Redmine related issue(s): 8317, 8822, 10483, 10560, 11398, 12660, 13422, 13431, 13953, 14085, 14512, 15407


@qgib
Contributor Author

qgib commented Aug 5, 2014

Author Name: Giovanni Manghi (@gioman)


Hi Saber, is this a SAGA issue or a QGIS one? What happens if you do the same using native SAGA instead of Processing?


  • status_id was changed from Open to Feedback

@qgib
Contributor Author

qgib commented Aug 5, 2014

Author Name: Saber Razmjooei (@saberraz)


Gio, I suppose it is a problem with SAGA. But it would be good to have it fixed upstream or worked around in QGIS. Otherwise, it might confuse users.

@qgib
Contributor Author

qgib commented Aug 5, 2014

Author Name: Victor Olaya (@volaya)


This is really weird. If you have modified the shapefile and deleted data...how can SAGA know that there was a feature in there that now is missing?? If you have changed the shapefile, there is no way the deleted feature can be recovered.

Maybe you are saving to a different file and then using the original to call SAGA?

@qgib
Contributor Author

qgib commented Aug 5, 2014

Author Name: Saber Razmjooei (@saberraz)


Victor,

I guess in QGIS, when you delete a feature in a shapefile, it does not physically delete that feature. It flags it as deleted but it is still there.

The shapefile gets "cleaned" when you save as...

@qgib
Contributor Author

qgib commented Aug 5, 2014

Author Name: Giovanni Manghi (@gioman)


Saber Razmjooei wrote:

Victor,

I guess in QGIS, when you delete a feature in a shapefile, it does not physically delete that feature. It flags it as deleted but it is still there.

The shapefile gets "cleaned" when you save as...

Hi Saber, this does not seem to be the case. Delete a feature, save edits, and load the shape in another QGIS window (or any other GIS package): the feature is not there.

@qgib
Contributor Author

qgib commented Aug 5, 2014

Author Name: Saber Razmjooei (@saberraz)


Sorry Gio, that's correct. The bug only happens in the active QGIS window, where the editing took place.

@qgib
Contributor Author

qgib commented Sep 25, 2014

Author Name: Victor Olaya (@volaya)


Saber

Can you confirm the QGIS version you are using? I am seeing that behaviour (deleted features that are not actually deleted in the shapefile, even after closing the edit session) not just in Processing, but in QGIS in general. Taking the modified layer to different software (as Processing does when passing the layer to SAGA) shows the deleted features. But I see this issue only in 2.4, not in 2.2. If this is the case, then it's a QGIS issue and we should open another ticket.

@qgib
Contributor Author

qgib commented Sep 25, 2014

Author Name: Victor Olaya (@volaya)


It seems DBF files have a deleted flag (http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_NOTE_9_TARGET). SAGA might not recognise it, and probably the way QGIS removed features before was to actually eliminate them, while now it sets that flag to deleted. That would explain the error.
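The flag mentioned above can be inspected directly. The following is a minimal sketch, assuming the common dBASE III layout (a 32-byte header with the record count, header length and record length at byte offsets 4 to 11, and a one-byte flag at the start of every record, `*` for deleted and a space for active). The function name and the example path are hypothetical, and the sketch says nothing about how QGIS, OGR or SAGA actually read the file.

```python
import struct

DELETED_FLAG = 0x2A  # '*' at the start of a record marks it as deleted
ACTIVE_FLAG = 0x20   # ' ' at the start of a record marks it as active


def count_dbf_records(path):
    """Return (active, deleted) record counts for a dBASE III style .dbf file."""
    with open(path, "rb") as f:
        header = f.read(32)
        # Bytes 4-7: record count, 8-9: header length, 10-11: record length
        # (all little-endian).
        n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
        active = deleted = 0
        for i in range(n_records):
            f.seek(header_len + i * record_len)
            flag = f.read(1)
            if not flag:
                break  # the file is shorter than the header claims
            if flag[0] == DELETED_FLAG:
                deleted += 1
            else:
                active += 1
    return active, deleted


# Hypothetical usage:
# active, deleted = count_dbf_records("layer.dbf")
```

A non-zero deleted count after saving edits would be consistent with the explanation above: a reader that ignores the flag would still see those records.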

@qgib
Contributor Author

qgib commented Sep 25, 2014

Author Name: Saber Razmjooei (@saberraz)


Victor, it is in 2.4. As you said the problem is not limited only to the Processing. The deleted features appear in Vector > Geoprocessing results too.

@qgib
Contributor Author

qgib commented Oct 4, 2014

Author Name: Giovanni Manghi (@gioman)


  • version was configured as 2.4.0
  • project_id was changed from 78 to 17
  • category_id removed 56
  • crashes_corrupts_data was configured as 0

@qgib
Contributor Author

qgib commented Oct 4, 2014

Author Name: Giovanni Manghi (@gioman)


  • category_id was configured as Processing/SAGA

@qgib
Contributor Author

qgib commented Oct 16, 2014

Author Name: Giovanni Manghi (@gioman)


see also #19592

@qgib
Contributor Author

qgib commented Oct 16, 2014

Author Name: Giovanni Manghi (@gioman)


Victor Olaya wrote:
But I see this issue only in 2.4, not in 2.2. If this is the case, then it's a QGIS issue and we should open another ticket.

It also affects 2.2 and previous QGIS releases (and master).


  • crashes_corrupts_data was changed from 0 to 1
  • subject was changed from SAGA does not recognise deleted features within SHAPEFILE to Deleted/edited features within SHAPEFILE are still recognized in other software packages
  • category_id was changed from Processing/SAGA to Digitising
  • status_id was changed from Feedback to Open
  • assigned_to_id removed Victor Olaya
  • priority_id was changed from Normal to High
  • version was changed from 2.4.0 to master

@qgib
Contributor Author

qgib commented Oct 16, 2014

Author Name: Giovanni Manghi (@gioman)


Saber Razmjooei wrote:

Victor, it is in 2.4. As you said the problem is not limited only to the Processing. The deleted features appear in Vector > Geoprocessing results too.

Hi Saber, while I can confirm that it also affects other software packages, I cannot confirm that it affects QGIS's own geoprocessing tools.

See also #19680

@qgib
Contributor Author

qgib commented Oct 16, 2014

Author Name: Saber Razmjooei (@saberraz)


Gio,
My bad, you are right. It works even in edit mode, when your changes still have not been saved!

@qgib
Contributor Author

qgib commented Oct 16, 2014

Author Name: Giovanni Manghi (@gioman)


In #19680 there are examples of software packages affected by this issue. Anyway, SAGA is also affected, and it is easy to test as it comes with QGIS.

A few notes:


a) after deleting a feature using the shape as input for SAGA gives

error: DBase file could not be opened.

removing and re-adding the shape and running it against SAGA again works without errors, but the operation (e.g. buffer) also takes into account the features that should have been deleted


b) using the node tool to edit a feature and using the shape as input in SAGA gives

error: DBase file could not be opened.

removing and re-adding the shape and running it against SAGA again works without errors and as expected


c) using the reshape tool to edit a feature and using the shape as input in SAGA gives

error: DBase file could not be opened.

removing and re-adding the shape and running it against SAGA again works without errors, but the result of the operation (e.g. buffer) is as if it had been run against the input before it was edited in QGIS.


d) using the split features tool to edit a feature and using the shape as input in SAGA gives

error: corrupted shapefile.

removing and re-adding the shape does not help.

It is not strictly a regression, but given the huge interoperability problems that this issue creates, I would like to ask that it be raised to blocker.


  • status_id was changed from Open to Feedback

@qgib
Contributor Author

qgib commented Oct 16, 2014

Author Name: Giovanni Manghi (@gioman)


I made some tests with ArcGIS 10:


a) delete a feature, save. Add the shape into arcgis, the deleted feature is there (both geometry and attribute).

so this confirms (again) #19592

Using the shape as input in arcgis (ex: buffer) then the "phantom" feature is removed from input and output is as expected.


Doing operations such as b), c) and d) does not seem to have any bad effect when adding the shape in ArcGIS.

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


It is proving very hard to find a clear pattern:

there are shapes that, when edited, always cause SAGA (again, we are using SAGA in the first place because it is easily available and works together with QGIS) to also use the edited/deleted features or to throw an error (error: corrupted shapefile).

other software packages also seem affected in cases where SAGA is not.

the only thing that seems to work almost every time is to re-save the shapefile as a copy.

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Jukka Rahkonen (Jukka Rahkonen)


I was reading this stackexchange question http://gis.stackexchange.com/questions/118689/cant-transform-lines-to-polygons with a sample data available for a few days now at http://dropcanvas.com/7i4oq.
When the shapefile is opened with QGIS it seems to have 12 linestrings (shp_qgis.png). However, when the same shapefile is opened with OpenJUMP it contains 85 lines which are different (shp_oj1.png). Ogrinfo does also see 85 linestrings but after running ogr2ogr from shape to shape, the new shapefile is clean and has 12 linestrings (shp_oj2.png). See attached images. Notice that for another software there are some extra features in the shapefile but also some other, newly added with QGIS, which are missing. I would suggest to raise the priority level from high because of corrupted data.

Ogrinfo summary:

ogrinfo TesteLayer.shp -al -so
INFO: Open of `TesteLayer.shp' using driver `ESRI Shapefile' successful.

Layer name: TesteLayer
Geometry: Line String
Feature Count: 85
Extent: (-54831.944803, 144077.515208) - (-20131.746100, 200497.422392)


  • 7962 was configured as shp_oj1.png
  • 7960 was configured as shp_qgis.png
  • 7961 was configured as shp_oj2.png

  • shp_oj1.png (Jukka Rahkonen) - Shp opened with QGIS
  • shp_qgis.png (Jukka Rahkonen) - Shp opened with QGIS
  • shp_oj2.png (Jukka Rahkonen) - Original shp and an ogr2ogr saved copy together

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


Jukka Rahkonen wrote:

I was reading this stackexchange question http://gis.stackexchange.com/questions/118689/cant-transform-lines-to-polygons with a sample data available for a few days now at http://dropcanvas.com/7i4oq.
When the shapefile is opened with QGIS it seems to have 12 linestrings (shp_qgis.png). However, when the same shapefile is opened with OpenJUMP it contains 85 lines which are different (shp_oj1.png). Ogrinfo does also see 85 linestrings but after running ogr2ogr from shape to shape, the new shapefile is clean and has 12 linestrings (shp_oj2.png). See attached images. Notice that for another software there are some extra features in the shapefile but also some other, newly added with QGIS, which are missing. I would suggest to raise the priority level from high because of corrupted data.

it is indeed a strange situation:

another software, gvsig, reads the shape in a different way from qgis and openjump. Re-saving the shape with this sw the first returns a shape with 85 features
and the other with 93 (as shown by ogr and qgis).

Any software shows the same data when re-saving the shapefile with qgis or ogr2ogr (12 features).

My guess is that the shape was edited (features deleted) with QGIS, which we now know leaves the shape in an inconsistent state that gives unexpected results in other software. If the shape is re-saved after the edits, apparently something is used (a parameter in ogr2ogr?) and the vector gets a consistent state.

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Matthias Kuhn (@m-kuhn)


QGIS calls repack() which is supposed to clean the dbf whenever removing/unloading a layer from which features have been removed in the running QGIS session. You can test this behavior by deleting a feature from the test dataset mentioned above and then unloading the layer. The file size of the dbf decreases by a couple of kB.

Calling repack() in a running QGIS session is dangerous because it changes feature ids. And QGIS (and its plugins) should be able to rely on feature ids being unchanged for the lifetime of a layer.

  • The easiest solution would be to rewrite the file to another location before sending it to SAGA.
  • Or raising a feature request for SAGA to support the dbf flag for deleted features (is this an official feature of this file format?)
  • Create an OGR feature request, that repack() can be called without changing feature ids (e.g. it could remap from old feature ids to new feature ids internally. We can't do this on our side, the repack() process is transparent to QGIS I think).
  • Let SAGA use a QGIS feature iterator :)
  • Decide that feature ids can change in a running session (and internally send a signal that any cached information needs to be discarded)
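Why repacking changes feature ids, and what the remapping idea in the list above might look like, can be sketched with a toy model. This assumes that a feature id is simply the position of a record in the file; the `repack` function here is a hypothetical stand-in for illustration, not OGR's implementation.

```python
def repack(records):
    """Drop records flagged as deleted and return (packed, id_map).

    `records` is a list of (deleted, payload) tuples whose list index plays
    the role of the feature id. `id_map` maps each surviving old id to its
    new id after packing.
    """
    packed = []
    id_map = {}
    for old_id, (deleted, payload) in enumerate(records):
        if deleted:
            continue  # flagged records are physically removed
        id_map[old_id] = len(packed)
        packed.append((False, payload))
    return packed, id_map


records = [(False, "a"), (True, "b"), (False, "c"), (False, "d")]
packed, id_map = repack(records)
print(id_map)  # prints {0: 0, 2: 1, 3: 2}
```

Every record after the deleted one shifts down by one position, so anything that cached the old ids would point at the wrong feature unless it consulted a map like `id_map`.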

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


Matthias Kuhn wrote:

QGIS calls repack() which is supposed to clean the dbf whenever removing/unloading a layer from which features have been removed in the running QGIS session. You can test this behavior by deleting a feature from the test dataset mentioned above and then unloading the layer. The file size of the dbf decreases by a couple of kB.

Calling repack() in a running QGIS session is dangerous because it changes feature ids. And QGIS (and its plugins) should be able to rely on feature ids being unchanged for the lifetime of a layer.

  • The easiest solution would be to rewrite the file to another location before sending it to SAGA.
  • Or raising a feature request for SAGA to support the dbf flag for deleted features (is this an official feature of this file format?)
  • Create an OGR feature request, that repack() can be called without changing feature ids (e.g. it could remap from old feature ids to new feature ids internally. We can't do this on our side, the repack() process is transparent to QGIS I think).
  • Let SAGA use a QGIS feature iterator :)
  • Decide that feature ids can change in a running session (and internally send a signal that any cached information needs to be discarded)

Hi Matthias, it is not (just) a SAGA issue, that would be the last of our problems.

The issue is also with other very popular gis packages.

Add a shape in qgis, edit it (delete,reshape,split,etc.), save edits. Remove the shape (or not) and open it in such software, the "phantom" features are there...
Re-save the shape with another name in QGIS and everything is OK in other software too. Asking users to re-save shapes before exchanging them seems a bit... strange and impractical, to say the least.

cheers!

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Matthias Kuhn (@m-kuhn)


If you remove features in QGIS, close the project and open the shapefile in other software, the file should be clean. Is it not?

Can you confirm that QGIS cleans the sample data from this report with the steps I outlined above:

  • delete a feature, save
  • unload layer

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


Matthias Kuhn wrote:

If you remove features in QGIS, close the project and open the shapefile in other software, the file should be clean. Is it not?

Can you confirm that QGIS cleans the sample data from this report with the steps I outlined above:

  • delete a feature, save
  • unload layer

cannot confirm, unloading the layer from project and/or closing qgis does not clean the shape.

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


cannot confirm, unloading the layer from project and/or closing qgis does not clean the shape.

re-saving it does.

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


Matthias Kuhn wrote:

If you remove features in QGIS, close the project and open the shapefile in other software, the file should be clean. Is it not?

Can you confirm that QGIS cleans the sample data from this report with the steps I outlined above:

  • delete a feature, save
  • unload layer

On my PC I can only test open source programs, but what you can see in the attached image also happens with closed source ones. The shape was reshaped in QGIS, saved, removed from the project, QGIS was closed, and the shape was then opened in OpenJUMP.


  • 7963 was configured as screenshot3.png

  • screenshot3.png (Giovanni Manghi) - shape as it shows after editing in qgis

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Matthias Kuhn (@m-kuhn)


Cannot reproduce this here in this case.

Just to be sure: you deleted a feature first, then you saved, then you removed the layer from the legend, correct (i.e. the deleting part is important, just because you didn't mention it in your comment)?

After doing this here:

INFO: Open of `/tmp/orbit-kk/TesteLayer.shp' using driver `ESRI Shapefile' successful.

Layer name: TesteLayer
Geometry: Line String
Feature Count: 11
Extent: (-43350.686045, 162465.264117) - (-31426.039433, 174737.909080)
Layer SRS WKT:
PROJCS["ETRS89_Portugal_TM06",
GEOGCS["GCS_ETRS_1989",
DATUM["European_Terrestrial_Reference_System_1989",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",39.66825833333333],
PARAMETER["central_meridian",-8.133108333333334],
PARAMETER["scale_factor",1],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["Meter",1]]
LAYER: String (32.0)
COLOR: Integer (6.0)
ID: Integer (5.0)

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Giovanni Manghi (@gioman)


Matthias Kuhn wrote:

Cannot reproduce this here in this case.

Just to be sure: you deleted a feature first, then you saved, then you removed the layer from the legend, correct (i.e. the deleting part is important, just because you didn't mention it in your comment)?

  • open shape in qgis
  • edit shape (delete feature, reshape, split and probably other tools, except for the node tool, which does not seem (underline "seem") to create issues)
  • save shape
  • remove shape from project
  • close qgis
  • add shape in another software

notes from the many comments (also in duplicate tickets)

  • not all input shapes seem to be affected in the same way. This one

https://issues.qgis.org/attachments/7917/Test501_before_editing.zip

is a good example to replicate the issues.

  • not all other GIS software shows the issues in the same way. For example, loading such "dirty" shapes in ArcGIS causes the program to also show the incorrect/deleted features, but if the vector is then used for a geoprocessing operation the program seems to clean the shape beforehand. SAGA and many other packages show the incorrect/deleted features on the canvas, and the same features are treated as valid when the vector is used for geoprocessing. Messages that the shapefile is corrupted, that the dbf is corrupted, or that the dbf has the wrong number of records are also common.
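One way to look into the "wrong number of records" messages is to compare the record count implied by the .shx index with the count stored in the .dbf header. The sketch below assumes the published shapefile layout (a 100-byte .shx header with the file length in 16-bit words at bytes 24 to 27, big-endian, followed by 8-byte index entries) and the dBASE header layout (record count at bytes 4 to 7, little-endian). The function names are hypothetical, and whether a mismatch of this kind is what actually triggers those messages in other packages is not confirmed in this thread.

```python
import struct


def shx_record_count(path):
    """Number of index entries implied by the .shx header's file length."""
    with open(path, "rb") as f:
        header = f.read(100)
    # Bytes 24-27: total file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">i", header[24:28])
    return (length_words * 2 - 100) // 8  # each index entry is 8 bytes


def dbf_record_count(path):
    """Record count stored in the .dbf header (includes flagged records)."""
    with open(path, "rb") as f:
        header = f.read(32)
    (n_records,) = struct.unpack("<I", header[4:8])
    return n_records


def counts_match(shx_path, dbf_path):
    """True if geometry index and attribute table agree on the record count."""
    return shx_record_count(shx_path) == dbf_record_count(dbf_path)


# Hypothetical usage:
# print(counts_match("layer.shx", "layer.dbf"))
```

Note that the .dbf header count includes records that are only flagged as deleted, so this check detects a mismatch between the two files, not the presence of flagged records.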

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Matthias Kuhn (@m-kuhn)


I am using

ogrinfo -al -so

And it always shows the correct feature count. For Test501 it's originally 2 (it's clean in the state when it's downloaded, right?) and subsequently it always reflects the correct number (tried to remove / split / split and remove). I don't know what I can do to try harder... :(

@qgib
Contributor Author

qgib commented Oct 17, 2014

Author Name: Saber Razmjooei (@saberraz)


Matthias,

Try the attached dataset (sample_data.zip).
In the Processing toolbox, use Shape Buffer (SAGA module with 1 metre buffer) and you should get something like edited_vector_buffered_1m.png. Even though there is no feature on the east side, it still uses the deleted features from the original vector (vector.zip).

Tested in QGIS 2.4 and master under Windows 7 (OSGeo4W install)


  • 7965 was configured as edited_vector_buffered_1m.png
  • 7964 was configured as sample_data.zip
  • 7966 was configured as vector.zip

@qgib qgib added Data Provider Related to specific vector, raster or mesh data providers Crash/Data Corruption labels May 25, 2019
@qgib qgib added this to the Version 2.14 milestone May 25, 2019
@qgib qgib closed this as completed May 25, 2019
This was referenced May 25, 2019