[ogr provider] Saving big CSV edit is very slow when the whole file has to be updated (such as adding a new field) #51668
Comments
I can confirm: it takes ages, but it eventually completes without errors. QGIS processes one feature at a time, and the OGR provider reads the CSV line by line every time, progressively slowing down as the process proceeds.
Do you mean after one feature is updated with the new field the whole CSV (= all features) is read again before updating the next feature?
Yes, that's what happens.
@pathmapper from https://gdal.org/drivers/vector/csv.html: The OGR CSV driver supports reading and writing. Because the CSV format has variable length text lines, reading is done sequentially. Reading features in random order will generally be very slow.
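Why variable-length lines force sequential reads can be shown with a minimal pure-Python sketch (this is illustrative only, not the actual OGR driver code): since row boundaries are unknown ahead of time, fetching feature `fid` means scanning every preceding row from the start of the file, so per-feature random access costs O(fid) line reads.

```python
import csv
import os
import tempfile

def read_feature(path, fid):
    """Fetch row `fid` from a CSV: with variable-length lines there is no
    way to seek directly to it, so we must scan rows 0..fid in order."""
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            if i == fid:
                return row
    raise IndexError(fid)

# Demo with a small synthetic file (stand-in for the sample data).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(5):
        writer.writerow([i, f"name_{i}"])

row = read_feature(path, 3)
print(row)  # ['3', 'name_3'] -- reached only after scanning rows 0..2
os.remove(path)
```

Updating n features one at a time with this access pattern costs on the order of n²/2 line reads overall, which matches the progressive slowdown described above.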
Thanks @elpaso for taking a look. I've read the OGR docs but wasn't aware that writing (with the new field) also involves reading the CSV again on every feature update, which means for the sample data reading it 15495 times. Although reading is very slow according to the docs, the performance is very good when loading the CSV in QGIS (1x read). Do you think there could be something improved on the QGIS side, or should we close this issue? A workaround would be to load the CSV in QGIS, export as GPKG, do the edits and finally export the GPKG as CSV (which is fast).
@pathmapper I'm having a look, but I'm afraid there isn't an easy fix. A possible approach would be to handle the case of a full layer update differently from the random edits implemented in the layer edit buffer, but this would require a new API in QGIS (thinking out loud now), such as a flag in the update buffer to know if we are performing a full layer update. In that case we could use an iterator to loop through all the features sequentially and update them, which would cut the number of CSV reads down from 15000 to 1. I'm working with the QGIS bugfixing budget right now, and I think this is too big an effort to rely on that budget.
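The difference between the current per-feature behaviour and the proposed full-layer sequential update can be sketched in plain Python (a simplified model, not QGIS or OGR code; `rows` stands in for the CSV lines and the read counters stand in for file I/O):

```python
def add_field_slow(rows, default):
    """Per-feature update (current behaviour, simplified): to update
    feature `fid`, re-scan the layer from the start to locate it.
    Total line reads grow quadratically with the feature count."""
    reads = 0
    out = []
    for fid in range(len(rows)):
        for i, row in enumerate(rows):  # full re-scan per feature
            reads += 1
            if i == fid:
                out.append(row + [default])
                break
    return out, reads

def add_field_fast(rows, default):
    """Proposed full-layer update: one sequential pass over all
    features, appending the new field as we go. Reads grow linearly."""
    reads = 0
    out = []
    for row in rows:
        reads += 1
        out.append(row + [default])
    return out, reads

rows = [[str(i)] for i in range(100)]
slow_out, slow_reads = add_field_slow(rows, "0")
fast_out, fast_reads = add_field_fast(rows, "0")
print(slow_reads, fast_reads)  # 5050 vs 100 line reads for 100 features
```

Both functions produce identical output; only the access pattern differs. Scaled to the 15495-feature sample data, the quadratic variant performs roughly 120 million line reads versus 15495 for the single pass.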
This might be a candidate for a grant project.
Yeah, I was thinking the same, but let me spend a little more time to see if there is a quick fix before I give up.
What is the bug or the crash?
Saving CSV edit (new field) results in QGIS not responding anymore
(QGIS never comes back; waited at least 15 minutes on a powerful machine with lots of cores and RAM 😉).
Steps to reproduce the issue
Versions
Supported QGIS version
New profile
Additional context
Doing the same with the first 1000 features of the sample data, there are no issues.