[ogr provider] Saving big CSV edit is very slow when the whole file has to be updated (such as adding a new field) #51668

Closed
2 tasks done
pathmapper opened this issue Jan 31, 2023 · 8 comments · Fixed by #51686
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter!

Comments

@pathmapper
Contributor

pathmapper commented Jan 31, 2023

What is the bug or the crash?

Saving a CSV edit (new field) makes QGIS stop responding
(QGIS never comes back; I waited at least 15 minutes on a powerful machine with lots of cores and RAM 😉).

Steps to reproduce the issue

  1. Drag'n'drop test.csv in QGIS
  2. Open attribute table
  3. Create new text field
  4. Save edits (toggle edit mode in attribute table)
  5. See error: QGIS is not responding anymore

Versions

QGIS version 3.29.0-Master QGIS code revision 10748bd
Qt version 5.15.3
Python version 3.10.6
GDAL/OGR version 3.7.0dev-6bdeb5623b
PROJ version 8.2.1
EPSG Registry database version v10.041 (2021-12-03)
GEOS version 3.10.2-CAPI-1.16.0
SQLite version 3.37.2
PostgreSQL client version unknown
SpatiaLite version 5.0.1
QWT version 6.1.4
QScintilla2 version 2.11.6
OS version Ubuntu 22.04.1 LTS
Active Python plugins
grassprovider 2.12.99
MetaSearch 0.3.6
processing 2.12.99
db_manager 0.1.20

Supported QGIS version

  • I'm running a supported QGIS version according to the roadmap.

New profile

  • I tried with a new QGIS profile

Additional context

Doing the same with only the first 1000 features of the sample data, there are no issues.

@pathmapper pathmapper added the Bug Either a bug report, or a bug fix. Let's hope for the latter! label Jan 31, 2023
@elpaso elpaso self-assigned this Feb 1, 2023
@elpaso
Contributor

elpaso commented Feb 1, 2023

I can confirm: it takes ages but it eventually completes without errors.

QGIS processes one feature at a time, and for each one the OGR provider reads the CSV line by line, so every update takes longer than the previous one and the process progressively slows down.
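A toy model of this behaviour in plain Python. `SequentialFile`, `get_feature` and `update_one_at_a_time` are hypothetical stand-ins, not QGIS or GDAL APIs. The only assumption is that looking up a feature by ID means scanning from the start of the file, since variable-length CSV lines have no fixed offsets.

```python
# Hypothetical toy model (not QGIS or GDAL code): per-feature updates on a
# file that can only be read front to back.

class SequentialFile:
    """Stands in for a CSV layer that can only be read sequentially."""

    def __init__(self, n_features):
        self.rows = [{"fid": i} for i in range(n_features)]
        self.lines_read = 0  # counts every row visited during lookups

    def get_feature(self, fid):
        # No line-offset index: scan from the start until the row is found.
        for row in self.rows:
            self.lines_read += 1
            if row["fid"] == fid:
                return row
        raise KeyError(fid)


def update_one_at_a_time(layer, field, value):
    # Mirrors the edit-buffer pattern: look up and modify each feature separately.
    for fid in range(len(layer.rows)):
        layer.get_feature(fid)[field] = value


for n in (1000, 2000, 4000):
    layer = SequentialFile(n)
    update_one_at_a_time(layer, "new_field", "")
    print(n, layer.lines_read)
```

The loop prints 500500, 2001000 and 8002000 rows visited: doubling the number of features roughly quadruples the work, which is why 1000 features save quickly while the full sample data appears to hang.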

@pathmapper
Contributor Author

Do you mean after one feature is updated with the new field the whole CSV (= all features) is read again before updating the next feature?

@elpaso
Contributor

elpaso commented Feb 1, 2023

> Do you mean after one feature is updated with the new field the whole CSV (= all features) is read again before updating the next feature?

Yes, that's what happens.

@elpaso
Contributor

elpaso commented Feb 1, 2023

@pathmapper from https://gdal.org/drivers/vector/csv.html:

> The OGR CSV driver supports reading and writing. Because the CSV format has variable length text lines, reading is done sequentially. Reading features in random order will generally be very slow.

@pathmapper
Contributor Author

Thanks @elpaso for taking a look.

I've read the OGR docs but wasn't aware that writing (with the new field) also involves reading the CSV again on every feature update, which for the sample data means reading it 15495 times.
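Rough arithmetic for the sample data, assuming N = 15495 features, ignoring the header line, and assuming each per-feature update scans the CSV from the start. The first line assumes the whole file is re-read for every update, as confirmed in this thread; the second is the lower figure if each scan stops at the target row. A single sequential pass reads only N lines.

$$
N^2 = 15495^2 = 240{,}095{,}025 \quad \text{line reads (whole file per update)}
$$

$$
\sum_{i=1}^{N} i = \frac{N(N+1)}{2} = \frac{15495 \cdot 15496}{2} = 120{,}055{,}260 \quad \text{line reads (scan stops at the target row)}
$$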

Although the docs say that reading in random order is very slow, performance is very good when loading the CSV in QGIS (a single sequential read).

Do you think there could be something improved on QGIS side or should we close this issue?

A workaround would be to load the CSV in QGIS, export as GPKG, do the edits and finally export the GPKG as CSV (which is fast).

@elpaso
Contributor

elpaso commented Feb 1, 2023

@pathmapper I'm having a look, but I'm afraid there isn't an easy fix. A possible approach would be to handle a full layer update differently from the random edits implemented in the layer edit buffer, but this would require a new API in QGIS (thinking out loud now), such as a flag in the update buffer indicating that we are performing a full layer update. In that case we could use an iterator to loop through all the features sequentially and update them, which would cut the number of CSV reads from 15000 to 1.
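The single-pass idea can be illustrated with Python's standard `csv` module. This is not QGIS code and not the proposed API: `add_field_single_pass` is a hypothetical helper showing that adding a field touches every row exactly once when the file is streamed sequentially instead of looking features up one by one.

```python
import csv
import io


def add_field_single_pass(src, dst, field_name, default=""):
    """Copy all rows from src to dst, appending field_name, in one sequential pass.

    Hypothetical helper: each input line is read exactly once, so the cost grows
    linearly with the number of features instead of quadratically.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [field_name]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    rows = 0
    for row in reader:  # sequential iteration, no per-feature seeking
        row[field_name] = default
        writer.writerow(row)
        rows += 1
    return rows


src = io.StringIO("id,name\n1,a\n2,b\n")
dst = io.StringIO()
count = add_field_single_pass(src, dst, "new_field")
print(count, dst.getvalue().splitlines())
```

In a real provider the output would go to a temporary file that replaces the original only after the pass succeeds, so an interrupted save does not leave a truncated CSV.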

I'm working with the QGIS bugfixing budget right now, I think this is too big an effort to rely on that budget.

@elpaso elpaso changed the title [ogr provider] Saving CSV edit results in QGIS not responding anymore [ogr provider] Saving big CSV edit is very slow when the whole file has to be updated (such as adding a new field) Feb 1, 2023
@andreasneumann
Member

> I'm working with the QGIS bugfixing budget right now, I think this is too big an effort to rely on that budget.

this might be a candidate for a grant project.

@elpaso
Contributor

elpaso commented Feb 1, 2023

> I'm working with the QGIS bugfixing budget right now, I think this is too big an effort to rely on that budget.
>
> this might be a candidate for a grant project.

Yeah, I was thinking the same, but let me spend a little more time to see whether there is a quick fix before I give up.

elpaso added a commit to elpaso/QGIS that referenced this issue Feb 1, 2023
elpaso added a commit to elpaso/QGIS that referenced this issue Feb 1, 2023
nyalldawson pushed a commit that referenced this issue Feb 1, 2023
qgis-bot pushed a commit that referenced this issue Feb 1, 2023
nyalldawson pushed a commit that referenced this issue Feb 7, 2023
nyalldawson pushed a commit that referenced this issue Mar 5, 2023
nyalldawson pushed a commit that referenced this issue Mar 6, 2023

3 participants