[ogr provider] Saving big CSV edit is very slow when the whole file has to be updated (such as adding a new field) #51668

pathmapper · 2023-01-31T15:40:54Z

What is the bug or the crash?

Saving CSV edit (new field) results in QGIS not responding anymore
(QGIS never comes back, waited at least 15 minutes on a powerful machine with lots of cores and ram 😉 ).

Steps to reproduce the issue

1. Drag'n'drop test.csv in QGIS
2. Open attribute table
3. Create a new text field
4. Save edits (toggle edit mode in attribute table)
5. See error: QGIS is not responding anymore

Versions

QGIS version	3.29.0-Master	QGIS code revision	`10748bd`
Qt version	5.15.3
Python version	3.10.6
GDAL/OGR version	3.7.0dev-6bdeb5623b
PROJ version	8.2.1
EPSG Registry database version	v10.041 (2021-12-03)
GEOS version	3.10.2-CAPI-1.16.0
SQLite version	3.37.2
PostgreSQL client version	unknown
SpatiaLite version	5.0.1
QWT version	6.1.4
QScintilla2 version	2.11.6
OS version	Ubuntu 22.04.1 LTS

Active Python plugins
grassprovider	2.12.99
MetaSearch	0.3.6
processing	2.12.99
db_manager	0.1.20

Supported QGIS version

I'm running a supported QGIS version according to the roadmap.

New profile

I tried with a new QGIS profile

Additional context

Doing the same with the first 1000 features of the sample data, there are no issues.


elpaso · 2023-02-01T09:40:02Z

I can confirm: it takes ages but it eventually completes without errors.

QGIS processes one feature at a time, and for each feature the OGR provider reads the CSV line by line again, so the save slows down progressively as it proceeds.

pathmapper · 2023-02-01T09:46:15Z

Do you mean after one feature is updated with the new field the whole CSV (= all features) is read again before updating the next feature?

elpaso · 2023-02-01T10:02:47Z

> Do you mean after one feature is updated with the new field the whole CSV (= all features) is read again before updating the next feature?

Yes, that's what happens.

elpaso · 2023-02-01T10:09:50Z

@pathmapper from https://gdal.org/drivers/vector/csv.html:

> The OGR CSV driver supports reading and writing. Because the CSV format has variable length text lines, reading is done sequentially. Reading features in random order will generally be very slow.
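To make the cost concrete, here is a minimal pure-Python sketch of what "random access by sequential scan" implies. It is a simplified model, not the GDAL or QGIS code: the helper names (`make_csv`, `read_feature_by_scan`, `total_rows_scanned`) are hypothetical, and it assumes each lookup restarts from the top of the file and stops at the requested row.

```python
import csv
import io


def make_csv(n_rows):
    """Build a small in-memory CSV with an integer id column (illustrative data)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name"])
    for i in range(n_rows):
        writer.writerow([i, f"feature_{i}"])
    return buf.getvalue()


def read_feature_by_scan(text, fid):
    """Find one row by scanning from the top; return (row, rows_scanned)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    scanned = 0
    for row in reader:
        scanned += 1
        if int(row[0]) == fid:
            return row, scanned
    raise KeyError(fid)


def total_rows_scanned(text, n_rows):
    """Look up every feature once, one at a time, and sum the rows scanned."""
    total = 0
    for fid in range(n_rows):
        _, scanned = read_feature_by_scan(text, fid)
        total += scanned
    return total


n = 200
data = make_csv(n)
total = total_rows_scanned(data, n)
# Under this model the work grows as n * (n + 1) / 2, i.e. quadratically.
print(total, n * (n + 1) // 2)
```

Under the same assumption, a file with 15495 features would need on the order of 120 million row reads to touch every feature once, compared with 15495 for a single sequential pass.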

pathmapper · 2023-02-01T10:40:09Z

Thanks @elpaso for taking a look.

I've read the OGR docs but wasn't aware that writing (with the new field) also involves reading the CSV again on every feature update, which means for the sample data reading it 15495 times.

Although reading is very slow according to the docs, the performance is very good when loading the CSV in QGIS (1x read).

Do you think something could be improved on the QGIS side, or should we close this issue?

A workaround would be to load the CSV in QGIS, export as GPKG, do the edits and finally export the GPKG as CSV (which is fast).

elpaso · 2023-02-01T10:55:59Z

@pathmapper I'm having a look, but I'm afraid there isn't an easy fix. A possible approach would be to handle a full layer update differently from the random edits implemented in the layer edit buffer. This would require a new API in QGIS (thinking out loud now), such as a flag in the update buffer that tells us we are performing a full layer update. In that case we could use an iterator to loop through all the features sequentially and update them, which would cut the number of CSV reads from about 15000 to 1.
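The "iterate sequentially and update" idea can be sketched in plain Python. This is only an illustration of the single-pass pattern, assuming the whole-file change is "append one new field to every row"; it is not the QGIS edit buffer, not the OGR provider, and not the fix that was eventually merged. The function name `add_field_single_pass` and the sample data are hypothetical.

```python
import csv
import io


def add_field_single_pass(src, dst, field_name, default=""):
    """Copy a CSV from src to dst, appending one new column, in a single pass.

    Every row is read exactly once, in file order, so the cost is linear
    in the number of rows. Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + [field_name])
    count = 0
    for row in reader:
        writer.writerow(row + [default])
        count += 1
    return count


# Illustrative in-memory input; a real file would be opened with newline="".
src = io.StringIO("id,name\n1,a\n2,b\n")
dst = io.StringIO()
rows_written = add_field_single_pass(src, dst, "new_text")
result = dst.getvalue()
print(rows_written)
print(result)
```

The design point is that the update follows the file's natural order, so no row ever has to be located by rescanning from the top.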

I'm working with the QGIS bugfixing budget right now, I think this is too big an effort to rely on that budget.

andreasneumann · 2023-02-01T11:00:21Z

> I'm working with the QGIS bugfixing budget right now, I think this is too big an effort to rely on that budget.

this might be a candidate for a grant project.

elpaso · 2023-02-01T11:04:15Z

> > I'm working with the QGIS bugfixing budget right now, I think this is too big an effort to rely on that budget.
>
> this might be a candidate for a grant project.

Yeah, I was thinking the same, but let me spend a little more time to see if there is a quick fix before I give up.

pathmapper added the Bug label Jan 31, 2023

elpaso self-assigned this Feb 1, 2023

elpaso changed the title ~~[ogr provider] Saving CSV edit results in QGIS not responding anymore~~ [ogr provider] Saving big CSV edit is very slow when the whole file has to be updated (such as adding a new field) Feb 1, 2023

elpaso added a commit to elpaso/QGIS that referenced this issue Feb 1, 2023

OGR CSV: fix slow update

508e1dd

Fix qgis#51668

elpaso mentioned this issue Feb 1, 2023

OGR CSV: fix slow update #51686

Merged

elpaso added a commit to elpaso/QGIS that referenced this issue Feb 1, 2023

OGR CSV: fix slow update

923d7d9

Fix qgis#51668

nyalldawson closed this as completed in #51686 Feb 1, 2023

nyalldawson pushed a commit that referenced this issue Feb 1, 2023

OGR CSV: fix slow update

aa033db

Fix #51668

qgis-bot pushed a commit that referenced this issue Feb 1, 2023

OGR CSV: fix slow update

79a0547

Fix #51668

nyalldawson pushed a commit that referenced this issue Feb 7, 2023

OGR CSV: fix slow update

0f0ac43

Fix #51668

nyalldawson pushed a commit that referenced this issue Mar 5, 2023

OGR CSV: fix slow update

003b7db

Fix #51668

nyalldawson pushed a commit that referenced this issue Mar 6, 2023

OGR CSV: fix slow update

a4fc2b5

Fix #51668
