Export selection from a large WFS layer fails #42049

Closed
LarsAtOmkeere opened this issue Mar 4, 2021 · 15 comments · Fixed by #43336
Labels: Bug, Vectors, WFS data provider

Comments

@LarsAtOmkeere

[3.16.4-Hannover, Windows 10]
**BUG**
The progress dialog stays on screen and the export process does not finish by itself.

How to Reproduce

  1. I added a layer 'pand' from the WFS layer BAG (https://geodata.nationaalgeoregister.nl/bag/wfs/v1_1) through the "PDOK Services plugin".
  2. I make a selection of the features I want to export
  3. I choose [right click on layer] 'Export > Export selected objects' and export them to a Geopackage.
  4. I use the default settings and write the data to a new geopackage.
  5. The export starts, but after a short while the dialog stays on screen even though the export is complete; it just does not look finished. If I choose "Afbreken" (I use the Dutch version, so probably something like "Abort"), the application tells me the task is done.

QGIS and OS versions
QGIS version: 3.16.4-Hannover
QGIS code revision: 654e76b
Compiled against Qt: 5.11.2
Running against Qt: 5.11.2
Compiled against GDAL/OGR: 3.1.4
Running against GDAL/OGR: 3.1.4
Compiled against GEOS: 3.8.1-CAPI-1.13.3
Running against GEOS: 3.8.1-CAPI-1.13.3
Compiled against SQLite: 3.29.0
Running against SQLite: 3.29.0
PostgreSQL client version: 11.5
SpatiaLite version: 4.3.0
QWT version: 6.1.3
QScintilla2 version: 2.10.8
Compiled against PROJ: 6.3.2
Running against PROJ: Rel. 6.3.2, May 1st, 2020
OS version: Windows 10 (10.0)
Active Python plugins:
b4udignl2;
pdokservicesplugin;
pdok_locatieserver_locator_filter;
SpreadsheetLayers;
xyToPoint;
db_manager;
MetaSearch;
processing

@LarsAtOmkeere LarsAtOmkeere added the Bug Either a bug report, or a bug fix. Let's hope for the latter! label Mar 4, 2021
@gioman
Contributor

gioman commented Mar 5, 2021

I added a layer 'pand' from the WFS layer BAG (https://geodata.nationaalgeoregister.nl/bag/wfs/v1_1) through the "PDOK Services plugin".

@LarsAtOmkeere that seems a very large layer, on a very slow service.

How many features have you selected?

Does it make any difference if you load the layer with the native WFS client?

@rduivenvoorde pinging you as author of "PDOK Services plugin"

@gioman gioman added the Feedback Waiting on the submitter for answers label Mar 5, 2021
@LarsAtOmkeere
Author

LarsAtOmkeere commented Mar 5, 2021 via email

@gioman
Contributor

gioman commented Mar 5, 2021

The amount of features didn’t make a difference.

@LarsAtOmkeere does it make any difference if you load that WFS layer with the QGIS native WFS client?

@LarsAtOmkeere
Author

LarsAtOmkeere commented Mar 5, 2021 via email

@gioman
Contributor

gioman commented Mar 5, 2021

This time I also tried writing to an existing Geopackage. When I do that, the process finishes. So the bug only seems to arise when writing to a new geopackage.

@LarsAtOmkeere that does not seem to make any difference for me on 3.18 on Linux.

@gioman gioman changed the title export selection fails Export selection from a large WFS layer fails Mar 5, 2021
@gioman gioman added WFS data provider Vectors Related to general vector layer handling (not specific data formats) and removed Feedback Waiting on the submitter for answers labels Mar 5, 2021
@gioman
Contributor

gioman commented Mar 5, 2021

And by the way, the export process is unkillable...

@LarsAtOmkeere
Author

LarsAtOmkeere commented Mar 5, 2021 via email

@gioman
Contributor

gioman commented Mar 5, 2021

For me it is possible to kill the process. When I click the right button (“Afbreken” in Dutch) it tells me the process is finished and the data is added to the map. When I click the left button (“Verbergen” in Dutch), the dialog disappears, but the process stays active, but does not finish. And I am no longer capable of getting the dialog back. From that moment on I cannot kill the process.

One way or the other, it is messy and not good UX.

@gioman
Contributor

gioman commented Mar 5, 2021

@LarsAtOmkeere Well... it seems that if you leave it doing its thing for an awfully long time, it works as expected. I wonder if this is just because the endpoint is very slow.

@LarsAtOmkeere
Author

LarsAtOmkeere commented Mar 5, 2021 via email

@rduivenvoorde
Contributor

@gioman @LarsAtOmkeere in my experience WFS is workable for smaller datasets or subsets.
I already 'optimized' the plugin to at least behave the same as when you use the WFS provider: it uses paging, WFS 2.0, etc.
(press F12 to see all requests that are sent to the services)

Note that the capabilities instruct you to get only 1000 buildings at a time (which is actually pretty low for such a dataset... 10000 would be a better number), so fetching larger areas sometimes takes several tens of requests.

In this case, a huge dataset with 8 million records (all buildings in NL), it works if you select a small area, say 2000 buildings (panden).
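To put the page size in perspective, here is a minimal sketch of the arithmetic. The function name `pages_needed` is made up for illustration; the figures (about 8 million buildings, 1000 or 10000 features per request, a 2000-building selection) are the ones quoted above.

```python
def pages_needed(feature_count: int, page_size: int) -> int:
    """Number of paged GetFeature requests needed to fetch feature_count features."""
    # Integer ceiling division, so a partly filled last page still counts.
    return (feature_count + page_size - 1) // page_size


# Whole dataset at the advertised page size of 1000.
print(pages_needed(8_000_000, 1000))
# Whole dataset if the service allowed 10000 per request.
print(pages_needed(8_000_000, 10_000))
# A small selection of about 2000 buildings.
print(pages_needed(2000, 1000))
```

At 1000 features per page, fetching the whole layer means thousands of sequential requests, while a small area needs only a handful.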

Also note that QGIS sometimes misses some parts when requesting larger areas. All grey buildings with a house number should be retrieved (pink features):

Screenshot-20210305142416-1495x1069

And even panning or zooming does NOT give you the buildings anymore. QGIS has decided that everything in that area was received.

I do not know WHAT triggers QGIS to stop requesting (as in: deciding it does not need to request anymore...: the number of records sent? @rouault can you maybe tell?). Is QGIS 'guessing' an extent from the number of features? So if the requested extent is too large for the 'max step size', will you miss features?

If you request a smaller area (Bantamstraat in Haarlem; there is a geocoder in the pdokservicesplugin), QGIS will just send you everything:

Screenshot-20210305142554-1280x935

My guess is that if you keep the needed requests small (so start with a small area and pan around) it works better than trying to retrieve a whole city in one go.

Selecting and then saving to file is an even trickier process, not sure why. I would think that the features are already on the client side, but often you will see QGIS fire a lot of requests again...
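The expectation that already-downloaded features can be served from the client side can be sketched with a plain SQLite lookup by feature id. This is only an illustration under assumptions: the table `feature_cache`, its columns, and the ids such as `pand.1` are hypothetical and do not reflect the actual layout of the QGIS WFS cache.

```python
import sqlite3


def cached_features(conn, fids):
    """Return (found, missing): cached payloads by fid, plus the fids not in the cache."""
    fids = list(fids)
    if not fids:
        return {}, []
    placeholders = ",".join("?" for _ in fids)
    rows = conn.execute(
        f"SELECT fid, payload FROM feature_cache WHERE fid IN ({placeholders})",
        fids,
    ).fetchall()
    found = {fid: payload for fid, payload in rows}
    missing = [fid for fid in fids if fid not in found]
    return found, missing


# Hypothetical in-memory cache holding two already-downloaded features.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature_cache (fid TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO feature_cache VALUES (?, ?)",
    [("pand.1", "geometry-1"), ("pand.2", "geometry-2")],
)

found, missing = cached_features(conn, ["pand.1", "pand.2", "pand.3"])
print(sorted(found))  # ids that need no server request
print(missing)        # ids that would still have to be requested
```

With a lookup like this, only the ids in `missing` would require a round trip to the server.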

All in all, WFS for a dataset like the BAG is maybe not a good marriage in combination with QGIS.

Note that it is possible (from other sources) to retrieve the same dataset (for the whole of NL) as a postgis dump.
But it would be very cool if this would 'just work'...

We do have a lot of public services to test with. I think both GeoServer and MapServer...
So just use the pdokservicesplugin to find some...

@LarsAtOmkeere
Author

LarsAtOmkeere commented Mar 5, 2021 via email

@rduivenvoorde
Contributor

Here I can easily make a SMALL selection (230 buildings in the screenshot) and save it to a gpkg:

Screenshot-20210305150522-1170x894

But as soon as it is larger (? maybe 1000) you will see that QGIS is apparently looking up the features in the spatialite/sqlite database cache:

Screenshot-20210305151812-1816x373

But at the same time it starts to send requests to the server (as said: in steps of 1000) BUT WITHOUT BBOX:

https://geodata.nationaalgeoregister.nl/bag/wfs/v1_1?SERVICE=WFS&language=dut&SERVICE=WFS&REQUEST=GetFeature&VERSION=2.0.0&TYPENAMES=bag:pand&STARTINDEX=82000&COUNT=1000&SRSNAME=urn:ogc:def:crs:EPSG::28992

I really do not know WHY; maybe it tries to retrieve all features and then match them against the ones in the 'selection'/cache? In this case that is not very efficient (trying to retrieve 8,000,000 buildings, as we request without bbox...).
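For comparison, here is a small sketch that builds the same kind of paged GetFeature URL with and without a `BBOX` parameter. The parameter names and values are taken from the request above (minus its `language` parameter and duplicated `SERVICE`); the helper `get_feature_url` and the bounding box coordinates are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://geodata.nationaalgeoregister.nl/bag/wfs/v1_1"
CRS = "urn:ogc:def:crs:EPSG::28992"


def get_feature_url(start_index, count=1000, bbox=None):
    """Build a paged WFS 2.0 GetFeature URL, optionally restricted to a bounding box."""
    params = {
        "SERVICE": "WFS",
        "REQUEST": "GetFeature",
        "VERSION": "2.0.0",
        "TYPENAMES": "bag:pand",
        "STARTINDEX": start_index,
        "COUNT": count,
        "SRSNAME": CRS,
    }
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        # WFS 2.0 KVP form: four corner coordinates followed by the CRS.
        params["BBOX"] = f"{minx},{miny},{maxx},{maxy},{CRS}"
    # Keep ':' and ',' readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=":,")


# Unrestricted request, like the one observed: pages through the whole layer.
print(get_feature_url(82000))
# Same page request limited to a hypothetical 1 km x 1 km extent.
print(get_feature_url(0, bbox=(103000, 488000, 104000, 489000)))
```

Without `BBOX`, the `STARTINDEX`/`COUNT` pair walks the entire feature type, which is why a small selection can still trigger thousands of requests.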

Anyway: WFS is tricky business and QGIS is doing its best. We have already had a lot of different ways of doing this, but there are always use cases which miss something.
On the other hand: maybe such huge datasets should not be a WFS (but a flat gpkg download service)?

@LarsAtOmkeere as said: I think it is easier to download the postgis dump, load it in postgis and work with that... although of course you will then never have the latest/greatest data version....

@rduivenvoorde
Contributor

One other test I did: I hardcoded mShared->mPageSize = 10000 in qgswfsprovider.cpp.
So EVERY request to the WFS server said COUNT=10000, but after the first request the service just sends 1000 and QGIS stops requesting (I think because it decided that, as it did not receive the whole page, it was done)...
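The suspected behaviour can be reproduced with a toy simulation. This is not the QGIS implementation: `serve_page` is a hypothetical server that never returns more than 1000 features, and `fetch_all` is a client that treats any page shorter than the requested size as the end of the result set.

```python
def serve_page(total, start_index, requested_count, server_cap=1000):
    """Hypothetical server: returns feature ids, never more than server_cap per request."""
    n = min(requested_count, server_cap, max(total - start_index, 0))
    return list(range(start_index, start_index + n))


def fetch_all(total, page_size, server_cap=1000):
    """Client that stops paging as soon as it receives a short page."""
    received = []
    start = 0
    while True:
        page = serve_page(total, start, page_size, server_cap)
        received.extend(page)
        if len(page) < page_size:  # short page: assume nothing is left
            break
        start += len(page)
    return received


# Page size matches the server cap: all 5000 features arrive.
print(len(fetch_all(total=5000, page_size=1000)))
# Page size above the server cap: the first capped page looks "short" and paging stops.
print(len(fetch_all(total=5000, page_size=10000)))
```

Under these assumptions, asking for more than the server is willing to return makes the very first response look like the last one, which matches the observation above.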

@gioman
Contributor

gioman commented Mar 5, 2021

Anyway: WFS is tricky business and QGIS is doing its best. We have already had a lot of different ways of doing this, but there are always use cases which miss something.
On the other hand: maybe such huge datasets should not be a WFS (but a flat gpkg download service)?

@rduivenvoorde

It seems to me that the problem is certainly not limited to the service/layer described in this ticket. I spent a very frustrating time testing heavy WFS layers (from a few different endpoints) on QGIS 3.18 and the experience was very poor, both when just loading a layer and when I tried to select even a few features (even just 1) and then export them.

I think we can't just say "big layers should not go in a WFS service"; we should try to give a user experience that is at least usable, especially if the "competition" does a better job.

Today I noticed:

  1. impossible to kill the export task if it takes too long; it is necessary to kill QGIS

  2. the "hide" button (of the WFS progress bar) frequently does not work when loading large WFS layers

  3. very long loading times, even from endpoints that are known to be quite fast (and the servers used do not seem to be the bottleneck; I tried at least GeoServer and MapServer).

It could have been just a very bad day... hopefully, but maybe not.

rouault added a commit to rouault/QGIS that referenced this issue May 21, 2021
…res by fids...

and when they are already in the local cache.

Fixes qgis#42049
@rouault rouault self-assigned this May 21, 2021
nyalldawson pushed a commit that referenced this issue May 21, 2021
…res by fids...

and when they are already in the local cache.

Fixes #42049
Commits with the same message were referenced again: rouault added one to rouault/QGIS on May 21, 2021, and nyalldawson pushed further ones on May 25, May 31, Jun 14, and Jun 21, 2021.