Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OGR provider: only enable journal_mode = wal when editing. requires GDAL >= 3.4.2 (fixes #23991) #47098

Merged
merged 4 commits into from Feb 1, 2022

Conversation

rouault
Copy link
Contributor

@rouault rouault commented Jan 31, 2022

This requires the NOLOCK open option of the GPKG driver added per OSGeo/gdal#5207
For earlier GDAL version, previous behaviour is kept.

With GDAL >= 3.4.2, when creating a QgsOgrProvider object, we first open
it in update mode without forcing WAL, to get the appropriate
capabilities, close it, and re-open it in read-only mode with the
NOLOCK=YES open option. This option will only be honoured if the file
wasn't already in WAL mode.
When editing a layer, the file is re-opened in update mode and with
enabling WAL to avoid blocking between readers and writers.
When closing a file, journal mode is attempted to be reset to DELETE as
before.

I've verified that test_provider_ogr_gpkg.py and test_provider_ogr.py
pass locally with GDAL master + OSGeo/gdal#5207

Could be worth backporting to LTR, but perhaps would require some additional testing from others

@github-actions github-actions bot added this to the 3.24.0 milestone Jan 31, 2022
@nirvn
Copy link
Contributor

nirvn commented Jan 31, 2022

I'd love to see this backported. Maybe after it lives on 3.24 for a month or so.

@gioman
Copy link
Contributor

gioman commented Jan 31, 2022

I'd love to see this backported. Maybe after it lives on 3.24 for a month or so.

Agree, especially if this means that the GPKG open in read only mode will stop to be marked as changed on the file system.

…DAL >= 3.4.2 (fixes qgis#23991)

This requires the NOLOCK open option of the GPKG driver added per OSGeo/gdal#5207
For earlier GDAL version, previous behaviour is kept.

With GDAL >= 3.4.2, when creating a QgsOgrProvider object, we first open
it in update mode without forcing WAL, to get the appropriate
capabilities, close it, and re-open it in read-only mode with the
NOLOCK=YES open option. This option will only be honoured if the file
wasn't already in WAL mode.
When editing a layer, the file is re-opened in update mode and with
enabling WAL to avoid blocking between readers and writers.
When closing a file, journal mode is attempted to be reset to DELETE as
before.

I've verified that test_provider_ogr_gpkg.py and test_provider_ogr.py
pass locally with GDAL master + OSGeo/gdal#5207
@kannes
Copy link
Contributor

kannes commented Feb 1, 2022

This would fix #23991, thank you so much in advance.

@nyalldawson nyalldawson merged commit fe410ec into qgis:master Feb 1, 2022
@florisvdh
Copy link
Contributor

Thank you SO very much for handling this! 👍 🎉 Looking forward to the next releases.

FYI @paleolimbot

@gdt
Copy link
Contributor

gdt commented Feb 2, 2022

When a file is opened as NOLOCK=yes, does that really mean no locking? Isn't it then the case that reading from the geopackage is unsound if some other program has it open?

I think an underlying issue is likely a difference of opinion as to the nature of geopackage files. qgis seems to view them as private data files within the qgis world, as opposed to databases that can be accessed by others, or at least that's how it seems reading the discussions. sqlite is an odd case in databases, as it can be both private storage for a program (perhaps the more common case) as well as functioning like a regular database.

@rouault
Copy link
Contributor Author

rouault commented Feb 2, 2022

When a file is opened as NOLOCK=yes, does that really mean no locking?

yes

Isn't it then the case that reading from the geopackage is unsound if some other program has it open?

see bottom of comment of OSGeo/gdal#5207 (comment)

@agiudiceandrea
Copy link
Contributor

@rouault tested with QGIS 3.24.0 and GDAL 3.4.2 on Windows:

  • it seems not possible to set NOLOCK=YES as default for GPKG files dragged and dropped in QGIS
  • the possibility to set the NOLOCK=YES for GPKG files is only available if the GPKG file is added using the Add Vector Layer window
    Is this behavior expected? If yes, could it be improved in order to let the user set the NOLOCK=YES option as default for dragged and dropped GPKG files?

@rouault
Copy link
Contributor Author

rouault commented Mar 24, 2022

it seems not possible to set NOLOCK=YES as default for GPKG files dragged and dropped in QGIS. the possibility to set the NOLOCK=YES for GPKG files is only available if the GPKG file is added using the Add Vector Layer window. Is this behavior expected? If yes, could it be improved in order to let the user set the NOLOCK=YES option as default for dragged and dropped GPKG files?

The NOLOCK=YES open option is mostly an internal detail. Users shouldn't need to set it generally. The provider is supposed to do the right thing, and set it automatically when the layer is in read-only mode, whatever how it was added to the project

@agiudiceandrea
Copy link
Contributor

agiudiceandrea commented Mar 24, 2022

The provider is supposed to do the right thing, and set it automatically when the layer is in read-only mode, whatever how it was added to the project

It seems to me it is not the case. Adding a GPKG layer (not opened by any other process) with drag and drop or using the Add Vector Layer window and leaving the NOLOCK option as <Default>, then the whal and shm files are created and the modify date of the gpkg file is changed even if the layer in QGIS is not edited.
Did I miss something or I misunderstand the point of this PR?

@rouault
Copy link
Contributor Author

rouault commented Mar 24, 2022

Has your QGIS version being rebuilt against GDAL 3.4.2 headers (and not just using it at runtime) ?

@agiudiceandrea
Copy link
Contributor

agiudiceandrea commented Mar 24, 2022

Has your QGIS version being rebuilt against GDAL 3.4.2 headers (and not just using it at runtime) ?

Actually not. I'm using the GDAL 3.4.2 runtime with QGIS 3.24.0 built against GDAL 3.4.1. So it should is this the issue. Then I will try again with a QGIS version built against GDAL 3.4.2. Thanks for the clarification. I apologise for the noise...

@agiudiceandrea
Copy link
Contributor

agiudiceandrea commented Mar 28, 2022

Hi rduivenvoorde @rouault, I've tested this patch with QGIS 3.24.1 (qgis-3.24.1-2 from OSGeo4W on Windows 10) built against GDAL 3.4.2 and using GDAL 3.4.2 runtime DLL.

The behaviour seems correct to me when a GPKG file is dragged and dropped in QGIS or added to a map using the Browser panel (in such cases the whal and shm files are not created and the modify date of the gpkg file is not changed).

If the GPKG file is added to a map using the Vector Layer window, then the behaviour is always the same (whal and shm files are not created and the modify date of the gpkg file is not changed) however the user sets the NOLOCK option (Yes, No, <Default>).
Is this behaviour expected? If yes, what's the point to expose the NOLOCK option to the user in QGIS?

@rduivenvoorde
Copy link
Contributor

@agiudiceandrea I was mentioned in your comment, but you probably meant @rouault ?
Or do you want me to double check something...?

@agiudiceandrea
Copy link
Contributor

I was mentioned in your comment

Please, excuse me! I meant to tag Even Rouault.

rouault added a commit to rouault/QGIS that referenced this pull request Mar 28, 2022
This option is automatically set by the OGR provider. No need for the
user to mess with it.

Refs qgis#47098 (comment)
@rouault
Copy link
Contributor Author

rouault commented Mar 28, 2022

If yes, what's the point to expose the NOLOCK option to the user in QGIS?

it shouldn't be exposed (it is currently, as it is done automatically since QGIS auto-discovers open options from GDAL driver metadata). Fixed per #47992

qgis-bot pushed a commit that referenced this pull request Mar 30, 2022
This option is automatically set by the OGR provider. No need for the
user to mess with it.

Refs #47098 (comment)
qgis-bot pushed a commit that referenced this pull request Mar 30, 2022
This option is automatically set by the OGR provider. No need for the
user to mess with it.

Refs #47098 (comment)
nirvn pushed a commit that referenced this pull request Mar 31, 2022
This option is automatically set by the OGR provider. No need for the
user to mess with it.

Refs #47098 (comment)
nirvn pushed a commit that referenced this pull request Mar 31, 2022
This option is automatically set by the OGR provider. No need for the
user to mess with it.

Refs #47098 (comment)
@agiudiceandrea
Copy link
Contributor

Hi @rouault, it seems to me the issues reported at #48003 and #48024 could be related to this PR.

lbartoletti pushed a commit to Oslandia/QGIS_CI that referenced this pull request Apr 1, 2022
This option is automatically set by the OGR provider. No need for the
user to mess with it.

Refs qgis/QGIS#47098 (comment)
rouault added a commit to rouault/QGIS that referenced this pull request Apr 2, 2022
Fixes qgis#48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before qgis#47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProviderUtils which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
rouault added a commit to rouault/QGIS that referenced this pull request Apr 2, 2022
Fixes qgis#48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before qgis#47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
t0b3 added a commit to t0b3/QGIS that referenced this pull request Apr 3, 2022
Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before qgis#47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
nyalldawson pushed a commit that referenced this pull request Apr 4, 2022
Fixes #48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before #47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
qgis-bot pushed a commit that referenced this pull request Apr 4, 2022
Fixes #48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before #47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
qgis-bot pushed a commit that referenced this pull request Apr 4, 2022
Fixes #48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before #47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
nyalldawson pushed a commit that referenced this pull request Apr 4, 2022
Fixes #48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before #47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
nyalldawson pushed a commit that referenced this pull request Apr 4, 2022
Fixes #48012

Basically, for offline editing, we ask QgsOgrProvider to adopt the
behaviour it had before #47098, that is
use SQLite3 journal_mode = WAL even in read-only mode.

Sorry, this "fix" is horrible, but I couldn't get with something better after
many hours of investigation. The root cause seems to be that QgsOfflineEditing
creates layers using direct OGR API or sqlite3 API, and also uses
QgsOgrProvider which has a GDAL dataset reuse mechanism. And all that does
not place nicely at all. Maybe the root root cause is that the OGR GPKG driver
establishes the list of layers at its opening, and if creates a new one behind its
back, which is prone to happen with the dataset caching metchanism, the
GDALDataset::GetLayerByName() method doesn't see the newly created layer.

With that in mind, I tried to add calls to
QgsOgrProviderUtils::invalidateCachedDatasets() but to no avail.
Also QgsOgrProviderUtils::invalidateCachedLastModifiedDate() which is
used by QgsOgrProvider::createEmptyLayer(), but to no avail.

So basically I don't understand why offline editing worked before with a
GPKG backend. I'd say it is pure luck.

Note: I also saw that qgsofflineediting.cpp at line 315 calls
offlineLayer->dataProvider()->invalidateConnections() but that method is
only implemented in the Spatialite provider. I tried to implement it in
the OGR one, but to no avail too.
@agiudiceandrea
Copy link
Contributor

Since this PR causes some regressions (#48003 not yet fixed and #48024 fixed in a not yet released GDAL 3.4.x version), would it be worth to revert its backport (#47301 and cf0af09) at least in the 3.22 LTR branch to prevent the occurrence of such regressions in the upcoming QGIS LTR 3.22.6?

@rouault
Copy link
Contributor Author

rouault commented Apr 7, 2022

Since this PR causes some regressions (#48003 not yet fixed and #48024 fixed in a not yet released GDAL 3.4.x version), would it be worth to revert its backport (#47301 and cf0af09) at least in the 3.22 LTR branch to prevent the occurrence of such regressions in the upcoming QGIS LTR 3.22.6?

hopefully #48119 should fix network issues, but only compile-tested here

@agiudiceandrea
Copy link
Contributor

hopefully #48119 should fix network issues, but only compile-tested here

Thanks a lot again! I will try the MingW64 build and report back ASAP.

@agiudiceandrea
Copy link
Contributor

I can confirm that #48119 fixes #48003 on Windows 10 (QGIS 3.25.0-Master MingW64 build)

@agiudiceandrea
Copy link
Contributor

Hi @rouault, it seems to me that also the issue reported at #48154 ("Adding additional layers to GPKG does not load them correctly") may be related to this PR.

@rouault
Copy link
Contributor Author

rouault commented Apr 10, 2022

Hi @rouault, it seems to me that also the issue reported at #48154 ("Adding additional layers to GPKG does not load them correctly") may be related to this PR.

fix in #48157

@agiudiceandrea
Copy link
Contributor

Hi @rouault, I think the issue #48671, which is similar to #48154, could be related to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants