Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pg bigint pk not text #34780

Closed
wants to merge 412 commits into from

Conversation

espinafre
Copy link
Contributor

On PostgreSQL tables whose primary keys are of type bigint/bigserial,
QGIS used to cast the primary keys to text, causing the database to do
bigint::text castas and full table scans instead of using the indices,
causing in turn slowness on updates and other queries.

This fix creates a new QgsPostgresPrimaryKeyType, PktInt64. PostgreSQL
don't know about unsigned types, so the PK type PktUint64 loses sense.
Knowing that the PK is an integer type, QGIS won't cast it to text
anymore, thus enabling the correct use of database indices.

Fixes #34077

@jef-n
Copy link
Member

jef-n commented Feb 28, 2020

QGIS uses signed 64bit feature ids. Positive values for features from the data source and negative values for feature ids of new unsaved features. Where possible QGIS uses the actual key from the datasource, but for datasource that have wider keys (eg. full 64bit, bigint, strings, compound keys etc.), we remap the keys to positive features ids. See also #3036

@espinafre
Copy link
Contributor Author

QGIS uses signed 64bit feature ids. Positive values for features from the data source and negative values for feature ids of new unsaved features. Where possible QGIS uses the actual key from the datasource, but for datasource that have wider keys (eg. full 64bit, bigint, strings, compound keys etc.), we remap the keys to positive features ids. See also #3036

Thanks for pointing that out. So, PktUint64 stays. The problem with forcing bigint/bigserial keys to PktFidMap is that QGIS casts those to text, generating queries such as:

UPDATE "myschema"."tb_my_big_table" SET "class_of_feature"='Swamp' WHERE "id"::text='9067684'

This, on a table with currently 10 million records (polygons), takes 4+ seconds on a very robust PostgreSQL 12 server, with only a single user; when editing a . We chose to use bigserial for the id because we expect this table to grow larger than 2 billion records. With the changes in this PR, the generated queries are as follows:

UPDATE "myschema"."tb_my_big_table" SET "class_of_feature"='Swamp' WHERE "id"=9067684
and each query/update takes under 0.2 miliseconds.

@elpaso
Copy link
Contributor

elpaso commented Feb 29, 2020

I'm afraid that the only robust solution is to check at layer load time if the pk column contains any value <= 0 and use the pk map in that case.

If the check passes, we can probably handle the signed pk as if it was an unsigned one.

@m-kuhn
Copy link
Member

m-kuhn commented Mar 2, 2020

It should be possible to use aa pk map locally and still send the requests as integer.
Or am I missing something?

@jef-n
Copy link
Member

jef-n commented Mar 2, 2020

It should be possible to use aa pk map locally and still send the requests as integer.
Or am I missing something?

Yes, I'd think the main problem is the missing support for bigint as type - and that's why it is handled as text. Did anyone try int8 instead?

@github-actions github-actions bot added this to the 3.14.0 milestone Mar 4, 2020
jef-n and others added 24 commits March 6, 2020 09:53
Add missing attributes at QgsAttributeEditorContext creation.
Try to make PG raster test more robust
Adds the ability to set a temporal range for a project
Adds:

@map_start_time: Start of the map's temporal time range (as a datetime value)
@map_end_time: End of the map's temporal time range (as a datetime value)
@map_interval: Duration of the map's temporal time range (as an interval value)
* MDAL 0.5.90 : support for custom Logger and 1D meshes
* [FEATURE] [MESH] Support rendering of 1D meshes, see qgis/QGIS-Enhancement-Proposals#164

1D mesh consist of edges (edge is straight line segment with 2 vertices) and the data that is defined on either
vertices or edges. Such data can be loaded by MDAL and rendered as mesh layer in QGIS.
This gives users control over where a callout should join to the label
text. (Previously, you only had control over where the callout would
join to the corresponding feature geometry).

Choices include:
- Closest point (previous behavior)
- Label Centroid
- Fixed corners: Top left/top right/bottom left/bottom right/etc

Data defined control over the label anchor is also possible
@elpaso
Copy link
Contributor

elpaso commented Mar 14, 2020

Regarding point 6, looking at the code it seems there is something wrong here: if I get it right,

mPrimaryKeyType = i ? PktFidMap : pkType( fld );
is hit whenever a PK is specified in the URI (the other branch of the if is for when the PK needs to be guessed. So, in case the PK is known from the URI it is always treated like a FID map and that is probably wrong, we should get the PK type in that case and act accordingly (but that would not be enough to solve your issue of course).

The other consideration is that given the way QGIS is implemented you cannot use negative keys outside the provider internals (i.e. a QgsFeature cannot leave the provider with a negative ID unless they are new features that do not exist outside of the current QGIS editing session).

So, option 6 may work in a few cases (when there are no negative PKs in the existing table and there is a guarantee that other applications will never add rows with negative PKs to the table) and it has the "problem" of requiring an extra query to determine the presence of negative PKs at load time.

Solution 7 may be more easy to implement and looks more robust, I think you should work on that one first, you should be able use the map even when you have negative IDs in the existing table.

In any event, if you do want to continue work on this issue (which is of course very welcome) I think you should start adding a few failing test data cases to
https://github.com/qgis/QGIS/blob/9e5966cf8c079c49109254f5b8a3847e3f00c017/tests/testdata/provider/testdata_pg.sh (maybe create another .sql file for them) and implement the failing test cases in https://github.com/qgis/QGIS/blob/9e5966cf8c079c49109254f5b8a3847e3f00c017/tests/src/python/test_provider_postgres.py

With a decent test bed it will be much easier to test any potential solution we can come up with.

elpaso and others added 8 commits March 14, 2020 11:18
Other providers return an "invalid" col type from geometryColumnTypes
let's keep it that way (for now)
Returns the correct version for the Proj library linked against the
running PostGIS.
On PostgreSQL tables whose primary keys are of type bigint/bigserial,
QGIS used to cast the primary keys to text, causing the database to do
bigint::text castas and full table scans instead of using the indices,
causing in turn slowness on updates and other queries.

This fix creates a new QgsPostgresPrimaryKeyType, PktInt64. PostgreSQL
don't know about unsigned types, so the PK type PktUint64 loses sense.
Knowing that the PK is an integer type, QGIS won't cast it to text
anymore, thus enabling the correct use of database indices.
PostgreSQL bigint primary keys are no longer cast to text on
queries/updates. Internally we keep a FID map between the real database
value and the internal QGIS feature ID. Thus, we preserve the QGIS
semantics of using negative FIDs for newly added features, while still
allowing users to edit attributes whose PKs are bigint non-positive.
@espinafre
Copy link
Contributor Author

Following @elpaso 's recommendations, I've gone with @m-kuhn and @jef-n idea, which is to treat int8 keys as fidmaps internally, but sending those values to the DBMS as bare values, without quoting or casting them to text. My preliminary test went very well, as I could edit fields with both positive and negative IDs, and they were sent to the DBMS as they were originally; also, because we are using the existing fidmap mechanism, we don't have clashes with QGIS internal FID handling. Looks very good to me!

Now I will finish my homework (assigned by @elpaso ) and write those formal tests. Thank you so much!

@espinafre
Copy link
Contributor Author

I shouldn't have rebased my branch before merging, right?...

@m-kuhn
Copy link
Member

m-kuhn commented Mar 15, 2020

Thanks for the update. Promosing!

There is nothing wrong about rebasing per se. But here, something seems to have gone wrong.

What I would do:

Start a new branch from QGIS master and cherry-pick your commits onto it

E.g.

git checkout master
git checkout -b pg_bigint_pk_no_cast
git cherry-pick 626a45b d10a520 cda6ce7

Fingers crossed

PS:
I just read again "rebase before merge" -> it's normally either/or. You rebase or you merge. After a rebase you have to force push.
Very often you don't need to do anything at all and just continue with what you have. The most likely reason to have to do rebase or merge is when your pull request reports a conflict.

@roya0045
Copy link
Contributor

That rebase looks wrong.

@espinafre
Copy link
Contributor Author

@roya0045 , yes, it does. This is what I did:

git fetch upstream
git checkout master # from my local repo
git merge upstream/master
git push # to my GitHub fork
git checkout fix_pg_bigint_pk_not_text # this branch
git rebase master
git push

and that's how it got so messed up.

@roya0045
Copy link
Contributor

@espinafre why didn't you just git rebase upstream/master directly in your branch? That always worked fine for me.

@espinafre
Copy link
Contributor Author

@roya0045 because I hadn't thought of that, dumb me :D I'll do that next time, thanks for the tip!

@espinafre
Copy link
Contributor Author

Please see #35162 instead; it contains only the changes that fixes the bug, without the whole merge/rebase mess.

@espinafre espinafre closed this Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QGIS casts bigint Primary key field to a text field