Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix featureCount on postgres view when flag estimatedmetadata is set #32114

Merged
merged 2 commits into from Nov 21, 2019

Conversation

troopa81
Copy link
Contributor

@troopa81 troopa81 commented Oct 4, 2019

We can't estimate rows on postgres view.

Checklist

  • Commit messages are descriptive and explain the rationale for changes
  • Commits which fix bugs include Fixes #11111 at the bottom of the commit message
  • I have read the QGIS Coding Standards and this PR complies with them
  • New unit tests have been added for core changes
  • I have run the scripts/prepare-commit.sh script before each commit
  • I have evaluated whether it is appropriate for this PR to be backported, backport requests are left as label or comment

@troopa81 troopa81 added Bug Either a bug report, or a bug fix. Let's hope for the latter! backport release-3_4 labels Oct 4, 2019
@troopa81
Copy link
Contributor Author

troopa81 commented Oct 4, 2019

Not sure about the backport 3.4 though. It's quite an easy fix, so I would say yes

@nyalldawson
Copy link
Collaborator

What's the consequence of the bug? Only fixes for crashes or data corruption should be backported now.

@troopa81
Copy link
Contributor Author

troopa81 commented Oct 7, 2019 via email

@jef-n
Copy link
Member

jef-n commented Oct 23, 2019

Doesn't this make loading expensive views painfully slow?

@strk
Copy link
Contributor

strk commented Oct 23, 2019

It surely does. I suggest to use EXPLAIN and parse its output, for estimating number of rows contained in a query.

@strk
Copy link
Contributor

strk commented Oct 23, 2019

EXPLAIN could actually be also used for queries, dropping another performance bottleneck when user asked for "estimated" metadata...

Copy link
Contributor

@strk strk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think EXPLAIN should be used for the estimation

src/providers/postgres/qgspostgresprovider.cpp Outdated Show resolved Hide resolved
tests/src/python/test_provider_postgres.py Outdated Show resolved Hide resolved
@nyalldawson nyalldawson added backport release-3_10 and removed Frozen Feature freeze - Do not merge! labels Oct 25, 2019
@troopa81
Copy link
Contributor Author

It surely does. I suggest to use EXPLAIN and parse its output, for estimating number of rows contained in a query.

Isn't it dangerous to rely on EXPLAIN output to estimate the number of rows? How can we be sure that the output is not gonna change in the future, that someday rows will be changed in number_of_rows ?

@strk
Copy link
Contributor

strk commented Oct 28, 2019

Isn't it dangerous to rely on EXPLAIN output to estimate the number of rows? How can we be sure that the output is not gonna change in the future, that someday rows will be changed in number_of_rows ?

Explain has XML output support since 9.0 (https://www.postgresql.org/docs/9.0/sql-explain.html).
Shall format change in future we'll deal with it. I don't see it that dangerous.

@troopa81 troopa81 force-pushed the fix_feature_count_estimated_metadata branch from 2a47c63 to 7352e3f Compare October 31, 2019 10:12
@troopa81
Copy link
Contributor Author

Explain has XML output support since 9.0 (https://www.postgresql.org/docs/9.0/sql-explain.html).

Thanks for the tip, I was not aware about explain formats. I choose JSON over XML, because it seems to me easy to write 7352e3f .

@troopa81 troopa81 force-pushed the fix_feature_count_estimated_metadata branch from 7352e3f to f3705ab Compare October 31, 2019 15:39
@jef-n
Copy link
Member

jef-n commented Oct 31, 2019

should be guarded by connectionRO()->pgVersion() >= 90200 as other occurrences of json. And the featureCount is unreliable with "estimated metadata" anyway and widely used to avoid expensive queries. I think we should just return -1 instead of 0, if there's no shortcut like EXPLAIN.

@troopa81
Copy link
Contributor Author

troopa81 commented Nov 4, 2019

should be guarded by connectionRO()->pgVersion() >= 90200

OK, I'll add it (with 90000, this feature was already available in 9.0 )

I think we should just return -1 instead of 0, if there's no shortcut like EXPLAIN

And displaying -1 in the UI? or "Unknown"?

@troopa81 troopa81 force-pushed the fix_feature_count_estimated_metadata branch from f3705ab to 5f43b3f Compare November 7, 2019 13:41
@troopa81
Copy link
Contributor Author

troopa81 commented Nov 7, 2019

I let the feature count display as it is, so if the estimate doesn't work, it will display -1

@troopa81
Copy link
Contributor Author

@strk is it OK for requested changes?

Copy link
Contributor

@strk strk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks fine here

@haubourg
Copy link
Member

Anyone to merge this ?

@elpaso elpaso merged commit 8913fb3 into qgis:master Nov 21, 2019
@haubourg
Copy link
Member

thanks @elpaso !

@Gustry
Copy link
Contributor

Gustry commented Nov 11, 2020

@troopa81 I understand why you made this change for views, it was indeed correct.
But why you didn't keep the previous behavior for tables ?

The Plan Rows makes quite different results and is not following the VACUUM ANALYSE that people might run on their database.

Look this example:

EXPLAIN (FORMAT JSON) SELECT 1 FROM pgmetadata.contact
-- QUERY PLAN [{'Plan': {'Node Type': 'Seq Scan', 'Parallel Aware': False, 'Relation Name': 'contact', 'Alias': 'contact', 'Startup Cost': 0.0, 'Total Cost': 15.1, 'Plan Rows': 510, 'Plan Width': 4}}]
-- > 'Plan Rows': 510

VACUUM ANALYSE pgmetadata.contact;
SELECT reltuples as approximate_row_count  FROM pg_class WHERE oid = 'pgmetadata.contact'::regclass;
-- 0.0

SELECT COUNT(*) FROM pgmetadata.contact;
-- 0

EXPLAIN (FORMAT JSON) SELECT 1 FROM pgmetadata.contact;
-- Still 510

@troopa81
Copy link
Contributor Author

@Gustry But why you didn't keep the previous behavior for tables ?

Because it would cost an extra request to check if mQuery is a view or a table (calling relkind)

But, I reproduce your issue when you remove all your data from your table. I'll try to find some time to fix this.

@Gustry
Copy link
Contributor

Gustry commented Nov 12, 2020

Because it would cost an extra request to check if mQuery is a view or a table

Hum, I though indeed we would have this info already in the context.

But, I reproduce your issue when you remove all your data from your table.

My tables are indeed neatly all empty for now.

But I tried with a simple line, it's still wrong.

SELECT COUNT(*) FROM pgmetadata.contact;
-- 1

EXPLAIN (FORMAT JSON) SELECT 1 FROM pgmetadata.contact;
-- Still 510

Should I create a ticket ?

@troopa81
Copy link
Contributor Author

Hum, I though indeed we would have this info already in the context.

It could be, I don't see any reason to retrieve this information several times

But I tried with a simple line, it's still wrong.

My guess here is that the Plan rows approach is more approximate than the reltuples one. I tried with a table containing 3+06 rows and removed all rows. The reltuples did say 0 while the Plan Rows says 300. Both have changed and both are kind of true for an estimation. But the reltuples is more true :)

Should I create a ticket ?

Yes, please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter! Needs Backporting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants