Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: Only set distinct on count clause if explicitely set to improve performance #3140

Merged
merged 1 commit into from Oct 17, 2023

Conversation

Sebobo
Copy link
Member

@Sebobo Sebobo commented Aug 24, 2023

F.e. Postgres has performance issues with large datasets and the DISTINCT clause. In a test this change reduced the query time of a count query for ~900.000 entities by >80%.

In a custom project this affected their Neos Media.UI in which the following results were found:

  • Count all assets | 580ms -> 260ms
  • Query 20 assets | 690ms -> 350ms
  • Query 100 assets | 990ms -> 650ms
  • Module load | 1900ms -> 1400ms

Review instructions

Everything should work the same, as #415 already sets the distinct flag where (possibly) necessary.

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked wit !!! and have upgrade-instructions

F.e. Postgres has performance issues with large datasets and the DISTINCT clause.
In a test this change reduced the query time of a complex query for ~900.000 by >80%.
@albe
Copy link
Member

albe commented Aug 24, 2023

I guess that's a good solution 👍. Regarding #415 I just noticed that it was actually reverted due to issues: #1293, but that should not have too much influence on this. Implicit joins were returning duplicates before, so counts on them were off. Now they're at least faster at that :D

@albe
Copy link
Member

albe commented Aug 24, 2023

Or wait, I was never checking a $query->count() on implicit join results, but just the count($result)... https://github.com/neos/flow-development-collection/pull/1293/files#diff-b389146f7ff3bedefca76487b8b3421d40ebf42cb4f24e854d129897dacc03a8L269
So maaaaybeee this is indeed a bit more breaky than my last comment suggests. But breaky in the sense it behaves differently than before, though more logical, i.e. ->count() would return 2 now, when there is a duplicate result/2 result rows.

@Sebobo Sebobo requested a review from kitsunet August 25, 2023 06:57
@Sebobo
Copy link
Member Author

Sebobo commented Aug 25, 2023

Setting a query as distinct should always be explicitly set and not assumed in any case?
Doctrine already does more magic than wished for in some cases.

@Sebobo
Copy link
Member Author

Sebobo commented Aug 25, 2023

Just got results from my customers production system (Postgres, 850.000 assets) based only on this change:

Bildschirmfoto 2023-08-25 um 09 04 16

Copy link
Member

@kdambekalns kdambekalns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mhsdesign mhsdesign requested a review from fcool October 12, 2023 07:29
@kitsunet kitsunet merged commit 5d20cd2 into 7.3 Oct 17, 2023
10 of 11 checks passed
@kitsunet kitsunet deleted the bugfix/distinct-count-clause branch October 17, 2023 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants