New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pagination returns wrong number of entries if duplicates exist #23

Closed
michael-e opened this Issue Aug 23, 2012 · 10 comments

Comments

Projects
None yet
3 participants
@michael-e
Member

michael-e commented Aug 23, 2012

I am using the extension with Symphony 2.2.5, so I checked out the last compatible commit (8c64a08).

As far as I see, there has been no fix for my problem in the meantime.

I am using two datasources pulling from the same section. If entries occur in both datasources, the output XML will not contain duplicates (which is right, of course), but it will contain the wrong number of entries – the number is reduced by the number of duplicates.

So if I say "25 per page", it may as well be only 17. Funny: If the last entry on page 1 is a "duplicate" (i.e. will be output by both datasources), it will also occur as first entry on page 2.

I hope that this is something very simple to debug...

@michael-e

This comment has been minimized.

Show comment
Hide comment
@michael-e

michael-e Aug 23, 2012

Member

I just noticed that the total-entries count is wrong as well if there are "duplicates".

Member

michael-e commented Aug 23, 2012

I just noticed that the total-entries count is wrong as well if there are "duplicates".

@michael-e

This comment has been minimized.

Show comment
Hide comment
@michael-e

michael-e Aug 23, 2012

Member

I tracked this down to the fetchByPage function returning the wrong count for $total_rows:

$total_rows = Symphony::Database()->fetchCol('total_rows', 'SELECT FOUND_ROWS() AS `total_rows`');
Member

michael-e commented Aug 23, 2012

I tracked this down to the fetchByPage function returning the wrong count for $total_rows:

$total_rows = Symphony::Database()->fetchCol('total_rows', 'SELECT FOUND_ROWS() AS `total_rows`');
@michael-e

This comment has been minimized.

Show comment
Hide comment
@michael-e

michael-e Aug 23, 2012

Member

Here we go.

I found two problems, the first one being only a small one.

The preg_replace does nothing

// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/^SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);

It simply doesn't work because there is whitespace before the SELECT statement. So it should rather be:

// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);

Fortunately, FOUND_ROWS() falls back to use the most recent successful SELECT statement (see http://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_found-rows), so we never noticed this issue. Nevertheless I suggest to fix it.

UNION ALL?

UNION ALL does not remove duplicate rows, so the count must be wrong

If I change:

$sql = implode(" UNION ALL ", $this->data['sql']);

to:

$sql = implode(" UNION DISTINCT ", $this->data['sql']);

the count works properly.

It seems to me that the "uniqueness" of entries has been handled by Symphony itself (or maybe by some other extension code). I don't see any side effects of forcing the uniqueness by using UNION DISTINCT, but I admit that I may not see the big picture.

So can you please check this suggestion and think about it?

Member

michael-e commented Aug 23, 2012

Here we go.

I found two problems, the first one being only a small one.

The preg_replace does nothing

// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/^SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);

It simply doesn't work because there is whitespace before the SELECT statement. So it should rather be:

// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);

Fortunately, FOUND_ROWS() falls back to use the most recent successful SELECT statement (see http://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_found-rows), so we never noticed this issue. Nevertheless I suggest to fix it.

UNION ALL?

UNION ALL does not remove duplicate rows, so the count must be wrong

If I change:

$sql = implode(" UNION ALL ", $this->data['sql']);

to:

$sql = implode(" UNION DISTINCT ", $this->data['sql']);

the count works properly.

It seems to me that the "uniqueness" of entries has been handled by Symphony itself (or maybe by some other extension code). I don't see any side effects of forcing the uniqueness by using UNION DISTINCT, but I admit that I may not see the big picture.

So can you please check this suggestion and think about it?

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 28, 2012

Member

Hey @michael-e! Thanks for debugging this.

  • Removing the ^ will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries. I think we can trim $sql or adjust how it it is built to avoid this whitespace so the regular expression will work.
  • I don't forsee any issues changing to UNION DISTINCT. I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice. I would want it to only return the entry once, but I have a feeling the same entry would actually appear in the ?debug twice..
Member

brendo commented Aug 28, 2012

Hey @michael-e! Thanks for debugging this.

  • Removing the ^ will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries. I think we can trim $sql or adjust how it it is built to avoid this whitespace so the regular expression will work.
  • I don't forsee any issues changing to UNION DISTINCT. I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice. I would want it to only return the entry once, but I have a feeling the same entry would actually appear in the ?debug twice..
@michael-e

This comment has been minimized.

Show comment
Hide comment
@michael-e

michael-e Aug 28, 2012

Member

Removing the ^ will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries.

Good idea. If that doesn't work, we can still match the leading whitespace in the regex using \s.

I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice.

That was my use case, wasn't it? As I said, UNION DISTINCT actually forces uniqueness, but UNION ALL does not. So using UNION ALLyou will get double entries, which will then be filtered out by a later process. Using UNION DISTINCT there are no duplicates (and so the entry list and the count are working properly).

Member

michael-e commented Aug 28, 2012

Removing the ^ will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries.

Good idea. If that doesn't work, we can still match the leading whitespace in the regex using \s.

I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice.

That was my use case, wasn't it? As I said, UNION DISTINCT actually forces uniqueness, but UNION ALL does not. So using UNION ALLyou will get double entries, which will then be filtered out by a later process. Using UNION DISTINCT there are no duplicates (and so the entry list and the count are working properly).

@brendo

This comment has been minimized.

Show comment
Hide comment
@brendo

brendo Aug 29, 2012

Member

Right, it's the latter process that I'm interested in (curiosity's sake), but sure, UNION DISTINCT it up baby.

Member

brendo commented Aug 29, 2012

Right, it's the latter process that I'm interested in (curiosity's sake), but sure, UNION DISTINCT it up baby.

@michael-e

This comment has been minimized.

Show comment
Hide comment
@michael-e

michael-e Aug 29, 2012

Member

Ah, I see. No, actually in these cases (using UNION ALL) entries do not appear twice in the output. The duplicates get lost somewhere — which is why the wrong number of entries is returned in the end. I have no idea where the duplicates are eliminated.

Should I send a pull request for the UNION thingie? It occurs only once, so you might just fix it manually, if you like!

Member

michael-e commented Aug 29, 2012

Ah, I see. No, actually in these cases (using UNION ALL) entries do not appear twice in the output. The duplicates get lost somewhere — which is why the wrong number of entries is returned in the end. I have no idea where the duplicates are eliminated.

Should I send a pull request for the UNION thingie? It occurs only once, so you might just fix it manually, if you like!

@designermonkey

This comment has been minimized.

Show comment
Hide comment
@designermonkey

designermonkey Sep 14, 2012

Member

I was just about to log this one as well.

I logged an issue recently (last 6 months) on Symphony to ask for multiple sort by's in the datasource editor, and it was for this exact reason.

We have a project that sorts two datasources on different columns, and then Union's them to get one list of the two sorts, only problem is duplicates are removed. Setting either original datasource to not paginate (returning all entries) and then paginating in the Union DS still returns the incorrect count.

Member

designermonkey commented Sep 14, 2012

I was just about to log this one as well.

I logged an issue recently (last 6 months) on Symphony to ask for multiple sort by's in the datasource editor, and it was for this exact reason.

We have a project that sorts two datasources on different columns, and then Union's them to get one list of the two sorts, only problem is duplicates are removed. Setting either original datasource to not paginate (returning all entries) and then paginating in the Union DS still returns the incorrect count.

@designermonkey

This comment has been minimized.

Show comment
Hide comment
@designermonkey

designermonkey Sep 14, 2012

Member

So, UNION DISTINCT seemed to fix the issue I had enough to close it for the client, but I'm still sure there's something wrong here... It just doesn't feel right, like black magic or something...

Member

designermonkey commented Sep 14, 2012

So, UNION DISTINCT seemed to fix the issue I had enough to close it for the client, but I'm still sure there's something wrong here... It just doesn't feel right, like black magic or something...

@michael-e

This comment has been minimized.

Show comment
Hide comment
@michael-e

michael-e Sep 14, 2012

Member

Symphony simply doesn't output duplicate entries, that's why a count including duplicates must be wrong. (Pagination alike.) No black magic here. :-)

(I don't see any reason why Symphony should output duplicates, BTW.)

Member

michael-e commented Sep 14, 2012

Symphony simply doesn't output duplicate entries, that's why a count including duplicates must be wrong. (Pagination alike.) No black magic here. :-)

(I don't see any reason why Symphony should output duplicates, BTW.)

@brendo brendo closed this in a2e69d8 Dec 9, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment