MBS-10923: Fix inconsistent ordering in find_by_collection #1591

Merged on Jul 23, 2020 (1 commit)

@mwiencek (Member) commented Jul 7, 2020

`DISTINCT ON (release.id)` is used here because the joins create "duplicate" rows for every release. With DISTINCT ON, only the first row from each set is kept, so the ORDER BY in the inner query is important. If `$order_by` is, say, `date_year, date_month, date_day, name COLLATE musicbrainz`, then after sorting by id we also want it to consistently pick the row with the earliest year, then month, etc. (There will be a different row for each release event.)
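To make the DISTINCT ON behaviour concrete, here is a minimal Python sketch that mimics it in memory. It assumes a hypothetical row layout of `(release_id, date_year, date_month, date_day, name)` with no NULL dates (NULLs would need extra handling, since Python can't compare `None` with `int`). `ROWS` and `distinct_on_release` are illustrative names, not the MusicBrainz Server code, which builds SQL from Perl.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical joined rows: one per (release, release event), so a release
# with two events shows up twice. Layout (an assumption for this sketch):
#   (release_id, date_year, date_month, date_day, name)
ROWS = [
    (2, 2001, 5, 1, "Beta"),
    (1, 1999, 3, 9, "Alpha"),
    (2, 1998, 7, 4, "Beta"),
    (1, 1999, 1, 2, "Alpha"),
    (3, 2005, 1, 1, "Gamma"),
]


def distinct_on_release(rows, sort_key):
    """Keep the first row for each release_id after sorting by sort_key,
    like SELECT DISTINCT ON (release_id) ... ORDER BY <sort_key>.
    sort_key must order by release_id first so each group is contiguous."""
    ordered = sorted(rows, key=sort_key)
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


# Inner ORDER BY release_id, date_year, date_month, date_day, name:
# the earliest release event always wins.
deterministic = distinct_on_release(ROWS, sort_key=lambda row: row)

# Inner ORDER BY release_id only: the surviving row depends on the order the
# rows arrived in (Python's stable sort keeps input order for ties;
# PostgreSQL makes no such promise).
arbitrary = distinct_on_release(ROWS, sort_key=itemgetter(0))
```

With the full sort key, release 2 is represented by its 1998 event; with the id-only key it gets the 2001 event, purely because that row happened to come first in the input.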

In the outer query, we also want to make sure that releases that happen to have the same release date and name (or whatever the configured sort order is) are next ordered by id, otherwise the sort is undefined.
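The outer tie-break can be illustrated the same way. In this sketch (hypothetical data and function names, same assumed row layout), two releases share a date and a name. Permuting the input stands in for PostgreSQL returning tied rows in an unspecified order; the sketch then collects every distinct second page (offset 1, limit 1) seen across all permutations, with and without the release id as the final sort key.

```python
import itertools

# Hypothetical rows left after DISTINCT ON, layout
# (release_id, date_year, date_month, date_day, name).
# Releases 10 and 11 share both the date and the name.
RELEASES = [
    (11, 2010, 6, 1, "Same Name"),
    (10, 2010, 6, 1, "Same Name"),
    (12, 2009, 1, 1, "Earlier"),
]


def outer_key_without_id(row):
    # ORDER BY date_year, date_month, date_day, name
    return row[1:]


def outer_key_with_id(row):
    # ORDER BY date_year, date_month, date_day, name, id
    return (*row[1:], row[0])


def page(rows, key, offset, limit):
    """Sort, then apply OFFSET/LIMIT."""
    return sorted(rows, key=key)[offset:offset + limit]


def distinct_pages(key, offset=1, limit=1):
    """Every page produced across all input orders of RELEASES."""
    return {
        tuple(page(list(perm), key, offset, limit))
        for perm in itertools.permutations(RELEASES)
    }


unstable = distinct_pages(outer_key_without_id)  # page contents vary
stable = distinct_pages(outer_key_with_id)       # always the same page
```

Without the id, the second page can hold either release 10 or release 11, so paging with an offset can repeat one release and skip the other; with the id appended, the page is always release 10.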

One gotcha here is that `$order_by` references column aliases that can be used in the outer query, but not the inner one. We have to return a separate `$inner_order_by` with explicit column references where needed. This is icky but I couldn't come up with anything better. One solution might be to replace all the joins with semi-joins, which would also remove the need for DISTINCT ON, but I haven't evaluated the performance of that.
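One possible shape for the `$order_by`/`$inner_order_by` pair, sketched in Python rather than the server's Perl. Everything here is hypothetical: the option names, the `release_event` and `editor_collection_release` table and column names, and `build_query` are assumptions for illustration, not the actual find_by_collection code. The point is only that the outer clause can use the subquery's output aliases, while the inner clause spells out table-qualified columns.

```python
# Hypothetical sort options, each mapping to (outer ORDER BY, inner ORDER BY).
# The outer query sorts the subquery's output, so it can use aliases such as
# date_year; inside the subquery the clause names the source columns instead.
ORDER_BY_OPTIONS = {
    "date": (
        "date_year, date_month, date_day, name COLLATE musicbrainz",
        "re.date_year, re.date_month, re.date_day, release.name COLLATE musicbrainz",
    ),
    "title": (
        "name COLLATE musicbrainz",
        "release.name COLLATE musicbrainz",
    ),
}


def build_query(order):
    """Assemble the assumed query shape; placeholders are left as %s."""
    order_by, inner_order_by = ORDER_BY_OPTIONS[order]
    return (
        "SELECT * FROM ("
        " SELECT DISTINCT ON (release.id) release.*,"
        " re.date_year, re.date_month, re.date_day"
        " FROM release"
        " JOIN editor_collection_release ecr ON ecr.release = release.id"
        " LEFT JOIN release_event re ON re.release = release.id"
        " WHERE ecr.collection = %s"
        # DISTINCT ON expressions must lead the ORDER BY, then pick the row.
        f" ORDER BY release.id, {inner_order_by}"
        # Outer sort: configured order, then id as the final tie-break.
        f") s ORDER BY {order_by}, id"
        " LIMIT %s OFFSET %s"
    )
```

The inner clause leads with `release.id` because PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions; the remaining inner columns only decide which duplicate row survives.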

It's not really feasible to write a test for this that fails consistently, since the order is going to depend on the order PG retrieves things, and that's unlikely to change in a very small test DB.

@reosarevok (Member) left a comment


Looks generally reasonable, I think. Is there any good way to test this at all?

@ijc commented Jul 15, 2020

I don't know if it's helpful, but releases.txt.gz contains all the releases in the collection that led me to report MBS-10923.

FWIW, I've just now reproduced the issue with the curl runes I gave in the report, but against the beta server instead (hitting 'https://beta.musicbrainz.org/ws/2/collection/72357d35-55c7-4525-9c7a-7d02e4377f2e/releases?offset=50&order=id'). I assume this fix would end up there before hitting production, so since I know the bug shows up there today, I can help validate the fix there if that would be helpful.

I tried to set up a local musicbrainz-docker based environment with a replica of the real DB to test with, but failed. As @mwiencek suggests, it probably wouldn't be much use anyway, since the issue likely needs a fully loaded PG DB?

@yvanzo (Contributor) left a comment


@mwiencek (Member, Author)

@ijc Thanks, I'd been able to reproduce it with other collections too. We'll update MBS-10923 once the fix is in beta testing, so if you can test it there once that happens, that'd be helpful!

@mwiencek mwiencek merged commit 672cd7b into metabrainz:master Jul 23, 2020
@mwiencek mwiencek deleted the mbs-10923 branch July 23, 2020 18:12
reosarevok pushed a commit to reosarevok/musicbrainz-server that referenced this pull request Jul 24, 2020
…z#1591)

yvanzo added a commit that referenced this pull request Aug 3, 2020
* master:
  Update POT files using the production database
  Update translations from Transifex
  MBS-7994 / MBS-10746: Add pregap icon and indicate pregap in edits (#1572)
  MBS-10949 (2/2): Add Migu Music URL to the sidebar
  MBS-10949 (1/2): Handle Migu Music URLs
  MBS-10989: Update VocaDB cleanup to support new Es (series) format (#1614)
  MBS-10980: Convert Remove Entity edit to React (#1606)
  MBS-10983: Convert Remove Release Label edit to React (#1609)
  MBS-10193: Support Apple Music and Apple Books (#1597)
  MBS-10974: Convert Remove ISRC edit to React (#1604)
  MBS-10975: Convert Remove ISWC edit to React (#1605)
  MBS-10969: Convert Add ISRCs edit to React (#1600)
  MBS-10970: Convert Add Release Label edit to React (#1601)
  MBS-10968: Convert Add ISWCs edit to React (#1599)
  Install bzip2 in the test-database docker image (#1621)
  MBS-8725: Allow mediums to have an unknown tracklist (#1103)
  Skip looking for latest vote when not logged in (#1607)
  MBS-10923: Fix inconsistent ordering in find_by_collection (#1591)
  Bump lodash version to 4.17.19
  Bump react-table version to 7.3.2
  Fix warning when displaying Add rel. attr. edit
  MBS-10926: Add 'copy date' button on relationship editor dialog (#1583)
  Strengthen the validation of Instagram URLs
  Refactor: Simplify cleanup regexp for Instagram
  MBS-10932: Match Instagram videos /p/ and /tv/
  Fix handling of js tape object serialization
  MBS-10973: Convert Add Relationship Attribute edit to React
  MBS-10930: Fix loading relationship entities for removed reltypes (#1589)
  MBS-10943: Fix loading attribute on Relationship::Delete (#1594)
  Avoid warnings in Entity::Relationship
  MBS-10833: Avoid crashing if entity0_id is missing
  MBS-10917: Remove no longer used "attendance" (#1579)
  MBS-10927: Check whether entity existed before setting allowNew (#1584)
  MBS-10937: Run load_meta on event lists for ratings (#1587)
  Improve upon MBS-9502 with list of releases
  Move help messages about barcode together in the release editor
  MBS-9502: Add search for pre-existing barcode value in release editor
  Bump Flow to 0.129.0
  Fix some left-behind "import React" imports (#1580)
  MBS-10928: Update the VGMdb favicon
  MBS-10925: Update the Baidu Baike favicon
  MBS-10816: Convert Add Label Edit to React (#1506)
  Bump react-table version to 7.2.1 (#1576)
  Add context to translation strings
  MBS-10562: Add phrases for future (sidebar) dates
yvanzo added a commit that referenced this pull request Aug 10, 2020
* beta:
  Drop autoselect for Apple Music (#1643)
  Update translations from Transifex
  Revert "MBS-8725: Allow mediums to have an unknown tracklist (#1103)"
  Update POT files using the production database
  Update translations from Transifex
  MBS-7994 / MBS-10746: Add pregap icon and indicate pregap in edits (#1572)
  MBS-10949 (2/2): Add Migu Music URL to the sidebar
  MBS-10949 (1/2): Handle Migu Music URLs
  MBS-10989: Update VocaDB cleanup to support new Es (series) format (#1614)
  MBS-10980: Convert Remove Entity edit to React (#1606)
  MBS-10983: Convert Remove Release Label edit to React (#1609)
  MBS-10193: Support Apple Music and Apple Books (#1597)
  MBS-10974: Convert Remove ISRC edit to React (#1604)
  MBS-10975: Convert Remove ISWC edit to React (#1605)
  MBS-10969: Convert Add ISRCs edit to React (#1600)
  MBS-10970: Convert Add Release Label edit to React (#1601)
  MBS-10968: Convert Add ISWCs edit to React (#1599)
  Install bzip2 in the test-database docker image (#1621)
  MBS-8725: Allow mediums to have an unknown tracklist (#1103)
  Skip looking for latest vote when not logged in (#1607)
  MBS-10923: Fix inconsistent ordering in find_by_collection (#1591)
  Bump lodash version to 4.17.19
  Bump react-table version to 7.3.2
  Fix warning when displaying Add rel. attr. edit
  MBS-10926: Add 'copy date' button on relationship editor dialog (#1583)
  Strengthen the validation of Instagram URLs
  Refactor: Simplify cleanup regexp for Instagram
  MBS-10932: Match Instagram videos /p/ and /tv/
  Fix handling of js tape object serialization
  MBS-10973: Convert Add Relationship Attribute edit to React
  MBS-10930: Fix loading relationship entities for removed reltypes (#1589)
  MBS-10943: Fix loading attribute on Relationship::Delete (#1594)
  Avoid warnings in Entity::Relationship
  MBS-10833: Avoid crashing if entity0_id is missing
  MBS-10917: Remove no longer used "attendance" (#1579)
  MBS-10927: Check whether entity existed before setting allowNew (#1584)
  MBS-10937: Run load_meta on event lists for ratings (#1587)
  Improve upon MBS-9502 with list of releases
  Move help messages about barcode together in the release editor
  MBS-9502: Add search for pre-existing barcode value in release editor
  Bump Flow to 0.129.0
  Fix some left-behind "import React" imports (#1580)
  MBS-10928: Update the VGMdb favicon
  MBS-10925: Update the Baidu Baike favicon
  MBS-10816: Convert Add Label Edit to React (#1506)
  Bump react-table version to 7.2.1 (#1576)
  Add context to translation strings
  MBS-10562: Add phrases for future (sidebar) dates
4 participants