Baseline Similarity Evaluation with Annoy #364

merged 380 commits into metabrainz:master on Jul 21, 2021



@aidanlw17 aidanlw17 commented Jul 31, 2019

Baseline Similarity Evaluation with Annoy

This PR contains the full body of similarity work on AB using Annoy.
It adds an evaluation endpoint, /<uuid:mbid>/similar/<string:metric>, that
lets users find recordings similar to an (MBID, offset) pair in terms
of a specific metric. Users can provide ratings and suggestions about
the recommendations.

The use of this evaluation, in combination with a variety of parameters
for building the Annoy index, will help us to improve and tune the
entire similarity engine.

aidanlw17 added 29 commits Aug 24, 2019
The functions get_similar_recordings for postgres similarity,
and get_all_metrics are written by Philip Tovstogan for his thesis
work on recording similarity.

get_similar_recordings is altered slightly for our purposes;
get_all_metrics is unchanged.

Philip's work can be found here:

@aidanlw17 aidanlw17 commented Jun 10, 2020

Hi @alastair, I still have a few broken tests to fix before the changes to this are totally done. Sorry it has taken longer than I thought to get it turned around.



@pep8speaks pep8speaks commented Sep 18, 2020

Hello @aidanlw17! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 861:61: W291 trailing whitespace

Line 6:131: E501 line too long (217 > 130 characters)
Line 7:131: E501 line too long (155 > 130 characters)
Line 9:131: E501 line too long (186 > 130 characters)
Line 12:131: E501 line too long (170 > 130 characters)
Line 16:131: E501 line too long (206 > 130 characters)
Line 21:131: E501 line too long (214 > 130 characters)
Line 31:131: E501 line too long (203 > 130 characters)
Line 33:2: W292 no newline at end of file

Line 13:1: E302 expected 2 blank lines, found 1
Line 216:65: W291 trailing whitespace
Line 217:57: W291 trailing whitespace

Line 1:12: E741 ambiguous variable name 'l'

Line 17:1: E302 expected 2 blank lines, found 1
Line 71:4: E271 multiple spaces after keyword

Line 52:131: E501 line too long (135 > 130 characters)

Line 145:21: E128 continuation line under-indented for visual indent
Line 146:36: E127 continuation line over-indented for visual indent

Comment last updated at 2021-07-21 16:53:11 UTC


alastair added 19 commits Sep 23, 2020
Get MBID and submission offset for all similar recordings in a single
SQL query instead of one query per item
Format result as a dict, including MBID, offset, and distance
Add `threshold` parameter to return only matches with a distance
below this value
A remove dups value of "samescore" will only remove dups if they have
the same distance score, whereas a value of "all" will remove all
duplicate mbids even if they have a different score
Because this can be achieved by using the bulk lookup method with just a
single mbid referenced, encourage users to use this endpoint, so that
they remember it exists when they want to perform many lookups at once
For now, we want to show similar items but not receive evaluation
information from users. Once we better know how we'll use this feedback
we will re-enable the functionality.
Unfortunately, YouTube iframe search no longer works. Remove it until we
find a better solution.
We shouldn't store data that should be consumed at the same time in two
different fields, in case they get out of order for some reason. By
replacing it with a jsonb field we can directly add the result of a
similarity lookup, and easily compare it.

This change will require updates to the db methods that read and update
this table, but because we have disabled it for now we'll skip the
change until we re-enable feedback submission.
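
The `threshold` parameter and the two remove-dups values described in the commit messages above can be illustrated with a small sketch. This is not the code from this PR: the function name `filter_similar`, the argument name `remove_dups`, and the dict keys `mbid`, `offset`, and `distance` are assumptions chosen to match the wording of the commit messages, and the input is assumed to be already sorted by distance.

```python
from typing import Dict, List, Optional


def filter_similar(results: List[Dict],
                   threshold: Optional[float] = None,
                   remove_dups: Optional[str] = None) -> List[Dict]:
    """Filter a list of similarity results (hypothetical helper).

    Each result is assumed to be a dict with the keys
    "mbid", "offset" and "distance".
    """
    filtered = []
    seen_mbids = set()    # used when remove_dups == "all"
    seen_scores = set()   # (mbid, distance) pairs, used for "samescore"

    for item in results:
        # Keep only matches with a distance strictly below the threshold.
        if threshold is not None and item["distance"] >= threshold:
            continue

        if remove_dups == "all":
            # Drop every repeated MBID, even if its score differs.
            if item["mbid"] in seen_mbids:
                continue
            seen_mbids.add(item["mbid"])
        elif remove_dups == "samescore":
            # Drop a repeated MBID only when the distance is identical.
            key = (item["mbid"], item["distance"])
            if key in seen_scores:
                continue
            seen_scores.add(key)

        filtered.append(item)

    return filtered
```

With "samescore", two submissions of the same MBID at different distances are both kept; with "all", only the first (closest) one survives.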
@alastair alastair merged commit 79054a6 into metabrainz:master Jul 21, 2021
2 of 3 checks passed