Baseline Similarity Evaluation with Annoy #364
This PR contains the full body of similarity work on AB using Annoy.
The use of this evaluation, in combination with a variety of parameters
*Note* The functions `get_similar_recordings` (for postgres similarity) and `get_all_metrics` were written by Philip Tovstogan for his thesis work on recording similarity. `get_similar_recordings` is altered slightly for our purposes; `get_all_metrics` is unchanged. Philip's work can be found here: https://zenodo.org/record/1479769#.XQ1rZNNKh25
Comment last updated at 2021-07-21 16:53:11 UTC
Get MBID and submission offset for all similar recordings in a single SQL query instead of one query per item.
Format the result as a dict, including MBID, offset, and distance.
Add a `threshold` parameter so that only matches with a distance below this value are returned.
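A minimal sketch of the batched lookup described above, not the code in this PR. It uses an in-memory-friendly `sqlite3` connection as a stand-in for postgres, and the table `lowlevel`, its columns, the function name `lookup_similar`, and the dict keys are all hypothetical names chosen for illustration. The `ids` and `distances` arguments are assumed to be the parallel lists returned by an index lookup, nearest first.

```python
import sqlite3


def lookup_similar(conn, ids, distances, threshold=None):
    """Return a list of dicts (MBID, offset, distance) for the given index ids.

    Hypothetical helper: table and column names are assumptions.
    """
    # Apply the optional threshold first: keep only distances below it.
    pairs = [
        (i, d)
        for i, d in zip(ids, distances)
        if threshold is None or d < threshold
    ]
    if not pairs:
        return []

    # One query for all remaining ids instead of one query per id.
    placeholders = ",".join("?" for _ in pairs)
    rows = conn.execute(
        "SELECT id, mbid, submission_offset FROM lowlevel "
        "WHERE id IN (%s)" % placeholders,
        [i for i, _ in pairs],
    ).fetchall()
    by_id = {row[0]: (row[1], row[2]) for row in rows}

    # Keep the order of the similarity results rather than the row order.
    return [
        {"recording_mbid": by_id[i][0], "offset": by_id[i][1], "distance": d}
        for i, d in pairs
        if i in by_id
    ]
```

Filtering by `threshold` before the query keeps the `IN` list short; the result order follows the similarity ranking because SQL does not guarantee row order for an `IN` lookup.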
A remove dups value of "samescore" removes duplicates only if they have the same distance score, whereas a value of "all" removes all duplicate MBIDs even if they have different scores.
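The two modes can be illustrated with a short sketch. This is an assumption about the behaviour described, not the PR's implementation: the function name `remove_duplicates`, the parameter name `remove_dups`, the `"none"` default, and the dict keys are hypothetical, and "same score" is taken to mean exact equality of the distance values.

```python
def remove_duplicates(results, remove_dups="none"):
    """Filter a nearest-first list of result dicts, keeping first occurrences.

    remove_dups="samescore": drop an item only if an earlier item has the
        same MBID and the same distance.
    remove_dups="all": drop every later item that repeats an MBID.
    Any other value returns the results unchanged.
    """
    if remove_dups not in ("samescore", "all"):
        return list(results)

    seen = set()
    filtered = []
    for item in results:
        if remove_dups == "all":
            key = item["recording_mbid"]
        else:  # "samescore"
            key = (item["recording_mbid"], item["distance"])
        if key not in seen:
            seen.add(key)
            filtered.append(item)
    return filtered
```

With "samescore", two submissions of the same recording that produce identical distances collapse into one entry, while a submission with a different distance is kept; with "all", only the nearest entry per MBID survives.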
We shouldn't store data that should be consumed at the same time in two different fields, in case they get out of order for some reason. By replacing them with a single jsonb field we can directly add the result of a similarity lookup, and easily compare it. This change will require updates to the db methods that read and update this table, but because we have disabled it for now we'll skip the change until we re-enable feedback submission.
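As a rough illustration of the reasoning above, the sketch below serialises one lookup result as a single JSON value and compares a stored value against a fresh lookup. The function names and dict keys are hypothetical, and the actual column layout and db methods are not shown in this PR description.

```python
import json


def to_feedback_payload(similar):
    """Serialise one similarity lookup result for a single JSON(B) column.

    Each entry carries its own MBID, offset, and distance, so the values
    cannot drift out of step the way two parallel fields could.
    """
    return json.dumps(similar)


def same_result(stored_payload, similar):
    """Compare a stored payload with a fresh lookup result."""
    return json.loads(stored_payload) == similar
```

Because the list order is part of the JSON value, the ranking of the results is stored together with the identifiers and distances.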