Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of multiple items (pinned recordings) having same mbid/msid #1831

Merged
merged 4 commits into from
Feb 1, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
53 changes: 28 additions & 25 deletions listenbrainz/db/msid_mbid_mapping.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
from collections import defaultdict
from typing import Iterable, Dict, List, TypeVar
from typing import Optional, Tuple

Expand Down Expand Up @@ -79,47 +80,49 @@ def fetch_track_metadata_for_items(items: List[ModelT]) -> List[ModelT]:
Returns:
The given list of MsidMbidModel objects with updated track_metadata.
"""
msid_item_map, mbid_item_map = {}, {}
# it is possible to that multiple items have same msid/mbid. for example, pinning the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit unclear. Need to remove a stray "to"?

# same recording twice. since the dict is keyed by mbid/msid the corresponding value
# should be a iterable of all items having that mbid
msid_item_map, mbid_item_map = defaultdict(list), defaultdict(list)
for item in items:
if item.recording_mbid:
mbid_item_map[item.recording_mbid] = item
mbid_item_map[item.recording_mbid].append(item)
else:
msid_item_map[item.recording_msid] = item
msid_item_map[item.recording_msid].append(item)

msid_metadatas = load_recordings_from_msids(msid_item_map.keys())
for metadata in msid_metadatas:
msid = metadata["ids"]["recording_msid"]
item = msid_item_map[msid]
item.track_metadata = {
"track_name": metadata["payload"]["title"],
"artist_name": metadata["payload"]["artist"],
"additional_info": {
"recording_msid": msid
for item in msid_item_map[msid]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I iterate one item via another, I always consider what happens if it gets out of sync for some reason - e.g. what if msid_item_map[msid] doesn't exist? One solution would be to use for item in msd_item_map.get(msid, [])

That being said, I see that similar code has been in use before this, so probably not a great issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is deleted from MsB ever so don't think this should be an issue. Also, if this is missing we need to fix code elsewhere as well because other places in codebase assume presence of title, artist name.

item.track_metadata = {
"track_name": metadata["payload"]["title"],
"artist_name": metadata["payload"]["artist"],
"additional_info": {
"recording_msid": msid
}
}
}

mapping_mbid_metadata, mapping_msid_metadata = load_recordings_from_mapping(mbid_item_map.keys(), msid_item_map.keys())
_update_items_from_map(mbid_item_map, mapping_mbid_metadata)
_update_items_from_map(msid_item_map, mapping_msid_metadata)
return items


def _update_items_from_map(models: Dict[str, ModelT], metadatas: Dict) -> Dict[str, ModelT]:
for _id, item in models.items():
def _update_items_from_map(models: Dict[str, Iterable[ModelT]], metadatas: Dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment saying that it updates models in place would be helpful

for _id, items in models.items():
if _id not in metadatas:
continue

metadata = metadatas[_id]
item.track_metadata = {
"track_name": metadata["title"],
"artist_name": metadata["artist"],
"release_name": metadata["release"],
"additional_info": {
"recording_msid": metadata["recording_msid"],
"recording_mbid": metadata["recording_mbid"],
"release_mbid": metadata["release_mbid"],
"artist_mbids": metadata["artist_mbids"]
for item in items:
metadata = metadatas[_id]
item.track_metadata = {
"track_name": metadata["title"],
"artist_name": metadata["artist"],
"release_name": metadata["release"],
"additional_info": {
"recording_msid": metadata["recording_msid"],
"recording_mbid": metadata["recording_mbid"],
"release_mbid": metadata["release_mbid"],
"artist_mbids": metadata["artist_mbids"]
Comment on lines +120 to +127
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly related to this PR, but are we always going to be sure that the metadata dict has all of these fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are the exact fields selected in load_recordings_from_mapping (above in this file) and this function is only called with data from that query.

}
}
}

return models
18 changes: 18 additions & 0 deletions listenbrainz/db/tests/test_msid_mbid_mapping.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,3 +124,21 @@ def test_fetch_track_metadata_for_items(self):
self.assertEqual(metadata["additional_info"]["recording_msid"], recording["recording_msid"])
self.assertEqual(metadata["additional_info"]["release_mbid"], recording["release_mbid"])
self.assertEqual(metadata["additional_info"]["artist_mbids"], recording["artist_mbids"])

def test_fetch_track_metadata_for_items_with_same_mbid(self):
recording = self.insert_recordings()[0]
models = [
MsidMbidModel(recording_msid=recording["recording_msid"], recording_mbid=recording["recording_mbid"]),
MsidMbidModel(recording_msid=recording["recording_msid"], recording_mbid=recording["recording_mbid"]),
]
models = fetch_track_metadata_for_items(models)
for model in models:
metadata = model.track_metadata
self.assertEqual(metadata["track_name"], recording["title"])
self.assertEqual(metadata["artist_name"], recording["artist"])
self.assertEqual(metadata["release_name"], recording["release"])
self.assertEqual(metadata["additional_info"]["recording_mbid"], recording["recording_mbid"])
self.assertEqual(metadata["additional_info"]["recording_msid"], recording["recording_msid"])
self.assertEqual(metadata["additional_info"]["release_mbid"], recording["release_mbid"])
self.assertEqual(metadata["additional_info"]["artist_mbids"], recording["artist_mbids"])