Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AB-346: Import MusicBrainz data for every new recording added to AB db #278

Merged
merged 7 commits into from Jun 27, 2018

Conversation

Projects
None yet
2 participants
@rsh7
Copy link
Contributor

rsh7 commented Jun 20, 2018

I have wrote a script called mb_import.py which runs continously with the time gap of 30 seconds. Every 30 seconds it searches for a new recording if added to low-level table in AcousticBrainz db. If it finds any new recording, MusicBrainz data for the corresponding recording gets imported and thus keeps the musicbrainz schema in AcousticBrainz updated.

@rsh7 rsh7 changed the base branch from master to musicbrainz-integration-gsoc Jun 20, 2018



def main():
logging.info("Checking if any import is required...")

This comment has been minimized.

@paramsingh

paramsingh Jun 21, 2018

Member

This message should be musicbrainz_importer started.

This comment has been minimized.

@rsh7

rsh7 Jun 21, 2018

Author Contributor

👍

@@ -0,0 +1,33 @@
import db

This comment has been minimized.

@paramsingh

paramsingh Jun 21, 2018

Member

let's rename this file to musicbrainz_importer and move it to acousticbrainz-server/musicbrainz_importer/musicbrainz_importer.py

This comment has been minimized.

@rsh7

rsh7 Jun 21, 2018

Author Contributor

Added a new musicbrainz_importer folder.


def get_new_recordings_from_AB():
with db.engine.begin() as connection:
query = text("""SELECT lowlevel.gid

This comment has been minimized.

@paramsingh

paramsingh Jun 21, 2018

Member

You should apply a limit to the number of recordings you import in a batch here. The value should be taken from the config files.

This comment has been minimized.

@rsh7

rsh7 Jun 21, 2018

Author Contributor

I thought the same initially. As the time gap of checking the newly added recordings for import is 30 seconds only, so should we still add a limit here?

This comment has been minimized.

@paramsingh

paramsingh Jun 21, 2018

Member

Good point. I was thinking that what we'd do in prod is just run this script (without running python manage.py mb_import) and ideally it would just import all recordings over time. That would require batches here.

This comment has been minimized.

@rsh7

rsh7 Jun 22, 2018

Author Contributor

Got it. Done! 👶

ON lowlevel.gid = musicbrainz.recording.gid
WHERE musicbrainz.recording.gid is NULL
ORDER BY lowlevel.id
OFFSET :offset

This comment has been minimized.

@paramsingh

paramsingh Jun 22, 2018

Member

Why do you need offset here? The query would work fine without it.

This comment has been minimized.

@rsh7

rsh7 Jun 22, 2018

Author Contributor

Yeah, point. Removing offset.

time.sleep(SLEEP_DURATION)


def get_new_recordings_from_AB():

This comment has been minimized.

@paramsingh

paramsingh Jun 22, 2018

Member

Because this is a Sql query, I think we should move this to the db module. I should have mentioned this in the previous review, sorry.

gids_in_AB = [value[0] for value in gids]
offset = offset + rows_to_fetch

return gids_in_AB, rows_to_fetch

This comment has been minimized.

@paramsingh

paramsingh Jun 22, 2018

Member

Is there a need to return rows_to_fetch?

This comment has been minimized.

@rsh7

rsh7 Jun 22, 2018

Author Contributor

I am returning it to print a message of - Inserting 'rows_to_fetch' no. of recordings.

This comment has been minimized.

@paramsingh

paramsingh Jun 22, 2018

Member

Shouldn't len(gids) be printed instead? It isn't necessary that you'll have rows_to_fetch recordings.

rsh7 added some commits Jun 22, 2018

@@ -0,0 +1,21 @@
import db

This comment has been minimized.

@paramsingh

paramsingh Jun 26, 2018

Member

oh, maybe I was unclear, this should go in db/data.py

This comment has been minimized.

@paramsingh

paramsingh Jun 26, 2018

Member

and we should add a test for this too.

@paramsingh
Copy link
Member

paramsingh left a comment

woo! 🥇

@paramsingh paramsingh merged commit 168b5f5 into metabrainz:musicbrainz-integration-gsoc Jun 27, 2018

1 check passed

Jenkins Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.