Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AB-348: Make the musicbrainz import more efficient #281

Merged
merged 11 commits into from Jul 12, 2018

Conversation

Projects
None yet
2 participants
@rsh7
Copy link
Contributor

rsh7 commented Jun 28, 2018

No description provided.

@rsh7 rsh7 changed the base branch from master to musicbrainz-integration-gsoc Jun 28, 2018

@paramsingh
Copy link
Member

paramsingh left a comment

Let's add more documentation as to what parameters these functions take.

This looks okay to me, but we still need to spend some time finding inefficiencies and removing them.

@@ -48,12 +66,12 @@ def load_artist_credit(connection, gids_in_AB, MB_release_data):
return MB_artist_credit_data


def load_artist_type(connection, gids_in_AB):
def load_artist_type(connection, gids_in_AB, artist_credit_from_recording):

This comment has been minimized.

@paramsingh

paramsingh Jul 2, 2018

Member

Is the var gids_in_AB still needed here?

This comment has been minimized.

@rsh7

rsh7 Jul 5, 2018

Author Contributor

No, not needed and I've removed it.

@rsh7 rsh7 force-pushed the rsh7:optimize branch 2 times, most recently from 8ec3fe8 to 1c1dfa3 Jul 5, 2018

@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Jul 6, 2018

Data import was pretty fast, but data insertion was taking much time as there were a lot of duplications in the data fetched from many tables. So, I have again included the DISTINCT keyword in the queries. And also improved some queries to get all the data from MusicBrainz db as the number of rows in some tables are less than 20.

@paramsingh

This comment has been minimized.

Copy link
Member

paramsingh commented Jul 11, 2018

Rebase this once, please.

@rsh7 rsh7 force-pushed the rsh7:optimize branch from f7ef7fa to 0bf7156 Jul 11, 2018

@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Jul 11, 2018

Rebased, and add a commit for some changes in docstrings of all the functions.

@rsh7 rsh7 force-pushed the rsh7:optimize branch from 99fa5ec to b6159f0 Jul 11, 2018

@paramsingh
Copy link
Member

paramsingh left a comment

docstrings should explicitly list the type of data the function is expecting also.

MB_release_status_data = result.fetchall()

return MB_release_status_data


def load_release_group_primary_type(connection, gids_in_AB, MB_release_group_data):
def load_release_group_primary_type(connection):
"""Fetch release_group_primary_type table data from MusicBrainz database for the

This comment has been minimized.

@paramsingh

paramsingh Jul 11, 2018

Member

now that these functions are fetching entire tables, we should change docs to reflect that.

)
result = connection.execute(area_type_query, filter_data)
area_type_query = text("""
SELECT * FROM area_type

This comment has been minimized.

@paramsingh

paramsingh Jul 11, 2018

Member

this should be

  SELECT column1,
         column2
    FROM area_type
ORDER BY id
@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Jul 12, 2018

Changes made.

@paramsingh paramsingh merged commit 711866c into metabrainz:musicbrainz-integration-gsoc Jul 12, 2018

1 check passed

Jenkins Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.