Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AB-350: Update musicbrainz schema whenever actual MusicBrainz database is updated #282

Merged
merged 26 commits into from Aug 14, 2018

Conversation

Projects
None yet
2 participants
@rsh7
Copy link
Contributor

rsh7 commented Jul 7, 2018

This PR contains scripts to apply replications packets to musicbrainz schema in AcousticBrainz db to keep it updated with the actual Musicbrainz db.

@rsh7 rsh7 changed the base branch from master to musicbrainz-integration-gsoc Jul 7, 2018

@rsh7 rsh7 force-pushed the rsh7:apply_replication branch 2 times, most recently from bca7940 to 6eb9241 Jul 12, 2018

@paramsingh paramsingh force-pushed the metabrainz:musicbrainz-integration-gsoc branch 3 times, most recently from 711866c to 83604b1 Jul 24, 2018

@rsh7 rsh7 force-pushed the rsh7:apply_replication branch 2 times, most recently from 681d2f5 to b43b16f Jul 24, 2018

@paramsingh
Copy link
Member

paramsingh left a comment

There is a major need of docstrings in this PR.

Most of these functions would be much easily readable if I knew a gist of what they're supposed to be doing.



def get_data_from_musicbrainz(table_name, data, column='id'):
with musicbrainz_db.engine.begin() as connection:

This comment has been minimized.

@paramsingh

paramsingh Jul 26, 2018

Member

Why are you starting a transaction to get data? Would a normal connect not be ok?

This comment has been minimized.

@rsh7

rsh7 Jul 27, 2018

Author Contributor

Actually, for the case of Integrity Error, we need to roll back a specific transaction and then start it again after getting the corresponding data.

transaction.commit()


def get_data_from_musicbrainz(table_name, data, column='id'):

This comment has been minimized.

@paramsingh

paramsingh Jul 26, 2018

Member

Docstring missing.

This comment has been minimized.

@rsh7

rsh7 Jul 27, 2018

Author Contributor

Yeah, I intended to show you the code first and then work on docstrings.
In next commit :)

query = text("""
SELECT *
FROM %s
WHERE %s=%s

This comment has been minimized.

@paramsingh

paramsingh Jul 26, 2018

Member

We should probably pass these variables into the query using parameters and corresponding dict.

result = connection.execute(text("""
               SELECT *
                 FROM :table
                WHERE :column = :data
    """), {
	'table': table,
	'column': column,
	'data': data,
    })
@@ -0,0 +1,264 @@
import tarfile

This comment has been minimized.

@paramsingh

paramsingh Jul 26, 2018

Member

Add a license with copyright Lukas and you here

This comment has been minimized.

@rsh7

rsh7 Jul 27, 2018

Author Contributor

🎆

# print ' - Statistics:'
# for table in sorted(stats.keys()):
# print ' * %-30s\t%d\t%d' % (table, stats[table]['u'], stats[table]['d'])
print secsy

This comment has been minimized.

@paramsingh

paramsingh Jul 26, 2018

Member

We should from __future__ import print_function and use print(secsy) here and everywhere else, so that migration to Python 3 in the future is easy.

This comment has been minimized.

@rsh7

rsh7 Jul 27, 2018

Author Contributor

Done!

update_row(sql, params, connection, trans)
print 'Updated rows in ' + table + ' table'
print 'COMMIT; --', xid
# print ' - Statistics:'

This comment has been minimized.

@paramsingh

paramsingh Jul 26, 2018

Member

Commented out code should be removed.

@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Jul 30, 2018

@paramsingh, I am working on clean up for this script a bit. will push final commits tomorrow :)

@rsh7 rsh7 force-pushed the rsh7:apply_replication branch from 15baf97 to 754361a Aug 4, 2018

@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Aug 5, 2018

@brainzbot retest this please

@paramsingh

This comment has been minimized.

Copy link
Member

paramsingh commented Aug 5, 2018

@brainzbot retest this please

@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Aug 5, 2018

@paramsingh It's displaying an operational error that 'acousticbrainz' doesn't exist. Can you please take a look? Any access grant problem might be there.

@paramsingh

This comment has been minimized.

Copy link
Member

paramsingh commented Aug 5, 2018

Yeah, that's probably me. I'll fix and rerun the tests

@paramsingh paramsingh force-pushed the metabrainz:musicbrainz-integration-gsoc branch from a5f4506 to 7e7bda1 Aug 5, 2018

@rsh7 rsh7 force-pushed the rsh7:apply_replication branch from 7f51a0e to 61e72b7 Aug 5, 2018

@rsh7

This comment has been minimized.

Copy link
Contributor Author

rsh7 commented Aug 5, 2018

Working now. 🎉

token = None

with musicbrainz_db.engine.begin() as connection:
query = text("""

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

all queries should go inside the db module. Also if you just want the first, do a LIMIT 1 as well.

print (schema_seq)

with db.engine.begin() as connection:
query = text("""

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

this query should also be inside the db module

replication_seq: The number of the replication packet.
"""
print ("Processing", fileobj.name)
tar = tarfile.open(fileobj=fileobj, mode='r:bz2')

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

is this file being closed somewhere?

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

using a with clause might be good here?

This comment has been minimized.

@rsh7

rsh7 Aug 12, 2018

Author Contributor

Closed the file.

with db.engine.begin() as connection:
query = text("""
UPDATE musicbrainz.replication_control
SET current_replication_sequence = %s""" % (replication_seq)

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

use the correct way here please: #282 (comment)

This comment has been minimized.

@rsh7

rsh7 Aug 12, 2018

Author Contributor

done!

database.
"""
with musicbrainz_db.engine.begin() as connection:
query = text("""

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

I think you misunderstood what I said..format is basically equivalent to the % formatting we were using earlier.

I would like to create a sqlalchemy text object with parameters and then pass parameters to this in a dict. This leads to sqlalchemy handling cases which may cause sql injection etc.

This comment has been minimized.

@rsh7

rsh7 Aug 12, 2018

Author Contributor

Okay, done! :)

values: Data values of the rows to insert into the tables.
"""
trans = connection.begin()
query = text("""

This comment has been minimized.

@paramsingh

paramsingh Aug 5, 2018

Member

same thing here.

I think you misunderstood what I said..format is basically equivalent to the % formatting we were using earlier.

I would like to create a sqlalchemy text object with parameters and then pass parameters to this in a dict. This leads to sqlalchemy handling cases which may cause sql injection etc.

@rsh7 rsh7 force-pushed the rsh7:apply_replication branch from 61e72b7 to 7e5190b Aug 7, 2018

@paramsingh paramsingh merged commit 0253dfa into metabrainz:musicbrainz-integration-gsoc Aug 14, 2018

1 check passed

Jenkins Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.