Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API endpoint for getting data count for multiple recordings (AB-294) #221

Merged
merged 11 commits into from
Mar 7, 2017

Conversation

loujine
Copy link
Contributor

@loujine loujine commented Feb 8, 2017

First working draft

I see as possible improvements:

  • refactoring some common code between get_many_lowlevel and get_many_count (core.py)
  • add a NoDataFoundException in db/data.py
  • more tests :) (with at least a count 2 and a count 0)

@loujine
Copy link
Contributor Author

loujine commented Feb 8, 2017

Also, the JSON response format I chose {"mbid1": count1, "mbid2": count2} is not really consistent with the format of the MBID/count endpoint {"count": count, "mbid": mbid}

@loujine
Copy link
Contributor Author

loujine commented Feb 8, 2017

Added tests
Waiting for feedback for the other points

db/data.py Outdated
of MBID."""
with db.engine.connect() as connection:
query = text("""
SELECT gid, COUNT(*)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watch the formatting here - we right align SQL keywords

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

@alastair alastair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the patch! Some small changes to be made

db/testing.py Outdated
@@ -77,3 +78,22 @@ def load_low_level_data(self, mbid):
"""
with open(self.data_filename(mbid)) as json_file:
db.data.submit_low_level_data(mbid, json.loads(json_file.read()), gid_types.GID_TYPE_MBID)

def load_fake_low_level_data(self, mbid):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a handy method. good idea.
We should probably make the method submit_fake_low_level_data, instead of load_
Can you add a small docstring explaining what it does?

I don't think it's important to have average loudness a random variable. Why did you make it like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the name and added a docstring

The random() allows me to submit several times the same mbid as a new separate entry, to test counts > 1. I tried to make that clear in the docstring, I'm not sure whether I managed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand now. It would be more explicit to have a separate parameter to this function which you could manually change each time you want a different submission.

This might mean you need a bit more code when adding these items to the database (you won't be able to use a for loop unless you add some extra data), but I think it would make it clearer what the method does.

"More than 200 recordings not allowed per request")

mbids = set(r[0] for r in recordings)
recording_count = {}.fromkeys(mbids, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, I didn't know of this method before. Is the idea here to make sure that all MBIDs have a value in the returned dictionary even if it's 0?

This ties in to #223 a little bit too - we have to decide what to return for missing data.
I would be just as happy to return no data for mbids which are not in the database. In this case it's up to the caller to check if the keys exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all mbids returned by _parse_bulk_params (so I guess all valid musicbrainz mbids, whether they have low-level data associated or not) are initialized with count=0.
I guess invalid mbids (i.e. not existing in muscbrainz) are not returned by _parse_bulk_params? I didn"t check though


mbids = set(r[0] for r in recordings)
recording_count = {}.fromkeys(mbids, 0)
recording_count.update({str(mbid): int(count) for (mbid, count)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the db method return this dictionary, instead of formatting it in the view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@alastair
Copy link
Collaborator

alastair commented Mar 2, 2017

Regarding format, what do you think about

{"mbid": {"count": 1}, "mbid": {"count":2}}

too complex?

@loujine
Copy link
Contributor Author

loujine commented Mar 4, 2017

I changed the response format, this looks fine for me (javascript object = easy to parse keys when receiving the response).

I thought you might want to have the same format as the MBID/count endpoint:
[ {"count": count1, "mbid": mbid1}, {"count": count2, "mbid": mbid2}, ... ]
but that make the response more annoying to parse for specific mbids on the client side

Copy link
Collaborator

@alastair alastair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better 👍, just some small tweaks remaining

db/data.py Outdated
IN :mbids
GROUP BY gid;""")
return {str(mbid): int(count) for mbid, count
in connection.execute(query, mbids=tuple(mbids))}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this in the first review - we use a dict as the second argument to execute: {"mbids": tuple(mbids)}. IN fact, I didn't know that this other syntax was valid, but we should stay consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

db/data.py Outdated
, COUNT(*)
FROM lowlevel
WHERE gid
IN :mbids
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, we write this as WHERE gid IN :mbids, all on a single line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. I introduced this mistake when rereading my code just before pushing... it was fine beforehand :)


mbids = set(r[0] for r in recordings)
recording_count = {}.fromkeys(mbids, {'count': 0})
for (mbid, count) in six.iteritems(db.data.count_many_lowlevel(mbids)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided in #223 to not list an mbid in the response if it doesn't exist in the database. I think we should do the same here, this means we can just return jsonify(db.data.count_many_lowlevel(mbids)).
The documentation should say this - that if a mbid doesn't exist in the database then it is omitted from the results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@alastair alastair self-assigned this Mar 5, 2017
@alastair
Copy link
Collaborator

alastair commented Mar 5, 2017

I like the format better now. It fits in with the format of the bulk get method more. You're right, returning a list would be more annoying to consume by the client.

loujine added 3 commits March 6, 2017 00:37
I introduced a mistake in the last fix commit
Dictionary argument, not keyword argument
@alastair alastair merged commit 6b23f30 into metabrainz:master Mar 7, 2017
@alastair
Copy link
Collaborator

alastair commented Mar 7, 2017

🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants