-
-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API endpoint for getting data count for multiple recordings (AB-294) #221
Conversation
Also, the JSON response format I chose {"mbid1": count1, "mbid2": count2} is not really consistent with the format of the MBID/count endpoint {"count": count, "mbid": mbid} |
Added tests |
db/data.py
Outdated
of MBID.""" | ||
with db.engine.connect() as connection: | ||
query = text(""" | ||
SELECT gid, COUNT(*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Watch the formatting here - we right align SQL keywords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the patch! Some small changes to be made
db/testing.py
Outdated
@@ -77,3 +78,22 @@ def load_low_level_data(self, mbid): | |||
""" | |||
with open(self.data_filename(mbid)) as json_file: | |||
db.data.submit_low_level_data(mbid, json.loads(json_file.read()), gid_types.GID_TYPE_MBID) | |||
|
|||
def load_fake_low_level_data(self, mbid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a handy method. good idea.
We should probably make the method submit_fake_low_level_data
, instead of load_
Can you add a small docstring explaining what it does?
I don't think it's important to have average loudness a random variable. Why did you make it like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the name and added a docstring
The random() allows me to submit several times the same mbid as a new separate entry, to test counts > 1. I tried to make that clear in the docstring, I'm not sure whether I managed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand now. It would be more explicit to have a separate parameter to this function which you could manually change each time you want a different submission.
This might mean you need a bit more code when adding these items to the database (you won't be able to use a for loop unless you add some extra data), but I think it would make it clearer what the method does.
webserver/views/api/v1/core.py
Outdated
"More than 200 recordings not allowed per request") | ||
|
||
mbids = set(r[0] for r in recordings) | ||
recording_count = {}.fromkeys(mbids, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, I didn't know of this method before. Is the idea here to make sure that all MBIDs have a value in the returned dictionary even if it's 0?
This ties in to #223 a little bit too - we have to decide what to return for missing data.
I would be just as happy to return no data for mbids which are not in the database. In this case it's up to the caller to check if the keys exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all mbids returned by _parse_bulk_params (so I guess all valid musicbrainz mbids, whether they have low-level data associated or not) are initialized with count=0.
I guess invalid mbids (i.e. not existing in muscbrainz) are not returned by _parse_bulk_params? I didn"t check though
webserver/views/api/v1/core.py
Outdated
|
||
mbids = set(r[0] for r in recordings) | ||
recording_count = {}.fromkeys(mbids, 0) | ||
recording_count.update({str(mbid): int(count) for (mbid, count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the db method return this dictionary, instead of formatting it in the view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Regarding format, what do you think about {"mbid": {"count": 1}, "mbid": {"count":2}} too complex? |
I changed the response format, this looks fine for me (javascript object = easy to parse keys when receiving the response). I thought you might want to have the same format as the MBID/count endpoint: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better 👍, just some small tweaks remaining
db/data.py
Outdated
IN :mbids | ||
GROUP BY gid;""") | ||
return {str(mbid): int(count) for mbid, count | ||
in connection.execute(query, mbids=tuple(mbids))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this in the first review - we use a dict as the second argument to execute: {"mbids": tuple(mbids)}
. IN fact, I didn't know that this other syntax was valid, but we should stay consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
db/data.py
Outdated
, COUNT(*) | ||
FROM lowlevel | ||
WHERE gid | ||
IN :mbids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency, we write this as WHERE gid IN :mbids
, all on a single line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. I introduced this mistake when rereading my code just before pushing... it was fine beforehand :)
webserver/views/api/v1/core.py
Outdated
|
||
mbids = set(r[0] for r in recordings) | ||
recording_count = {}.fromkeys(mbids, {'count': 0}) | ||
for (mbid, count) in six.iteritems(db.data.count_many_lowlevel(mbids)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided in #223 to not list an mbid in the response if it doesn't exist in the database. I think we should do the same here, this means we can just return jsonify(db.data.count_many_lowlevel(mbids))
.
The documentation should say this - that if a mbid doesn't exist in the database then it is omitted from the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
I like the format better now. It fits in with the format of the bulk get method more. You're right, returning a list would be more annoying to consume by the client. |
I introduced a mistake in the last fix commit
Dictionary argument, not keyword argument
🥇 |
First working draft
I see as possible improvements: