New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AB-394 - Bulk get items with a single database query #337
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -334,6 +334,7 @@ def write_high_level(mbid, ll_id, data, build_sha1): | |
for model_name, data in json_high.items(): | ||
write_high_level_item(connection, model_name, model_version, ll_id, version_id, data) | ||
|
||
|
||
def load_low_level(mbid, offset=0): | ||
"""Load lowlevel data with the given mbid as a dictionary. | ||
If no offset is given, return the first. If an offset is | ||
|
@@ -359,6 +360,48 @@ def load_low_level(mbid, offset=0): | |
return row[0] | ||
|
||
|
||
def load_many_low_level(recordings): | ||
"""Uses a single query method to collect low-level JSON data for multiple recordings. | ||
|
||
Args: | ||
recordings: A list of tuples (mbid, offset). | ||
|
||
Returns: | ||
|
||
{"mbid1": {"offset1": {lowlevel_data}, | ||
"offsetn": {lowlevel_data}} | ||
... | ||
... | ||
|
||
"mbidn": {"offset1": {lowlevel_data}} | ||
} | ||
|
||
""" | ||
|
||
with db.engine.connect() as connection: | ||
query = text(""" | ||
SELECT gid, submission_offset, data | ||
FROM ( | ||
SELECT id, gid, | ||
ROW_NUMBER () | ||
OVER (PARTITION BY gid | ||
ORDER BY submitted DESC) - 1 submission_offset | ||
FROM lowlevel | ||
) AS partitions | ||
LEFT JOIN lowlevel_json | ||
USING (id) | ||
WHERE (gid, submission_offset) IN :recordings | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alastair here, I've simplified the query using tuples rather than jsonb. I agree that the previous complexity was unnecessary! |
||
""") | ||
|
||
result = connection.execute(query, { 'recordings': tuple(recordings) }) | ||
|
||
recordings_info = {} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can use |
||
for row in result.fetchall(): | ||
recordings_info.setdefault(str(row['gid']), {})[str(row['submission_offset'])] = row['data'] | ||
|
||
return recordings_info | ||
|
||
|
||
def load_high_level(mbid, offset=0): | ||
"""Load high-level data for a given MBID.""" | ||
with db.engine.connect() as connection: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -183,6 +183,7 @@ def get_data_for_multiple_recordings(collect_data): | |
|
||
recording_details = {} | ||
|
||
# Loading low level data with multiple query method | ||
for recording_id, offset in recordings: | ||
try: | ||
recording_details.setdefault(recording_id, {})[offset] = collect_data(recording_id, offset) | ||
|
@@ -219,9 +220,22 @@ def get_many_lowlevel(): | |
Takes the form `mbid[:offset];mbid[:offset]`. Offsets are optional, and should | ||
be >= 0 | ||
|
||
:query query_method: *Optional.* Determines which query method should be used | ||
|
||
`single` will use the single-query method. No parameter or another value will revert | ||
to the multiple-query method | ||
|
||
:resheader Content-Type: *application/json* | ||
""" | ||
recording_details = get_data_for_multiple_recordings(db.data.load_low_level) | ||
# Query option allowed for testing purposes: | ||
if request.args.get("query_method") == "single": | ||
# Use single-query method | ||
recordings = check_bad_request_for_multiple_recordings() | ||
recording_details = db.data.load_many_low_level(recordings) | ||
return jsonify(recording_details) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've moved the calls to check_bad_request_for_multiple_recordings() and load_many_low_level(recordings) to make this process easier to follow. However, I've assumed that I should leave the pre-existing flow for the multiple query method as is for now? |
||
else: | ||
# Use multiple-query method | ||
recording_details = get_data_for_multiple_recordings(db.data.load_low_level) | ||
return recording_details | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alastair I've reformatted the query to match the styling that you suggested. I've also decremented the row number within the sub query, so we can use submission_offset in both the outer
SELECT
andWHERE
clauses. I've also ordered the recordings by submission date such that the most recent submission is always offset 0.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this query is much better! Just a small thing, we want to order by ASC - offset 0 is the first submission for a given gid.