Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AB-394 - Bulk get items with a single database query #337

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
43 changes: 43 additions & 0 deletions db/data.py
Expand Up @@ -334,6 +334,7 @@ def write_high_level(mbid, ll_id, data, build_sha1):
for model_name, data in json_high.items():
write_high_level_item(connection, model_name, model_version, ll_id, version_id, data)


def load_low_level(mbid, offset=0):
"""Load lowlevel data with the given mbid as a dictionary.
If no offset is given, return the first. If an offset is
Expand All @@ -359,6 +360,48 @@ def load_low_level(mbid, offset=0):
return row[0]


def load_many_low_level(recordings):
"""Uses a single query method to collect low-level JSON data for multiple recordings.

Args:
recordings: A list of tuples (mbid, offset).

Returns:

{"mbid1": {"offset1": {lowlevel_data},
"offsetn": {lowlevel_data}}
...
...

"mbidn": {"offset1": {lowlevel_data}}
}

"""

with db.engine.connect() as connection:
query = text("""
SELECT gid, submission_offset, data
FROM (
SELECT id, gid,
ROW_NUMBER ()
OVER (PARTITION BY gid
ORDER BY submitted DESC) - 1 submission_offset
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alastair I've reformatted the query to match the styling that you suggested. I've also decremented the row number within the sub query, so we can use submission_offset in both the outer SELECT and WHERE clauses. I've also ordered the recordings by submission date such that the most recent submission is always offset 0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this query is much better! Just a small thing, we want to order by ASC - offset 0 is the first submission for a given gid.

FROM lowlevel
) AS partitions
LEFT JOIN lowlevel_json
USING (id)
WHERE (gid, submission_offset) IN :recordings
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alastair here, I've simplified the query using tuples rather than jsonb. I agree that the previous complexity was unnecessary!

""")

result = connection.execute(query, { 'recordings': tuple(recordings) })

recordings_info = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use defaultdict(dict) here, then you dont need to use setdefault below

for row in result.fetchall():
recordings_info.setdefault(str(row['gid']), {})[str(row['submission_offset'])] = row['data']

return recordings_info


def load_high_level(mbid, offset=0):
"""Load high-level data for a given MBID."""
with db.engine.connect() as connection:
Expand Down
16 changes: 15 additions & 1 deletion webserver/views/api/v1/core.py
Expand Up @@ -183,6 +183,7 @@ def get_data_for_multiple_recordings(collect_data):

recording_details = {}

# Loading low level data with multiple query method
for recording_id, offset in recordings:
try:
recording_details.setdefault(recording_id, {})[offset] = collect_data(recording_id, offset)
Expand Down Expand Up @@ -219,9 +220,22 @@ def get_many_lowlevel():
Takes the form `mbid[:offset];mbid[:offset]`. Offsets are optional, and should
be >= 0

:query query_method: *Optional.* Determines which query method should be used

`single` will use the single-query method. No parameter or another value will revert
to the multiple-query method

:resheader Content-Type: *application/json*
"""
recording_details = get_data_for_multiple_recordings(db.data.load_low_level)
# Query option allowed for testing purposes:
if request.args.get("query_method") == "single":
# Use single-query method
recordings = check_bad_request_for_multiple_recordings()
recording_details = db.data.load_many_low_level(recordings)
return jsonify(recording_details)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the calls to check_bad_request_for_multiple_recordings() and load_many_low_level(recordings) to make this process easier to follow. However, I've assumed that I should leave the pre-existing flow for the multiple query method as is for now?

else:
# Use multiple-query method
recording_details = get_data_for_multiple_recordings(db.data.load_low_level)
return recording_details


Expand Down