Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AB-406 Part 2: Offset for missing entries, offset computed on submission #340

Merged
merged 8 commits into from Jun 3, 2019

Conversation

@aidanlw17
Copy link
Contributor

commented May 10, 2019

AB-406 Part 2:

Add offset for missing items

When submissions are stopped, this migration adds the submission
offset to all rows that have submission offset null. This ensures that
there are no rows missing an offset.

Compute offset on submission

Computes submission when write_low_level is called, and offset is
inserted as part of the low level data when it is written. This step
ensures that all submissions in the future are tracked with an offset
as soon as they enter the database.

@aidanlw17 aidanlw17 marked this pull request as ready for review May 10, 2019

db/data.py Outdated
result = connection.execute(query, {"mbid": mbid})

row = result.fetchone()
if row[0] is not None:

This comment has been minimized.

Copy link
@alastair

alastair May 15, 2019

Contributor

if you give a column alias in the sql statement then you can use row['max_offset'] here to make it a bit more explicit what we're reading

@@ -46,7 +46,7 @@ def test_get_last_submitted_recordings(self, dbset, dbget):
"version": {"essentia_build_sha": "sha"},
},
}
db.data.write_low_level(rand_mbid, data, gid_types.GID_TYPE_MBID)
db.data.write_low_level(str(rand_mbid), data, gid_types.GID_TYPE_MBID)

This comment has been minimized.

Copy link
@alastair

alastair May 15, 2019

Contributor

did this stop working without the call to str()? I'm interested to know if we should make sure that other places that call this still work - do all places that call this method in the main code pass a string, or a uuid object?

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 May 18, 2019

Author Contributor

This did stop working without the call to str(), and I believe that it was because I casted the gid to text in the query for submission offset. This was the error:
Screen Shot 2019-05-18 at 12 00 20 PM

I removed the cast to text in the query and removing the call to str() allowed the test to pass. The api endpoint for submit_low_level makes a call to str() on the mbid, but in db/data.py the function submit_low_level_data also casts the mbid as a string before passing it to write_low_level.

So typically, the main code passes a string rather than a uuid. But it appears to me that the comparison between gid in the lowlevel table and a string mbid works without using gid::text. So by removing gid::text, we can then use write_low_level with both strings and uuids?

@@ -200,7 +218,8 @@ def _insert_lowlevel_json(connection, ll_id, data_json, data_sha256, version_id)
return

try:
ll_id = _insert_lowlevel(connection, mbid, build_sha1, is_lossless_submit, is_mbid)
submission_offset = _get_submission_offset(connection, mbid)
ll_id = _insert_lowlevel(connection, mbid, build_sha1, is_lossless_submit, is_mbid, submission_offset)

This comment has been minimized.

Copy link
@alastair

alastair May 15, 2019

Contributor

We should have a test here to ensure that the offset is the expected value (0 if we add an mbid once, 1 if we add it twice, and another mbid will be 0 again)

db/data.py Outdated
query = text("""
SELECT MAX(submission_offset) as max_offset
FROM lowlevel
WHERE gid = :mbid

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 May 18, 2019

Author Contributor

@alastair I've removed the cast gid::text here, which allowed the query to work without a call to str() in test_stats file.

@@ -72,6 +77,34 @@ def test_submit_low_level_data_missing_keys(self, clean, write, sanity):
db.data.submit_low_level_data(self.test_mbid, self.test_lowlevel_data, gid_types.GID_TYPE_MBID)


def test_write_low_level_submission_offset(self):

This comment has been minimized.

Copy link
@aidanlw17

aidanlw17 May 18, 2019

Author Contributor

@alastair This is a test to ensure the offset value is expected. Given that _get_submission_offset is a nested function, I was not sure about the best way to use unittestt. Does querying for the offset after writing data each time here seem like an appropriate method?

This comment has been minimized.

Copy link
@alastair

alastair May 20, 2019

Contributor

this test is fine, for now let's leave the duplicated code. If we need the same functionality somewhere else we should copy it out in the data module

@alastair
Copy link
Contributor

left a comment

This looks good, thanks! Once we've merged and processed part 1, we can re-base and release this one

@alastair

This comment has been minimized.

Copy link
Contributor

commented May 23, 2019

Can you rebase this against master so that it's ready to merge when we deploy part 1?

aidanlw17 added 6 commits May 10, 2019
Missing items offset, compute on submission
This commit includes the following changees as the second part
of AB-406:

- Migration adds submission offsets where they are missing when
  submissions are stopped
- Computes submission offset when writing low level data at the
  time of submission
Minor sql transaction fix
Fixes a minor error in sql transaction keyword
Change to submission offset logic
- Checks the max submission_offset is not null, however allows it
  to be zero.

- Casts gid uuid type to text for comparison with mbid string
Cast uuid mbid as string in unit test
- Cast a randomly generated uuid to a string for call to
  write_low_level during unit test
Add test for writing lowlevel offset
This commit includes the following changes:

- Creates a test to check the successful assingment of a submission
  offset when writing lowlevel data.

- Removes cast gid::text when querying for max submission offset,
  this cast is unnecessary.

- Creates an alias for max submission offset to improve clarity.
Removes nesting of get_next_submission_offset
- Moves get_next_submission_offset to top level of data module
- Refactors test_data to account for this change

@aidanlw17 aidanlw17 force-pushed the aidanlw17:ab-406-2 branch from a795503 to 5eaf188 May 23, 2019

alastair added 2 commits Jun 3, 2019
Reformat data files
Remove some unused code
Remove unneeded update script.
We can continue using the management script to update offsets

@alastair alastair merged commit 14fdaef into metabrainz:master Jun 3, 2019

1 check was pending

Jenkins Waiting in build queue...
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.