-
Notifications
You must be signed in to change notification settings - Fork 0
DM-36999: Complete integration test at USDF #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This appears to be a limit of the boto3 API; see #35 for discussion.
The Contents field does not exist if no files are found, so the old code would crash with a KeyError.
The topic is still hard-coded into the file, until we can decide how we want topics to be organized.
ad32cb8
to
9b4a6e7
Compare
_log.info(f"No raw files found for {instrument}, generating dummy files instead.") | ||
upload_from_random(producer, instrument, dest_bucket, n_groups, new_group_base) | ||
finally: | ||
producer.flush(30.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the warning but I'm not sure if it matters to flush and wait here. Don't they get sent out still?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the messages get sent, but such warnings usually mean that a class finalizer is doing cleanup that it can't be trusted to do. Also, the correctness of not flushing depends on the lower-level code not using callbacks, and there's no need for main
to assume that.
As for waiting, the delay when I tested it was too small to measure.
The test assertion for raw export was originally written in terms of avoiding the standard behavior, but there's no reason not to test for the behavior we want instead, and it's easier to read.
This bug is currently caused by repetition of input exposure IDs, but it may also happen in the future as a result of retries, especially late-stage ones. Solution is to have the export code check for and only export the latest run, though this comes at the cost of not being able to retry previously failed exports.
We're not currently using callbacks, but Producer emits a warning if we don't flush anyway.
9b4a6e7
to
4bbce1b
Compare
This PR contains a number of miscellaneous fixes that prevented an end-to-end run on USDF. Many of these are preexisting bugs that were never exercised on GCP.