Skip to content

Conversation

kfindeisen
Copy link
Member

@kfindeisen kfindeisen commented Nov 15, 2022

This PR contains a number of miscellaneous fixes that prevented an end-to-end run on USDF. Many of these are preexisting bugs that were never exercised on GCP.

This appears to be a limit of the boto3 API; see #35 for discussion.
The Contents field does not exist if no files are found, so the old
code would crash with a KeyError.
The topic is still hard-coded into the file, until we can decide how
we want topics to be organized.
@kfindeisen kfindeisen marked this pull request as ready for review November 16, 2022 02:23
@kfindeisen kfindeisen requested a review from hsinfang November 16, 2022 18:26
_log.info(f"No raw files found for {instrument}, generating dummy files instead.")
upload_from_random(producer, instrument, dest_bucket, n_groups, new_group_base)
finally:
producer.flush(30.0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the warning but I'm not sure if it matters to flush and wait here. Don't they get sent out still?

Copy link
Member Author

@kfindeisen kfindeisen Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the messages get sent, but such warnings usually mean that a class finalizer is doing cleanup that it can't be trusted to do. Also, the correctness of not flushing depends on the lower-level code not using callbacks, and there's no need for main to assume that.

As for waiting, the delay when I tested it was too small to measure.

The test assertion for raw export was originally written in terms of
avoiding the standard behavior, but there's no reason not to test for
the behavior we want instead, and it's easier to read.
This bug is currently caused by repetition of input exposure IDs, but
it may also happen in the future as a result of retries, especially
late-stage ones. Solution is to have the export code check for and only
export the latest run, though this comes at the cost of not being able
to retry previously failed exports.
We're not currently using callbacks, but Producer emits a warning if
we don't flush anyway.
@kfindeisen kfindeisen merged commit fbaf4e3 into main Nov 18, 2022
@kfindeisen kfindeisen deleted the tickets/DM-36999 branch November 18, 2022 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants