EmrEtlRunner: treat archive_enriched and archive_shredded as separate steps #3401
Comments
/cc @stdfalse
The current behavior is: if we skip Do we want to convert that to: if we skip and if we skip ?
Can you clarify the nested booleans, "if
I think @chuwy wrote this recovery code? I'm not familiar with it.
yup
Sorry for delay here. Regarding code: if we're in EmrEtlRunner can decides we're in Does it answer a question?
Would you guys mind jumping on a Zoom together? Can be tomorrow morning of course, but we can't afford to introduce a regression here...
At the moment we confusingly use the archive_enriched step to refer to archiving both enriched and shredded data. This is problematic if a user is running Snowplow without shredding & loading data into Redshift, because:

- archive_enriched, then the enriched events are left in that folder and the next run can't start
- archive_shredded, then the S3DistCp trying to move the shredded data will fail due to no data being present

Note that I am open to other suggestions (e.g. hardening the S3DistCp step), but the solution of treating archive_enriched and archive_shredded as separate steps seems fairly clean and simple.
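
To make the proposal concrete, here is a minimal sketch of what "separate steps" could mean for the skip logic. It is written in Python purely for illustration and is not EmrEtlRunner's actual code; the function name archive_steps_to_run and the idea of passing skipped step names as a list are assumptions of the sketch. Only the two step names come from this issue.

```python
# Hypothetical sketch, not EmrEtlRunner's implementation.
# Only the two step names below are taken from the issue text.
ARCHIVE_STEPS = ("archive_enriched", "archive_shredded")


def archive_steps_to_run(skip):
    """Return the archive steps that should run, in order.

    `skip` is an iterable of step names the user asked to skip.
    Each archive step is decided independently of the other.
    """
    skipped = set(skip)
    return [step for step in ARCHIVE_STEPS if step not in skipped]


# A user who does not shred skips only the shredded archive,
# so enriched events are still archived.
print(archive_steps_to_run(["archive_shredded"]))
```

With the steps decided independently, skipping archive_shredded no longer prevents the enriched events from being archived, and no S3DistCp move is scheduled for shredded data that was never produced.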