EmrEtlRunner: fix srcPattern for copying stream enriched data to HDFS #3722
Comments
What's our chance of getting this into R103?
This is already implemented; I'm starting to test it. So the question is for @BenFradet.
103 has been in code freeze for over a week, so I'm 👎
plus it'd make sense to integrate at least #3719 with this
Makes sense - let's start sketching out the next high-priority batch release. @chuwy, can you compose that milestone please?
Ok, will do.
Turns out the fix should (can) be different from the one described in the title. When we're resuming from shred in batch-enrich mode, S3DistCp knows nothing about the run folder and also uses
Don't understand why, but |
this is locking us down to a particular file format (gz) which I don't really like :( |
Do you mean because we can support other formats in future? AFAIK |
I'll try to test it with |
Ok, |
As long as there are no other files being unnecessarily moved, yes |
Ok, no unnecessary files were moved. Pushing an RC then.
Sorry, I was wrong: this regex handles
Related to #3717
When we're resuming from shred in stream enrich mode, S3DistCp tries to copy data from `enriched/good`, not from `enriched/good/run=2018...`, and fails because of that.
Due to this bug, we can recover R102 stream enrich mode only by re-staging enriched data back to `enriched.stream`.
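To make the source-pattern discussion concrete, here is a minimal Python sketch of how a run-folder-aware pattern could select files. Everything in it is an assumption for illustration: the keys, the run folder name, and `SRC_PATTERN` are hypothetical and are not the regex used in the actual fix. It also assumes the pattern has to match the whole path, which is how S3DistCp's `--srcPattern` appears to behave.

```python
import re

# Hypothetical S3 keys; the run folder name is invented for illustration.
KEYS = [
    "enriched/good/run=2018-01-01-00-00-00/part-00000.gz",
    "enriched/good/run=2018-01-01-00-00-00/part-00001.gz",
    "enriched/good/run=2018-01-01-00-00-00/_SUCCESS",
]

# Hypothetical pattern (NOT the one from the actual fix): any gzipped file
# that sits directly under some run=... folder.
SRC_PATTERN = r".*/run=[^/]+/.*\.gz"


def select(keys, pattern):
    """Return the keys whose entire path matches the pattern."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.fullmatch(key)]


selected = select(KEYS, SRC_PATTERN)
for key in selected:
    print(key)
```

A pattern like this only picks up `.gz` files, which is the format lock-in raised in the comments; marker files such as `_SUCCESS` are left where they are.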