Add an operator to check for _SUCCESS files #406
Merged
This PR adds an operator version of the sensor written in #395. It works the same way as the `S3FSCheckSuccessSensor`, except that it is not a long-lived job.
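Roughly, the operator boils down to a single `_SUCCESS` check per task try, failing fast so Airflow's retry machinery takes the place of the sensor's poke loop. The sketch below is illustrative only: the class name, constructor arguments, and key layout are assumptions, not the code in this diff (which, among other things, handles partitioned datasets).

```python
from airflow.exceptions import AirflowException
from airflow.hooks.S3_hook import S3Hook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class S3FSCheckSuccessOperator(BaseOperator):
    """Fail immediately if the expected _SUCCESS file is missing.

    Unlike the sensor, this performs one check per task try and relies
    on normal task retries instead of a long-running poke loop.
    """

    # Allow Jinja templating of the dataset prefix (e.g. {{ ds_nodash }}).
    template_fields = ("prefix",)

    @apply_defaults
    def __init__(self, bucket, prefix, aws_conn_id="aws_default",
                 *args, **kwargs):
        super(S3FSCheckSuccessOperator, self).__init__(*args, **kwargs)
        self.bucket = bucket
        self.prefix = prefix
        self.aws_conn_id = aws_conn_id

    def execute(self, context):
        hook = S3Hook(aws_conn_id=self.aws_conn_id)
        key = "{}/_SUCCESS".format(self.prefix.rstrip("/"))
        # check_for_key returns False when the object is absent.
        if not hook.check_for_key(key, bucket_name=self.bucket):
            raise AirflowException(
                "s3://{}/{} does not exist".format(self.bucket, key)
            )
```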
I noticed a few things when watching the sensor last night:

- The sensor fails when a `__HIVE_DEFAULT_PARTITION__` directory exists (#404: S3FSCheckSuccess fails when `__HIVE_DEFAULT_PARTITION__` exists).
- The sensor for the 2019-01-07 run never performed its check. The logs show that the poke method was never called during the lifetime of the sensor.

In light of this, I've disabled the `dataset_alerts` DAG. Instead of using a long-lived sensor, I think a better solution in our current Airflow configuration is to run an operator once (with a few retries) at the point when the dataset should definitely have been written to disk. This way, the scheduler picks up the check as soon as the upstream task finishes, as sketched below. This reuses most of the code from the last PR.
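For illustration, the wiring might look like the following, using the hypothetical operator sketched above (task ids, schedule, bucket, and retry settings are all made up for the example):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2019, 1, 1),
    # A few retries stand in for the sensor's long poke loop.
    "retries": 3,
    "retry_delay": timedelta(minutes=30),
}

with DAG("example_dataset_check", default_args=default_args,
         schedule_interval="@daily") as dag:
    # Placeholder for the job that actually writes the dataset.
    write_dataset = DummyOperator(task_id="write_dataset")

    check_success = S3FSCheckSuccessOperator(
        task_id="check_success",
        bucket="example-bucket",
        prefix="example/dataset/submission_date={{ ds_nodash }}",
    )

    # The scheduler only queues the check once the write task finishes,
    # so the check runs when the _SUCCESS file should already exist.
    write_dataset >> check_success
```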
There are a couple of small things in this PR too:

- Airflow's `snakebite` dependency breaks Python 3 support in pre-1.10.1, so I've disabled the py36 tests in tox. See https://issues.apache.org/jira/browse/AIRFLOW-2474.