Unfulfilled dependencies at run time #1552
Comments
I usually have this kind of problem when the dependency task (in your case, that would be |
OK, I figured it out. In my … I solve this by caching the list of files the first time … Sorry about the noise. |
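The fix described in this comment, caching the file listing the first time it is computed, can be sketched in plain Python. This is a minimal sketch, assuming a hypothetical `list_files` callable that stands in for the S3 prefix listing; `SnapshotLister` is also a hypothetical name, not part of luigi. The point is that every later reader sees the same snapshot, even if the underlying listing changes.

```python
from functools import cached_property


class SnapshotLister:
    """Hypothetical helper: list files once, then reuse that snapshot."""

    def __init__(self, list_files):
        # list_files: any zero-argument callable returning an iterable of keys
        # (a stand-in for an S3 prefix listing; an assumption, not luigi API).
        self._list_files = list_files

    @cached_property
    def files(self):
        # Evaluated on first access only; later accesses return the same
        # tuple, so every caller sees one consistent view of the prefix.
        return tuple(sorted(self._list_files()))


# Simulate a listing that grows between calls (e.g. new files landing).
listings = iter([["a.json"], ["a.json", "b.json"]])
lister = SnapshotLister(lambda: next(listings))

first = lister.files   # ("a.json",)
second = lister.files  # same tuple; the listing is not called again
```

`functools.cached_property` needs Python 3.8 or later; on older versions a plain attribute set on first use does the same job.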
I was under the assumption that Luigi would internally call |
Ah, I spoke too soon. I hit the issue again with code that didn't use the requires() caching. I think this is caused by S3's eventual consistency for file listings. All regions support read-after-write consistency for new writes, so maybe that would be a better approach for the S3 module? |
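The distinction drawn here, listings that lag behind writes versus direct reads that see new objects immediately, can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyStore` is a hypothetical stand-in, not the real S3 or `luigi.contrib.s3` API, and the `part-NNNNN` key naming in `expected_keys` is hypothetical. Checking keys one by one only works when the expected keys can be derived without listing. (AWS has since announced, in December 2020, strong read-after-write consistency for S3 including list operations, so this particular cause may no longer apply.)

```python
class ToyStore:
    """Toy model: new keys are readable at once, but listings lag behind."""

    def __init__(self):
        self._written = set()  # keys that have been written
        self._listed = set()   # keys the listing index has caught up with

    def put(self, key):
        # Read-after-write: a direct lookup sees the key immediately.
        self._written.add(key)

    def propagate(self):
        # The listing catches up "eventually" (here: when we say so).
        self._listed |= self._written

    def exists(self, key):
        return key in self._written

    def list_prefix(self, prefix):
        return sorted(k for k in self._listed if k.startswith(prefix))


def expected_keys(prefix, n):
    # Hypothetical naming scheme: if keys are predictable, each one can be
    # checked directly instead of relying on a listing.
    return [f"{prefix}part-{i:05d}" for i in range(n)]


prefix = "2016/02/01/13/"
store = ToyStore()
for key in expected_keys(prefix, 2):
    store.put(key)

listed = store.list_prefix(prefix)                              # listing lags
direct = [store.exists(k) for k in expected_keys(prefix, 2)]    # both visible
store.propagate()
listed_later = store.list_prefix(prefix)                        # caught up
```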
I was hitting this occasionally with WrapperTasks that depend on the state of external files (not under my control). The call to …

My solution:

    import luigi

    class CustomWrapperTask(luigi.WrapperTask):
        # Note: a class attribute, so this list is shared by every instance
        CACHED_REQUIRES = []

        def cached_requires(self):
            # Only report on the tasks that were originally available on the first call to `requires()`
            # A `requires()` method will need to append required tasks to self.CACHED_REQUIRES
            # before yielding or returning them. This is backwards compatible for WrapperTasks that
            # have not implemented this yet (the `or` below).
            #
            # https://luigi.readthedocs.io/en/stable/api/luigi.task.html#luigi.task.WrapperTask.complete
            return self.CACHED_REQUIRES or self.requires()

        def complete(self):
            return all(r.complete() for r in self.cached_requires())

    class MyWrapperTask(CustomWrapperTask):
        ...

        def requires(self):
            for x in range(10):
                req = MyOtherTask(i=x)
                self.CACHED_REQUIRES.append(req)
                yield req

or:

    class MyWrapperTask(CustomWrapperTask):
        ...

        def requires(self):
            for x in range(10):
                req = MyOtherTask(i=x)
                self.CACHED_REQUIRES.append(req)
            return self.CACHED_REQUIRES
|
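Two details of the CACHED_REQUIRES pattern above are easy to trip over, and both can be shown without luigi using hypothetical stand-in classes (`FakeTask`, `SharedListWrapper`, `SnapshotWrapper` are illustrative names only). First, CACHED_REQUIRES is a class attribute, so every instance shares one list. Second, when requires() is a generator, all() in complete() stops at the first incomplete task, so only a prefix of the dependencies ends up cached. The `SnapshotWrapper` variant is an assumption of mine, not something from luigi: it materializes requires() once and stores the result on the instance.

```python
class FakeTask:
    """Hypothetical stand-in for a luigi task: only complete() matters here."""

    def __init__(self, i, done):
        self.i = i
        self.done = done

    def complete(self):
        return self.done


class SharedListWrapper:
    # Mirrors the pattern above: a class-level list filled by a generator.
    CACHED_REQUIRES = []

    def requires(self):
        for i in range(3):
            req = FakeTask(i, done=(i != 0))  # task 0 is not finished yet
            self.CACHED_REQUIRES.append(req)
            yield req

    def cached_requires(self):
        return self.CACHED_REQUIRES or self.requires()

    def complete(self):
        return all(r.complete() for r in self.cached_requires())


class SnapshotWrapper:
    # Variant: materialize requires() once and keep it on the instance.
    def requires(self):
        return [FakeTask(i, done=(i != 0)) for i in range(3)]

    def cached_requires(self):
        if not hasattr(self, "_requires_snapshot"):
            self._requires_snapshot = list(self.requires())
        return self._requires_snapshot

    def complete(self):
        return all(r.complete() for r in self.cached_requires())


shared = SharedListWrapper()
shared.complete()  # all() stops at task 0, the first incomplete one
cached_count = len(shared.cached_requires())  # only 1 of 3 tasks was cached

snap = SnapshotWrapper()
snap.complete()
snap_count = len(snap.cached_requires())  # all 3 tasks are cached
```

A fresh `SharedListWrapper()` also returns the very same list as `shared`, which is what the class-attribute note in the code above warns about.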
@kwilcox, thank you for the answer. It seems CustomWrapperTask needs a small extension to fully avoid the "Unfulfilled dependency" problem: a deps() method should also be added, so CustomWrapperTask looks like the following:

    import luigi

    class CustomWrapperTask(luigi.WrapperTask):
        CACHED_REQUIRES = []

        def cached_requires(self):
            # Only report on the tasks that were originally available on the first call to `requires()`
            # A `requires()` method will need to append required tasks to self.CACHED_REQUIRES
            # before yielding or returning them. This is backwards compatible for WrapperTasks that
            # have not implemented this yet (the `or` below).
            #
            # https://luigi.readthedocs.io/en/stable/api/luigi.task.html#luigi.task.WrapperTask.complete
            return self.CACHED_REQUIRES or self.requires()

        def complete(self):
            return all(r.complete() for r in self.cached_requires())

        def deps(self):
            # Return the cached list so the run-time dependency check sees the same tasks
            return self.cached_requires()

It is needed because of this line in luigi's TaskProcess. |
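To see why overriding deps() helps, here is a simplified, luigi-free model of the run-time check that the comment points to. It approximates the behavior (re-evaluate the dependencies right before running and fail if any is incomplete); it is not luigi's actual code, and `Dep` and `check_deps_before_run` are hypothetical names. If deps() re-evaluates requires() against a listing that changed after scheduling, it can return a task that was never scheduled, so the check fails.

```python
class Dep:
    """Hypothetical dependency stand-in with a task_id and a completion flag."""

    def __init__(self, task_id, done):
        self.task_id = task_id
        self.done = done

    def complete(self):
        return self.done


def check_deps_before_run(deps):
    # Simplified model of the worker-side check: refuse to run if any
    # dependency returned by deps() is not complete.
    missing = [d.task_id for d in deps if not d.complete()]
    if missing:
        raise RuntimeError(
            "Unfulfilled dependencies at run time: " + ", ".join(missing)
        )


# At scheduling time the listing had two files; both tasks were run.
scheduled = [Dep("File(a)", done=True), Dep("File(b)", done=True)]

# By run time a new file has appeared, so re-calling requires() yields a
# third task that was never scheduled and therefore never ran.
recomputed = scheduled + [Dep("File(c)", done=False)]

try:
    check_deps_before_run(recomputed)
    error = None
except RuntimeError as exc:
    error = str(exc)

check_deps_before_run(scheduled)  # passes: deps() returns the cached list
```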
I am also getting the same error while running shell scripts in parallel. I tried the solutions mentioned above, but they didn't solve the issue:

    class Task2(ExternalProgramTask): ...
    class TaskParallel1(ExternalProgramTask): ...
    class TaskParallel2(ExternalProgramTask): ...
|
@stynejohn Your issue is unrelated. Your tasks don't have any …

    import os

    import luigi
    from luigi.contrib.external_program import ExternalProgramTask

    class Task2(ExternalProgramTask):
        def requires(self):
            return [TaskParallel1(), TaskParallel2()]

        def program_args(self):
            return ["echo", "******hi********"]

        def run(self):
            super().run()
            with open(self.__class__.__name__, 'w') as f:
                f.write('done')

        def output(self):
            return luigi.LocalTarget(self.__class__.__name__)

    class TaskParallel1(ExternalProgramTask):
        def program_args(self):
            return ["echo", "one"]

        def run(self):
            super().run()
            with open(self.__class__.__name__, 'w') as f:
                f.write('done')

        def output(self):
            return luigi.LocalTarget(self.__class__.__name__)

    class TaskParallel2(ExternalProgramTask):
        def program_args(self):
            return ["echo", "two"]

        def run(self):
            super().run()
            with open(self.__class__.__name__, 'w') as f:
                f.write('done')

        def output(self):
            return luigi.LocalTarget(self.__class__.__name__)
|
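Why the added output() methods matter: as far as I know, luigi's default Task.complete() returns False (with a warning) for a task that declares no outputs, and otherwise checks that every output exists. A dependency without outputs therefore still looks incomplete after it has run, and the downstream task fails the run-time check. The model below is simplified and luigi-free; `ToyTask` and its in-memory "filesystem" set are hypothetical stand-ins.

```python
class ToyTask:
    """Hypothetical stand-in for a luigi task, with outputs as plain paths."""

    def __init__(self, name, outputs, filesystem):
        self.name = name
        self.outputs = outputs        # paths this task promises to create
        self.filesystem = filesystem  # shared set of paths that "exist"

    def run(self):
        # Running creates every declared output.
        self.filesystem.update(self.outputs)

    def complete(self):
        # Simplified model of the default completeness check:
        # no outputs means "never complete"; otherwise all must exist.
        if not self.outputs:
            return False
        return all(path in self.filesystem for path in self.outputs)


fs = set()
no_output = ToyTask("TaskParallel1", outputs=[], filesystem=fs)
with_output = ToyTask("TaskParallel1", outputs=["TaskParallel1"], filesystem=fs)

no_output.run()
with_output.run()
# no_output.complete() stays False even after running;
# with_output.complete() is True once its output exists.
```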
I have been getting what seem to be spurious "Unfulfilled dependencies at run time" errors with Luigi 2.0.1. They are fairly sporadic, happening every few hours, and subsequent runs tend to work. Here is part of a redacted log:
For context, FirehoseDateHourTask is a meta-task that, given a TaskParameter and a DateHourParameter, walks a list of files in S3 with a date-based prefix and yields tasks. It looks like this:

Two things stick out to me in the log. First, it marks the task as failed due to failed dependencies without even checking them first. Second, it then immediately checks its dependencies and goes into a PENDING state.
When I look at runs of the same task for other date-hours, this problem doesn't occur: it first checks whether the tasks are finished, then goes into a PENDING state.
It seems like there's a race or some other ordering problem here, but I'm not quite sure how to debug it.
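One way to debug a suspected race, assuming it comes from the S3 listing changing between scheduling and running, is to record the task ids produced by requires() at both points and compare them (luigi tasks expose a task_id attribute). The helper below is a hypothetical sketch, not part of luigi, and `FirehoseFile(key=…)` ids are made-up examples.

```python
import logging

logger = logging.getLogger("deps-debug")  # hypothetical logger name


def diff_dependency_snapshots(before, after):
    """Compare two collections of task ids and report what changed.

    Hypothetical helper: pass the ids from requires() captured at
    scheduling time and again just before run().
    """
    before, after = set(before), set(after)
    added = sorted(after - before)
    removed = sorted(before - after)
    if added or removed:
        logger.warning("dependencies changed: added=%s removed=%s", added, removed)
    return added, removed


added, removed = diff_dependency_snapshots(
    ["FirehoseFile(key=a)", "FirehoseFile(key=b)"],
    ["FirehoseFile(key=a)", "FirehoseFile(key=b)", "FirehoseFile(key=c)"],
)
```

A non-empty `added` list at run time would point to the listing as the source of the unscheduled dependency.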