Faulty DONE status assignment #3273

Closed
HadiKutabi opened this issue Jan 8, 2024 · 2 comments

Comments

@HadiKutabi

It is possible for a task to be assigned the status DONE even though its outputs don't exist.

Here is an example:

import luigi


class Task1(luigi.Task):

    def run(self):
        with self.output()["foo_TASK_1"].open("w") as f:
            f.write("Hello World")

        with self.output()["bar_TASK_1"].open("w") as f:
            f.write("Hello World")

    def output(self):
        return {
            "foo_TASK_1": luigi.LocalTarget("foo_TASK_1.txt"),
            "bar_TASK_1": luigi.LocalTarget("bar_TASK_1.txt"),
        }


class Task2(luigi.Task):
    def requires(self):
        return Task1()

    def run(self):
        with self.output()["foo_TASK_2"].open("w") as f:
            f.write("Hello World")

    def output(self):
        return {
            "foo_TASK_2": luigi.LocalTarget("foo_TASK_2.txt"),
            "bar_TASK_2": luigi.LocalTarget("bar_TASK_2.txt"),
        }


if __name__ == "__main__":
    luigi.build([Task2()], local_scheduler=True, detailed_summary=True)

If you run this, the detailed summary reports both tasks as successful. However, Task2 cannot have succeeded, because its run() creates only one of its two declared outputs (bar_TASK_2.txt is never written).
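The mismatch can be reproduced without Luigi at all. The sketch below assumes only that Luigi's default `Task.complete()` returns true when every declared output exists. `all_outputs_exist` and `simulate_task2` are hypothetical stand-ins written for this illustration, not Luigi functions; the file names are taken from `Task2` above.

```python
import tempfile
from pathlib import Path


def all_outputs_exist(paths):
    """Hypothetical stand-in for the default completeness check:
    true only if every declared output file exists."""
    return all(Path(p).exists() for p in paths)


def simulate_task2(workdir):
    """Mimic Task2: declare two outputs, but write only one of them."""
    declared = [
        Path(workdir) / "foo_TASK_2.txt",
        Path(workdir) / "bar_TASK_2.txt",
    ]
    declared[0].write_text("Hello World")  # bar_TASK_2.txt is never written
    return declared


with tempfile.TemporaryDirectory() as workdir:
    declared = simulate_task2(workdir)
    complete_after_run = all_outputs_exist(declared)
    missing = [p.name for p in declared if not p.exists()]
    print(complete_after_run, missing)
```

So a completeness check made after `run()` would report the task as incomplete, even though `run()` itself returned without raising.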

I've traced this behavior to worker.py (line 216) and patched it in a hacky way as follows:

                        # update the cache
                        if self.task_completion_cache is not None:
                            self.task_completion_cache[self.task.task_id] = True
                        status = DONE if self.task.complete() else FAILED
                    elif self.check_complete(self.task):
                        status = DONE
                    else:

Can someone explain whether my solution makes sense, or why Luigi considers the task successful?

Thanks :)

@lallea
Contributor

lallea commented Jan 10, 2024

The developer is responsible for ensuring that Task.run creates all outputs. It is documented here: https://github.com/spotify/luigi/blob/master/doc/tasks.rst?plain=1#L158

I suggest closing this issue, since Luigi works as documented.
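For anyone who wants a failure instead of a silent DONE, one option is a guard at the end of `run()`. This is a minimal sketch, not part of Luigi: `require_all_outputs` is a hypothetical helper, and it assumes `output()` returns a dict of targets, as in the example above. It relies only on each target having an `exists()` method, which `pathlib.Path` also provides, so the demo below runs without Luigi installed. Luigi's `[worker]` configuration also documents a `check_complete_on_run` option for a similar purpose; it is worth checking the docs for the installed version before relying on it.

```python
import tempfile
from pathlib import Path


def require_all_outputs(outputs):
    """Raise if any declared output is missing.

    `outputs` maps a name to any object with an `exists()` method
    (hypothetical helper, shown here with pathlib.Path objects).
    """
    missing = sorted(name for name, target in outputs.items() if not target.exists())
    if missing:
        raise RuntimeError(f"run() finished without creating: {', '.join(missing)}")


with tempfile.TemporaryDirectory() as workdir:
    outputs = {
        "foo_TASK_2": Path(workdir) / "foo_TASK_2.txt",
        "bar_TASK_2": Path(workdir) / "bar_TASK_2.txt",
    }

    # Same mistake as Task2.run(): only one output is written.
    outputs["foo_TASK_2"].write_text("Hello World")
    try:
        require_all_outputs(outputs)
        error = None
    except RuntimeError as exc:
        error = str(exc)

    # Once the second file exists, the guard passes without raising.
    outputs["bar_TASK_2"].write_text("Hello World")
    require_all_outputs(outputs)
```

Inside a task this would be called as `require_all_outputs(self.output())` on the last line of `run()`, so that the exception marks the task as failed.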

@RRap0so
Contributor

RRap0so commented Jan 14, 2024

Working as intended. Thank you @lallea

@RRap0so RRap0so closed this as completed Jan 14, 2024