Unexpected behavior of missing/changed task `inputs`and `outputs` #1139

bollwyvl · 2024-04-07T17:44:25Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pixi, using pixi --version.

Reproducible example

# pixi.toml
[project]
name = "foo"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
sqlite = "*"

[tasks.preflight]
inputs = ["pixi.toml"]
outputs = ["build/preflight.txt", "build/preflight.txt"]
cmd = """
    mkdir -p build
    && cd build
    && echo "$(date) preflight" > preflight.txt
    && echo "$(date) preflight extra" > preflight-extra.txt
    """

[tasks.build]
inputs = ["build/preflight.txt"]
outputs = ["build/built.txt"]
cwd = "build"
cmd = """
    cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    """

[tasks.test]
inputs = ["build/built.txt"]
outputs = ["build/test.txt"]
cwd = "build"
cmd = """
    rm -f test.txt
    && cat built.txt > test.txt
    && sleep 1
    && echo "$(date) tested" >> test.txt
    """

[tasks.clean]
cmd = ["rm", "-rf", "build"]


### Issue description

The above graph works for the manual "happy path":

```bash 
$> pixi run preflight && pixi run build && pixi run test
✨ Pixi task (default):     rm -f build/preflight.txt
        && mkdir -p build
        && cd build
        && touch preflight.txt preflight-extra.txt

✨ Pixi task (default):     rm -f built.txt
        && cat preflight.txt > built.txt

✨ Pixi task (default): cat built.txt > test.txt

But fails in unexpected ways with most other invocations:

$> pixi run clean && pixi run test
✨ Pixi task (default): rm -rf build
✨ Pixi task (default): cat built.txt > test.txt
  × invalid working directory 'build'

would have expected the same amount of work as as above

Expected behavior

I guess my mental model of inputs and outputs is something like, given the
above graph:

flowchart LR
pixi.toml -- inputs --> preflight -- outputs --> preflight.txt & preflight-extra.txt
preflight.txt -- inputs --> build -- outputs --> built.txt
built.txt -- inputs --> test -- outputs --> test.txt

aside: pixi task list --status --files --markdown >> $GITHUB_STEP_SUMMARY would be amazing

Where deleting a file would either mark a task as directly invalidated (❌)
or indeterminate (❓), since the parent state is unknown:

file	task
	`preflight`	`build`	`test`	`clean`
`pixi.toml`	❌	❓	❓
`preflight.txt`	❌	❌	❓
`preflight-extra.txt`	❌	❓	❓
`built.txt`		❌	❌
`test.txt`			❌

Running a goal task would first ensure all of the parent tasks were valid, and not even bother checking e.g. cwd.

I also see depends_on is a thing, but requires double-encoding of both a concrete file name and a semantic name feels very unsatisfying.

The text was updated successfully, but these errors were encountered:

bollwyvl · 2024-04-07T17:49:30Z

Oh, and got a bit wrapped around the form input: thanks for pixi, and congrats on the recent big feature releases! 😻

ruben-arts · 2024-04-08T06:52:12Z

He @bollwyvl, very nice issue report!

I think there is a misunderstanding of the feature of inputs and outputs. I'm not sure if it is the naming or the documentation but they're not used for task graph generation. Only the depends_on will create a graph. The inputs and outputs are only used to define caching for the task.

[tasks.build]
inputs = ["build/preflight.txt"]
outputs = ["build/built.txt"]
cwd = "build"
cmd = """
    cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    """
depends_on = ["preflight"] # <<<

[tasks.test]
inputs = ["build/built.txt"]
outputs = ["build/test.txt"]
cwd = "build"
cmd = """
    rm -f test.txt
    && cat built.txt > test.txt
    && sleep 1
    && echo "$(date) tested" >> test.txt
    """
depends_on = ["build"] # <<<

Adding these two depends on creates the required graph.

Ps. I like the idea to generate mermaid graphs from the task definitions. 🤩

bollwyvl · 2024-04-08T13:06:30Z

Ok, adding the explicit depends_on:

diff --git a/pixi.toml b/pixi.toml
index dece1d5..d2aca77 100644
--- a/pixi.toml
+++ b/pixi.toml
@@ -17,6 +17,7 @@ cmd = """
     """
 
 [tasks.build]
+depends_on = ["preflight"]
 inputs = ["build/preflight.txt"]
 outputs = ["build/built.txt"]
 cwd = "build"
@@ -27,6 +28,7 @@ cmd = """
     """
 
 [tasks.test]
+depends_on = ["build"]
 inputs = ["build/built.txt"]
 outputs = ["build/test.txt"]
 cwd = "build"
@@ -38,4 +40,4 @@ cmd = """
     """

And running...

pixi build

Then...

pixi run clean
pixi run test

Yields...

✨ Pixi task (default):     mkdir -p build
    && cd build
    && echo "$(date) preflight" > preflight.txt
    && echo "$(date) preflight extra" > preflight-extra.txt
    
Task can be skipped (cache hit) 🚀

✨ Pixi task (default):     cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    
Task can be skipped (cache hit) 🚀

✨ Pixi task (default):     rm -f test.txt
    && cat built.txt > test.txt
    && sleep 1
    && echo "$(date) tested" >> test.txt
    
Task can be skipped (cache hit) 🚀

...Does not invalidate the graph, and it still thinks everything is fine. It can't be forced to actually run any of the tasks without digging into the .pixi folder.

It's also surprising that while there are a lot of emoji emitted, as well as the full body of the task, the name of the task running does not appear.

My goal (ha) is to for the time one person takes to capture the task dependency information in a pixi-compatible way, however it has to be captured, to yield the benefits for a whole project of other graph-based systems (make, doit, etc), which would include:

getting what you want and expect in one step
cacheability on CI

Some other things that would be lovely that a rock-solid graph can provide:

parallel execution (ideally including env creation) on the same machine
enhancing developer documentation (not just completion) from a dump of the task graph

Having to apply another task tool provisioned by pixi, but defined in another file would make things... even more confusing.

generate mermaid graphs from the task definitions.

As for mermaid: sure, it's really quite nice, and getting more portable. More, in that sadly the github renderer doesn't yet support the elkjs backend, which can do some really nice things at scale:

https://deathbeds.github.io/jupyak/graph.html

ruben-arts · 2024-04-08T13:35:05Z

I agree with you, this output or yours seems a bug. For me it does run the preflight but it doesn't invalidate the cache going down the graph.

Pinging @wolfv to ask for an explanation on the current design.

bollwyvl · 2024-04-10T19:22:38Z

I see today's release 🚀 has a tree command for packages by env: very nice! As a corollary to the markdown-style output, a tig-style pixi task graph would be another magical way to display robust task graph data:

state  name        depends_on  why state?
─────  ──────────  ─────────┬  ─────────────────────────
  ✓    preflight            ●      
  ?    build               ◓╯  1 inputs changed
  ?    test               ○╯   1 depends_on out-of-date   
─────  ──────────  ──────────  ─────────────────────────
       clean                ⊛  
─────  ──────────  ──────────  ─────────────────────────

bollwyvl · 2024-04-13T17:08:41Z

Looking a bit at file_hashes.rs, it seems like it only uses globs, so a missing file indeed wouldn't get picked up. Not sure how to fully reason about it: globs, glob intersections, and missing glob things all get very tricky.

wolfv · 2024-04-16T04:28:26Z

Point taken regarding not logging the skipped task name.

I debugged this and the main issue is that we take into account the cwd for the inputs/outputs tasks.

If you change your pixi.toml to:

[tasks.build]
depends_on = ["preflight"]
inputs = ["preflight.txt"]
outputs = ["built.txt"]
cwd = "build"
cmd = """
    cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    """

Things start to work.

Now, I am not sure what we should do ... use the cwd or not? I do think that we may not want to use the cwd for the inputs and outputs path computation.

Lastly, we should warn the user when the globset didn't match anything at all so that they can further debug. If inputs or outputs are specified, it can be expected that the user meant to select something.

bollwyvl · 2024-04-16T12:40:29Z

Nice, #1202 looks like a good update, and dropping the cwd is indeed solid. Again, whatever the convention has to be, is fine, as long as its consistent: i think anchoring to the PROJECT_ROOT is the least surprising thing to do. The /build syntax specifically looks weird to me, so not requiring that to be able to use cwd will probably feel good.

Regarding some of the debugging: having a way to introspect from the CLI would be super helpful with e.g. pixi task info [--json] [TASK...], which would complement pixi task graph: from experience, it's not really plausible to display both a linked dual task and file hierarchy for the general case of all tasks in any project, but showing the task-specific slice of both would be very helpful.

bollwyvl added the bug Something isn't working label Apr 7, 2024

bollwyvl mentioned this issue Apr 12, 2024

try pixi deathbeds/jupyterlite-pyodide-lock#5

Closed

wolfv mentioned this issue Apr 16, 2024

fix: use project root directory instead of task.working_directory for base dir when hashing #1202

Merged

Hofer-Julian mentioned this issue Apr 19, 2024

Task caching only works if the directory is a subdirectory of cwd #1237

Closed

2 tasks

bollwyvl mentioned this issue Apr 22, 2024

Lifecycle hooks #1183

Open

bollwyvl mentioned this issue May 7, 2024

More task JSON #1344

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected behavior of missing/changed task `inputs`and `outputs` #1139

Unexpected behavior of missing/changed task `inputs`and `outputs` #1139

bollwyvl commented Apr 7, 2024

bollwyvl commented Apr 7, 2024

ruben-arts commented Apr 8, 2024

bollwyvl commented Apr 8, 2024

ruben-arts commented Apr 8, 2024

bollwyvl commented Apr 10, 2024

bollwyvl commented Apr 13, 2024

wolfv commented Apr 16, 2024

bollwyvl commented Apr 16, 2024

Unexpected behavior of missing/changed task inputsand outputs #1139

Unexpected behavior of missing/changed task inputsand outputs #1139

Comments

bollwyvl commented Apr 7, 2024

Checks

Reproducible example

Expected behavior

bollwyvl commented Apr 7, 2024

ruben-arts commented Apr 8, 2024

bollwyvl commented Apr 8, 2024

ruben-arts commented Apr 8, 2024

bollwyvl commented Apr 10, 2024

bollwyvl commented Apr 13, 2024

wolfv commented Apr 16, 2024

bollwyvl commented Apr 16, 2024

Unexpected behavior of missing/changed task `inputs`and `outputs` #1139

Unexpected behavior of missing/changed task `inputs`and `outputs` #1139