Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behavior of missing/changed task inputsand outputs #1139

Open
2 tasks done
bollwyvl opened this issue Apr 7, 2024 · 8 comments
Open
2 tasks done

Unexpected behavior of missing/changed task inputsand outputs #1139

bollwyvl opened this issue Apr 7, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@bollwyvl
Copy link
Contributor

bollwyvl commented Apr 7, 2024

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pixi, using pixi --version.

Reproducible example

# pixi.toml
[project]
name = "foo"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
sqlite = "*"

[tasks.preflight]
inputs = ["pixi.toml"]
outputs = ["build/preflight.txt", "build/preflight.txt"]
cmd = """
    mkdir -p build
    && cd build
    && echo "$(date) preflight" > preflight.txt
    && echo "$(date) preflight extra" > preflight-extra.txt
    """

[tasks.build]
inputs = ["build/preflight.txt"]
outputs = ["build/built.txt"]
cwd = "build"
cmd = """
    cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    """

[tasks.test]
inputs = ["build/built.txt"]
outputs = ["build/test.txt"]
cwd = "build"
cmd = """
    rm -f test.txt
    && cat built.txt > test.txt
    && sleep 1
    && echo "$(date) tested" >> test.txt
    """

[tasks.clean]
cmd = ["rm", "-rf", "build"]


### Issue description

The above graph works for the manual "happy path":

```bash 
$> pixi run preflight && pixi run build && pixi run test
✨ Pixi task (default):     rm -f build/preflight.txt
        && mkdir -p build
        && cd build
        && touch preflight.txt preflight-extra.txt

✨ Pixi task (default):     rm -f built.txt
        && cat preflight.txt > built.txt

✨ Pixi task (default): cat built.txt > test.txt

But fails in unexpected ways with most other invocations:

$> pixi run clean && pixi run test
✨ Pixi task (default): rm -rf build
✨ Pixi task (default): cat built.txt > test.txt
  × invalid working directory 'build'

would have expected the same amount of work as as above

Expected behavior

I guess my mental model of inputs and outputs is something like, given the
above graph:

flowchart LR
pixi.toml -- inputs --> preflight -- outputs --> preflight.txt & preflight-extra.txt
preflight.txt -- inputs --> build -- outputs --> built.txt
built.txt -- inputs --> test -- outputs --> test.txt

aside: pixi task list --status --files --markdown >> $GITHUB_STEP_SUMMARY would be amazing

Where deleting a file would either mark a task as directly invalidated (❌)
or indeterminate (❓), since the parent state is unknown:

file task
preflight build test clean
pixi.toml
preflight.txt
preflight-extra.txt
built.txt
test.txt

Running a goal task would first ensure all of the parent tasks were valid, and not even bother checking e.g. cwd.

I also see depends_on is a thing, but requires double-encoding of both a concrete file name and a semantic name feels very unsatisfying.

@bollwyvl bollwyvl added the bug Something isn't working label Apr 7, 2024
@bollwyvl
Copy link
Contributor Author

bollwyvl commented Apr 7, 2024

Oh, and got a bit wrapped around the form input: thanks for pixi, and congrats on the recent big feature releases! 😻

@ruben-arts
Copy link
Contributor

He @bollwyvl, very nice issue report!

I think there is a misunderstanding of the feature of inputs and outputs. I'm not sure if it is the naming or the documentation but they're not used for task graph generation. Only the depends_on will create a graph. The inputs and outputs are only used to define caching for the task.

[tasks.build]
inputs = ["build/preflight.txt"]
outputs = ["build/built.txt"]
cwd = "build"
cmd = """
    cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    """
depends_on = ["preflight"] # <<<

[tasks.test]
inputs = ["build/built.txt"]
outputs = ["build/test.txt"]
cwd = "build"
cmd = """
    rm -f test.txt
    && cat built.txt > test.txt
    && sleep 1
    && echo "$(date) tested" >> test.txt
    """
depends_on = ["build"] # <<<

Adding these two depends on creates the required graph.

Ps. I like the idea to generate mermaid graphs from the task definitions. 🤩

@bollwyvl
Copy link
Contributor Author

bollwyvl commented Apr 8, 2024

Ok, adding the explicit depends_on:

diff --git a/pixi.toml b/pixi.toml
index dece1d5..d2aca77 100644
--- a/pixi.toml
+++ b/pixi.toml
@@ -17,6 +17,7 @@ cmd = """
     """
 
 [tasks.build]
+depends_on = ["preflight"]
 inputs = ["build/preflight.txt"]
 outputs = ["build/built.txt"]
 cwd = "build"
@@ -27,6 +28,7 @@ cmd = """
     """
 
 [tasks.test]
+depends_on = ["build"]
 inputs = ["build/built.txt"]
 outputs = ["build/test.txt"]
 cwd = "build"
@@ -38,4 +40,4 @@ cmd = """
     """

And running...

pixi build

Then...

pixi run clean
pixi run test

Yields...

✨ Pixi task (default):     mkdir -p build
    && cd build
    && echo "$(date) preflight" > preflight.txt
    && echo "$(date) preflight extra" > preflight-extra.txt
    
Task can be skipped (cache hit) 🚀

✨ Pixi task (default):     cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    
Task can be skipped (cache hit) 🚀

✨ Pixi task (default):     rm -f test.txt
    && cat built.txt > test.txt
    && sleep 1
    && echo "$(date) tested" >> test.txt
    
Task can be skipped (cache hit) 🚀

...Does not invalidate the graph, and it still thinks everything is fine. It can't be forced to actually run any of the tasks without digging into the .pixi folder.

It's also surprising that while there are a lot of emoji emitted, as well as the full body of the task, the name of the task running does not appear.

My goal (ha) is to for the time one person takes to capture the task dependency information in a pixi-compatible way, however it has to be captured, to yield the benefits for a whole project of other graph-based systems (make, doit, etc), which would include:

  • getting what you want and expect in one step
  • cacheability on CI

Some other things that would be lovely that a rock-solid graph can provide:

  • parallel execution (ideally including env creation) on the same machine
  • enhancing developer documentation (not just completion) from a dump of the task graph

Having to apply another task tool provisioned by pixi, but defined in another file would make things... even more confusing.

generate mermaid graphs from the task definitions.

As for mermaid: sure, it's really quite nice, and getting more portable. More, in that sadly the github renderer doesn't yet support the elkjs backend, which can do some really nice things at scale:

https://deathbeds.github.io/jupyak/graph.html

@ruben-arts
Copy link
Contributor

I agree with you, this output or yours seems a bug. For me it does run the preflight but it doesn't invalidate the cache going down the graph.

Pinging @wolfv to ask for an explanation on the current design.

@bollwyvl
Copy link
Contributor Author

I see today's release 🚀 has a tree command for packages by env: very nice! As a corollary to the markdown-style output, a tig-style pixi task graph would be another magical way to display robust task graph data:

state  name        depends_on  why state?
─────  ──────────  ─────────┬  ─────────────────────────
  ✓    preflight            ●      
  ?    build               ◓╯  1 inputs changed
  ?    test               ○╯   1 depends_on out-of-date   
─────  ──────────  ──────────  ─────────────────────────
       clean                ⊛  
─────  ──────────  ──────────  ─────────────────────────

@bollwyvl
Copy link
Contributor Author

Looking a bit at file_hashes.rs, it seems like it only uses globs, so a missing file indeed wouldn't get picked up. Not sure how to fully reason about it: globs, glob intersections, and missing glob things all get very tricky.

@wolfv
Copy link
Member

wolfv commented Apr 16, 2024

Point taken regarding not logging the skipped task name.

I debugged this and the main issue is that we take into account the cwd for the inputs/outputs tasks.

If you change your pixi.toml to:

[tasks.build]
depends_on = ["preflight"]
inputs = ["preflight.txt"]
outputs = ["built.txt"]
cwd = "build"
cmd = """
    cat preflight.txt > built.txt
    && sleep 1
    && echo "$(date) built" >> built.txt
    """

Things start to work.

Now, I am not sure what we should do ... use the cwd or not? I do think that we may not want to use the cwd for the inputs and outputs path computation.

Lastly, we should warn the user when the globset didn't match anything at all so that they can further debug. If inputs or outputs are specified, it can be expected that the user meant to select something.

@bollwyvl
Copy link
Contributor Author

Nice, #1202 looks like a good update, and dropping the cwd is indeed solid. Again, whatever the convention has to be, is fine, as long as its consistent: i think anchoring to the PROJECT_ROOT is the least surprising thing to do. The /build syntax specifically looks weird to me, so not requiring that to be able to use cwd will probably feel good.

Regarding some of the debugging: having a way to introspect from the CLI would be super helpful with e.g. pixi task info [--json] [TASK...], which would complement pixi task graph: from experience, it's not really plausible to display both a linked dual task and file hierarchy for the general case of all tasks in any project, but showing the task-specific slice of both would be very helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants