Skip to content

fix new extract logic#996

Open
valoq wants to merge 3 commits into
ouch-org:mainfrom
valoq:flatten
Open

fix new extract logic#996
valoq wants to merge 3 commits into
ouch-org:mainfrom
valoq:flatten

Conversation

@valoq
Copy link
Copy Markdown
Collaborator

@valoq valoq commented May 20, 2026

This PR ensures that extraction always happens inside the new dir that is created before the archive is parsed. This is a requirement for future sandbox implementations and it fixes the behavior that is intended to always produce a directory with the archive name. Previously archives with single files would be flatten and end up with files in the $CWD.

@valoq valoq marked this pull request as ready for review May 20, 2026 13:12
@marcospb19
Copy link
Copy Markdown
Member

marcospb19 commented May 25, 2026

I love that non-archives are flattened by default, so, I'll investigate further into landlock to see if it's possible to workaround this, if not we are merging.

I'd like a workaround like:

  1. Prohibit writing to current folder.
  2. Allow writing only to the new file.

That'd be slightly more complex since our Landlock implementation would need to be aware of whether or not we are about to decompress a single-file format (non-archive), and also, know that path.

But if we achieve that we don't have to tweak this behavior.

@valoq
Copy link
Copy Markdown
Collaborator Author

valoq commented May 26, 2026

I think you are right that this can be avoided. I will need to check again if its worth it but it also depends on the archive format;
For single-stream formats (gz, bz2, xz, zst, lz4, etc.) where the output is one file with a name derived purely from the input filename we can open a FD before parsing the archive, then restrict the runtime and then do the dangerous extraction.
For multi-file archives (tar/zip/7z/rar) this cannot work though sicne we have to parse the contents of the archives before we know the target filename/structure.

Can you tell me what use case you would like to see supported?

@marcospb19
Copy link
Copy Markdown
Member

marcospb19 commented May 26, 2026

For single-stream formats (gz, bz2, xz, zst, lz4, etc.) where the output is one file with a name derived purely from the input filename we can open a FD before parsing the archive, then restrict the runtime and then do the dangerous extraction.

Yeah, when I mentioned a workaround, I was strictly thinking about this scenario.

But,

... Previously archives with single files would be flatten and end up with files in the $CWD.

This ⬆️ is usually not the case anymore.

Please correct me if I'm wrong, but in the latest main and since 0.8.0 there is no automatic flattening by default anymore.

Decompression will happen inside a new directory unless one of these scenarios happen:

  1. ouch d archive.zip and archive already exists and user chose merge, will reuse that folder (can we sandbox that folder? Ouch needs to write to it, so it can also read existing files?).
  2. ouch d archive.zip --dir . or ouch d archive.zip --here will do it all inside of $CWD, so we can't sandbox $CWD?
  3. Non-archive formats like we discussed.

In summary, when decompression in $CWD happens:

  1. User chose merge
  2. User explicitly asked to decompress to $CWD (just works now like merge cause we don't ask to delete $CWD).
  3. Non-archives are decompressed in $CWD.

And, I believe this PR doesn't tackle these 3 because it was based on an assumption for something that changed recently, can you confirm if that's right?

@ofek
Copy link
Copy Markdown

ofek commented May 27, 2026

Here's a Nushell script I used to confirm that the latest version already keeps single-file archives under a new directory:

Details
let work = ($env.PWD | path join "ouch-pr-996-behavior")
rm -rf $work
mkdir $work
cd $work

git clone https://github.com/ouch-org/ouch.git repo
cd repo
git fetch origin main "+pull/996/head:refs/heads/pr-996"

let exe = if $nu.os-info.name == "windows" { "ouch.exe" } else { "ouch" }

mkdir ../bin

git switch --detach origin/main
cargo build --locked
cp (["target", "debug", $exe] | path join) (["..", "bin", $"ouch-main-($exe)"] | path join)

git switch --detach pr-996
cargo build --locked
cp (["target", "debug", $exe] | path join) (["..", "bin", $"ouch-pr996-($exe)"] | path join)

cd ..

let main_bin = ([$work, "bin", $"ouch-main-($exe)"] | path join)
let pr_bin = ([$work, "bin", $"ouch-pr996-($exe)"] | path join)

mkdir fixtures/src
"single contents" | save -f fixtures/src/single.txt
"same basename contents" | save -f fixtures/src/same
"plain gzip contents" | save -f fixtures/src/plain.txt
mkdir fixtures/src/wrapped
"wrapped contents" | save -f fixtures/src/wrapped/file.txt

^$main_bin --yes c fixtures/src/single.txt fixtures/single.zip
^$main_bin --yes c fixtures/src/same fixtures/same.zip
^$main_bin --yes c fixtures/src/wrapped fixtures/wrapped.zip
^$main_bin --yes c fixtures/src/plain.txt fixtures/plain.txt.gz

def snapshot [dir] {
  let prev = $env.PWD
  cd $dir
  let rows = (
    ls -a **/*
    | each { |it|
      let name = ($it.name | str replace -a "\\" "/")
      if $it.type == dir { $"($name)/" } else { $name }
    }
    | sort
  )
  cd $prev
  $rows
}

def run-one [bin branch case] {
  let dir = (["runs", $branch, $case.name] | path join)
  rm -rf $dir
  mkdir $dir
  cp $case.archive $dir

  if $case.name == "merge-existing-output-dir" {
    mkdir ($dir | path join "single")
    "pre-existing" | save -f (($dir | path join "single" | path join "marker.txt"))
  }

  let prev = $env.PWD
  cd $dir
  ^$bin --yes d ($case.archive | path basename) ...$case.args
  rm -f ($case.archive | path basename)
  cd $prev

  { branch: $branch, case: $case.name, tree: (snapshot $dir | str join ", ") }
}

def run-all [bin branch] {
  let cases = [
    { name: "default-single-file-archive", archive: "fixtures/single.zip", args: [] }
    { name: "default-same-basename-file", archive: "fixtures/same.zip", args: [] }
    { name: "default-duplicate-root-dir", archive: "fixtures/wrapped.zip", args: [] }
    { name: "merge-existing-output-dir", archive: "fixtures/single.zip", args: [] }
    { name: "explicit-dir-dot", archive: "fixtures/single.zip", args: ["--dir", "."] }
    { name: "explicit-here", archive: "fixtures/single.zip", args: ["--here"] }
    { name: "non-archive-gzip", archive: "fixtures/plain.txt.gz", args: [] }
  ]

  $cases | each { |case| run-one $bin $branch $case }
}

(run-all $main_bin "main") ++ (run-all $pr_bin "pr-996") | table -e

The output:

╭────┬────────┬─────────────────────────────┬───────────────────────────────────────────────╮
│  # │ branch │            case             │                     tree                      │
├────┼────────┼─────────────────────────────┼───────────────────────────────────────────────┤
│  0 │ main   │ default-single-file-archive │ single/, single/single.txt                    │
│  1 │ main   │ default-same-basename-file  │ same                                          │
│  2 │ main   │ default-duplicate-root-dir  │ wrapped/, wrapped/file.txt                    │
│  3 │ main   │ merge-existing-output-dir   │ single/, single/marker.txt, single/single.txt │
│  4 │ main   │ explicit-dir-dot            │ single.txt                                    │
│  5 │ main   │ explicit-here               │ single.txt                                    │
│  6 │ main   │ non-archive-gzip            │ plain.txt                                     │
│  7 │ pr-996 │ default-single-file-archive │ single/, single/single.txt                    │
│  8 │ pr-996 │ default-same-basename-file  │ same/, same/same                              │
│  9 │ pr-996 │ default-duplicate-root-dir  │ wrapped/, wrapped/file.txt                    │
│ 10 │ pr-996 │ merge-existing-output-dir   │ single/, single/marker.txt, single/single.txt │
│ 11 │ pr-996 │ explicit-dir-dot            │ single.txt                                    │
│ 12 │ pr-996 │ explicit-here               │ single.txt                                    │
│ 13 │ pr-996 │ non-archive-gzip            │ plain.txt                                     │
╰────┴────────┴─────────────────────────────┴───────────────────────────────────────────────╯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants