Sourmash panicked when it "Couldn't find End Of Central Directory Record" #3190

Closed
ccbaumler opened this issue Jun 5, 2024 · 4 comments

Comments

@ccbaumler
Contributor

ccbaumler commented Jun 5, 2024

The command

While building the AllTheBacteria sourmash DB, I am using:

find /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/ -maxdepth 2 -type f -name "*.zip" -exec sh -c 'sourmash sig cat "$@" -o "$0" ' "../allthebacteria-r0.2.zip" {} +

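This finds all the zip files in the path and its immediate subdirectories (-maxdepth 2).
The zip files found are passed to sh -c as positional parameters ("$@") and used in the execution of sourmash sig cat, with the output file "../allthebacteria-r0.2.zip" bound to "$0".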
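For anyone who prefers to script this step in Python, the file discovery part of the find command above (`-maxdepth 2 -type f -name "*.zip"`) can be approximated as below. This is a minimal sketch, not part of sourmash; the function name `find_zips` is hypothetical. Unlike find, it returns the paths in sorted order, so a later failure can be tied to a predictable position in the list.

```python
from pathlib import Path


def find_zips(root, maxdepth=2):
    """Return sorted regular files named *.zip at most `maxdepth` levels below root.

    Approximates `find root -maxdepth 2 -type f -name "*.zip"`:
    depth 1 = files directly in root, depth 2 = files one subdirectory down.
    Note: Path.is_file() follows symlinks, whereas find -type f does not.
    """
    root = Path(root)
    found = []
    for depth in range(1, maxdepth + 1):
        # depth 1 -> "*.zip", depth 2 -> "*/*.zip"
        pattern = "/".join(["*"] * (depth - 1) + ["*.zip"])
        for p in root.glob(pattern):
            if p.is_file():
                found.append(p)
    return sorted(found)
```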

The error

The error produced:

sourmash.exceptions.Panic: sourmash panicked: thread 'unnamed' panicked with 'called `Result::unwrap()` on an `Err` value: InvalidArchive("Couldn't find End Of Central Directory Record")' at src/core/src/storage.rs:358 

The investigation

I saw two possible causes immediately:

  1. There is an issue with one of the 665 sourmash databases I created from the AllTheBacteria tar.xz files.
  2. There was not enough memory, so the command ended.

Because find does not return files in any particular order, I do not know which file the error occurred on. Therefore, I have made two separate attempts to find a signature file that reproduces the error above:

find /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/ -maxdepth 2 -type f -name "*.zip" -exec sh -c 'sourmash sig summarize "$0" ' {} \; | awk '{print}' ORS='" ' 2>&1 | tee -a summarize.log

This command finds all the zip files and runs sig summarize on each one. The output is collapsed into a single line by setting awk's Output Record Separator (ORS) to '" '.
According to @ctb, summarize may only look at the manifest, whereas "sig cat and sig describe load the sketches themselves".
There was no error found in the

I am currently running this command to investigate further:

find /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/ -maxdepth 2 -type f -name "*.zip" -exec sh -c 'echo "$0"; sourmash sig summarize "$0"; sourmash sig cat "$0" -q -o trash' {} \; 2>&1 | tee -a blah.log

I am also attempting to run sig cat on only one k-mer size at a time instead of all three, in case it is a memory error.
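The per-file loop in the second command above (print each file name, then try to load it) can also be sketched in Python with only the standard library. This is a minimal sketch and not sourmash's own validation: it uses the standard `zipfile` module, which, like any zip reader, must locate the End Of Central Directory record before it can list the archive. The helper name `scan_zips` is hypothetical.

```python
import zipfile


def scan_zips(paths):
    """Try to open each zip file; return (path, error message) for the ones that fail."""
    bad = []
    for path in paths:
        print(path, flush=True)  # like `echo "$0"`: show which file is being tried
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() reads every member and checks its CRC; it returns the
                # first bad member name, or None. Slow on large collections.
                first_bad = zf.testzip()
            if first_bad is not None:
                bad.append((path, f"corrupt member: {first_bad}"))
        except (zipfile.BadZipFile, OSError) as exc:
            # an empty or truncated file ends up here
            bad.append((path, str(exc)))
    return bad
```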

@ctb
Contributor

ctb commented Jun 5, 2024

ah-hah! I am virtually positive that the error is from zip itself, so sig summarize should trigger it, as should a straight up unzip -v. You might look for a zero-size zip file.
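Following the suggestion to look for a zero-size zip file, a quick check could look like this. It is a minimal sketch with a hypothetical helper `suspicious_zips`; it relies on the fact that even an empty zip archive contains the 22-byte End Of Central Directory record, so anything smaller (including a 0-byte file) cannot be a valid zip. A larger but truncated file would not be caught by this check.

```python
import os

# A valid zip is never smaller than its 22-byte End Of Central Directory record.
MIN_ZIP_SIZE = 22


def suspicious_zips(paths):
    """Return (size_in_bytes, path) for files too small to be a valid zip, smallest first."""
    sized = ((os.path.getsize(p), p) for p in paths)
    return sorted((size, p) for size, p in sized if size < MIN_ZIP_SIZE)
```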

It may also be that sig summarize is handling the error properly while sig cat is not.

I'll have to think about ways to track this down and/or better handle this kind of error. Thanks for reporting!

@ccbaumler
Contributor Author

The final command I listed worked like a charm. It took some time to run through all 700 files, but I easily found the culprit by searching the resulting log file.

Each of the commands @ctb listed returns a similar error, but unzip -v did so the fastest.

sig summarize

sourmash sig summarize /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip
Error message
== This is sourmash version 4.8.5. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

** loading from '/group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip'
Traceback (most recent call last):
  File "/home/baumlerc/miniforge3/envs/sourmash/bin/sourmash", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/__main__.py", line 19, in main
    retval = mainmethod(args)
             ^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/cli/sig/fileinfo.py", line 46, in main
    return sourmash.sig.__main__.fileinfo(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/sig/__main__.py", line 1274, in fileinfo
    idx = sourmash_args.load_file_as_index(args.path,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/save_load.py", line 65, in load_file_as_index
    return _load_database(filename, yield_all_files)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/save_load.py", line 113, in _load_database
    db = load_fn(filename,
         ^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/save_load.py", line 216, in _load_zipfile
    db = ZipFileLinearIndex.load(filename,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/index/__init__.py", line 586, in load
    storage = ZipStorage(location)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/sbt_storage.py", line 107, in __init__
    self._objptr = rustcall(lib.zipstorage_new, to_bytes(path), len(path))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/utils.py", line 78, in rustcall
    raise exc
sourmash.exceptions.Panic: sourmash panicked: thread 'unnamed' panicked with 'called `Result::unwrap()` on an `Err` value: InvalidArchive("Couldn't find End Of Central Directory Record")' at src/core/src/storage.rs:358
0.70user 1.22system 0:04.13elapsed 46%CPU (0avgtext+0avgdata 565248maxresident)k
967504inputs+8outputs (7118major+33529minor)pagefaults 0swaps

sig cat

sourmash sig cat /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip -o delet-me
Error message
== This is sourmash version 4.8.5. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

Traceback (most recent call last):
  File "/home/baumlerc/miniforge3/envs/sourmash/bin/sourmash", line 11, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/__main__.py", line 19, in main
    retval = mainmethod(args)
             ^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/cli/sig/cat.py", line 58, in main
    return sourmash.sig.__main__.cat(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/sig/__main__.py", line 130, in cat
    for ss, sigloc in loader:
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/sourmash_args.py", line 642, in load_many_signatures
    idx = load_file_as_index(loc, yield_all_files=yield_all_files)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/save_load.py", line 65, in load_file_as_index
    return _load_database(filename, yield_all_files)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/save_load.py", line 113, in _load_database
    db = load_fn(filename,
         ^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/save_load.py", line 216, in _load_zipfile
    db = ZipFileLinearIndex.load(filename,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/index/__init__.py", line 586, in load
    storage = ZipStorage(location)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/sbt_storage.py", line 107, in __init__
    self._objptr = rustcall(lib.zipstorage_new, to_bytes(path), len(path))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/baumlerc/miniforge3/envs/sourmash/lib/python3.12/site-packages/sourmash/utils.py", line 78, in rustcall
    raise exc
sourmash.exceptions.Panic: sourmash panicked: thread 'unnamed' panicked with 'called `Result::unwrap()` on an `Err` value: InvalidArchive("Couldn't find End Of Central Directory Record")' at src/core/src/storage.rs:358
0.79user 1.53system 0:04.34elapsed 53%CPU (0avgtext+0avgdata 563200maxresident)k
968704inputs+8outputs (7151major+33477minor)pagefaults 0swaps

unzip -v

unzip -v  /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip -d delete-me/
caution:  not extracting; -d ignored
Archive:  /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip or
        /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip.zip, and cannot find /group/ctbrowngrp4/2024-ccbaumler-allthebacteria/allthebacteria-r0.2-sigs/unknown__06/unknown__06.zip.ZIP, period.
0.00user 0.00system 0:00.00elapsed 37%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+354minor)pagefaults 0swaps
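The unzip message above names the structure that is missing. A zip reader finds the archive's table of contents by looking near the end of the file for the End Of Central Directory signature `PK\x05\x06` (bytes 0x50 0x4B 0x05 0x06); the record's fixed part is 22 bytes and may be followed by a comment of up to 65,535 bytes. The sketch below performs only that search. It is a simplified heuristic with a hypothetical `has_eocd` helper (it does not validate the rest of the record), not how sourmash or the Rust zip crate implement it.

```python
import os

EOCD_SIG = b"PK\x05\x06"  # End Of Central Directory signature
EOCD_MIN = 22             # size of the fixed part of the EOCD record
MAX_COMMENT = 0xFFFF      # the zip comment length is a 16-bit field


def has_eocd(path):
    """Return True if an EOCD signature appears where a zip reader would look for it."""
    size = os.path.getsize(path)
    if size < EOCD_MIN:
        return False
    # the record can start at most EOCD_MIN + MAX_COMMENT bytes before end of file
    tail_len = min(size, EOCD_MIN + MAX_COMMENT)
    with open(path, "rb") as f:
        f.seek(size - tail_len)
        tail = f.read(tail_len)
    # search backwards, requiring at least EOCD_MIN bytes from the signature to EOF
    end = len(tail) - EOCD_MIN + len(EOCD_SIG)
    return tail.rfind(EOCD_SIG, 0, end) != -1
```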

@ctb
Contributor

ctb commented Jun 16, 2024

OK, so this error is triggered by faulty zip files. Maybe we should be returning a better error when the zip file is faulty 🤔
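One way to return a better error is to check the file before handing it to the zip reader and raise an exception that names the offending file. The sketch below shows the idea with the standard `zipfile` module; the exception class `CorruptZipError` and the function `open_checked_zip` are hypothetical, and this is not sourmash's API or taken from #3213.

```python
import os
import zipfile


class CorruptZipError(Exception):
    """Raised when a file cannot be read as a zip archive."""


def open_checked_zip(path):
    """Open `path` as a zip, turning low-level failures into an error that names the file."""
    if os.path.getsize(path) == 0:
        raise CorruptZipError(f"{path}: file is empty (0 bytes), not a valid zip archive")
    try:
        return zipfile.ZipFile(path)
    except zipfile.BadZipFile as exc:
        raise CorruptZipError(
            f"{path}: not a valid zip archive ({exc}); "
            "the End Of Central Directory record may be missing or the file truncated"
        ) from exc
```

The caller gets one exception type carrying the file name, instead of a panic, which is what makes it possible to skip or report the bad file when concatenating hundreds of zips.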

@ctb
Contributor

ctb commented Jun 16, 2024

punting to #3213

@ctb ctb closed this as completed Jun 16, 2024