Error running PARAFAC decomposition #7

Closed
MiguelCos opened this issue Mar 10, 2021 · 5 comments

@MiguelCos

Hello Filip,

We managed to install the CANDIA Singularity container on a DENBI Ubuntu server with 2 CUDA-capable GPUs.

We are still not able to make the test command ./candia test/test_experiment/config/candia.yaml run through completely.

It throws an error at the PARAFAC decomposition stage. The previous processing steps appear to complete. The error persists even when I execute the commands for each stage separately.

Something like:

Running PARAFAC decomposition...
CANDIA: 2 GPUs found. Dividing input slices into 2 partitions.
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_0_20210302172404.log
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_1_20210302172404.log
done.
Indexing all PARAFAC models and components...
scripts/parafac/models.py:123: YAMLLoadWarning:

calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.

[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():54:    Wrote model index
[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():58:    Wrote spectrum index
done.
Selecting best models
[2021-03-02 17:24:12] [PID 94478] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F12.pt
[2021-03-02 17:24:12] [PID 94477] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F10.pt
...
Traceback (most recent call last):
  File "scripts/parafac/collect_time_mode_values.py", line 113, in <module>
    main()
  File "scripts/parafac/collect_time_mode_values.py", line 45, in main
    model_peak_count = pd.concat(model_peak_count, ignore_index=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
    sort=sort,
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
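
For context, pandas raises this exact error whenever pd.concat receives an empty list, which is consistent with the warnings above showing that none of the model files could be loaded. A minimal sketch (illustrative only, not CANDIA code):

```python
import pandas as pd

# If every model fails to load, the list of per-model DataFrames stays
# empty and pd.concat refuses to run.
model_peak_count = []  # nothing was collected

try:
    combined = pd.concat(model_peak_count, ignore_index=True)
except ValueError as err:
    print(err)  # the message seen in the traceback

# A defensive variant that avoids the opaque ValueError:
if model_peak_count:
    combined = pd.concat(model_peak_count, ignore_index=True)
else:
    combined = pd.DataFrame()  # or exit with a domain-specific message
```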

I am attaching here the log of the execution including the results of the previous steps and the complete error.

error_report_CANDIA.txt

Do you think there's something we might be missing in the installation? What would you suggest to troubleshoot this?

Many thanks in advance for taking a look at this.

Best wishes,
Miguel

@fburic fburic self-assigned this Mar 10, 2021
@fburic
Owner

fburic commented Mar 10, 2021

Hi Miguel,

Thank you for the detailed report!

The decomposition itself seems to have worked; what failed was the downstream time mode evaluation.
I think this is due to my improper test data (it is too small to generate meaningful results), but the log messages are confusing.

To see if the decomposition itself ran, could you please check the logs for each partition (i.e. batch) of decompositions:

  • test/test_experiment/logs/decompose_partition_0_20210302172404.log and
  • test/test_experiment/logs/decompose_partition_1_20210302172404.log ?

You can also search for model files (with the *.pt extension) by running the command below, which will also list how large these files are:

find test/test_experiment/samples/scans_csv_slices -name '*.pt' -type f | xargs du -h

I think you can actually try CANDIA with real data, but of course I should provide better test data.
Sorry about this.

Actions on my part:

Best regards,
Filip

@MiguelCos
Author

Hello Filip,

Thanks for the observations. I will take a look into this and let you know.

I have an additional question regarding the potential outputs from CANDIA, to check whether I am understanding its usability correctly and can test accordingly.

After PARAFAC decomposition, should I expect to have a set of mzML files that can be then used for searching with traditional search engines? If so, where would I find those 'decomposed' spectral files?

Let me know if you would prefer me to open a new issue with this question.

Best wishes,
Miguel

@fburic
Owner

fburic commented Mar 10, 2021

Hi again Miguel,

The output of the pipeline is a single mzXML file which contains the decomposed spectra. This can be used for searching with existing search engines. If you need a file in mzML format, tools such as OpenMS can be used for the conversion.
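
As a sketch of that conversion, assuming OpenMS is installed and its TOPP tools are on the PATH (the FileConverter invocation and file names here are assumptions, not taken from this thread):

```shell
#!/bin/sh
# Hypothetical mzXML -> mzML conversion with OpenMS's FileConverter.
# File names are placeholders.
convert_to_mzml() {
    in_file=$1
    out_file=${in_file%.mzXML}.mzML
    if command -v FileConverter >/dev/null 2>&1; then
        FileConverter -in "$in_file" -out "$out_file"
    else
        echo "would convert $in_file -> $out_file (OpenMS not installed)"
    fi
}

convert_to_mzml candia_best_models.mzXML  # placeholder file name
```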

The sequence leading to this output is as follows, keeping the numbering in the README:

  • 5: all models in the configured range are produced through decomposition (i.e. the _F*.pt files)
  • 6-7: the best models are selected using the time modes of the models (this is the step that I think is failing with the test data)
  • 8: the mzXML with spectra from the best models is produced and searched using Crux or MS-GF+

The fact that the script for step 8 both exports the mzXML and performs the search reflects the workflow of the paper, but I think it should be split for flexibility of use. This is pretty simple to do, and I opened an issue for it: #10. For now, if you wish to use different engines, a workaround is to simply run the script to produce the output and expect it to fail when it tries to run Crux.

Best,
Filip

@MiguelCos
Author

Hello Filip,

Many thanks for your input!

We have tested CANDIA with our own data and it runs nicely until step 9.

I am having an error when executing step 10.

ubuntu@deepdia:~/software/candia$ SINGULARITYENV_PREPEND_PATH=$HOME/software/diann singularity exec candia.sif     snakemake -p -s scripts/quantification/diann.Snakefile --configfile ${configfile}
KeyError in line 8 of /home/ubuntu/software/candia/scripts/quantification/diann.Snakefile:
'diann_library'
  File "/home/ubuntu/software/candia/scripts/quantification/diann.Snakefile", line 8, in <module>

I modified the sample YAML file to include diann_out_dir and diann_report, but I get the same error.

I am attaching the yaml configuration that I am using.

Many thanks in advance for taking a look!

candia.txt

Best,
Miguel

@fburic
Owner

fburic commented Mar 29, 2021

Hi Miguel,

My pleasure, happy that it works for you so far!

For step 10, it looks like a bug: the diann_library config parameter is not needed (it is only used when running DIA-NN in library-free mode), but the script tries to read it anyway.
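
A sketch of the likely failure mode (the Snakefile contents here are assumed, not quoted): Snakemake exposes the parsed YAML as a plain dict named config, and a direct index on a missing key raises exactly this KeyError, whereas dict.get with a default would tolerate it.

```python
# Hypothetical reconstruction, not the actual Snakefile code.
# Snakemake exposes the parsed YAML config as a plain dict.
config = {
    "diann_out_dir": "results/diann",            # example values
    "diann_report": "results/diann/report.tsv",
}

# A direct lookup, as the Snakefile apparently does on line 8, raises
# KeyError when the parameter is absent from the YAML:
try:
    library = config["diann_library"]
except KeyError:
    library = None

# A tolerant alternative for an optional parameter:
library = config.get("diann_library", "results/diann/dummy.tsv")
```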

I will fix this asap, thought this week is a bit packed. I opened issue #12 for this.

Until then, a workaround is to add diann_library: "results/diann/dummy.tsv" to the config to prevent the script from crashing. (The dummy file won't be created.)
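
In the YAML file itself, the workaround entry would sit alongside the other DIA-NN keys (the surrounding keys are the ones mentioned above with assumed example values; exact placement should not matter):

```yaml
# candia.yaml (excerpt): workaround until issue #12 is fixed
diann_out_dir: results/diann              # assumed example value
diann_report: results/diann/report.tsv    # assumed example value
diann_library: "results/diann/dummy.tsv"  # dummy path; the file is never created
```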

Best,
Filip

@fburic fburic closed this as completed Nov 15, 2022