Error running PARAFAC decomposition #7

Closed
MiguelCos opened this issue Mar 10, 2021 · 5 comments

@MiguelCos

Hello Filip,

We managed to install the CANDIA Singularity container on a DENBI Ubuntu server with 2 CUDA-capable GPUs.

We are still not able to make the test command ./candia test/test_experiment/config/candia.yaml run through completely.

It throws an error at the PARAFAC decomposition stage. The previous processing steps appear to complete. The error persists even when I execute the commands for each stage separately.

Something like:

Running PARAFAC decomposition...
CANDIA: 2 GPUs found. Dividing input slices into 2 partitions.
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_0_20210302172404.log
CANDIA: Output saved to test/test_experiment/logs/decompose_partition_1_20210302172404.log
done.
Indexing all PARAFAC models and components...
scripts/parafac/models.py:123: YAMLLoadWarning:

calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.

[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():54:    Wrote model index
[2021-03-02 17:24:07] [PID 93949] INFO: models.py:main():58:    Wrote spectrum index
done.
Selecting best models
[2021-03-02 17:24:12] [PID 94478] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F12.pt
[2021-03-02 17:24:12] [PID 94477] WARNING:      collect_time_mode_values.py:get_model_time_mode_peak_counts():60:      Could not load model test/test_experiment/samples/scans_csv_slices/swath_lower_adjusted=623.00/rt_window=0.0/parafac_model_F10.pt
...
Traceback (most recent call last):
  File "scripts/parafac/collect_time_mode_values.py", line 113, in <module>
    main()
  File "scripts/parafac/collect_time_mode_values.py", line 45, in main
    model_peak_count = pd.concat(model_peak_count, ignore_index=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
    sort=sort,
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
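
For context, pandas raises this exact error whenever pd.concat receives an empty list, which is consistent with the warnings above showing that none of the model files could be loaded. A minimal sketch (illustrative only, not CANDIA code):

```python
import pandas as pd

# If every model fails to load, the list of per-model DataFrames stays
# empty and pd.concat refuses to run.
model_peak_count = []  # nothing was collected

try:
    combined = pd.concat(model_peak_count, ignore_index=True)
except ValueError as err:
    print(err)  # the message seen in the traceback

# A defensive variant that avoids the opaque ValueError:
if model_peak_count:
    combined = pd.concat(model_peak_count, ignore_index=True)
else:
    combined = pd.DataFrame()  # or exit with a domain-specific message
```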

I am attaching here the log of the execution including the results of the previous steps and the complete error.

error_report_CANDIA.txt

Do you think there's something we might be missing in the installation? What would you suggest to troubleshoot this?

Many thanks in advance for taking a look at this.

Best wishes,
Miguel

@fburic fburic self-assigned this Mar 10, 2021
@fburic
Owner

fburic commented Mar 10, 2021

Hi Miguel,

Thank you for the detailed report!

The decomposition itself seems to have worked; what failed was the downstream time mode evaluation.
I think this is due to my improper test data (it is too small to generate meaningful results), but the log messages are confusing.

To see if the decomposition itself ran, could you please check the logs for each partition (i.e. batch) of decompositions:

  • test/test_experiment/logs/decompose_partition_0_20210302172404.log and
  • test/test_experiment/logs/decompose_partition_1_20210302172404.log ?

You can also search for model files (with the *.pt extension) by running the command below, which will also list how large these files are:

find test/test_experiment/samples/scans_csv_slices -name '*.pt' -type f | xargs du -h

I think you can actually try CANDIA with real data, but of course I should provide better test data.
Sorry about this.

Actions on my part:

Best regards,
Filip

@MiguelCos
Author

Hello Filip,

Thanks for the observations. I will take a look into this and let you know.

I have an additional question regarding the potential outputs from CANDIA, to check whether I am understanding its usability correctly and can test accordingly.

After PARAFAC decomposition, should I expect to have a set of mzML files that can be then used for searching with traditional search engines? If so, where would I find those 'decomposed' spectral files?

Let me know if you would prefer me to open a new issue with this question.

Best wishes,
Miguel

@fburic
Owner

fburic commented Mar 10, 2021

Hi again Miguel,

The output of the pipeline is a single mzXML file which contains the decomposed spectra. This can be used for searching with existing search engines. If you need a file in mzML format, tools such as OpenMS can be used for the conversion.
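
As a sketch of that conversion, assuming OpenMS is installed and its TOPP tools are on the PATH (the FileConverter invocation and file names here are assumptions, not taken from this thread):

```shell
#!/bin/sh
# Hypothetical mzXML -> mzML conversion with OpenMS's FileConverter.
# File names are placeholders.
convert_to_mzml() {
    in_file=$1
    out_file=${in_file%.mzXML}.mzML
    if command -v FileConverter >/dev/null 2>&1; then
        FileConverter -in "$in_file" -out "$out_file"
    else
        echo "would convert $in_file -> $out_file (OpenMS not installed)"
    fi
}

convert_to_mzml candia_best_models.mzXML  # placeholder file name
```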

The sequence leading to this output is as follows, keeping the numbering in the README:

  • 5: all models in the configured range are produced through decomposition (i.e. the _F*.pt files)
  • 6-7: the best models are selected using the time modes of the models (this is the step that I think is failing with the test data)
  • 8: the mzXML with spectra from the best models is produced and searched using Crux or MS-GF+

The fact that the script for step 8 both exports the mzXML and performs the search reflects the workflow of the paper, but I think it should be split for flexibility of use. This is pretty simple to do, and I opened an issue for it: #10. For now, if you wish to use different engines, a workaround is to simply run the script to produce the output and expect it to fail when it tries to run Crux.

Best,
Filip

@MiguelCos
Author

Hello Filip,

Many thanks for your input!

We have tested CANDIA with our own data and it runs nicely until step 9.

I am having an error when executing step 10.

ubuntu@deepdia:~/software/candia$ SINGULARITYENV_PREPEND_PATH=$HOME/software/diann singularity exec candia.sif     snakemake -p -s scripts/quantification/diann.Snakefile --configfile ${configfile}
KeyError in line 8 of /home/ubuntu/software/candia/scripts/quantification/diann.Snakefile:
'diann_library'
  File "/home/ubuntu/software/candia/scripts/quantification/diann.Snakefile", line 8, in <module>

I modified the sample YAML file to include diann_out_dir and diann_report, but I get the same error.

I am attaching the yaml configuration that I am using.

Many thanks in advance for taking a look!

candia.txt

Best,
Miguel

@fburic
Owner

fburic commented Mar 29, 2021

Hi Miguel,

My pleasure, happy that it works for you so far!

For step 10, it looks like a bug: the diann_library config parameter is not needed (it is only used when running DIA-NN in library-free mode), but the script tries to read it anyway.
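
A sketch of the likely failure mode (the Snakefile contents here are assumed, not quoted): Snakemake exposes the parsed YAML as a plain dict named config, and a direct index on a missing key raises exactly this KeyError, whereas dict.get with a default would tolerate it.

```python
# Hypothetical reconstruction, not the actual Snakefile code.
# Snakemake exposes the parsed YAML config as a plain dict.
config = {
    "diann_out_dir": "results/diann",            # example values
    "diann_report": "results/diann/report.tsv",
}

# A direct lookup, as the Snakefile apparently does on line 8, raises
# KeyError when the parameter is absent from the YAML:
try:
    library = config["diann_library"]
except KeyError:
    library = None

# A tolerant alternative for an optional parameter:
library = config.get("diann_library", "results/diann/dummy.tsv")
```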

I will fix this asap, thought this week is a bit packed. I opened issue #12 for this.

Until then, a workaround is to add diann_library: "results/diann/dummy.tsv" to the config to prevent the script from crashing. (The dummy file won't be created.)
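
In the YAML file itself, the workaround entry would sit alongside the other DIA-NN keys (the surrounding keys are the ones mentioned above with assumed example values; exact placement should not matter):

```yaml
# candia.yaml (excerpt): workaround until issue #12 is fixed
diann_out_dir: results/diann              # assumed example value
diann_report: results/diann/report.tsv    # assumed example value
diann_library: "results/diann/dummy.tsv"  # dummy path; the file is never created
```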

Best,
Filip

@fburic fburic closed this as completed Nov 15, 2022