de novo sequencing without evaluation #10

BenSamy2020 · 2022-03-03T22:34:22Z

Greetings,

Firstly, I would like to thank you for providing this open source tool. I am interested in performing de novo sequencing without evaluation. Your GitHub page shows the following command:

casanovo --mode=denovo --model_path='path/to/pretrained' --test_data_path='path/to/test' --config_path='path/to/config' --output_path='path/to/output'

Could you tell me where to obtain the --config_path file?
Does --test_data_path refer to the .raw proteomics file?
Do you have an example output file from Casanovo?

Regards,
Ben

melihyilmaz · 2022-03-03T23:35:45Z

Hi Ben,
We appreciate your interest in Casanovo!

The config file can be user provided but the default is the casanovo/config.py we provide in the repo. You can use it as a template for your own config file and provide the path to your file.
test_data_path denotes the path to the directory where you have the .mgf file you want to sequence.
I added an example output file casanovo_sample_output.csv to the repo.
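
As a quick sanity check before a run, the sketch below lists the .mgf files in a directory and counts the spectra in each one. It relies only on the MGF convention that every spectrum block opens with a BEGIN IONS line. The helper name count_mgf_spectra is a hypothetical illustration and is not part of Casanovo.

```python
from pathlib import Path


def count_mgf_spectra(data_dir):
    """Return {file name: number of spectra} for every .mgf file in data_dir.

    Hypothetical helper, not part of Casanovo. Each spectrum in an MGF file
    starts with a "BEGIN IONS" line, so counting those lines counts spectra.
    """
    counts = {}
    for mgf_path in sorted(Path(data_dir).glob("*.mgf")):
        with open(mgf_path, encoding="utf-8", errors="replace") as handle:
            counts[mgf_path.name] = sum(
                1 for line in handle if line.strip() == "BEGIN IONS"
            )
    return counts
```

Calling count_mgf_spectra('path/to/test') on the directory passed as --test_data_path should return one entry per .mgf file. An empty result suggests the path points at the wrong directory or at files in another format, such as .raw.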

Let me know if you have other questions, feel free to close the issue otherwise.

BenSamy2020 · 2022-03-04T02:39:41Z

Greetings @melihyilmaz,

This is a really amazing tool! I have successfully started the program and it is now running. Based on the output file you provided, may I request a feature? I suggest allowing a protein FASTA database to be passed in the command itself. The program would then match the de novo sequenced peptides against the provided FASTA database and append the FASTA header to each matching peptide, which would make it easier to see which protein each peptide is derived from. If a de novo sequenced peptide is absent from the provided database, it should be labeled as missing. I understand this is a huge ask, but this enhancement would improve downstream analysis.
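
Until such a feature exists, the requested matching can be approximated as a post-processing step. The following is a minimal sketch, not Casanovo functionality: the names read_fasta and annotate_peptides are hypothetical, and it assumes the peptides have already been extracted from the output file as plain amino acid strings. Matching is an exact substring search, so it does not handle modification annotations inside the peptide string or the leucine/isoleucine ambiguity (the two residues have the same mass).

```python
def read_fasta(path):
    """Parse a FASTA file into a {header: sequence} dict."""
    proteins = {}
    header = None
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; keep the header without the ">".
                header = line[1:]
                proteins[header] = []
            elif header is not None:
                # Sequence lines may be wrapped, so collect the pieces.
                proteins[header].append(line)
    return {name: "".join(parts) for name, parts in proteins.items()}


def annotate_peptides(peptides, proteins, missing_label="missing"):
    """Map each peptide to the headers of all proteins that contain it.

    Matching is a plain substring search. A peptide found in no protein
    is mapped to [missing_label].
    """
    annotated = {}
    for peptide in peptides:
        hits = [name for name, seq in proteins.items() if peptide in seq]
        annotated[peptide] = hits or [missing_label]
    return annotated
```

A peptide can occur in several proteins, which is why each peptide maps to a list of headers rather than a single one. For large databases, a linear scan per peptide is slow and an indexed search would be a better fit.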

Additionally, I observed the following user warning:

rank_zero_warn("You are running on single node with no parallelization, so distributed has no effect.")
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\pytorch_lightning\trainer\data_loading.py:132: UserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument (try 24 which is the number of cpus on this machine) in the DataLoader init to improve performance.
rank_zero_warn(
Testing: 0it [00:00, ?it/s]

Could you advise me on how to allocate sufficient CPU to your program (e.g., --cpu 15)? Unfortunately, options such as --cpu or --memory are not available.
My computer has a GPU (NVIDIA GeForce GTX 1660 SUPER). Is there a way to use it with your program?
Also, after the above message appeared, I did not observe any progress for more than 15 minutes. Has the program stalled? Is there a way to check whether it is still running in the background?

Regards,
Ben

guhanrv · 2022-03-04T18:27:58Z

Hi @BenSamy2020,

I've been using the program recently and think I can help!

You can adjust the number of CPUs on line 30 of casanovo/config.py, which should be located at /environment/lib/pythonversion/site-packages/casanovo/config.py.
Yes. If you can run python3 from the command line, import torch, and call torch.cuda.is_available(), and it returns True, your environment is configured to recognize your GPU. All you then need to do is change line 31 in the same file to gpus = [0]. When you run Casanovo, you should see GPU available: True, used: True instead.
I think the GPU will help a lot there. Also, check test_batch_size (line 80 in the same config file). It is set to 1024 by default, so your screen will only update after 1024 peptides have been inferred, which takes a while on CPU. Try changing the test batch size to something small and see whether progress appears.
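
The two checks described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of Casanovo: the function names cuda_status and progress_updates are hypothetical, and the second function assumes, as described above, that the progress display advances once per batch.

```python
import math


def cuda_status():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return "cuda available" if torch.cuda.is_available() else "cuda not available"


def progress_updates(num_spectra, batch_size):
    """Number of batches, i.e. how often the progress display advances
    if it is refreshed once per batch (an assumption for illustration)."""
    return math.ceil(num_spectra / batch_size)


print(cuda_status())
# 5000 spectra: batch size 1024 gives 5 updates, batch size 32 gives 157.
print(progress_updates(5000, 1024), progress_updates(5000, 32))
```

If cuda_status() reports that CUDA is not available, setting gpus = [0] in the config is unlikely to help until the PyTorch and driver setup is fixed. A smaller batch size does not make inference faster on its own; it only makes progress visible sooner.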

Hope this helps!

BenSamy2020 · 2022-03-05T04:40:35Z

Greetings @guhanrv,

I really appreciate your assistance. Regarding the CPUs, I will edit line 30 of the config.py file.
Unfortunately, PyTorch is not available on my PC, so I will need to set it up. As a wet lab person, I will have to look up on YouTube or Google how to do that before tapping into my GPU.

Once again, thanks a lot!

Regards,
Ben

melihyilmaz closed this as completed Mar 8, 2022
