de novo sequencing without evaluation #10

Closed
BenSamy2020 opened this issue Mar 3, 2022 · 4 comments

@BenSamy2020

Greetings,

Firstly, I would like to thank you for providing this open-source tool. I am interested in performing de novo sequencing without evaluation. Your GitHub page shows the following command for this:

casanovo --mode=denovo --model_path='path/to/pretrained' --test_data_path='path/to/test' --config_path='path/to/config' --output_path='path/to/output'

  1. Could you tell me where to obtain the --config_path file?
  2. Does --test_data_path refer to the .raw proteomics file?
  3. Do you have an example output file from Casanovo?

Regards,
Ben

@melihyilmaz
Collaborator

Hi Ben,
We appreciate your interest in Casanovo!

  1. The config file can be user provided but the default is the casanovo/config.py we provide in the repo. You can use it as a template for your own config file and provide the path to your file.
  2. test_data_path denotes the path to the directory where you have the .mgf file you want to sequence.
  3. I added an example output file casanovo_sample_output.csv to the repo.
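
Before starting a run, it can help to confirm that the directory passed as test_data_path actually contains .mgf files. The following is a minimal sketch and not part of Casanovo; the function name `find_mgf_files` is hypothetical, and it assumes the .mgf files sit directly inside the directory rather than in subfolders.

```python
from pathlib import Path


def find_mgf_files(data_dir):
    """Return the .mgf files found directly inside data_dir, sorted by path.

    Hypothetical helper for illustration; Casanovo does not provide it.
    """
    directory = Path(data_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    # Compare the extension case-insensitively so .MGF is also accepted.
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".mgf"
    )
```

Calling `find_mgf_files("path/to/test")` returns an empty list if the directory holds only .raw files, which is a quick sign that the data still needs to be converted to .mgf first.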

Let me know if you have other questions, feel free to close the issue otherwise.

@BenSamy2020
Author

BenSamy2020 commented Mar 4, 2022

Greetings @melihyilmaz,

This is actually a really amazing tool! I have successfully started the program, and it is now running. Based on the output file you provided, may I request a feature? My suggestion is to allow a proteomics FASTA database to be provided in the command itself. The program would then match the de novo sequenced peptides against the provided FASTA protein database and append the FASTA header to each corresponding peptide. This would make it easier to know which protein each peptide is derived from. If a de novo sequenced peptide is absent from the provided protein database, it should be labeled as missing. I understand this is a huge ask, but this enhancement would improve downstream analysis.
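
The lookup requested above can be sketched as a small post-processing step outside Casanovo. This is only an illustration under stated assumptions: the function names `read_fasta` and `annotate_peptides` are hypothetical, the peptides are passed in as plain strings (the column layout of the Casanovo output file is not assumed here), and matching is by exact substring, so modified residues and isobaric ambiguity such as I/L are not handled.

```python
def read_fasta(path):
    """Parse a FASTA file into a {header: sequence} dict (hypothetical helper)."""
    proteins = {}
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: store the previous one first.
                if header is not None:
                    proteins[header] = "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        proteins[header] = "".join(chunks)
    return proteins


def annotate_peptides(peptides, proteins, missing_label="missing"):
    """Map each peptide to the headers of all proteins containing it.

    Peptides found in no protein get [missing_label].
    """
    annotations = {}
    for peptide in peptides:
        hits = [header for header, seq in proteins.items() if peptide in seq]
        annotations[peptide] = hits if hits else [missing_label]
    return annotations
```

A peptide shared by several proteins is reported with every matching header, which keeps the ambiguity visible for downstream analysis. The linear scan over all proteins is simple but slow for large databases; an index would be the natural next step.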

Additionally, I observed the following user warning:

rank_zero_warn("You are running on single node with no parallelization, so distributed has no effect.")
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\pytorch_lightning\trainer\data_loading.py:132: UserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 24 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Testing: 0it [00:00, ?it/s]

  1. Could you advise me on how to allocate CPUs to your program (e.g., --cpu 15)? Unfortunately, no --cpu or --memory option is available.
  2. My computer has a GPU (NVIDIA GeForce GTX 1660 SUPER). Is there a way to use it with your program?
  3. After the above message appeared, I did not observe any progress for more than 15 minutes. Could the program have stalled? Is there a way to check whether it is still running in the background?

Regards,
Ben

@guhanrv

guhanrv commented Mar 4, 2022

Hi @BenSamy2020,

I've been using the program recently and think I can help!

  1. You can adjust the number of CPUs on line 30 of the casanovo/config.py file, which should be located at /environment/lib/pythonversion/site-packages/casanovo/config.py.
  2. Yes. If you are able to run python3 from the command line, import torch, and type torch.cuda.is_available() and it returns True, that means your environment is configured to recognize your GPU, and so all you need to do is change line 31 in the same file as above to gpus = [0]. Then, when you run Casanovo, you should see GPU available: True, used: True instead.
  3. I think the GPU will help lots there. Also, check the test_batch_size (line 80 in the same config file) - it's by default set to 1024, so your screen will only update after inferring 1024 peptides. On CPU, that takes a while. So try changing that test batch size to something small and see if you see progress.
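
The check from point 2 can also be run as a short script. This is a minimal sketch: the function name `describe_hardware` is hypothetical, and the script only reports what the Python environment can see. It does not change any Casanovo setting, so the config edits described above are still needed afterwards.

```python
import os


def describe_hardware():
    """Report the CPU count and whether PyTorch can see a CUDA GPU.

    Hypothetical helper for illustration; it degrades gracefully when
    PyTorch is not installed instead of raising an error.
    """
    info = {
        "cpu_count": os.cpu_count(),
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment, so no GPU check is possible.
        return info
    info["torch_installed"] = True
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    print(describe_hardware())
```

If `cuda_available` is reported as False even though the machine has an NVIDIA card, the installed PyTorch build or the GPU driver is the likely place to look before editing the config file.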

Hope this helps!

@BenSamy2020
Author

Greetings @guhanrv,

I really appreciate your assistance. Regarding the CPUs, I will edit line 30 of the config.py file.
Unfortunately, PyTorch is not available on my PC, so I will need to set it up. I am also a wet lab person, so I will have to look up on YouTube or Google how to set it up before tapping into my GPU.

Once again, thanks a lot!

Regards,
Ben
