HPC job specifications #49

Closed
marymcelroy opened this issue Feb 3, 2023 · 4 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@marymcelroy

Hi Haris! I'd like to use PEMA on some metabarcoding data for my graduate work. I successfully installed the Singularity image on my university's HPC environment, but I was hoping for some advice about how to estimate the HPC resources I would need to run PEMA in my job script (we use Slurm). Specifically, do you have any guidance for the #SBATCH specifications and values I should use?

For context, I have 96 samples that were PE sequenced for COI, 18S, and 16S amplicons (euk eDNA metabarcoding) on an Illumina MiSeq, so I have 576 fastq files as my raw sequencing data. I would like to use a custom ref db for COI, so I will follow your instructions about training the RDP classifier (I know this will likely affect computational load). Thank you!

@cpavloud
Collaborator

cpavloud commented Feb 5, 2023

Hi @marymcelroy

The answer depends on the partitions and resources available on your HPC.
In general, the more cores you assign, the faster your job will complete.

For example, on the Zorbas HPC, I would normally use the batch partition (1 node and 20 cores).
With the #SBATCH specifications and the parameters I normally use, a job for 96 samples of 16S rRNA would take roughly a day.
(You will need to run 3 separate jobs: you have 3 genes, so you will have 3 different parameters files.)

My shell script would be something like:

#!/bin/bash -l

#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --mem=40G
#SBATCH --job-name="my_pema_job"
#SBATCH --output=my_pema_job.output
#SBATCH --requeue

module purge # unloads all previous loads

module load singularity/3.7.1 #loads singularity

singularity run -B /home1/christina/the_directory_where_the_mydata_folder_and_the_parameters_are/:/mnt/analysis /home1/christina/pema_v.2.1.4.sif

module unload singularity/3.7.1 #unloads singularity
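
Since each marker gene needs its own parameters file, the three jobs could be submitted in a loop. This is only a sketch under assumptions that do not come from this thread: each marker has its own analysis directory (`<base_dir>/COI`, `<base_dir>/18S`, `<base_dir>/16S`) holding its data folder and parameters file, and `pema_job.sh` is a hypothetical copy of the script above whose bind path reads `"$ANALYSIS_DIR":/mnt/analysis`. By default the function only prints the `sbatch` commands so they can be checked first.

```shell
#!/bin/sh
# Sketch: submit one PEMA job per marker gene.
# Assumptions (hypothetical, adapt to your setup):
#   - <base_dir>/<marker>/ holds that marker's data folder and parameters file
#   - pema_job.sh is the batch script, binding "$ANALYSIS_DIR" to /mnt/analysis
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 really submits them.

# The body runs in a subshell "( ... )" so its variables stay local.
submit_marker_jobs() (
    base_dir="$1"
    shift

    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi

    for marker in "$@"; do
        # Options given on the sbatch command line take precedence over
        # the matching #SBATCH lines inside the script.
        $run sbatch \
            --job-name="pema_${marker}" \
            --output="pema_${marker}.output" \
            --export=ALL,ANALYSIS_DIR="${base_dir}/${marker}" \
            pema_job.sh
    done
)

# Example (prints three sbatch commands, submits nothing):
#   submit_marker_jobs /path/to/analyses COI 18S 16S
```

Keeping one directory per marker means the output of one job cannot overwrite that of another, and each job gets its own log file.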

@hariszaf
Owner

hariszaf commented Feb 6, 2023

Hi @marymcelroy.

@cpavloud is right. Just a few more comments from my side:

  • a good practice is to first run only a few samples of one marker, so that you can tune your parameters and estimate the time required for your whole dataset
  • for COI and your custom ref db, you can first check whether you can train the RDPClassifier in the sandbox. That is, once you build your sandbox, you can try running the commands of step 2 in there and build your new ref db under the
    /tools/RDPTools/TRAIN/ path on the image. If this works, you won't have to retrain the classifier every time you run your analysis; instead, you can set custom_ref_db to No and give the name of your db through the name_of_custom_db parameter.
    If this note is making things more confusing, please ignore it!
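
The pilot run from the first point could be prepared along these lines. This is a rough sketch, not part of PEMA: it assumes the forward and reverse files of a sample sort next to each other (for example `s1_R1.fastq.gz`, `s1_R2.fastq.gz`, a hypothetical naming), and the time estimate assumes the runtime grows linearly with the number of samples, which may not hold for every step.

```shell
#!/bin/sh
# Sketch: build a small pilot dataset and extrapolate its runtime.
# Function names and file naming are illustrative assumptions.

# Copy the first N samples (2 files each, in sorted order) into a new folder.
# Files are copied rather than symlinked so they stay visible inside a
# container that only binds the pilot folder.
make_pilot_subset() (
    src_dir="$1"
    dest_dir="$2"
    n_samples="$3"

    n_files=$((n_samples * 2))
    mkdir -p "$dest_dir"

    count=0
    for f in "$src_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue              # no match: glob stays literal
        [ "$count" -ge "$n_files" ] && break
        cp "$f" "$dest_dir/"
        count=$((count + 1))
    done
)

# Linear extrapolation: pilot_seconds * total_samples / pilot_samples.
estimate_full_runtime() {
    echo $(( $1 * $3 / $2 ))
}

# Example:
#   make_pilot_subset /path/to/all_fastq /path/to/pilot_fastq 8
#   estimate_full_runtime 3600 8 96    # seconds for 96 samples
```

The extrapolated figure is best treated as a starting point for the requested wall time, with some margin added on top.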

In any case, if I had to guess, I would say that you won't need more than 3 days for any of your analyses, no matter what parameters you ask for. One step that can really take some time if you have a large number of OTUs/ASVs is getNCBITaxId; I suggest setting it to No, at least until you have your first results.

Good luck! 💯

@marymcelroy
Author

Thank you both very much for the suggestions!

@hariszaf hariszaf added help wanted Extra attention is needed question Further information is requested labels Feb 6, 2023
@hariszaf
Owner

@marymcelroy I am closing this issue now. If you'd like to give us any feedback, please feel free to open a "new discussion".

Thanks again for your interest in PEMA.
