
06_PROFILES

eolesin edited this page Jun 1, 2021 · 5 revisions

Profiling completed on SAGA

  1. I created a template shell script, profile_temp.sh, on which all the per-sample job scripts are based:
#!/usr/bin/bash
# every job must be accounted for
#SBATCH --account=nn9836k
#SBATCH --job-name=<samp>_to_<assembly>

# every job requires some specification of the number of tasks and cores to be used
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
# every job requires some specification of the memory (RAM) it needs
#SBATCH --mem-per-cpu=5G
# every job requires a runtime limit
#SBATCH --time=4:00:00

# setting up software environment
module purge

# load the conda version
module load Miniconda3/4.9.2

# Set the ${PS1} (needed in the source of the Anaconda environment)
export PS1=\$

# Source the conda environment setup
# The variable ${EBROOTANACONDA3} or ${EBROOTMINICONDA3}
# comes with the module load command,
# so use one of the following lines:
# source ${EBROOTANACONDA3}/etc/profile.d/conda.sh
source ${EBROOTMINICONDA3}/etc/profile.d/conda.sh

# Deactivate any spill-over environment from the login node
conda deactivate &>/dev/null

# Activate the environment by using the full path (not name)
# to the environment. The full path is listed if you do
# conda info --envs at the command prompt.
conda activate /cluster/projects/nn9836k/conda_envs/anvio

# set up paths
CONTIGDB_PATH="/cluster/projects/nn9836k/Metagenomics_AMOR_2020/04_CONTIGS"
COMAP_PATH="/cluster/projects/nn9836k/Metagenomics_AMOR_2020/05_COMAPPING"
PROFILE_PATH="/cluster/projects/nn9836k/Metagenomics_AMOR_2020/06_PROFILES"

# Profile individual sample to assembly
anvi-profile -c ${CONTIGDB_PATH}/<assembly>/<assembly>.prefixed.contigs.db \
            -i ${COMAP_PATH}/<assembly>/<samp>.bam \
            --skip-SNV-profiling \
            --num-threads 10 \
            -o ${PROFILE_PATH}/<assembly>/<samp>_to_<assembly>
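After the jobs finish, it can help to list which sample-to-assembly pairs did not produce a profile. The sketch below assumes that each successful `anvi-profile` run leaves a `PROFILE.db` inside its `-o` directory, that the directory layout is the one used in profile_temp.sh, and that AMOR_2020_Good holds one name per line for both samples and assemblies. The function name `find_missing_profiles` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each <samp>_to_<assembly> pair whose output
# directory has no PROFILE.db, assuming the layout from profile_temp.sh:
#   <profile_dir>/<assembly>/<samp>_to_<assembly>/PROFILE.db
find_missing_profiles() {
    local profile_dir="$1" list_file="$2" assembly samp
    # One name per line; the same list holds both samples and assemblies
    local -a names=($(cat "${list_file}"))
    for assembly in "${names[@]}"; do
        for samp in "${names[@]}"; do
            if [[ ! -f "${profile_dir}/${assembly}/${samp}_to_${assembly}/PROFILE.db" ]]; then
                echo "${samp}_to_${assembly}"
            fi
        done
    done
}

# Example (on Saga):
# find_missing_profiles /cluster/projects/nn9836k/Metagenomics_AMOR_2020/06_PROFILES AMOR_2020_Good
```

The printed names match the `--job-name` of each job, so they map directly back to the `profile_<samp>_to_<assembly>.sh` scripts that would need resubmitting.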
  1. Create one subdirectory per assembly to hold the profiles:
for i in $(cat AMOR_2020_Good); do
    mkdir "${i}"
done
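The loop above splits the output of `cat` on whitespace and stops with an error for any directory that already exists. A slightly more defensive sketch reads the list line by line, skips blank lines, tolerates a missing final newline, and uses `mkdir -p` so that re-running it is harmless. The function name `make_assembly_dirs` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: create one subdirectory per name listed in a file.
# Reads one name per line, skips blank lines, and also handles a last
# line that has no trailing newline.
make_assembly_dirs() {
    local list_file="$1" name
    while IFS= read -r name || [[ -n "${name}" ]]; do
        [[ -z "${name}" ]] && continue   # skip blank lines
        mkdir -p -- "${name}"            # -p: no error if it already exists
    done < "${list_file}"
}

# Example, run inside 06_PROFILES:
# make_assembly_dirs AMOR_2020_Good
```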
  1. Next, I populate each subdirectory with child job scripts generated from the template by running:
./Make_child_scripts_indivprofiles.sh

Script code:

#!/bin/bash
### This script creates an individual sbatch file for each sample-to-assembly pair

cd 06_PROFILES || exit 1

# Not very graceful, but first make a template for each assembly from the main template.
# AMOR_2020_Good lists one assembly name per line.
declare -a ASsmb_array=($(cat AMOR_2020_Good));      # create assembly name array

# Replace the <assembly> placeholder in the main template with each assembly name.
for assembly in "${ASsmb_array[@]}"; do
    sed "s/<assembly>/${assembly}/g" "profile_temp.sh" > "${assembly}/profile_${assembly}_temp.sh"
done


# Then move on and create the final job scripts.
# The same list is used for the samples, since every sample is mapped to every assembly.
declare -a Smp_array=($(cat AMOR_2020_Good));        # create sample name array


# Replace the <samp> placeholder in each assembly template in a nested loop,
# writing one job script per sample, then delete the intermediate template.
for assembly in "${ASsmb_array[@]}"; do
    for samp in "${Smp_array[@]}"; do
        sed "s/<samp>/${samp}/g" "${assembly}/profile_${assembly}_temp.sh" > "${assembly}/profile_${samp}_to_${assembly}.sh"
    done
    rm "${assembly}/profile_${assembly}_temp.sh"
done
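The two-stage substitution above can also be done in a single `sed` call with two `-e` expressions, which avoids writing and deleting an intermediate per-assembly template. This is a minimal sketch, not the script used for the actual runs: `render_job` is a hypothetical name, and it assumes that sample and assembly names contain no `/`, `&` or other characters that are special in a `sed` replacement.

```shell
#!/bin/bash
# Hypothetical helper: render one job script from the template by replacing
# both placeholders in a single sed pass.
# Usage: render_job <template> <samp> <assembly> <output_file>
render_job() {
    local template="$1" samp="$2" assembly="$3" out="$4"
    sed -e "s/<samp>/${samp}/g" \
        -e "s/<assembly>/${assembly}/g" \
        "${template}" > "${out}" || return 1
}

# Example, run inside 06_PROFILES for every sample-assembly pair:
# names=($(cat AMOR_2020_Good))
# for assembly in "${names[@]}"; do
#     for samp in "${names[@]}"; do
#         render_job profile_temp.sh "${samp}" "${assembly}" "${assembly}/profile_${samp}_to_${assembly}.sh"
#     done
# done
```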

  1. Then I run the script that loops over every sample-assembly pair and submits one job for each. I divided the list (AMOR_2020_Good) into five parts and submitted them one part at a time, so that if I needed to adjust any parameters, fewer resources would be wasted.
./Submit_slurms_indivprofile.sh

Script code:

#!/bin/bash
### This script submits the sbatch scripts, one for each sample-to-assembly pair

cd 06_PROFILES/ || exit 1

declare -a ASsmb_array=($(cat AMOR_2020_Good));   # I split this into Part1, 2, 3, 4, 5 when performing.
declare -a Smp_array=($(cat AMOR_2020_Good));

for assembly in "${ASsmb_array[@]}"; do
    # submit the job script for each sample mapped to this assembly
    for samp in "${Smp_array[@]}"; do
        sbatch "${assembly}/profile_${samp}_to_${assembly}.sh"
    done
done
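The split into five parts can also be done inside the script instead of by editing the list by hand. The sketch below assumes, as in the comment above, that only the assembly list is split while every sample is still submitted for each assembly. The function names `list_part` and `submit_batch` and the `DRY_RUN` variable are hypothetical; with `DRY_RUN=1` the `sbatch` commands are printed instead of run, which makes it easy to check a batch before submitting it.

```shell
#!/bin/bash
# Hypothetical helper: print part k of n (k = 1..n) of a name list,
# using integer boundaries so the parts differ in size by at most one.
list_part() {
    local list_file="$1" k="$2" n="$3"
    local -a names=($(cat "${list_file}"))
    local total=${#names[@]}
    local start=$(( (k - 1) * total / n ))
    local end=$(( k * total / n ))
    if (( end > start )); then
        printf '%s\n' "${names[@]:start:end-start}"
    fi
}

# Hypothetical helper: submit the jobs for the assemblies in part k of n,
# pairing each one with every sample in the full list.
# Set DRY_RUN=1 to print the sbatch commands instead of running them.
submit_batch() {
    local list_file="$1" k="$2" n="$3" assembly samp
    local -a all=($(cat "${list_file}"))
    local -a part=($(list_part "${list_file}" "${k}" "${n}"))
    for assembly in "${part[@]}"; do
        for samp in "${all[@]}"; do
            if [[ "${DRY_RUN:-0}" == 1 ]]; then
                echo sbatch "${assembly}/profile_${samp}_to_${assembly}.sh"
            else
                sbatch "${assembly}/profile_${samp}_to_${assembly}.sh"
            fi
        done
    done
}

# Example, run inside 06_PROFILES:
# DRY_RUN=1 submit_batch AMOR_2020_Good 1 5   # inspect the first batch
# submit_batch AMOR_2020_Good 1 5             # then submit it
```

Splitting by index rather than by byte count keeps every part non-empty as long as there are at least as many assemblies as parts.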
