Home
Sequencing data has arrived from NSC (hooray!), and now it's time to start looking at what we got out of it. I will do the processing on SAGA, because we have a lot of extra CPU hours there that I originally requested for my IDBA assembly of the 2019 chimneys.
The sequencing centre provides an individual FastQC report for every sample, but these are tough to navigate one by one. So we will run MultiQC to see the results in the context of all the samples. We will probably have to split the files across multiple MultiQC reports.
To get the MultiQC program, the easiest way I found was to install it through conda (I tried EasyBuild modules and a Python virtualenv but ran into permissions problems and config issues):
# Make a conda environment if you don't have one
conda create --name YourEnvName
source YourEnvName/bin/activate
# You'll have to close out the shell and open a new one for this to take effect.
# Activate and install multiqc
conda activate YourEnvName
conda install -c bioconda multiqc
On SAGA we must write a script for each job we run, as the login node is not meant for real processing. I created a script called MultiQC.sh, with the following content, in the Metagenomics_AMOR_2020 folder in our project (/cluster/projects/nn9836k/).
touch MultiQC.sh # Make the file
vim MultiQC.sh
And we add the following to the script doc:
#!/usr/bin/bash
#SBATCH --account=nn9836k
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# every job requires some specification of the memory (RAM) it needs
#SBATCH --mem-per-cpu=5GB
# every job requires a runtime limit
#SBATCH --time=48:00:00
#SBATCH --job-name=QC_1
#Set up job environment
# Load modules
module purge
module load Anaconda3/2019.03
# Activate the conda environment
conda activate YourEnvName
# Scan directories for data to report on
multiqc TheMoon/
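If the samples do end up being split over several reports, the single `multiqc TheMoon/` call could be replaced by something like the sketch below. It is only an illustration under a few assumptions: that the FastQC output from the sequencing centre is a set of `*_fastqc.zip` files, and that a fixed number of samples per report is good enough. The function names, folder names and batch size are all hypothetical. It relies on MultiQC's `--file-list` option (a text file with one path per row) together with `-n` and `-o` to name and place each report.

```shell
#!/usr/bin/env bash
# Sketch only: function names, folder names and batch size are hypothetical.

# Write lists of FastQC archives, batch_size paths per list file.
make_batches() {
    local report_dir="$1"   # folder holding the *_fastqc.zip files (assumed layout)
    local work_dir="$2"     # folder that will receive the list files
    local batch_size="$3"   # number of reports per MultiQC run

    mkdir -p "$work_dir"
    # Sorted, so the batches come out the same on every run
    find "$report_dir" -name '*_fastqc.zip' | sort > "$work_dir/all_reports.txt"
    # Cut the full list into batch_aa, batch_ab, ... of batch_size lines each
    split -l "$batch_size" "$work_dir/all_reports.txt" "$work_dir/batch_"
}

# Run MultiQC once per list file produced by make_batches.
run_batches() {
    local work_dir="$1" list name
    for list in "$work_dir"/batch_*; do
        [ -e "$list" ] || continue          # nothing matched the glob
        name="multiqc_$(basename "$list")"
        if command -v multiqc >/dev/null 2>&1; then
            multiqc --file-list "$list" -n "$name" -o "$work_dir"
        else
            echo "multiqc not found; would run: multiqc --file-list $list -n $name -o $work_dir"
        fi
    done
}

# Example (inside the job script, after activating the conda environment):
#   make_batches TheMoon/ multiqc_batches 20
#   run_batches multiqc_batches
```

Working from list files rather than moving the reports around keeps the original delivery from the sequencing centre untouched, and the batch size can be changed by simply rerunning in a fresh folder.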
Run the script and check the status:
sbatch MultiQC.sh # run
squeue -u $USER # status
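For keeping track of one particular job rather than everything in the queue, a small wrapper like the following may help. It is a sketch, not part of the pipeline itself: the function names are made up, and it assumes the default Slurm log name (`slurm-<jobid>.out` in the folder the job was submitted from) has not been overridden with `--output`.

```shell
#!/usr/bin/env bash
# Sketch only: submit_and_watch and job_summary are hypothetical helper names.

# Submit a job script, remember its ID and show its queue entry.
submit_and_watch() {
    local script="$1" job_id
    # --parsable makes sbatch print just the job ID (optionally "id;cluster")
    job_id=$(sbatch --parsable "$script")
    job_id="${job_id%%;*}"                  # drop the ";cluster" part if present
    echo "Submitted job $job_id"
    squeue -j "$job_id"                     # status of this job only
    echo "Log file (default name): slurm-${job_id}.out"
}

# After the job has finished: state, runtime and peak memory from accounting.
job_summary() {
    local job_id="$1"
    sacct -j "$job_id" --format=JobID,JobName,State,Elapsed,MaxRSS
}

# Example:
#   submit_and_watch MultiQC.sh
#   job_summary <jobid>    # use the ID printed by submit_and_watch
```

The `MaxRSS` and `Elapsed` columns are handy for tuning `--mem-per-cpu` and `--time` the next time the script is submitted.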
In 2020, the Dahle group sent 60 samples from various chimneys across the AMOR for sequencing. This wiki shares the pipeline I used to process that dataset. The intent is to be specific about every step involved, so that other lab members do not have to repeat the same time-consuming processes. By using my Git page, there is the added benefit of accountability and of having someone to email if something doesn't work for you. :)