
Batch processing of many samples

Guanliang MENG edited this page Jun 2, 2022 · 7 revisions

Suppose you have paired-end raw data files like these:

$ ls /abspath/to/fastq/sampleID*.fq.gz

/abspath/to/fastq/sampleID_1.1.fq.gz
/abspath/to/fastq/sampleID_1.2.fq.gz
/abspath/to/fastq/sampleID_2.1.fq.gz
/abspath/to/fastq/sampleID_2.2.fq.gz
/abspath/to/fastq/sampleID_3.1.fq.gz
/abspath/to/fastq/sampleID_3.2.fq.gz

then you can do:

$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID

$ ls /abspath/to/fastq/sampleID*.fq.gz | awk 'NR%2{printf "%s ",$0;next;}1' > sample_fq.list

$ cat sample_fq.list
/abspath/to/fastq/sampleID_1.1.fq.gz /abspath/to/fastq/sampleID_1.2.fq.gz
/abspath/to/fastq/sampleID_2.1.fq.gz /abspath/to/fastq/sampleID_2.2.fq.gz
/abspath/to/fastq/sampleID_3.1.fq.gz /abspath/to/fastq/sampleID_3.2.fq.gz
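The awk one-liner simply joins every two consecutive lines, so a missing or oddly sorted file would silently shift all later pairs. A minimal sanity check might look like the sketch below. It assumes the `.1.fq.gz` / `.2.fq.gz` naming shown above, and the function name `check_pairs` is made up for illustration:

```shell
# check_pairs LIST: verify that each line holds two paths that differ only
# in the read suffix (.1.fq.gz vs .2.fq.gz). Hypothetical helper, not part of MitoZ.
check_pairs() {
    awk '
    NF != 2 { printf "line %d: expected 2 files, got %d\n", NR, NF; bad = 1; next }
    {
        p1 = $1; p2 = $2
        sub(/\.1\.fq\.gz$/, "", p1)    # strip the read-1 suffix
        sub(/\.2\.fq\.gz$/, "", p2)    # strip the read-2 suffix
        # a suffix that was not stripped, or differing prefixes, means a bad pair
        if (p1 == $1 || p2 == $2 || p1 != p2) {
            printf "line %d: not a matching pair: %s %s\n", NR, $1, $2
            bad = 1
        }
    }
    END { exit bad + 0 }               # non-zero exit status if any line failed
    ' "$1"
}
```

For example, `check_pairs sample_fq.list && echo "all pairs look consistent"` prints the message only when every line passes.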

$ cat sample_fq.list | perl -ne '
chomp; 
my @a=split /\s+/; # to split the path of fq1 and fq2 into array @a
my $b=(split /\//, $a[0])[-1];  # to get the basename of fq1, e.g. "sampleID_1.1.fq.gz"
my $sample=(split /\./, $b)[0];  # to extract the sample ID, e.g. "sampleID_1". You might need to change this.

mkdir $sample; 
chdir $sample;

`echo "mitoz all --fq1 $a[0] --fq2 $a[1] --outprefix $sample --thread_number 8 --clade Chordata --genetic_code 2 --insert_size 250 --fastq_read_length 150 --assembler mitoassemble --kmers 51,71,91 --requiring_taxa Chordata" > mitoz.sh`;

`qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh` ; 

chdir "../" ; '
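Before submitting many jobs, it can help to generate the scripts only and inspect one of them. The following is a minimal shell sketch of the same loop without the `qsub` step. The function name `write_mitoz_scripts` is hypothetical, and the `mitoz all` options are copied unchanged from the command above:

```shell
# write_mitoz_scripts LIST: for each "fq1 fq2" line, create a directory named
# after the sample and write mitoz.sh into it. Nothing is submitted.
# Hypothetical helper, not part of MitoZ.
write_mitoz_scripts() {
    while read -r fq1 fq2; do
        [ -n "$fq1" ] || continue      # skip blank lines
        base=${fq1##*/}                # basename, e.g. sampleID_1.1.fq.gz
        sample=${base%%.*}             # text before the first dot, e.g. sampleID_1
        mkdir -p "$sample"
        echo "mitoz all --fq1 $fq1 --fq2 $fq2 --outprefix $sample --thread_number 8 --clade Chordata --genetic_code 2 --insert_size 250 --fastq_read_length 150 --assembler mitoassemble --kmers 51,71,91 --requiring_taxa Chordata" > "$sample/mitoz.sh"
    done < "$1"
}
```

After running `write_mitoz_scripts sample_fq.list`, check a script such as `sampleID_1/mitoz.sh`, then submit each one from inside its sample directory. As in the Perl version, the sample ID extraction may need adjusting to your file names.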

About the qsub command

You can make your jobs run on specific nodes, or exclude some nodes.

# to run on only these nodes
qsub -cwd -l vf=100g -l h='(node1|node2|node3)' -q all.q -pe smp 8 mitoz.sh
# or exclude these nodes
qsub -cwd -l vf=100g -l h='!(node1|node2|node3)' -q all.q -pe smp 8 mitoz.sh

Batch annotation

If you want to annotate a lot of samples, you can provide many fasta files to the --fastafiles option of the mitoz annotate command, e.g.

--fastafiles /abspath/to/mitogenome/sampleID_1.fasta /abspath/to/mitogenome/sampleID_2.fasta

See https://github.com/linzhi2013/MitoZ/wiki/The-'annotate'-subcommand for details.
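With many samples, typing every path after --fastafiles by hand is error-prone. The sketch below builds the command from a directory listing and prints it for review instead of running it. It assumes the option takes a space-separated list as in the example above. The helper name `print_annotate_cmd` and the choice of a single shared `--outprefix` value are illustrative assumptions; the other options are copied from the per-sample annotate command further down this page.

```shell
# print_annotate_cmd PREFIX DIR: print (do not run) one mitoz annotate command
# that lists every *.fasta file in DIR after --fastafiles.
# Hypothetical helper, not part of MitoZ.
print_annotate_cmd() {
    prefix=$1
    dir=$2
    set -- "$dir"/*.fasta              # expand the glob into positional parameters
    [ -e "$1" ] || { echo "no fasta files in $dir" >&2; return 1; }
    echo "mitoz annotate --outprefix $prefix --fastafiles $* --thread_number 8 --clade Chordata --requiring_taxa Chordata"
}
```

For example, `print_annotate_cmd projectID /abspath/to/mitogenome > mitoz.sh` writes the command to a script that can be checked before submission. Paths containing spaces are not handled by this sketch.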

Alternatively, you can submit one annotation job per sample, in the same way as the batch example above.

Suppose you have fasta files like these:

$ ls /abspath/to/mitogenome/sampleID*.fasta

/abspath/to/mitogenome/sampleID_1.fasta
/abspath/to/mitogenome/sampleID_2.fasta
/abspath/to/mitogenome/sampleID_3.fasta

then you can do:

$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID

$ ls /abspath/to/mitogenome/sampleID*.fasta > fasta.list

$ cat fasta.list | perl -ne '
chomp; 
my $b=(split /\//, $_)[-1]; # to get the basename, e.g. "sampleID_1.fasta"
my $sample=(split /\./, $b)[0]; # to get the sample ID, e.g. "sampleID_1". You may need to change this.

mkdir $sample; 
chdir $sample;

`echo "mitoz annotate --outprefix $sample --fastafiles $_ --thread_number 8 --clade Chordata --requiring_taxa Chordata" > mitoz.sh`;

`qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh` ; 

chdir "../" ; '