Batch processing of many samples
Guanliang MENG edited this page Jun 2, 2022 · 7 revisions
Suppose your paired-end raw data files look like this:
$ ls /abspath/to/fastq/sampleID*.fq.gz
/abspath/to/fastq/sampleID_1.1.fq.gz
/abspath/to/fastq/sampleID_1.2.fq.gz
/abspath/to/fastq/sampleID_2.1.fq.gz
/abspath/to/fastq/sampleID_2.2.fq.gz
/abspath/to/fastq/sampleID_3.1.fq.gz
/abspath/to/fastq/sampleID_3.2.fq.gz
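Then you can put the two FASTQ files of each sample on one line of a list file, and submit one MitoZ job per sample: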
$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID
$ ls /abspath/to/fastq/sampleID*.fq.gz | awk 'NR%2{printf "%s ",$0;next;}1' > sample_fq.list
$ cat sample_fq.list
/abspath/to/fastq/sampleID_1.1.fq.gz /abspath/to/fastq/sampleID_1.2.fq.gz
/abspath/to/fastq/sampleID_2.1.fq.gz /abspath/to/fastq/sampleID_2.2.fq.gz
/abspath/to/fastq/sampleID_3.1.fq.gz /abspath/to/fastq/sampleID_3.2.fq.gz
$ cat sample_fq.list | perl -ne '
chomp;
my @a=split /\s+/; # to split the path of fq1 and fq2 into array @a
my $b=(split /\//, $a[0])[-1]; # to get the basename of fq1, e.g. "sampleID_1.1.fq.gz"
my $sample=(split /\./, $b)[0]; # to extract the sample ID, e.g. "sampleID_1". You might need to change this.
mkdir $sample;
chdir $sample;
`echo "mitoz all --fq1 $a[0] --fq2 $a[1] --outprefix $sample --thread_number 8 --clade Chordata --genetic_code 2 --insert_size 250 --fastq_read_length 150 --assembler mitoassemble --kmers 51,71,91 --requiring_taxa Chordata" > mitoz.sh`;
`qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh` ;
chdir "../" ; '
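The awk command above pairs files purely by their order in the `ls` output, so a sample with a missing or extra file would shift every pair after it. As a minimal sketch of a stricter way to build the list, the function below only prints a pair when both mates exist. The name `pair_fastq` is hypothetical (it is not part of MitoZ), and the `.1.fq.gz` / `.2.fq.gz` suffixes are assumed from the example file names on this page.

```shell
# pair_fastq DIR
# Print "fq1 fq2" for every DIR/*.1.fq.gz whose mate DIR/*.2.fq.gz exists.
# Hypothetical helper, not part of MitoZ; suffixes follow this page's example.
pair_fastq() {
    dir=$1
    for fq1 in "$dir"/*.1.fq.gz; do
        [ -e "$fq1" ] || continue        # the glob matched nothing
        fq2=${fq1%.1.fq.gz}.2.fq.gz      # expected mate of fq1
        if [ -e "$fq2" ]; then
            printf '%s %s\n' "$fq1" "$fq2"
        else
            echo "warning: no mate found for $fq1" >&2
        fi
    done
}
```

For example, `pair_fastq /abspath/to/fastq > sample_fq.list` would produce a list in the same two-column format as the one shown above, with warnings for unpaired files going to standard error.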
You can restrict your jobs to specific nodes, or exclude certain nodes:
# to run on only these nodes
qsub -cwd -l vf=100g -l h='(node1|node2|node3)' -q all.q -pe smp 8 mitoz.sh
# or exclude these nodes
qsub -cwd -l vf=100g -l h='!(node1|node2|node3)' -q all.q -pe smp 8 mitoz.sh
To annotate many samples, you can pass multiple fasta files to the --fastafiles option of the mitoz annotate command, e.g.
--fastafiles /abspath/to/mitogenome/sampleID_1.fasta /abspath/to/mitogenome/sampleID_2.fasta
For details, see https://github.com/linzhi2013/MitoZ/wiki/The-'annotate'-subcommand
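When there are many fasta files, typing every path by hand is error-prone. The sketch below builds the `--fastafiles` argument from all `*.fasta` files in a directory and prints the resulting command instead of running it, so you can inspect it first. The function name `build_annotate_cmd` is hypothetical, and only the `--fastafiles` option is filled in; append the other options you need.

```shell
# build_annotate_cmd DIR
# Print a "mitoz annotate" command that lists every *.fasta file in DIR.
# Hypothetical helper, not part of MitoZ; it only prints, it does not run mitoz.
build_annotate_cmd() {
    dir=$1
    set -- "$dir"/*.fasta                # positional parameters = matched files
    [ -e "$1" ] || { echo "no fasta files in $dir" >&2; return 1; }
    printf 'mitoz annotate --fastafiles'
    printf ' %s' "$@"                    # one " path" per file
    printf '\n'
}
```

For example, `build_annotate_cmd /abspath/to/mitogenome` prints a single command line that you can review and complete before running.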
Alternatively, you can submit one annotation job per sample, in the same way as above.
Suppose your fasta files look like this:
$ ls /abspath/to/mitogenome/sampleID*.fasta
/abspath/to/mitogenome/sampleID_1.fasta
/abspath/to/mitogenome/sampleID_2.fasta
/abspath/to/mitogenome/sampleID_3.fasta
Then you can create and submit one annotation job per sample:
$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID
$ ls /abspath/to/mitogenome/sampleID*.fasta > fasta.list
$ cat fasta.list | perl -ne '
chomp;
my $b=(split /\//, $_)[-1]; # to get the basename, e.g. "sampleID_1.fasta"
my $sample=(split /\./, $b)[0]; # to get the sample ID, e.g. "sampleID_1". You may need to change this.
mkdir $sample;
chdir $sample;
`echo "mitoz annotate --outprefix $sample --fastafiles $_ --thread_number 8 --clade Chordata --requiring_taxa Chordata" > mitoz.sh`;
`qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh` ;
chdir "../" ; '
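Both Perl snippets derive the sample ID by taking the file's basename and keeping everything before the first dot, and their comments note that you may need to change this for your own file names. As a sketch, the same rule can be written with shell parameter expansion, together with a variant that strips a known suffix instead. Both function names are hypothetical, and the `_R1.fastq.gz` naming scheme in the second comment is only an assumed example.

```shell
# sample_id PATH
# Same rule as the Perl code: basename, then everything before the first dot.
sample_id() {
    base=${1##*/}                   # strip the directory part
    printf '%s\n' "${base%%.*}"     # strip from the first dot onwards
}

# sample_id_by_suffix PATH SUFFIX
# Variant for IDs that contain dots: remove an explicit suffix instead,
# e.g. SUFFIX="_R1.fastq.gz" (an assumed naming scheme, adjust to your data).
sample_id_by_suffix() {
    base=${1##*/}
    printf '%s\n' "${base%"$2"}"
}
```

With the file names on this page, `sample_id /abspath/to/fastq/sampleID_1.1.fq.gz` gives `sampleID_1`, matching the Perl result.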