The mitoz tools group_seq_by_gene command

Group the gene sequences of different samples into separate files, one file per gene.

$ mitoz-tools  group_seq_by_gene -h
usage: mitoz-tools group_seq_by_gene [-h] [-r <file>] [-d <str>] [-p <str>] [-clean_header]

To group the gene sequences of different samples into different files by genes.

Please cite:
Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu,
MitoZ: a toolkit for animal mitochondrial genome assembly, annotation
and visualization, Nucleic Acids Research, https://doi.org/10.1093/nar/gkz173

optional arguments:
  -h, --help     show this help message and exit
  -r <file>      the gene file list. Per-line format: Abbreviation geneFilePath. The abbreviation will be added
                 to the seqid to indicate different samples.
  -d <str>       the delimiter between the abbreviation and the seqid [;]
  -p <str>       the prefix of all result files [MitoZ]
  -clean_header  Only shows the 'Abbreviation' in the sequence header [False]

Usage

Prepare a list file (e.g. named gene_f_list) with content like this:

DM01 DM01/DM01.result/DM01.DM01.megahit.mitogenome.fa.result/DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.gene.fasta
DM02 DM02/DM02.result/DM02.DM02.megahit.mitogenome.fa.result/DM02_DM02.megahit.mitogenome.fa_mitoscaf.fa.gbf.gene.fasta

The format of each line is:

sampleID /path/to/the/fasta_file

The sampleID (the first column) is added to the beginning of the sequence header in the resulting files, joined to the original header by the delimiter given with -d.
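Before running the command, it can help to sanity-check the list file. The bash function below is a hypothetical helper (check_gene_list is my own name, not part of MitoZ). It assumes whitespace-separated columns, so paths containing spaces are not supported. It checks that each line has exactly two columns, that each FASTA file exists and is non-empty, and that no sample ID is repeated, since a repeated ID would give ambiguous headers.

```bash
# Hypothetical helper, not part of MitoZ.
# Assumes each non-empty line is "sampleID<whitespace>path" (no spaces in paths).
check_gene_list() {
    local list_file="$1"
    local n=0 bad=0 sample path extra
    local -A seen=()
    # "|| [ -n "$sample" ]" also processes a last line without a trailing newline
    while read -r sample path extra || [ -n "$sample" ]; do
        n=$((n + 1))
        if [ -z "$sample" ]; then
            continue                          # skip blank lines
        fi
        if [ -z "$path" ] || [ -n "$extra" ]; then
            echo "line $n: expected exactly 2 columns" >&2
            bad=1
            continue
        fi
        if [ ! -s "$path" ]; then
            echo "line $n: missing or empty file: $path" >&2
            bad=1
        fi
        if [ -n "${seen[$sample]+x}" ]; then
            echo "line $n: duplicate sample ID: $sample" >&2
            bad=1
        fi
        seen[$sample]=1
    done < "$list_file"
    return "$bad"
}
```

For example: `check_gene_list gene_f_list && mitoz-tools group_seq_by_gene -r gene_f_list -d '_' -p MitoZ`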

The second column is the path to a FASTA file, which can be any of these MitoZ result files:

-rw-rw-r-- 1 gmeng  17K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.gene.fasta
-rw-rw-r-- 1 gmeng  12K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.cds.fasta
-rw-rw-r-- 1 gmeng 2.6K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.trna.fasta
-rw-rw-r-- 1 gmeng 2.7K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.rrna.fasta
-rw-rw-r-- 1 gmeng  17K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.fasta
-rw-rw-r-- 1 gmeng 4.3K Jun 29 05:54 DM01_DM01.megahit.mitogenome.fa_mitoscaf.fa.gbf.cds_translation.fasta
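If you have many samples, you can generate the list instead of writing it by hand. A minimal sketch, assuming each sample's results sit under a directory named after the sample ID (like DM01/ and DM02/ above) and that this directory contains exactly one file ending in the chosen suffix; make_gene_list is a hypothetical name, not a MitoZ command.

```bash
# Hypothetical helper, not part of MitoZ.
# Usage: make_gene_list <file-suffix> <sample-dir>...
# Prints one "sampleID path" line per sample directory.
make_gene_list() {
    local suffix="$1"; shift
    local sample hits
    for sample in "$@"; do
        sample="${sample%/}"                  # accept "DM01/" as well as "DM01"
        # Collect every file below the sample directory that ends in the suffix
        mapfile -t hits < <(find "$sample" -type f -name "*${suffix}" | sort)
        if [ "${#hits[@]}" -ne 1 ]; then
            echo "$sample: expected 1 file ending in $suffix, found ${#hits[@]}" >&2
            return 1
        fi
        printf '%s %s\n' "$sample" "${hits[0]}"
    done
}
```

For example, `make_gene_list .gbf.gene.fasta DM01 DM02 > gene_f_list`, or use `.gbf.cds.fasta` to group only the CDS sequences.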

Then execute:

$ mitoz-tools  group_seq_by_gene -r gene_f_list -d '_' -p MitoZ

Listing the directory afterwards shows one FASTA file per gene:

$ ls -lh
-rw-rw-r-- 1 gmeng gmeng   42 Jul  8 17:24 gene_f_list
-rw-rw-r-- 1 gmeng gmeng 1.5K Jul  8 17:36 MitoZ.gene-ATP6.fa
-rw-rw-r-- 1 gmeng gmeng  406 Jul  8 17:36 MitoZ.gene-ATP8.fa
-rw-rw-r-- 1 gmeng gmeng 3.2K Jul  8 17:36 MitoZ.gene-COX1.fa
-rw-rw-r-- 1 gmeng gmeng 1.5K Jul  8 17:36 MitoZ.gene-COX2.fa
-rw-rw-r-- 1 gmeng gmeng 1.6K Jul  8 17:36 MitoZ.gene-COX3.fa
-rw-rw-r-- 1 gmeng gmeng 2.4K Jul  8 17:36 MitoZ.gene-CYTB.fa
-rw-rw-r-- 1 gmeng gmeng 3.4K Jul  8 17:36 MitoZ.gene-l-rRNA.fa
-rw-rw-r-- 1 gmeng gmeng 2.0K Jul  8 17:36 MitoZ.gene-ND1.fa
-rw-rw-r-- 1 gmeng gmeng 2.2K Jul  8 17:36 MitoZ.gene-ND2.fa
-rw-rw-r-- 1 gmeng gmeng  680 Jul  8 17:36 MitoZ.gene-ND3.fa
-rw-rw-r-- 1 gmeng gmeng 2.8K Jul  8 17:36 MitoZ.gene-ND4.fa
-rw-rw-r-- 1 gmeng gmeng  668 Jul  8 17:36 MitoZ.gene-ND4L.fa
-rw-rw-r-- 1 gmeng gmeng 3.7K Jul  8 17:36 MitoZ.gene-ND5.fa
-rw-rw-r-- 1 gmeng gmeng 1.1K Jul  8 17:36 MitoZ.gene-ND6.fa
-rw-rw-r-- 1 gmeng gmeng 2.0K Jul  8 17:36 MitoZ.gene-s-rRNA.fa
-rw-rw-r-- 1 gmeng gmeng  216 Jul  8 17:36 MitoZ.gene-trnA(ugc).fa
-rw-rw-r-- 1 gmeng gmeng  212 Jul  8 17:36 MitoZ.gene-trnC(gca).fa
-rw-rw-r-- 1 gmeng gmeng  218 Jul  8 17:36 MitoZ.gene-trnD(guc).fa
-rw-rw-r-- 1 gmeng gmeng  218 Jul  8 17:36 MitoZ.gene-trnE(uuc).fa
-rw-rw-r-- 1 gmeng gmeng  216 Jul  8 17:36 MitoZ.gene-trnF(gaa).fa
-rw-rw-r-- 1 gmeng gmeng  214 Jul  8 17:36 MitoZ.gene-trnG(ucc).fa
-rw-rw-r-- 1 gmeng gmeng  222 Jul  8 17:36 MitoZ.gene-trnH(gug).fa
-rw-rw-r-- 1 gmeng gmeng  214 Jul  8 17:36 MitoZ.gene-trnI(gau).fa
-rw-rw-r-- 1 gmeng gmeng  224 Jul  8 17:36 MitoZ.gene-trnK(uuu).fa
-rw-rw-r-- 1 gmeng gmeng  226 Jul  8 17:36 MitoZ.gene-trnL(uaa).fa
-rw-rw-r-- 1 gmeng gmeng  228 Jul  8 17:36 MitoZ.gene-trnL(uag).fa
-rw-rw-r-- 1 gmeng gmeng  218 Jul  8 17:36 MitoZ.gene-trnM(cau).fa
-rw-rw-r-- 1 gmeng gmeng  224 Jul  8 17:36 MitoZ.gene-trnN(guu).fa
-rw-rw-r-- 1 gmeng gmeng  222 Jul  8 17:36 MitoZ.gene-trnP(ugg).fa
-rw-rw-r-- 1 gmeng gmeng  220 Jul  8 17:36 MitoZ.gene-trnQ(uug).fa
-rw-rw-r-- 1 gmeng gmeng  220 Jul  8 17:36 MitoZ.gene-trnR(ucg).fa
-rw-rw-r-- 1 gmeng gmeng  222 Jul  8 17:36 MitoZ.gene-trnS(gcu).fa
-rw-rw-r-- 1 gmeng gmeng  222 Jul  8 17:36 MitoZ.gene-trnS(uga).fa
-rw-rw-r-- 1 gmeng gmeng  226 Jul  8 17:36 MitoZ.gene-trnT(ugu).fa
-rw-rw-r-- 1 gmeng gmeng  222 Jul  8 17:36 MitoZ.gene-trnV(uac).fa
-rw-rw-r-- 1 gmeng gmeng  222 Jul  8 17:36 MitoZ.gene-trnW(uca).fa
-rw-rw-r-- 1 gmeng gmeng  216 Jul  8 17:36 MitoZ.gene-trnY(gua).fa

$ grep '>' MitoZ.gene-COX1.fa
>DM01_COX1;len=1557;[2925:4482](-)
>DM02_COX1;len=1557;[2925:4482](-)
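To make the header rewriting concrete, here is a rough awk approximation of what the command appears to do. It is not the MitoZ implementation: it assumes the gene name is the text before the first ';' of each input header (COX1 in COX1;len=1557;...) and that output files are named <prefix>.gene-<gene>.fa, which matches the listing above but may not cover every case the real tool handles. The optional fourth argument imitates -clean_header, and the defaults for the delimiter and prefix follow the help text.

```bash
# Rough approximation of group_seq_by_gene, NOT the MitoZ source code.
# Assumptions: list lines are "sampleID path"; input headers look like
# ">GENE;..." (gene name = text before the first ";").
# Output is appended, so remove old <prefix>.gene-*.fa files before re-running.
group_by_gene_sketch() {
    local list_file="$1" delim="${2:-;}" prefix="${3:-MitoZ}" clean="${4:-0}"
    local sample path
    while read -r sample path || [ -n "$sample" ]; do
        if [ -z "$sample" ]; then
            continue
        fi
        awk -v s="$sample" -v d="$delim" -v p="$prefix" -v clean="$clean" '
            /^>/ {
                header = substr($0, 2)
                gene = header
                sub(/;.*/, "", gene)                 # COX1;len=... -> COX1
                out = p ".gene-" gene ".fa"
                if (clean == 1) { print ">" s >> out } else { print ">" s d header >> out }
                next
            }
            out != "" { print >> out }               # sequence lines follow their header
        ' "$path"
    done < "$list_file"
}
```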

You can set -p to any other string, for example your project ID.

You can also change the delimiter in the sequence header. For example, if I don't want DM01 to be joined to COX1 with an underscore, I can use a space instead:

$ mitoz-tools  group_seq_by_gene -r gene_f_list -d ' ' -p MitoZ

$ grep '>' MitoZ.gene-COX1.fa
>DM01 COX1;len=1557;[2925:4482](-)
>DM02 COX1;len=1557;[2925:4482](-)

If you want a clean sequence header containing only the sample ID, add -clean_header:

$ mitoz-tools  group_seq_by_gene -r gene_f_list -p MitoZ -clean_header

$ grep '>' MitoZ.gene-COX1.fa
>DM01
>DM02
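With clean headers, each header is just the sample ID, which makes it easy to spot genes missing from some samples (for example, a tRNA that was not annotated in one sample). A hypothetical check, not part of MitoZ, assuming the files were produced with -clean_header and that you pass the same list file you gave to -r:

```bash
# Hypothetical check, not part of MitoZ. Assumes the gene files were written
# with -clean_header, so every header is exactly a sample ID.
report_missing_samples() {
    local list_file="$1" prefix="$2"
    local f missing
    for f in "$prefix".gene-*.fa; do
        [ -e "$f" ] || continue   # no match: the glob is left unexpanded
        # First file: remember header IDs; second file: print list IDs never seen
        missing=$(awk 'FILENAME == ARGV[1] { if (/^>/) seen[substr($0, 2)] = 1; next }
                       NF && !($1 in seen) { printf "%s ", $1 }' "$f" "$list_file")
        if [ -n "$missing" ]; then
            printf '%s\tmissing: %s\n' "$f" "${missing% }"
        fi
    done
}
```

For example: `report_missing_samples gene_f_list MitoZ` prints nothing when every gene file contains every sample.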

Now you can use the MitoZ.gene-*.fa files for subsequent analysis, e.g. to perform multiple sequence alignment with the MAFFT program.
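A sketch of aligning all per-gene files with MAFFT. It assumes mafft is on your PATH and uses its --auto option, which lets MAFFT pick an alignment strategy according to the data size; the function name and the output directory name are my own choices. Quote the file names: tRNA files such as MitoZ.gene-trnA(ugc).fa contain parentheses, which the shell would otherwise try to interpret.

```bash
# Align each per-gene FASTA file with MAFFT (must be installed and on PATH).
# Aligned files keep their names and are written to a separate directory,
# so re-running does not pick up earlier alignments as input.
align_all_genes() {
    local prefix="${1:-MitoZ}" outdir="${2:-aligned}" f
    mkdir -p "$outdir"
    for f in "$prefix".gene-*.fa; do
        [ -e "$f" ] || continue              # no matching files
        # "$f" is quoted because tRNA file names contain parentheses
        mafft --auto "$f" > "$outdir/$f" || return 1
    done
}
```

For example: `align_all_genes MitoZ aligned`. MAFFT writes its progress messages to stderr, so only the alignments end up in the output files.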
