Parameters to change the resulting profiles

Alessio Milanese edited this page Aug 12, 2019 · 7 revisions

Many parameters affect how the profile is printed. The parameters listed below can be used with motus profile. Check this page for more information on the printed profile.
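
When profiles are generated from a script, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of mOTUs: the function name build_profile_command and its keyword arguments are hypothetical, and only flags described on this page (-s, -n, -c, -k, -q, -p, -C) are used. It builds the argument list without running anything.

```python
from typing import List, Optional

# The three -C methods named on this page.
CAMI_MODES = ("precision", "recall", "parenthesis")


def build_profile_command(
    reads: str,
    sample_name: str,
    counts: bool = False,
    tax_level: Optional[str] = None,
    full_rank: bool = False,
    ncbi_ids: bool = False,
    cami_mode: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a `motus profile` call (hypothetical helper)."""
    if cami_mode is not None and cami_mode not in CAMI_MODES:
        raise ValueError(f"unknown CAMI mode: {cami_mode!r}")
    cmd = ["motus", "profile", "-s", reads, "-n", sample_name]
    if counts:
        cmd.append("-c")
    if tax_level is not None:
        cmd += ["-k", tax_level]
    if full_rank:
        cmd.append("-q")
    if ncbi_ids:
        cmd.append("-p")
    if cami_mode is not None:
        cmd += ["-C", cami_mode]
    return cmd


print(" ".join(build_profile_command(
    "test1_single.fastq", "test6", tax_level="class", full_rank=True, ncbi_ids=True)))
```

The returned list can be passed to subprocess.run on a machine where motus is installed.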

Print the result as counts [-c]

By default, the result is printed as relative abundances. With -c, it is printed as counts instead. For example, the result of motus profile -s test1_single.fastq -n test2 -c is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 -c | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test2 -c
#consensus_taxonomy	test2
Kandleria vitulina [ref_mOTU_v2_0001]	36
Methyloversatilis universalis [ref_mOTU_v2_0002]	0
Megasphaera genomosp. [ref_mOTU_v2_0003]	12
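
To work with such a profile downstream, the tab-separated rows can be read into a dictionary. This is a minimal sketch, assuming the default two-column layout shown above (comment lines start with #, the column header starts with #consensus_taxonomy, and the value is in the last column); the function name parse_profile is hypothetical.

```python
from typing import Dict, Tuple


def parse_profile(text: str) -> Tuple[str, Dict[str, float]]:
    """Parse a two-column profile into (sample name, {taxon label: value})."""
    sample = ""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Only the column header carries the sample name; the
            # "# git tag ..." and "# call: ..." lines are skipped.
            if line.startswith("#consensus_taxonomy"):
                sample = fields[-1]
            continue
        values[fields[0]] = float(fields[-1])
    return sample, values


example = (
    "# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test2 -c\n"
    "#consensus_taxonomy\ttest2\n"
    "Kandleria vitulina [ref_mOTU_v2_0001]\t36\n"
    "Methyloversatilis universalis [ref_mOTU_v2_0002]\t0\n"
    "Megasphaera genomosp. [ref_mOTU_v2_0003]\t12\n"
)
sample, counts = parse_profile(example)
print(sample, counts["Kandleria vitulina [ref_mOTU_v2_0001]"])
```

The same function reads relative-abundance profiles; layouts with extra columns (-p, -u) would need the label and id columns handled separately.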

Change the taxonomy level [-k]

With -k you can change the taxonomic level of the profile. For example, the result of motus profile -s test1_single.fastq -n test3 -k class is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k class -g 3 | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test3 -k class
#consensus_taxonomy	test3
Mamiellophyceae	0.0000000000
Chthonomonadetes	0.0557659685
Cyanobacteria	0.0090374928

Add the NCBI taxonomy id [-p]

With -p you add the NCBI taxonomy id to the profile. The profile then has three columns, the second of which is the NCBI id. For example, the result of motus profile -s test1_single.fastq -n test4 -p is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 -p | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test4 -p
#consensus_taxonomy	NCBI_tax_id	test4
Kandleria vitulina [ref_mOTU_v2_0001]	1630	0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002]	378211	0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003]	699192	0.0234955832

Print the full rank taxonomy [-q]

With -q you can print the full rank taxonomy (up to the one selected with -k). For example, the result of motus profile -s test1_single.fastq -n test5 -k class -q is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k class -g 3 | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test5 -k class -q
#consensus_taxonomy	test5
k__Archaea|p__Nanoarchaeota|c__Nanoarchaeota class incertae sedis	0.0235010600
k__Archaea|p__Crenarchaeota|c__Thermoprotei	0.0005106200
k__Archaea|p__Crenarchaeota|c__Crenarchaeota class incertae sedis [YNPFFA]	0.0000000000

If you add -p you will get the full rank of NCBI taxonomy ids. Calling motus profile -s test1_single.fastq -n test6 -k class -q -p will produce:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k class -g 3 -p | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test6 -k class -q -p
#consensus_taxonomy	NCBI_tax_id	test6
k__Archaea|p__Nanoarchaeota|c__Nanoarchaeota class incertae sedis	2157|192989|NA	0.0235010600
k__Archaea|p__Crenarchaeota|c__Thermoprotei	2157|28889|183924	0.0005106200
k__Archaea|p__Crenarchaeota|c__Crenarchaeota class incertae sedis [YNPFFA]	2157|28889|NA	0.0000000000
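
In the -q -p output, the taxonomy column and the NCBI id column are both pipe-delimited and line up rank by rank. A minimal sketch of pairing them is shown here; the function name pair_ranks is hypothetical, and treating NA as a missing id is an assumption based on the rows above.

```python
from typing import List, Optional, Tuple


def pair_ranks(taxonomy: str, ncbi_ids: str) -> List[Tuple[str, str, Optional[int]]]:
    """Return (rank prefix, name, NCBI id or None) for each rank in the path."""
    names = taxonomy.split("|")
    ids = ncbi_ids.split("|")
    if len(names) != len(ids):
        raise ValueError("taxonomy path and id path have different lengths")
    paired = []
    for name, tax_id in zip(names, ids):
        # "k__Archaea" -> prefix "k", label "Archaea"
        prefix, _, label = name.partition("__")
        # Assumption: "NA" marks a rank without an NCBI id.
        paired.append((prefix, label, None if tax_id == "NA" else int(tax_id)))
    return paired


for entry in pair_ranks(
    "k__Archaea|p__Nanoarchaeota|c__Nanoarchaeota class incertae sedis",
    "2157|192989|NA",
):
    print(entry)
```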

CAMI (BioBoxes) format [-C]

You can print the result in BioBoxes format with -C. Note that the mOTUs species definition and the NCBI species definition are not always congruent. As a result, you can choose among three methods for saving the result in CAMI format: "precision", where the discrepancies are deleted; "recall", where the relative abundance of each discrepancy is split among its NCBI species; and "parenthesis", where all the discrepancies are kept.

The following examples show what happens to the species Pseudomonas sp. GM67 (NCBI tax id: 1144335) and Pseudomonas sp. GM60 (NCBI tax id: 1144334), which the mOTUs clustering classifies as belonging to the same species.

Calling motus profile -s test1_single.fastq -n test7 -C parenthesis produces:

# Taxonomic Profiling Output
# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 -C parenthesis | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test7 -C parenthesis

@SampleID: test7
@Version:0.9.1
@Ranks:superkingdom|phylum|class|order|family|genus|species
@TaxonomyID: Sep 16 2015
@@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
2	superkingdom	2	Bacteria	100.0
...
28221	class	2|1224|28221	Bacteria|Proteobacteria|Deltaproteobacteria	2.20702
...
34029	species	2|1224|28216|80840||88|34029	Bacteria|Proteobacteria|Betaproteobacteria|Burkholderiales||Leptothrix|Leptothrix cholodnii	0.04191
(1144335/1144334)	species	2|1224|1236|72274|135621|286|(1144335/1144334)	Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Pseudomonadaceae|Pseudomonas|(Pseudomonas sp. GM67/Pseudomonas sp. GM60)	2.4000

Calling motus profile -s test1_single.fastq -n test7 -C precision produces:

...
@@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
2	superkingdom	2	Bacteria	100.0
...
28221	class	2|1224|28221	Bacteria|Proteobacteria|Deltaproteobacteria	2.20702
...
34029	species	2|1224|28216|80840||88|34029	Bacteria|Proteobacteria|Betaproteobacteria|Burkholderiales||Leptothrix|Leptothrix cholodnii	0.04191

Calling motus profile -s test1_single.fastq -n test7 -C recall produces:

...
@@TAXID	RANK	TAXPATH	TAXPATHSN	PERCENTAGE
2	superkingdom	2	Bacteria	100.0
...
28221	class	2|1224|28221	Bacteria|Proteobacteria|Deltaproteobacteria	2.20702
...
34029	species	2|1224|28216|80840||88|34029	Bacteria|Proteobacteria|Betaproteobacteria|Burkholderiales||Leptothrix|Leptothrix cholodnii	0.04191
1144335	species	2|1224|1236|72274|135621|286|(1144335/1144334)	Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Pseudomonadaceae|Pseudomonas|(Pseudomonas sp. GM67/Pseudomonas sp. GM60)	1.2000
1144334	species	2|1224|1236|72274|135621|286|(1144335/1144334)	Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Pseudomonadaceae|Pseudomonas|(Pseudomonas sp. GM67/Pseudomonas sp. GM60)	1.2000
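
The effect of the three methods on a single discrepant species can be summarized in a few lines. This is an illustrative sketch, not the mOTUs implementation: the function name resolve_discrepancy is hypothetical, and the equal split in "recall" is inferred from the example above, where 2.4000 becomes 1.2000 for each of the two ids.

```python
from typing import List, Tuple


def resolve_discrepancy(tax_ids: List[str], percentage: float, mode: str) -> List[Tuple[str, float]]:
    """Return the (TAXID, PERCENTAGE) rows written for one mOTU that
    spans several NCBI species, under the given -C method."""
    if mode == "parenthesis":
        # Keep one row whose id lists all species, e.g. "(1144335/1144334)".
        return [("(" + "/".join(tax_ids) + ")", percentage)]
    if mode == "precision":
        # The discrepancy is deleted: no species row is written.
        return []
    if mode == "recall":
        # Assumption: the abundance is split equally among the species.
        share = percentage / len(tax_ids)
        return [(tax_id, share) for tax_id in tax_ids]
    raise ValueError(f"unknown CAMI mode: {mode!r}")


for mode in ("parenthesis", "precision", "recall"):
    print(mode, resolve_discrepancy(["1144335", "1144334"], 2.4, mode))
```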

BIOM format [-B]

You can print the result in BIOM format version 1.0 with -B. Calling motus profile -s test1_single.fastq -n test8 -B will produce a JSON file:

{
    "id": "test8",
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "motus v2.0.0",
    "date": "2018-06-13T14:55:00",
    "rows":[
            {"id":"ref_mOTU_v2_0001", "metadata":{"name":"Kandleria vitulina",
                                       "NCBI_id":"1630"}},
            {"id":"ref_mOTU_v2_0002", "metadata":{"name":"Methyloversatilis universalis",
                                       "NCBI_id":"378211"}},

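Because the output is plain JSON, the row annotations can be read with a standard JSON parser. The sketch below assumes only the fields visible in the excerpt above ("rows", "id", "metadata", "name", "NCBI_id"); the fragment is a trimmed copy of that excerpt, closed by hand so that it parses, and is not the complete file. The function name row_annotations is hypothetical.

```python
import json
from typing import Dict, Tuple

# Trimmed copy of the excerpt above, closed by hand so that it is valid JSON.
fragment = """
{
    "id": "test8",
    "rows": [
        {"id": "ref_mOTU_v2_0001",
         "metadata": {"name": "Kandleria vitulina", "NCBI_id": "1630"}},
        {"id": "ref_mOTU_v2_0002",
         "metadata": {"name": "Methyloversatilis universalis", "NCBI_id": "378211"}}
    ]
}
"""


def row_annotations(biom_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each row id to its (name, NCBI id) metadata."""
    doc = json.loads(biom_text)
    return {
        row["id"]: (row["metadata"]["name"], row["metadata"]["NCBI_id"])
        for row in doc["rows"]
    }


for motu_id, (name, ncbi_id) in row_annotations(fragment).items():
    print(motu_id, name, ncbi_id)
```
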
Print full species name [-u]

With -u you can print the full name of the mOTUs. For example, the result of motus profile -s test1_single.fastq -n test9 -u is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 -u | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test9 -u
#mOTU	consensus_taxonomy	test9
...
ref_mOTU_v2_0033	Streptococcus mitis	0.0304056010
ref_mOTU_v2_0034	Escherichia albertii	0.0000394910
ref_mOTU_v2_0035	Escherichia sp. [C KTE11/KTE52/KTE96/KTE159/TW09308]	0.0037182409

Note that the profile now has three columns, the second being the full name of the species. The result for mOTU 35 (ref_mOTU_v2_0035) without -u is:

Escherichia sp. [ref_mOTU_v2_0035]	0.0037182409

For visualization purposes, the default output prints a shorter version of the full name. In NCBI these five genomes (KTE11/KTE52/KTE96/KTE159/TW09308) are classified as Escherichia sp. (only genus level information). mOTUs clusters these five genomes together and recognizes that they belong to the same species.
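
The relation between the two forms can be sketched as follows. This is an illustration inferred from the single example above, not the rule mOTUs applies internally: it drops a trailing bracketed suffix from the full name and appends the mOTU id in brackets. The function name short_label is hypothetical.

```python
import re


def short_label(motu_id: str, full_name: str) -> str:
    """Build the default-style label from the -u columns (inferred rule)."""
    # Drop a trailing "[...]" suffix such as "[C KTE11/KTE52/...]".
    base = re.sub(r"\s*\[[^\]]*\]\s*$", "", full_name)
    return f"{base} [{motu_id}]"


print(short_label(
    "ref_mOTU_v2_0035",
    "Escherichia sp. [C KTE11/KTE52/KTE96/KTE159/TW09308]",
))
```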

Print only ref_mOTUs [-e]

With -e you can print only the ref_mOTUs; the abundances of all meta_mOTUs are added to the -1 row. For example, the result of motus profile -s test1_single.fastq -n test10 -e is:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 -e | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test10 -e
#consensus_taxonomy	test10
Kandleria vitulina [ref_mOTU_v2_0001]	0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002]	0.0000000000
...
Thermoproteus uzoniensis [ref_mOTU_v2_5304]	0.0000000000
Paenibacillus sp. [ref_mOTU_v2_5305]	0.0030541740
-1	0.5016916385
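
The bookkeeping behind -e can be sketched on an already parsed profile. This is an illustration, not the mOTUs code: the function name keep_ref_motus is hypothetical, meta_mOTU rows are recognized by the substring "[meta_mOTU_" in the label (an assumption), and the input values are invented for the demonstration.

```python
from typing import Dict


def keep_ref_motus(profile: Dict[str, float]) -> Dict[str, float]:
    """Move the abundance of every meta_mOTU row into the "-1" row."""
    collapsed: Dict[str, float] = {}
    unassigned = profile.get("-1", 0.0)
    for label, value in profile.items():
        if label == "-1":
            continue
        if "[meta_mOTU_" in label:
            unassigned += value  # folded into -1
        else:
            collapsed[label] = value
    collapsed["-1"] = unassigned
    return collapsed


# Hypothetical values, chosen only to make the arithmetic easy to follow.
toy_profile = {
    "Kandleria vitulina [ref_mOTU_v2_0001]": 0.5,
    "Hypothetical species [meta_mOTU_v2_9999]": 0.25,
    "-1": 0.25,
}
print(keep_ref_motus(toy_profile))
```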

Print all taxonomy levels together [-A]

With -A you can print all taxonomic levels together. It produces the same result as calling -q at each of the 7 taxonomic levels. For example, the result of motus profile -s test1_single.fastq -n test11 -A is:

# git tag version 2.5.0 |  motus version 2.5.0 | map_tax 2.5.0 | gene database: nr2.5.0 | calc_mgc 2.5.0 -y insert.scaled_counts -l 75 | calc_motu 2.5.0 -k mOTU -g 3 -A | taxonomy: ref_mOTU_2.5.0 meta_mOTU_2.5.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test11 -A
#mOTUs2_clade	test11
k__Bacteria	1.0000000000
k__Bacteria|p__Proteobacteria	0.0783503519
k__Bacteria|p__Firmicutes	0.6122416678
k__Bacteria|p__Thermodesulfobacteria	0.0061668664
k__Bacteria|p__Actinobacteria	0.2441684405
...
k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Kandleria|s__Kandleria vitulina [ref_mOTU_v25_04327]	0.0763622813
k__Bacteria|p__Thermodesulfobacteria|c__Thermodesulfobacteria|o__Thermodesulfobacteriales|f__Thermodesulfobacteriaceae|g__Thermodesulfobacterium|s__Thermodesulfobacterium commune [ref_mOTU_v25_05094]	0.0061668664
k__Bacteria|p__Firmicutes|c__Firmicutes class incertae sedis|o__Firmicutes order incertae sedis|f__Firmicutes fam. incertae sedis|g__Firmicutes gen. incertae sedis|s__Firmicutes species incertae sedis [meta_mOTU_v25_13597]	0.1128345600
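
Since all levels share one file, a common first step is to pull out the rows of a single level. A minimal sketch follows, assuming the level of a row is given by the prefix of the last element of its path (k, p, c, o, f, g, s, as in the rows above); the function names clade_rank and rows_at_rank are hypothetical.

```python
from typing import Dict


def clade_rank(clade: str) -> str:
    """Return the rank prefix of the deepest element, e.g. "p" for
    "k__Bacteria|p__Firmicutes"."""
    return clade.split("|")[-1].split("__", 1)[0]


def rows_at_rank(rows: Dict[str, float], rank: str) -> Dict[str, float]:
    """Keep only the rows whose deepest element has the given prefix."""
    return {clade: value for clade, value in rows.items() if clade_rank(clade) == rank}


# A few rows copied from the output above.
rows = {
    "k__Bacteria": 1.0000000000,
    "k__Bacteria|p__Proteobacteria": 0.0783503519,
    "k__Bacteria|p__Firmicutes": 0.6122416678,
    "k__Bacteria|p__Thermodesulfobacteria": 0.0061668664,
    "k__Bacteria|p__Actinobacteria": 0.2441684405,
}
for clade, value in rows_at_rank(rows, "p").items():
    print(clade, value)
```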