This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

1.1. Analysis Pipeline (FastDTLmapper)

moshi edited this page Oct 28, 2021 · 4 revisions

Analysis Pipeline

This page explains the FastDTLmapper analysis pipeline in detail.

1. Grouping ortholog sequences using OrthoFinder

Input:

Species genomic protein CDS fasta files
(00_user_data/fasta/*.fa)

Output:

OrthoFinder ortholog group results
(01_orthofinder/*)

Method:

Group ortholog sequences using OrthoFinder.

Parameter(OrthoFinder):

MCL inflation = 1.5 (default)
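As a rough illustration of this step, the sketch below assembles an OrthoFinder command line with the inflation value listed above. The function name `build_orthofinder_cmd` is hypothetical, and the sketch assumes the `orthofinder` executable is on the PATH; it only builds the argument list and does not reproduce how FastDTLmapper itself calls OrthoFinder or where it places the results.

```python
def build_orthofinder_cmd(fasta_dir: str, inflation: float = 1.5) -> list[str]:
    """Return an argument list for running OrthoFinder on a fasta directory.

    -f : directory containing one protein fasta file per species
    -I : MCL inflation parameter (1.5 is the value listed in this step)
    """
    return ["orthofinder", "-f", fasta_dir, "-I", str(inflation)]


# Hypothetical usage with the input directory named in this step.
cmd = build_orthofinder_cmd("00_user_data/fasta")
print(" ".join(cmd))
# To actually run it (requires OrthoFinder to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```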

2. Align the sequences of each OG (Ortholog Group) using mafft

Input:

Each OG fasta files
(02_dtl_reconciliation/OGXXXXXXX/OGXXXXXXX.fa)

Output:

Each OG aligned fasta files
(02_dtl_reconciliation/OGXXXXXXX/OGXXXXXXX_aln.fa)

Method:

Align the sequences of each OG using mafft.

Parameter(mafft):

--auto --anysymbol
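The alignment step could be driven from Python roughly as follows. This is a minimal sketch, not the tool's own code: the helper names (`aligned_path`, `build_mafft_cmd`, `run_mafft`) are hypothetical, and it assumes `mafft` is on the PATH. mafft writes the alignment to standard output, so the sketch redirects stdout to the `_aln.fa` file.

```python
import subprocess


def aligned_path(og_fasta: str) -> str:
    """Derive the aligned file name (OGXXXXXXX.fa -> OGXXXXXXX_aln.fa)."""
    if not og_fasta.endswith(".fa"):
        raise ValueError(f"expected a .fa file, got: {og_fasta}")
    return og_fasta[: -len(".fa")] + "_aln.fa"


def build_mafft_cmd(og_fasta: str) -> list[str]:
    """Return the mafft argument list with the parameters listed in this step."""
    return ["mafft", "--auto", "--anysymbol", og_fasta]


def run_mafft(og_fasta: str) -> str:
    """Align one OG fasta file and return the path of the aligned file."""
    out_path = aligned_path(og_fasta)
    with open(out_path, "w") as out:
        # mafft prints the alignment to stdout; capture it in the output file.
        subprocess.run(build_mafft_cmd(og_fasta), stdout=out, check=True)
    return out_path


print(aligned_path("02_dtl_reconciliation/OGXXXXXXX/OGXXXXXXX.fa"))
```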

3. Trim each OG alignment using trimal

Input:

Each OG aligned fasta files
(02_dtl_reconciliation/OGXXXXXXX/OGXXXXXXX_aln.fa)

Output:

Each OG trim-aligned fasta files
(02_dtl_reconciliation/OGXXXXXXX/OGXXXXXXX_aln_trim.fa)

Method:

Trim the aligned sequences of each OG using trimal.
If trimming produces sequences that consist only of gaps, use the untrimmed alignment file in the next step instead.

Parameter(trimal):

-automated1
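The fallback rule above can be sketched as a small check on the trimmed alignment. The function names are hypothetical, and the sketch assumes that "all gap" means at least one trimmed sequence contains nothing but `-` characters (an empty trimmed file is treated the same way); the actual criterion used by FastDTLmapper may differ.

```python
def build_trimal_cmd(aln_fasta: str, trim_fasta: str) -> list[str]:
    """Return the trimal argument list with the parameter listed in this step."""
    return ["trimal", "-in", aln_fasta, "-out", trim_fasta, "-automated1"]


def parse_fasta(text: str) -> dict[str, str]:
    """Parse fasta-formatted text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def has_all_gap_sequence(records: dict[str, str]) -> bool:
    """True if there are no records or any sequence holds only gap characters."""
    return not records or any(set(seq) <= {"-"} for seq in records.values())


def choose_alignment(aln_path: str, trim_path: str, trimmed_text: str) -> str:
    """Pick the file for the next step: trimmed if usable, untrimmed otherwise."""
    if has_all_gap_sequence(parse_fasta(trimmed_text)):
        return aln_path
    return trim_path


# Hypothetical trimmed alignment in which gene2 became gap-only.
trimmed = ">gene1\nMK-L\n>gene2\n----\n"
print(choose_alignment("OGXXXXXXX_aln.fa", "OGXXXXXXX_aln_trim.fa", trimmed))
```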

4. Reconstruct each OG gene tree using IQ-TREE

Input:

Each OG trim-aligned fasta files
(02_dtl_reconciliation/OGXXXXXXX/OGXXXXXXX_aln_trim.fa)

Output:

Each OG gene tree reconstruction results
(02_dtl_reconciliation/OGXXXXXXX/iqtree/*)

Method:

Reconstruct each OG gene tree using IQ-TREE.

If OG gene number >= 4:

Generate ultra-fast bootstrap gene trees with 1000 replicates.

If OG gene number == 3:

IQ-TREE cannot build a tree from only 3 genes.
Instead, make a simple unrooted 3-gene tree such as "(gene1:0.1, gene2:0.1, gene3:0.1);"

If OG gene number < 3:

Do nothing.

Parameter(IQ-TREE):

-m TEST -mset JTT,WAG,LG
--ufboot 1000 --boot-trees --wbtl
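The branching on gene number described in this step might look like the sketch below. All function names are hypothetical. The IQ-TREE options are the ones listed above; the executable name `iqtree` and the use of `-s` and `--prefix` for the input alignment and output prefix are assumptions about an IQ-TREE 2 installation.

```python
def three_gene_newick(genes: list[str], branch_length: float = 0.1) -> str:
    """Build the simple unrooted star tree used when an OG has exactly 3 genes."""
    if len(genes) != 3:
        raise ValueError("exactly three gene names are required")
    return "(" + ", ".join(f"{g}:{branch_length}" for g in genes) + ");"


def build_iqtree_cmd(trim_fasta: str, prefix: str) -> list[str]:
    """Return an IQ-TREE argument list with the parameters listed in this step."""
    return [
        "iqtree", "-s", trim_fasta,
        "-m", "TEST", "-mset", "JTT,WAG,LG",
        "--ufboot", "1000", "--boot-trees", "--wbtl",
        "--prefix", prefix,
    ]


def plan_gene_tree(genes: list[str], trim_fasta: str, prefix: str):
    """Decide what to do for one OG based on its gene number."""
    if len(genes) >= 4:
        return "iqtree", build_iqtree_cmd(trim_fasta, prefix)
    if len(genes) == 3:
        return "newick", three_gene_newick(genes)
    return "skip", None  # fewer than 3 genes: do nothing


print(plan_gene_tree(["gene1", "gene2", "gene3"], "OG_aln_trim.fa", "iqtree/OG"))
# -> ('newick', '(gene1:0.1, gene2:0.1, gene3:0.1);')
```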

5. Correct each OG gene tree multifurcation using Treerecs

Input:

Each OG ultra-fast bootstrap gene tree file
(02_dtl_reconciliation/OGXXXXXXX/iqtree/OGXXXXXXX.ufboot)
Species tree file
(00_user_data/tree/ultrametric_nodeid_tree.nwk)

Output:

Each OG multifurcation corrected bootstrap gene tree file
(02_dtl_reconciliation/OGXXXXXXX/treerecs/OGXXXXXXX_multifurcate.ufboot_recs.nwk)

Method:

Correct multifurcations in each OG's bootstrap trees using Treerecs.
The number of corrected bootstrap trees is limited to 100 to reduce the large computational cost of AnGST DTL reconciliation in the next step.

ℹ️ TIPS
IQ-TREE randomly bifurcates the topology of identical sequences, which inflates the calculated DTL cost.
Treerecs correction is useful for minimizing the DTL cost at these multifurcations.

Parameter(Treerecs):

--dupcost 2 --losscost 1 (default)
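The 100-tree limit mentioned in this step could be applied with a helper like the one below. This is only a sketch under stated assumptions: the function name is hypothetical, the bootstrap file is assumed to hold one Newick tree per line, and keeping the first 100 trees is an assumption, since this page does not say which trees are kept or at which point the limit is applied.

```python
def limit_bootstrap_trees(ufboot_text: str, max_trees: int = 100) -> list[str]:
    """Return at most max_trees Newick strings from bootstrap tree text.

    Assumes one Newick tree per non-empty line and keeps the leading trees.
    """
    trees = [line.strip() for line in ufboot_text.splitlines() if line.strip()]
    return trees[:max_trees]


# Hypothetical input: 1000 identical toy bootstrap trees.
toy_ufboot = "\n".join(["((a:1,b:1):1,c:1,d:1);"] * 1000)
print(len(limit_bootstrap_trees(toy_ufboot)))
```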

Treerecs multifurcated gene tree correction example

(Figure: treerecs_resolve_multifurcation_example.png)

6. DTL reconciliation of species tree & each OG gene tree using AnGST

Input:

Each OG multifurcation corrected bootstrap gene tree file
(02_dtl_reconciliation/OGXXXXXXX/treerecs/OGXXXXXXX_multifurcate.ufboot_recs.nwk)
Species tree file
(00_user_data/tree/ultrametric_nodeid_tree.nwk)

Output:

Each OG DTL reconciliation result files
(02_dtl_reconciliation/OGXXXXXXX/angst/*)

Method:

DTL reconciliation of species tree & each OG bootstrap gene trees using AnGST.

ℹ️ TIPS
AnGST estimates DTL events when the number of genes is three or more.
DTL events when the number of genes is less than three are estimated in the next data aggregation step.

Parameter(AnGST):

dup_cost = 2, los_cost = 1, trn_cost = 3 (default)
timetree option = off (default)

7. Aggregate and map genome-wide DTL reconciliation result

Input:

Each OG DTL reconciliation result files
(02_dtl_reconciliation/OGXXXXXXX/angst/*)

Output:

Aggregated and mapped genome-wide DTL event result
(03_aggregate_map_result/*)

Method:

Aggregate and map genome-wide DTL reconciliation result from AnGST results.

If OG gene number >= 3:

Extract DTL event information from AnGST result file.

If OG gene number == 2:

Estimate DTL events with a self-implemented, AnGST-like method.
The DTL cost of each of the following three scenarios is calculated, and the scenario with the lowest DTL cost is selected.

Case1: The gene is newly born in one leaf node, and then a duplication occurs.
Case2: The gene is newly born in one internal node, and then speciation and loss occur.
Case3: The gene is newly born in one leaf node, and then a transfer to another leaf node occurs.
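The lowest-cost selection among these cases can be illustrated with the sketch below. It is not FastDTLmapper's implementation: the `Scenario` class and `select_scenario` function are hypothetical, the cost is assumed to be a weighted sum of event counts using the AnGST costs from step 6 (duplication 2, loss 1, transfer 3), and the event counts in the example are made-up inputs, since this page does not describe how they are derived from the species tree.

```python
from dataclasses import dataclass

# Event costs taken from the AnGST parameters listed in step 6.
DUP_COST, LOS_COST, TRN_COST = 2, 1, 3


@dataclass(frozen=True)
class Scenario:
    """One candidate explanation for a two-gene OG, with its event counts."""
    name: str
    dup: int = 0
    los: int = 0
    trn: int = 0

    @property
    def cost(self) -> int:
        # Assumed cost model: weighted sum of duplication, loss, transfer counts.
        return self.dup * DUP_COST + self.los * LOS_COST + self.trn * TRN_COST


def select_scenario(scenarios: list[Scenario]) -> Scenario:
    """Return the lowest-cost scenario (the first one wins on a tie)."""
    return min(scenarios, key=lambda s: s.cost)


# Hypothetical example: two genes in different species, where explaining them
# by vertical descent (Case2) would require 4 losses along the species tree.
candidates = [
    Scenario("Case2", los=4),  # cost 4
    Scenario("Case3", trn=1),  # cost 3
]
best = select_scenario(candidates)
print(best.name, best.cost)
# -> Case3 3
```

Under this cost model, Case2 is preferred only while the number of required losses stays below the transfer cost, which is why distantly related species tend to favor the transfer scenario.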

If OG gene number == 1:

Treat the target gene as newly born in the node of the gene's species.