Skip to content

Ancestral flowering plant genomes reconstruction

Notifications You must be signed in to change notification settings

jin-repo/RACCROCHE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RACCROCHE

N|Solid

Build Status

Overview

Given the phylogenetic relationships of several extant species, the reconstruction of their ancestral genomes at the gene and chromosome level is made difficult by the cycles of whole genome doubling followed by fractionation in plant lineages. Fractionation scrambles the gene adjacencies that enable existing reconstruction methods. We propose an alternative approach that postpones the selection of gene adjacencies for reconstructing small ancestral segments and instead accumulates a very large number of syntenically validated candidate adjacencies to produce long ancestral contigs through maximum weight matching. Likewise, we do not construct chromosomes by successively piecing together contigs into larger segments, but instead count all contig co-occurrences on the input genomes and cluster these, so that chromosomal assemblies of contigs all emerge naturally ordered at each ancestral node of the phylogeny. These strategies result in substantially more complete reconstructions than existing methods. We deploy a number of quality measures: contig lengths, continuity of contig structure on successive ancestors, coverage of the reconstruction on the input genomes, and rearrangement implications of the chromosomal structures obtained. The reconstructed ancestors can be functionally annotated and are visualized by painting the ancestral projections on the descendant genomes, and by highlighting syntenic ancestor-descendant relationships. Our methods can be applied to genomes drawn from a broad range of clades or orders.

There are four major modules including seven steps in the RACCROCHE pipeline. Scripts in the pipeline are organized according to the modules as depicted in the diagram of program architechture and file structure.

Module 1: construct gene families and list candidate adjacencies

  • Step 1: Pre-process gene families
  • Step 2: List generalized adjacencies
  • Step 3: List candidate adjacencies

Module 2: construct ancestral contigs by Maximum Weight Matching

  • Step 4: Construct contigs by maximum weight matching

Module 3: match contigs, cluster and sort ancestral chromosomes, paint extant genomes with ancestral synteny blocks

  • Step 5: Match synteny blocks between ancestral genome and extant genomes
  • Step 6: Cluster ancestral contigs into ancestral chromosomes based on contig co-occurrence
  • Step 7: Paint the extant genomes according to the ancestral chromosomes

More details can be found in the diagram of module 3 program architechture and file structure.

Module 4: compare ancestors with extant genomes by MCScanX

  • Step 8: Adapt MCScanX to match ancestral genomes with extant genomes
  • Step 9: Measures of Quality

See the manual for how to install and use the pipeline.

In addition to the pipeline, we also provide our project data (under the "project-monocots" directory) on six genomes of monocot orders, confirming the tetraploidization event “tau” in the stem lineage between the alismatids and the lilioids.

Authors Qiaoji Xu (QiaojiXu)
Lingling Jin (LinglingJin)
Chunfang Zheng
James H. Leeben-Mack
David Sankoff
Emails limqiaojixu@gmail.com
lingling.jin@cs.usask.ca
License BSD

Citations:

About

Ancestral flowering plant genomes reconstruction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages