Home
Welcome to the GenEra wiki! Here we detail how to install and run GenEra and how to explore its additional options for sensitive inference of gene age.
GenEra is an easy-to-use and highly customizable command-line tool that estimates gene-family founder events (i.e., the age of the last common ancestor of protein-coding gene families) through a reimplementation of genomic phylostratigraphy (Domazet-Lošo et al., 2007). GenEra takes advantage of DIAMOND's speed and sensitivity to search for homologous genes throughout the entire NR database, and combines these results with the NCBI Taxonomy to assign an origination date to each gene and gene family in a query species. GenEra can also incorporate protein data from external sources to enrich the analysis. It can search for proteins within nucleotide data (i.e., genome/transcriptome assemblies) using MMseqs2 to improve the classification of orphan genes, and it calculates a taxonomic representativeness score to assess the reliability of assigning a gene to a specific age. Additionally, GenEra can calculate homology detection failure probabilities using abSENSE to help distinguish fast-evolving genes from high-confidence gene-family founder events.
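The core phylostratigraphic idea described above can be illustrated with a minimal Python sketch. It assumes that each lineage is a list of NCBI taxonomy IDs ordered from the root to the tip, that the homolog hits have already been filtered (e.g., by an E-value threshold), and that the representativeness score is the fraction of taxonomic levels between the assigned age and the query species that contain at least one homolog. That score definition is an assumption for illustration and may not match GenEra's exact formula; `assign_gene_age` is a hypothetical helper, not GenEra's code.

```python
from typing import List, Sequence, Tuple


def assign_gene_age(query_lineage: Sequence[int],
                    hit_lineages: List[Sequence[int]]) -> Tuple[int, float]:
    """Return (age taxid, representativeness score) for one query gene.

    Hypothetical helper for illustration, not GenEra's implementation.
    query_lineage: taxids ordered from the root down to the query species.
    hit_lineages:  one root-to-tip lineage per significant homolog hit.
    """
    # Depth of each node on the query lineage (0 = root).
    depth = {taxid: i for i, taxid in enumerate(query_lineage)}

    # For each hit, the deepest shared node is the last common ancestor (LCA)
    # of the query species and the species carrying the homolog.
    lca_depths = set()
    for lineage in hit_lineages:
        shared = [depth[t] for t in lineage if t in depth]
        if shared:
            lca_depths.add(max(shared))

    # The gene is always present in the query species itself.
    species_depth = len(query_lineage) - 1
    lca_depths.add(species_depth)

    # Phylostratigraphy: the gene age is the oldest (shallowest) LCA.
    oldest = min(lca_depths)

    # Assumed score: share of levels from the assigned age down to the query
    # species that contain at least one homolog.
    levels = range(oldest, species_depth + 1)
    score = sum(1 for d in levels if d in lca_depths) / len(levels)
    return query_lineage[oldest], score


# Toy example with truncated lineages (real NCBI lineages are much longer).
human = [1, 2759, 33208, 7742, 9606]   # root, Eukaryota, Metazoa, Vertebrata, H. sapiens
mouse = [1, 2759, 33208, 7742, 10090]  # ..., Vertebrata, Mus musculus
fly = [1, 2759, 33208, 7227]           # ..., Metazoa, Drosophila melanogaster
print(assign_gene_age(human, [mouse, fly]))  # (33208, 1.0)
```

With only the fly hit, the gene would still be assigned to Metazoa, but the vertebrate level would have no homolog, so the assumed score drops to 2/3.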
- As of v1.3.0, users can detect gene ages at taxonomic levels below species, such as between different strains or subspecies that do not have a Taxonomy ID in the NCBI Taxonomy.
- As of v1.2.0, GenEra was adapted to run completely offline!
- As of v1.1.0, users can use Foldseek to search protein structure predictions against the AlphaFold DB for fast and sensitive structural alignments. Alternatively, users can reassess gene ages by running JackHMMER on top of DIAMOND (be aware that this additional step significantly slows down the analysis).
GenEra requires the following software dependencies:
- DIAMOND v2.0.0 or higher
- Foldseek v3.915ef7d or higher
- NCBItax2lin
- MCL
- MMseqs2 (optional for protein-against-nucleotide sequence search)
- abSENSE (optional to calculate homology detection failure probabilities)
- NumPy and SciPy (needed to run abSENSE in step 4)
- R alongside the libraries optparse, Bio3D, Tidyverse, and SeqinR (optional for JackHMMER reassessment)
- Phytools (optional for infraspecies-level analyses)
- OrthoFinder (optional for the automatic inference of evolutionary relationships between strains/subspecies)
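Before a run, it can help to confirm that the command-line dependencies are reachable on `PATH`. The minimal Python sketch below uses the standard-library `shutil.which`. The executable names (`diamond`, `foldseek`, `ncbitax2lin`, `mcl`, `mmseqs`, `Rscript`, `orthofinder`) are the names these tools usually install under and are assumptions, so adjust them to your installation. The sketch does not check versions (e.g., DIAMOND v2.0.0 or higher), R libraries, NumPy/SciPy, or abSENSE.

```python
import shutil
from typing import Dict, List

# Usual executable names (assumed; may differ depending on how tools were installed).
REQUIRED: List[str] = ["diamond", "foldseek", "ncbitax2lin", "mcl"]
OPTIONAL: Dict[str, str] = {
    "mmseqs": "protein-against-nucleotide search",
    "Rscript": "JackHMMER reassessment",
    "orthofinder": "inferring relationships between strains/subspecies",
}


def missing_tools(names: List[str]) -> List[str]:
    """Return the names in `names` that are not found as executables on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED):
        print(f"Required tool not found on PATH: {tool}")
    for tool in missing_tools(list(OPTIONAL)):
        print(f"Optional tool not found: {tool} (used for {OPTIONAL[tool]})")
```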
Additionally, GenEra requires access to the taxonomy dump from the NCBI and either a locally installed NR database for DIAMOND or a locally installed AlphaFold database for Foldseek.
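The NCBI taxonomy dump includes `nodes.dmp`, in which each line lists a taxonomy ID, its parent taxonomy ID and its rank, with fields separated by `\t|\t`. The minimal Python sketch below shows how a root-to-tip lineage can be recovered from that file. The function names are hypothetical and the taxonomy IDs in the example are made up. GenEra lists NCBItax2lin as a dependency, so its own taxonomy handling may differ from this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def parse_nodes(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map taxid -> (parent taxid, rank) from nodes.dmp lines."""
    nodes: Dict[int, Tuple[int, str]] = {}
    for line in lines:
        # Strip trailing tabs, pipes and the newline, then split on the field separator.
        fields = line.rstrip("\t|\n").split("\t|\t")
        taxid, parent, rank = int(fields[0]), int(fields[1]), fields[2].strip()
        nodes[taxid] = (parent, rank)
    return nodes


def lineage(taxid: int, nodes: Dict[int, Tuple[int, str]]) -> List[int]:
    """Follow parent links up to the root (taxid 1) and return root-to-tip order."""
    path = [taxid]
    while taxid != 1:  # the root is its own parent in nodes.dmp
        taxid = nodes[taxid][0]
        path.append(taxid)
    return path[::-1]


# Made-up taxonomy IDs in the nodes.dmp layout (real files have more columns).
sample = [
    "1\t|\t1\t|\tno rank\t|\t\t|\n",
    "10\t|\t1\t|\tkingdom\t|\t\t|\n",
    "20\t|\t10\t|\tgenus\t|\t\t|\n",
    "30\t|\t20\t|\tspecies\t|\t\t|\n",
]
nodes = parse_nodes(sample)
print(lineage(30, nodes))  # [1, 10, 20, 30]
```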
GenEra has now been published:
Barrera-Redondo, J., Lotharukpong, J.S., Drost, H.G., Coelho, S.M. (2023). Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biology, 24, 54. https://doi.org/10.1186/s13059-023-02895-z
GenEra makes use of several dependencies that should also be cited if they are used within the pipeline. Please see the Citations page.
Manley, B.F., Lotharukpong, J.S., Barrera-Redondo, J., Llewellyn, T., Yildirir, G., Sperschneider, J., Corradi, N., Paszkowski, U., Miska, E.A., Dallaire, A. (2023). A highly contiguous genome assembly reveals sources of genomic novelty in the symbiotic fungus Rhizophagus irregularis. G3 Genes|Genomes|Genetics, 13(6), jkad077. https://doi.org/10.1093/g3journal/jkad077
Kaminski, K., Ludwiczak, J., Alva, V., Dunin-Horkawicz, S. (2022). pLM-BLAST – distant homology detection based on direct comparison of sequence representations from protein language models. bioRxiv, 2022.11.24.517862. https://doi.org/10.1101/2022.11.24.517862
Piovani, L., Leite, D.J., Yañez Guerra, L.A., Simpson, F., Musser, J.M., Salvador-Martínez, I., Marlétaz, F., Jékely, G., Telford, M.J. (2023). Single-cell atlases of two lophotrochozoan larvae highlight their complex evolutionary histories. Science Advances, 9(31), eadg6034. https://doi.org/10.1126/sciadv.adg6034
Clark, J.W. (2023). Genome evolution in plants and the origins of innovation. New Phytologist, early view. https://doi.org/10.1111/nph.19242