jsuurvali/abce152

Plant ABCE proteins and their copy numbers

This repository covers phylogenetic and regression analysis of 152 ABCE proteins from 76 plant species.

The full study is published in Frontiers in Genetics at https://doi.org/10.3389/fgene.2024.1408665.

In addition to the analysis scripts, the repository contains the source data (sequences, alignments and annotations) and the output trees.

Source data

  • fasta/152 CDS FULL_correct.fas
    • unaligned CDS sequences
  • fasta/152_pepseq_MUSCLE.fa
    • protein sequences, pre-aligned with MUSCLE
  • ABCE_infosheet.tsv
    • genome parameters and NCBI taxID for each species in the study
  • annotation_v2.tsv
    • annotations for each sequence used to construct the phylogenies; can be loaded into FigTree together with the output trees from the phylogenetic analysis
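As a quick consistency check on the source data, the FASTA files can be parsed and the protein alignment tested for equal sequence lengths. The following is a minimal Python sketch, not part of the repository: `parse_fasta`, `read_fasta` and `is_aligned` are hypothetical helper names, and it assumes plain FASTA with '>' header lines and possibly wrapped sequence lines.

```python
from pathlib import Path


def parse_fasta(lines):
    """Parse FASTA lines into a dict of {header: sequence}.

    Sequence lines may be wrapped; they are joined per record.
    Duplicate headers would overwrite earlier records.
    """
    records = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def read_fasta(path):
    # Path() copes with file names containing spaces,
    # e.g. "fasta/152 CDS FULL_correct.fas"
    return parse_fasta(Path(path).read_text().splitlines())


def is_aligned(records):
    """True if there is at least one sequence and all have the same length."""
    lengths = {len(seq) for seq in records.values()}
    return len(records) > 0 and len(lengths) == 1
```

For example, `read_fasta("fasta/152_pepseq_MUSCLE.fa")` would presumably return 152 records for which `is_aligned` is true, while the unaligned CDS file is not expected to pass that check.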

Analysis scripts

  • prepare_alignments.sh
    • aligns CDS sequences with MAFFT
    • uses trimAl to exclude sites with more than 10% gaps
  • abce_iqtree.sh
    • creates sets of 20 CDS trees and 20 protein (pep) trees by running IQ-TREE with random seed values from 1 to 20 and otherwise identical parameters.
  • abce_models_needskey.R
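The gap filter in prepare_alignments.sh is done with trimAl; the rule itself (drop every alignment column in which more than 10% of the sequences have a gap) can be illustrated in a few lines of Python. This is a sketch of the rule only, not a reimplementation of trimAl or of the exact options used in the script; the function name `filter_gappy_columns` is hypothetical, and '-' is assumed to be the only gap character.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.10, gap_chars="-"):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict {name: aligned sequence}, all of equal length.
    Returns a new dict containing only the retained columns.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    n_seqs = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")

    keep = []
    for col in range(length):
        gaps = sum(1 for s in seqs if s[col] in gap_chars)
        # "> 10% gaps" is removed, so a column with exactly 10% gaps is kept
        if gaps / n_seqs <= max_gap_fraction:
            keep.append(col)

    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

Because the rule is applied column by column, it does not by itself preserve codon boundaries in a CDS alignment.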

Output trees

  • trees/CDStrees.originals.seed1-seed20.nwk and trees/peptrees.originals.seed1-seed20.nwk
    • Newick-formatted output trees from running abce_iqtree.sh, concatenated into two files of 20 trees each (one for proteins, one for CDS). Trees are in the order of the seed values used (1-20).
    • Note that in FigTree the trees can be supplemented with annotations from the file annotation_v2.tsv
  • trees/CDStrees.modified.figtree.nex and trees/peptrees.modified.figtree.nex
    • The same 20+20 independently obtained trees, saved with annotations as FigTree-openable NEXUS (.nex) files, with the following modifications:
      • Trees are ordered by likelihood (highest to lowest).
      • All trees are rooted on the phylum Chlorophyta.
      • Branches supported by less than 50% of bootstrap replicates are collapsed.
      • Major phylogenetic clades are highlighted as in the figures of the manuscript.
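Because the .originals files hold 20 concatenated Newick trees in seed order, they can be split back into one tree per seed. The following is a minimal Python sketch, assuming each tree is terminated by ';' and that any quoted labels use single quotes; `split_newick`, `label_by_seed` and `load_trees` are hypothetical names, not part of the repository.

```python
from pathlib import Path


def split_newick(text):
    """Split concatenated Newick trees into a list of tree strings.

    Each tree is assumed to end with ';'. Semicolons inside single-quoted
    labels are not treated as terminators; line breaks outside quotes are dropped.
    """
    trees, current, in_quote = [], [], False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        if ch in "\r\n" and not in_quote:
            continue
        current.append(ch)
        if ch == ";" and not in_quote:
            trees.append("".join(current).strip())
            current = []
    return trees


def label_by_seed(trees, first_seed=1):
    """Map seed value -> tree, assuming trees are stored in seed order (1-20)."""
    return {first_seed + i: tree for i, tree in enumerate(trees)}


def load_trees(path):
    """Read one of the concatenated .nwk files and split it into trees."""
    return split_newick(Path(path).read_text())
```

For example, `label_by_seed(load_trees("trees/CDStrees.originals.seed1-seed20.nwk"))[1]` would give the CDS tree from seed 1. This seed mapping does not apply to the .modified.figtree.nex files, which are reordered by likelihood.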

Software versions used for the manuscript
