An automatic web tool for phylogenetic inference of bilaterian orthogroups under purposeful taxonomic sampling

ORTHOSCOPE

Our web service is available at http://orthoscope.jp


Mode

mode


Case studies in Inoue and Satoh (submitted).

Query sequences from genes with known function.

| Actinopterygii | Vertebrata | Deuterostomia | Protostomia |
| --- | --- | --- | --- |
| PLCB1* | ALDH1A* | Brachyury | Brachyury |
| Queries | Queries | Queries | Queries |
| Result | Result | Result | Result |

Query sequence collection from an assembled database*

  1. Download the Coregonus lavaretus TSA file (GFIG00000000.1) from NCBI.
  2. Translate the raw sequences into amino acid and coding sequences using TransDecoder.
./TransDecoder.LongOrfs -t GFIG01.1.fsa_nt
  3. Make BLAST databases using BLAST+.
makeblastdb -in longest_orfs.pep -dbtype prot -parse_seqids
makeblastdb -in longest_orfs.cds -dbtype nucl -parse_seqids
  4. Run a BLASTP search against the amino acid database.
blastp -query query.txt -db longest_orfs.pep -num_alignments 10 -evalue 1e-12 -out 010_out.txt
  5. Retrieve the top-hit sequences from the coding sequence database using their sequence IDs.
blastdbcmd -db longest_orfs.cds -dbtype nucl -entry_batch queryIDs.txt -out 020_out.txt
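
The steps above do not show how queryIDs.txt is produced from the BLASTP result. One possible approach is sketched below. It assumes the search was rerun with tabular output (`-outfmt 6`), which differs from the command shown above, and that hits are listed best-first for each query, as BLAST normally reports them. The function name and the example IDs are hypothetical.

```python
def top_hit_ids(tabular_text):
    """Return the subject ID of the first-listed hit for each query.

    Assumes BLAST tabular output (-outfmt 6): tab-separated columns with
    the query ID in column 1 and the subject ID in column 2.
    """
    seen_queries = set()
    ids = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        if query_id in seen_queries:
            continue  # only the first (best) hit per query is kept
        seen_queries.add(query_id)
        if subject_id not in ids:
            ids.append(subject_id)  # avoid duplicate entries in the ID list
    return ids


# Hypothetical tabular output for two queries.
example = (
    "query1\torfA\t98.5\t200\n"
    "query1\torfB\t75.0\t190\n"
    "query2\torfC\t88.1\t310\n"
)
top_ids = top_hit_ids(example)
```

Writing the returned IDs one per line to queryIDs.txt gives a file in the form that `-entry_batch` expects.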


Focal group

analysis group


Upload files

Coding sequence

file format

Case 1: Query sequence is present in the ORTHOSCOPE database

registered sequence search

Case 2: Query sequence is not present in the ORTHOSCOPE database

unregistered sequence search


Species tree hypothesis

Our hypothetical species tree (newick) can be downloaded from here.

Phylogenetic relationships without references follow the NCBI Taxonomy Common Tree.

Newick files can be modified using TreeGraph2.

treegraph2


Sequence collection

sequence collection


Alignment

sequence alignment


Tree search

Dataset

codon mode


Rearrangement BS value threshold

branch rearrangement

NJ analysis is conducted using the R package APE (coding sequences) and FastME (amino acid sequences). Rearrangement analysis uses the method implemented in Notung.


Genome taxon sampling

Feasibility of completion

| Number of hits to report per genome | Number of species |
| --- | --- |
| 3 | <50 |
| 5 | <40 |
| 10 | <30 |
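
As a rough illustration, the table can be read as a lookup from the number of hits per genome to a ceiling on the number of species. The sketch below assumes that "<50" means "fewer than 50 species" and that only the three listed settings are covered. The names `SPECIES_LIMITS` and `is_feasible` are hypothetical and are not part of ORTHOSCOPE.

```python
# Limits transcribed from the table above: hits per genome -> species ceiling.
SPECIES_LIMITS = {3: 50, 5: 40, 10: 30}


def is_feasible(hits_per_genome, n_species):
    """Return True if the species count is below the listed ceiling."""
    if hits_per_genome not in SPECIES_LIMITS:
        raise ValueError("no limit listed for %r hits per genome" % hits_per_genome)
    return n_species < SPECIES_LIMITS[hits_per_genome]


# 35 species is within the listed range for 5 hits, but not for 10 hits.
ok_with_5 = is_feasible(5, 35)
ok_with_10 = is_feasible(10, 35)
```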

Tree estimation of orthogroup members using additional sequences

The script is written for macOS. Windows users will need to make some modifications. Example.

Installing Dependencies

Estimation of the small tree requires some dependencies to be installed and in the system path.

RAxML:

Available here: https://github.com/stamatak/standard-RAxML

Download the latest release and extract it. Cd into the extracted directory (e.g., standard-RAxML-8.2.12), compile the PThreads version, and copy the executable to a directory in your system path, e.g.:

cd standard-RAxML-8.2.12
make -f Makefile.SSE3.PTHREADS.gcc
cp raxmlHPC-PTHREADS-SSE3 ~/bin

Add the directory containing the executable to your PATH variable, e.g.:

export PATH=$PATH:~/bin/

MAFFT:

Available here: https://mafft.cbrc.jp/alignment/software/

After compilation, set your PATH following this site.

trimAl:

Available here: https://github.com/scapella/trimal

Cd into trimAl/source, type make, and copy the executable to a directory in your system path.

make
cp trimal ~/bin

PAL2NAL:

Available here: http://www.bork.embl.de/pal2nal/#Download

Change the permissions of the Perl script and copy it to a directory in your system path.

chmod 755 pal2nal.pl
cp pal2nal.pl ~/bin

APE in R:

R is available here. Installing R also installs Rscript automatically. The APE package can be installed from the R console.

install.packages("ape")
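
Before running the pipeline, it may help to confirm that each dependency is actually on the system path. The sketch below uses Python's standard `shutil.which`. The executable names are taken from the commands above, except `mafft`, which is assumed to be the name of the installed MAFFT command. `REQUIRED_TOOLS` and `missing_tools` are hypothetical names.

```python
import shutil

# Executable names as used in the installation steps above
# ("mafft" is assumed to be the installed MAFFT command name).
REQUIRED_TOOLS = [
    "raxmlHPC-PTHREADS-SSE3",
    "mafft",
    "trimal",
    "pal2nal.pl",
    "Rscript",
]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the system path."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

The `which` argument exists only so that the lookup can be replaced in tests. In normal use the default is sufficient.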

Tree estimation

  1. Select an appropriate outgroup and orthogroup members, and save them in the 010_candidates_nucl.txt file. The outgroup sequence should be placed at the top of the alignment. Additional sequences can be included.

query sequences

  2. Decompress the 100_2ndTree.tar.gz file.
  3. cd into the 100_2ndTree directory.
  4. Run the pipeline.
./100_estimate2ndTree.py
  5. The ML tree is saved automatically in 200_RAxMLtree_Exc3rd.pdf.

ML tree
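
Step 1 requires the outgroup sequence to come first. The sketch below shows one way to reorder a sequence file so that a chosen record sits at the top. It assumes that 010_candidates_nucl.txt is in FASTA format, which this README does not state, and the function name and sequence names are hypothetical.

```python
def outgroup_first(fasta_text, outgroup_name):
    """Return FASTA text with the named record moved to the top.

    A record matches when the first word after '>' equals outgroup_name.
    """
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])      # start a new record at each header
        elif records and line.strip():
            records[-1].append(line)    # sequence line of the current record

    def is_outgroup(record):
        words = record[0][1:].split()
        return bool(words) and words[0] == outgroup_name

    matches = [r for r in records if is_outgroup(r)]
    if len(matches) != 1:
        raise ValueError("expected exactly one record named %r" % outgroup_name)
    ordered = matches + [r for r in records if not is_outgroup(r)]
    return "\n".join("\n".join(r) for r in ordered) + "\n"


# Hypothetical three-sequence file with the outgroup in the middle.
example = ">geneA\nATGAAA\n>outgroupX some description\nATGCCC\nTTT\n>geneB\nATGGGG\n"
reordered = outgroup_first(example, "outgroupX")
```

The remaining records keep their original order, so only the position of the outgroup changes.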

Duplicated node estimation

Using Notung, duplicated nodes can be identified. Here, we will analyze the gene tree of orthogroup members.

  1. Double-click the downloaded .jar file (here, Notung-2.9.jar).
  2. Copy the species tree (Newick format) from the 000_summary.txt file and save it as a new file (here, speciesTree.tre).
  3. In Notung, open the species tree file, speciesTree.tre (File > Open Species Tree).
  4. Open the gene tree file, RAxML_bootstrap.txt (File > Open Gene Tree).
  5. Set "Edge Weight Threshold" (here, 70) using the "Edit Values" button. This value corresponds to "Rearrangement BS value threshold" in ORTHOSCOPE.
  6. In the "Rearrange" tab at the bottom, select "Prefix of the general label".
  7. Click the "Reconcile" button.
  8. Duplicated nodes are marked with "D".

Rearranged tree
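
To see which branches a given threshold would affect, the support values in the gene tree can be listed before opening it in Notung. The sketch below assumes a Newick string in which bootstrap values are written as internal node labels directly after a closing parenthesis, and it treats values strictly below the threshold as weak. Both points are assumptions, and the function name and example tree are hypothetical.

```python
import re


def weak_supports(newick, threshold):
    """Return support values below the threshold, in order of appearance.

    Assumes supports are numeric labels placed right after ')' in the
    Newick string, e.g. '(A:0.1,B:0.2)95:0.05'.
    """
    labels = re.findall(r"\)(\d+(?:\.\d+)?)", newick)
    return [float(value) for value in labels if float(value) < threshold]


# Hypothetical gene tree with two internal branches (supports 95 and 42).
tree = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.3)42:0.02,E:0.5);"
weak = weak_supports(tree, 70)
```

With a threshold of 70, only the branch labelled 42 in this example falls below the cutoff.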


Supported browsers

| Chrome | Firefox | Safari | IE |
| --- | --- | --- | --- |
| Supported | Supported | 11.0 or later | Not supported |

History

10 July 2018: Version 1.0 (published in Inoue and Satoh, under review).


Database

Available from here (10.5281/zenodo.1452077). 10 October 2018.

ORTHOSCOPE employs a genome-scale, protein-coding gene database (coding and amino acid sequence datasets) constructed for each species. To count the number of orthologs in each species, only the longest sequence is used when transcript variants exist for a single locus.
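
The longest-sequence rule described above can be sketched as follows. This is only an illustration, not the code used to build the ORTHOSCOPE database: it assumes each sequence already carries a locus identifier, and the function name and example records are hypothetical.

```python
def longest_per_locus(records):
    """Keep the longest sequence for each locus.

    records: iterable of (locus_id, transcript_id, sequence) tuples.
    Returns a dict mapping locus_id -> (transcript_id, sequence).
    When two variants have equal length, the first one seen is kept.
    """
    best = {}
    for locus_id, transcript_id, sequence in records:
        current = best.get(locus_id)
        if current is None or len(sequence) > len(current[1]):
            best[locus_id] = (transcript_id, sequence)
    return best


# Hypothetical records: locus1 has two transcript variants.
records = [
    ("locus1", "tx1", "ATGAAATAA"),
    ("locus1", "tx2", "ATGAAACCCTAA"),
    ("locus2", "tx3", "ATGTAA"),
]
selected = longest_per_locus(records)
```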


Citation

Inoue J. and Satoh N. ORTHOSCOPE: an automatic web tool of analytical pipeline for ortholog identification using a species tree. in prep.


Previous versions:

Email: jun.inoue AT oist.jp