phyphy

ALERT! Several recent changes to the `HyPhy` software have precluded this developer's ability to maintain `phyphy` and ensure consistent functionality with the HyPhy software itself. As a consequence this package is only guaranteed to work with OLDER HyPhy versions (2.3.7 - 2.3.14). If you are using (which you likely are) HyPhy version >= 2.5, `phyphy` is unlikely to work properly for you. Unfortunately given some inconsistent and untrackable development changes in HyPhy itself which render a lot of features NOT backwards compatible, I cannot further maintain `phyphy`. I encourage all potential `phyphy` users to use HyPhy directly with the new `hyphy` executable.

This repository is therefore now archived.

The current release is version 0.4.3.

phyphy aims to facilitate HyPhy usage in two primary ways:

Execute standard analyses in a Python scripting environment
Conveniently parse various information from the resulting JSON output.

This functionality makes batch usage eminently more convenient. Never use the HyPhy prompt again!

phyphy is pronounced "feye-feye" and is so-named for "Python Hyphy". Importantly, this name was chosen for maximal pronunciation enjoyment. phyphy is compatible with either Python2.7 or Python3.

phyphy is compatible with HyPhy versions 2.3.7 - 2.3.14 and supports the following analyses (in alphabetical order):

FEL (Fixed Effects Likelihood): Infer pervasive selection at individual sites using maximum likelihood
MEME (Mixed Effects Model of Evolution): Infer episodic selection at individual sites
SLAC (Single-Likelihood Ancestor Counting): Infer pervasive selection at individual sites using a counting-based approach
FUBAR (Fast, Unconstrained Bayesian AppRoximation): Infer pervasive selection at individual sites using a Bayesian approach
aBSREL (adaptive Branch-Site Random Effects Likelihood): Branch-site model of lineage-specific selection
BUSTED (Branch-Site Unrestricted Statistical Test for Episodic Diversification): Branch-site model of whole-gene selection
RELAX (Test for selection RELAXation): Test for relaxed or intensified selection on a specified branch set
LEISR (Likelihood Estimation of Individual Site Rates): Infer relative evolutionary rates from protein or nucleotide data

Full API documentation, including some code examples, is available from https://sjspielman.github.io/phyphy.

Download the current release here, or clone the master branch, and cd in.
Build the package with python setup.py build (with a sudo as needed)
Optionally, run tests with python setup.py test (with a sudo as needed).
- NOTE: the test suite is python2.7 compatible only (do not attempt with a python3 interpretter, or some tests will fail).
Install per your own adventure:

To specify a different install directory, add the argument --prefix=/path/to/my/favorite/directory to the install line.
- To install as root: sudo python setup.py install
- To install for user only: python setup.py install --user

Using phyphy

Overview

phyphy has three primary modules:

Hyphy
- This module can be optionally used to specify that a non-canonical installation (i.e. not installed to /usr/local/) or a local build (i.e. where make install was not run) of HyPhy. This module can additionally be used to specify that HyPhy be run quietly and/or without outputting its standard log files, messages.log and errors.log.
Analysis
- This module contains the analysis methods to execute, named according to the analysis. For example, the FEL class within the Analysis module would be used to execute an FEL analysis.
- Unless a custom HyPhy object is given, assumes the executable HYPHYMP in usr/local/bin and a HyPhy library in /usr/local/lib/hyphy (these are the default make install) locations
Extractor
- This module contains functionality to parse a given analysis output. Extractor makes it simple to extract information from a given analysis JSON output, including:
  - Fitted model parameters
  - Fitted phylogenies
  - Newick-extended format phylogenies with branch features for downstream visualization in tools like the Python package ete3 or the R package ggtree
  - CSV files, for methods FEL, MEME, SLAC, FUBAR, LEISR, aBSREL

Examples

Defining HyPhy instances

Full API documentation is here.

Most use cases are shown here:

import phyphy
	
## Use canonical install, but suppress creation of messages.log and errors.log
myhyphy = phyphy.HyPhy(suppress_log = True)

## Use canonical install, but suppress markdown output	
myhyphy = phyphy.HyPhy(quiet = True)	

## Use a **local build**
myhyphy = phyphy.HyPhy(build_path = /path/to/build/of/hyphy/)
		
## Use a **local install**
myhyphy = phyphy.HyPhy(build_path = /path/to/install/of/hyphy/)

## Specify that <=3 processes should be used by the default executable
myhyphy = phyphy.HyPhy(CPU=3)

## Specify that the MPI executable should be used, with the launcher mpirun and the given mpirun arguments (32 processes)
myhyphy = phyphy.HyPhy(executable   = "HYPHYMPI",
                       mpi_launcher = "mpirun",
                       mpi_options  = "-np 32")

## Once defined,the HyPhy instance can be passed to an analysis, for example this default FEL inference (see next section for details):
myfel = FEL(hyphy = myhyphy, data = "/path/to/data.nex", nexus=True)
myfel.run_analysis()

Executing HyPhy Analyses

Possible analyses to define include the following:

ABSREL
BUSTED
FEL
FUBAR
LEISR
MEME
RELAX
SLAC

Running an analysis proceeds in two steps:

Define an analysis instance (e.g. x=FEL(..args..) would define an FEL analysis)
Execute the analysis with the method .run_analysis() (e.g. x.run_analysis())

Full API documentation is here. Each analysis method is documented with examples and the available optional arguments. Importantly, while each analysis will have its own optional arguments, at a mininum, you must provide path(s) to input data. There are two mutually exclusive (but one is necessary) strategies for this:

When alignment and tree are in a single file, use the single keyword argument data.
When alignment and tree are in separate files, use the two respective keyword arguments alignment and tree.

Note that RELAX requires one additional argument, at a minimum: the label in the tree corresponding to test branches.

Briefly, here is how one might define and run some FEL analyses:

import phyphy

### Run a FEL analysis, for example, with data in a single file.
### When providing a NEXUS file to the data argument, include the argument `nexus=True` (in version >=0.4.2)
myfel = phyphy.FEL(data = "path/to/data.nex", nexus=True)
myfel.run_analysis()

### Run a FEL analysis, for example, with data in two files
myfel = phyphy.FEL(alignment = "path/to/aln.fasta", tree = "path/to/tree/tree.txt")
myfel.run_analysis()

### Run a FEL analysis, specifying the vertebrate mtDNA genetic code (NIH code #2)
myfel = phyphy.FEL(data = "path/to/data.nex", genetic_code = 2, nexus=True)
myfel.run_analysis()	

### Run a FEL analysis, specifying a custom-named output JSON. This time, not a nexus!	
myfel = phyphy.FEL(data = "path/to/data.dat", output = "customname.json")
myfel.run_analysis()

Parsing HyPhy output JSON

Full API documentation is available here, and this PDF describes the contents JSON fields in standard analyses.

An Extractor instance should be defined using a single argument, either an executed Analysis instance or a specific JSON file:

import phyphy
	
### Run a FEL analysis, for example, with data in a single file
myfel = phyphy.FEL(data = "path/to/data.nex")
myfel.run_analysis()
	
### Create an Extractor instance to parse JSON produced by `myfel`
myext = phyphy.Extractor(myfel)
	
## Alternatively, create an Extractor instance with a JSON directly
myext = phyphy.Extractor("path/to/json.json")

There are several flavors of Extractor methods, all of which are detailed with examples in the API:

Reveal contents of the JSON
- .reveal_fields() returns a list of all top-level fields in the JSON, to see the overall structure of the file
- .reveal_fitted_models() returns a list names of models fitted to the data, from which various components can be extracted
- .reveal_branch_attributes(), returns a dictionary of branch attributes, i.e. branch-specific information, (generally, these correspond to names of the fitted models which represent the fitted branch lengths, and other analysis-specific elements).
Extract input information
- .extract_number_sequences() returns the number of sequences in the input data
- .extract_number_sites() returns the number of sites in the input data (note, this will be length/3 for codon analyses)
- .extract_partition_count() returns the number of partitions in the analysis
- .extract_input_tree() returns the original inputted phylogeny, with HyPhy node annotations
- .extract_input_file() returns the provided file name for the analyzed dataset
Extract fitted model components
- .extract_model_logl(<name of model>) returns the Log Likelihood of the fitted model
- .extract_model_estimated_parameters(<name of model>) returns the number of estimated parameters in the fitted model
- .extract_model_aicc(<name of model>) returns the small-sample AIC (AICc) of the fitted model
- .extract_model_rate_distributions(<name of model>) returns a dictionary (when applicable) of fitted model rate distributions
- .extract_model_frequenciesl(<name of model>) returns the empirical equilibrium frequencies of the fitted model
- .extract_model_logl(<name of model>)
Extract miscallaneous information
- .extract_branch_sets() returns a dictionary of structure <node name>:<branch set>. Can be rearranged to a dictionary of structure <branch set>:[list of nodes] with the argument by_set=True (or dictionary, with argument as_dict=True)
- .extract_branch_attribute(<attribute name) returns a dictionary of the desired attribute values.
- .extract_site_logl() and .extract_evidence_ratios() are BUSTED-specific methods to return these values, as dictionaries each
- .extract_timers() returns a dictionary of timers from the method (wall-clock time in seconds to complete each step in algorithm)
Extract a CSV, described in the next section.

Extracting CSVs from HyPhy output JSON

CSVs can be obtained for the methods FEL, SLAC, MEME, FUBAR, ABSREL, and LEISR, using the method .extract_csv(<csv_file_name>). Consult the API for information on column headers, and more usage information.

Briefly:

import phyphy

### Define a FEL Extractor, for example
e = Extractor("/path/to/FEL.json") 
e.extract_csv("fel.csv")  ## save to fel.csv

### tab-delimited output, as fel.tsv
e.extract_csv("fel.tsv", delim = "\t")

Parsing annotated trees from HyPhy output JSON

Of specific interest, phyphy uses the powerful Python package ete3 to assist in tree manipulation, allowing for the extraction of specific trees that can be used for downstream processing or visualization in other tools:

The method .extract_input_tree() allows you to obtain the original inputted phylogeny, with HyPhy node annotations
The method .extract_model_tree() allows you to obtain the fitted phylogeny for a given model (i.e., branch lengths will be updated). This will be output in standard newick format
The method .extract_feature_tree() allows you to obtain an annotated tree in Newick eXtended format (NHX), where nodes are annotated with the provided feature (i.e., attribute).
The method .extract_absrel_tree() is a special case of .extract_feature_tree() for specifically annotating branches based on whether an aBSREL analysis has found evidence for selection, at a given P-value threshold
Note, for multipartitioned analyses, you can specify a partition or obtain all partitions from either of these methods

Please consult the documentation for examples and full usage information. Some brief examples follow here:

To determine which models can be used with .extract_model_tree(), use the method . reveal_fitted_models() to get a list of all models which were fitted:

import phyphy

## Define a FEL Extractor, for example
e = phyphy.Extractor("/path/to/FEL.json") 
e.reveal_fitted_models()
  ['Nucleotide GTR', 'Global MG94xREV']
  
## Obtain tree fitted during the Global MG94xREV fit
## Consult the API for more arguments and customizations
e.extract_model_tree('Nucleotide GTR')
'((((Pig:0.192554792971,Cow:0.247996722936)Node3:0.101719189407,Horse:0.211310618381,Cat:0.273732369855)Node2:0.0644249932769,((RhMonkey:0.00372054481786,Baboon:0.0017701670358)Node9:0.0259206344918,(Human:0,Chimp:0.00182836999996)Node12:0.0178636195889)Node8:0.109431753602)Node1:0.284434196447,Rat:0.0670087588444,Mouse:0.120166947697);'

Feature trees in NHX format can be created using any of the available branch attributes. To determine which attributes are available, use the method .reveal_branch_attributes() to get a list of all dictionary attributes, where the keys are the attribute names and the values are the type of attribute (either a branch length, node label, or branch label). Any of these attributes can then be used as a feature:

import phyphy

## Define an ABSREL Extractor, for example
e = phyphy.Extractor("/path/to/ABSREL.json") 
e.reveal_branch_attributes()
	{'Corrected P-value': 'branch label', 
	 'Rate classes': 'branch label', 
	 'original name': 'node label', 
	 'Full adaptive model': 'branch length', 
	 'Rate Distributions': 'branch label', 
	 'Uncorrected P-value': 'branch label', 
	 'Baseline MG94xREV': 'branch length', 
	 'Nucleotide GTR': 'branch length', 
	 'Baseline MG94xREV omega ratio': 'branch label', 
	 'LRT': 'branch label'}
 
  
## Any of those keys can then be used, or multiple in a list, to extract as features:
## Again, please consult the API for more arguments and customizations
e.extract_feature_tree('LRT')
'(0564_7:1[&&NHX:LRT=0],(((((0564_11:1[&&NHX:LRT=3.96048976951],0564_4:1[&&NHX:LRT=4.80587881259])Node20:1[&&NHX:LRT=3.30060030447],(0564_1:1[&&NHX:LRT=0.0105269546166],(0564_21:1[&&NHX:LRT=0],0564_5:1[&&NHX:LRT=4.51927751707])Node25:1[&&NHX:LRT=0])Node23:1[&&NHX:LRT=0])Node19:1[&&NHX:LRT=0.153859379099],0564_17:1[&&NHX:LRT=0])Node18:1[&&NHX:LRT=1.64667962972],((0564_13:1[&&NHX:LRT=0],(0564_15:1[&&NHX:LRT=4.97443221859])Node32:1[&&NHX:LRT=0])Node30:1[&&NHX:LRT=2.86518293899],((0564_22:1[&&NHX:LRT=0.114986865638],0564_6:1[&&NHX:LRT=0])Node36:1[&&NHX:LRT=0],0564_3:1[&&NHX:LRT=14.0568340492])Node35:1[&&NHX:LRT=22.65142315])Node29:1[&&NHX:LRT=1.50723222708])Node17:1[&&NHX:LRT=2.63431127725],0564_9:1[&&NHX:LRT=0])Node16:1[&&NHX:LRT=0],(((0557_24:1[&&NHX:LRT=0],0557_4:1[&&NHX:LRT=0],0557_2:1[&&NHX:LRT=0])Node9:1[&&NHX:LRT=0],0557_12:1[&&NHX:LRT=0])Node8:1[&&NHX:LRT=1.99177154715],((0557_21:1[&&NHX:LRT=0],0557_6:1[&&NHX:LRT=0.662153642024],0557_9:1[&&NHX:LRT=0],0557_11:1[&&NHX:LRT=2.65063418443],0557_13:1[&&NHX:LRT=0],0557_26:1[&&NHX:LRT=0],(0557_5:1[&&NHX:LRT=1.98893541781],0557_7:1[&&NHX:LRT=0])Node53:1[&&NHX:LRT=0.660753167251])Node6:1[&&NHX:LRT=0],0557_25:1[&&NHX:LRT=0])Node7:1[&&NHX:LRT=1.69045394756])Separator:1[&&NHX:LRT=14.127483568])[&&NHX:LRT=0];'

## With specific branch lengths for a fitted model:
e.extract_feature_tree('LRT', update_branch_lengths = "Full adaptive model")
'(0564_7:0.00708844[&&NHX:LRT=0],(((((0564_11:0.00527268[&&NHX:LRT=3.96048976951],0564_4:0.00714182[&&NHX:LRT=4.80587881259])Node20:0.0022574[&&NHX:LRT=3.30060030447],(0564_1:0.00583239[&&NHX:LRT=0.0105269546166],(0564_21:0.00121537[&&NHX:LRT=0],0564_5:0.00266921[&&NHX:LRT=4.51927751707])Node25:0.000797211[&&NHX:LRT=0])Node23:0.00142056[&&NHX:LRT=0])Node19:0.0019147[&&NHX:LRT=0.153859379099],0564_17:0.00605582[&&NHX:LRT=0])Node18:0.00100178[&&NHX:LRT=1.64667962972],((0564_13:0.0053066[&&NHX:LRT=0],(0564_15:0.00346989[&&NHX:LRT=4.97443221859])Node32:0.000752206[&&NHX:LRT=0])Node30:0.00188243[&&NHX:LRT=2.86518293899],((0564_22:0.00686981[&&NHX:LRT=0.114986865638],0564_6:0.00581523[&&NHX:LRT=0])Node36:0.00125905[&&NHX:LRT=0],0564_3:0.00791919[&&NHX:LRT=14.0568340492])Node35:0.0174886[&&NHX:LRT=22.65142315])Node29:0.0010489[&&NHX:LRT=1.50723222708])Node17:0.00156911[&&NHX:LRT=2.63431127725],0564_9:0.00551506[&&NHX:LRT=0])Node16:0.000783733[&&NHX:LRT=0],(((0557_24:0.00078793[&&NHX:LRT=0],0557_4:0.000787896[&&NHX:LRT=0],0557_2:0.000399166[&&NHX:LRT=0])Node9:0.00206483[&&NHX:LRT=0],0557_12:0.00267531[&&NHX:LRT=0])Node8:0.00118205[&&NHX:LRT=1.99177154715],((0557_21:0[&&NHX:LRT=0],0557_6:0.000391941[&&NHX:LRT=0.662153642024],0557_9:0.000402021[&&NHX:LRT=0],0557_11:0.00156985[&&NHX:LRT=2.65063418443],0557_13:0.000401742[&&NHX:LRT=0],0557_26:0.00079377[&&NHX:LRT=0],(0557_5:0.00117641[&&NHX:LRT=1.98893541781],0557_7:0[&&NHX:LRT=0])Node53:0.000391973[&&NHX:LRT=0.660753167251])Node6:0.00118062[&&NHX:LRT=0],0557_25:0.00220372[&&NHX:LRT=0])Node7:0.00103489[&&NHX:LRT=1.69045394756])Separator:0.00822051[&&NHX:LRT=14.127483568])[&&NHX:LRT=0];'

ABSREL selection trees can be created in much the same way, using the method .extract_absrel_tree():

import phyphy

## Define an ABSREL Extractor
e = phyphy.Extractor("/path/to/ABSREL.json") 
## Selected defined as be all P<=0.1, and use Full Adaptive Model branch lengths
e.extract_absrel_tree(p = 0.1, update_branch_lengths = "Full adaptive model")
'(0564_7:0.00708844[&&NHX:Selected=0],(((((0564_11:0.00527268[&&NHX:Selected=0],0564_4:0.00714182[&&NHX:Selected=0])Node20:0.0022574[&&NHX:Selected=0],(0564_1:0.00583239[&&NHX:Selected=0],(0564_21:0.00121537[&&NHX:Selected=0],0564_5:0.00266921[&&NHX:Selected=0])Node25:0.000797211[&&NHX:Selected=0])Node23:0.00142056[&&NHX:Selected=0])Node19:0.0019147[&&NHX:Selected=0],0564_17:0.00605582[&&NHX:Selected=0])Node18:0.00100178[&&NHX:Selected=0],((0564_13:0.0053066[&&NHX:Selected=0],(0564_15:0.00346989[&&NHX:Selected=0])Node32:0.000752206[&&NHX:Selected=0])Node30:0.00188243[&&NHX:Selected=0],((0564_22:0.00686981[&&NHX:Selected=0],0564_6:0.00581523[&&NHX:Selected=0])Node36:0.00125905[&&NHX:Selected=0],0564_3:0.00791919[&&NHX:Selected=1])Node35:0.0174886[&&NHX:Selected=1])Node29:0.0010489[&&NHX:Selected=0])Node17:0.00156911[&&NHX:Selected=0],0564_9:0.00551506[&&NHX:Selected=0])Node16:0.000783733[&&NHX:Selected=0],(((0557_24:0.00078793[&&NHX:Selected=0],0557_4:0.000787896[&&NHX:Selected=0],0557_2:0.000399166[&&NHX:Selected=0])Node9:0.00206483[&&NHX:Selected=0],0557_12:0.00267531[&&NHX:Selected=0])Node8:0.00118205[&&NHX:Selected=0],((0557_21:0[&&NHX:Selected=0],0557_6:0.000391941[&&NHX:Selected=0],0557_9:0.000402021[&&NHX:Selected=0],0557_11:0.00156985[&&NHX:Selected=0],0557_13:0.000401742[&&NHX:Selected=0],0557_26:0.00079377[&&NHX:Selected=0],(0557_5:0.00117641[&&NHX:Selected=0],0557_7:0[&&NHX:Selected=0])Node53:0.000391973[&&NHX:Selected=0])Node6:0.00118062[&&NHX:Selected=0],0557_25:0.00220372[&&NHX:Selected=0])Node7:0.00103489[&&NHX:Selected=0])Separator:0.00822051[&&NHX:Selected=1])[&&NHX:Selected=0];'

Note that, for all analysis, HyPhy will rename taxa when needed to "safe" names when certain forbidden characters are seen. In most cases, taxon names are safe. By default, extracted trees will contain these safe names. To force the output to use the original names as provided in the alignment/tree analyzed, use the argument original_names = True with any of these tree extraction method.

Finally, any NHX tree can be visualized with a variety of programmatic platforms, including ete3 in Python3 or ggtree in R/Bioconductor. Examples of visualizing NHX trees with either of these two platforms are available in examples/visualize_feature_tree_ete3.py and examples/visualize_feature_tree_ggtree.R, respectively.

Get help

Full documentation is available from https://sjspielman.github.io/phyphy
Please post questions and bugs to the Issues page.
This PDF contains a reference describing the contents JSON fields in standard analyses.

A note for conda users

If you are using the conda distribution (either via miniconda or anaconda) of HyPhy on Linux, you may need to issue the following command (with the appropriate path inserted) before launching your phyphy script:

export LD_LIBRARY_PATH=/path/to/miniconda2/lib:$LD_LIBRARY_PATH

Citation

Spielman, SJ (2018). phyphy: Python package for facilitating the execution and parsing of HyPhy standard analyses. Journal of Open Source Software, 3(21), 514, https://doi.org/10.21105/joss.00514

Name		Name	Last commit message	Last commit date
Latest commit History 114 Commits
docs		docs
examples		examples
paper		paper
rawdocs		rawdocs
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

License

sjspielman/phyphy

Folders and files

Latest commit

History

Repository files navigation

phyphy

This repository is therefore now archived.

Table of contents

Installation

Using phyphy

Overview

Examples

Defining HyPhy instances

Executing HyPhy Analyses

Parsing HyPhy output JSON

Extracting CSVs from HyPhy output JSON

Parsing annotated trees from HyPhy output JSON

Get help

A note for conda users

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Languages