harmslab/epistasis

A Python API for estimating statistical high-order epistasis in large genotype-phenotype maps.

High Order Epistasis Models/Regressions for Genotype-Phenotype Maps

A Python API for modeling statistical, high-order epistasis in genotype-phenotype maps. All models follow a scikit-learn interface, making it easy to integrate `epistasis` models with other pipelines and software. The package includes a plotting module built on matplotlib for visualizing high-order interactions, and interactive widgets that simplify complex nonlinear fits.

This package includes APIs for both linear and nonlinear epistasis models, as described in this paper; the nonlinear models relax the assumption that the genotype-phenotype map is linear.

Basic examples

A simple example of fitting a data set with a linear epistasis model.

```python
# Import epistasis model
from epistasis.models import EpistasisLinearRegression

# Read data from file and estimate epistasis
model = EpistasisLinearRegression.from_json("dataset.json", order=3)
model.fit()

# Estimate the uncertainty in epistatic coefficients
model.bootstrap_fit()
```
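To make the `order` argument concrete, here is a minimal sketch of what a linear epistasis model of a given order fits. It is not the package's implementation: it assumes binary genotypes encoded as 0 (wild type) and 1 (mutant), builds one design-matrix column per interaction term up to `order` (so an order of 3 includes terms up to three-way interactions), and solves ordinary least squares with NumPy. The package may use a different encoding, and the helper names `interaction_columns`, `design_matrix`, and `fit_linear_epistasis` are hypothetical.

```python
import itertools

import numpy as np


def interaction_columns(n_sites, order):
    """Hypothetical helper: every set of sites (up to `order`) that gets a coefficient."""
    cols = [()]  # empty tuple = intercept (the reference genotype's phenotype)
    for k in range(1, order + 1):
        cols.extend(itertools.combinations(range(n_sites), k))
    return cols


def design_matrix(genotypes, order):
    """Build X with one column per interaction term, assuming 0/1 encoding.

    A column is 1 only when every site in its interaction set is mutated.
    """
    genotypes = np.asarray(genotypes)
    cols = interaction_columns(genotypes.shape[1], order)
    X = np.ones((len(genotypes), len(cols)))
    for j, sites in enumerate(cols):
        for s in sites:
            X[:, j] *= genotypes[:, s]
    return X, cols


def fit_linear_epistasis(genotypes, phenotypes, order):
    """Ordinary least-squares estimate of the epistatic coefficients."""
    X, cols = design_matrix(genotypes, order)
    y = np.asarray(phenotypes, dtype=float)
    coefs = np.linalg.lstsq(X, y, rcond=None)[0]
    return dict(zip(cols, coefs))


# Two-site example: full (order=2) model on all four genotypes.
genotypes = [[0, 0], [0, 1], [1, 0], [1, 1]]
phenotypes = [1.0, 2.0, 3.0, 7.0]
print(fit_linear_epistasis(genotypes, phenotypes, order=2))
```

With all four genotypes of a two-site map and `order=2`, the system is square, so the coefficients reproduce the phenotypes exactly; with more genotypes than terms, `lstsq` returns the least-squares estimate. The package's `bootstrap_fit()` additionally estimates coefficient uncertainty, which this sketch does not attempt.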

If you are analyzing a nonlinear genotype-phenotype map, use `EpistasisNonlinearRegression` (nonlinear least-squares regression) to estimate the nonlinearity in the map:

```python
# Import the nonlinear epistasis model
from epistasis.models import EpistasisNonlinearRegression

# Define a nonlinear function to fit the genotype-phenotype map.
def boxcox(x, lmbda, lmbda2):
    """Fit with a Box-Cox function to estimate nonlinearity."""
    return ((x - lmbda2) ** lmbda - 1) / lmbda

def reverse_boxcox(y, lmbda, lmbda2):
    """Inverse of the boxcox function."""
    # 1.0 keeps true division under Python 2.7 as well as Python 3
    return (lmbda * y + 1) ** (1.0 / lmbda) + lmbda2

# Read data from file and estimate nonlinearity in the dataset.
model = EpistasisNonlinearRegression.from_json(
    "dataset.json",
    function=boxcox,
    reverse=reverse_boxcox,
    order=1,
)

# Give initial guesses for parameters to aid in convergence (not required).
model.fit(lmbda=1, lmbda2=1)
```
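The sketch below illustrates what "estimating the nonlinearity" means, using SciPy's `curve_fit` rather than the package's own fitting code: given phenotypes on an additive scale, recover the `lmbda` and `lmbda2` that map them onto the observed phenotypes. It assumes the additive-scale values `x` are already known, whereas the model above presumably estimates them through its `order=1` epistasis model. The data are synthetic, and the parameter bounds are an assumption added here to keep the fractional power well defined.

```python
import numpy as np
from scipy.optimize import curve_fit


def boxcox(x, lmbda, lmbda2):
    """Box-Cox-style transform from the example above."""
    return ((x - lmbda2) ** lmbda - 1) / lmbda


def reverse_boxcox(y, lmbda, lmbda2):
    """Inverse of boxcox; valid where x - lmbda2 > 0."""
    return (lmbda * y + 1) ** (1.0 / lmbda) + lmbda2


# Hypothetical additive-scale phenotypes (assumed known in this sketch).
x = np.linspace(2.0, 6.0, 25)

# Synthetic "observed" phenotypes generated with known parameters.
true_params = (0.5, 0.2)
y = boxcox(x, *true_params)

# Nonlinear least squares, starting from the same guesses as model.fit(lmbda=1, lmbda2=1).
# Bounds keep lmbda away from 0 and x - lmbda2 >= 0.5, so the base of the
# fractional power stays positive throughout the optimization.
params, cov = curve_fit(
    boxcox,
    x,
    y,
    p0=[1.0, 1.0],
    bounds=([0.1, -1.0], [2.0, 1.5]),
)
print("estimated lmbda, lmbda2:", params)

# The reverse transform maps observed phenotypes back to the additive scale.
x_back = reverse_boxcox(y, *params)
print("max round-trip error:", np.max(np.abs(x_back - x)))
```

Passing `reverse=reverse_boxcox` matters for the same reason the round trip does here: once the nonlinear parameters are known, the inverse puts observed phenotypes back on the scale where effects add.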

The nonlinear model also includes Jupyter Notebook widgets that make nonlinear fitting easier.

`model.fit_widget(lmbda=(-2,2,.1), lmbda2=(-2,2,.1))`

More demos are available as binder notebooks.

Installation

To install, clone this repo and run:

```
python setup.py install
```

or, if you'd like to install in development mode (so edits to the source take effect without reinstalling):

```
python setup.py develop
```

This package is still rough around the edges. I plan to add more examples and clean up some of the plotting and network-management code soon.

Works in Python 2.7+ and Python 3+

Documentation

Documentation and API reference can be viewed here.

Dependencies

• gpmap: Module for constructing powerful genotype-phenotype map data structures in Python.
• Scikit-learn: Simple-to-use machine-learning algorithms.
• NumPy: Python's array manipulation package.
• SciPy: Efficient scientific array manipulations and fitting.

Citations

If you use this API for research, please cite this paper.