Autoregressive protein model learning through generalized logistic regression in Julia.
The authors of this code are Jeanne Trinquier, Guido Uguzzoni, Andrea Pagnani, Francesco Zamponi, and Martin Weigt.
See also this Wikipedia article for a general overview of the Direct Coupling Analysis technique.
The code is written in Julia.
This is a registered package. To install it, enter ] in the Julia REPL to switch to package mode, then run
pkg> add ArDCA
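For scripted or non-interactive setups, the same installation can be sketched with the functional `Pkg` API from Julia's standard library instead of the REPL package mode. This is a minimal sketch that assumes network access to the General registry; it only installs and loads the package and does not call any ArDCA function.

```julia
# Non-interactive equivalent of `pkg> add ArDCA`.
# Assumption: the machine can reach the Julia General registry.
using Pkg

Pkg.add("ArDCA")   # resolve and install the registered package

using ArDCA        # load the package to confirm the installation worked
```

Running this inside a dedicated project environment (for example after `Pkg.activate(".")`) keeps the dependency out of the default environment.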
There are two Jupyter notebooks, one for Python and one for Julia, to help with using the package.
tutorial.ipynb is for the Julia version; arDCA_sklearn.ipynb is for the Python version.
Data for five protein families (PF00014, PF00072, PF00076, PF00595, PF13354) are contained in the companion ArDCAData package.
For didactic reasons, the PF00014 dataset is also included locally in the data folder.
The minimum Julia version required to run this code is 1.5. To run it in parallel using Julia's multicore infrastructure, start Julia with
$> julia -t numcores # numcores can be as large as your available number of threads
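To confirm that the session meets the version requirement and actually picked up the requested threads, a quick check along these lines can be run once Julia has started. This is a generic sketch using only `Base.Threads`; the array `results` and the toy loop are illustrative and are not part of ArDCA.

```julia
using Base.Threads

# The package requires Julia 1.5 or later.
@assert VERSION >= v"1.5" "Julia 1.5 or later is required"

# Reports the number of threads the session was started with (the value given to -t).
println("Julia is running with ", nthreads(), " thread(s)")

# Toy workload: each iteration writes to its own slot, so no locking is needed.
results = zeros(Int, 8)
@threads for i in eachindex(results)
    results[i] = i^2
end
println(results)
```

If the reported thread count is 1, Julia was most likely started without the `-t` option.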
This project is covered under the MIT License.