SparSNP

SparSNP fits lasso-penalized linear models to SNP data. Its main features are:

fits squared hinge loss models for classification (case/control phenotypes) and linear regression models for quantitative phenotypes
takes PLINK BED/FAM files as input
uses a bounded amount of memory, so it can work with large datasets using little memory (typically <1GB; more memory gives better performance)
fits a model over a grid of penalties and writes the estimated coefficients to disk
can also do cross-validation and use the estimated coefficients to predict outputs for other datasets
efficient: it uses warm restarts plus an active-set approach. The model-fitting part of 3-fold cross-validation takes ~5min for a dataset of 2000 samples by 300,000 SNPs, and about 25min for ~6800 samples / ~516,000 SNPs
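
For readers unfamiliar with the two losses named above, the objectives can be sketched in their standard textbook form. This is an illustration, not a statement of SparSNP's exact implementation: the label coding y_i in {-1, +1}, the 1/(2N) scaling, and the unpenalized intercept beta_0 are assumptions, and the published paper is the authority on the exact form. Here x_i is the genotype vector of sample i, N is the number of samples, and lambda is the lasso penalty taken from the grid.

```latex
% Classification (case/control): lasso-penalized squared hinge loss
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}
  \max\bigl\{0,\; 1 - y_i(\beta_0 + x_i^{\top}\beta)\bigr\}^{2}
  + \lambda\,\lVert\beta\rVert_{1}

% Quantitative phenotypes: lasso-penalized linear regression
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}
  \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^{2}
  + \lambda\,\lVert\beta\rVert_{1}
```

Larger values of lambda drive more coefficients to exactly zero, so each penalty on the grid corresponds to a model that uses a different number of SNPs.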

Contact

Citation

G. Abraham, A. Kowalczyk, J. Zobel, and M. Inouye, "SparSNP: Fast and memory-efficient analysis of all SNPs for phenotype prediction", BMC Bioinformatics, 2012, 13:88, doi:10.1186/1471-2105-13-88

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

Requirements

For the post-analysis scripts: R packages ggplot2 >=0.9.3, scales, grid, abind, ROCR

A 64-bit operating system is recommended; we have tested SparSNP on 64-bit OSX and Linux.

Quick Start

To get the latest version:

git clone https://github.com/gabraham/SparSNP

To install:

cd SparSNP
make

To run (assuming a PLINK BED/BIM/FAM dataset named MYDATA, i.e. the files MYDATA.bed, MYDATA.bim and MYDATA.fam):

export PATH=<PATH_TO_SPARSNP>:$PATH
crossval.sh MYDATA sqrhinge 2>&1 | tee log
eval.R
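
Since the run above expects all three PLINK files to share the same root name, it can help to check for them before starting a long cross-validation. The following is a minimal sketch; `check_plink_dataset` is a hypothetical helper written for this example and is not part of SparSNP.

```shell
#!/bin/sh
# check_plink_dataset is a hypothetical helper, not shipped with SparSNP.
# It succeeds only if ROOT.bed, ROOT.bim and ROOT.fam all exist and are non-empty.
check_plink_dataset() {
    root=$1
    status=0
    for ext in bed bim fam; do
        if [ ! -s "$root.$ext" ]; then
            echo "missing or empty: $root.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: only start cross-validation if the dataset is complete.
# check_plink_dataset MYDATA && crossval.sh MYDATA sqrhinge 2>&1 | tee log
```

The check only looks at file presence and size; it does not validate the contents of the files.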

Documentation: see https://github.com/gabraham/SparSNP/blob/master/workflow.pdf

Changelog: see https://github.com/gabraham/SparSNP/blob/master/CHANGELOG

Acknowledgments

Marco Colombo, patches for consistent lambda1 path
