{bigsnpr} is an R package for the analysis of massive SNP arrays. It extends the features of package {bigstatsr} for the purpose of analysing genotype data.
# For the current version
devtools::install_github("privefl/bigsnpr")
This package reads bed/bim/fam files (PLINK's preferred format) using function snp_readBed(). Before reading into this package's special format, quality control and conversion can be done using PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkIBDQC().
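As a minimal sketch of the reading step, the following converts the small example bed file that ships with the package and attaches the result. It assumes a recent version of {bigsnpr} is installed. The temporary backing file is for illustration only; in practice you would probably choose a permanent path.

```r
library(bigsnpr)

# Example PLINK files bundled with the package (bed/bim/fam share one prefix)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert to the package's format; this returns the path to the created .rds file.
# A temporary backing file is used here for illustration only.
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the bigSNP object in the current R session
obj <- snp_attach(rdsfile)
class(obj)
```

The conversion only needs to be done once; later sessions can call snp_attach() on the same .rds file directly.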
This package now also reads UK Biobank BGEN files using function snp_readBGEN().
This package uses a class called bigSNP for representing SNP data. A bigSNP object is just a list with some elements:
- genotypes: A FBM.code256. Rows are samples and columns are SNPs. This stores genotype calls or dosages (rounded to 2 decimal places).
- fam: A data.frame containing some information on the individuals.
- map: A data.frame giving some information on the SNPs.
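To make this structure concrete, here is a short sketch that attaches the example dataset bundled with the package and looks at each element. It assumes {bigsnpr} is installed and uses only the example data, so the printed values are illustrative.

```r
library(bigsnpr)

# Attach the example dataset bundled with the package
obj <- snp_attachExtdata()

G <- obj$genotypes   # FBM.code256: samples in rows, SNPs in columns
dim(G)
G[1:5, 1:5]          # subsetting reads the values into a regular R matrix

head(obj$fam)        # one row per individual
head(obj$map)        # one row per SNP

# The dimensions of the three elements are consistent with each other
stopifnot(nrow(obj$fam) == nrow(G), nrow(obj$map) == ncol(G))
```

Because the genotypes are stored on disk, only the subsets you index are loaded into memory.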
devtools::source_gist("42b41d771bbeae63245b8304ef283c70", filename = "get-genes.R")
rsid <- c("rs3934834", "rs3737728", "rs6687776", "rs9651273", "rs4970405",
"rs12726255", "rs2298217", "rs4970362", "rs9660710", "rs4970420")
snp_gene(rsid)
- Imputation of probabilities and multiple imputation.
- An interactive QC procedure (call rates, difference of missingness between cases and controls, MAF cutoff, relatedness, HWE, autosomal only, others?).
- Proper integration of haploid species.
You can request a feature by opening an issue.
Please open an issue if you find a bug. If you want help using {bigstatsr}, please post on Stack Overflow with the tag bigstatsr. For guidance on writing your question, see "How to make a great R reproducible example?".