Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
tons of changes from git
  • Loading branch information
h.redestig committed Jan 16, 2016
1 parent 342fa27 commit 101ba26
Show file tree
Hide file tree
Showing 87 changed files with 288 additions and 153 deletions.
7 changes: 5 additions & 2 deletions DESCRIPTION
Expand Up @@ -16,7 +16,9 @@ Description: Provides Bayesian PCA, Probabilistic PCA, Nipals PCA,
data structure (pcaRes) to provide a common interface to the
PCA results. Initiated at the Max-Planck Institute for
Molecular Plant Physiology, Golm, Germany.
Version: 1.61.0
Version: 1.63.0
URL: https://github.com/hredestig/pcamethods
BugReports: https://github.com/hredestig/pcamethods/issues
Depends:
Biobase,
methods
Expand All @@ -28,8 +30,8 @@ Suggests:
matrixStats,
lattice
Collate:
'errorHierarchic.R'
'derrorHierarchic.R'
'errorHierarchic.R'
'AllClasses.R'
'AllGenerics.R'
'BPCA_dostep.R'
Expand Down Expand Up @@ -60,3 +62,4 @@ Collate:
'xval.R'
Packaged: 2013-03-12 21:41:03 UTC; henning
biocViews: Bayesian
RoxygenNote: 5.0.1
2 changes: 1 addition & 1 deletion NAMESPACE
@@ -1,4 +1,4 @@
# Generated by roxygen2 (4.0.2): do not edit by hand
# Generated by roxygen2: do not edit by hand

S3method(biplot,pcaRes)
S3method(dim,pcaRes)
Expand Down
24 changes: 12 additions & 12 deletions R/BPCA_dostep.R
Expand Up @@ -16,18 +16,18 @@ BPCA_dostep <- function(M,y) {
M$scores <- matrix(NA, M$rows, M$comps)

## Expectation step for data without missing values
Rx <- diag(M$comps) + M$tau * t(M$PA) %*% M$PA + M$SigW
Rxinv <- solve(Rx)
idx <- M$row_nomiss
Rx <- diag(M$comps) + M$tau * t(M$PA) %*% M$PA + M$SigW
Rxinv <- solve(Rx)
idx <- M$row_nomiss

if (length(idx) == 0) {
trS <- 0
T <- 0
} else {
dy <- y[idx,, drop=FALSE] - repmat(M$mean, length(idx), 1)
x <- M$tau * Rxinv %*% t(M$PA) %*% t(dy)
T <- t(dy) %*% t(x)
trS <- sum(sum(dy * dy))
dy <- y[idx,, drop=FALSE] - repmat(M$mean, length(idx), 1)
x <- M$tau * Rxinv %*% t(M$PA) %*% t(dy)
T <- t(dy) %*% t(x)
trS <- sum(sum(dy * dy))

## Assign the scores for complete rows
xTranspose <- t(x)
Expand Down Expand Up @@ -59,19 +59,19 @@ BPCA_dostep <- function(M,y) {
M$scores[M$row_miss[n],] <- t(x)
}
}
T <- T / M$rows
trS <- trS / M$rows
T <- T / M$rows
trS <- trS / M$rows

## Maximation step
Rxinv <- solve(Rx)
Dw <- Rxinv + M$tau * t(T) %*% M$PA %*% Rxinv +
Dw <- Rxinv + M$tau * t(T) %*% M$PA %*% Rxinv +
diag(M$alpha, nrow = length(M$alpha)) / M$rows
Dwinv <- solve(Dw)
M$PA <- T %*% Dwinv ## The new estimate of the principal axes (loadings)
M$tau <- (M$cols + 2 * M$gtau0 / M$rows) / (trS - sum(diag(t(T) %*% M$PA)) +
(M$mean %*% t(M$mean) * M$gmu0 + 2 * M$gtau0 / M$btau0) / M$rows)
M$tau <- M$tau[1,1] ## convert to scalar
M$SigW <- Dwinv * (M$cols / M$rows)
M$tau <- M$tau[1,1] ## convert to scalar
M$SigW <- Dwinv * (M$cols / M$rows)
M$alpha <- (2 * M$galpha0 + M$cols) / (M$tau * diag(t(M$PA) %*% M$PA) +
diag(M$SigW) + 2 * M$galpha0 / M$balpha0)

Expand Down
47 changes: 29 additions & 18 deletions R/bpca.R
@@ -1,6 +1,6 @@
##' Implements a Bayesian PCA missing value estimator. The script
##' is a port of the Matlab version provided by Shigeyuki OBA. See
##' also \url{http://hawaii.aist-nara.ac.jp/\%7Eshige-o/tools/}.
##' also \url{http://ishiilab.jp/member/oba/tools/BPCAFill.html}.
##' BPCA combines an EM approach for PCA with a Bayesian model. In
##' standard PCA data far from the training set but close to the
##' principal subspace may have the same reconstruction error. BPCA
Expand Down Expand Up @@ -40,6 +40,16 @@
##'
##' It is not recommended to use this function directely but rather to
##' use the pca() wrapper function.
##'
##' There is a difference with respect the interpretation of rows
##' (observations) and columns (variables) compared to matlab
##' implementation. For estimation of missing values for microarray
##' data, the suggestion in the original bpca is to intepret genes as
##' observations and the samples as variables. In pcaMethods however,
##' genes are interpreted as variables and samples as observations
##' which arguably also is the more natural interpretation. For bpca
##' behavior like in the matlab implementation, simply transpose your
##' input matrix.
##'
##' Details about the probabilistic model underlying BPCA are found in
##' Oba et. al 2003. The algorithm uses an expectation maximation
Expand All @@ -59,31 +69,32 @@
##' complexity for inverting a matrix of size
##' \eqn{components}{components}. Components is the number of
##' components used for re-estimation.
##' @title Bayesian PCA Missing Value Estimator
##' @title Bayesian PCA missing value estimation
##' @param Matrix \code{matrix} -- Pre-processed matrix (centered,
##' scaled) with variables in columns and observations in rows. The
##' data may contain missing values, denoted as \code{NA}.
##' scaled) with variables in columns and observations in rows. The
##' data may contain missing values, denoted as \code{NA}.
##' @param nPcs \code{numeric} -- Number of components used for
##' re-estimation. Choosing few components may decrease the estimation
##' precision.
##' re-estimation. Choosing few components may decrease the
##' estimation precision.
##' @param maxSteps \code{numeric} -- Maximum number of estimation
##' steps.
##' steps.
##' @param verbose \code{boolean} -- BPCA prints the number of steps
##' and the increase in precision if set to TRUE. Default is
##' interactive().
##' and the increase in precision if set to TRUE. Default is
##' interactive().
##' @param threshold convergence threshold
##' @param ... Reserved for future use. Currently no further
##' parameters are used
##' parameters are used
##' @return Standard PCA result object used by all PCA-based methods
##' of this package. Contains scores, loadings, data mean and
##' more. See \code{\link{pcaRes}} for details.
##' @references Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa,
##' Morito Monden, Ken-ichi Matsubara and Shin Ishii. A Bayesian
##' missing value estimation method for gene expression profile
##' data. \emph{Bioinformatics, 19(16):2088-2096, Nov 2003}.
##' of this package. Contains scores, loadings, data mean and
##' more. See \code{\link{pcaRes}} for details.
##' @references Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito
##' Monden, Ken-ichi Matsubara and Shin Ishii. A Bayesian missing
##' value estimation method for gene expression profile
##' data. \emph{Bioinformatics, 19(16):2088-2096, Nov 2003}.
##' @seealso \code{\link{ppca}}, \code{\link{svdImpute}},
##' \code{\link{prcomp}}, \code{\link{nipalsPca}}, \code{\link{pca}},
##' \code{\link{pcaRes}}. \code{\link{kEstimate}}.
##' \code{\link{prcomp}}, \code{\link{nipalsPca}},
##' \code{\link{pca}},
##' \code{\link{pcaRes}}. \code{\link{kEstimate}}.
##' @note Requires \code{MASS}.
##' @examples
##' ## Load a sample metabolite dataset with 5\% missig values (metaboliteData)e
Expand Down
31 changes: 12 additions & 19 deletions R/ppca.R
Expand Up @@ -47,7 +47,8 @@
##' vary slightly. Set the seed for exact reproduction of your
##' results.
##' @param threshold Convergence threshold.
##' @param ... Reserved for future use. Currently no further
##' @param maxIterations the maximum number of allowed iterations
##' @param ... Reserved for future use. Currently no further
##' parameters are used.
##' @note Requires \code{MASS}. It is not recommended to use this
##' function directely but rather to use the pca() wrapper function.
Expand All @@ -71,7 +72,7 @@
##' @keywords multivariate
##' @author Wolfram Stacklies
##' @export
ppca <- function(Matrix, nPcs=2, seed=NA, threshold=1e-5, ...) {
ppca <- function(Matrix, nPcs=2, seed=NA, threshold=1e-5, maxIterations=1000, ...) {
## Set the seed to the user defined value. This affects the generation
## of random values for the initial setup of the loading matrix
if (!is.na(seed))
Expand All @@ -96,7 +97,7 @@ ppca <- function(Matrix, nPcs=2, seed=NA, threshold=1e-5, ...) {
X <- Matrix %*% C %*% solve(CtC)
recon <- X %*% t(C)
recon[hidden] <- 0
ss <- sum(sum((recon - Matrix)^2)) / (N - missing)
ss <- sum(sum((recon - Matrix)^2)) / (N * D - missing)

count <- 1
old <- Inf
Expand Down Expand Up @@ -124,28 +125,20 @@ ppca <- function(Matrix, nPcs=2, seed=NA, threshold=1e-5, ...) {
ss <- ( sum(sum( (C %*% t(X) - t(Matrix))^2 )) + N * sum(sum(CtC %*% Sx)) +
missing * ss_old ) / (N * D)

## Some of the values may be negative at the beginning of the iteration,
## check that we are ot trying to calculate a log(<0)
if( (ss < 0) | (det(Sx) < 0) | (ss_old < 0) ) {
objective <- NaN
} else {
objective <- N * (D * log(ss) + sum(diag(Sx)) - log(det(Sx)) ) +
sum(diag(SumXtX)) - missing * log(ss_old)
}
objective <- N * (D * log(ss) + sum(diag(Sx)) - log(det(Sx)) ) +
sum(diag(SumXtX)) - missing * log(ss_old)

rel_ch <- abs( 1 - objective / old )
old <- objective

count <- count + 1
if (!is.nan(rel_ch)) {
if( (rel_ch < threshold) & (count > 5) ) {
count <- 0
}
} else if (count > 1000) {
if( rel_ch < threshold & count > 5 ) {
count <- 0
}
else if (count > maxIterations) {
count <- 0
warning("Stopped after 1000 iterations, but rel_ch was NaN\n",
"Results may be inaccurate\n")
}
warning("stopped after max iterations, but rel_ch was > threshold")
}
} ## End EM iteration
C <- orth(C)
evs <- eigen( cov(Matrix %*% C) )
Expand Down
2 changes: 1 addition & 1 deletion R/robustPca.R
Expand Up @@ -161,7 +161,7 @@ robustSvd <- function(x) {
## We need the weightedMedian function provided by the aroma.light
## package. However we do not want to make the whole package dependant
## on aroma.light
if (!require(matrixStats, quietly=TRUE))
if (!requireNamespace("matrixStats", quietly=TRUE))
stop("package matrixStats required but not available")

L1RegCoef <- function(x,a){
Expand Down
40 changes: 37 additions & 3 deletions README.md
@@ -1,4 +1,38 @@
pcaMethods
==========
# pcaMethods

R package for performing PCA with applications to missing value imputation
R package for performing
[principal component analysis PCA](https://en.wikipedia.org/wiki/Principal_component_analysis)
with applications to missing value imputation. Provides a single
interface to performing PCA using

- **SVD:** a fast method which is also the standard method in R but
which is not applicable for data with missing values.
- **NIPALS:** an iterative fast method which is applicable also to
data with missing values.
- **PPCA:** Probabilistic PCA which is applicable also on data with
missing values. Missing value estimation is typically better than
NIPALS but also slower to compute and uses more memory. A port to R
of the
[implementation by Jakob Verbeek](http://lear.inrialpes.fr/~verbeek/software.php).
- **BPCA:** Bayesian PCA which performs very well in the presence of
missing values but is slower than PPCA. A port of the
[matlab implementation by Shigeyuki Oba](http://ishiilab.jp/member/oba/tools/BPCAFill.html).
- **NLPCA:** Non-linear PCA which can find curves in data and in
presence of such can perform accurate missing value
estimation. [Matlab port of the implementation by Mathias Scholz](http://www.nlpca.org/).


[pcaMethods is a Bioconductor package](http://www.bioconductor.org/packages/release/bioc/html/pcaMethods.html)
and you can install it by

```R
source("https://bioconductor.org/biocLite.R")
biocLite("pcaMethods")
```

## Documentation

```R
browseVignettes("pcaMethods")
?<function_name>
```
3 changes: 2 additions & 1 deletion man/BPCA_dostep.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/BPCA_initmodel.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/DModX-pcaRes-method.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/Q2.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/R2VX-pcaRes-method.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/R2cum-pcaRes-method.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/RnipalsPca.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/asExprSet.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/biplot-methods.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 101ba26

Please sign in to comment.