documentation 0.2.1
michalovadek committed Aug 28, 2023
1 parent 66216f8 commit 3ca8b3c
Showing 8 changed files with 47 additions and 48 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: nmfbin
Title: Non-negative Matrix Factorization for Binary Data
-Version: 0.1.0
+Version: 0.2.1
Authors@R: c(person(given = "Michal",
family = "Ovadek",
role = c("aut", "cre", "cph"),
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
+# nmfbin 0.2.1
+
+* documentation improvements
+
# nmfbin 0.2.0

* Full rewrite, simplification, improved terminology
20 changes: 10 additions & 10 deletions R/nmfbin.R
@@ -3,22 +3,24 @@
#' This function performs Logistic Non-negative Matrix Factorization (NMF) on a binary matrix.
#'
#' @param X A binary matrix (m x n) to be factorized.
-#' @param k The number of factors or components.
-#' @param optimizer Type of updating algorithm. `update` for NMF multiplicative update rules or `gradient` for gradient descent.
-#' @param init Method for initializing the factorization.
-#' @param max_iter Maximum number of iterations for the gradient descent optimization.
+#' @param k The number of factors (components, topics).
+#' @param optimizer Type of updating algorithm. `mur` for NMF multiplicative update rules, `gradient` for gradient descent, `sgd` for stochastic gradient descent.
+#' @param init Method for initializing the factorization. By default Nonnegative Double Singular Value Decomposition with average densification.
+#' @param max_iter Maximum number of iterations for optimization.
#' @param tol Convergence tolerance. The optimization stops when the change in loss is less than this value.
#' @param learning_rate Learning rate (step size) for the gradient descent optimization.
#' @param verbose Print convergence if `TRUE`.
-#' @param loss_fun Choice of loss function.
-#' @param loss_normalize Normalize loss if `TRUE`.
+#' @param loss_fun Choice of loss function: `logloss` (negative log-likelihood, also known as binary cross-entropy) or `mse` (mean squared error).
+#' @param loss_normalize Normalize loss by matrix dimensions if `TRUE`.
#' @param epsilon Constant to avoid log(0).
#'
#' @return A list containing:
#' \itemize{
-#' \item \code{W}: The basis matrix (m x k).
-#' \item \code{H}: The coefficient matrix (k x n).
+#' \item \code{W}: The basis matrix (m x k). The document-topic matrix in topic modelling.
+#' \item \code{H}: The coefficient matrix (k x n). Contribution of features to factors (topics).
#' \item \code{c}: The global threshold.
+#' \item \code{convergence}: Divergence (loss) from `X` at every `iter` until `tol` or `max_iter` is reached.
+#' \item \code{final_loss}: The final loss before `tol` or `max_iter` was reached.
#' }
#'
#' @examples
@@ -34,8 +36,6 @@
#'
#' # Apply the function
#' result <- nmfbin(X, k)
#' }
#'
#' @export

nmfbin <- function(X, k, optimizer = "mur", init = "nndsvd", max_iter = 1000, tol = 1e-6, learning_rate = 0.001,
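
As an illustration of the two `loss_fun` options documented in this hunk, the following base-R sketch computes binary cross-entropy and mean squared error between a binary matrix and a matrix of predicted probabilities. The helper names `logloss_sketch` and `mse_sketch` are hypothetical, and dividing by the number of cells is only an assumption about what `loss_normalize` does; the package internals may differ.

```r
# Hypothetical helpers, not part of nmfbin: they illustrate the two documented losses.

# Negative log-likelihood (binary cross-entropy) of binary X under probabilities P.
# epsilon keeps log() away from 0, mirroring the documented `epsilon` argument.
logloss_sketch <- function(X, P, epsilon = 1e-10, normalize = TRUE) {
  loss <- -sum(X * log(P + epsilon) + (1 - X) * log(1 - P + epsilon))
  # Assumption: normalizing means dividing by the number of cells (m * n).
  if (normalize) loss / length(X) else loss
}

# Mean squared error between X and P.
mse_sketch <- function(X, P) {
  mean((X - P)^2)
}

# Tiny worked example with made-up probabilities.
X <- matrix(c(1, 0, 0, 1), nrow = 2)
P <- matrix(c(0.9, 0.2, 0.1, 0.8), nrow = 2)

logloss_sketch(X, P)
mse_sketch(X, P)
```

Both values shrink towards 0 as `P` approaches `X`, which is why either can serve as the quantity tracked in `convergence`.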
16 changes: 10 additions & 6 deletions README.md
@@ -8,9 +8,13 @@

The `nmfbin` R package provides a simple Non-Negative Matrix Factorization (NMF) implementation tailored for binary data matrices. It offers a choice of initialization methods, loss functions and updating algorithms.

-Unlike most other NMF packages, this one is focused on (1) binary (Boolean) data and (2) minimizing dependencies.
+NMF is typically used for reducing high-dimensional matrices into lower (k-) rank ones where _k_ is chosen by the user. Given a non-negative matrix _X_ of size $m \times n$, NMF looks for two non-negative matrices _W_ ($m \times k$) and _H_ ($k \times n$), such that:

-Note the package is in early stages of development.
+$$X \approx W \times H$$
+
+In topic modelling, _W_ is interpreted as the document-topic matrix and _H_ as the topic-feature matrix.
+
+Unlike most other NMF packages, `nmfbin` is focused on binary (Boolean) data, while keeping the number of dependencies to a minimum.
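
To make the dimensions in the new README text concrete, here is a small base-R sketch that builds non-negative `W` and `H` by hand and checks the shape and rank of their product. It does not call `nmfbin`; the random factors only stand in for a fitted model, and the sizes `m`, `n` and `k` are arbitrary choices for illustration.

```r
set.seed(1)
m <- 6; n <- 4; k <- 2

# Arbitrary non-negative factors, standing in for a fitted model.
W <- matrix(runif(m * k), nrow = m, ncol = k)  # m x k
H <- matrix(runif(k * n), nrow = k, ncol = n)  # k x n

# The product has the same shape as an m x n input matrix X ...
WH <- W %*% H
dim(WH)

# ... but its rank is at most k, which is the dimensionality reduction.
qr(WH)$rank
```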

## Installation

@@ -30,10 +34,10 @@ The input matrix can only contain 0s and 1s.
library(nmfbin)

# Create a binary matrix for demonstration
-X <- matrix(sample(c(0, 1), 100, replace=TRUE), ncol=10)
+X <- matrix(sample(c(0, 1), 100, replace = TRUE), ncol = 10)

-# Perform NMF
-results <- nmfbin(X, k=3, optimizer = "mur", init = "nndsvd")
+# Perform Logistic NMF
+results <- nmfbin(X, k = 3, optimizer = "mur", init = "nndsvd", max_iter = 1000)
```

## Citation
@@ -43,7 +47,7 @@ results <- nmfbin(X, k=3, optimizer = "mur", init = "nndsvd")
title = {nmfbin: Non-negative Matrix Factorization for Binary Data},
author = {Michal Ovadek},
year = {2023},
-note = {R package version 0.2.0},
+note = {R package version 0.2.1},
url = {https://michalovadek.github.io/nmfbin/},
}
```
1 change: 0 additions & 1 deletion _pkgdown.yml
@@ -1,4 +1,3 @@
url: https://michalovadek.github.io/nmfbin/
template:
bootstrap: 5

2 changes: 1 addition & 1 deletion inst/CITATION
@@ -5,7 +5,7 @@ bibentry(
family = "Ovadek",
email = "michal.ovadek@gmail.com"),
year = "2023",
-note = "R package version 0.2.0",
+note = "R package version 0.2.1",
url = "https://michalovadek.github.io/nmfbin/",
header = "To cite nmfbin in publications use:"
)
34 changes: 10 additions & 24 deletions man/nmfbin.Rd

Generated file; diff not rendered by default.

16 changes: 11 additions & 5 deletions vignettes/introduction.Rmd
@@ -14,22 +14,28 @@ knitr::opts_chunk$set(
)
```

-The main function `nmfbin()` operates on matrices like so:
+The main function `nmfbin()` operates on binary matrices like so:

```{r setup}
library(nmfbin)
# Create a binary matrix for demonstration
-X <- matrix(sample(c(0, 1), 100, replace=TRUE), ncol=10)
+X <- matrix(sample(c(0, 1), 100, replace = TRUE), ncol = 10)
# Perform NMF
-results <- nmfbin(X, k=3, optimizer = "mur", init = "nndsvd")
+results <- nmfbin(X, k = 3, optimizer = "mur", init = "nndsvd", loss_fun = "logloss", max_iter = 500)
```

The results include the final loss:

-```{r measures}
+```{r finalloss}
print(results$final_loss)
```

We can also easily plot the optimization process.

```{r convergence}
plot(results$convergence,
xlab = "Iteration",
ylab = "Negative log-likelihood loss")
```
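
The `convergence` vector plotted in the vignette records the loss at each iteration, and the documentation says optimization stops once the change in loss falls below `tol`. A minimal sketch of such a stopping rule follows, run on a made-up loss trace rather than real `nmfbin` output. The function name `first_converged_iter` is hypothetical, and the exact rule inside the package may differ.

```r
# Hypothetical helper: first iteration at which the absolute change in loss
# drops below tol, or NA if that never happens.
first_converged_iter <- function(loss, tol = 1e-6) {
  change <- abs(diff(loss))  # change[i] = |loss[i + 1] - loss[i]|
  hit <- which(change < tol)
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

# Made-up, geometrically decaying loss trace (not produced by nmfbin).
loss_trace <- 0.5 + 0.5^(1:30)

first_converged_iter(loss_trace, tol = 1e-6)
```

Comparing this index with `length(results$convergence)` is one way to judge whether a run stopped on `tol` or ran out of `max_iter`, assuming the vector holds one entry per iteration.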
