top2vecr

top2vecr is an R implementation of top2vec, a topic modelling technique relying on jointly learned document and word embeddings.

The main idea is that documents found close to each other in the joint document-word vector space can be interpreted as topics. The words whose vectors lie closest to a document cluster are used as that topic's descriptors. doc2vec produces the original vector space, UMAP reduces its dimensionality, and HDBSCAN identifies document clusters in the reduced space.
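The descriptor step can be illustrated with a minimal base R sketch. It is not this package's code: the vectors, the cluster labels, and the helper names `cosine_to_rows` and `topic_words` are made up for illustration, and it assumes a topic vector is the mean of the document vectors in a cluster, with similarity measured by cosine.

```r
# Toy joint embedding: documents and words share one 2-D space (made-up numbers).
doc_vecs <- rbind(
  d1 = c(0.9, 0.1),
  d2 = c(0.8, 0.2),
  d3 = c(0.1, 0.9),
  d4 = c(0.2, 0.8)
)
word_vecs <- rbind(
  court = c(1.0, 0.0),
  judge = c(0.9, 0.2),
  goal  = c(0.0, 1.0),
  match = c(0.1, 0.9)
)

# Hypothetical cluster labels, standing in for the output of a clustering step.
clusters <- c(1, 1, 2, 2)

# Cosine similarity between one vector and every row of a matrix.
cosine_to_rows <- function(v, m) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Topic vector = mean of the cluster's document vectors (an assumption);
# descriptors = the n words most similar to that topic vector.
topic_words <- function(cluster_id, n = 2) {
  centroid <- colMeans(doc_vecs[clusters == cluster_id, , drop = FALSE])
  sims <- cosine_to_rows(centroid, word_vecs)
  names(sims) <- rownames(word_vecs)
  names(sort(sims, decreasing = TRUE))[seq_len(n)]
}

topics <- lapply(sort(unique(clusters)), topic_words)
```

Each element of `topics` holds the descriptor words for one cluster. With real data, the word and document vectors would come from the embedding model rather than being typed in by hand.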

Unlike the original Python implementation, this package does not yet support pre-trained sentence encoders or transformers.

Development has been halted due to performance limitations in UMAP's R implementation.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("michalovadek/top2vecr")
