michalovadek/top2vecr

top2vecr

top2vecr is an R implementation of top2vec, a topic modelling technique relying on jointly learned document and word embeddings.

The main idea is that documents located close to each other in the joint document-word vector space can be interpreted as topics. The words most similar to each document cluster serve as that topic's descriptors. UMAP reduces the dimensionality of the original vector space produced by doc2vec, and HDBSCAN identifies the document clusters.
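
The topic-descriptor step can be illustrated with a minimal base R sketch. It is not this package's API: the toy 2-dimensional vectors, the hard-coded cluster labels (standing in for HDBSCAN output) and the helper names `cosine_sim` and `topic_words` are all assumptions made for illustration. Each topic is represented here by the centroid of its documents' vectors, and words are ranked by cosine similarity to that centroid.

```r
# Toy joint embedding space (hypothetical values; real models use many more dimensions)
doc_vecs <- rbind(
  d1 = c(0.9, 0.1),
  d2 = c(0.8, 0.2),
  d3 = c(0.1, 0.9),
  d4 = c(0.2, 0.8)
)
word_vecs <- rbind(
  court = c(1.0, 0.0),
  judge = c(0.9, 0.2),
  goal  = c(0.0, 1.0),
  match = c(0.2, 0.9)
)

# Hypothetical cluster labels, standing in for the output of a clustering step
clusters <- c(1, 1, 2, 2)

# Cosine similarity between every row of matrix m and a single vector v
cosine_sim <- function(m, v) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# For each cluster, average its document vectors and return the n nearest words
topic_words <- function(doc_vecs, word_vecs, clusters, n = 2) {
  lapply(split(seq_len(nrow(doc_vecs)), clusters), function(idx) {
    topic_vec <- colMeans(doc_vecs[idx, , drop = FALSE])
    sims <- cosine_sim(word_vecs, topic_vec)
    names(sims) <- rownames(word_vecs)
    names(sort(sims, decreasing = TRUE))[seq_len(n)]
  })
}

topics <- topic_words(doc_vecs, word_vecs, clusters)
print(topics)
```

With these toy vectors, the first cluster is described by "judge" and "court" and the second by "match" and "goal". In a real pipeline the document and word vectors would come from a trained doc2vec model and the labels from clustering the reduced document vectors.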

Unlike the original Python implementation, this package does not yet support pre-trained sentence encoders or transformers.

Development is halted due to performance limitations in UMAP's R implementation.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("michalovadek/top2vecr")
