
GloveR


The GloveR package is an R wrapper for Global Vectors for Word Representation (GloVe). GloVe is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. For more information, see: Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. The COPYRIGHTS and LICENSE files can be found in the inst folder of the R package.


This R package has some limitations:

  • it works only on a Unix OS
  • the input data file must be large enough for the package function Glove to work properly
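
The two limitations above can be checked before starting a run. The following is a minimal sketch in base R, not part of GloveR: the helper name `check_glove_prerequisites` and the `min_bytes` threshold are hypothetical, and the README does not say how large the data file has to be, so the default below is only a placeholder.

```r
# Hypothetical helper (not part of GloveR): reports whether the current
# platform is Unix-like and whether the training file exists and exceeds
# an assumed minimum size.
check_glove_prerequisites <- function(path, min_bytes = 1e6) {
  size <- file.size(path)  # NA when the file does not exist
  list(
    is_unix      = .Platform$OS.type == "unix",
    file_exists  = !is.na(size),
    # min_bytes is an arbitrary assumption, not a documented requirement
    large_enough = !is.na(size) && size >= min_bytes
  )
}
```

Calling `check_glove_prerequisites('/data_GloveR/dat.txt')` returns a list of three logical flags that can be inspected before running the functions of the package.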

To install the package from GitHub, use the install_github function of the devtools package:

devtools::install_github('mlampros/GloveR')


Use the following link to report bugs or issues with the R wrapper:

https://github.com/mlampros/GloveR/issues


Example usage


# example input data ---> 'dat.txt'

library(GloveR)


#-----------------------------
# vocabulary count computation
#-----------------------------

res = vocabulary_counts(train_data = '/data_GloveR/dat.txt', MAX_vocab = 0,
                        MIN_count = 5, output_vocabulary = '/data_GloveR/VOCAB.txt',
                        trace = TRUE)


#-------------------------
# cooccurrence statistics
#-------------------------

co_mat = cooccurrence_statistics(train_data = '/data_GloveR/dat.txt', vocab_input = '/data_GloveR/VOCAB.txt',
                                 output_cooccurences = '/data_GloveR/COOCUR.bin', symmetric_both = TRUE,
                                 context_words = 15, memory_gb = 4.0, MAX_product = 0, overflowLength = 0,
                                 trace = TRUE)


#---------------------------
# shuffling of cooccurrences
#---------------------------

shfl = shuffle_cooccurrences(input_cooccurences = '/data_GloveR/COOCUR.bin',
                             output_cooccurences = '/data_GloveR/COOCUR_output.bin',
                             memory_gb = 4.0, arraySize = 0, trace = TRUE)


#---------------------------------------
# Global Vectors for Word Representation
#---------------------------------------

gl = Glove(input_cooccurences = '/data_GloveR/COOCUR_output.bin',
           output_vectors = '/data_GloveR/vectors',
           vocab_input = '/data_GloveR/VOCAB.txt',
           model_output = 2, iter_num = 5, learn_rate = 0.05,
           save_squared_grads_file = NULL, alpha_weight = 0.75,
           cutoff = 10, binary_output = 0, vectorSize = 50, threads = 6,
           trace = TRUE)


More information about the parameters of each function can be found in the package documentation.
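
To give a feel for what the first two steps of the example compute, here is a conceptual toy in base R. It is a sketch under stated assumptions, not the package's implementation: the corpus is a made-up sentence, `min_count` and `window` only mirror the roles of `MIN_count` and `context_words`, and the counts are plain unweighted tallies, a simplification of what the wrapped GloVe code does.

```r
# Toy corpus (made up for illustration); the package reads a text file instead.
tokens <- strsplit("the cat sat on the mat the cat ran", " ", fixed = TRUE)[[1]]

# Step 1: vocabulary counts, keeping words seen at least `min_count` times.
# The value 2 is chosen only because the toy corpus is tiny.
min_count <- 2
counts <- sort(table(tokens), decreasing = TRUE)
vocab <- names(counts)[counts >= min_count]

# Step 2: symmetric co-occurrence counts inside a context window.
# Simplification: every co-occurrence adds 1, regardless of distance.
window <- 2
cooc <- matrix(0, nrow = length(vocab), ncol = length(vocab),
               dimnames = list(vocab, vocab))
for (i in seq_along(tokens)) {
  if (!(tokens[i] %in% vocab)) next
  lo <- max(1, i - window)
  hi <- min(length(tokens), i + window)
  for (j in lo:hi) {
    # skip the word itself and any word outside the vocabulary
    if (j == i || !(tokens[j] %in% vocab)) next
    cooc[tokens[i], tokens[j]] <- cooc[tokens[i], tokens[j]] + 1
  }
}

print(vocab)
print(cooc)
```

Because both left and right contexts are counted, the resulting matrix is symmetric, which corresponds to the idea behind `symmetric_both = TRUE` in the example.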

