Global Vectors for Word Representation
C R
Switch branches/tags
Nothing to show
Clone or download
Latest commit fbf315b Jan 6, 2017

README.md

Travis-CI Build Status codecov.io

GloveR


The GloveR package is an R wrapper for the Global Vectors for Word Representation (GloVe). GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. For more information consult : Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. COPYRIGHTS file and LICENSE can be found in the inst folder of the R package.


This R package has some limitations:

  • it works only on a unix OS
  • the data file should be big enough for the package-function Glove to work properly

To install the package from Github use the install_github function of the devtools package,

devtools::install_github('mlampros/GloveR')


Use the following link to report bugs/issues (for the R wrapper),

https://github.com/mlampros/GloveR/issues


Example usage


# example input data ---> 'dat.txt'



library(GloveR)


#-----------------------------
# vocabulary count computation
#-----------------------------


res = vocabulary_counts(train_data = '/data_GloveR/dat.txt', MAX_vocab = 0,

                        MIN_count = 5, output_vocabulary = '/data_GloveR/VOCAB.txt', 
                        
                        trace = TRUE)
                        

               
               
#-------------------------
# cooccurrence statistics
#-------------------------


co_mat = cooccurrence_statistics(train_data = '/data_GloveR/dat.txt', vocab_input = '/data_GloveR/VOCAB.txt',
                                  
                                 output_cooccurences = '/data_GloveR/COOCUR.bin', symmetric_both = TRUE, 
                                 
                                 context_words = 15, memory_gb = 4.0, MAX_product = 0, overflowLength = 0, 
                                 
                                 trace = TRUE)




#---------------------------
# shuffling of cooccurrences
#---------------------------


shfl = shuffle_cooccurrences(input_cooccurences = '/data_GloveR/COOCUR.bin',

                             output_cooccurences = '/data_GloveR/COOCUR_output.bin',

                             memory_gb = 4.0, arraySize = 0, trace = TRUE)




#---------------------------------------
# Global Vectors for Word Representation
#---------------------------------------


gl = Glove(input_cooccurences = '/data_GloveR/COOCUR_output.bin',

           output_vectors = '/data_GloveR/vectors',

           vocab_input = '/data_GloveR/VOCAB.txt',

           model_output = 2, iter_num = 5, learn_rate = 0.05, 
           
           save_squared_grads_file = NULL, alpha_weight = 0.75, 
           
           cutoff = 10, binary_output = 0, vectorSize = 50, threads = 6, 
           
           trace = TRUE)


More information about the parameters of each function can be found in the package documentation.