Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang


Natural Language Processing

License: MIT

nlp

Implementations of selected machine learning algorithms for natural language processing in Go. The primary focus of the package is the statistical semantics of plain-text documents, supporting semantic analysis and the retrieval of semantically similar documents.

Built upon the Gonum package for linear algebra and scientific computing with some inspiration taken from Python's scikit-learn and Gensim.

Check out the companion blog post or the Go documentation page for full usage and examples.


Features

Planned

  • Expanded persistence support
  • Stemming, to treat words with a common root as the same, e.g. "go" and "going"
  • Clustering algorithms, e.g. hierarchical, K-means
  • Classification algorithms, e.g. SVM, KNN, random forest

References

  1. Rosario, Barbara. Latent Semantic Indexing: An overview. INFOSYS 240 Spring 2000
  2. Latent Semantic Analysis, a scholarpedia article on LSA written by Tom Landauer, one of the creators of LSA.
  3. Thomo, Alex. Latent Semantic Analysis (Tutorial).
  4. Latent Semantic Indexing. Stanford NLP Course
  5. Charikar, Moses S. "Similarity Estimation Techniques from Rounding Algorithms" in Proceedings of the thirty-fourth annual ACM symposium on Theory of computing - STOC ’02, 2002, p. 380.
  6. M. Bawa, T. Condie, and P. Ganesan, “LSH forest: self-tuning indexes for similarity search,” Proc. 14th Int. Conf. World Wide Web - WWW ’05, p. 651, 2005.
  7. A. Gionis, P. Indyk, and R. Motwani, “Similarity Search in High Dimensions via Hashing,” VLDB ’99 Proc. 25th Int. Conf. Very Large Data Bases, vol. 99, no. 1, pp. 518–529, 1999.
  8. Kanerva, Pentti, Kristoferson, Jan and Holst, Anders (2000). Random Indexing of Text Samples for Latent Semantic Analysis
  9. Rangan, Venkat. Discovery of Related Terms in a corpus using Reflective Random Indexing
  10. Vasuki, Vidya and Cohen, Trevor. Reflective random indexing for semi-automatic indexing of the biomedical literature
  11. QasemiZadeh, Behrang and Handschuh, Siegfried. Random Indexing Explained with High Probability
  12. Foulds, James; Boyles, Levi; Dubois, Christopher; Smyth, Padhraic; Welling, Max (2013). Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation