
Word Embedding in Golang


This is an implementation of word embedding (also referred to as word representation) models in Golang.

Details

Word embedding maps words' meanings, structures, and concepts into a low-dimensional vector space. A representative example:

Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")

As this example shows, word meanings can be manipulated through arithmetic operations on their vectors.

Features

The following word embedding models are implemented:

Models

  • Word2Vec
    • Distributed Representations of Words and Phrases and their Compositionality [pdf]
  • GloVe
    • GloVe: Global Vectors for Word Representation [pdf]
  • SPPMI-SVD
    • Neural Word Embedding as Implicit Matrix Factorization [pdf]

Installation

$ go get -u github.com/roscopecoltran/word-embedding
$ bin/word-embedding -h

Demo

The demo script downloads the text8 corpus and trains a Skip-Gram model with negative sampling:

$ sh demo.sh

Usage

A tool for embedding words into vector space

Usage:
  word-embedding [flags]
  word-embedding [command]

Available Commands:
  sim         Estimate the similarity between words
  word2vec    Embed words using word2vec

File I/O

  • Input
    • Ideally, the input text contains one sentence per line.
  • Output
    • The output file is written in a libsvm-like format:
    <word> <index1>:<value1> <index2>:<value2> ...
    
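A sketch of reading one line of this output in Go. The `parseLine` helper is a hypothetical example, not part of this repository; it only assumes the `<word> <index>:<value> ...` layout shown above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine parses one line of the libsvm-like output:
//
//	<word> <index1>:<value1> <index2>:<value2> ...
//
// returning the word and a sparse index-to-value map.
func parseLine(line string) (string, map[int]float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", nil, fmt.Errorf("empty line")
	}
	word := fields[0]
	vec := make(map[int]float64, len(fields)-1)
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return "", nil, fmt.Errorf("malformed pair %q", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return "", nil, err
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return "", nil, err
		}
		vec[idx] = val
	}
	return word, vec, nil
}

func main() {
	word, vec, err := parseLine("king 1:0.9 2:0.8 3:0.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(word, vec[1]) // king 0.9
}
```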

References

  • For a deeper understanding, see:
    • Improving Distributional Similarity with Lessons Learned from Word Embeddings [pdf]
    • Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors [pdf]
