update docs
piskvorky committed Dec 20, 2014
1 parent 025bc54 commit c2b9aed
Showing 3 changed files with 4 additions and 5 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -2,10 +2,10 @@ Evaluation of word embeddings
=============================

Code for the blog post evaluating word2vec, GloVe, SPPMI and SPPMI-SVD methods:
-[Making sense of word2vec](http://radimrehurek.com/2014/12/making-sense-of-word2vec/)
-
+[Making sense of word2vec](http://radimrehurek.com/2014/12/making-sense-of-word2vec/).
+
Run `run_all.sh` to run all experiments. Logs with results will be stored in the data directory.

To replicate my results from the blog article, download and preprocess Wikipedia using [this code](https://github.com/piskvorky/sim-shootout).

You can use your own corpus though (the corpus path is a parameter to `run_all.sh`).
1 change: 0 additions & 1 deletion run_all.sh
@@ -13,7 +13,6 @@ fi
input_corpus=$1
questions=$2
outdir=$3
shift 3

mkdir -p $outdir 2> /dev/null

4 changes: 2 additions & 2 deletions run_embed.py
@@ -181,7 +181,7 @@ def raw2ppmi(cooccur, word2id, k_shift=1.0):
cooccur /= marginal_word[:, None] # #(w, c) / #w
cooccur /= marginal_context # #(w, c) / (#w * #c)
cooccur *= marginal_word.sum() # #(w, c) * D / (#w * #c)
-numpy.log(cooccur, out=cooccur) # log(#(w, c) * D / (#w * #c))
+numpy.log(cooccur, out=cooccur) # PMI = log(#(w, c) * D / (#w * #c))

logger.info("shifting PMI scores by log(k) with k=%s" % (k_shift, ))
cooccur -= numpy.log(k_shift) # shifted PMI = log(#(w, c) * D / (#w * #c)) - log(k)
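
For readers following the comments in the `raw2ppmi` hunk above, here is a minimal standalone sketch of the same arithmetic: PMI = log(#(w, c) * D / (#w * #c)), shifted by log(k). It is not the repository's implementation. The function name `shifted_ppmi` is hypothetical, the input is assumed to be a small dense count matrix, and the final clipping of negative values to zero is an assumption based on the "positive" in SPPMI, since that step is not visible in this hunk.

```python
import numpy as np


def shifted_ppmi(cooccur, k_shift=1.0):
    """Shifted positive PMI for a dense word-by-context count matrix (illustrative only)."""
    cooccur = np.asarray(cooccur, dtype=np.float64)
    marginal_word = cooccur.sum(axis=1)      # #w: row totals
    marginal_context = cooccur.sum(axis=0)   # #c: column totals
    total = marginal_word.sum()              # D: total number of co-occurrences

    # #w * #c for every (word, context) pair
    expected = np.outer(marginal_word, marginal_context)

    # log(0) gives -inf and 0/0 gives nan; silence the warnings and clean up below
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooccur * total / expected)  # PMI = log(#(w, c) * D / (#w * #c))

    pmi -= np.log(k_shift)                   # shifted PMI = PMI - log(k)
    pmi[~np.isfinite(pmi)] = 0.0             # unseen pairs and empty rows/columns
    return np.maximum(pmi, 0.0)              # assumed "positive" step: clip negatives to 0


counts = [[2, 0], [0, 2]]
print(shifted_ppmi(counts))                  # diagonal entries equal log(2)
print(shifted_ppmi(counts, k_shift=2.0))     # shifting by log(2) cancels them out
```

With `k_shift=1.0` the shift is zero and the result is plain PPMI; larger values of `k_shift` push more entries to zero.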
@@ -278,7 +278,7 @@ def __init__(self, corpus, id2word, s_exponent=0.0):
cooccur = utils.unpickle(outf('glove_corpus'))
else:
logger.info("glove corpus matrix not found, creating")
-cooccur = glove.Corpus()
+cooccur = glove.Corpus(dictionary=word2id)
cooccur.fit(corpus(), window=WINDOW)
utils.pickle(cooccur, outf('glove_corpus'))
model = glove.Glove(no_components=DIM, learning_rate=0.05)