A Clojure wrapper around a Java implementation of Word2Vec.


shark8me/clojure-word2vec

 
 

clojure-word2vec

The word2vec tool by Mikolov et al. creates word vectors from a text dataset. Unlike the binary present/absent representation used by a bag-of-words model, these word vectors can be used to compare two words and see whether they are related.
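To make "comparing two words" concrete: relatedness between word vectors is commonly measured with cosine similarity. The following is a minimal sketch in plain Clojure; it does not use this library's API, and the function names `dot`, `norm`, and `cosine-similarity` and the toy two-dimensional vectors are assumptions made for illustration only.

```clojure
;; Illustrative helpers, not part of clojure-word2vec.
(defn dot
  "Dot product of two equal-length numeric vectors."
  [a b]
  (reduce + (map * a b)))

(defn norm
  "Euclidean length of a numeric vector."
  [a]
  (Math/sqrt (dot a a)))

(defn cosine-similarity
  "Cosine of the angle between a and b: close to 1 for vectors pointing
  the same way, 0 for orthogonal vectors. Undefined for zero vectors."
  [a b]
  (/ (dot a b) (* (norm a) (norm b))))

;; Toy vectors standing in for learned word vectors.
(cosine-similarity [1.0 0.0] [2.0 0.0]) ; same direction
(cosine-similarity [1.0 0.0] [0.0 1.0]) ; orthogonal
```

Real word vectors have many more dimensions, but the comparison works the same way.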

This is a Clojure wrapper around a Java implementation of word2vec, [available here](https://github.com/medallia/Word2VecJava).

Installation

To include word2vec, add the following to the :dependencies section of your project.clj:

[Clojars Project]

Usage

First, require clojure-word2vec.core in your namespace:

(ns clojure-word2vec.examples
  (:require [clojure-word2vec.core :refer :all]
            [clojure.java.io :as io]))

Download a text corpus and place it in the resources folder. Here we'll download James Joyce's Ulysses from Project Gutenberg.

(def data
  (create-input-format "path/to/ulysses.txt"))
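The download step mentioned above is not shown in this README. One possible way to fetch a plain-text corpus is sketched below using only clojure.java.io; the function name `download-corpus!` is hypothetical and not part of this library, and the URL of the text you want (for example, a Project Gutenberg plain-text file) has to be supplied by you.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper, not part of clojure-word2vec.
(defn download-corpus!
  "Copies the contents found at `url` into the file at `dest`,
  creating parent directories if needed. Returns `dest`."
  [url dest]
  (io/make-parents dest)
  (with-open [in (io/input-stream url)]
    (io/copy in (io/file dest)))
  dest)
```

The returned path can then be passed to create-input-format in place of "path/to/ulysses.txt".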

Create the model and train it, using the default hyperparameters

(def model (word2vec data))

Hyperparameters can be specified as keyword arguments to word2vec.

(def model (word2vec data :window-size 15))

Find the closest words to a given word

(get-matches model "woman")

A longer introduction is available in the docs.

License

Copyright © 2015 Bridgei2i

Distributed under the Eclipse Public License version 1.0.
