Experimenting with Elasticsearch features for vector fields

Similarity search in word embeddings with Elasticsearch

This project provides a shell (based on Spring Shell) to bulk import the GloVe word embeddings into Elasticsearch and to query for similar words using cosine similarity.
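To make the import step more concrete, here is a minimal sketch of how GloVe text lines could be turned into a request body for the Elasticsearch bulk API. It is not taken from this project's source: the class name `GloveBulkSketch`, the index name passed in, and the document fields `word` and `vector` are all assumptions made for illustration. It relies only on the GloVe text format (one word per line followed by its space-separated vector components) and on the newline-delimited JSON layout of bulk requests.

```java
import java.util.List;
import java.util.StringJoiner;

/** Sketch only: converts GloVe text lines into a bulk (NDJSON) request body. */
final class GloveBulkSketch {

    /** One parsed embedding: the word and its vector components. */
    static final class Embedding {
        final String word;
        final float[] vector;

        Embedding(String word, float[] vector) {
            this.word = word;
            this.vector = vector;
        }
    }

    /** Parses a line of the form "word v1 v2 ... vn" (space-separated). */
    static Embedding parseLine(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected a word followed by vector values");
        }
        float[] vector = new float[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            vector[i - 1] = Float.parseFloat(parts[i]);
        }
        return new Embedding(parts[0], vector);
    }

    /** Escapes backslashes and double quotes only; other characters pass through unchanged. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /**
     * Builds the bulk body: one action line plus one document line per word.
     * The field names "word" and "vector" are hypothetical.
     */
    static String toBulkBody(String index, List<String> lines) {
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            Embedding e = parseLine(line);
            StringJoiner values = new StringJoiner(",", "[", "]");
            for (float v : e.vector) {
                values.add(Float.toString(v));
            }
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append("{\"word\":\"").append(jsonEscape(e.word))
                .append("\",\"vector\":").append(values).append("}\n");
        }
        return body.toString();
    }
}
```

Sending the body in batches of a few thousand lines, rather than one request per word, is the usual reason to use the bulk API for an import of this size.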

It uses Elasticsearch’s dense_vector field type and script_score queries with the predefined cosineSimilarity function, which was introduced in Elasticsearch 7.2.
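As an illustration of what such a query can look like, the sketch below assembles a script_score request body as a JSON string. It is an assumption about the shape of the query, not a copy of this project's code: the class `ScriptScoreQuerySketch`, the field name passed in, and the parameter name `query_vector` are hypothetical. The script uses the `doc['field']` form from early 7.x releases; later releases are understood to take the field name as a plain string instead, so check the documentation for the version in use. The `+ 1.0` keeps the score non-negative, since cosine similarity ranges from -1 to 1.

```java
import java.util.StringJoiner;

/** Sketch only: builds a script_score query body that ranks by cosine similarity. */
final class ScriptScoreQuerySketch {

    /**
     * @param vectorField name of the dense_vector field (hypothetical, e.g. "vector")
     * @param queryVector embedding of the word to compare against
     * @param size        number of hits to request
     */
    static String buildQuery(String vectorField, float[] queryVector, int size) {
        StringJoiner vec = new StringJoiner(",", "[", "]");
        for (float v : queryVector) {
            vec.add(Float.toString(v));
        }
        // Painless script: cosine similarity shifted by 1.0 so it is never negative.
        String source = "cosineSimilarity(params.query_vector, doc['" + vectorField + "']) + 1.0";
        return "{"
            + "\"size\":" + size + ","
            + "\"query\":{\"script_score\":{"
            + "\"query\":{\"match_all\":{}},"
            + "\"script\":{\"source\":\"" + source + "\","
            + "\"params\":{\"query_vector\":" + vec + "}}}}}";
    }
}
```

The resulting string can be posted to the `_search` endpoint of the index that holds the embeddings.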

Setup

  • clone and build Elasticsearch to get a Docker image of a current snapshot version

  • run docker-compose up

  • run ./mvnw package -Pdownload to build the application with the download profile, which fetches the GloVe word embeddings

  • run the built jar, then type import in the shell for an initial import of the words into Elasticsearch

  • type similar --to <word> in the shell to see similar words

shell:>similar --to cat
{"word":"dog","score":1.9218005}
{"word":"rabbit","score":1.8487821}
{"word":"monkey","score":1.8041081}
{"word":"rat","score":1.7891964}
{"word":"cats","score":1.786527}
{"word":"snake","score":1.779891}
{"word":"dogs","score":1.7795815}
{"word":"pet","score":1.7792249}
{"word":"mouse","score":1.7731668}
{"word":"bite","score":1.77288}
{"word":"shark","score":1.7655175}
{"word":"puppy","score":1.76256}
{"word":"monster","score":1.7619764}
{"word":"spider","score":1.7521701}
{"word":"beast","score":1.7520056}
{"word":"crocodile","score":1.7498653}
{"word":"baby","score":1.7463862}
{"word":"pig","score":1.7445586}
{"word":"frog","score":1.7426511}
{"word":"bug","score":1.7365949}
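The scores above all lie between 1 and 2. That would be consistent with a script that adds 1.0 to the cosine similarity so the score cannot go negative, although this README does not state it, so treat it as an assumption. Under that assumption, a score of 1.92 for "dog" would correspond to a cosine similarity of about 0.92. The sketch below shows the computation in plain Java; the class and method names are hypothetical.

```java
/** Sketch only: cosine similarity and the shifted score assumed above. */
final class CosineScoreSketch {

    /** Returns cos(a, b) = (a . b) / (|a| * |b|); the result is NaN if either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Shifts the similarity from [-1, 1] to [0, 2] (assumed scoring scheme). */
    static double score(float[] a, float[] b) {
        return cosine(a, b) + 1.0;
    }
}
```

With this scheme, identical directions score 2, orthogonal vectors score 1, and opposite directions score 0.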

Activate the es.test profile to run WordSearchServiceTest against the dockerized Elasticsearch instance.
