Generating gradients, exploring neighborhoods.

Welcome to sentence space

You can find an introduction to this project, with interactive demos, here.

This is a server designed to provide a couple of interesting artifacts. The core of it is a variational autoencoder that embeds sentences into a continuous space; as a result, you can select a point anywhere in that space and get a (more or less) coherent sentence back.

Once you've established this continuous sentence space, what can you get from it?

  1. Sentence gradients: smooth interpolations between two input sentences.
  2. Sentence neighborhoods: clouds of alternative sentences closely related to an input sentence.
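
To make those two ideas concrete, here is a minimal sketch of the geometry only, in plain Python 3. It assumes a gradient is a straight-line interpolation between two latent vectors and a neighborhood is a cloud of Gaussian perturbations around one latent vector; the server's actual method may differ. The function names `gradient` and `neighborhood` and the `mag`, `steps`, and `count` parameters are hypothetical, and the encoding and decoding steps (which belong to the trained model) are left out.

```python
import random


def gradient(z1, z2, steps=7):
    """Return `steps` points on the straight line from z1 to z2, endpoints included.

    Assumption: linear interpolation in latent space; not necessarily what the server does.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        points.append([(1 - t) * a + t * b for a, b in zip(z1, z2)])
    return points


def neighborhood(z, mag=0.2, count=7, rng=None):
    """Return `count` points near z, each offset by Gaussian noise scaled by `mag`.

    Assumption: `mag` is treated as a noise scale; the server may interpret it differently.
    """
    rng = rng or random.Random()
    return [[a + mag * rng.gauss(0.0, 1.0) for a in z] for _ in range(count)]
```

In this picture, each returned point would be passed through the model's decoder to get a sentence back, so a larger `mag` should produce sentences that wander further from the input.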

These are very weird artifacts! If you try to write a sentence gradient by hand, you'll find it's very difficult. Are they useful? Possibly not. Are they interesting? Definitely!

Again, you'll find a ton more context and exploration in this post.

Running the server

This code isn't quite turnkey, but if you're willing to tinker, you should be able to train your own models and serve your own gradients, neighborhoods, and who-knows-what-else.

The requirements are:

  • Python 2.7
  • Flask
  • NumPy 1.12.1
  • Theano 0.9 (plus Nvidia's CUDA and cuDNN)
  • Pandas 0.20.1
  • Matplotlib 2.0.2
  • sentencepiece (optional but nice to have)
  • wordfilter

One way to get started would be to use Anaconda:

conda create -n sentence-space python=2.7
source activate sentence-space
conda install flask
conda install numpy=1.12.1
conda install theano=0.9.0
conda install pandas=0.20.1
conda install matplotlib=2.0.2

pip install wordfilter
pip install sentencepiece

If you have those requirements installed, as well as CUDA and cuDNN (which is A Whole Other Thing), it should be possible to run bash serve.sh and get a server running. If that's not the case, open an issue and let me know. I definitely want to streamline this over time, and improve this documentation as well.

The name of the trained model you want to serve is specified near the top of textproject_server.py. (Sorry it's not a command-line option; I just couldn't be bothered.) Try sample_no_sp_2 to start.

Once the server is running, the API is simple:

  • /gradient?s1=Your%20first%20sentence&s2=Your%20second%20sentence
  • /neighborhood?s1=Your%20sentence&mag=0.2

Both endpoints return a JSON array of results. The code is currently configured to provide seven sentences in each gradient or neighborhood, but you could make that three or 128.
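
A small client sketch for these two endpoints, using only the Python 3 standard library. The base URL is an assumption (Flask's default development address); check serve.sh for the host and port your server actually uses. The helper names `gradient_url`, `neighborhood_url`, and `fetch` are hypothetical, and the shape of each element in the returned JSON array is not documented here, so the sketch just returns whatever the server sends.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: Flask's default development address; adjust to match serve.sh.
BASE_URL = "http://localhost:5000"


def gradient_url(s1, s2, base=BASE_URL):
    """Build a /gradient URL, encoding spaces as %20."""
    query = urlencode({"s1": s1, "s2": s2}, quote_via=quote)
    return "{}/gradient?{}".format(base, query)


def neighborhood_url(s1, mag=0.2, base=BASE_URL):
    """Build a /neighborhood URL for one sentence and a magnitude."""
    query = urlencode({"s1": s1, "mag": mag}, quote_via=quote)
    return "{}/neighborhood?{}".format(base, query)


def fetch(url):
    """GET the URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `fetch(gradient_url("Your first sentence", "Your second sentence"))` should return the gradient as a Python list.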

Contributors

This project is forked from stas-semeniuta/textvae, which is the code for the paper "A Hybrid Convolutional Variational Autoencoder for Text Generation" by Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. I'm indebted to Semeniuta et al. for their skill and generosity. If I have tinkered slightly, it is because I stood on the shoulders of smart people.

I'm indebted also to @richardassar, whose improvements allow this server to provide results at interactive speeds.

You can find Semeniuta et al.'s original README in (you guessed it) ORIGINAL-README.md.