Update Keras Example for (Parikh et al, 2016) implementation (#2803)
* bug fixes in keras example

* created contributor agreement

* baseline for Parikh model

* initial version of parikh 2016 implemented

* tested asymmetric models

* fixed grievous error in normalization

* use standard SNLI test file

* begin to rework parikh example

* initial version of running example

* start to document the new version

* start to document the new version

* Update Decompositional Attention.ipynb

* fixed calls to similarity

* updated the README

* import sys package duh

* simplified indexing on mapping word to IDs

* stupid python indent error

* added code from tensorflow/tensorflow#3388 for tf bug workaround
free-variation authored and honnibal committed Oct 1, 2018
1 parent 405a826 commit 9faea3f
Showing 5 changed files with 1,250 additions and 361 deletions.
67 changes: 39 additions & 28 deletions examples/keras_parikh_entailment/README.md
@@ -2,11 +2,7 @@

# A decomposable attention model for Natural Language Inference
**by Matthew Honnibal, [@honnibal](https://github.com/honnibal)**

-> ⚠️ **IMPORTANT NOTE:** This example is currently only compatible with spaCy
-> v1.x. We're working on porting the example over to Keras v2.x and spaCy v2.x.
-> See [#1445](https://github.com/explosion/spaCy/issues/1445) for details –
-> contributions welcome!
+**Updated for spaCy 2.0+ and Keras 2.2.2+ by John Stewart, [@free-variation](https://github.com/free-variation)**

This directory contains an implementation of the entailment prediction model described
by [Parikh et al. (2016)](https://arxiv.org/pdf/1606.01933.pdf). The model is notable
@@ -21,19 +17,25 @@
hook is installed to customise the `.similarity()` method of spaCy's `Doc`
and `Span` objects:

```python
-def demo(model_dir):
-    nlp = spacy.load('en', path=model_dir,
-                     create_pipeline=create_similarity_pipeline)
-    doc1 = nlp(u'Worst fries ever! Greasy and horrible...')
-    doc2 = nlp(u'The milkshakes are good. The fries are bad.')
-    print(doc1.similarity(doc2))
-    sent1a, sent1b = doc1.sents
-    print(sent1a.similarity(sent1b))
-    print(sent1a.similarity(doc2))
-    print(sent1b.similarity(doc2))
+def demo(shape):
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
+
+    doc1 = nlp(u'The king of France is bald.')
+    doc2 = nlp(u'France has no king.')
+
+    print("Sentence 1:", doc1)
+    print("Sentence 2:", doc2)
+
+    entailment_type, confidence = doc1.similarity(doc2)
+    print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")
```

+This gives the output `Entailment type: contradiction (Confidence: 0.60604566)`, showing that
+the system has definite opinions about Bertrand Russell's [famous conundrum](https://users.drew.edu/jlenz/br-on-denoting.html)!

-I'm working on a blog post to explain Parikh et al.'s model in more detail.
+A [notebook](https://github.com/free-variation/spaCy/blob/master/examples/notebooks/Decompositional%20Attention.ipynb) is available that briefly explains this implementation.
I think it is a very interesting example of the attention mechanism, which
I didn't understand very well before working through this paper. There are
lots of ways to extend the model.
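For reference, the model's three steps (attend, compare, aggregate) can be summarized as follows, with $a_i$ the premise word vectors, $b_j$ the hypothesis word vectors, and $F$, $G$, $H$ small feed-forward networks; this is a paraphrase of Parikh et al.'s equations, not the paper's exact presentation:

```latex
% attend: score every word pair, then softly align each sentence against the other
e_{ij} = F(a_i)^\top F(b_j), \qquad
\beta_i  = \sum_j \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})} \, b_j, \qquad
\alpha_j = \sum_i \frac{\exp(e_{ij})}{\sum_k \exp(e_{kj})} \, a_i

% compare: run each word vector together with its soft alignment through G
v_{1,i} = G([a_i; \beta_i]), \qquad v_{2,j} = G([b_j; \alpha_j])

% aggregate: sum the comparison vectors and classify with H
\hat{y} = H\left(\left[\sum_i v_{1,i}; \sum_j v_{2,j}\right]\right)
```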
@@ -43,7 +45,7 @@
| File | Description |
| --- | --- |
| `__main__.py` | The script that will be executed. Defines the CLI, the data reading, etc — all the boring stuff. |
-| `spacy_hook.py` | Provides a class `SimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
+| `spacy_hook.py` | Provides a class `KerasSimilarityShim` that lets you use an arbitrary function to customize spaCy's `doc.similarity()` method. Instead of the default average-of-vectors algorithm, when you call `doc1.similarity(doc2)`, you'll get the result of `your_model(doc1, doc2)`. |
| `keras_decomposable_attention.py` | Defines the neural network model. |
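
To give a flavour of how the hook works: in spaCy 2.x, a pipeline component is just a callable that receives and returns a `Doc`, and `doc.user_hooks` lets a component override `.similarity()`. Here is a minimal sketch of that pattern (simplified; `get_features` stands in for the word-ID extraction that `spacy_hook.py` actually performs):

```python
class KerasSimilarityShim(object):
    # class order matches the LABELS mapping used during training
    entailment_types = ["entailment", "contradiction", "neutral"]

    def __init__(self, model, get_features, max_length):
        self.model = model                # the trained Keras model
        self.get_features = get_features  # maps Docs to padded word-ID arrays
        self.max_length = max_length

    def __call__(self, doc):
        # install the hook on every Doc that passes through the pipeline
        doc.user_hooks['similarity'] = self.predict
        doc.user_span_hooks['similarity'] = self.predict
        return doc

    def predict(self, doc1, doc2):
        x1 = self.get_features([doc1], max_length=self.max_length)
        x2 = self.get_features([doc2], max_length=self.max_length)
        scores = self.model.predict([x1, x2])
        # return the most probable entailment class and its probability
        return self.entailment_types[scores.argmax()], scores.max()
```

The real class also provides a `load()` classmethod that rebuilds the Keras model from the weights and JSON config saved during training.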

## Setting up
@@ -52,17 +54,13 @@
First, install [Keras](https://keras.io/), [spaCy](https://spacy.io) and the spaCy
English models (about 1GB of data):

```bash
-pip install https://github.com/fchollet/keras/archive/1.2.2.zip
+pip install keras
pip install spacy
-python -m spacy.en.download
+python -m spacy download en_vectors_web_lg
```

-⚠️ **Important:** In order for the example to run, you'll need to install Keras from
-the 1.2.2 release (and not via `pip install keras`). For more info on this, see
-[#727](https://github.com/explosion/spaCy/issues/727).

-You'll also want to get Keras working on your GPU. This will depend on your
-set up, so you're mostly on your own for this step. If you're using AWS, try the
+You'll also want to get Keras working on your GPU, and you will need a backend, such as TensorFlow or Theano.
+This will depend on your setup, so you're mostly on your own for this step. If you're using AWS, try the
[NVidia AMI](https://aws.amazon.com/marketplace/pp/B00FYCDDTE). It made things pretty easy.
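
For example, assuming you choose the TensorFlow backend, a plain CPU install is just:

```bash
pip install tensorflow  # or tensorflow-gpu, if you have CUDA set up
```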

Once you've installed the dependencies, you can run a small preliminary test of
@@ -80,22 +78,35 @@
Finally, download the [Stanford Natural Language Inference corpus](http://nlp.st
## Running the example

You can run the `keras_parikh_entailment/` directory as a script, which executes the file
-[`keras_parikh_entailment/__main__.py`](__main__.py). The first thing you'll want to do is train the model:
+[`keras_parikh_entailment/__main__.py`](__main__.py). If you run the script without arguments,
+the usage is shown. Running it with `-h` explains the command line arguments.
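
For reference, the options defined in `__main__.py` are: `-t` (path to training data), `-s` (path to development or test data), `-L` (sentence truncation length), `-H` (number of hidden units), `-d` (dropout level), `-r` (learning rate), `-b` (batch size), `-e` (number of training epochs) and `-D` (direction of entailment: `both`, `left` or `right`).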

+The first thing you'll want to do is train the model:

```bash
-python keras_parikh_entailment/ train <train_directory> <dev_directory>
+python keras_parikh_entailment/ train -t <path to SNLI train JSON> -s <path to SNLI dev JSON>
```
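
When training completes, the weights and the JSON config are saved to a `similarity` subdirectory of the loaded spaCy model's data path, which is where the `evaluate` and `demo` modes look for them.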

Training takes about 300 epochs for full accuracy, and I haven't rerun the full
experiment since refactoring things to publish this example — please let me
-know if I've broken something. You should get to at least 85% on the development data.
+know if I've broken something. You should get to at least 85% on the development data even after 10-15 epochs.

The other two modes demonstrate run-time usage. I never like relying on the accuracy printed
by `.fit()` methods. I never really feel confident until I've run a new process that loads
the model and starts making predictions, without access to the gold labels. I've therefore
-included an `evaluate` mode. Finally, there's also a little demo, which mostly exists to show
+included an `evaluate` mode.
+
+```bash
+python keras_parikh_entailment/ evaluate -s <path to SNLI dev JSON>
+```

+Finally, there's also a little demo, which mostly exists to show
you how run-time usage will eventually look.

+```bash
+python keras_parikh_entailment/ demo
+```

## Getting updates

We should have the blog post explaining the model ready before the end of the week. To get
178 changes: 118 additions & 60 deletions examples/keras_parikh_entailment/__main__.py
@@ -1,139 +1,197 @@
-from __future__ import division, unicode_literals, print_function
-import spacy
-
-import plac
-from pathlib import Path
+import numpy as np
import ujson as json
-import numpy
-from keras.utils.np_utils import to_categorical
-
-from spacy_hook import get_embeddings, get_word_ids
-from spacy_hook import create_similarity_pipeline
+from keras.utils import to_categorical
+import plac
+import sys

from keras_decomposable_attention import build_model
+from spacy_hook import get_embeddings, KerasSimilarityShim

try:
    import cPickle as pickle
except ImportError:
    import pickle

+import spacy

+# workaround for keras/tensorflow bug
+# see https://github.com/tensorflow/tensorflow/issues/3388
+import os
+import importlib
+from keras import backend as K
+
+def set_keras_backend(backend):
+    if K.backend() != backend:
+        os.environ['KERAS_BACKEND'] = backend
+        importlib.reload(K)
+        assert K.backend() == backend
+    if backend == "tensorflow":
+        K.get_session().close()
+        cfg = K.tf.ConfigProto()
+        cfg.gpu_options.allow_growth = True
+        K.set_session(K.tf.Session(config=cfg))
+        K.clear_session()
+
+set_keras_backend("tensorflow")


def train(train_loc, dev_loc, shape, settings):
    train_texts1, train_texts2, train_labels = read_snli(train_loc)
    dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)

    print("Loading spaCy")
-    nlp = spacy.load('en')
+    nlp = spacy.load('en_vectors_web_lg')
    assert nlp.path is not None
+
+    print("Processing texts...")
+    train_X = create_dataset(nlp, train_texts1, train_texts2, 100, shape[0])
+    dev_X = create_dataset(nlp, dev_texts1, dev_texts2, 100, shape[0])
+
    print("Compiling network")
    model = build_model(get_embeddings(nlp.vocab), shape, settings)
-    print("Processing texts...")
-    Xs = []
-    for texts in (train_texts1, train_texts2, dev_texts1, dev_texts2):
-        Xs.append(get_word_ids(list(nlp.pipe(texts, n_threads=20, batch_size=20000)),
-                               max_length=shape[0],
-                               rnn_encode=settings['gru_encode'],
-                               tree_truncate=settings['tree_truncate']))
-    train_X1, train_X2, dev_X1, dev_X2 = Xs

    print(settings)
    model.fit(
-        [train_X1, train_X2],
+        train_X,
        train_labels,
-        validation_data=([dev_X1, dev_X2], dev_labels),
-        nb_epoch=settings['nr_epoch'],
-        batch_size=settings['batch_size'])
+        validation_data = (dev_X, dev_labels),
+        epochs = settings['nr_epoch'],
+        batch_size = settings['batch_size'])

    if not (nlp.path / 'similarity').exists():
        (nlp.path / 'similarity').mkdir()
    print("Saving to", nlp.path / 'similarity')
    weights = model.get_weights()
+    # remove the embedding matrix. We can reconstruct it.
+    del weights[1]
    with (nlp.path / 'similarity' / 'model').open('wb') as file_:
-        pickle.dump(weights[1:], file_)
-    with (nlp.path / 'similarity' / 'config.json').open('wb') as file_:
+        pickle.dump(weights, file_)
+    with (nlp.path / 'similarity' / 'config.json').open('w') as file_:
        file_.write(model.to_json())


-def evaluate(dev_loc):
+def evaluate(dev_loc, shape):
    dev_texts1, dev_texts2, dev_labels = read_snli(dev_loc)
-    nlp = spacy.load('en',
-                     create_pipeline=create_similarity_pipeline)
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
+
    total = 0.
    correct = 0.
    for text1, text2, label in zip(dev_texts1, dev_texts2, dev_labels):
        doc1 = nlp(text1)
        doc2 = nlp(text2)
-        sim = doc1.similarity(doc2)
-        if sim.argmax() == label.argmax():
+        sim, _ = doc1.similarity(doc2)
+        if sim == KerasSimilarityShim.entailment_types[label.argmax()]:
            correct += 1
        total += 1
    return correct, total


-def demo():
-    nlp = spacy.load('en',
-                     create_pipeline=create_similarity_pipeline)
-    doc1 = nlp(u'What were the best crime fiction books in 2016?')
-    doc2 = nlp(
-        u'What should I read that was published last year? I like crime stories.')
-    print(doc1)
-    print(doc2)
-    print("Similarity", doc1.similarity(doc2))
+def demo(shape):
+    nlp = spacy.load('en_vectors_web_lg')
+    nlp.add_pipe(KerasSimilarityShim.load(nlp.path / 'similarity', nlp, shape[0]))
+
+    doc1 = nlp(u'The king of France is bald.')
+    doc2 = nlp(u'France has no king.')
+
+    print("Sentence 1:", doc1)
+    print("Sentence 2:", doc2)
+
+    entailment_type, confidence = doc1.similarity(doc2)
+    print("Entailment type:", entailment_type, "(Confidence:", confidence, ")")


LABELS = {'entailment': 0, 'contradiction': 1, 'neutral': 2}
def read_snli(path):
    texts1 = []
    texts2 = []
    labels = []
-    with path.open() as file_:
+    with open(path, 'r') as file_:
        for line in file_:
            eg = json.loads(line)
            label = eg['gold_label']
-            if label == '-':
+            if label == '-':  # per Parikh, ignore - SNLI entries
                continue
            texts1.append(eg['sentence1'])
            texts2.append(eg['sentence2'])
            labels.append(LABELS[label])
-    return texts1, texts2, to_categorical(numpy.asarray(labels, dtype='int32'))
+    return texts1, texts2, to_categorical(np.asarray(labels, dtype='int32'))

+def create_dataset(nlp, texts, hypotheses, num_unk, max_length):
+    sents = texts + hypotheses
+
+    sents_as_ids = []
+    for sent in sents:
+        doc = nlp(sent)
+        word_ids = []
+
+        for i, token in enumerate(doc):
+            # skip odd spaces from tokenizer
+            if token.has_vector and token.vector_norm == 0:
+                continue
+
+            if i > max_length:
+                break
+
+            if token.has_vector:
+                word_ids.append(token.rank + num_unk + 1)
+            else:
+                # if we don't have a vector, pick an OOV entry
+                word_ids.append(token.rank % num_unk + 1)
+
+        # there must be a simpler way of generating padded arrays from lists...
+        word_id_vec = np.zeros((max_length), dtype='int')
+        clipped_len = min(max_length, len(word_ids))
+        word_id_vec[:clipped_len] = word_ids[:clipped_len]
+        sents_as_ids.append(word_id_vec)
+
+    return [np.array(sents_as_ids[:len(texts)]), np.array(sents_as_ids[len(texts):])]
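
# (aside: keras.preprocessing.sequence.pad_sequences can do the manual padding and
#  truncation above in one call, e.g. pad_sequences(word_id_lists, maxlen=max_length,
#  dtype='int', padding='post', truncating='post'); word_id_lists here would be the
#  per-sentence lists collected in the loop)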


@plac.annotations(
    mode=("Mode to execute", "positional", None, str, ["train", "evaluate", "demo"]),
-    train_loc=("Path to training data", "positional", None, Path),
-    dev_loc=("Path to development data", "positional", None, Path),
+    train_loc=("Path to training data", "option", "t", str),
+    dev_loc=("Path to development or test data", "option", "s", str),
    max_length=("Length to truncate sentences", "option", "L", int),
    nr_hidden=("Number of hidden units", "option", "H", int),
    dropout=("Dropout level", "option", "d", float),
-    learn_rate=("Learning rate", "option", "e", float),
+    learn_rate=("Learning rate", "option", "r", float),
    batch_size=("Batch size for neural network training", "option", "b", int),
-    nr_epoch=("Number of training epochs", "option", "i", int),
-    tree_truncate=("Truncate sentences by tree distance", "flag", "T", bool),
-    gru_encode=("Encode sentences with bidirectional GRU", "flag", "E", bool),
+    nr_epoch=("Number of training epochs", "option", "e", int),
+    entail_dir=("Direction of entailment", "option", "D", str, ["both", "left", "right"])
)
def main(mode, train_loc, dev_loc,
-        tree_truncate=False,
-        gru_encode=False,
-        max_length=100,
-        nr_hidden=100,
-        dropout=0.2,
-        learn_rate=0.001,
-        batch_size=100,
-        nr_epoch=5):
+        max_length = 50,
+        nr_hidden = 200,
+        dropout = 0.2,
+        learn_rate = 0.001,
+        batch_size = 1024,
+        nr_epoch = 10,
+        entail_dir="both"):

    shape = (max_length, nr_hidden, 3)
    settings = {
        'lr': learn_rate,
        'dropout': dropout,
        'batch_size': batch_size,
        'nr_epoch': nr_epoch,
-        'tree_truncate': tree_truncate,
-        'gru_encode': gru_encode
+        'entail_dir': entail_dir
    }

    if mode == 'train':
+        if train_loc == None or dev_loc == None:
+            print("Train mode requires paths to training and development data sets.")
+            sys.exit(1)
        train(train_loc, dev_loc, shape, settings)
    elif mode == 'evaluate':
-        correct, total = evaluate(dev_loc)
+        if dev_loc == None:
+            print("Evaluate mode requires paths to test data set.")
+            sys.exit(1)
+        correct, total = evaluate(dev_loc, shape)
        print(correct, '/', total, correct / total)
    else:
-        demo()
+        demo(shape)

if __name__ == '__main__':
    plac.call(main)