Phrase Matcher Demo

This is a simple demo of the new lookup table feature in rasa_nlu. See the blog post accompanying this repository.

The goal is to show how lookup tables may improve entity extraction under certain conditions and to give some advice on using this feature effectively.
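For readers new to the feature, the idea behind a lookup table can be sketched as a single regular expression built from the table entries and matched against each message. This is only an illustration under simplifying assumptions, not rasa_nlu's own implementation; the helper names `build_lookup_pattern` and `find_matches` and the sample entries are hypothetical.

```python
import re


def build_lookup_pattern(elements):
    """Compile lookup-table entries into one case-insensitive regex (hypothetical helper)."""
    # Escape each entry so characters such as "." or "+" are matched literally.
    escaped = [re.escape(e.strip()) for e in elements if e.strip()]
    # Put longer entries first so "fried rice" is preferred over "rice".
    escaped.sort(key=len, reverse=True)
    # Word boundaries keep entries from matching inside longer words.
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_matches(pattern, text):
    """Return (start, end, matched_text) for every lookup hit in text."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


foods = ["pizza", "fried rice", "rice"]  # made-up lookup entries
pattern = build_lookup_pattern(foods)
matches = find_matches(pattern, "I want Fried Rice and a pizza")
print(matches)
```

In a real pipeline a hit like this would typically be passed to the entity extractor as one feature among several rather than being treated as a final label.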

This repo contains two demos:

A simple restaurant example with very few training examples and only one entity.
A medium-sized company name extraction example with a few thousand examples and several entities.

Running the demo.

No installation is necessary beyond rasa_nlu itself, which must be version 0.13.3 or above.

To run one or both of the demos:

python run_lookup.py <demo_key>

where <demo_key> is one of {food, company}. If <demo_key> is omitted, both demos are run.

Code Structure

data/ holds the training data and lookup tables for each of the demos.

models/ is where the models are persisted.

configs/ holds the rasa_nlu configs to do the baseline evaluation and the lookup table evaluation.

img/ stores plots and outputs from the runs.

Cleaning lookup tables

The script filter_lookup.py may be used to clean up lookup tables by removing any elements that match with a cross-list.

You can call this script by running

python filter_lookup.py <lookup_in> <cross_list> <lookup_out>

<lookup_in> is a lookup table with newline-separated elements.

<cross_list> is either a comma-separated or a newline-separated list of elements that you'd like to remove from <lookup_in>.

<lookup_out> is the name of the file that you'd like to write the filtered list to.
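The filtering step described above can be sketched as follows. This is a minimal illustration, not the code in filter_lookup.py: it assumes that "match" means an exact, case-insensitive comparison after trimming whitespace, and the function names `parse_cross_list` and `filter_lookup` are hypothetical.

```python
import re


def parse_cross_list(raw):
    """Split a comma- or newline-separated string into a set of lowercase items."""
    return {item.strip().lower() for item in re.split(r"[,\n]", raw) if item.strip()}


def filter_lookup(lookup_lines, cross_items):
    """Keep lookup entries that do not appear in the cross-list.

    Assumption: matching is exact and case-insensitive; the real script
    may use a different rule.
    """
    kept = []
    for line in lookup_lines:
        entry = line.strip()
        if entry and entry.lower() not in cross_items:
            kept.append(entry)
    return kept


# In-memory stand-ins for <lookup_in> and <cross_list>.
lookup_in = ["Apple", "Acme Corp", "Target", ""]
cross_items = parse_cross_list("apple, target\norange")
filtered = filter_lookup(lookup_in, cross_items)
print(filtered)
```

Removing common words such as "apple" or "target" in this way helps keep a company lookup table from firing on ordinary vocabulary.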

Speed Testing

We include the directory speed_test/ for testing the speed of training as a function of the lookup table size.

This generates random lookup tables and times each component of the training and evaluation process. We use the company dataset data/company/company_train_lookup.py.

cd speed_test
python time_lookups.py

See speed_test/README.md for more details.
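The general shape of such a timing experiment can be sketched as below. This is not time_lookups.py: as a stand-in for the rasa_nlu training pipeline it times only the compilation and matching of a plain regular expression, and the names `random_lookup` and `time_lookup`, the table sizes, and the sample text are all assumptions made for illustration.

```python
import random
import re
import string
import time


def random_lookup(size, word_len=8, seed=0):
    """Generate `size` random lowercase strings to act as a lookup table."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(word_len))
        for _ in range(size)
    ]


def time_lookup(size, text):
    """Time regex compilation plus one matching pass for a table of `size` entries."""
    table = random_lookup(size)
    start = time.perf_counter()
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    pattern.findall(text)
    return time.perf_counter() - start


text = "book a table at some restaurant " * 50  # made-up sample text
timings = {size: time_lookup(size, text) for size in (10, 100, 1000)}
for size, seconds in timings.items():
    print(f"{size:>5} entries: {seconds:.4f} s")
```

Absolute numbers from a sketch like this depend on the machine; only the trend across table sizes is meaningful.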

Ngrams

A simple ngrams tester is included and can be run by

python run_ngrams.py

This loads two lookup tables, data/company/pos_ngrams.txt and data/company/neg_ngrams.txt, each containing ngrams that were found to be influential in classifying phrases as company names. We then compute the F1 score as a function of random noise injected into the entities. The 'noise' value is the probability that any given character of a company entity in the test set is flipped.

This gives the following plot
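The noise injection described above can be sketched as follows. This is an illustration rather than the code in run_ngrams.py: it assumes a "flip" means replacing the character with a random lowercase letter, and the function name `add_noise` is hypothetical.

```python
import random
import string


def add_noise(entity, noise, rng):
    """Replace each character with a random lowercase letter with probability `noise`.

    Assumption: this is one possible reading of a "character flip";
    the actual script may define it differently.
    """
    chars = []
    for ch in entity:
        if rng.random() < noise:
            chars.append(rng.choice(string.ascii_lowercase))
        else:
            chars.append(ch)
    return "".join(chars)


rng = random.Random(42)  # fixed seed so runs are repeatable
clean = add_noise("Acme Corp", 0.0, rng)      # noise 0 leaves the entity unchanged
scrambled = add_noise("Acme Corp", 1.0, rng)  # noise 1 replaces every character
print(clean, "->", scrambled)
```

Sweeping `noise` from 0 to 1 and re-scoring the test set at each value would give the kind of F1-versus-noise curve the plot shows.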

sujeongHeo/rasa_lookup_demo
