Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/

Requirements

The toolkit requires the libsqlite3 development files, the Python header files and static libraries, and Cython.

On Ubuntu, these are available as the following packages:
libsqlite3-dev
python-dev
cython

The web UI requires the Python library Flask and, for uWSGI-based deployment, uwsgi and the uwsgi Python plugin.

For Ubuntu, these are available as:
uwsgi
uwsgi-python-plugin
python-flask

Installation

git clone https://github.com/fginter/dep_search.git   
cd dep_search
git submodule init   
git submodule update   
make   

Command line usage

Indexing data

The data needs to be indexed before querying. The index is stored as SQLite databases, and the input data is expected to be in CoNLL-U format.

Indexing is done by build_index.py, which reads the CoNLL-U data from standard input and creates the required databases.

The following command indexes the first 100000 trees from the CoNLL-U file fi-ud-train.conllu and saves the result into the folder fi.data:

cat ../UD_Finnish/fi-ud-train.conllu | python build_index.py --max 100000 -d fi.data  
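To make the notion of "the first N trees" concrete, here is a minimal Python 3 sketch of how a CoNLL-U stream can be split into trees and cut off after a limit. It is not taken from build_index.py: the function names `read_trees` and `first_trees` and the two-sentence sample are hypothetical, and the only thing assumed is the standard CoNLL-U layout, in which the trees are separated by blank lines and comment lines start with `#`.

```python
import io
from itertools import islice


def read_trees(lines):
    """Yield each tree as a list of its lines (comment lines included).

    Hypothetical helper, not part of dep_search: it only relies on
    CoNLL-U trees being separated by blank lines.
    """
    tree = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            tree.append(line)
        elif tree:
            yield tree
            tree = []
    if tree:  # input did not end with a blank line
        yield tree


def first_trees(lines, max_trees):
    """Return at most max_trees trees without reading the rest of the input."""
    return list(islice(read_trees(lines), max_trees))


# Made-up two-tree sample standing in for a real treebank file.
sample = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Cats sleep\n"
    "1\tCats\tcat\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

trees = first_trees(io.StringIO(sample), 1)
print(len(trees), "tree(s) kept")
```

Passing `sys.stdin` instead of the `io.StringIO` sample would mirror the piped usage shown in the command above.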

Querying the data

The data can be queried from the command line using query.py.

The following command runs the query '_ <nsubj _' against the trees indexed in the database(s) located in the folder fi-data. The results are written to standard output in CoNLL-U format. Because the --max argument is set, only the first 50 hits are returned; setting --max 0 removes the limit.

python query.py '_ <nsubj _' --max 50 -d './fi-data/*.db'
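As a rough illustration of what such a query selects, the Python 3 sketch below scans a single CoNLL-U tree for tokens attached to their governor with the relation nsubj. It assumes that `_ <nsubj _` reads as "any token governed by any token via nsubj"; the query language documentation is the authority on the actual semantics. The function `nsubj_dependents` and the sample tree are hypothetical, and this plain scan says nothing about how dep_search itself evaluates queries against its indexed databases.

```python
def nsubj_dependents(tree_lines):
    """Return (dependent_form, governor_form) pairs linked by nsubj.

    Hypothetical helper for illustration only. It matches the relation
    name exactly, so subtypes such as nsubj:pass are not included.
    """
    tokens = {}
    for line in tree_lines:
        if line.startswith("#"):
            continue  # comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens[cols[0]] = cols

    pairs = []
    for cols in tokens.values():
        head, deprel = cols[6], cols[7]  # HEAD and DEPREL columns
        if deprel == "nsubj" and head in tokens:
            pairs.append((cols[1], tokens[head][1]))
    return pairs


# Made-up single-tree sample.
tree = [
    "# text = Dogs bark",
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
]

print(nsubj_dependents(tree))
```

A scan like this can be handy for spot-checking a few returned hits by hand, since query.py writes its results in the same CoNLL-U format.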

Web Interface

The web interface of dep_search has two components: an API, which is part of the dep_search codebase (webapi directory), and a browsable web interface, which can be tested live at http://bionlp-www.utu.fi/dep_search. The code for the web interface is a separate project released at https://github.com/fginter/dep_search_serve.

The instructions for setting everything up are here: https://fginter.github.io/dep_search/

Query Language

The query language is described in detail at: http://bionlp.utu.fi/searchexpressions-new.html

Citations

If you use dep_search in your research, please cite the following papers:

J. Luotolahti & J. Kanerva & S. Pyysalo & F. Ginter. SETS: Scalable and Efficient Tree Search in Dependency Graphs. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. 2015

J. Luotolahti & J. Kanerva & F. Ginter. Dep_search: Efficient Search Tool for Large Dependency Parsebanks. Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa). 2017