LOTUS: Linked Open Text Unleashed
The indexing scripts of LOTUS, an award-winning full-text index to the LOD Laundromat, and the largest available LOD index today.
LOTUS was developed at VU University Amsterdam, as a collaboration between CLTL and the Knowledge Representation & Reasoning departments.
How to create a LOTUS index
The procedure below is roughly stable, but until we clean up the instructions, please get in touch if you are interested in creating your own LOTUS index.
- Install Elasticsearch and start it as a daemon: ./bin/elasticsearch -d
- Install the Marvel monitoring tool (optional, but recommended). Start the instance.
- Clone the LOTUS_Indexer repo
- Set up the server settings (index settings, Java memory, number of open files):
  - export ES_HEAP_SIZE="16096M" (Java heap available to Elasticsearch, roughly 16 GB)
  - export MAX_OPEN_FILES="41000" (raises the maximum number of open files in the file system)
  - Index settings (see and run create_index.sh): make sure the settings fit your system, and set the number of replicas and shards accordingly. Our current setup uses 1 replica and 10 shards, which creates 20 shards in total (10 for the index + 10 for the replica).
- Edit the config file for some general bulk settings (some may overlap with the index creation settings)
- Run the indexer
- Set the refresh interval back to its default
- Set up incrementality (make sure incremental_index.sh works)
- Set up URI rooting if needed
- On 9/9/2015, LOTUS v1.0 contained 5,319,878,204 (5.32B), with a total size of 485.5 GB.
- LOTUS v1.2 has a size of 631 GB.
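The shard arithmetic from the index-settings step can be sketched as below. This is only an illustration, not the contents of create_index.sh: the index name "lotus", the host localhost:9200, and the variable names are assumptions. The script only builds and prints the settings body; the curl calls that would apply it are left as comments.

```shell
#!/bin/sh
# Values from the setup described in the steps above.
PRIMARY_SHARDS=10
REPLICAS=1

# Each replica is a full copy of every primary shard.
TOTAL_SHARDS=$((PRIMARY_SHARDS * (1 + REPLICAS)))
echo "total shards: $TOTAL_SHARDS"

# Settings body for index creation (number_of_shards and number_of_replicas
# are standard Elasticsearch index settings).
SETTINGS=$(printf '{"settings": {"index": {"number_of_shards": %d, "number_of_replicas": %d}}}' \
  "$PRIMARY_SHARDS" "$REPLICAS")
echo "$SETTINGS"

# To apply (assumption: Elasticsearch on localhost:9200, hypothetical index "lotus"):
#   curl -XPUT "http://localhost:9200/lotus" -H 'Content-Type: application/json' -d "$SETTINGS"
# After bulk indexing, the refresh interval can be restored to the
# Elasticsearch default of 1s:
#   curl -XPUT "http://localhost:9200/lotus/_settings" -H 'Content-Type: application/json' \
#        -d '{"index": {"refresh_interval": "1s"}}'
```

With 10 shards and 1 replica this prints a total of 20 shards, matching the setup above. Adjust both values to the number of nodes and the disk space you have available.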
Video from ESWC 2016's presentation on LOTUS http://videolectures.net/eswc2016_ilievski_linked_data/
Research paper from ESWC 2016 http://link.springer.com/chapter/10.1007%2F978-3-319-34129-3_29
Slides from ESWC 2016's presentation on LOTUS http://www.slideshare.net/FilipIlievski1/lotus-adaptive-text-search-for-big-linked-data
Workshop paper from the ISWC 2015 COLD Workshop http://ceur-ws.org/Vol-1426/paper-06.pdf
The LOTUS Semantic Search engine was awarded 2nd place in the European Linked Open Data Competition 2016 (http://2016.semantics.cc/eldc).
Filip Ilievski (firstname.lastname@example.org)