Entities Search Engine

⚠️ This repository has been archived: the inventaire server itself now takes care of keeping the Elasticsearch entities and wikidata indexes up to date

Scripts and microservice to feed an ElasticSearch instance with Wikidata and Inventaire entities (see entities map) and keep those up-to-date, in order to answer questions like "give me all the humans with a name starting with xxx" in a super snappy way, typically for the needs of an autocomplete field.

For the Wikidata-only version, see the archived #wikidata-subset-search-engine branch.

Summary

Setup
- Dependencies
- Start server
Data imports
- from scratch
- importing dumps
Query ElasticSearch
References
Donate
See Also
You may also like
License

Setup

see SETUP.md

Dependencies

see SETUP.md to install dependencies:

- NodeJs >= v6.4
- ElasticSearch (this repo was developed targeting ElasticSearch v2.4, but it should work with newer versions with some minimal changes)
- Nginx
- Let's Encrypt
- already installed on any good *nix system: curl, gzip

Start server

see Wikidata and Inventaire per-entity import

Data imports

from scratch

add

Wikidata entities

3 ways to import Wikidata entities data into your ElasticSearch instance

Inventaire entities

update

To update an entity, simply re-add it, typically by posting its URI (e.g. 'wd:Q180736' for a Wikidata entity, or 'inv:9cf5fbb9affab552cd4fb77712970141' for an Inventaire one) to the server.

remove

To un-index entities that were mistakenly added, pass the path of a results JSON file, expected to contain an array of ids. The documents matching those ids will all be deleted:

index=wikidata
type=humans
ids_json_array=./queries/results/mistakenly_added_wikidata_humans_ids.json
npm run delete-from-results $index $type $ids_json_array

index=entities-prod
type=works
ids_json_array=./queries/results/mistakenly_added_inventaire_works_ids.json
npm run delete-from-results $index $type $ids_json_array
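
The results file is only described as a JSON array of ids, so a minimal sketch of producing one by hand might look like the following. The `ids_to_json_array` helper and the output file name are hypothetical and not part of this repo. Whether the ids should carry the `wd:`/`inv:` prefix is not stated here, so the values below are placeholders.

```sh
# Hypothetical helper (not part of this repo): print its arguments
# as a JSON array of strings on a single line.
# Assumes the ids contain no characters that need JSON escaping
# (double quotes, backslashes, control characters).
ids_to_json_array() {
  sep=''
  printf '['
  for id in "$@"; do
    printf '%s"%s"' "$sep" "$id"
    sep=','
  done
  printf ']\n'
}

# Example usage (placeholder ids and file name):
# ids_to_json_array Q180736 Q535 > ./queries/results/my_ids_to_delete.json
```

The resulting file path can then be passed as `$ids_json_array` to the `delete-from-results` commands above.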

importing dumps

You can import dumps from the inventaire.io production ElasticSearch instance:

# Download Wikidata dump
wget -c https://dumps.inventaire.io/wd/elasticsearch/wikidata_data.json.gz
gzip -d wikidata_data.json.gz
# elasticdump should have been installed when running `npm install`
# --limit: increase the batch size
./node_modules/.bin/elasticdump --input=./wikidata_data.json --output=http://localhost:9200/wikidata --limit 2000

# Same for Inventaire
wget -c https://dumps.inventaire.io/inv/elasticsearch/entities_data.json.gz
gzip -d entities_data.json.gz
./node_modules/.bin/elasticdump --input=./entities_data.json --output=http://localhost:9200/entities --limit 2000
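
To check that an import actually loaded documents, one option is to ask ElasticSearch for the document count of each index through its `_count` endpoint. The sketch below assumes the same `localhost:9200` address as the commands above. The `ES_HOST` variable and the `parse_count` and `index_count` helpers are names made up for this example.

```sh
# Assumption: ElasticSearch listens on localhost:9200, as in the commands above.
# ES_HOST is a hypothetical override, not something this repo defines.
es_host="${ES_HOST:-http://localhost:9200}"

# Extract the numeric "count" field from a _count API response read on stdin.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in the index passed as first argument.
index_count() {
  curl -s "$es_host/$1/_count" | parse_count
}

# Example usage, once the imports have finished:
# index_count wikidata
# index_count entities
```

A count of 0, or no output at all, would suggest that the dump was not loaded into the expected index.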

Query ElasticSearch

curl "http://localhost:9200/wikidata/humans/_search?q=Victor%20Hugo"
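
For the autocomplete use case described in the introduction, one possible approach is to append a `*` wildcard to the user's input in the same URI search. This is only a sketch under assumptions: `urlencode` and `prefix_search_url` are hypothetical helpers, the encoder only handles ASCII input, and how well wildcard matching behaves depends on the index mappings, which are not described here.

```sh
# Assumption: ElasticSearch listens on localhost:9200, as in the query above.
# ES_HOST is a hypothetical override, not something this repo defines.
es_host="${ES_HOST:-http://localhost:9200}"

# Percent-encode a string, one character at a time.
# Sketch limitation: only ASCII input is encoded correctly.
urlencode() {
  s="$1"
  out=''
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything but the first character
    c="${s%"$rest"}"     # the first character
    s="$rest"
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s' "$out"
}

# Build a URI-search URL matching terms that start with the given input.
# Arguments: index, type, user input.
prefix_search_url() {
  printf '%s/%s/%s/_search?q=%s' "$es_host" "$1" "$2" "$(urlencode "$3*")"
}

# Example usage:
# curl "$(prefix_search_url wikidata humans 'Victor Hu')"
```

With multi-word input, the query string syntax treats each word as a separate term, so only the last word is matched as a prefix.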

References

ElasticSearch Search API

Donate

We are developing and maintaining tools to work with Wikidata from NodeJS, the browser, or simply the command line, with quality and ease of use at heart. Any donation will be interpreted as a "please keep going, your work is very much needed and awesome. PS: love". Donate

License

AGPL-3.0
