GitHub - liquidinvestigations/hoover-search: Backend for the search engine service in Liquid Investigations.

Hoover is a search tool for large collections of documents. It glues together proven open-source technologies like elasticsearch and Apache Tika to aid the work of investigative journalists.

Searching is done through a user-friendly web interface that leverages Lucene's rich query syntax. Hoover also provides an API to run queries using the elasticsearch query DSL.
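As a rough illustration of what a query DSL request body can look like, the sketch below builds one in Python using elasticsearch's `query_string` query, which accepts Lucene syntax. The helper name `build_query_body` and the example search text are hypothetical, and the Hoover API endpoint that would receive the body is not specified here, so the sketch only constructs and prints it.

```python
import json


def build_query_body(text: str, size: int = 10) -> dict:
    """Build an elasticsearch query DSL body (hypothetical helper, not part of Hoover)."""
    return {
        # query_string parses the text using Lucene query syntax.
        "query": {"query_string": {"query": text}},
        # Maximum number of hits to return.
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(build_query_body("budget AND report"), indent=2))
```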

Installation

Use Liquid Investigations

Development

There is a test suite; run it with ./run testsuite on the hoover-search container.

Running in production

Waitress is installed as part of the dependencies. It is a production-quality, threaded WSGI server. It does not daemonize, so you can start it from supervisor or another daemon manager. Pick a port number, say 8888, and run it like this:

./run server --host=127.0.0.1 --port=8888

Then you probably want to set up a reverse proxy in front of the app. Here's the minimal nginx config:

location / {
  proxy_pass http://localhost:8888;
  proxy_set_header Host $host;
  proxy_set_header X-Forwarded-Proto $scheme;
}

Configuration

To customize hoover's behaviour you can set the following Django settings in hoover/site/settings/local.py:

HOOVER_HYPOTHESIS_EMBED_URL: The URL to embed the Hypothesis client, e.g. https://hypothes.is/embed.js
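A minimal sketch of such a local settings file, assuming it only needs this one assignment; the value is the example URL given above, and any other settings your deployment requires are not shown.

```python
# hoover/site/settings/local.py
# Sketch only: the setting name and example URL come from the text above.
HOOVER_HYPOTHESIS_EMBED_URL = "https://hypothes.is/embed.js"
```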

Snoop and external collections

For a large dataset, it's not practical to upload files through the admin UI, so you can use hoover-snoop instead. It's a tool for pre-processing a collection, extracting metadata from emails and documents, and accessing the contents of archives and email attachments. Snoop comes as a standalone Django app. It listens on an HTTP port, where it serves document previews and raw documents, and it handles indexing of documents in elasticsearch by itself.

To use it with hoover-search, first set up the snoop service, analyze the data, and send it to elasticsearch. Then go back to hoover-search and create a new collection of type External with the following options:

{
  "documents": "http://localhost:8001/doc",
  "renderDocument": true
}

The documents URL is composed of the URL of hoover-snoop (http://localhost:8001 in this example) followed by /doc.

renderDocument tells hoover-search to use the new doc.html view from hoover-ui to render the document preview pages. If you're not using hoover-ui then omit this flag.
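The options above can also be generated rather than typed by hand. The following is a sketch under stated assumptions: `external_collection_options` is a hypothetical helper, not part of hoover-search, and it only applies the two rules described above (append /doc to the snoop URL, and include renderDocument only when hoover-ui is in use).

```python
import json


def external_collection_options(snoop_url: str, render_document: bool = True) -> str:
    """Return the options JSON for an External collection (hypothetical helper)."""
    # The documents URL is the hoover-snoop base URL followed by /doc.
    options = {"documents": snoop_url.rstrip("/") + "/doc"}
    # Omit the flag entirely when hoover-ui is not used for previews.
    if render_document:
        options["renderDocument"] = True
    return json.dumps(options, indent=2)


if __name__ == "__main__":
    print(external_collection_options("http://localhost:8001"))
```

Pass `render_document=False` if you are not using hoover-ui.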

Run tests locally

Install the Drone CLI binary from the Drone website and put it on your PATH. Install the latest version of Docker CE.

Then, run ./run-tests with arguments you'd normally pass to py.test, like this:

./run-tests -vvv -x -k ratelimits

The test run creates a docker-setup directory. Once the tests have finished, delete it with sudo rm -r docker-setup.

