This document explains how to install Histograph v0.5 and configure all the essential components Histograph needs to run:
- Install Node.js and NPM
- Install Neo4j
- Install Elasticsearch
- Install Redis
- Install Histograph
- Create and import data files
Note: if you want to install the latest version instead of v0.5, remove the --branch v0.5.0 command line argument from all git clone commands.
For setting up a production environment on Amazon Web Services, please see the
Node.js and NPM
brew install node
On a Debian or Ubuntu machine:
sudo apt-get install -y nodejs
Download and install Neo4j, or use your favorite package manager. With Homebrew:
brew install neo4j
Afterwards, you can start Neo4j by running:
On a Debian or Ubuntu system, Neo4j can be started with the
sudo service neo4j-service start
You can check if Neo4j is installed properly by going to http://localhost:7474.
You also need to manually create a unique constraint and index by running the following query in the Neo4j Cypher console (via http://localhost:7474):
CREATE CONSTRAINT ON (n:_) ASSERT n.id IS UNIQUE
Histograph Neo4j plugin
Histograph depends on a server plugin for some of its graph queries. Before downloading and building the plugin, we need to tell Neo4j to create a /histograph endpoint. Open neo4j-server.properties, and add the following line:
Afterwards, you can install this plugin like this:
git clone --branch v0.5.0 https://github.com/histograph/neo4j-plugin.git
cd neo4j-plugin
./install.sh
This script is for macOS. On other systems, run mvn package yourself to build the Neo4j plugin, copy the resulting JAR file to Neo4j's plugin directory, and restart Neo4j.
In a Debian install, the plugin directory is located at
Install Elasticsearch. With Homebrew, this is easy:
brew install elasticsearch
After installation, type brew info elasticsearch to see how you can start Elasticsearch. You can check if Elasticsearch is installed properly by pointing your browser to http://localhost:9200.
Add the following lines to
index.analysis.analyzer.lowercase:
  filter: lowercase
  tokenizer: keyword
Install Redis. With Homebrew, run brew install redis; otherwise, see redis.io.
After installation, type brew info redis to see how you can start Redis. You can check if Redis is installed properly by running:
All Histograph components depend on the histograph-config module, which specifies a set of (overridable) default options. However, some options must always be specified manually: histograph-config loads the default configuration from histograph.default.yml and merges this with a required user-specified configuration file. You can specify the location of your own configuration file in two ways:
- Start the Histograph module with the argument
- Set the HISTOGRAPH_CONFIG environment variable to the path of the configuration file:
This configuration file should at least specify the following options:
api:
  dataDir: /var/histograph/data   # Directory where API stores data files.
  admin:
    name: histograph              # Default Histograph user, is created
    password: passw🚜rd           # when starting API the first time.

neo4j:
  user: neo4j                     # Neo4j authentication (leave empty when
  password: password              # running Neo4j without authentication)

import:
  dirs:
    - ../data                     # List of directories containing Histograph
    - ...                         # datasets - used by import tool
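To make the "defaults merged with a user file" idea concrete, here is a minimal sketch in Node.js. It is not the histograph-config implementation: it assumes both configurations have already been parsed into plain objects, that nested objects are merged recursively, and that arrays and scalar values from the user file replace the defaults. The helper names (mergeConfig, missingOptions) and the REQUIRED list are hypothetical. The neo4j options are left out of the check because the comments above allow them to be empty.

```javascript
// Sketch only: assumes configs are already parsed from YAML into plain objects.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Recursively merge `overrides` into `defaults` without mutating either input.
function mergeConfig(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    out[key] = isPlainObject(value) && isPlainObject(out[key])
      ? mergeConfig(out[key], value)
      : value; // scalars and arrays replace the default outright
  }
  return out;
}

// Hypothetical list of dotted paths, based on the options shown above.
const REQUIRED = ['api.dataDir', 'api.admin.name', 'api.admin.password', 'import.dirs'];

// Return the required paths that are absent, null, or empty in `config`.
function missingOptions(config) {
  return REQUIRED.filter((dotted) => {
    const value = dotted
      .split('.')
      .reduce((node, key) => (isPlainObject(node) ? node[key] : undefined), config);
    return value === undefined || value === null || value === '';
  });
}
```

Calling missingOptions(mergeConfig(defaults, userConfig)) before starting a component would report which of the options above still need to be filled in.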
Please see the histograph-config repository on GitHub to see the default options specified by
Histograph Core reads messages from Redis, and syncs Neo4j and Elasticsearch.
git clone --branch v0.5.0 https://github.com/histograph/core.git
cd core
npm install
node index.js
You can specify the location of your configuration file with the --config argument, or by setting the HISTOGRAPH_CONFIG environment variable.
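The two ways of pointing a component at a configuration file can be pictured with the sketch below. The function name resolveConfigPath is hypothetical, and the precedence (command line argument first, then environment variable) is an assumption rather than documented behavior of histograph-config.

```javascript
// Sketch only: look for `--config <path>` in argv, then fall back to the
// HISTOGRAPH_CONFIG environment variable. Returns null if neither is set.
// The precedence shown here is an assumption.
function resolveConfigPath(argv = process.argv.slice(2), env = process.env) {
  const index = argv.indexOf('--config');
  if (index !== -1 && index + 1 < argv.length) {
    return argv[index + 1];
  }
  return env.HISTOGRAPH_CONFIG || null;
}
```

A null result corresponds to the case where no user configuration was given, which the text above describes as not allowed.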
Histograph API exposes a search API, as well as an API to upload and download datasets. The search API reads from Elasticsearch and Neo4j; the dataset API allows users to upload datasets, reads NDJSON files and writes messages to the Redis queue.
git clone --branch v0.5.0 https://github.com/histograph/api.git
cd api
npm install
node index.js
The API can also be started with the --config command line argument:
node index.js --config /path/to/histograph/config.yml
Or, start the API with forever:
forever start -a --uid "api" index.js --prod --config /path/to/histograph/config.yml
Afterwards, the API will be available on http://localhost:3001.
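As a quick sanity check once everything is started, the following sketch probes the three HTTP addresses mentioned in this guide. It only tests whether something answers over HTTP at each URL; any status code counts as "up", so it does not prove that the service is configured correctly. The names SERVICES, probe and checkAll are hypothetical, and Redis is not covered because it does not speak HTTP.

```javascript
const http = require('http');

// URLs taken from this guide; adjust them if your services listen elsewhere.
const SERVICES = [
  { name: 'Neo4j', url: 'http://localhost:7474' },
  { name: 'Elasticsearch', url: 'http://localhost:9200' },
  { name: 'Histograph API', url: 'http://localhost:3001' }
];

// Resolve to true if anything responds over HTTP at `url`,
// and to false on a connection error or after `timeoutMs`.
function probe(url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the response body
      resolve(true);
    });
    req.on('error', () => resolve(false));
    req.setTimeout(timeoutMs, () => {
      req.destroy();
      resolve(false);
    });
  });
}

// Probe each service in turn and collect { name, up } results.
async function checkAll(services = SERVICES) {
  const results = [];
  for (const service of services) {
    results.push({ name: service.name, up: await probe(service.url) });
  }
  return results;
}
```

Running checkAll().then(console.table) prints one row per service with an up column.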
Download or create NDJSON files
A Histograph dataset consists of a directory containing three files: two NDJSON files containing the dataset's PITs and relations, and a JSON file containing dataset metadata.
- Use Histograph Data to download and create NDJSON files from sources like the Getty Thesaurus of Geographic Names and GeoNames.
- Heritage & Location datasets
- Or, you can create NDJSON files yourself, from your own data.
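If you create NDJSON files yourself, the format itself is simple: one complete JSON document per line, each line terminated by a newline. The sketch below shows that serialization in Node.js. The record fields (id, name) are illustrative placeholders only, not the Histograph PIT schema, and the helper names are hypothetical.

```javascript
// Serialize an array of objects as NDJSON: one JSON document per line.
function toNdjson(records) {
  return records.map((record) => JSON.stringify(record) + '\n').join('');
}

// Parse NDJSON text back into an array of objects, skipping blank lines.
function fromNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical records; the field names are placeholders for illustration.
const examplePits = [
  { id: 'example-1', name: 'Amsterdam' },
  { id: 'example-2', name: 'Rotterdam' }
];
```

Writing the result of toNdjson(examplePits) to a file with fs.writeFileSync yields a two-line NDJSON file.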
To use Histograph Data to download and compile a default set of Histograph NDJSON files, do the following:
git clone https://github.com/histograph/data.git
cd data
npm install
node index.js tgn geonames
With histograph-import you can upload local Histograph datasets to a remote (or local) instance of Histograph API. The import tool looks inside all directories specified by config.import.dirs. Datasets must follow the Histograph dataset naming convention:
- One dataset per directory;
- Each directory must contain a dataset JSON file and a PITs or relations NDJSON file (or both):
For example, the GeoNames dataset looks like this:
To download and install histograph-import, do the following:
npm install -g histograph-import
When run without one or more datasets as command line arguments, histograph-import lists all available datasets.
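The "one dataset per directory" rule suggests how such a listing could work. The sketch below is not the histograph-import code: it assumes that every immediate subdirectory of a directory in config.import.dirs is a candidate dataset, and it does not check the file names inside. The function name listDatasetDirs is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Return the immediate subdirectories of each import directory, sorted.
// Assumption: each such subdirectory holds one dataset.
function listDatasetDirs(importDirs) {
  const found = [];
  for (const dir of importDirs) {
    if (!fs.existsSync(dir)) continue; // skip entries that do not exist on disk
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.isDirectory()) found.push(path.join(dir, entry.name));
    }
  }
  return found.sort();
}
```

With the configuration shown earlier, listDatasetDirs(['../data']) would return the dataset directories created by Histograph Data, if any exist.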
Custom ontology and schemas
Histograph's default configuration contains a set of types and relations. You can override these by specifying your own types and relations in your user configuration file.