Installation docs for Histograph
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
README.md

README.md

Histograph Installation

This document explains how to install Histograph v0.5 and configure all the essential components Histograph needs to run:

  1. Install Node.js and NPM
  2. Install Neo4j
  3. Install Elasticsearch
  4. Install Redis
  5. Install Histograph
  6. Create and import data files

Note: if you want to install the latest version instead of v0.5, you can remove the --branch v0.5.0 command line argument for all git clone commands.

For setting up a production environment on Amazon Web Services, please see the histograph/aws repository.

Node.js and NPM

With Homebrew:

brew install node

On a Debian or Ubuntu machine:

sudo apt-get install -y nodejs

Neo4j

Installation

Download and install Neo4j, or use your favorite package manager. Or Homebrew:

brew install neo4j

Afterwards, you can start Neo4j by running:

neo4j start

On a Debian or Ubuntu system, Neo4j can be started with the service command:

sudo service neo4j-service start

You can check if Neo4j is installed properly by going to http://localhost:7474.

It is also necessary to manually create a unique constraint and index, by running the following query in Neo4j Cypher console (via http://localhost:7474):

CREATE CONSTRAINT ON (n:_) ASSERT n.id IS UNIQUE

Histograph Neo4j plugin

Histograph depends on a server plugin for some of its graph queries. Before downloading and building the plugin, we need to tell Neo4j to create a /histograph endpoint. Open neo4j-server.properties, and add the following line:

org.neo4j.server.thirdparty_jaxrs_classes=org.waag.histograph.plugins=/histograph

Afterwards, you can install this plugin like this:

git clone --branch v0.5.0 https://github.com/histograph/neo4j-plugin.git
cd neo4j-plugin
./install.sh

This script is for MacOS, on other systems, run mvn package yourself to build the Neo4j plugin, copy the resulting JAR file to Neo4j's plugin directory, and restart Neo4j.

In a Debian install, the plugin directory is located at /usr/share/neo4j.

Elasticsearch

Installation

Install Elasticsearch. With Homebrew, this is easy:

brew install elasticsearch

After installation type brew info elasticsearch to see how you can start Elasticsearch. You can check if Elasticsearch is installed properly by pointing your browser to http://localhost:9200.

Configuration

Add the following lines to elasticsearch.yml:

index.analysis.analyzer.lowercase:
  filter: lowercase
  tokenizer: keyword

Redis

With Homebrew brew install redis, redis.io otherwise.

After installation type brew info redis to see how you can start Redis. You can check if Redis is installed properly by running:

redis-cli

Histograph

Configuration

All Histograph components depend on the histograph-config module, which specifies a set of (overridable) default options. However, some options must always be specified manually: histograph-config loads the default configuration from histograph.default.yml and merges this with a required user-specified configuration file. You can specify the location of your own configuration file in two ways:

  1. Start the Histograph module with the argument --config path/to/histograph/config.yml
  2. Set the HISTOGRAPH_CONFIG environment variable to the path of the configuration file:
export HISTOGRAPH_CONFIG=/path/to/histograph/config.yml

This configuration file should at least specify the following options:

api:
  dataDir: /var/histograph/data   # Directory where API stores data files.
  admin:
    name: histograph              # Default Histograph user, is created
    password: passw🚜rd           # when starting API the first time.

neo4j:
  user: neo4j                     # Neo4j authentication (leave empty when
  password: password              # running Neo4j without authentication)

import:
  dirs:
    - ../data                     # List of directories containing Histograph
    - ...                         # datasets - used by import tool

Please see the histograph-config repository on GitHub to see the default options specified by histograph.default.yml.

Core

Histograph Core reads messages from Redis, and syncs Neo4j and Elasticsearch.

git clone --branch v0.5.0 https://github.com/histograph/core.git
cd core
npm install
node index.js

You can specify the location of your configuration file by specifying its location using the --config argument, or by setting the HISTOGRAPH_CONFIG environment variable.

API

Histograph API exposes a search API, as well as an API to upload and download datasets. The search API reads from Elasticsearch and Neo4j; the dataset API allows users to upload datasets, reads NDJSON files and writes messages to the Redis queue.

git clone --branch v0.5.0 https://github.com/histograph/api.git
cd api
npm install
node index.js

The API can also be started with the --config command line argument:

node index.js --config /path/to/histograph/config.yml

Or, start the API with forever:

forever start -a --uid "api" index.js --prod --config /path/to/histograph/config.yml

Afterwards, the API will be available on http://localhost:3001.

Download or create NDJSON files

A Histograph dataset consists of a directory containing three files: two NDJSON files containing the dataset's PITs and relations, and a JSON file containing dataset metadata.

To use Histograph Data to download and compile a default set of Histograph NDJSON files, do the following:

git clone https://github.com/histograph/data.git
cd data
npm install
node index.js tgn geonames

Import data

With histograph-import you can upload local Histograph datasets to a remote (or local) instance of Histograph API. The import tool looks inside all directories specified by config.import.dirs. Datasets must follow the Histograph dataset naming convention:

  • One dataset per directory;
  • Each directory contains must contain a dataset JSON file, and a PITs or relations NDJSON file (or both):
    • dataset1/dataset1.dataset.json
    • dataset1/dataset1.pits.ndjson
    • dataset1/dataset1.relations.ndjson

For example, the GeoNames dataset looks like this:

  • geonames/geonames.dataset.json
  • geonames/geonames.pits.ndjson
  • geonames/geonames.relations.ndjson

To download and install histograph-import, do the following:

npm install -g histograph-import@0.5.0

Without specifying one or more datasets as command line arguments, running histograph-import will list all available datasets.

Custom ontology and schemas

By default, Histograph's default configuration contains a set of types and relations. You can overwrite these by precifying your own types and relations in your user configuration file.