Gransk - Document processing for investigations

A tool for when you have a bunch of documents to make sense of. Introduction to Gransk (YouTube)


Gransk is an open source tool that aims to be a Swiss army knife of document processing and analysis. Its primary objective is to quickly give users insight into their documents during investigations. It includes a processing engine written in Python and a web interface. Under the hood it uses Apache Tika for content extraction, Elasticsearch for data indexing, and dfVFS to unpack disk images.

Quickstart

Using VirtualBox:
  1. Download Gransk VM: https://drive.google.com/uc?export=download&id=0B6iPjQOwe4MKOVhma2VhWmpWaEE
  2. Open VirtualBox and click "File" -> "Import appliance". Choose the downloaded VM.
  3. Double-click the imported machine. (Hold shift to run it in the background.)
  4. After a couple of seconds, open a web browser and go to http://localhost:8084
Using Docker on Linux/Mac:
curl -o docker-quickstart.sh -X GET https://raw.githubusercontent.com/pcbje/gransk/master/docker-quickstart.sh
sh ./docker-quickstart.sh
Using Docker on Windows:

Run the following commands in PowerShell:

Invoke-WebRequest https://raw.githubusercontent.com/pcbje/gransk/master/docker-quickstart.ps1 -Outfile docker-quickstart.ps1
powershell -ExecutionPolicy ByPass -File docker-quickstart.ps1

Features

  • Unpack disk images with dfVFS and archives with 7zip
  • Extract metadata and text from documents with Apache Tika
  • Named entity recognition with Polyglot (NER) and Namefinder
  • Entity extraction with regular expressions
  • Simple data statistics
  • Search and explore data with Elasticsearch
  • And more

Processing tested on Python 2.7 and 3.4. The web interface requires a modern web browser.
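
To make the "entity extraction with regular expressions" feature concrete, here is a minimal, self-contained sketch of the general technique. The patterns and the `extract_entities` and `count_by_type` names are assumptions made for illustration; they are not taken from Gransk's code or configuration.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; not Gransk's actual patterns.
# The IPv4 pattern is deliberately loose and does not check octet ranges.
PATTERNS = {
  'email': re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'),
  'ipv4': re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),
}


def extract_entities(text):
  """Return a sorted list of unique (entity_type, value) pairs found in text."""
  found = set()
  for entity_type, pattern in PATTERNS.items():
    for match in pattern.finditer(text):
      found.add((entity_type, match.group(0)))
  return sorted(found)


def count_by_type(entities):
  """Count how many distinct values were found per entity type."""
  return Counter(entity_type for entity_type, _ in entities)


entities = extract_entities(
    u'Contact alice@example.com or bob@example.org from 10.0.0.1.')
print(entities)
print(count_by_type(entities))
```

Collecting matches in a set removes duplicates, so a value mentioned many times in one document is reported once. The per-type count is one simple statistic that can be derived from the result.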

Processing overview

Development

Subscribers

Subscribers are registered in config.yml.

import gransk.core.abstract_subscriber as abstract_subscriber
import gransk.core.helper as helper


class Subscriber(abstract_subscriber.Subscriber):
  CONSUMES = [helper.PROCESS_TEXT]

  def consume(self, doc, text):
    doc.meta['num_chars'] = len(text)
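
The `consume` logic above only touches `doc.meta`, so it can be tried out without a running Gransk pipeline. The sketch below assumes nothing about Gransk internals: `FakeDocument` and `CharCounter` are hypothetical stand-ins written for this example, and `CharCounter` deliberately does not inherit from the real base class.

```python
class FakeDocument(object):
  """Hypothetical stand-in exposing only the `meta` dict used by consume()."""

  def __init__(self):
    self.meta = {}


class CharCounter(object):
  """Same consume() body as the subscriber above, minus the Gransk base class."""

  def consume(self, doc, text):
    # Store the length of the extracted text as document metadata.
    doc.meta['num_chars'] = len(text)


doc = FakeDocument()
CharCounter().consume(doc, u'Hello, Gransk')
print(doc.meta)  # {'num_chars': 13}
```

Keeping subscriber logic this small makes it straightforward to cover with unit tests before registering the real class in config.yml.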

Programmatically adding files

import io

import gransk.api as api
import gransk.core.document as document

gransk = api.API(u'config.yml')

doc = document.get_document(u'filename-or-path.txt')
doc.tag = u'demo'

content = io.BytesIO(b'Data buffer')

gransk.add_file(doc, content)

gransk.stop()
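
The same calls can be used to add a whole directory. The following is a sketch rather than part of Gransk: `iter_files` and `add_directory` are hypothetical helpers, and the Gransk functions are passed in as arguments so the helpers can be run and tested on their own.

```python
import io
import os


def iter_files(root):
  """Yield (path, buffer) pairs for every regular file below root."""
  for dirpath, _dirnames, filenames in os.walk(root):
    for name in sorted(filenames):
      path = os.path.join(dirpath, name)
      # Reads the whole file into memory, like the BytesIO example above.
      with open(path, 'rb') as inp:
        yield path, io.BytesIO(inp.read())


def add_directory(add_file, get_document, root, tag):
  """Pass every file below root to add_file; return the number of files added."""
  count = 0
  for path, content in iter_files(root):
    doc = get_document(path)
    doc.tag = tag
    add_file(doc, content)
    count += 1
  return count
```

Assuming the `gransk` and `document` objects from the example above, a call might look like `add_directory(gransk.add_file, document.get_document, u'/path/to/data', u'demo')`, followed by `gransk.stop()`. Because each file is read fully into memory, this approach is better suited to small files than to large disk images.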

Conventions, code quality and documentation

Processing

autopep8 --indent-size 2 --max-line-length 80 --in-place --recursive --aggressive gransk
py.test --cov-report html --cov gransk gransk
pylint --rcfile=.pylintrc gransk

Web interface

cd gransk/web/tests && npm install && cd ../../../
gransk/web/tests/node_modules/.bin/karma start gransk/web/tests/cover.conf.js
gransk/web/tests/node_modules/.bin/karma start gransk/web/tests/watch.conf.js
jshint gransk/web/static/modules/* gransk/web/tests/spec/modules/*

Continuous integration

https://travis-ci.org/pcbje/gransk

Test build:

docker build -t gransk-prebuilt -f utils/local-travis/Dockerfile .
docker run -v $PWD:/app --entrypoint=python -it gransk-prebuilt utils/local-travis/mock-travis.py

Documentation

http://gransk.readthedocs.io

Generate docs locally:

pip install sphinx sphinx_rtd_theme
sphinx-build -c docs -b html docs/ local_data/build

Building

git clone https://github.com/pcbje/gransk && cd gransk
virtualenv pyenv
source pyenv/bin/activate
pip install -r utils/dfvfs-requirements.txt
pip install -r requirements.txt
python setup.py install
python setup.py download

Processing from command line

python -m gransk.boot.run /path/to/data
python -m gransk.boot.run --help

Using Docker:

docker run -v /path/to/data:/data --entrypoint=python -i -t pcbje/gransk -m gransk.boot.run --workers=4 /data

Starting web UI

python -m gransk.boot.web

Using Docker:

docker run -p 8084:8084 --entrypoint=python -i -t pcbje/gransk -m gransk.boot.ui --host=0.0.0.0

Searching

See es-auto-query.

Licenses

  • dfVFS: Apache License Version 2.0
  • Apache Tika: Apache License Version 2.0
  • 7zip: GNU LGPL
  • Elasticsearch: Apache License Version 2.0
  • Polyglot: GNU GENERAL PUBLIC LICENSE
  • Flask: BSD

Uh, "gransk"?

"Gransk" is the imperative form of "investigate" in Norwegian.