Sage-Bionetworks/synapsegraphdb

Introduction

Synapse provides a means of recording provenance as a graph, thus enabling a formal way of documenting work performed and the ability to assign credit to a specific Synapse user for performing it. However, the current implementation does not provide mechanisms to search or discover structures in the provenance graph across different activities.

This repository provides the mechanisms for loading Synapse provenance information into a graph database, which organizes data so that relationships are prioritized. Those relationships can be exploited through queries that consider the nodes and the connections between them. By loading this information into a graph database, users gain a flexible means of tracking, searching, and visualizing provenance.

Here, we use the Neo4j graph database.

For all questions, suggestions, or inquiries, please open an issue.

Installation

This repository contains a Python requirements.txt file listing the packages to be installed with pip. To install these dependencies, run pip install -r requirements.txt.

Download Neo4j for free from https://neo4j.com/download/ and follow the online instructions there to access the Neo4j browser.

Users must have Neo4j installed on a local or remote machine, with their login information stored in a JSON file as follows:

{
    "machine": "your-machine",
    "username": "your-username",
    "password": "your-password"
}
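
As a minimal sketch of how such a login file could be read and checked before connecting, the snippet below loads the JSON and verifies that the three keys shown above are present. The function name load_neo4j_credentials and the REQUIRED_KEYS constant are hypothetical and are not taken from the scripts in this repository. Note that JSON requires plain double quotes (") around keys and values, so a file containing typographic quotes will fail to parse.

```python
import json

# Keys expected in the login file, as shown in the example above.
REQUIRED_KEYS = ("machine", "username", "password")


def load_neo4j_credentials(path):
    """Read the Neo4j login file and check that the expected keys are present.

    Hypothetical helper for illustration; not part of this repository.
    """
    with open(path, encoding="utf-8") as handle:
        credentials = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError("credentials file is missing: " + ", ".join(missing))
    return credentials
```

Checking the keys up front means a misnamed or absent field is reported immediately rather than surfacing later as a connection error.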

Users must also have an active Synapse account.

Usage

The scripts in this repository are used to load data from any Synapse project to your graph database.

  • activities2Graph.py is a wrapper that takes a list of Synapse IDs for one or more projects. It sequentially retrieves information on all entities, activities, and their provenance, writes this information to a JSON file, and then loads the data directly into your Neo4j database.
  • load2Neo4j.py is a script that takes the JSON file produced by convertActivities2Graph.py and loads the data it contains into your Neo4j database.
  • convertSynapse2Graph.py is a script, to be used sparingly, for uploading file entities and activities from all projects in Synapse. Its output is a JSON file that can be loaded into your local or remote Neo4j database using load2Neo4j.py. This script is a modification of this gist.

See examples of useful Cypher queries here.
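
As a rough illustration of the kind of provenance search a graph database makes possible, the sketch below runs a variable-length path query from Python. It rests on assumptions that are not confirmed by this repository: that nodes store their Synapse ID in a property named synId, that relationships point from a source toward what was derived from it, and that session is a session object from the official Neo4j Python driver. The names DOWNSTREAM_QUERY and find_downstream are hypothetical.

```python
# Assumption: nodes carry their Synapse ID in a property called `synId`,
# and relationships point from a source toward what was derived from it.
DOWNSTREAM_QUERY = (
    "MATCH (source {synId: $syn_id})-[*1..5]->(downstream) "
    "RETURN DISTINCT downstream"
)


def find_downstream(session, syn_id):
    """Return the nodes reachable within five hops of the given node.

    `session` is expected to behave like a Neo4j driver session, i.e. to
    expose run(query, **parameters) and yield records indexable by name.
    """
    result = session.run(DOWNSTREAM_QUERY, syn_id=syn_id)
    return [record["downstream"] for record in result]
```

Passing the Synapse ID as a query parameter, rather than formatting it into the query string, avoids quoting problems and lets the database reuse the query plan.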
