# Running JanusGraph locally

In [1]:
from thoth.lab import packages_info

packages_info()

2019-03-01 11:36:17,364 [5616] INFO     root:126: Logging to a Sentry instance is turned off
2019-03-01 11:36:17,365 [5616] INFO     root:148: Logging to rsyslog endpoint is turned off


| | package | version | importable |
|---|---|---|---|
| 0 | thoth.adviser | 0.3.0 | True |
| 1 | thoth.analyzer | 0.1.2 | True |
| 2 | thoth.common | 0.7.1 | True |
| 3 | thoth.lab | 0.0.3 | True |
| 4 | thoth.package_extract | 1.0.1 | True |
| 5 | thoth.python | 0.4.6 | True |
| 6 | thoth.solver | 1.1.0 | True |
| 7 | thoth.storages | 0.9.6 | True |
| 8 | thoth.worker | 0.0.2 | True |


In this notebook, we demonstrate how to run a JanusGraph instance locally and use Jupyter notebooks (such as this one) to access and query data stored in JanusGraph as well as on Ceph (we will use the test environment for this purpose).

Before you run this notebook, consult the [README file of the janusgraph-thoth-config repository](https://github.com/thoth-station/janusgraph-thoth-config#running-janusgraph-instance-locally), which describes how to run a JanusGraph instance locally.

In short, you only need to clone the repository and run the following command in the root of the git repository:

```console
./local.sh all
```

After some time, you should have a running instance of JanusGraph on your workstation with the schema and indexes configured (follow the instructions in the README file for more information). This instance will be configured but empty (no data). Now, let's connect to this instance and run a query to verify that it is running:

In [2]:
from thoth.storages import GraphDatabase
from thoth.lab import GraphQueryResult as gqr

# Instantiate and connect the JanusGraph database
graph = GraphDatabase.create('localhost')
graph.connect()

graph.is_connected()

True

In [3]:
gqr(graph.g.V().count().next()).result

0

As you can see, there are exactly zero vertices. Let's sync some data into this instance:

In [4]:
from thoth.storages import sync_solver_documents

help(sync_solver_documents)

Help on function sync_solver_documents in module thoth.storages.sync:

sync_solver_documents(document_ids:list=None, force:bool=False, graceful:bool=False, graph:thoth.storages.graph.janusgraph.GraphDatabase=None) -> tuple
    Sync solver documents into graph.



The function shown above can sync solver documents (referenced by their document ids) into a JanusGraph instance. If no connected `GraphDatabase` adapter is passed to the function call, it will transparently pick up the JanusGraph configuration from the environment and instantiate a `GraphDatabase` adapter to perform the sync.

Now, let's get some ids of solver documents we would like to have present in the JanusGraph database. We will explicitly state we are interested in solver documents present in the `thoth-test-core` deployment.

In [5]:
from thoth.storages import SolverResultsStore

solver_store = SolverResultsStore(deployment_name='thoth-test-core')
solver_store.connect()
solver_store.is_connected()

True

The next step is to find the documents we are interested in. Note that this can take a long time if there are many documents, since each document has to be downloaded from a remote Ceph instance.

In [6]:
def get_document_for_package(package_name: str):
    """Get documents which correspond to package.

    The argument package_name is used as a prefix, so "tensor" also matches "tensorflow".
    """
    result = []
    for document_id, document in solver_store.iterate_results():
        if document['metadata']['arguments']['pypi']['requirements'].startswith(package_name):
            result.append(document_id)

    return result
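
The prefix filter above can be tried out without a Ceph connection. The following is a minimal sketch, assuming documents follow the same `metadata.arguments.pypi.requirements` layout accessed above; the `filter_by_prefix` helper and the sample documents are hypothetical and are not part of `thoth.storages`.

```python
from typing import Dict, Iterable, List, Tuple


def filter_by_prefix(results: Iterable[Tuple[str, Dict]], package_name: str) -> List[str]:
    """Return ids of documents whose requirements string starts with package_name."""
    matching = []
    for document_id, document in results:
        requirements = document["metadata"]["arguments"]["pypi"]["requirements"]
        if requirements.startswith(package_name):
            matching.append(document_id)
    return matching


# Hypothetical in-memory documents standing in for solver_store.iterate_results().
sample_results = [
    ("solver-a", {"metadata": {"arguments": {"pypi": {"requirements": "tensorflow"}}}}),
    ("solver-b", {"metadata": {"arguments": {"pypi": {"requirements": "flask"}}}}),
    ("solver-c", {"metadata": {"arguments": {"pypi": {"requirements": "tensorboard"}}}}),
]

# Both "tensorflow" and "tensorboard" match the prefix "tensor".
print(filter_by_prefix(sample_results, "tensor"))
```

Because the match is a prefix match, a short prefix can select more documents than intended; passing the full package name narrows the result.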

In [7]:
solver_store.get_document_count()

3034

In [8]:
document_ids = get_document_for_package("tensorflow")

In [9]:
document_ids

['solver-fedora-28-py36-e61843206520991f']

In [10]:
sync_solver_documents(document_ids, graph=graph)

After the step above, the solver documents referenced by the ids in the `document_ids` list are synced into JanusGraph. You can verify this by querying the JanusGraph instance.

In [11]:
help(graph.get_all_versions_python_package)

Help on method get_all_versions_python_package in module thoth.storages.graph.janusgraph:

get_all_versions_python_package(package_name:str, index_url:str=None, *, os_name:str=None, os_version:str=None, python_version:str=None, without_error:bool=False) -> List[tuple] method of thoth.storages.graph.janusgraph.GraphDatabase instance
    Get all versions available for a Python package.



In [12]:
graph.get_all_versions_python_package("tensorflow")

[('1.9.0',
  'https://tensorflow.pypi.thoth-station.ninja/index/rhel7.5/jemalloc/simple'),
 ('1.9.0',
  'https://tensorflow.pypi.thoth-station.ninja/index/centos7/jemalloc/simple'),
 ('1.9.0',
  'https://tensorflow.pypi.thoth-station.ninja/index/fedora28/jemalloc/simple'),
 ('1.9.0', 'https://pypi.org/simple'),
 ('1.9.0',
  'https://tensorflow.pypi.thoth-station.ninja/index/fedora27/jemalloc/simple'),
 ('1.9.0',
  'https://tensorflow.pypi.thoth-station.ninja/index/rhel7.5/cuda9.2+jemalloc/simple'),
 ('1.9.0',
  'https://tensorflow.pypi.thoth-station.ninja/index/fedora26/jemalloc/simple')]
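
Judging by the output above, each tuple appears to pair a package version with the index URL it was found on. Under that assumption, a small sketch of grouping index URLs by version could look as follows; `group_indexes_by_version` is a hypothetical helper, and the sample list is a subset of the output shown above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_indexes_by_version(versions: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each version to the sorted list of index URLs providing it."""
    grouped = defaultdict(list)
    for version, index_url in versions:
        grouped[version].append(index_url)
    return {version: sorted(urls) for version, urls in grouped.items()}


# Subset of the query result shown above, assumed to be (version, index_url) pairs.
sample = [
    ("1.9.0", "https://tensorflow.pypi.thoth-station.ninja/index/fedora28/jemalloc/simple"),
    ("1.9.0", "https://pypi.org/simple"),
]

grouped = group_indexes_by_version(sample)
for version, urls in grouped.items():
    print(version, "is available on", len(urls), "indexes")
```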

You can find more functions that perform a sync in the `thoth.storages.sync` module. All of them return a tuple with the number of processed, synced, skipped, and failed documents:

In [13]:
from thoth.storages.sync import sync_inspection_documents
from thoth.storages.sync import sync_analysis_documents
from thoth.storages.sync import sync_solver_documents
from thoth.storages.sync import sync_adviser_documents
from thoth.storages.sync import sync_provenance_checker_documents
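
The returned tuple can be unpacked into named fields to make a sync report easier to read. This is only a sketch: it assumes the tuple holds four integers in the order stated above (processed, synced, skipped, failed), and the `SyncStats` type, the `describe_sync` helper, and the example values are hypothetical.

```python
from typing import NamedTuple, Tuple


class SyncStats(NamedTuple):
    """Named view of a sync result, assuming the order described in the text."""

    processed: int
    synced: int
    skipped: int
    failed: int


def describe_sync(result: Tuple[int, int, int, int]) -> str:
    """Format a sync result tuple as a one-line summary."""
    stats = SyncStats(*result)
    return (
        f"processed={stats.processed} synced={stats.synced} "
        f"skipped={stats.skipped} failed={stats.failed}"
    )


# Hypothetical values; in this notebook the tuple would come from a call such as
# sync_solver_documents(document_ids, graph=graph).
example_result = (1, 1, 0, 0)
print(describe_sync(example_result))
```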

If you have a local copy of a document on your hard disk or in memory, you can still sync it into JanusGraph. However, you will need to use the lower-level methods provided by the `GraphDatabase` adapter, namely:

In [14]:
help(graph.sync_adviser_result)

Help on method sync_adviser_result in module thoth.storages.graph.janusgraph:

sync_adviser_result(document:dict) -> None method of thoth.storages.graph.janusgraph.GraphDatabase instance
    Sync adviser result into graph database.



In [15]:
help(graph.sync_analysis_result)

Help on method sync_analysis_result in module thoth.storages.graph.janusgraph:

sync_analysis_result(document:dict) -> None method of thoth.storages.graph.janusgraph.GraphDatabase instance
    Sync the given analysis result to the graph database.



In [16]:
help(graph.sync_inspection_result)

Help on method sync_inspection_result in module thoth.storages.graph.janusgraph:

sync_inspection_result(document) -> None method of thoth.storages.graph.janusgraph.GraphDatabase instance
    Sync the given inspection document into the graph database.



In [17]:
help(graph.sync_provenance_checker_result)

Help on method sync_provenance_checker_result in module thoth.storages.graph.janusgraph:

sync_provenance_checker_result(document:dict) -> None method of thoth.storages.graph.janusgraph.GraphDatabase instance
    Sync provenance checker results into graph database.



In [18]:
help(graph.sync_solver_result)

Help on method sync_solver_result in module thoth.storages.graph.janusgraph:

sync_solver_result(document:dict) -> None method of thoth.storages.graph.janusgraph.GraphDatabase instance
    Sync the given solver result to the graph database.



All of the methods shown above accept a document; the document id is automatically derived from the metadata stated in the provided document (documents must respect the result schema). A simple snippet showing a call to one of the above methods:

In [19]:
import json
from pathlib import Path

solver_document = json.loads(Path("/home/fpokorny/git/thoth-station/data/solver-example.json").read_text())
graph.sync_solver_result(solver_document)
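
To sync several local files in one go, the same pattern can be wrapped in a loop. The sketch below is one possible way to do it, not part of `thoth.storages`: `sync_local_documents` is a hypothetical helper that takes the sync method as a callable, and the demonstration uses a temporary directory and a plain list in place of a real `GraphDatabase` adapter.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def sync_local_documents(directory: Path, sync_fn: Callable[[dict], None]) -> List[str]:
    """Load every *.json file in directory and pass the parsed document to sync_fn."""
    synced_files = []
    for path in sorted(directory.glob("*.json")):
        document = json.loads(path.read_text())
        sync_fn(document)
        synced_files.append(path.name)
    return synced_files


# Demonstration with a temporary directory; collected.append stands in for a
# method such as graph.sync_solver_result.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "solver-1.json").write_text(json.dumps({"metadata": {}}))
    collected = []
    names = sync_local_documents(tmp_path, collected.append)

print(names)
```

In this notebook, the second argument would be one of the adapter methods listed above, for example `graph.sync_solver_result`, matched to the type of documents stored in the directory.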

Feel free to experiment with syncing the data you need for your development, adjusting the JanusGraph schema and indexes, or running Thoth components against your local JanusGraph database (see the referenced README file for how to do all of that).

... and that's it. Enjoy! ;-)