# Introduction: Graph Setup

We will use this notebook to explore `py2neo` and to connect to our linked Neo4j graph. Once connected, we can query the graph and pass the results to our favorite Python data science libraries for further analysis.

In [4]:
from py2neo import Graph

### Neo4j Connection

We connect to a linked Neo4j Docker instance within our bridged network. Because this is a local demonstration, security is not a concern and the credentials are hard-coded.

In [5]:
uri = 'http://neo4j:7474/'
graph = Graph(uri, user='neo4j', password='test', name='cgraph')

### Neosemantics Configuration

We call the Neosemantics procedure `n10s.graphconfig.init()` to initialize the configuration used for dealing with Linked Data, then display the resulting settings.

In [None]:
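# Only initialize the graph config when the graph is empty.
# match_one() returns None when no relationship exists in the graph.
if graph.match_one() is None:
    graph.call.n10s.graphconfig.init()
# Show the active Neosemantics configuration.
graph.call.n10s.graphconfig.show()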

The OmniScience taxonomy provides namespace prefixes. Registering them makes the graph more readable and forces `Neosemantics` to use the appropriate namespace prefixes.

In [None]:
graph.call.apoc.cypher.runFile('/var/lib/neo4j/import/taxonomy/omniscience/setup/namespaces.cypher')

Next, we create a uniqueness constraint on Resources. Resources are defined as nodes that can be dereferenced via their `uri` property.

In [None]:
if not graph.schema.get_uniqueness_constraints('Resource'):
    graph.schema.create_uniqueness_constraint('Resource', 'uri')

### Data Loading

We will use `Neosemantics` to load a taxonomy into the existing bibliometric C-Graph database.

#### Loading OmniScience

In [None]:
# Make sure the taxonomy isn't already loaded.
# graph.run() always returns a Cursor (never None), so use evaluate(),
# which returns None when the query yields no rows.
if graph.evaluate("MATCH (c:skos__Concept) RETURN c LIMIT 1") is None:
    graph.run("CALL n10s.rdf.import.fetch('file:///var/lib/neo4j/import/omniscience/statements.ttl', 'Turtle')")
    print('Successfully loaded OmniScience vocabulary.')
else:
    print('OmniScience taxonomy appears to be loaded.')
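
The cell above prints its success message without inspecting what the import procedure returned. As far as I know, `n10s.rdf.import.fetch` yields a single row with fields such as `terminationStatus`, `triplesLoaded`, and `extraInfo`, and it reports many failures through a `"KO"` status rather than by raising an error. The following is a minimal sketch under that assumption; `describe_import` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from a real run.

```python
def describe_import(row: dict) -> str:
    """Turn one result row of an RDF import into a readable message.

    Assumes the row carries 'terminationStatus', 'triplesLoaded' and
    'extraInfo' keys (an assumption about the n10s result shape).
    """
    status = row.get("terminationStatus")
    if status == "OK":
        return f"Loaded {row.get('triplesLoaded', 0)} triples."
    # Anything other than "OK" is treated as a failed import.
    return f"Import failed ({status}): {row.get('extraInfo', 'no details')}"


# Illustrative rows only; real values come from the database.
ok_row = {"terminationStatus": "OK", "triplesLoaded": 1200}
ko_row = {"terminationStatus": "KO", "triplesLoaded": 0, "extraInfo": "file not found"}

print(describe_import(ok_row))
print(describe_import(ko_row))
```

With `py2neo`, such a row could presumably be obtained by calling `.data()` on the cursor returned by `graph.run(...)` and taking the first element of the resulting list.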

@TODO: Experiment with calculating the h-index of a concept.

In [None]:
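def hIndex(citation_counts: list) -> int:
    """Return the largest h such that h entries have at least h citations."""
    # Sort a copy so the caller's list is not mutated.
    ranked = sorted(citation_counts, reverse=True)
    for index, citation_count in enumerate(ranked):
        # Entry number index + 1 needs at least index + 1 citations to count.
        if index >= citation_count:
            return index
    return len(ranked)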
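
For reference, the h-index of a list of citation counts is the largest `h` such that at least `h` entries have `h` or more citations. The sketch below applies that definition directly by brute force, which makes it a handy oracle for checking a faster sort-based implementation. The name `h_index_by_definition` is hypothetical and the sample counts are made up.

```python
def h_index_by_definition(citation_counts):
    """Largest h such that at least h entries have >= h citations."""
    best = 0
    # h can never exceed the number of entries.
    for h in range(1, len(citation_counts) + 1):
        entries_with_h_or_more = sum(1 for c in citation_counts if c >= h)
        if entries_with_h_or_more >= h:
            best = h
    return best


print(h_index_by_definition([3, 0, 6, 1, 5]))  # 3
print(h_index_by_definition([1, 1]))           # 1
```

Lists with ties at the boundary, such as `[1, 1]`, are the cases most worth checking, since an off-by-one in the comparison only shows up there.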