## Graph Learning Experiment

Since we are running on a local machine, we can't realistically query all 1.5+ million triples in full-c-graph.db. Instead, using a subset of the Cypher queries from the kd-graph-updater (or potentially new imports), we will try to learn something about this knowledge set.

It is worth noting that, because we are using a significantly smaller dataset, neural networks are unlikely to be effective. We will instead try traditional ML techniques, random walks, and other methods supplied in the Neo4j Graph Data Science library.

In [1]:
from py2neo import Graph

Neo4j graph connector: we supply the server URL and credentials (none for now).

In [3]:
uri = 'http://neo4j:7474/'
# Credentials are left empty for now
graph = Graph(uri, user='', password='')

Let's define a wrapper for our graph object so we can create and tear down nodes and relationships when necessary.

In [4]:
class CGraphWrapper(object):
    
    def __init__(self, graph: Graph):
        self._graph = graph
        self._exports = ['export.auth-citations',
                'export.authors-of-article',
                'export.authors-of-work',
                'export.co-authors',
                'export.journal-articles',
                'export.organizations']
    
    @property
    def graph(self):
        return self._graph
    
    def load_schemas(self):
        """Run the schema file for each export in a single transaction."""
        tx = self._graph.begin()
        for export in self._exports:
            tx.run('CALL apoc.cypher.runSchemaFile("/var/lib/neo4j/import/schema/%s.schema.cypher")' % export)
        tx.commit()
    
    def load_data(self):
        """Run the node and relationship load files for each export in a single transaction."""
        tx = self._graph.begin()
        for export in self._exports:
            tx.run('''
                    CALL apoc.cypher.runFiles(["/var/lib/neo4j/import/load/%s.nodes.cypher",
                   "/var/lib/neo4j/import/load/%s.relationships.cypher"])
                   '''
                    % (export, export))
        tx.commit()
    
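    def clear_data(self):
        """Delete all nodes and relationships from the graph."""
        # delete_all() is a method on the py2neo Graph object itself
        self._graph.delete_all()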
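
To check what `load_data` actually sends to Neo4j, the string formatting can be reproduced outside the class. This is only a sketch: `LOAD_TEMPLATE` and `build_load_query` are hypothetical names that are not part of `CGraphWrapper`, and the whitespace is tidied compared with the triple-quoted string above.

```python
# Hypothetical helper (not part of CGraphWrapper) that mirrors the string
# formatting used in load_data, so the generated Cypher can be inspected
# without a running Neo4j instance.
LOAD_TEMPLATE = (
    'CALL apoc.cypher.runFiles(['
    '"/var/lib/neo4j/import/load/%s.nodes.cypher", '
    '"/var/lib/neo4j/import/load/%s.relationships.cypher"])'
)


def build_load_query(export):
    # The export name is substituted twice: once for nodes, once for relationships
    return LOAD_TEMPLATE % (export, export)


query = build_load_query('export.co-authors')
print(query)
```

Printing the query for each entry in `_exports` is a quick way to confirm that every expected file path exists under the import directory before running the load.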

Instance of our graph wrapper:

In [5]:
c_graph_sample = CGraphWrapper(graph=graph)

Load the schemas and some data from the `imports/` Cypher queries:

In [6]:
c_graph_sample.load_schemas()

In [7]:
c_graph_sample.load_data()

### Problem 1: Finding works that are yet to be cited

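First, let's find the weakly connected components (WCC) in the graph, that is, the groups of nodes that are linked to one another when relationship direction is ignored.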
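
To make the idea concrete, here is a minimal pure-Python sketch of weakly connected components using union-find. It is not the Graph Data Science implementation and does not touch Neo4j; the function name, the node names `W1`..`W4`, and the (citing, cited) pairs are all invented for illustration.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring edge direction (union-find).

    Every endpoint that appears in `edges` must also be listed in `nodes`.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # Merge the two components; direction is deliberately ignored
            parent[root_src] = root_dst

    components = defaultdict(set)
    for n in nodes:
        components[find(n)].add(n)
    # Largest component first, ties broken by member names
    return sorted(components.values(), key=lambda c: (-len(c), sorted(c)))


# Hypothetical toy data: four works and two (citing, cited) pairs
nodes = ['W1', 'W2', 'W3', 'W4']
edges = [('W1', 'W2'), ('W3', 'W2')]
components = weakly_connected_components(nodes, edges)
print(components)
```

In this toy data, `W4` ends up alone in its own component, meaning it neither cites nor is cited by any other work. Isolated components like this are a natural starting point when looking for works that have yet to be cited.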

In [8]:
c_graph_sample.graph

Graph('http://neo4j@neo4j:7474', name='neo4j')