<a href="https://colab.research.google.com/github/mneedham/link-prediction/blob/master/DataLoading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Citation Dataset Loading

In this notebook we're going to load the citation dataset into Neo4j.

In [None]:
! pip install neo4j 

In [1]:
from neo4j import GraphDatabase

In [2]:
driver = GraphDatabase.driver("bolt://35.153.194.5:36864", auth=("neo4j", "material-seas-machines"))

## Create Constraints

First, let's create uniqueness constraints so that we don't import duplicate articles, authors, or venues:

In [5]:
with driver.session() as session:
    # One uniqueness constraint per node label, keyed on the property we MERGE on later
    session.run("CREATE CONSTRAINT ON (article:Article) ASSERT article.index IS UNIQUE")
    session.run("CREATE CONSTRAINT ON (author:Author) ASSERT author.name IS UNIQUE")
    session.run("CREATE CONSTRAINT ON (v:Venue) ASSERT v.name IS UNIQUE")
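
To confirm the constraints are in place, one option is to list them. This is a minimal sketch, assuming a Neo4j 3.5 or 4.x server where the `db.constraints()` procedure is available (newer servers use `SHOW CONSTRAINTS` instead). `list_constraints` is a hypothetical helper name, not part of the original notebook.

```python
def list_constraints(session):
    """Return the description of every constraint defined in the database.

    Assumes the server exposes the db.constraints() procedure
    (Neo4j 3.5 / 4.x), which yields a `description` column.
    """
    result = session.run("CALL db.constraints()")
    # Each record describes one constraint; keep only its text description
    return [record["description"] for record in result]
```

With the driver created above, calling `list_constraints(session)` inside a `with driver.session() as session:` block should return one entry per constraint.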

## Loading the data

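Now let's load the data into the database. We'll only keep articles published in four venues, and we'll create nodes for Articles, Venues, and Authors, connected by `VENUE`, `AUTHOR`, and `CITED` relationships.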


In [4]:
with driver.session() as session:
    query = """
    CALL apoc.periodic.iterate(
      'UNWIND ["dblp-ref-0.json", "dblp-ref-1.json", "dblp-ref-2.json", "dblp-ref-3.json"] AS file
       CALL apoc.load.json("https://github.com/mneedham/link-prediction/raw/master/data/" + file)
       YIELD value WITH value
       WHERE value.venue IN ["Lecture Notes in Computer Science", 
                             "Communications of The ACM",
                             "international conference on software engineering",
                             "advances in computing and communications"]
       return value',
      'MERGE (a:Article {index:value.id})
       SET a += apoc.map.clean(value,["id","authors","references", "venue"],[0])
       WITH a, value.authors as authors, value.references AS citations, value.venue AS venue
       MERGE (v:Venue {name: venue})
       MERGE (a)-[:VENUE]->(v)
       FOREACH(author in authors | 
         MERGE (b:Author{name:author})
         MERGE (a)-[:AUTHOR]->(b))
       FOREACH(citation in citations | 
         MERGE (cited:Article {index:citation})
         MERGE (a)-[:CITED]->(cited))', 
       {batchSize: 1000, iterateList: true});
    """
    # consume() blocks until the server has finished running the import
    session.run(query).consume()
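
Once the import finishes, a quick sanity check is to count the nodes created for each label. The sketch below assumes it will be given a session from the `driver` created earlier. `count_nodes_by_label` is a hypothetical helper name introduced here for illustration.

```python
def count_nodes_by_label(session, labels=("Article", "Author", "Venue")):
    """Return a dict mapping each label to the number of nodes carrying it."""
    counts = {}
    for label in labels:
        # Labels can't be passed as query parameters, so the label is
        # interpolated into the query text; only use trusted label names here
        result = session.run(f"MATCH (n:{label}) RETURN count(n) AS count")
        counts[label] = result.single()["count"]
    return counts
```

For example, `with driver.session() as session: print(count_nodes_by_label(session))` prints one count per label. A count of zero for any label would suggest the import didn't run as expected.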

## Choosing a machine learning algorithm

In this section we're going to choose a machine learning algorithm.

## Generating graphy features

In this section we're going to generate features for our model.

## Evaluating results

In this section we'll evaluate our model.