# Citation Dataset Loading

In this notebook we're going to load a DBLP citation dataset (articles, their authors and venues, and the citations between articles) into Neo4j.

```python
from neo4j import GraphDatabase
```

```python
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4jneo4j"))
print(driver.address)
```

```
localhost:7687
```


## Create Constraints

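First, let's create uniqueness constraints on `Article.index`, `Author.name`, and `Venue.name` so that we don't import duplicate data. Each uniqueness constraint is also backed by an index, which speeds up the `MERGE` lookups in the import below.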

```python
# display() is available automatically inside Jupyter/IPython
with driver.session(database="demo") as session:
    display(session.run("CREATE CONSTRAINT ON (a:Article) ASSERT a.index IS UNIQUE").consume().counters)
    display(session.run("CREATE CONSTRAINT ON (a:Author) ASSERT a.name IS UNIQUE").consume().counters)
    display(session.run("CREATE CONSTRAINT ON (v:Venue) ASSERT v.name IS UNIQUE").consume().counters)
```

```
{'_contains_updates': True, 'constraints_added': 1}
{'_contains_updates': True, 'constraints_added': 1}
{'_contains_updates': True, 'constraints_added': 1}
```
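The `CREATE CONSTRAINT ON ... ASSERT` form worked on the server used in this notebook, but Neo4j 4.4 deprecated it in favour of `CREATE CONSTRAINT ... FOR ... REQUIRE`, and Neo4j 5 removed it. If you're following along on a newer server, a small helper like the one below could generate the newer syntax. This is a sketch: `unique_constraint` is a hypothetical helper of our own, and the `IF NOT EXISTS` clause is there so that re-running the cell doesn't fail on a constraint that already exists.

```python
def unique_constraint(label: str, prop: str) -> str:
    """Build a uniqueness constraint in the Neo4j 4.4+/5.x syntax (hypothetical helper)."""
    var = label[0].lower()  # e.g. "Article" -> "a", matching the variables used above
    return (
        "CREATE CONSTRAINT IF NOT EXISTS "
        f"FOR ({var}:{label}) REQUIRE {var}.{prop} IS UNIQUE"
    )


for label, prop in [("Article", "index"), ("Author", "name"), ("Venue", "name")]:
    # In the notebook you would pass each string to session.run(...) instead.
    print(unique_constraint(label, prop))
```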

## Load the Data

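Now let's load the data into the database. For each JSON record we create an `Article` node, connect it to its `Venue` and `Author` nodes with `VENUE` and `AUTHOR` relationships, and add a `CITED` relationship to every article it references. `apoc.periodic.iterate` processes the records in batches of 1,000.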
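To make the graph model concrete, here is a plain-Python sketch of what the `MERGE` statements in the next cell do with a single JSON record. It's an approximation for illustration only: `record_to_graph` is a hypothetical helper, the zero-value filter only roughly mirrors `apoc.map.clean(value, [...], [0])` (we don't reproduce APOC's exact comparison rules), the example record is made up, and nothing here talks to Neo4j.

```python
from __future__ import annotations

from typing import Any

# Keys the import query handles structurally instead of copying onto the Article.
STRUCTURAL_KEYS = {"id", "authors", "references", "venue"}


def record_to_graph(record: dict[str, Any]) -> tuple[dict[str, Any], list[tuple[str, str, str]]]:
    """Approximate the import for one record (hypothetical helper).

    Returns the Article's properties and (from, relationship, to) triples.
    """
    # Roughly like apoc.map.clean(value, [...], [0]): drop structural keys and zero values.
    props = {k: v for k, v in record.items() if k not in STRUCTURAL_KEYS and v != 0}
    props["index"] = record["id"]  # MERGE (a:Article {index: value.id})

    article = f"Article:{record['id']}"
    edges = [(article, "VENUE", f"Venue:{record['venue']}")]
    edges += [(article, "AUTHOR", f"Author:{name}") for name in record.get("authors", [])]
    edges += [(article, "CITED", f"Article:{ref}") for ref in record.get("references", [])]
    return props, edges


# A made-up record shaped like the JSON input; field names and values are illustrative.
example = {
    "id": "a1",
    "title": "An Example Paper",
    "n_citation": 0,
    "authors": ["Ann", "Bob"],
    "references": ["a2"],
    "venue": "VLDB",
}
props, edges = record_to_graph(example)
print(props)  # {'title': 'An Example Paper', 'index': 'a1'}
for edge in edges:
    print(edge)
```

Note that the cited article `a2` only appears as the target of a `CITED` edge; in the real import, that is how articles missing from the file end up in the graph.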


```python
query = """
CALL apoc.periodic.iterate(
  'UNWIND ["dblp-ref-3.json"] AS file
   CALL apoc.load.json("https://github.com/mneedham/link-prediction/raw/master/data/" + file)
   YIELD value WITH value
   RETURN value',
  'MERGE (a:Article {index:value.id})
   SET a += apoc.map.clean(value,["id","authors","references", "venue"],[0])
   WITH a, value.authors as authors, value.references AS citations, value.venue AS venue
   MERGE (v:Venue {name: venue})
   MERGE (a)-[:VENUE]->(v)
   FOREACH(author in authors |
     MERGE (b:Author{name:author})
     MERGE (a)-[:AUTHOR]->(b))
   FOREACH(citation in citations |
     MERGE (cited:Article {index:citation})
     MERGE (a)-[:CITED]->(cited))',
   {batchSize: 1000, iterateList: true});
"""

with driver.session(database="demo") as session:
    result = session.run(query)
    for row in result:
        print(row)
```

```
<Record batches=5 total=4143 timeTaken=4 committedOperations=4143 failedOperations=0 failedBatches=0 retries=0 errorMessages={} batch={'total': 5, 'committed': 5, 'failed': 0, 'errors': {}} operations={'total': 4143, 'committed': 4143, 'failed': 0, 'errors': {}} wasTerminated=False failedParams={} updateStatistics={'nodesDeleted': 0, 'labelsAdded': 16620, 'relationshipsCreated': 18924, 'nodesCreated': 16620, 'propertiesSet': 29667, 'relationshipsDeleted': 0, 'labelsRemoved': 0}>
```
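The summary reports `batches=5` for `total=4143` rows, which is what you'd expect from chunking 4,143 rows into groups of `batchSize: 1000`. The sketch below reproduces that arithmetic in plain Python. It's only a model of the batching, under the assumption that APOC splits the rows returned by the first statement into consecutive chunks of at most `batchSize`; with `iterateList: true`, each chunk is handed to the second statement as a list and committed in its own transaction, which keeps each transaction small. `batches` is a hypothetical helper, not an APOC API.

```python
import math
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size rows (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


# The load above reported total=4143 rows with batchSize 1000.
sizes = [len(b) for b in batches(range(4143), 1000)]
print(len(sizes), sizes)  # 5 [1000, 1000, 1000, 1000, 143]
assert len(sizes) == math.ceil(4143 / 1000)
```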


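The `CITED` relationships also point to articles that aren't in `dblp-ref-3.json`. For those, `MERGE (cited:Article {index:citation})` creates a placeholder node that has an `index` but no other properties. We'll delete these placeholders, along with their relationships, by removing every article that has no title:

```python
# `a.title IS NULL` is equivalent to `NOT exists(a.title)` and also works on Neo4j 5,
# where the exists() property check was removed.
query = """
MATCH (a:Article)
WHERE a.title IS NULL
DETACH DELETE a
"""

with driver.session(database="demo") as session:
    result = session.run(query)
    print(result.consume().counters)
```

```
{'_contains_updates': True, 'nodes_deleted': 2473, 'relationships_deleted': 2690}
```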
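If you want to check the placeholder count before touching the database, the same idea can be expressed as a set difference over the raw records: every cited id that has no record of its own becomes a titleless `Article`. This is a sketch under the assumption that every record in the file has a `title`; `dangling_citations` is a hypothetical helper and the sample records below are made up, not taken from `dblp-ref-3.json`.

```python
from typing import Iterable, Mapping, Set


def dangling_citations(records: Iterable[Mapping]) -> Set[str]:
    """Return cited ids that have no record of their own (hypothetical helper).

    These are the ids for which MERGE (cited:Article {index: citation}) would
    create a placeholder node without a title.
    """
    records = list(records)  # materialise once, since we iterate twice
    loaded = {r["id"] for r in records}
    cited = {ref for r in records for ref in r.get("references", [])}
    return cited - loaded


sample = [  # made-up records for illustration
    {"id": "a1", "title": "First", "references": ["a2", "x9"]},
    {"id": "a2", "title": "Second", "references": ["x9"]},
]
print(dangling_citations(sample))  # {'x9'}
```

Applied to the full file, the size of this set should match `nodes_deleted` above, provided the title assumption holds.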
