# Search Summit 2020

The following experiment attempts to recommend peer reviewers for new Works based on the Omniscience concepts they are tagged with. It uses Neo4j with the Neosemantics (n10s) plugin and the Neo4j Graph Data Science (gds) plugin. This notebook walks through loading the data and experimenting with it via the py2neo library.

Because everything runs on a local machine, the knowledge graphs are kept small; they only need to be large enough to demonstrate the functionality.

## Problem: Recommending Peer Reviewers Based on Existing Concepts
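
To make the goal concrete, the ranking idea can be sketched in plain Python. This sketch assumes each candidate reviewer is summarised by the set of concept URIs tagged on works they authored, and it scores candidates by Jaccard similarity with the new Work's concepts. The author IDs, concept IDs, and the choice of Jaccard similarity are illustrative assumptions; the graph-based approach in this notebook (Neo4j plus gds) may score candidates differently.

```python
from typing import AbstractSet, Dict, List, Tuple


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_reviewers(
    work_concepts: AbstractSet[str],
    author_concepts: Dict[str, AbstractSet[str]],
    exclude: AbstractSet[str] = frozenset(),
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank authors by concept overlap with a new work.

    Authors in `exclude` (e.g. the work's own authors) and authors with
    no overlap at all are skipped.
    """
    scored = [
        (author, jaccard(work_concepts, concepts))
        for author, concepts in author_concepts.items()
        if author not in exclude
    ]
    scored = [(author, score) for author, score in scored if score > 0]
    # Highest score first; ties broken alphabetically for a stable order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical data: author IDs mapped to the concepts of their existing works
authors = {
    "author-1": {"Concept-A", "Concept-B", "Concept-C"},
    "author-2": {"Concept-B"},
    "author-3": {"Concept-X"},
    "author-4": {"Concept-A", "Concept-B"},
}
print(rank_reviewers({"Concept-A", "Concept-B"}, authors, exclude={"author-4"}))
```

Excluding the work's own authors matters in practice: they would otherwise always score highest against their own submission.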

### Configuring to Our Local Graph Instance

In [1]:
from py2neo import Graph
import os

We connect to a linked Neo4j Docker container on our bridged network. Since this is a local demonstration, the credentials are hardcoded and security is not a concern.

In [2]:
uri = 'http://neo4j:7474/'
graph = Graph(uri, user='neo4j', password='test')
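
The hardcoded credentials are fine for this demo. If the notebook is ever pointed at a shared instance, one option is to read the connection settings from environment variables and fall back to the demo values. The variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` are assumptions chosen for this sketch, not something the containers define.

```python
import os
from typing import Mapping, Tuple


def neo4j_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str, str]:
    """Return (uri, user, password), defaulting to the local demo values.

    The environment variable names are hypothetical; adjust them to your deployment.
    """
    return (
        env.get("NEO4J_URI", "http://neo4j:7474/"),
        env.get("NEO4J_USER", "neo4j"),
        env.get("NEO4J_PASSWORD", "test"),
    )
```

It would be used as `uri, user, password = neo4j_settings()` followed by the same `Graph(uri, user=user, password=password)` call.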

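Calling Neosemantics `n10s.graphconfig.init()` to initialise the graph configuration, which controls how RDF is mapped onto nodes, labels, and properties. We set `handleMultival: "ARRAY"` so that multivalued properties are stored as arrays; the other parameters keep their defaults.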

In [6]:
graph.run('CALL n10s.graphconfig.init({ handleMultival: "ARRAY"})')

 param           | value   
-----------------|---------
 handleVocabUris | SHORTEN 
 handleMultival  | ARRAY   
 handleRDFTypes  | LABELS  
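
Because `handleMultival` is set to `ARRAY`, literal values are collected into lists instead of the last value overwriting earlier ones (the Neosemantics default is `OVERWRITE`). That is why `skos__prefLabel` comes back as `['Breast Pathology']` at the end of this notebook. The toy function below only mimics that mapping for (subject, predicate, literal) triples to show the difference; it is not how Neosemantics is implemented, and the triples are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, literal)


def to_node_properties(triples: Iterable[Triple], handle_multival: str = "OVERWRITE") -> Dict[str, dict]:
    """Group literal triples into one property map per subject.

    OVERWRITE keeps only the last value seen for a predicate;
    ARRAY stores every value in a list, even when there is only one.
    """
    nodes: Dict[str, dict] = {}
    for subject, predicate, literal in triples:
        props = nodes.setdefault(subject, {})
        if handle_multival == "ARRAY":
            props.setdefault(predicate, []).append(literal)
        else:
            props[predicate] = literal
    return nodes


# Hypothetical triples: one concept with a single preferred label and two alternative labels
triples = [
    ("Concept-254831502", "skos__prefLabel", "Breast Pathology"),
    ("Concept-254831502", "skos__altLabel", "alt label 1"),
    ("Concept-254831502", "skos__altLabel", "alt label 2"),
]
print(to_node_properties(triples, "ARRAY"))
```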

Presetting common namespace prefixes so that shortened names in the graph are readable (for example `skos__prefLabel`); otherwise Neosemantics generates its own prefixes for namespaces it has not seen before.

In [7]:
prefixSetter = """
WITH '
@prefix : <http://prismstandard.org/namespaces/basic/2.0/> .
@prefix a1: <http://www.elsevier.com/xml/schema/grant/grant-1.1> .
@prefix a2: <http://www.elsevier.com/xml/schema/grant/grant-1.2> .
@prefix arg: <http://spinrdf.org/arg#> .
@prefix bam: <http://www.elsevier.com/xml/schema/rdf/BasicAssetMetadata-1/> .
@prefix dash: <http://datashapes.org/dash#> .
@prefix dcam: <http://purl.org/dc/dcam/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ebs: <http://www.elsevier.com/xml/schema/rdf/EMMeTBasicSatellite-1/> .
@prefix ecm: <http://www.elsevier.com/xml/schema/rdf/common/Metadata-1/> .
@prefix edg: <http://edg.topbraid.solutions/model/> .
@prefix edm: <https://data.elsevier.com/schema/edm/> .
@prefix egm: <http://www.elsevier.com/xml/schema/rdf/ElsevierGenericMembershipSatellite-1/> .
@prefix egq: <http://www.elsevier.com/xml/schema/rdf/ElsevierGenericSequenceSatellite-1/> .
@prefix egv: <http://www.elsevier.com/xml/schema/rdf/ElsevierGenericVocabularySatellite-1/> .
@prefix emloc: <http://data.elsevier.com/vocabulary/EMMeT/locrel/> .
@prefix emst: <http://www.elsevier.com/xml/schema/rdf/EGVSatelliteTagging-1/> .
@prefix epr: <http://www.elsevier.com/xml/schema/rdf/EMMeTProvenanceSatellite-1/> .
@prefix eum: <https://data.elsevier.com/schema/unitsOfMeasure/> .
@prefix evo: <https://data.elsevier.com/schema/evo/> .
@prefix geo-ont: <http://www.geonames.org/ontology#> .
@prefix graphql: <http://datashapes.org/graphql#> .
@prefix idm: <https://data.elsevier.com/schema/idm/> .
@prefix idtype: <https://data.elsevier.com/e/identifier/> .
@prefix knovel_properties_vocabulary: <https://data.elsevier.com/engineering/knovel/properties/schema/> .
@prefix knovelproperties: <http://data.elsevier.com/vocabulary/knovel/properties/> .
@prefix metadata: <http://topbraid.org/metadata#> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix omniscienceextensionmerged_02052019: <http://data.elsevier.com/vocabulary/OmniScienceExtension/> .
@prefix os: <http://data.elsevier.com/vocabulary/OmniScience/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix pav: <http://purl.org/swan/pav/provenance/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix provenance_authoring_and_versioning_ontology: <http://purl.org/pav/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rv: <https://data.elsevier.com/research/schema/rv/> .
@prefix s: <http://states.data/> .
@prefix sat: <http://www.elsevier.com/xml/schema/rdf/LDR-Satellites/Base-1/> .
@prefix schema: <https://none.schema.org/> .
@prefix semrel: <http://data.elsevier.com/EMMeT/SemRelations/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix skm: <http://synaptica.net/skm/> .
@prefix skmse: <http://synaptica.net/skm/subElement/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix skos_inferences: <urn:x-evn-master:skos_inferences/> .
@prefix skosshapes: <http://topbraid.org/skos.shapes#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix smf: <http://topbraid.org/sparqlmotionfunctions#> .
@prefix sp: <http://spinrdf.org/sp#> .
@prefix spin: <http://spinrdf.org/spin#> .
@prefix spl: <http://spinrdf.org/spl#> .
@prefix svf: <http://www.elsevier.com/xml/schema/grant/grant-1.2/> .
@prefix swa: <http://topbraid.org/swa#> .
@prefix tag: <http://www.elsevier.com/xml/schema/rdf/LDR-Satellites/TagAnnot-1/> .
@prefix teamwork: <http://topbraid.org/teamwork#> .
@prefix teamworkconstraints: <http://topbraid.org/teamworkconstraints#> .
@prefix tosh: <http://topbraid.org/tosh#> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix xsi: <http://www.w3.org/2001/XMLSchema-instance> .
@prefix zthes: <http://synaptica.net/zthes/> .
' AS prefixes
CALL n10s.nsprefixes.addFromText(prefixes)
YIELD prefix, namespace
RETURN prefix, namespace
"""
graph.run(prefixSetter)

 prefix                                       | namespace                                                                  
----------------------------------------------|----------------------------------------------------------------------------
 egq                                          | http://www.elsevier.com/xml/schema/rdf/ElsevierGenericSequenceSatellite-1/ 
 teamwork                                     | http://topbraid.org/teamwork#                                              
 provenance_authoring_and_versioning_ontology | http://purl.org/pav/                                                       
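
With `handleVocabUris: SHORTEN`, predicate and type URIs are stored as `prefix__localName`, which is where names such as `skos__prefLabel` and `ecm__isAssignedTo` in the query at the end of this notebook come from. For a namespace with no preset prefix, Neosemantics invents one such as `ns0`. The function below is a rough approximation of that naming rule (split at the last `#` or `/`), written for illustration only; the fallback prefix numbering is an assumption.

```python
from typing import Dict


def shorten(uri: str, prefixes: Dict[str, str]) -> str:
    """Approximate the SHORTEN mapping: namespace -> prefix, joined to the local name by '__'.

    Unknown namespaces get a generated 'nsN' prefix, which is added to `prefixes`.
    """
    split = max(uri.rfind("#"), uri.rfind("/")) + 1
    namespace, local_name = uri[:split], uri[split:]
    for prefix, ns in prefixes.items():
        if ns == namespace:
            return f"{prefix}__{local_name}"
    # Assumed numbering: count the auto-generated prefixes registered so far
    generated = sum(1 for p in prefixes if p.startswith("ns") and p[2:].isdigit())
    prefix = f"ns{generated}"
    prefixes[prefix] = namespace
    return f"{prefix}__{local_name}"


prefixes = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ecm": "http://www.elsevier.com/xml/schema/rdf/common/Metadata-1/",
}
print(shorten("http://www.w3.org/2004/02/skos/core#prefLabel", prefixes))
print(shorten("http://www.elsevier.com/xml/schema/rdf/common/Metadata-1/isAssignedTo", prefixes))
```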

Creating a uniqueness constraint on `Resource.uri`. Neosemantics imports every RDF resource as a `Resource` node identified by its `uri` property, and its import procedures expect this constraint to exist.

In [8]:
if not graph.schema.get_uniqueness_constraints('Resource'):
    graph.schema.create_uniqueness_constraint('Resource', 'uri')
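
The py2neo schema helpers wrap a plain Cypher DDL statement. If you prefer to issue it directly, the syntax depends on the Neo4j version: 4.x accepts `ON ... ASSERT`, while `FOR ... REQUIRE` was introduced in 4.4 and is the only form accepted from 5.0. The sketch below only builds the statement string; the constraint name `n10s_unique_uri` follows the Neosemantics documentation, but the name itself is arbitrary.

```python
from typing import Tuple


def resource_uri_constraint(version: Tuple[int, int]) -> str:
    """Build the Cypher for a uniqueness constraint on :Resource(uri).

    `version` is the (major, minor) Neo4j server version, e.g. (4, 1).
    """
    name = "n10s_unique_uri"
    if version >= (4, 4):
        # 4.4+ syntax, and the only form accepted by Neo4j 5
        return f"CREATE CONSTRAINT {name} IF NOT EXISTS FOR (r:Resource) REQUIRE r.uri IS UNIQUE"
    # Older 4.x syntax
    return f"CREATE CONSTRAINT {name} ON (r:Resource) ASSERT r.uri IS UNIQUE"
```

Run it with `graph.run(resource_uri_constraint((4, 1)))` instead of the py2neo cell, not in addition to it: on older versions, creating an equivalent constraint a second time raises an error.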

First, we load the Omniscience vocabulary. This gives us the concept hierarchy that annotated Books, Chapters, and Articles can be connected to.

**Warning:** This step is slow; skip it if you are sure the vocabulary is already loaded.

In [39]:
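result = graph.run("CALL n10s.rdf.import.fetch('file:///var/lib/neo4j/import/omniscience/statements.ttl', 'Turtle')").data()
# n10s reports a failed import in its result row (terminationStatus 'KO') instead of raising an error
if result and result[0]['terminationStatus'] == 'OK':
    print('Successfully loaded Omniscience vocabulary.')
else:
    print('Vocabulary import failed:', result)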

Successfully loaded Omniscience vocabulary.
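
Instead of remembering whether the vocabulary cell has already run, the import can be guarded by a check on the graph. This sketch assumes that Omniscience concepts are `Resource` nodes whose `uri` starts with `http://data.elsevier.com/vocabulary/OmniScience/` (as in the query at the end of this notebook); `load_vocabulary_once` is a hypothetical helper. It uses py2neo's `Graph.evaluate` for the count and reads `terminationStatus` and `triplesLoaded` from the Neosemantics result row.

```python
from typing import Any

OMNISCIENCE_NS = "http://data.elsevier.com/vocabulary/OmniScience/"
VOCAB_URL = "file:///var/lib/neo4j/import/omniscience/statements.ttl"


def load_vocabulary_once(graph: Any, url: str = VOCAB_URL) -> int:
    """Import the vocabulary unless Omniscience concept nodes already exist.

    `graph` is a py2neo Graph. Returns the number of triples loaded (0 if skipped).
    """
    existing = graph.evaluate(
        "MATCH (c:Resource) WHERE c.uri STARTS WITH $ns RETURN count(c)",
        ns=OMNISCIENCE_NS,
    )
    if existing:
        return 0
    rows = graph.run("CALL n10s.rdf.import.fetch($url, 'Turtle')", url=url).data()
    # Neosemantics signals problems in the result row rather than by raising
    if not rows or rows[0]["terminationStatus"] != "OK":
        raise RuntimeError(f"Vocabulary import failed: {rows}")
    return rows[0]["triplesLoaded"]
```

The check is only meaningful if the vocabulary is loaded before the work annotations: annotation RDF that references Omniscience concepts could create matching `Resource` nodes and make the guard skip the import.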


Next, we load the sample annotation RDF. The container volume `/var/lib/neo4j/import/omniscience` holds the relevant Omniscience samples, one directory per PII under `works/`. Each work's `c-graph` subdirectory contains CSV files exported from the dev c-graph Neo4j cluster, which link articles, concepts, authors, and references; the cell below loads the author and reference files.

Hardcoding the list of PIIs below is ugly, but it avoids sharing mounted volumes between the Neo4j and Jupyter containers, so the notebook cannot simply list the directory.

In [40]:
piis = ["S0002944017305370","S0002944017306740","S0002944017306867","S0002944017309409","S0014489417304253",
        "S0014489417304575","S0031302517304488","S0031302517304932","S0031302517305184","S0031302517305238",
        "S0031302517305287","S088394411730922X","S0093775417300465","S0147956317301516","S0147956317302236",
        "S0147956317303709","S0147956317304399","S0272771417300768","S0272771417300987","S0272771417305280",
        "S0272771417305711","S0272771417306650","S0883944116308796","S0883944117308808","S0883944117310109",
        "S0883944117310638","S0883944117311176","S0883944117312030","S0883944117316453","S0883944117316933",
        "S0883944117316945","S0894113017303393","S2376999817300417","S2405844017300968","S2405844017304152",
        "S2405844017306084","S2405844017309210","S2405844017310204","S2405844017312641","S2405844017316754",
        "S2405844017316936","S2405844017317279","S2405844017319369","S2405844017323411","S2405844017326725"]
# Cypher for the c-graph CSVs; the file URL is passed in as the $url parameter
authors_query = """
    LOAD CSV WITH HEADERS FROM $url AS row
    MERGE (r:Resource {uri: 'http://vtw.elsevier.com/content/pii/' + row.pii})
    MERGE (a:Author {id: row.authorId})
    MERGE (a)-[:authorOf]->(r)
"""
refs_query = """
    LOAD CSV WITH HEADERS FROM $url AS row
    MERGE (r:Resource {uri: 'http://vtw.elsevier.com/content/pii/' + row.pii})
    // MERGE on the id only, so a Work whose dates differ between files is not duplicated
    MERGE (w:Work {id: row.workId})
    SET w.publishedDate = row.publishedDate, w.updatedDate = row.updatedDate
    MERGE (w)-[:references]->(r)
"""
for pii in piis:
    # Concept annotations for the work (RDF/XML), imported by Neosemantics
    graph.run("CALL n10s.rdf.import.fetch($url, 'RDF/XML')",
              url='file:///var/lib/neo4j/import/omniscience/works/%s/annotations.rdf' % pii)
    # Authors of the work, and the works that reference it
    graph.run(authors_query, url='file:///omniscience/works/%s/c-graph/authors.csv' % pii)
    graph.run(refs_query, url='file:///omniscience/works/%s/c-graph/refs.csv' % pii)
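
The two kinds of `file:///` URL in the loading cell resolve differently: with the default Neo4j configuration, `LOAD CSV` treats `file:///` paths as relative to the import directory, while the Neosemantics calls here pass the full path under `/var/lib/neo4j/import`. A small helper can keep the per-work paths in one place. `work_urls` is a hypothetical name, and the import directory default is taken from the paths used in this notebook.

```python
from typing import Dict

IMPORT_DIR = "/var/lib/neo4j/import"


def work_urls(pii: str, import_dir: str = IMPORT_DIR) -> Dict[str, str]:
    """Return the annotation RDF and c-graph CSV URLs for one work, keyed by file type."""
    relative = f"omniscience/works/{pii}"
    return {
        # Neosemantics is given the absolute path inside the container
        "annotations": f"file://{import_dir}/{relative}/annotations.rdf",
        # LOAD CSV resolves file:/// against the import directory
        "authors": f"file:///{relative}/c-graph/authors.csv",
        "refs": f"file:///{relative}/c-graph/refs.csv",
    }
```

Each value can be passed as a query parameter or formatted into the queries in the loading loop.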

In [61]:
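# Sanity check: inspect one concept's label and relationship counts.
# Both counts use incoming patterns, (r)<-[...]-(), over different relationship types.
graph.run("""
    MATCH (r:Resource {uri: 'http://data.elsevier.com/vocabulary/OmniScience/Concept-254831502'})
    RETURN r.uri AS resource,
    size((r)<-[:skos__broader|:skos__exactMatch]-()) AS outDegree,
    size((r)<-[:ecm__isAssignedTo|:skos__narrower]-()) AS inDegree,
    r.skos__prefLabel AS label
""")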

 resource                                                          | outDegree | inDegree | label                
-------------------------------------------------------------------|-----------|----------|----------------------
 http://data.elsevier.com/vocabulary/OmniScience/Concept-254831502 |         0 |        2 | ['Breast Pathology'] 
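
Since both counts in this query use incoming patterns, `outDegree` and `inDegree` name two groups of relationship types rather than literal out- and in-degrees. The pure-Python sketch below mirrors what each `size(...)` expression counts, on a hypothetical edge list that is not taken from the real graph; it can help when deciding which relationship types should feed a reviewer score.

```python
from typing import AbstractSet, Iterable, Tuple

Edge = Tuple[str, str, str]  # (source, relationship type, target)


def count_incoming(edges: Iterable[Edge], node: str, rel_types: AbstractSet[str]) -> int:
    """Count edges ending at `node` whose type is in `rel_types`,
    like size((node)<-[:A|:B]-()) in Cypher."""
    return sum(1 for src, rel, dst in edges if dst == node and rel in rel_types)


# Hypothetical edges around one concept
concept = "Concept-254831502"
edges = [
    ("annotation-1", "ecm__isAssignedTo", concept),
    ("annotation-2", "ecm__isAssignedTo", concept),
    ("Concept-child", "skos__broader", concept),
    (concept, "skos__broader", "Concept-parent"),
]
print(count_incoming(edges, concept, {"skos__broader", "skos__exactMatch"}))
print(count_incoming(edges, concept, {"ecm__isAssignedTo", "skos__narrower"}))
```

The edge from the concept up to `Concept-parent` is outgoing, so neither count includes it.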