Roland Roller, Gaurav Vashisth, Philippe Thomas, He Wang, Michael Mikhailov and Mark Stevenson [To appear] Proceedings of the International Semantic Web Conference 2019.
Graph-KD is a general-purpose graph exploration tool with the following functionalities:
- Finding the K shortest paths between two nodes.
- Exploring paths around a given node.
- Inferring relations between the source and target nodes of a given path.
The knowledge graph used to build this tool is the UMLS (Unified Medical Language System) dataset, which is freely available from the UMLS website.
In order to use Graph-KD you should have:
- Neo4j
- JVM
- Python 3.x
The following tutorial supports only Unix systems.
echo "deb http://httpredir.debian.org/debian jessie-backports main" | sudo tee -a /etc/apt/sources.list.d/jessie-backports.list
sudo apt-get update
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
sudo apt-get install neo4j=3.2.2
systemctl start neo4j
to start Neo4j.
systemctl status neo4j
to check whether Neo4j is running.
systemctl stop neo4j
to stop Neo4j. (After testing whether Neo4j is running, please stop the Neo4j service.)
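As a complement to systemctl status neo4j, the following minimal sketch checks whether something is listening on the Neo4j browser port (7474, the port used in the browser URL of this tutorial). The helper name is_port_open is illustrative and not part of Graph-KD; an open port only shows that a service accepts connections there, not that the database is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 7474 is the HTTP port of the Neo4j browser used in this tutorial.
    print("Neo4j reachable:", is_port_open("localhost", 7474))
```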
- After starting the Neo4j service (systemctl start neo4j), open http://localhost:7474/browser/ in your local browser.
- Log in with username neo4j and password neo4j. You will then be redirected to set a new password.
- Stop the Neo4j instance (systemctl stop neo4j) before performing the following steps.
- Copy the plugin JAR into the Neo4j plugins folder:
cp com.dfki.LT.OntologyExplorer-1.0-SNAPSHOT.jar /var/lib/neo4j/plugins/
- The database was created with Neo4j's default import tool (neo4j-import).
- To create the DB for Neo4j we need 4 files (2 header files and 2 content files).
Node header file (nheader.txt)
Content: :ID,ConceptID,ConceptName
- :ID ==> Node ID
- ConceptID,ConceptName ==> Properties of the node
Relation header file (rheader.txt)
Content: :START_ID,:END_ID,:TYPE,RelationLabel,weight
- :START_ID,:END_ID ==> Node ID
- :TYPE ==> Vocabulary
- RelationLabel ==> Relation name
- weight ==> Edge weight
Node content file (node.txt): contains the node data in 3 comma-separated columns, without a header.
Relation content file (relation.txt): contains the relation data in 4 comma-separated columns, without a header.
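The two header files are one-line text files, so they can be written by hand or with a few lines of Python. The sketch below writes nheader.txt and rheader.txt with the header contents listed above and offers a small helper that reports which column counts occur in a content file. The function names write_headers and column_counts are illustrative and not part of Graph-KD.

```python
import csv
from pathlib import Path

# Header lines as given in this README.
NODE_HEADER = ":ID,ConceptID,ConceptName"
REL_HEADER = ":START_ID,:END_ID,:TYPE,RelationLabel,weight"


def write_headers(out_dir: str) -> None:
    """Write nheader.txt and rheader.txt into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "nheader.txt").write_text(NODE_HEADER + "\n", encoding="utf-8")
    (out / "rheader.txt").write_text(REL_HEADER + "\n", encoding="utf-8")


def column_counts(path: str) -> set:
    """Return the set of distinct column counts found in a comma-separated file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return {len(row) for row in csv.reader(fh)}
```

For a well-formed node file, column_counts("node.txt") contains a single value; more than one value points to rows with missing or extra fields.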
- Get MRREL.RRF and MRCONSO.RRF from the UMLS installation directory.
- Creating relation.txt and node.txt requires the UMLS_final file, which is obtained by running the script below with Python 3.5.
- First run
Create_Graph_DB/DBcreation.py 0 <path of UMLS_relation> <path of MRREL> <path of MRCONSO>
- The first run generates the UMLS_Analyse file, which contains relationships that have an equal number of records, where the number of records is greater than 2.
- Manually merge the records from UMLS_Analyse into UMLS_relation.
- Then run
Create_Graph_DB/DBcreation.py 1 <path of UMLS_relation> <path of MRREL> <path of MRCONSO>
- This produces node.txt and relation.txt.
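The two passes described above take the same arguments and differ only in the leading mode value (0 or 1). The sketch below builds and runs the command for one pass. The helper names dbcreation_cmd and run_pass are illustrative, and it assumes the current Python interpreter is suitable for running DBcreation.py; the manual merge of UMLS_Analyse into UMLS_relation still has to happen between the two passes.

```python
import subprocess
import sys

# Script path as given in this README, relative to the repository root.
SCRIPT = "Create_Graph_DB/DBcreation.py"


def dbcreation_cmd(mode, umls_relation, mrrel, mrconso):
    """Build the argument list for one DBcreation.py pass (mode 0 or 1)."""
    if mode not in (0, 1):
        raise ValueError("mode must be 0 (first pass) or 1 (second pass)")
    return [sys.executable, SCRIPT, str(mode), umls_relation, mrrel, mrconso]


def run_pass(mode, umls_relation, mrrel, mrconso):
    """Run one pass and raise CalledProcessError if the script fails."""
    subprocess.run(dbcreation_cmd(mode, umls_relation, mrrel, mrconso), check=True)
```

A typical session would call run_pass(0, ...), merge the records by hand, and then call run_pass(1, ...) with the same three paths.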
neo4j-import --into graph.db --nodes:<Node label> "nheader.txt,node.txt" --relationships "rheader.txt,relation.txt" --skip-duplicate-nodes true
--into graph.db
Name of the generated database; graph.db is the recommended name.
--nodes:UMLSConcepts "nheader.txt,node.txt"
Node label, followed by the node header file (nheader.txt) and the node content file (node.txt). Note: a single label can be given directly in the command, but when there are many labels they must be provided via a file.
--relationships "rheader.txt,relation.txt"
Relation header file (rheader.txt) and relation content file (relation.txt).
--skip-duplicate-nodes true
Skip duplicate nodes.
After running this command, a graph.db folder is generated in the current directory. Move this folder into /var/lib/neo4j/data/databases/, which is the default location of the database folder.
To change the path of the folder:
- Open neo4j.conf by typing
gedit /etc/neo4j/neo4j.conf
- Replace the line dbms.directories.data=/var/lib/neo4j/data with dbms.directories.data=<folder of your choice>
- Copy graph.db into <folder of your choice>/databases/
With the default location, the copy command is:
cp -r graph.db /var/lib/neo4j/data/databases/
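The copy step can also be scripted. The following sketch copies a generated graph.db folder into the databases directory and refuses to overwrite an existing database of the same name. The function install_graph_db is an illustrative helper, not part of Graph-KD; the default target is the default location named above, and Neo4j is assumed to be stopped before the copy.

```python
import shutil
from pathlib import Path


def install_graph_db(src, databases_dir="/var/lib/neo4j/data/databases"):
    """Copy the graph.db folder at src into databases_dir and return the new path."""
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError("no such database folder: {}".format(src_path))
    target = Path(databases_dir) / src_path.name
    if target.exists():
        # Do not silently overwrite an existing database.
        raise FileExistsError("target already exists: {}".format(target))
    shutil.copytree(src_path, target)
    return target
```

Writing into /var/lib/neo4j usually requires elevated permissions, so the script may need to be run accordingly.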
3.3) (IMPORTANT) Every time you add a new graph.db folder, you must stop the Neo4j instance first; otherwise you will corrupt the database.
3.4) (IMPORTANT!!!) To exploit Neo4j's efficient graph traversal, the database has to be indexed. This can be done via
CREATE INDEX ON :<Node label>(<Node property>)
For example:
CREATE INDEX ON :UMLSConcepts(ConceptID)
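If the node label or property differs from the example, the statement can be generated instead of typed by hand. This small sketch fills the template above and rejects names that are not plain identifiers; create_index_statement is an illustrative helper, and the resulting string still has to be executed in the Neo4j browser or another Cypher client.

```python
import re

# Plain identifiers only, so the names can be inserted without quoting.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_index_statement(label, prop):
    """Return the CREATE INDEX statement for the given node label and property."""
    for name in (label, prop):
        if not _IDENT.match(name):
            raise ValueError("not a plain identifier: {!r}".format(name))
    return "CREATE INDEX ON :{}({})".format(label, prop)


if __name__ == "__main__":
    print(create_index_statement("UMLSConcepts", "ConceptID"))
```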
Before running the inference script, make sure you have the flask and requests modules installed.
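A quick way to verify this prerequisite is to ask Python which of the modules it cannot find. The sketch below assumes the two modules are importable as flask and requests (the latter being the usual name of the HTTP client package); missing_modules is an illustrative helper, not part of Graph-KD.

```python
import importlib.util

# Assumed import names of the required modules.
REQUIRED = ("flask", "requests")


def missing_modules(names=REQUIRED):
    """Return the module names from names that cannot be found by the importer."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```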
- Run
python Inference/Inference.py
- cd to UI/biomedical-dfki-0.1/bin
- Run
./biomedical-dfki
- Open the browser and go to http://localhost:9000/graph-kd