# Installs and Imports

## Installs

In [90]:
%pip install -i https://test.pypi.org/simple/ mdb-tools --upgrade

Looking in indexes: https://test.pypi.org/simple/
Collecting mdb-tools
  Downloading https://test-files.pythonhosted.org/packages/f0/54/94663311d1d0e8f1c28de44225d4bc4f913a1d934b04118936b1371d44fb/mdb_tools-0.5.13-py3-none-any.whl (9.2 kB)
Installing collected packages: mdb-tools
  Attempting uninstall: mdb-tools
    Found existing installation: mdb-tools 0.5.12
    Uninstalling mdb-tools-0.5.12:
      Successfully uninstalled mdb-tools-0.5.12
Successfully installed mdb-tools-0.5.13
Note: you may need to restart the kernel to use updated packages.


## Imports

In [1]:
from neo4j import GraphDatabase, basic_auth
import mdb_tools as mdb
from bento_meta.objects import Term, Concept, Predicate

In [2]:
# see what functions are available in package
from inspect import getmembers, isfunction

import mdb_tools.mdb_tools as mdb
func_list = getmembers(mdb, isfunction)

# each entry is a (name, function) pair; print only the names
for name, _ in func_list:
    print(name)

check_term_exists
create_concept
create_object_relationship
create_predicate
create_represents_relationship
create_subject_relationship
create_term
detach_delete_concept
detach_delete_predicate
detach_delete_term
generate
get_concepts
get_predicates
get_term_synonyms
get_terms
link_concepts_to_predicate
link_term_synonyms_csv
link_two_terms
make_nano
merge_two_concepts
potential_synonyms_to_csv


# Database connection

URL, username, and password for database connection

In [3]:
# MDB sandbox
URL = "bolt://localhost:7687" # <URL for database>
USER = "neo4j" # <Username for database>
PASSWORD = "noble-use-dairy" # <Password for database>
driver = GraphDatabase.driver(URL, auth=(USER, PASSWORD))
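
The cell above hard-codes the sandbox credentials. A minimal sketch of one alternative is to read them from the environment and fall back to the sandbox values. The variable names MDB_URL, MDB_USER, and MDB_PASSWORD and the helper get_connection_settings are hypothetical, chosen for this sketch; neither mdb_tools nor neo4j defines them.

```python
import os


def get_connection_settings(env=None):
    """Return (url, user, password) for the database connection.

    MDB_URL, MDB_USER and MDB_PASSWORD are hypothetical variable names
    used only in this sketch; the fallbacks are the sandbox values above.
    """
    env = os.environ if env is None else env
    url = env.get("MDB_URL", "bolt://localhost:7687")
    user = env.get("MDB_USER", "neo4j")
    password = env.get("MDB_PASSWORD", "noble-use-dairy")
    return url, user, password


URL, USER, PASSWORD = get_connection_settings()
# The driver is then created exactly as in the cell above:
# driver = GraphDatabase.driver(URL, auth=(USER, PASSWORD))
```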

# Linking Terms


### Check existence of Term

In [4]:
# generate test term
test_term = Term()
test_term.value = "Cancer"
test_term.origin_name = "GDC"

In [5]:
# should return "Term with that value found in DB"
with driver.session() as session:
    term_exists = session.read_transaction(mdb.check_term_exists, test_term)
    if term_exists:
        print("Term with that value found in DB")
    else:
        print("Term with that value not found in DB")
driver.close()

Term with that value found in DB


### Find existing Concept from Term

In [53]:
# generate test term
test_term = Term()
test_term.value = "Lung"
test_term.origin_name = "NCIt"

In [54]:
# should return "4jtzA3"
with driver.session() as session:
    test_term_concepts = session.read_transaction(mdb.get_concepts, test_term)
    print(test_term_concepts[0])
driver.close()

4jtzA3


### Create new Term

In [56]:
# generate test term
test_term = Term()
test_term.value = "Lung"
test_term.origin_name = "NDC"

In [57]:
# should add term with value "Lung" and origin_name "NDC" and check it exists afterwards
with driver.session() as session:
    session.write_transaction(mdb.create_term, test_term)
    
    term_exists = session.read_transaction(mdb.check_term_exists, test_term)
    if term_exists:
        print("Term with that value found in DB")
    else:
        print("Term with that value not found in DB")
driver.close()

Created new Term with value: Lung and origin: NDC
Term with that value found in DB


### Generate nanoid

In [55]:
with driver.session() as session:
    print(mdb.make_nano())
driver.close()

3gcVMM
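
The IDs printed in this notebook are six-character alphanumeric strings. As an illustration only, an ID of that shape can be generated with the standard library. This is not the implementation of mdb.make_nano; the alphabet and the helper name make_short_id are assumptions made for this sketch.

```python
import secrets
import string

# Assumed alphabet: upper- and lower-case ASCII letters plus digits
ALPHABET = string.ascii_letters + string.digits


def make_short_id(size=6):
    """Return a random alphanumeric ID of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


print(make_short_id())  # a random six-character string
```

With 62 symbols and 6 characters there are about 5.7 × 10^10 possible IDs, so collisions are unlikely but not impossible.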


### Create Concept

In [58]:
# generate test Concept
with driver.session() as session:
    test_nano = mdb.make_nano()
driver.close()

test_concept = Concept()
test_concept.nanoid = test_nano

In [59]:
with driver.session() as session:
    session.write_transaction(mdb.create_concept, test_concept)
driver.close()

Created Concept node with nanoid: GDUDWC


### Link Term and Concept

In [60]:
# test term and concept created above
print(f"Term value: {test_term.value}; Term origin: {test_term.origin_name}")
print(f"Concept nanoid: {test_concept.nanoid}")

Term value: Lung; Term origin: NDC
Concept nanoid: GDUDWC


In [75]:
with driver.session() as session:
    session.write_transaction(mdb.create_represents_relationship, test_term, test_concept)
driver.close()

Created represents relationship between Term with value: Cancer            and origin: GDC and Concept with nanoid: GDUDWC


## Link Two Terms

In [7]:
# both terms exist & connected via concept
test_term_1 = Term()
test_term_1.value = "Epithelioma, benign"
test_term_1.origin_name = "GDC"

test_term_2 = Term()
test_term_2.value = "Epithelial tumor, benign"
test_term_2.origin_name = "GDC"

with driver.session() as session:
    session.write_transaction(mdb.link_two_terms, test_term_1, test_term_2)
driver.close()

Both terms are already connected via Concept gNbSxf


In [8]:
# both terms exist but neither has a concept representing it
test_term_1 = Term()
test_term_1.value = "Cancer"
test_term_1.origin_name = "BentoTailorX"

test_term_2 = Term()
test_term_2.value = "Cancer"
test_term_2.origin_name = "GDC"

with driver.session() as session:
    session.write_transaction(mdb.link_two_terms, test_term_1, test_term_2)
driver.close()

Created Concept node with nanoid: HH7YwF
Created represents relationship between Term with value: Cancer and origin: BentoTailorX and Concept with nanoid: HH7YwF
Created represents relationship between Term with value: Cancer and origin: GDC and Concept with nanoid: HH7YwF


In [10]:
# both terms exist & NOT connected via concept
test_term_1 = Term()
test_term_1.value = "Lung"
test_term_1.origin_name = "ICDC"

test_term_2 = Term()
test_term_2.value = "Lung"
test_term_2.origin_name = "BentoTailorX"

with driver.session() as session:
    session.write_transaction(mdb.link_two_terms, test_term_1, test_term_2)
driver.close()

Created Concept node with nanoid: 4Ygrnp
Created represents relationship between Term with value: Lung and origin: ICDC and Concept with nanoid: 4Ygrnp
Created represents relationship between Term with value: Lung and origin: BentoTailorX and Concept with nanoid: 4Ygrnp


In [None]:
# both terms exist & NOT connected via concept, but one has a concept

In [9]:
# one term exists & already has concept
test_term_1 = Term()
test_term_1.value = "Minimally Invasive Lung Adenocarcinoma"
test_term_1.origin_name = "NCIt"

test_term_2 = Term()
test_term_2.value = "Alveolar adenocarcinoma"
test_term_2.origin_name = "NDC"

with driver.session() as session:
    session.write_transaction(mdb.link_two_terms, test_term_1, test_term_2)
driver.close()

Created new Term with value: Alveolar adenocarcinoma and origin: NDC
Created represents relationship between Term with value: Alveolar adenocarcinoma and origin: NDC and Concept with nanoid: 0c271a


In [11]:
# one term exists & doesn't have concept yet
test_term_1 = Term()
test_term_1.value = "Carcinoma, anaplastic, NOS"
test_term_1.origin_name = "BentoTailorX"

test_term_2 = Term()
test_term_2.value = "Undifferentiated Carcinoma"
test_term_2.origin_name = "NDC"

with driver.session() as session:
    session.write_transaction(mdb.link_two_terms, test_term_1, test_term_2)
driver.close()

Created Concept node with nanoid: gifRog
Created new Term with value: Undifferentiated Carcinoma and origin: NDC
Created represents relationship between Term with value: Carcinoma, anaplastic, NOS and origin: BentoTailorX and Concept with nanoid: gifRog
Created represents relationship between Term with value: Undifferentiated Carcinoma and origin: NDC and Concept with nanoid: gifRog


In [12]:
# neither term exists
test_term_1 = Term()
test_term_1.value = "Epithelioma, malignant"
test_term_1.origin_name = "NDC"

test_term_2 = Term()
test_term_2.value = "Carcinoma"
test_term_2.origin_name = "NDC"

with driver.session() as session:
    session.write_transaction(mdb.link_two_terms, test_term_1, test_term_2)
driver.close()

Created new Term with value: Epithelioma, malignant and origin: NDC
Created new Term with value: Carcinoma and origin: NDC
Created Concept node with nanoid: 5aX4EN
Created represents relationship between Term with value: Epithelioma, malignant and origin: NDC and Concept with nanoid: 5aX4EN
Created represents relationship between Term with value: Carcinoma and origin: NDC and Concept with nanoid: 5aX4EN
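
The cases above suggest a decision procedure, sketched here on a plain dictionary instead of the database. The logic is inferred from the printed outputs, not from the mdb_tools source, and the names link_two_terms_sketch, fresh_id, and the concept IDs are hypothetical. The case where each Term already has a different Concept is not demonstrated above, so the sketch leaves the graph unchanged there.

```python
import itertools


def link_two_terms_sketch(graph, term_1, term_2, new_id):
    """Link two (value, origin) terms through a shared concept ID.

    graph maps each term to the set of concept IDs that represent it.
    new_id is a callable that returns a fresh concept ID.
    Returns a short description of what was done.
    """
    # Terms missing from the graph are created first.
    for term in (term_1, term_2):
        graph.setdefault(term, set())

    concepts_1, concepts_2 = graph[term_1], graph[term_2]
    shared = concepts_1 & concepts_2
    if shared:
        return f"already connected via {sorted(shared)[0]}"
    if concepts_1 and concepts_2:
        # Each term has its own concept: not covered by the examples above.
        return "not linked: terms have different concepts"
    if concepts_1 or concepts_2:
        # Exactly one term has a concept: reuse it for the other term.
        concept = sorted(concepts_1 or concepts_2)[0]
    else:
        # Neither term has a concept: create a new one.
        concept = new_id()
    concepts_1.add(concept)
    concepts_2.add(concept)
    return f"linked via {concept}"


_counter = itertools.count(1)


def fresh_id():
    """Hypothetical stand-in for an ID generator."""
    return f"c{next(_counter)}"


graph = {("Lung", "NCIt"): {"c0"}}
print(link_two_terms_sketch(graph, ("Lung", "NCIt"), ("Lung", "NDC"), fresh_id))
# linked via c0
print(link_two_terms_sketch(graph, ("Carcinoma", "NDC"), ("Epithelioma, malignant", "NDC"), fresh_id))
# linked via c1
print(link_two_terms_sketch(graph, ("Lung", "NCIt"), ("Lung", "NDC"), fresh_id))
# already connected via c0
```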


# Linking Concepts


When two existing Concept nodes are deemed synonymous, there are two primary ways to link them. The first is to connect them through a Predicate node with an "exactMatch" handle. This method keeps the existing Concept & Term structure intact and only adds to it, so queries already in use continue to work.

The second is to merge the two so that they are represented by a single Concept node. The Terms linked to each Concept are then linked to the merged Concept instead. The merge can keep one of the existing Concepts, or a new Concept can be created and the two old ones removed. This method would invalidate existing queries that use the affected Concepts & Terms.
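
To make the trade-off concrete, here is a small sketch of both approaches on plain dictionaries rather than the database. The helper names, the concept IDs c1 to c3, and the data are hypothetical and do not reflect how mdb_tools implements either operation.

```python
def link_with_predicate(predicates, subject, obj, handle="exactMatch"):
    """Approach 1: add a Predicate between two Concepts; Concepts stay as they are."""
    predicates.append({"handle": handle, "subject": subject, "object": obj})


def merge_concepts(concepts, predicates, keep, remove):
    """Approach 2: move Terms and Predicate links from `remove` to `keep`."""
    concepts[keep] |= concepts.pop(remove)
    for predicate in predicates:
        for role in ("subject", "object"):
            if predicate[role] == remove:
                predicate[role] = keep


# Illustrative data: concept ID -> set of (value, origin) Terms
concepts = {
    "c1": {("Lung", "NCIt")},
    "c2": {("Lung", "ICDC")},
    "c3": {("Lung", "BentoTailorX")},
}
predicates = []

link_with_predicate(predicates, "c2", "c3")
print(sorted(concepts))  # ['c1', 'c2', 'c3']

merge_concepts(concepts, predicates, keep="c1", remove="c2")
print(sorted(concepts))  # ['c1', 'c3']
print(predicates[0]["subject"])  # c1
```

After the merge, a lookup by the ID c2 no longer finds anything, which is the sense in which merging can break queries that the Predicate approach leaves untouched.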

### Create Predicate

In [13]:
with driver.session() as session:
    test_nano = mdb.make_nano()
driver.close()

test_predicate = Predicate()
test_predicate.handle = "exactMatch"
test_predicate.nanoid = test_nano

In [14]:
with driver.session() as session:
    session.write_transaction(mdb.create_predicate, test_predicate)
driver.close()

Created new Predicate with handle: exactMatch and nanoid: 10B00s


### Link two Concepts to a Predicate

General pattern: (c1:concept)<-[:has_subject]-(p:predicate {handle: "exactMatch"})-[:has_object]->(c2:concept)
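
The pattern above can be written as a parameterized Cypher statement. The sketch below only builds the query text and its parameters; the statement, the parameter names, and the helper exact_match_query are assumptions for illustration and are not taken from mdb_tools.

```python
def exact_match_query(subject_nanoid, object_nanoid, predicate_nanoid):
    """Build a Cypher statement and parameters for the pattern above."""
    query = (
        "MATCH (c1:concept {nanoid: $subject_id}), "
        "(c2:concept {nanoid: $object_id}) "
        "MERGE (p:predicate {handle: 'exactMatch', nanoid: $predicate_id}) "
        "MERGE (c1)<-[:has_subject]-(p) "
        "MERGE (p)-[:has_object]->(c2)"
    )
    params = {
        "subject_id": subject_nanoid,
        "object_id": object_nanoid,
        "predicate_id": predicate_nanoid,
    }
    return query, params


query, params = exact_match_query("4jtzA3", "n3udfp", "6QQakF")
print(query)
print(params)
```

Inside a transaction function, such a pair could be passed to the driver as tx.run(query, params).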

In [16]:
# these Concepts both represent Terms with value: 'Lung' in the MDB.
test_concept_1 = Concept({"nanoid": "4jtzA3"})
test_concept_2 = Concept({"nanoid": "n3udfp"})

with driver.session() as session: 
    session.write_transaction(mdb.link_concepts_to_predicate, test_concept_1, test_concept_2)
driver.close()

Created new Predicate with handle: exactMatch and nanoid: 6QQakF
Created has_subject relationship between source Predicate with handle: exactMatch and nanoid: 6QQakF and destination Concept with nanoid: 4jtzA3
Created has_object relationship between source Predicate with handle: exactMatch and nanoid: 6QQakF and destination Concept with nanoid: n3udfp


### Merge two Concepts

In [19]:
# see Terms attached to test Concept
test_concept = Concept({"nanoid": "4jtzA3"})

with driver.session() as session:
    terms = session.read_transaction(mdb.get_terms, test_concept)
    for term in terms:
        print(term.value, term.origin_name)
driver.close()

Lung NCIt
Lung GDC


In [21]:
# set up for merging Concepts
with driver.session() as session:
    # generate new Concepts, Terms, & Predicate
    test_nano_1 = mdb.make_nano()
    test_nano_2 = mdb.make_nano()
    test_nano_3 = mdb.make_nano()
    test_concept_1 = Concept({"nanoid": test_nano_1})
    test_concept_2 = Concept({"nanoid": test_nano_2})
    test_concept_3 = Concept({"nanoid": test_nano_3})
    session.write_transaction(mdb.create_concept, test_concept_1)
    session.write_transaction(mdb.create_concept, test_concept_2)
    session.write_transaction(mdb.create_concept, test_concept_3)
    test_term_1 = Term({"value": "Nelson", "origin_name": "NDC"})
    test_term_2 = Term({"value": "Nelsinghouse", "origin_name": "NDC"})
    test_term_3 = Term({"value": "Nelly", "origin_name": "NDC"})
    test_term_4 = Term({"value": "N. Moore", "origin_name": "NDC"})
    session.write_transaction(mdb.create_term, test_term_1)
    session.write_transaction(mdb.create_term, test_term_2)
    session.write_transaction(mdb.create_term, test_term_3)
    session.write_transaction(mdb.create_term, test_term_4)
    
    # link new Concepts to Terms above (2 Terms to 1st Concept, 1 Term to 2nd & 3rd Concept)  
    session.write_transaction(mdb.create_represents_relationship, 
                            test_term_1, test_concept_1)
    session.write_transaction(mdb.create_represents_relationship, 
                            test_term_2, test_concept_1)
    session.write_transaction(mdb.create_represents_relationship, 
                            test_term_3, test_concept_2)
    session.write_transaction(mdb.create_represents_relationship, 
                            test_term_4, test_concept_3)
    
    # link new Predicate to Concepts 2 and 3
    session.write_transaction(mdb.link_concepts_to_predicate, test_concept_2, test_concept_3)
driver.close()


Created Concept node with nanoid: wJt7F6
Created Concept node with nanoid: GGzwWF
Created Concept node with nanoid: UA40MK
Created new Term with value: Nelson and origin: NDC
Created new Term with value: Nelsinghouse and origin: NDC
Created new Term with value: Nelly and origin: NDC
Created new Term with value: N. Moore and origin: NDC
Created represents relationship between Term with value: Nelson and origin: NDC and Concept with nanoid: wJt7F6
Created represents relationship between Term with value: Nelsinghouse and origin: NDC and Concept with nanoid: wJt7F6
Created represents relationship between Term with value: Nelly and origin: NDC and Concept with nanoid: GGzwWF
Created represents relationship between Term with value: N. Moore and origin: NDC and Concept with nanoid: UA40MK
Created new Predicate with handle: exactMatch and nanoid: q4sJ9n
Created has_subject relationship between source Predicate with handle: exactMatch and nanoid: q4sJ9n and destination Concept with nanoid: GGzw

In [22]:
# merge Concepts into one
with driver.session() as session:
    session.write_transaction(mdb.merge_two_concepts, test_concept_1, test_concept_2)
driver.close()

Removed Concept node with nanoid: GGzwWF
Created represents relationship between Term with value: Nelly and origin: NDC and Concept with nanoid: wJt7F6
Created has_subject relationship between source Predicate with handle: exactMatch and nanoid: q4sJ9n and destination Concept with nanoid: wJt7F6


# Finding potentially synonymous Terms 

In [31]:
test_term = Term({
    "value": "Melanoma",
    "origin_name": "GDC"
})

with driver.session() as session:
    terms_to_csv = session.read_transaction(mdb.get_term_synonyms, test_term)
    print(terms_to_csv)
driver.close()


[{'value': 'Melanoma', 'origin_name': 'GDC', 'similarity': 1.0, 'valid_synonym': 0}, {'value': 'Melanoma', 'origin_name': 'BentoTailorX', 'similarity': 1.0, 'valid_synonym': 0}, {'value': 'Melanoma', 'origin_name': 'ICDC', 'similarity': 1.0, 'valid_synonym': 0}, {'value': 'Melanoma', 'origin_name': 'NCIt', 'similarity': 1.0, 'valid_synonym': 0}, {'value': 'Glioma', 'origin_name': 'NCIt', 'similarity': 0.9999999420005758, 'valid_synonym': 0}, {'value': 'Mesothelioma', 'origin_name': 'GDC', 'similarity': 0.9999999420005758, 'valid_synonym': 0}, {'value': 'Mesothelioma', 'origin_name': 'BentoTailorX', 'similarity': 0.9999999420005758, 'valid_synonym': 0}, {'value': 'Glioma', 'origin_name': 'ICDC', 'similarity': 0.9999999420005758, 'valid_synonym': 0}, {'value': 'Mesothelioma', 'origin_name': 'NCIt', 'similarity': 0.9999999420005758, 'valid_synonym': 0}, {'value': 'Low-CSD Melanoma', 'origin_name': 'NCIt', 'similarity': 0.9999999420005758, 'valid_synonym': 0}, {'value': 'Malignant Glioma',

In [34]:
file_name = "potential_synonyms_melanoma.csv"
file_path = "C:/Users/nelso/OneDrive - Georgetown University/School Stuff/Capstone/Test/" + file_name

mdb.potential_synonyms_to_csv(terms_to_csv, file_path)

### Import list of Terms marked as synonymous and link via Concept

In [36]:
test_term = Term({
    "value": "Melanoma",
    "origin_name": "GDC"
})

csv_path = "C:/Users/nelso/OneDrive - Georgetown University/School Stuff/Capstone/Test/potential_synonyms_melanoma.csv"

with driver.session() as session:
    session.write_transaction(mdb.link_term_synonyms_csv, test_term, csv_path)
driver.close()

Both terms are already connected via Concept THVkbK
Created represents relationship between Term with value: Melanoma and origin: BentoTailorX and Concept with nanoid: THVkbK
Created represents relationship between Term with value: Melanoma and origin: ICDC and Concept with nanoid: THVkbK
Both terms are already connected via Concept THVkbK
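
The records printed by get_term_synonyms carry the keys value, origin_name, similarity, and valid_synonym. A hedged sketch of the review round trip follows. It assumes the CSV uses those keys as its header and that a reviewer marks accepted rows by setting valid_synonym to 1; the file layout actually written by potential_synonyms_to_csv is not shown in this notebook, and the helper names are hypothetical.

```python
import csv
import io

FIELDS = ["value", "origin_name", "similarity", "valid_synonym"]


def write_candidates(rows, handle):
    """Write candidate synonym records to an open text file as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_valid_synonyms(handle):
    """Return (value, origin_name) pairs for rows marked valid_synonym == 1."""
    reader = csv.DictReader(handle)
    return [
        (row["value"], row["origin_name"])
        for row in reader
        if row["valid_synonym"].strip() == "1"
    ]


# Two illustrative rows; the first is marked as a valid synonym
rows = [
    {"value": "Melanoma", "origin_name": "ICDC", "similarity": 1.0, "valid_synonym": 1},
    {"value": "Glioma", "origin_name": "NCIt", "similarity": 0.9999999420005758, "valid_synonym": 0},
]

buffer = io.StringIO()  # in-memory stand-in for the CSV file
write_candidates(rows, buffer)
buffer.seek(0)
print(read_valid_synonyms(buffer))  # [('Melanoma', 'ICDC')]
```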


# Testing

In [100]:
# MDB sandbox
URL = "bolt://localhost:7687" # <URL for database>
USER = "neo4j" # <Username for database>
PASSWORD = "noble-use-dairy" # <Password for database>
driver = GraphDatabase.driver(URL, auth=(USER, PASSWORD))

In [28]:
# delete created nodes & relationships so the notebook examples can be reused; move this and
# any node creation used in notebook examples to a setup section at the beginning of the notebook
# (should cover all examples so they are reproducible in an empty sandbox db by the user)
with driver.session() as session:
    #link 2 terms
    session.write_transaction(mdb.detach_delete_concept, Concept({"nanoid": "HH7YwF"}))
    session.write_transaction(mdb.detach_delete_concept, Concept({"nanoid": "4Ygrnp"}))
    session.write_transaction(mdb.detach_delete_concept, Concept({"nanoid": "gifRog"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Alveolar adenocarcinoma", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Undifferentiated Carcinoma", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Epithelioma, malignant", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Carcinoma", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_concept, Concept({"nanoid": "5aX4EN"}))
    #link 2 concepts via predicate
    session.write_transaction(mdb.detach_delete_predicate, Predicate({"handle": "exactMatch", "nanoid": "6QQakF"}))
    #merge 2 concepts
    session.write_transaction(mdb.detach_delete_concept, Concept({"nanoid": "wJt7F6"}))
    session.write_transaction(mdb.detach_delete_concept, Concept({"nanoid": "UA40MK"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Nelson", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Nelsinghouse", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "Nelly", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_term, Term({"value": "N. Moore", "origin_name": "NDC"}))
    session.write_transaction(mdb.detach_delete_predicate, Predicate({"handle": "exactMatch", "nanoid": "q4sJ9n"}))
driver.close()

Removed Concept node with nanoid: HH7YwF
Removed Concept node with nanoid: 4Ygrnp
Removed Concept node with nanoid: gifRog
Removed Term node with value: Alveolar adenocarcinoma and origin: NDC
Removed Term node with value: Undifferentiated Carcinoma and origin: NDC
Removed Term node with value: Epithelioma, malignant and origin: NDC
Removed Term node with value: Carcinoma and origin: NDC
Removed Concept node with nanoid: 5aX4EN
Removed Predicate node with handle: exactMatch and nanoid: 6QQakF
Removed Concept node with nanoid: wJt7F6
Removed Concept node with nanoid: UA40MK
Removed Term node with value: Nelson and origin: NDC
Removed Term node with value: Nelsinghouse  and origin: NDC
Removed Term node with value: Nelly and origin: NDC
Removed Term node with value: N. Moore and origin: NDC
Removed Predicate node with handle: exactMatch and nanoid: q4sJ9n
