# Create Node and Relationship files for MeSH Terms
This notebook creates Node and Relationship files that represent the MeSH tree. The ontology is retrieved from [BioPortal](https://bioportal.bioontology.org/ontologies/MESH).

The Node and Relationship files can be uploaded into a Neo4j graph database using [kg-import](https://github.com/sbl-sdsc/kg-import).

In [1]:
import os
from pathlib import Path
import pandas as pd
from utils import parse_bioportal_csv

In [2]:
# reload modules before executing user code
%load_ext autoreload
%autoreload 2

In [3]:
# configure pandas dataframe
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [4]:
NODE_DIR = Path(os.getenv('NODE_DIR', default='../data'))
RELATIONSHIP_DIR = Path(os.getenv('RELATIONSHIP_DIR', default='../data'))

## MeSH Terms

Specify the CSV download URL from [BioPortal](https://bioportal.bioontology.org/). The URL includes a BioPortal API key.

In [5]:
ontology_url = 'https://data.bioontology.org/ontologies/MESH/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb&download_format=csv'
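The API key is embedded directly in the URL above. If you would rather keep the key out of the notebook, a minimal sketch could read it from an environment variable and assemble a URL of the same shape. The helper `bioportal_download_url` and the variable name `BIOPORTAL_API_KEY` are hypothetical; the result is stored in `example_url` so it does not overwrite `ontology_url`.

```python
import os
from urllib.parse import urlencode


def bioportal_download_url(acronym: str, api_key: str, download_format: str = 'csv') -> str:
    """Build a download URL with the same shape as ontology_url above (hypothetical helper)."""
    base = f'https://data.bioontology.org/ontologies/{acronym}/download'
    # urlencode keeps insertion order: apikey first, then download_format
    query = urlencode({'apikey': api_key, 'download_format': download_format})
    return f'{base}?{query}'


# BIOPORTAL_API_KEY is an assumed environment variable name, not a BioPortal convention
api_key = os.getenv('BIOPORTAL_API_KEY', default='YOUR_API_KEY')
example_url = bioportal_download_url('MESH', api_key)
```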

Specify extra columns to import from the CSV file. The dictionary maps original column names to the new column names used as node properties. Follow the Neo4j convention for property names: lower case, with underscores separating words.

In [6]:
extra_properties = {}
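Here `extra_properties` is left empty, so no extra columns are imported. As an illustration of what a non-empty mapping might look like, the sketch below assumes BioPortal CSV column names `CUI` and `Semantic Types` and applies the mapping with plain pandas. It shows how such a rename typically works; it is not necessarily how `parse_bioportal_csv` uses the dictionary internally, and the toy values are placeholders.

```python
import pandas as pd

# Hypothetical mapping: original CSV column name -> Neo4j property name (lower_snake_case)
example_extra_properties = {
    'Semantic Types': 'semantic_types',
    'CUI': 'cui',
}

# Toy frame standing in for the downloaded CSV; column names are assumed, values are placeholders
raw = pd.DataFrame({
    'Class ID': ['http://purl.bioontology.org/ontology/MESH/example'],
    'Semantic Types': ['placeholder'],
    'CUI': ['placeholder'],
})

# Select only the mapped columns, then rename them to the property names
props = raw[list(example_extra_properties)].rename(columns=example_extra_properties)
print(list(props.columns))  # ['semantic_types', 'cui']
```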

Browse the Identifiers.org [registry](https://registry.identifiers.org/registry) to find the CURIE (compact URI) prefix for a data resource.

In [7]:
curie = 'mesh'
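The node output below pairs each `url` with an `id` built from this prefix, for example `http://purl.bioontology.org/ontology/MESH/C585345` with `mesh:C585345`. A minimal sketch of that conversion, assuming the local identifier is always the last path segment of the IRI (`to_curie` is a hypothetical helper, not part of `utils`):

```python
def to_curie(iri: str, prefix: str = 'mesh') -> str:
    """Turn a class IRI into a CURIE of the form '<prefix>:<local id>'."""
    # The local identifier is assumed to be the last path segment of the IRI
    local_id = iri.rstrip('/').rsplit('/', 1)[-1]
    return f'{prefix}:{local_id}'


print(to_curie('http://purl.bioontology.org/ontology/MESH/C585345'))  # mesh:C585345
```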

In [8]:
node_file_name = 'Category.csv'
relationship_file_name = 'Category-IS_A-Category.csv'

## Parse ontology file and create node and relationship dataframes

In [9]:
nodes, relationships = parse_bioportal_csv(ontology_url, extra_properties, curie)
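The implementation of `parse_bioportal_csv` is not shown here. The sketch below is a simplified, hypothetical version of the node/relationship split, assuming the BioPortal CSV has `Class ID`, `Preferred Label`, `Synonyms`, `Definitions`, and `Parents` columns, with multiple parents separated by `|`. Exploding the parent list yields one row per (child, parent) pair while keeping the original row index, which would explain repeated index values such as `83` in the relationship output below.

```python
import pandas as pd


def sketch_parse(df: pd.DataFrame, curie: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical simplification: split an ontology table into nodes and IS_A edges."""

    def to_curie(iri: str) -> str:
        # Assumes the local identifier is the last path segment of the IRI
        return f"{curie}:{iri.rstrip('/').rsplit('/', 1)[-1]}"

    nodes = pd.DataFrame({
        'id': df['Class ID'].map(to_curie),
        'name': df['Preferred Label'],
        'synonyms': df['Synonyms'],
        'definition': df['Definitions'],
        'url': df['Class ID'],
    })

    # Drop classes without parents, then emit one row per (child, parent) pair;
    # explode keeps the original index, so a class with two parents repeats its index
    rels = df[['Class ID', 'Parents']].dropna(subset=['Parents'])
    rels = rels.assign(Parents=rels['Parents'].str.split('|')).explode('Parents')
    relationships = pd.DataFrame({
        'from': rels['Class ID'].map(to_curie),  # child
        'to': rels['Parents'].map(to_curie),     # parent
    })
    return nodes, relationships
```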

In [10]:
print('Number of nodes:', nodes.shape[0])

Number of nodes: 348658


In [11]:
nodes.head()

| | id | name | synonyms | definition | url |
|---|---|---|---|---|---|
| 0 | mesh:C000659400 | Acaulospora ignota | | | http://purl.bioontology.org/ontology/MESH/C000... |
| 1 | mesh:C000624633 | technetium 99m hydroxyethylene-diphosphonate | 99mTc-HDP\|99mTc-hydroxyethylene-diphosphonate | | http://purl.bioontology.org/ontology/MESH/C000... |
| 2 | mesh:C585345 | Tardbp protein, zebrafish | Tardbpl-FL protein, zebrafish\|Tardbpl protein,... | | http://purl.bioontology.org/ontology/MESH/C585345 |
| 3 | mesh:C000623720 | Autographa californica multiple nuclear polyhe... | Trichoplusia ni multiple nucleopolyhedrovirus\|... | | http://purl.bioontology.org/ontology/MESH/C000... |
| 4 | mesh:C000644313 | Leeuwenhoekiella blandensis | | | http://purl.bioontology.org/ontology/MESH/C000... |


In [12]:
print('Number of relationships:', relationships.shape[0])

Number of relationships: 41322


In [13]:
relationships.head()

| | from | to |
|---|---|---|
| 26 | mesh:D014437 | mesh:D012282 |
| 44 | mesh:D019074 | mesh:D001158 |
| 82 | mesh:D006371 | mesh:D005741 |
| 83 | mesh:D054881 | mesh:D012732 |
| 83 | mesh:D054881 | mesh:D011859 |


## Save files

In [14]:
nodes.to_csv(NODE_DIR / node_file_name, index=False)

In [15]:
relationships.to_csv(RELATIONSHIP_DIR / relationship_file_name, index=False)
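Before uploading the files with kg-import, a quick sanity check can confirm that every relationship endpoint refers to a node in `Category.csv`; an edge whose `from` or `to` id is missing would not connect to any node. A minimal sketch (the helper `dangling_endpoints` is hypothetical) that returns the offending rows:

```python
import pandas as pd


def dangling_endpoints(nodes: pd.DataFrame, relationships: pd.DataFrame) -> pd.DataFrame:
    """Return relationship rows whose 'from' or 'to' id is not among the node ids."""
    known = set(nodes['id'])
    # ~ binds tighter than |, so each isin result is negated before combining
    mask = ~relationships['from'].isin(known) | ~relationships['to'].isin(known)
    return relationships[mask]
```

Calling `dangling_endpoints(nodes, relationships)` on the dataframes above, or on the files read back with `pd.read_csv`, should return an empty dataframe when every IS_A edge connects two known nodes.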