# Create Node and Relationship files for ChEBI Taxonomy
This notebook creates node and relationship files that represent the ChEBI ontology hierarchy (`IS_A` relationships between compounds). The ontology is retrieved from [BioPortal](https://bioportal.bioontology.org/ontologies/CHEBI).

The node and relationship files can be uploaded into a Neo4j graph database using [kg-import](https://github.com/sbl-sdsc/kg-import).

In [1]:
import os
from pathlib import Path
import pandas as pd
from utils import parse_bioportal_csv

In [2]:
# reload modules before executing user code
%load_ext autoreload
%autoreload 2

In [3]:
# configure pandas dataframe
pd.options.display.max_rows = None  # display all rows
pd.options.display.max_columns = None  # display all columns

In [4]:
NODE_DIR = Path(os.getenv('NODE_DIR', default='../data'))
RELATIONSHIP_DIR = Path(os.getenv('RELATIONSHIP_DIR', default='../data'))
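
`DataFrame.to_csv` does not create missing parent directories, so it can help to make sure the output directories exist before writing. A minimal sketch follows; the helper name `ensure_output_dirs` is made up for illustration and is not part of this notebook or of kg-import.

```python
from pathlib import Path


def ensure_output_dirs(*directories):
    """Create each directory (and any missing parents) if it does not exist yet."""
    for directory in directories:
        # exist_ok=True makes the call a no-op for directories that already exist.
        Path(directory).mkdir(parents=True, exist_ok=True)
```

Calling `ensure_output_dirs(NODE_DIR, RELATIONSHIP_DIR)` before the save cells would cover both output locations, including the case where they point to the same directory.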

## ChEBI

In [5]:
ontology_url = 'https://data.bioontology.org/ontologies/CHEBI/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb&download_format=csv'

In [6]:
curie = 'chebi'

In [7]:
extra_properties = {'http://purl.obolibrary.org/obo/chebi/formula': 'formula',
                    'http://purl.obolibrary.org/obo/chebi/inchikey': 'inchikey',
                    'http://purl.obolibrary.org/obo/chebi/inchi': 'inchi',
                    'http://purl.obolibrary.org/obo/chebi/mass': 'mass'}
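
How `parse_bioportal_csv` applies this mapping is not shown in this notebook. One plausible reading is that it picks the listed BioPortal columns and renames them to the short property names. The sketch below illustrates that idea on a one-row toy dataframe; the `Class ID` column name and the column selection logic are assumptions, not the actual implementation.

```python
import pandas as pd

extra_properties = {'http://purl.obolibrary.org/obo/chebi/formula': 'formula',
                    'http://purl.obolibrary.org/obo/chebi/inchikey': 'inchikey',
                    'http://purl.obolibrary.org/obo/chebi/inchi': 'inchi',
                    'http://purl.obolibrary.org/obo/chebi/mass': 'mass'}

# Toy input with long property URLs as column names (assumed layout).
# The 'inchi' column is left out on purpose to show that absent columns are skipped.
raw = pd.DataFrame({
    'Class ID': ['http://purl.obolibrary.org/obo/CHEBI_179157'],
    'http://purl.obolibrary.org/obo/chebi/formula': ['C8H12OS2'],
    'http://purl.obolibrary.org/obo/chebi/inchikey': ['YCXWJNAAXGVFED-UHFFFAOYSA-N'],
    'http://purl.obolibrary.org/obo/chebi/mass': ['188.3'],
})

# Keep only the mapped columns that are actually present, then shorten their names.
present = [column for column in extra_properties if column in raw.columns]
renamed = raw[['Class ID'] + present].rename(columns=extra_properties)

print(list(renamed.columns))
```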

In [8]:
node_file_name = 'Compound.csv'
relationship_file_name = 'Compound-IS_A-Compound.csv'

## Parse ontology file and create node and relationship dataframes

In [9]:
nodes, relationships = parse_bioportal_csv(ontology_url, extra_properties, curie)

In [10]:
print('Number of nodes:', nodes.shape[0])

Number of nodes: 176873


Note: many compounds are missing an InChIKey. A bug report has been filed with ChEBI.

In [11]:
print('Number of compounds missing InChIKey:', nodes.query("inchikey == ''").shape[0])

Number of compounds missing InChIKey: 41336
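
The query above counts only empty strings. If the node file is later reloaded with `pd.read_csv`, empty fields come back as `NaN` by default, so the same query would then report zero. A slightly more defensive count, sketched here with a hypothetical helper `count_missing` and a toy dataframe, treats `NaN`, empty, and whitespace-only values alike.

```python
import pandas as pd


def count_missing(df, column):
    """Count rows where `column` is NaN/None, empty, or whitespace-only."""
    values = df[column]
    is_missing = values.isna() | (values.astype(str).str.strip() == '')
    return int(is_missing.sum())


# Toy data: one real InChIKey and three different kinds of "missing".
demo = pd.DataFrame({'inchikey': ['KXODZBLFVFSLAI-AVGNSLFASA-N', '', None, '  ']})
missing = count_missing(demo, 'inchikey')
print('Number of compounds missing InChIKey:', missing)
```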


In [12]:
nodes.head()

,id,name,synonyms,definition,url,formula,inchikey,inchi,mass
0,chebi:CHEBI_101465,"(2S,3S,4R)-4-(hydroxymethyl)-1-(2-methoxy-1-ox...",,,http://purl.obolibrary.org/obo/CHEBI_101465,C19H19N3O3,NONDGOMIDWLUNU-AOIWGVFYSA-N,InChI=1S/C19H19N3O3/c1-25-12-18(24)22-16(9-20)...,337.373
1,chebi:CHEBI_159237,Leu-His-Glu,(2S)-2-[[(2S)-2-[[(2S)-2-amino-4-methylpentano...,,http://purl.obolibrary.org/obo/CHEBI_159237,C17H27N5O6,KXODZBLFVFSLAI-AVGNSLFASA-N,InChI=1S/C17H27N5O6/c1-9(2)5-11(18)15(25)22-13...,397.432
2,chebi:CHEBI_101448,"2-fluoro-N-[(4S,7R,8S)-8-methoxy-4,7,10-trimet...",,,http://purl.obolibrary.org/obo/CHEBI_101448,C24H30FN3O4,ULFQVHHILZHREN-ZMPRRUGASA-N,InChI=1S/C24H30FN3O4/c1-15-12-26-16(2)14-32-21...,443.512
3,chebi:CHEBI_85476,O-hydroxyvaleroyl-L-carnitine,O-hydroxyvaleroyl-L-carnitines|O-hydroxyvalero...,An O-acyl-L-carnitine in which the acyl group ...,http://purl.obolibrary.org/obo/CHEBI_85476,C12H23NO5,,,261.31472
4,chebi:CHEBI_179157,Furfuryl propyl disulfide,2-[(propyldisulanyl)methyl]uran,,http://purl.obolibrary.org/obo/CHEBI_179157,C8H12OS2,YCXWJNAAXGVFED-UHFFFAOYSA-N,InChI=1S/C8H12OS2/c1-2-6-10-11-7-8-4-3-5-9-8/h...,188.3
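
In the rows above, each `id` appears to be the `curie` prefix joined by a colon to the last path segment of `url`. How `parse_bioportal_csv` builds the identifier is not shown here; the helper `url_to_curie` below is a hypothetical sketch of that mapping.

```python
def url_to_curie(url, prefix):
    """Build an identifier of the form '<prefix>:<last path segment of url>'."""
    # Drop any trailing slash, then take everything after the final '/'.
    local_id = url.rstrip('/').rsplit('/', 1)[-1]
    return f'{prefix}:{local_id}'


example_id = url_to_curie('http://purl.obolibrary.org/obo/CHEBI_101465', 'chebi')
print(example_id)
```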


In [13]:
print('Number of relationships:', relationships.shape[0])

Number of relationships: 229328


In [14]:
relationships.head()

,from,to
0,chebi:CHEBI_101465,chebi:CHEBI_38193
1,chebi:CHEBI_159237,chebi:CHEBI_25676
2,chebi:CHEBI_101448,chebi:CHEBI_52898
2,chebi:CHEBI_101448,chebi:CHEBI_24995
3,chebi:CHEBI_85476,chebi:CHEBI_133449
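
Before importing, it may be worth checking that every relationship endpoint refers to an existing node, since an unmatched identifier cannot be linked to a node in the graph. A minimal sketch on toy dataframes with the same column names (`id`, `from`, `to`) follows; the helper name `find_dangling` and the identifier `chebi:CHEBI_0000000` are made up for illustration.

```python
import pandas as pd


def find_dangling(nodes, relationships):
    """Return relationships whose 'from' or 'to' id is not present in nodes['id']."""
    known_ids = set(nodes['id'])
    mask = ~relationships['from'].isin(known_ids) | ~relationships['to'].isin(known_ids)
    return relationships[mask]


nodes_demo = pd.DataFrame({'id': ['chebi:CHEBI_101465', 'chebi:CHEBI_38193']})
relationships_demo = pd.DataFrame({
    'from': ['chebi:CHEBI_101465', 'chebi:CHEBI_101465'],
    'to': ['chebi:CHEBI_38193', 'chebi:CHEBI_0000000'],  # second target is made up
})

dangling = find_dangling(nodes_demo, relationships_demo)
print('Number of dangling relationships:', dangling.shape[0])
```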


## Save files

In [15]:
nodes.to_csv(NODE_DIR / node_file_name, index=False)

In [16]:
relationships.to_csv(RELATIONSHIP_DIR / relationship_file_name, index=False)
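
To confirm that a file was written as expected, it can be read back and compared with the in-memory dataframe. The sketch below does a round trip through a temporary directory with a toy dataframe. It passes `dtype=str` to avoid type inference and `keep_default_na=False` so that empty fields stay empty strings instead of becoming `NaN`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy node table with an empty InChIKey, as seen for some compounds above.
demo = pd.DataFrame({'id': ['chebi:CHEBI_85476'],
                     'inchikey': [''],
                     'mass': ['261.31472']})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'Compound.csv'
    demo.to_csv(path, index=False)
    reloaded = pd.read_csv(path, dtype=str, keep_default_na=False)

# Compare the values column by column.
round_trip_ok = reloaded.to_dict('list') == demo.to_dict('list')
print('Round trip preserved all values:', round_trip_ok)
```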