To do
- Use the right terminology (mapping, assigning, endpoint, material, bioassay, ...) for the column names and the documentation
- Align with the AMBIT and ISA-TAB structure
- Figure out IDs (e.g., name + supplier + diameter?)
- Motivate the inclusion of a whole set of classes in the eNanoMapper ontology: why do we need Alamar blue (a reagent) if we have the bioassay class "Alamar Blue Assay"?
- Data isn't totally clean (e.g., Alamar blue and Alamar Blue)
Once the proof-of-concept seems to be working:
- Write up a project plan to make an app/website/whatever for this workflow (I want a nice GUI)
  - Timeline (i.e., a Gantt chart)
  - Motivation
  - How it aligns with everything FP7 so far
  - Detail the proof of concept


---
title: 01 - QC + YARRRML generator for nn8b07562_si_001.xlsx

author: Javier Millan Acosta

---

# Introduction and motivation
## Source

This notebook explores the supplementary materials of the ACS Nano paper:
>Labouta HI, Asgarian N, Rinker K, Cramb DT. Meta-Analysis of Nanoparticle Cytotoxicity via Data-Mining the Literature. ACS Nano. 2019 Jan 31; doi:10.1021/acsnano.8b07562 (Scholia)

ACS seems to block scrapers, so the supplementary data needs to be downloaded manually from the [supporting information link](https://pubs.acs.org/doi/suppl/10.1021/acsnano.8b07562/suppl_file/nn8b07562_si_001.xlsx) on the [ACS Nano page](https://pubs.acs.org/doi/full/10.1021/acsnano.8b07562) and then stored under [../data](../data).

## Summary
The steps in this notebook help prepare the input data for the RML-based RDFication of cytotoxicity data by generating a [YARRRML](https://rml.io/yarrrml/spec/) mapping file. Specifically, the goals are:

*I) Describe the data set*

*II) Identify inconsistencies and clean the data*

*III) Assist with the selection of eNanoMapper terms for the mapping/assignment*

*IV) Help detect missing relevant classes in the eNanoMapper ontology*

*V) Provide the foundation for the RML YARRRML that will be used for the RDFication*

*VI) Serve as a proof-of-concept for an eNanoMapper ontology alignment tool*

# Imports

In [19]:
import pandas as pd
import numpy as np
import math
import os
import sys
from IPython.display import Markdown, display
import re
import rdflib
import requests
from ipywidgets import interactive_output, interact_manual, Layout, widgets, interact, Dropdown, Select, Text, Button, Textarea
import json
from datetime import datetime

# Loading data
The data set is an overview of nanoparticle cytotoxicity assays reported in the literature. The authors harmonized the units and used the features listed below to run decision tree analyses.

In [2]:
file = "../data/nn8b07562_si_001.xlsx"
df = pd.read_excel(file)

The next step is to verify the data type of each column:

In [3]:
df_dtypes = pd.DataFrame(df.dtypes, columns=["Dtype"])
cols = [i for i in df.columns]
display(df_dtypes.transpose())
display(Markdown("Data shape of {} is {}.".format(file, df.shape)))

| | Nanoparticle | Type: Organic (O)/inorganic (I) | coat | Diameter (nm) | Concentration μM | Zeta potential (mV) | Cells | Cell line (L)/primary cells (P) | Human(H)/Animal(A) cells | Animal? | ... | Test | Test indicator | Biochemical metric | % Cell viability | Interference checked (Y/N) | Colloidal stability checked (Y/N) | Positive control (Y/N) | Publication year | Particle ID | Reference DOI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dtype | object | object | object | float64 | float64 | float64 | object | object | object | object | ... | object | object | object | float64 | object | object | object | int64 | int64 | object |


Data shape of ../data/nn8b07562_si_001.xlsx is (2896, 24).

Converting the integer columns to floats, and `Particle ID` to `object` so that it is treated as an identifier rather than a number:

In [4]:
int_cols = list(df_dtypes.loc[df_dtypes['Dtype'] == int].index)
df[int_cols] = df[int_cols].astype(float)
df["Particle ID"] = df["Particle ID"].astype(object)
qual_cols = list(df_dtypes.loc[df_dtypes['Dtype'] == object].index)
display(pd.DataFrame(df[int_cols].dtypes, columns=["Dtype"]))

| | Dtype |
|---|---|
| Exposure time (h) | float64 |
| Publication year | float64 |
| Particle ID | object |


# Describing the data features
## Qualitative features

The table below describes the qualitative variables in the data (`dtype=object`).

In [5]:
df.describe(include="object")

| | Nanoparticle | Type: Organic (O)/inorganic (I) | coat | Cells | Cell line (L)/primary cells (P) | Human(H)/Animal(A) cells | Animal? | Cell morphology | Cell age: embryonic (E), Adult (A) | Cell-organ/tissue source | Test | Test indicator | Biochemical metric | Interference checked (Y/N) | Colloidal stability checked (Y/N) | Positive control (Y/N) | Particle ID | Reference DOI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2896 | 2896 | 1052 | 2896 | 2896 | 2896 | 651 | 2896 | 2896 | 2896 | 2896 | 2896 | 2896 | 2896 | 2896 | 2896 | 2896.0 | 2896 |
| unique | 33 | 2 | 46 | 81 | 2 | 2 | 8 | 15 | 2 | 30 | 23 | 17 | 6 | 2 | 2 | 2 | 118.0 | 89 |
| top | Iron oxide | I | PEI | A549 | L | H | Mouse | Epithelial | A | Blood | MTT | tetrazolium salt | cell metabolic activity | N | N | N | 19.0 | 10.1186/1556-276X-7-77 |
| freq | 490 | 2274 | 123 | 298 | 2356 | 2231 | 411 | 1456 | 2757 | 536 | 872 | 1302 | 1678 | 2348 | 2309 | 2395 | 225.0 | 225 |


The table below shows the percentage of missing values in each qualitative column.

In [6]:
df_null = pd.DataFrame(df[qual_cols].isnull().sum()/len(df)*100, columns = ["%na"]).transpose()
df_null

| | Nanoparticle | Type: Organic (O)/inorganic (I) | coat | Cells | Cell line (L)/primary cells (P) | Human(H)/Animal(A) cells | Animal? | Cell morphology | Cell age: embryonic (E), Adult (A) | Cell-organ/tissue source | Test | Test indicator | Biochemical metric | Interference checked (Y/N) | Colloidal stability checked (Y/N) | Positive control (Y/N) | Reference DOI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| %na | 0.0 | 0.0 | 63.674033 | 0.0 | 0.0 | 0.0 | 77.520718 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


As described in the paper, there are many missing values for `coat`. The missing values in `Animal?` are not relevant: this column will be replaced with an `Organism` column.

In [7]:
df[qual_cols] = df[qual_cols].fillna('')

## Quantitative features

In [8]:
describe = df.drop(["Particle ID", "Publication year"], axis=1).describe()
cols_d = [i for i in describe.columns]
nas = [df[col].isna().sum()/len(df[col])*100 for col in cols_d]
describe.loc["%na"] = nas
display(describe)
na_overall = str(np.round(df.isna().sum().sum() / df.size * 100, 3))
display(Markdown("Missing quantitative values account for {}% of all cells in the data set.".format(na_overall)))

| | Diameter (nm) | Concentration μM | Zeta potential (mV) | Exposure time (h) | % Cell viability |
|---|---|---|---|---|---|
| count | 2896.0 | 2896.0 | 1261.0 | 2896.0 | 2896.0 |
| mean | 125.082465 | 85.74635 | -1.963933 | 35.515539 | 75.208409 |
| std | 171.931194 | 797.9487 | 28.925259 | 27.950149 | 34.267026 |
| min | 1.0 | 1.660539e-20 | -48.0 | 1.0 | -58.89764 |
| 25% | 20.0 | 2.5e-06 | -27.0 | 24.0 | 54.219643 |
| 50% | 49.2 | 0.0005 | -8.0 | 24.0 | 86.965674 |
| 75% | 165.0 | 0.01054755 | 17.7 | 48.0 | 97.65237 |
| max | 957.0 | 15000.0 | 87.0 | 336.0 | 404.8117 |
| %na | 0.0 | 0.0 | 56.457182 | 0.0 | 0.0 |


Missing quantitative values account for 2.352% of all cells in the data set.

As described in the paper, the number of rows missing `Zeta potential (mV)` measurements is very high (about 56%).

# Cleaning data

(TBD)

In [9]:
df["Organism"] = [val if val !="" else "Human" for val in df["Animal?"]]
df["DOI"] = ["" if "(" in val else "https://doi.org/"+val for val in df["Reference DOI"]]
df["Reference"] = [val.replace("not provided (", "https://").replace(")", "") if "(" in val else "" for val in df["Reference DOI"]]
df["Type"] = ["organic" if val=="O" else "inorganic" for val in df["Type: Organic (O)/inorganic (I)"]]
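The to-do list notes that labels such as `Alamar blue` and `Alamar Blue` coexist, and the unique values also include trailing spaces (`Hyaluronic acid `) and soft hyphens in cell line names. A minimal sketch of how such variants could be detected before the ontology lookup, assuming that case, surrounding whitespace, and the soft-hyphen/hyphen distinction carry no meaning. `normalize_label` and `find_variants` are hypothetical helpers, not part of the notebook:

```python
import re
from collections import defaultdict

def normalize_label(label):
    """Lowercase, replace soft hyphens (U+00AD) with '-', collapse whitespace."""
    label = str(label).replace("\u00ad", "-")
    return re.sub(r"\s+", " ", label).strip().lower()

def find_variants(values):
    """Group raw labels that collapse to the same normalized form."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize_label(value)].add(value)
    # Keep only the groups with more than one spelling
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

# Toy values taken from the unique labels that appear in this notebook
example = ["Alamar Blue", "Alamar blue", "Hyaluronic acid", "Hyaluronic acid ", "MTT"]
variants = find_variants(example)
print(variants)
```

On the real data this could be run as `find_variants(df[col].unique())` for each qualitative column, and the reported groups merged by hand or with a replacement dictionary.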

# Mapping terms with the eNanoMapper ontology
The [Ontology Lookup Service](https://www.ebi.ac.uk/ols/index) (OLS) [search API](https://www.ebi.ac.uk/ols/docs/api) is used to retrieve the IRIs and labels of the terms that match each query. These can be used as input in a workflow that creates the `RML` model.

## Column names
Some axioms will use the column names as predicates (e.g., measured values like Concentration). For these columns, the widget below shows the best matches returned by the [Ontology Lookup Service](https://www.ebi.ac.uk/ols/docs/api#Search) as a reference.

Defining a function that looks up column names in the OLS and retrieves all the matches:

In [10]:
from urllib.parse import quote

def ols_lookup(var_list, base_url = "https://www.ebi.ac.uk", get_query = "/ols/api/search?q={}&groupField=iri&start=0&ontology=enm"):
    """Return {term: {label: IRI}} with the OLS matches for each term in var_list."""
    allign = dict()
    for var in var_list:
        # URL-encode the term: column names contain spaces, '%' and parentheses
        r = requests.get(base_url + get_query.format(quote(str(var))))
        r.raise_for_status()
        docs = r.json()["response"]["docs"]
        # Matches that share a label overwrite each other: the last IRI wins
        allign[var] = {doc["label"]: doc["iri"] for doc in docs}
    return allign
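
`ols_lookup` relies on the search response exposing its hits under `response` → `docs`, each with a `label` and an `iri`. The sketch below parses a hand-written payload of that shape offline, so the parsing step can be checked without network access. The payload and its `example.org` IRIs are made up for illustration, and `parse_ols_docs` is a hypothetical helper:

```python
def parse_ols_docs(payload):
    """Return {label: iri} from an OLS-style search payload."""
    docs = payload.get("response", {}).get("docs", [])
    # Skip hits without a label or an IRI instead of raising KeyError
    return {doc["label"]: doc["iri"] for doc in docs if "label" in doc and "iri" in doc}

# Illustrative payload: only the keys used by the lookup are included
sample_payload = {
    "response": {
        "docs": [
            {"label": "example term", "iri": "http://example.org/onto/EX_0000001"},
            {"iri": "http://example.org/onto/EX_0000002"},  # no label: skipped
        ]
    }
}
parsed = parse_ols_docs(sample_payload)
print(parsed)
```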

In [11]:
allign = ols_lookup(list(df.columns))

Using the function on the data and visualizing the results:

In [12]:
mapping_cols = pd.DataFrame(["a" for i in list(allign.keys())], list(allign.keys()), columns=["Mapping"])
select_var = Select(options = allign.keys())
@interact(select = allign)
def show_matches(select):
    display(Markdown("Below are the matches returned by the OLS.")) 
    display(pd.DataFrame([select]).transpose())     


## Cell values
The unique values are analyzed individually, with an approach similar to the one above. Columns whose cell content becomes the object of a typing triple (`subject a <cell content>`) are listed separately in `subject_a`:

In [13]:
subject_a = ["Particle ID", "Type: Organic (O)/inorganic (I)", "Cell age: embryonic (E), Adult (A)", 
            "Positive control (Y/N)", "Colloidal stability checked (Y/N)", "Interference checked (Y/N)", 
             "Animal?", "Organism", "Type", "Reference DOI"] # Use this to select which are the factor columns
qual_cols = [i for i in qual_cols if i not in subject_a] 
quant_cols = {col:'' for col in set(df.columns) - set(subject_a) - set(qual_cols)}
str_quant = "- " + "\n- ".join(list(quant_cols.keys()))
display(Markdown("**Quantitative vars:**\n" + str_quant))
str_qual = "- " + "\n- ".join(qual_cols)
display(Markdown("**Qualitative vars:**\n" +str_qual))
str_subject_a = "- " + "\n- ".join(subject_a)
display(Markdown("**rdf:type / rdfs:subClassOf / owl:NamedIndividual / rr:datatype xsd:(e.g. double)**\n" +str_subject_a))

**Quantitative vars:**
- Exposure time (h)
- Concentration μM
- Zeta potential (mV)
- Reference
- Publication year
- % Cell viability
- Diameter (nm)
- DOI

**Qualitative vars:**
- Nanoparticle
- coat
- Cells
- Cell line (L)/primary cells (P)
- Human(H)/Animal(A) cells
- Cell morphology
- Cell-organ/tissue source
- Test
- Test indicator
- Biochemical metric

**rdf:type / rdfs:subClassOf / owl:NamedIndividual / rr:datatype xsd:(e.g. double)**
- Particle ID
- Type: Organic (O)/inorganic (I)
- Cell age: embryonic (E), Adult (A)
- Positive control (Y/N)
- Colloidal stability checked (Y/N)
- Interference checked (Y/N)
- Animal?
- Organism
- Type
- Reference DOI

In [14]:
# Look up every unique value of each qualitative column in the OLS
allign = {col : ols_lookup(np.unique(df[col])) for col in qual_cols}
# Flatten the looked-up terms; each one starts with an empty list of assigned IRIs
all_keys = [term for col in allign for term in allign[col]]
matches = {key:[] for key in all_keys}

In [32]:
select_col = Select(options = qual_cols)
select_var = Select(options = list(allign[select_col.value].keys()))

def update_var(*args):
    # Refresh the list of terms whenever another column is selected
    select_var.options = list(allign[select_col.value].keys())
select_col.observe(update_var, names="value")

input_iris = Textarea(layout = widgets.Layout(width='800px'))
map_button = Button(description = "Assign IRIs")
out = widgets.Output(layout={'border': '0px solid black'})

def display_mapping(*args):
    display(pd.DataFrame([allign[select_col.value][select_var.value]]).transpose())

def display_matches(*args):
    match_str = "\n- ".join(matches[select_var.value])
    print("Matches for {}:\n- {}".format(select_var.value, match_str))

def map_click(*args):
    with out:
        out.clear_output()
        # One IRI per line; skip blanks and IRIs that were already assigned
        new_iris = [iri.strip() for iri in input_iris.value.split("\n")]
        new_iris = [iri for iri in dict.fromkeys(new_iris)
                    if iri.startswith("http") and iri not in matches[select_var.value]]
        if new_iris:
            matches[select_var.value] += new_iris
            display(Markdown("{} was assigned IRI(s): {}".format(select_var.value, new_iris)))
            display_matches()

# Register the callback once: registering it inside show_matches adds a new
# handler on every redraw, which assigns the same IRI several times
map_button.on_click(map_click)

@interact(select_col = select_col, select_var = select_var, input_iris = input_iris)
def show_matches(select_col, select_var, input_iris):
    display(out)
    display(map_button)
    display(Markdown("#### Click the button to assign IRIs in `input_iris` to the highlighted term {}.".format(select_var)))
    display(Markdown("------"))
    display(Markdown("Below are the **eNanoMapper ontology matches** (label, IRI) returned by the OLS for **{} ({})**".format(select_var, select_col)))
    display_mapping()
#ToDo: display the about text or description for the term? also show superclass of the term? // load the existent pickle


In [33]:
for key in matches.keys():
    display(Markdown("#### {} : \n\t- {}".format(key, "\n\t- ".join(matches[key]))))

#### Ag : 
	- 

#### Al2O3 : 
	- 

#### Au : 
	- http://purl.bioontology.org/ontology/npo#NPO_401

#### Bi : 
	- http://purl.enanomapper.org/onto/ENM_9000247

#### Carbon NP : 
	- 

#### Carbon Nanotubes : 
	- http://purl.bioontology.org/ontology/npo#NPO_606

#### CdO : 
	- http://purl.enanomapper.org/onto/ENM_9000250

#### CeO2 : 
	- 

#### Chitosan : 
	- http://purl.bioontology.org/ontology/npo#NPO_261

#### Co : 
	- 

#### Co3O4 : 
	- http://purl.enanomapper.org/onto/ENM_9000254

#### Cr : 
	- 

#### Cu2O : 
	- http://purl.obolibrary.org/obo/CHEBI_134402

#### CuO : 
	- http://purl.obolibrary.org/obo/CHEBI_83159

#### CuS : 
	- http://purl.enanomapper.org/onto/ENM_9000246

#### Dendrimer : 
	- http://purl.bioontology.org/ontology/npo#NPO_735

#### Eudragit RL : 
	- 

#### Hydroxyapatite : 
	- http://purl.bioontology.org/ontology/npo#NPO_1568

#### Iron oxide : 
	- http://purl.bioontology.org/ontology/npo#NPO_729

#### Liposomes : 
	- 

#### MgO : 
	- http://purl.enanomapper.org/onto/ENM_9000252

#### MnO : 
	- http://purl.enanomapper.org/onto/ENM_9000251

#### Mo : 
	- http://purl.enanomapper.org/onto/ENM_9000253

#### PLGA : 
	- http://purl.bioontology.org/ontology/npo#NPO_1559

#### Polystyrene : 
	- http://purl.obolibrary.org/obo/CHEBI_134403

#### Pt : 
	- 

#### QDs : 
	- 

#### SLN : 
	- 

#### Se : 
	- http://purl.enanomapper.org/onto/ENM_9000244

#### SiO2 : 
	- http://purl.obolibrary.org/obo/CHEBI_30563

#### Ti : 
	- http://purl.enanomapper.org/onto/ENM_9000245

#### TiO2 : 
	- http://purl.obolibrary.org/obo/CHEBI_51050

#### ZnO : 
	- http://purl.bioontology.org/ontology/npo#NPO_1542

####  : 
	- 

#### 3-mercaptopropionic acid (COOH) : 
	- 

#### BSA : 
	- 

#### COOH : 
	- 

#### COONa : 
	- 

#### CTAB : 
	- 

#### Citrate : 
	- 

#### Citrate and PVP : 
	- 

#### D-penicillamine (NH2/COOH) : 
	- 

#### Dextran : 
	- 

#### Digestive enzymes : 
	- 

#### Folic acid : 
	- 

#### Gum Arabic : 
	- 

#### Hyaluronic acid : 
	- 

#### Hyaluronic acid  : 
	- 

#### NH2 : 
	- 

#### PEG : 
	- 

#### PEG to the PEI : 
	- 

#### PEG-COOH : 
	- 

#### PEG-NH2 : 
	- 

#### PEG-OCH3 : 
	- 

#### PEI : 
	- 

#### PVA : 
	- 

#### PVP : 
	- 

#### Phosphonate : 
	- 

#### Poloxamer 188 (Pluronic F68) : 
	- 

#### SGF : 
	- 

#### SO3Na : 
	- 

#### Sodium borohydride : 
	- 

#### Star anise : 
	- 

#### Starch : 
	- 

#### TGA : 
	- 

#### TGA-gelatine : 
	- 

#### Tween 80 : 
	- 

#### Zn then cysteamine : 
	- 

#### alginate : 
	- 

#### alumina-simethicone : 
	- 

#### cysteamine : 
	- 

#### cysteamine (NH2) : 
	- 

#### dimercaptosuccinic : 
	- 

#### folic acid with intermediate inorganic (silica) coating : 
	- 

#### folic acid with intermediate organic (PEG) coating : 
	- 

#### l-cysteine l-lysine l-lysine : 
	- 

#### metal catalyst residues : 
	- 

#### silica : 
	- 

#### simethicone then esters on top : 
	- 

#### A431 : 
	- http://www.ebi.ac.uk/efo/EFO_0006268

#### A549 : 
	- http://purl.obolibrary.org/obo/BTO_0000018

#### AGS : 
	- http://www.ebi.ac.uk/efo/EFO_0002109

#### B cells : 
	- 

#### BEAS­2B : 
	- 

#### C17.2 : 
	- 

#### C18–4 : 
	- 

#### CD3+ T cells : 
	- 

#### CD4+T cells : 
	- 

#### CDBgeo : 
	- 

#### CHO22 : 
	- 

#### CHO­K1 : 
	- 

#### COS­1 : 
	- 

#### Caco­2 : 
	- 

#### Calu­3 : 
	- 

#### Clone­9 : 
	- 

#### Colo­205 : 
	- 

#### ECV304 : 
	- 

#### EJ28 : 
	- 

#### Fibroblasts : 
	- 

#### GH3 : 
	- 

#### H4 : 
	- http://www.ebi.ac.uk/efo/EFO_0002184

#### HAEC : 
	- http://purl.obolibrary.org/obo/BTO_0004602

#### HCMEC : 
	- 

#### HDF : 
	- 

#### HEK : 
	- 

#### HEK­293 : 
	- http://purl.obolibrary.org/obo/BTO_0000007

#### HEp­2 : 
	- 

#### HMEC­1 : 
	- 

#### HMM : 
	- 

#### HUVEC : 
	- http://www.ebi.ac.uk/efo/EFO_0002795
	- http://www.ebi.ac.uk/efo/EFO_0002795

#### HaCaT  : 
	- http://www.ebi.ac.uk/efo/EFO_0002056

#### HeLa : 
	- http://www.ebi.ac.uk/efo/EFO_0001185

#### HepG2 : 
	- http://www.ebi.ac.uk/efo/EFO_0001187

#### IMR90 : 
	- http://www.ebi.ac.uk/efo/EFO_0001196

#### IP15 : 
	- 

#### J774 : 
	- http://purl.obolibrary.org/obo/BTO_0002279

#### KEC : 
	- 

#### L929 : 
	- 

#### LLC­PK1 : 
	- 

#### LoVo : 
	- http://www.ebi.ac.uk/efo/EFO_0006639

#### Lymphocytes : 
	- 

#### L­02 : 
	- 

#### MCF7 : 
	- http://www.ebi.ac.uk/efo/EFO_0001203

#### MDA­MB­231 : 
	- 

#### MDBK : 
	- 

#### MDCK : 
	- 

#### MEF : 
	- 

#### MG­63 : 
	- 

#### Macrophages : 
	- 

#### Memory T­cell : 
	- 

#### Monocytes : 
	- 

#### NCIH441 : 
	- 

#### NK cells : 
	- 

#### NR8383 : 
	- 

#### Naive T­cell : 
	- 

#### Neuro­2a : 
	- 

#### PAECs : 
	- 

#### PC12 : 
	- http://www.ebi.ac.uk/efo/EFO_0001225

#### PC3 : 
	- http://www.ebi.ac.uk/efo/EFO_0002074

#### PMA activated THP­1 : 
	- 

#### RAW 264.7 : 
	- 

#### SH­SY5Y : 
	- 

#### SKOV­3 : 
	- 

#### SK­BR­3 : 
	- 

#### SK­Mel­28 : 
	- 

#### SVEC4­10 : 
	- 

#### SW480 : 
	- http://www.ebi.ac.uk/efo/EFO_0002083

#### T cells (all types) : 
	- 

#### T98G : 
	- http://www.ebi.ac.uk/efo/EFO_0002085

#### TD : 
	- 

#### THP­1 : 
	- 

#### UM­UC­3 : 
	- 

#### V14 : 
	- 

#### VERO : 
	- 

#### hMSCs : 
	- 

#### hTERT­BJ1 : 
	- 

#### primary alveolar Epithelial cells : 
	- 

#### primary alveolar Macrophage : 
	- http://purl.obolibrary.org/obo/CL_1001603

#### primary alveolar epithelial cells : 
	- http://www.ebi.ac.uk/efo/EFO_0005728

#### primary tissue Macrophage : 
	- 

#### L : 
	- 

#### P : 
	- 

#### A : 
	- 

#### H : 
	- 

#### Endothelial : 
	- 

#### Endothelial-like : 
	- 

#### Epithelial : 
	- 

#### Epithelial-like : 
	- 

#### Fibroblast : 
	- 

#### Irregular : 
	- 

#### Keratinocyte : 
	- 

#### Lymphoblast : 
	- 

#### Macrophage : 
	- 

#### Mesenchymal : 
	- 

#### Monocyte : 
	- 

#### Monocyte/Macrophage : 
	- 

#### Neuronal : 
	- 

#### Polygonal : 
	- 

#### Spindle : 
	- 

#### Adrenal gland : 
	- 

#### Aorta : 
	- 

#### Areolar tissue : 
	- 

#### Blood : 
	- 

#### Bone : 
	- 

#### Bone Marrow : 
	- 

#### Bone marrow : 
	- 

#### Brain : 
	- 

#### Breast : 
	- 

#### Cervix : 
	- 

#### Colon : 
	- http://purl.obolibrary.org/obo/BTO_0000269

#### Foreskin : 
	- 

#### Heart : 
	- 

#### Kidney : 
	- 

#### Liver : 
	- 

#### Lung : 
	- http://purl.obolibrary.org/obo/BTO_0000763

#### Ovary : 
	- 

#### Prostate : 
	- 

#### Skin : 
	- 

#### Testis : 
	- 

#### Umbilical cord : 
	- 

#### Urinary bladder : 
	- 

#### adrenal gland : 
	- 

#### axillary lymph node : 
	- 

#### embryo : 
	- 

#### kidney : 
	- 

#### liver : 
	- 

#### lung : 
	- 

#### pituitary gland : 
	- 

#### stomach : 
	- 

#### ATP : 
	- http://purl.obolibrary.org/obo/OBI_0002175

#### ATPLite : 
	- 

#### Alamar Blue : 
	- http://purl.enanomapper.org/onto/ENM_8000224

#### Alamar blue : 
	- http://purl.enanomapper.org/onto/ENM_8000224

#### ApoTox­Glo™ Triplex : 
	- 

#### CellTiter­Blue : 
	- 

#### CellTiter­Blue  : 
	- 

#### CellTiter­Glo : 
	- 

#### CytoTox­One™ : 
	- 

#### LDH : 
	- http://purl.enanomapper.org/onto/ENM_8000269

#### Live/Dead : 
	- 

#### Live/Dead  : 
	- 

#### MTS : 
	- http://purl.enanomapper.org/onto/ENM_8000275

#### MTT : 
	- http://purl.bioontology.org/ontology/npo#NPO_1911

#### Modified MTT assay (MTT­formazan ppt dissolving by ethanol) : 
	- 

#### Modified MTT assay (MTT­formazan ppt dissolving by isopropanol/HCl) : 
	- 

#### NR : 
	- 

#### Promegas CTB Assay : 
	- 

#### Resazurin : 
	- http://purl.obolibrary.org/obo/CHEBI_8806

#### Vialight : 
	- 

#### WST­1 : 
	- 

#### WST­8 : 
	- 

#### XTT : 
	- 

#### Annexin V : 
	- 

#### Calcein AM, Ethidium homodimer-1 : 
	- 

#### Calcein AM, propidium iodide : 
	- 

#### Caspase-Glo® 3/7 Reagent : 
	- 

#### Hoechst 33358, propidium iodide : 
	- 

#### LDH activity assay kit : 
	- 

#### Propidium iodide : 
	- 

#### Sytox Red : 
	- 

#### Trypan Blue : 
	- 

#### calcein AM : 
	- 

#### luciferase : 
	- 

#### propidium iodide : 
	- 

#### propidium iodide, Annexin V : 
	- 

#### tetrazolium salt : 
	- 

#### toluylene red : 
	- 

#### trypan blue : 
	- http://purl.obolibrary.org/obo/CHEBI_78897

#### ATP content : 
	- 

#### LDH leakage : 
	- 

#### cell membrane integrity : 
	- 

#### cell metabolic activity : 
	- 

#### lysosomal uptake : 
	- 

#### protease activity : 
	- 

In [42]:
missing = [key for key in matches.keys() if len(matches[key])==0]
str_missing = "- " + "\n- ".join(missing)
display(Markdown("Missing terms: {} out of {}".format(len(missing), len(matches.keys()))))
display(Markdown(f"Missing terms in eNanoMapper:\n {str_missing}"))
with open("missing.txt", "w") as f:
    # One term per line, without a leading blank line
    f.write("\n".join(missing))
with open("matches.json", "w") as json_file:
    json.dump(matches, json_file)

Missing terms: 202 out of 254

Missing terms in eNanoMapper:
- Ag
- Al2O3
- Carbon NP
- CeO2
- Co
- Cr
- Eudragit RL
- Liposomes
- Pt
- QDs
- SLN
- 
- 3-mercaptopropionic acid (COOH)
- BSA
- COOH
- COONa
- CTAB
- Citrate
- Citrate and PVP
- D-penicillamine (NH2/COOH)
- Dextran
- Digestive enzymes
- Folic acid
- Gum Arabic
- Hyaluronic acid
- Hyaluronic acid 
- NH2
- PEG
- PEG to the PEI
- PEG-COOH
- PEG-NH2
- PEG-OCH3
- PEI
- PVA
- PVP
- Phosphonate
- Poloxamer 188 (Pluronic F68)
- SGF
- SO3Na
- Sodium borohydride
- Star anise
- Starch
- TGA
- TGA-gelatine
- Tween 80
- Zn then cysteamine
- alginate
- alumina-simethicone
- cysteamine
- cysteamine (NH2)
- dimercaptosuccinic
- folic acid with intermediate inorganic (silica) coating
- folic acid with intermediate organic (PEG) coating
- l-cysteine l-lysine l-lysine
- metal catalyst residues
- silica
- simethicone then esters on top
- B cells
- BEAS­2B
- C17.2
- C18–4
- CD3+ T cells
- CD4+T cells
- CDBgeo
- CHO22
- CHO­K1
- COS­1
- Caco­2
- Calu­3
- Clone­9
- Colo­205
- ECV304
- EJ28
- Fibroblasts
- GH3
- HCMEC
- HDF
- HEK
- HEp­2
- HMEC­1
- HMM
- IP15
- KEC
- L929
- LLC­PK1
- Lymphocytes
- L­02
- MDA­MB­231
- MDBK
- MDCK
- MEF
- MG­63
- Macrophages
- Memory T­cell
- Monocytes
- NCIH441
- NK cells
- NR8383
- Naive T­cell
- Neuro­2a
- PAECs
- PMA activated THP­1
- RAW 264.7
- SH­SY5Y
- SKOV­3
- SK­BR­3
- SK­Mel­28
- SVEC4­10
- T cells (all types)
- TD
- THP­1
- UM­UC­3
- V14
- VERO
- hMSCs
- hTERT­BJ1
- primary alveolar Epithelial cells
- primary tissue Macrophage
- L
- P
- A
- H
- Endothelial
- Endothelial-like
- Epithelial
- Epithelial-like
- Fibroblast
- Irregular
- Keratinocyte
- Lymphoblast
- Macrophage
- Mesenchymal
- Monocyte
- Monocyte/Macrophage
- Neuronal
- Polygonal
- Spindle
- Adrenal gland
- Aorta
- Areolar tissue
- Blood
- Bone
- Bone Marrow
- Bone marrow
- Brain
- Breast
- Cervix
- Foreskin
- Heart
- Kidney
- Liver
- Ovary
- Prostate
- Skin
- Testis
- Umbilical cord
- Urinary bladder
- adrenal gland
- axillary lymph node
- embryo
- kidney
- liver
- lung
- pituitary gland
- stomach
- ATPLite
- ApoTox­Glo™ Triplex
- CellTiter­Blue
- CellTiter­Blue 
- CellTiter­Glo
- CytoTox­One™
- Live/Dead
- Live/Dead 
- Modified MTT assay (MTT­formazan ppt dissolving by ethanol)
- Modified MTT assay (MTT­formazan ppt dissolving by isopropanol/HCl)
- NR
- Promegas CTB Assay
- Vialight
- WST­1
- WST­8
- XTT
- Annexin V
- Calcein AM, Ethidium homodimer-1
- Calcein AM, propidium iodide
- Caspase-Glo® 3/7 Reagent
- Hoechst 33358, propidium iodide
- LDH activity assay kit
- Propidium iodide
- Sytox Red
- Trypan Blue
- calcein AM
- luciferase
- propidium iodide
- propidium iodide, Annexin V
- tetrazolium salt
- toluylene red
- ATP content
- LDH leakage
- cell membrane integrity
- cell metabolic activity
- lysosomal uptake
- protease activity
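
The count reported by the previous cell pools the terms of every column, including the empty string and one-letter codes such as `L` or `H`. A per-column breakdown may make it easier to see where ontology coverage is weakest. A minimal sketch with toy stand-ins for `allign` and `matches` (the real dictionaries are built in the cells above). `coverage_by_column` is a hypothetical helper:

```python
def coverage_by_column(terms_by_column, assigned):
    """Return {column: (n_assigned, n_terms)}, ignoring empty terms."""
    summary = {}
    for column, terms in terms_by_column.items():
        terms = [term for term in terms if str(term).strip()]
        n_assigned = sum(1 for term in terms if assigned.get(term))
        summary[column] = (n_assigned, len(terms))
    return summary

# Toy stand-ins that reuse a few terms and IRIs from the listing above
toy_terms = {"Nanoparticle": ["Ag", "Au", "Bi"], "coat": ["", "PEG", "PEI"]}
toy_assigned = {
    "Ag": [],
    "Au": ["http://purl.bioontology.org/ontology/npo#NPO_401"],
    "Bi": ["http://purl.enanomapper.org/onto/ENM_9000247"],
    "PEG": [],
    "PEI": [],
}
summary = coverage_by_column(toy_terms, toy_assigned)
for column, (done, total) in summary.items():
    print("{}: {}/{} terms assigned".format(column, done, total))
```

With the notebook's own objects, the call would be `coverage_by_column(allign, matches)`.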

Next: save a JSON file with the following shape:

In [50]:
nanomaterial_dict = {"nanomaterial":
                     {"iri": "", 
                      "core":
                          {"label":"", 
                           "smiles":""}, 
                      "coating":
                          {"label":"", 
                           "smiles":"", 
                           "iri":""}}}

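A minimal sketch of how one record of the nanomaterial shape could be filled from a core label, a coating label, and the assigned IRIs. Several things here are assumptions: the top-level `iri` is taken from the core term, only the first assigned IRI is used, and the SMILES strings stay empty because the data set does not provide them. `build_nanomaterial` is a hypothetical helper:

```python
import json

def build_nanomaterial(core_label, coat_label, assigned):
    """Fill the nanomaterial record for one core/coating pair."""
    def first_iri(label):
        # Assumption: the first assigned IRI is the preferred one
        iris = assigned.get(label, [])
        return iris[0] if iris else ""
    return {"nanomaterial": {
        "iri": first_iri(core_label),
        "core": {"label": core_label, "smiles": ""},
        "coating": {"label": coat_label, "smiles": "", "iri": first_iri(coat_label)},
    }}

# Toy stand-in for the `matches` dictionary
toy_assigned = {"Au": ["http://purl.bioontology.org/ontology/npo#NPO_401"], "PEG": []}
record = build_nanomaterial("Au", "PEG", toy_assigned)
print(json.dumps(record, indent=2))
```
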
In [54]:
endpoint_dict = {}

In [56]:
bioassay_dict = {}

In [57]:
material_dict = {}

In [58]:
metadata_dict = {}
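
Goal V in the summary is a YARRRML file for the RDFication. A minimal sketch of how such a file could be generated as text once predicates have been chosen. Everything specific here is an assumption: the data is assumed to be exported to CSV with column names free of spaces and parentheses, `row_id` is a hypothetical identifier column, and the `ex:` namespace and its predicates are placeholders rather than eNanoMapper terms. `yarrrml_mapping` is a hypothetical helper:

```python
def yarrrml_mapping(source_csv, subject_template, columns):
    """Build a YARRRML document as text.

    columns maps a CSV column name to a (predicate, datatype) pair.
    """
    lines = [
        "prefixes:",
        '  ex: "http://example.org/"',
        '  xsd: "http://www.w3.org/2001/XMLSchema#"',
        "mappings:",
        "  measurement:",
        "    sources:",
        "      - ['{}~csv']".format(source_csv),
        "    s: {}".format(subject_template),
        "    po:",
    ]
    for column, (predicate, datatype) in columns.items():
        # One predicate-object pair per column, with a typed literal as object
        lines.append("      - [{}, $({}), {}]".format(predicate, column, datatype))
    return "\n".join(lines) + "\n"

# Placeholder column names and predicates, for illustration only
columns = {
    "diameter_nm": ("ex:diameter", "xsd:double"),
    "cell_viability": ("ex:cellViability", "xsd:double"),
}
yarrrml = yarrrml_mapping("cytotoxicity.csv", "ex:measurement/$(row_id)", columns)
print(yarrrml)
```

Building the text by hand keeps the sketch free of extra dependencies; a YAML library could be used instead if the mapping grows.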