# RDFy curated data for the use case: Titanium dioxide-silica nanocomposite

This notebook uses the Google API client library to retrieve the [spreadsheet](https://docs.google.com/spreadsheets/d/13dqwura-jSnGMVBSO7pVXfRgbPegGC7QNoSLNAFls3A/) containing the curated data for the TiO2/SiO2 SbD4Nano case (CreativeNano).

## Imports and configuration

In [1]:
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os.path
import pickle
import pandas as pd
import yaml


Each sheet of the spreadsheet is retrieved using the range defined for it in the `config.yaml` file.

In [2]:
# Read the OAuth scopes, the spreadsheet ID and the cell range of each sheet.
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)
    SCOPES = config['SCOPES']
    SPREADSHEET_ID = config['SPREADSHEET']
    RANGE_CAUSAL_ASSERTIONS = config['CAUSAL_ASSERTIONS']
    RANGE_MATERIALS = config['MATERIALS']
    RANGE_ASSAYS = config['ASSAYS']
    RANGE_NODE_BREAKDOWN = config['NODE_BREAKDOWN']
    RANGE_MEASUREMENT_GROUPS = config['MEASUREMENT_GROUPS']
    RANGE_QUOTES = config['QUOTES']
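
A missing key in `config.yaml` only surfaces as a `KeyError` in the cell above. As a minimal sketch, the check below lists every absent key before any request is made. The helper `missing_config_keys` and the values in `example_config` are hypothetical and serve only as illustration; the key names are the ones read above.

```python
# Keys that the configuration cell reads from config.yaml.
REQUIRED_KEYS = [
    'SCOPES', 'SPREADSHEET', 'CAUSAL_ASSERTIONS', 'MATERIALS',
    'ASSAYS', 'NODE_BREAKDOWN', 'MEASUREMENT_GROUPS', 'QUOTES',
]


def missing_config_keys(config):
    """Return the required keys that are absent from the parsed config (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Hypothetical, incomplete config content; the ID and range are placeholders.
example_config = {
    'SCOPES': ['https://www.googleapis.com/auth/spreadsheets.readonly'],
    'SPREADSHEET': 'hypothetical-spreadsheet-id',
    'QUOTES': 'quotes!A1:F',
}

missing = missing_config_keys(example_config)
print(missing)
```

With the real notebook, the same function could be called as `missing_config_keys(config)` right after loading the file.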

## Retrieve data
The following function is called for every range defined in [`config.yaml`](config.yaml) and returns each of the following tables as a pandas data frame:
- `causal_assertions`: the _nodes_ of causal relationships supported by the `quotes` and all other supporting data
- `quotes`: the quotes stating causal relationships, together with their source and the `cito:citesAsReference` statement, if applicable
- `node_breakdown`: TBD
- `assays`: characterization of the assays performed on `materials` to observe the results that led to the formulation of `causal_assertions`
- `measurement_groups`: characterization of the biological systems tested in the `assays`
- `materials`: characterization of the materials used in the `assays` on the `measurement_groups`

TBD: schema of the relations between tables

In [3]:
def get_google_sheet(sheet, spreadsheet_id=SPREADSHEET_ID):
    """Return the range configured for `sheet` as a pandas DataFrame, or None if it is empty."""
    creds = None
    # Look up the module-level RANGE_<sheet> constant loaded from config.yaml.
    range_name = globals()['RANGE_{}'.format(sheet)]
    # Reuse the user credentials cached by a previous run, if any.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # Refresh expired credentials or run the OAuth flow, then cache the result.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server()
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('sheets', 'v4', credentials=creds)
    gsheet = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range=range_name).execute()
    values = gsheet.get('values', [])
    if not values:
        print('No data found.')
        return None
    # The first row supplies the column names; it is also kept as data row 0.
    return pd.DataFrame(values, columns=values[0])
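
The function above passes the header row both as column names and as the first data row, which is why column names such as `quote_id` show up as values in the `describe()` outputs further down. A minimal sketch of one way to separate the two is given here. The helper `split_header_and_rows` is hypothetical; it assumes that rows may be shorter than the header (the Sheets API is expected to omit trailing empty cells) and that any cell beyond the header width can be discarded.

```python
def split_header_and_rows(values):
    """Split a Sheets-style list of rows into (header, padded data rows)."""
    if not values:
        return [], []
    header = values[0]
    width = len(header)
    rows = []
    for row in values[1:]:
        # Pad short rows with None so that every row matches the header width.
        padded = list(row) + [None] * (width - len(row))
        # Assumption: cells beyond the header width carry no data and are dropped.
        rows.append(padded[:width])
    return header, rows


# Values shaped like the measurement groups sheet shown later in this notebook.
values = [
    ['id', 'source', 'type', 'age'],
    ['G1', '', 'A549 cells'],
    ['G3', '', 'C57BL/6', '5-7 weeks'],
]
header, rows = split_header_and_rows(values)
print(header)
print(rows)
```

The result could then be passed on as `pd.DataFrame(rows, columns=header)`. Note that doing so would shift the row indices and counts relative to the outputs recorded in this notebook.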


In [4]:
quotes = get_google_sheet('QUOTES')
assertions = get_google_sheet('CAUSAL_ASSERTIONS')
materials = get_google_sheet('MATERIALS')
assays = get_google_sheet('ASSAYS')
measurement_groups = get_google_sheet('MEASUREMENT_GROUPS')
node_breakdown = get_google_sheet('NODE_BREAKDOWN')

### Quotes
TBD

In [5]:
quotes.describe()

Unnamed: 0,quote_id,quote,doi,cito:containsAssertionFrom,review,comment
count,157,156.0,156,156.0,156,21
unique,157,131.0,47,45.0,4,18
top,quote_id,,https://doi.org/10.3390/nano9071041,,no,assay-syn
freq,1,26.0,20,107.0,120,3


In [16]:
quotes.sample(5)

Unnamed: 0,quote_id,quote,doi,cito:containsAssertionFrom,review,comment
154,Q154,Data were also confirmed by the Annexin V/PI t...,https://doi.org/10.3390/nano9071041,,no,
43,Q43,"Glass, silicate, zeolite and ceramic supports ...",https://doi.org/10.1016/j.jhazmat.2007.12.047,,yes,
64,Q64,"In relation to these release studies, the rese...",https://doi.org/10.1088/1757-899X/1117/1/012029,"https://doi.org/10.1016/j.jhazmat.2007.12.047,49",no,
38,Q38,Acute oral toxicity studies have demonstrated ...,https://doi.org/10.1016/j.jhazmat.2007.12.047,,yes,
130,Q130,The only significant difference between the tw...,https://doi.org/10.1186/1743-8977-9-4,,no,


### Assertions
TBD

In [6]:
assertions.describe()

Unnamed: 0,Ref_quote,Reference,REVIEW?,quote,s,o,s_name,p,o_name,material,assay,measurement group,material_id,assay_id
count,51.0,51,51,50.0,50.0,50.0,50.0,50.0,50.0,50.0,50,50,22,18
unique,35.0,9,3,35.0,10.0,18.0,10.0,4.0,18.0,8.0,2,2,10,16
top,,https://doi.org/10.3390/nano9071041,no,,,,,,,,#REF!,#REF!,M5,A6
freq,6.0,22,44,5.0,25.0,25.0,25.0,25.0,25.0,30.0,49,49,9,2
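
In the summary above, `#REF!` is the most frequent value in the `assay` and `measurement group` columns, and several other columns have the empty string as their top value. The sketch below shows one possible way to normalize such cells, assuming that both should be treated as missing data. The helper `clean_cell` is hypothetical and not part of the original workflow.

```python
def clean_cell(value):
    """Map empty strings and '#REF!' spreadsheet errors to None (hypothetical helper)."""
    if value is None:
        return None
    text = str(value).strip()
    # Assumption: a broken spreadsheet reference carries no usable information.
    if text == '' or text.startswith('#REF!'):
        return None
    return text


cleaned = [clean_cell(v) for v in ['#REF!', '', ' M5 ', 'A6', None]]
print(cleaned)
```

On a data frame this could be applied column by column, for example `assertions['assay'].map(clean_cell)`.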


In [17]:
assertions.sample(5)

Unnamed: 0,Ref_quote,Reference,REVIEW?,quote,s,o,s_name,p,o_name,material,assay,measurement group,material_id,assay_id
15,Q142,https://doi.org/10.1093/toxsci/kfj197,no,Nano-TiO2 anatase/rutile particles produced a ...,N74,N78,anatase phase TiO2,negatively_correlates,mitochondrial activity,placeholder:anatase-tio2,#REF!,#REF!,M5,A17
25,Q74,https://doi.org/10.3390/nano9071041,no,The physicochemical properties of NPs are resp...,,,,,,,#REF!,#REF!,,
17,Q13,https://doi.org/10.1093/toxsci/kfj197,no,"As Table 1 illustrates, those samples that wer...",N81,N20,photocatalytic potential,positively_correlates,toxicity,,#REF!,#REF!,,A6:A21
43,Q156,https://doi.org/10.3390/nano9071041,no,The increase of hydrodynamic diameter as a fun...,N68,N92,TiO2 to SiO2 ratio (placeholder),negatively_correlates,steric hindrance,3:1 nanocomposite of TiO2 in a Silica coat,#REF!,#REF!,"M9,M10",
50,,,,,,,,,,,,,,
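
The sample above contains identifier cells that hold more than one value, such as `M9,M10` in `material_id` and `A6:A21` in `assay_id`. The following sketch splits such cells into individual identifiers. It rests on an assumption that the notebook does not confirm: that a colon denotes an inclusive range over the numeric part of identifiers sharing the same prefix. The helper `expand_ids` is hypothetical.

```python
import re

# Matches range cells such as 'A6:A21' (prefix, start number, prefix, end number).
RANGE_PATTERN = re.compile(r'([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)')


def expand_ids(cell):
    """Expand a cell such as 'M9,M10' or 'A6:A21' into a list of identifiers."""
    if not cell:
        return []
    ids = []
    for part in cell.split(','):
        part = part.strip()
        if not part:
            continue
        match = RANGE_PATTERN.fullmatch(part)
        if match and match.group(1) == match.group(3):
            start, end = int(match.group(2)), int(match.group(4))
            if start <= end:
                # Assumption: 'A6:A21' stands for A6, A7, ..., A21 inclusive.
                ids.extend('{}{}'.format(match.group(1), n) for n in range(start, end + 1))
                continue
        # Anything that is not a well-formed range is kept unchanged.
        ids.append(part)
    return ids


print(expand_ids('M9,M10'))
print(expand_ids('A6:A21'))
```

Keeping unrecognized parts unchanged means that no identifier is silently lost if the range assumption turns out to be wrong for some cells.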


### Materials
TBD

In [7]:
materials.describe()

Unnamed: 0,id,label,a,source,synthesis,q_synthesis,quote_synthesis,size,size_units,size_type,...,shape,surface_area,surface_area_units,coat,matrix,doping,ph,zeta_potential,zeta_potential_units,polidispersity_index
count,13,13,13,13,13.0,13.0,13.0,11.0,11,11.0,...,7.0,7.0,7,5.0,5.0,5.0,4.0,4,4,1
unique,11,9,10,2,5.0,8.0,8.0,10.0,3,6.0,...,3.0,5.0,3,2.0,2.0,3.0,2.0,4,2,1
top,M1,XX%TiO2NP@SiO2,placeholder:titanium-silica-nanocomposite,#REF!,,,,,nm,,...,,,m2/g,,,,,zeta_potential,mV,polidispersity_index
freq,2,2,2,12,8.0,4.0,4.0,2.0,8,3.0,...,4.0,3.0,3,4.0,4.0,3.0,3.0,1,3,1


In [18]:
materials.sample(5)

Unnamed: 0,id,label,a,source,synthesis,q_synthesis,quote_synthesis,size,size_units,size_type,...,shape,surface_area,surface_area_units,coat,matrix,doping,ph,zeta_potential,zeta_potential_units,polidispersity_index
12,M10,TiO2:SiO2 3:1,3:1 nanocomposite of TiO2 in a Silica coat,#REF!,,Q155,The following commercial materials were used t...,147.8±2.3,nm,hydrodynamic diameter (Z-average),...,,,,,,,,46.1±0.3,mV,
9,M7,rutile TiO2,UV-Titan L181,#REF!,,Q143,"In brief, the UV-Titan L181 (NanoTiO2) was a r...",20.6,nm,average crystallite size,...,,107.7,m2/g,,,"Si, Al, Zr, polyalcohol",,,,
4,M3,,,#REF!,,,,,,,...,,,,,,,,,,
7,M5,anatase TiO2,placeholder:anatase-tio2,#REF!,,,,10.1 ± 1.0,nm,average diameter,...,sphere,153.0,m2/g,,,,,,,
2,M1,XX%TiO2NP@SiO2,placeholder:titanium-silica-nanocomposite,#REF!,A1,Q138,TiO2 suspension was then added to 3 g of activ...,,,,...,,,,,,,,,,


### Assays
TBD

In [8]:
assays.describe()

Unnamed: 0,assay_id,source,guidance/sop/protocol,material,Unnamed: 5,type,short_description,e_id,endpoint_name,value,...,time,time_units,measurement_group,measurement_group_name,concentration,concentration_units,approx_from_figure,exposure_route,exposure_quantity,exposure_quantity_units
count,28,28,28,28,28,28,28.0,28,28,28.0,...,28,28,28,28,28.0,28.0,28,28,3,3
unique,24,4,6,7,5,10,8.0,16,10,12.0,...,4,4,5,5,5.0,4.0,3,3,2,2
top,A21,#REF!,none,M5,anatase TiO2,Human IL-8 Enzyme Immunometric Assay,,E13,spectroscopy IL-8,,...,48,h,G1,A549 cells,,,n,na,486,μg
freq,2,20,9,11,11,8,17.0,8,8,13.0,...,18,18,9,9,17.0,17.0,19,25,2,2


In [19]:
assays.sample(5)

Unnamed: 0,assay_id,source,guidance/sop/protocol,material,Unnamed: 5,type,short_description,e_id,endpoint_name,value,...,time,time_units,measurement_group,measurement_group_name,concentration,concentration_units,approx_from_figure,exposure_route,exposure_quantity,exposure_quantity_units
20,A19,#REF!,Human IL-8 Enzyme Immunometric Assay Kit (Assa...,M5,anatase TiO2,Human IL-8 Enzyme Immunometric Assay,,E13,spectroscopy IL-8,175,...,48.0,h,G2,HDF cells,3000.0,μg/ml,y,na,,
15,A15,#REF!,,M6,rutile TiO2,LDH release assay,,E11,,,...,48.0,h,G2,HDF cells,,,n,na,,
14,A14,#REF!,,M5,anatase TiO2,LDH release assay,,E10,,,...,48.0,h,G2,HDF cells,,,n,na,,
4,A4,#REF!,none,M3,,,,,,,...,,,,,,,n,na,,
26,A22,,total BAL cells,M7,rutile TiO2,,BAL reading after single intratracheal exposit...,E14,BAL,173600± 28600,...,28.0,day,G3,C57BL/6,10.0,% weight,n,single intratracheal instillation,486.0,μg


### Measurement groups
TBD

In [9]:
measurement_groups.describe()

Unnamed: 0,id,source,type,age
count,4,4.0,4,2
unique,4,2.0,4,2
top,id,,type,age
freq,1,3.0,1,1


In [22]:
measurement_groups

Unnamed: 0,id,source,type,age
0,id,source,type,age
1,G1,,A549 cells,
2,G2,,HDF cells,
3,G3,,C57BL/6,5-7 weeks


### Node breakdown

In [23]:
node_breakdown.describe()

Unnamed: 0,node,node_label,node_breakdown comment,IRI,Gracious,AOPWiki,exposure_step
count,94,91,47.0,42,10,2,1
unique,94,91,7.0,39,10,2,1
top,node,node_label,,http://purl.bioontology.org/ontology/npo#NPO_274,Gracious,AOPWiki,exposure_step
freq,1,1,41.0,2,1,1,1


In [24]:
node_breakdown.sample(5)

Unnamed: 0,node,node_label,node_breakdown comment,IRI,Gracious,AOPWiki,exposure_step
85,N85,BAL cell number,,,,,
67,N67,,,,,,
65,N65,organic matter (in medium),,,,,
4,N4,polydispersity,,http://purl.bioontology.org/ontology/npo#NPO_274,http://www.bigcat.unimaas.nl/sbd4nano/gracious...,,
42,N42,Phagocytosis,,,,,
