# SbD4Nano first survey

RDFy (convert to RDF) the spreadsheet curated at the beginning of the process, which covers the following nanomaterials:
- Graphene-based NMs
- SiO2
- S-doped TiO2-SiO2
- ZnO
- Ag
- Fe3O4
- Al


## Background
TODO: add more context (authors, timeline, purpose).

## What's in this notebook

This notebook:
1.  Uses the Google API client library to retrieve the [spreadsheet](https://docs.google.com/spreadsheets/d/1VmZJErRoQi099PWj0m8oeMMvta5cjroxLNA3HrcLXCk) with the curated data
2.  Describes and characterizes the curated data set. This step includes programmatically retrieving the license type of each publication used
3.  Guides through the process of RDFying the data using `rdflib`
4.  Performs some QC SPARQL queries on the generated graphs

## Status
This is a work in progress. 

## Imports and configuration

In [3]:
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os.path
import pickle
import pandas as pd
import yaml
from rdflib import Graph, URIRef, Literal, BNode, Namespace
from rdflib.namespace import DC, RDFS, FOAF, DCTERMS, VOID, RDF, XSD, OWL
import requests
import numpy as np

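Each sheet (tab) of the spreadsheet is retrieved using the range defined for it in `config.yaml`. The same file also holds the OAuth scopes, the spreadsheet ID and the contact email sent to the Unpaywall API.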

In [4]:
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f.read())
    SCOPES = config['SCOPES']
    SPREADSHEET_ID = config['SPREADSHEET']
    RANGE_GO = config['GO']
    RANGE_SIO = config['SiO2']
    RANGE_SDOPED = config['SdopedTiO2SiO2']
    RANGE_ZNO = config['ZnO']
    RANGE_AG = config['Ag']
    RANGE_FEO = config['Fe3O4']
    RANGE_AL = config['Al']
    EMAIL = config['EMAIL']
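The cell above fails with a bare `KeyError` on the first key missing from `config.yaml`. A minimal sketch of an up-front check, assuming the file is a flat mapping with exactly the keys read above (their values are site-specific and not shown here); `missing_config_keys` is a hypothetical helper, not part of any library:

```python
# Keys read by the configuration cell; values are site-specific and not checked
REQUIRED_KEYS = ("SCOPES", "SPREADSHEET", "GO", "SiO2", "SdopedTiO2SiO2",
                 "ZnO", "Ag", "Fe3O4", "Al", "EMAIL")


def missing_config_keys(config):
    """Return the required keys absent from the loaded config mapping, in order."""
    return [key for key in REQUIRED_KEYS if key not in config]
```

Calling it on the loaded `config` dict before unpacking reports everything that is missing at once instead of stopping at the first absent key.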

In [5]:
def get_spreadsheet_names_in_worksheet(worksheet_id, credentials_file='token.pickle'):
    """Retrieves and prints the titles of all sheets (tabs) in a spreadsheet using Google Sheets API v4.

    Args:
        worksheet_id (str): The ID of the spreadsheet to examine.
        credentials_file (str, optional): The path to the pickled credentials file. Defaults to 'token.pickle'.
            The file must already exist; it is written by the OAuth flow in `get_google_sheet`.

    Returns:
        list: The sheet titles, or an empty list if there are no sheets or the API call fails
        with an HttpError (the error is printed, not raised).
    """

    try:
        # Load credentials from the pickle file
        with open(credentials_file, 'rb') as token:
            creds = pickle.load(token)

        # Build the Sheets service
        service = build('sheets', 'v4', credentials=creds)

        # Fetch the spreadsheet metadata, which lists its sheets
        sheets = service.spreadsheets().get(spreadsheetId=worksheet_id).execute()
        sheets = sheets.get('sheets', [])  # Handle edge case where there are no sheets

        # Extract and print the sheet titles
        spreadsheet_names = [sheet['properties']['title'] for sheet in sheets]
        print("Spreadsheet names in the worksheet:")
        for name in spreadsheet_names:
            print(f"- {name}")

        return spreadsheet_names

    except HttpError as error:
        print(f"Error retrieving spreadsheet names: {error}")
        return []  # Indicate failure by returning an empty list

worksheet_id = SPREADSHEET_ID
spreadsheet_names = get_spreadsheet_names_in_worksheet(worksheet_id)


Spreadsheet names in the worksheet:
- silica
- Copia de silica
- Graphene based NMs
- sdoped
- ZnO
- Ag
- Fe3O4
- Al
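Because the spreadsheet contains near-duplicate tabs such as `silica` and `Copia de silica` (Spanish for "Copy of silica"), it is worth checking that every configured range points at an existing tab. A minimal sketch, assuming the `config.yaml` values are A1-notation ranges with a sheet prefix (e.g. `'Graphene based NMs'!A1:L40`, a made-up example); `sheet_name_from_range` and `unknown_sheets` are hypothetical helpers:

```python
def sheet_name_from_range(a1_range):
    """Return the sheet title of an A1-notation range, or None if it has no sheet prefix."""
    if "!" not in a1_range:
        return None
    name = a1_range.rsplit("!", 1)[0]
    # Titles with spaces or symbols are single-quoted; a literal ' is written as ''
    if len(name) >= 2 and name[0] == name[-1] == "'":
        name = name[1:-1].replace("''", "'")
    return name


def unknown_sheets(ranges, sheet_names):
    """Return the config keys whose range names a sheet title missing from sheet_names."""
    known = set(sheet_names)
    unknown = []
    for key, a1_range in sorted(ranges.items()):
        name = sheet_name_from_range(a1_range)
        if name is not None and name not in known:
            unknown.append(key)
    return unknown
```

For example, `unknown_sheets({'GO': RANGE_GO, 'Al': RANGE_AL}, spreadsheet_names)` returns `[]` when both ranges target tabs listed above.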
- Cytotoxicity studies of NMs


## Retrieve data

In [6]:
def get_google_sheet(sheet, spreadsheet_id=SPREADSHEET_ID):
    """Return one sheet as a DataFrame, or None if its range is empty.

    `sheet` is the suffix of a RANGE_* global defined above, e.g. 'GO' -> RANGE_GO.
    """
    creds = None
    range_name = globals()['RANGE_{}'.format(sheet)]
    # Reuse cached OAuth credentials if present
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            # First run: open a browser for the OAuth consent flow
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Cache the new or refreshed credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('sheets', 'v4', credentials=creds)
    gsheet = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range=range_name).execute()
    values = gsheet.get('values', [])
    if not values:
        print('No data found for {}.'.format(range_name))
        return None
    # The first row holds the column headers; drop it from the data rows
    return pd.DataFrame(values, columns=values[0]).drop(index=0)
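The Sheets API omits trailing empty cells, so rows can be shorter than the header; the function above relies on pandas to fill the gaps with `None`, which later shows up as `'nan'`. A minimal sketch of padding the rows explicitly with `''` instead (`pad_rows` is a hypothetical helper, not part of the Google client library):

```python
def pad_rows(values):
    """Split a Sheets `values` list into (header, rows), padding short rows with ''.

    Rows longer than the header are left unchanged.
    """
    if not values:
        return [], []
    header, *rows = values
    width = len(header)
    return header, [list(row) + [""] * (width - len(row)) for row in rows]
```

`header, rows = pad_rows(values)` followed by `pd.DataFrame(rows, columns=header)` gives the same columns, but with a 0-based index rather than the 1-based one produced by `.drop(index=0)`.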


In [7]:
go = get_google_sheet('GO')
sio = get_google_sheet('SIO')
sdoped = get_google_sheet('SDOPED')
zno = get_google_sheet('ZNO')
ag = get_google_sheet('AG')
feo = get_google_sheet('FEO')
al = get_google_sheet('AL')

In [8]:
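# Stack all sheets into one DataFrame (row indices restart within each sheet)
df = pd.concat([go, sio, sdoped, zno, ag, feo, al])
# Turn None (cells missing from short rows) into NaN, then cast everything to str,
# so empty cells appear as either '' or 'nan'
df = df.fillna(value=np.nan)
df = df.astype(str)
df.describe()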

| | DOI | Manuscript | Nanomaterial | Model-cell | Exposure Time | Dose | Index | Results | Notes | Quote | Unnamed: 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| unique | 2 | 87 | 85 | 81 | 43 | 86 | 60 | 78 | 20 | 2 | 2 |
| top | | Hussain, S.M., Hess, K.L., Gearhart, J.M., Gei... | SWCNT | A549 cells- human lung cancer cell line | 24 hours | 10, 25, 50, and 100 μg/mL | MTT | Dose (↑)= Cell viability (↓) | | | |
| freq | 76 | 4 | 4 | 4 | 25 | 4 | 19 | 14 | 80 | 98 | 81 |


In [9]:
df.sample(6)

| | DOI | Manuscript | Nanomaterial | Model-cell | Exposure Time | Dose | Index | Results | Notes | Quote | Unnamed: 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | Monika Remzova, Radek Zouzelka, Tana Brzicova,... | TiO2 | WST-1 (cell metabolism) and LDH (cell integrit... | | | | | | | |
| 32 | | Uboldi, C., Giudetti, G., Broggi, F., Gillilan... | Pr-SiNPs (NM-200, 10–25nm) and Py-SiNPs (NM-20... | Immortalized balb/3T3-Mouse fibroblast | 72 hours | 1–100 µg/ml | Cell Viability: MTT, Colony formation assay, C... | 1. MTT: Reduction with 15 nm NPs ... | | | |
| 11 | | Tedja, R., Marquis, C., Lim, M. and Amal, R., ... | TiO2 (20 nm, P25) | A549 and H1299- Human lung epithelial cell lines | 24 hours | 0.3–1000 μg/mL | Trypan Blue, Alamar Blue and MTS | 1.H1299 was more susceptible. ... | | | |
| 2 | | Hussain, S.M., Hess, K.L., Gearhart, J.M., Gei... | Ag-15, 100 nm | BRL 3A- rat liver cells | 24 hours | 2.5-50 μg/ml | MTT, LDH and ROS | Dose (↑)= Cell viability (↓), ROS (↑), LDH(↑) | 100 nm was slightly more toxic | | |
| 4 | | Alshatwi, A.A., Vaiyapuri Subbarayan, P., Rame... | Al (160 nm) | HMSC-human bone marrow mesenchymal stem cells ... | 12 hours | 25, 50, 100, 200, and 400 µg/mL | MTT | Dose (↑)= Cell viability (↓), from 200µg/mL | | | |
| 11 | | Tarantini, A., Lanceleur, R., Mourot, A., Lava... | SiNPs (15nm and 50nm) | Caco2- Human colon ephitelial | 24, 48 and 72 hours | 0.03–156.3 µg/cm2 | XTT assay (cell viability), ELISA and... | 1. Dose-dependent reduction with 15 nm NPs (c... | | | |


## Inspect and parse spreadsheet

The unique values of each column are printed below to decide on a parsing strategy for that column.
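A plain `set(...)` keeps values that differ only in invisible characters (the `Exposure Time` printout lists `72 hours` and `24 hours` twice, most likely because of trailing or non-breaking spaces). A minimal sketch of a sorted, whitespace-normalising variant; `unique_values` and `bullet_list` are hypothetical helpers:

```python
def unique_values(values):
    """Sorted unique values after collapsing runs of whitespace (incl. non-breaking spaces)."""
    # str.split() with no argument splits on any Unicode whitespace, including U+00A0
    return sorted({" ".join(str(value).split()) for value in values})


def bullet_list(values):
    """Format the unique values as a '- ' bullet list, one per line."""
    return "\n".join("- " + value for value in unique_values(values))
```

For example, `print(bullet_list(df['Exposure Time']))` prints each distinct exposure once, in sorted order.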

### `Nanomaterial`

In [10]:
print("-","\n- ".join(list(set(df['Nanomaterial']))))

- TiO2 (25 nm?)
- TiO2 ( 12 to 140 nm); spherical or elongated ; TiO2–P25, TiO2-Sigma and TiO2-Sigma-R (elongated)
- Al (160 nm)
- SiNPs (30 nm)
- SiNPs, Aerosil® 200-Aerosil® Ox50 (12 and 40 nm) and S-SiNPs (200 nm)
- Al (8-12 nm )
- TiO2 (50nm)
- Multi-walled carbon nanotubes (CNT), with/out Fe
- GO (2.496 nm)
- TiO2 (21nm)
- Ag-20 and 40 nm
- SWCNT and quartz in serum containing (5%) and serum-free (0%) medium.
- SiC nanoparticles (17 nm)
- TiO2 (12 to 140 nm); anatase and rutile
- M-SiNPs (25 and 100 nm)
- SiO2 (–methyl)
- SiNP (30nm)
- Ag-15, 100 nm
- Fe3O4 (25 nm)
- GO and carboxyl graphene (CXYG)
- SWCNT
- Carbon Black (CB) and Graphene nanoplate (GP)
- SiNP (62nm)
- TiO2; anatase (100 ± 10, 25 ± 5, and 5 ± 1 nm) and rutile (100 ± 10 nm)
- Ag- 30 and 50 nm 
- TiO2 (20 nm, P25)
- Al (40–47 nm)
- GO (untreated)                      -sonicated GO                         -Base washed GO                   -s & bw GO                                   
- ZnO (50–70 nm)
- TiO2 (5-6nm)
-

### `Model-cell`

In [11]:
print("-","\n- ".join(list(set(df['Model-cell']))))

- A549 and BEAS-2A cells
- Caco2- Human colon ephitelial
- A549, U-87 MG, HepG2 and HL-60
- H596, H446, and Calu-1-lung epithelial cell lines
- Kupffer cells-Rat liver macrophage
,BRF- Rat liver cells
- Primary microglial cells-Rat brain macrophage like cells
- THP-1
- Neonatal HEKs
- EA.hy926-Human endothelial
- A431-human epidermal cells
- HBMVECs-human brain microvascular endothelial cells
- A549 -human alveolar carcinoma epithelial cell line
- A549-Human type II alveolar epithelial, A431-Human skin epithelial
- A549 & ISO HAS-1
- Primary Microglial cells-Rat macrophage like cells
- HaCaT-human epidermal keratinocytes
- MSTO-211H-Mesothelioma cell line
- A549-human lung carcinoma cell line
- HepG2-human liver epithelial 
- human hepatoma cell line, Hep G2
- V79- Hamster lung fibroblast,                     A549-Human type II alveolar epithelial
- U373MG human glioblastoma cell line
- BEAS-2B-human bronchial epithelial cell
- A549-Human type II alveolar epithelial, MeT-5A-Human bronc

### `Exposure Time`

In [12]:
print("-","\n- ".join(list(set(df['Exposure Time']))))

- 
- 24 and 48 hours
- 72 hours
- 4 days
- 12 hours
- 24, 48 and 72 h
- 48 hours
- 0.5–3 hours (0 minute, 30 minutes, 3 hours)
- 18 Hours
- 1.24 hours, 2.4–72 hours, 3.10 days in vivo, 4.72 hours
- 24h, 48h and 72h
- 24 hours
- 30 min
- 12–48 hours
- 4 hours
- 24, 48 and 72 hours
- 4-48 hours
- 4-20 h
- 1, 2, 4, 8, 12, 24, and 48 h 
- 72 hours
- Cultured for 7 days- 24 hours
- 24 hours
- 6-48 hours
- 24 and 48 hours
- 4 hours
- -
- 24 and 72 hours
- 24hours
- 72 Hours
- 1-4 days
- ALI (5 and 7 h), Sub-merged (24 h)
- 6, 12, 24 and 48 hours
- 24 Hours
- 12–24 hours
- nan
- 24, 48 and 72 hours
- 24 h
- 25 hours
- 3 and 24 h
- 6, 12 and 24 hours 
- 1, 2, 4, 8, 12, 24 and 48 h
- 3- 6 hours
- 6, 24 and 48 hours
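Most exposure entries are lists or ranges of numbers sharing one trailing unit. A minimal sketch of a parser into hours, assuming that a bare number takes the unit of the next number that has one; `parse_exposure_hours` is a hypothetical helper. Cells such as `1.24 hours, 2.4–72 hours, ...` (a numbered list flattened into one cell) or `Cultured for 7 days- 24 hours` are misparsed and still need manual curation:

```python
import re

# A number, optionally followed by a time unit (case-insensitive)
TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:(minutes?|min|hours?|h|days?)\b)?", re.IGNORECASE)
# Minutes per unit, keyed by the unit's first letter
MINUTES_PER_UNIT = {"m": 1, "h": 60, "d": 24 * 60}


def parse_exposure_hours(text):
    """Return the sorted distinct exposure times, in hours, mentioned in `text`.

    Numbers without a unit take the unit of the next number that has one;
    numbers with no unit after them are dropped. Range endpoints such as
    12-48 hours are kept as two separate values.
    """
    pending, hours = [], set()
    for number, unit in TOKEN.findall(text):
        pending.append(float(number))
        if unit:
            factor = MINUTES_PER_UNIT[unit[0].lower()]
            hours.update(value * factor / 60 for value in pending)
            pending = []
    return sorted(hours)
```

For example, `parse_exposure_hours("24, 48 and 72 hours")` returns `[24.0, 48.0, 72.0]` and `parse_exposure_hours("4 days")` returns `[96.0]`.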


### `Dose`

In [13]:
print("-","\n- ".join(list(set(df['Dose']))))

- 1–200 μg/ml
- 0.0–100 µg/ml
- 25–100 µg/ml
- 25–1000 µg/ml
- 0, 2.5, 5.0, 10.0 μg/mL
- 2.5-50 μg/ml
- 10–150 µg/ml
- 1, 5 and 25 µg/ml
- 6.25–100 µg/m (with LPS and PGN)
- 0, 1, 10, 100, and 400 g/mL
- 10 µg/ml
- 5 µg/cm2 of plate surface
- 12.5, 25, 50, 100, 200, 400 and 800 µg/mL
- nan
- 10, 50, 100, 250 and 500 μg/ml
- 10 µg/cm2
- 0.1, 0.2, and 0.4 mg/ml 
- 12–200 µg/ml
- 50-200 μg/ml
- 52 μg/cm2 and 117 μg/cm2 (ALI), 15.6 μg/cm2 (sub-merged)
- 25, 75, 100, 200, 400 and 500 µg/ml
- 12.5–100 µg/ml
- 0.06, 0.12 and 0.24 mg/ml
- 0.25–100 μg/mL
- 0.6–600 µg/ml
- 100 µg/ml
- 3.125-200 μg/mL
- 250 and 500 mg/ml
- SiO2EN100R- 3-30 mg/mL             SiO2EN100(−)-2-20  mg/mL       SiO2EN20R- 0.3-3.0 mg/mL       SiO2EN20(−)- 0.2-2.0 mg/mL
- 10 and 50 μg/ml
- 250 g μg/L
- MW1–4: 1–100 mg/mL        MW5 and 6: 0.5–50 mg/
mL
- 0.002-0.2 µg/mL
- 1-10 µM
- 0, 1, 10, 25, 50, and 100 µg/mL 
- 200 μg/ml
- 0.1–10 μg/ml
- 12.5–100 µg/cm2
- CB at 1 μg/cm2 and 5 μg/cm2. GP at a concentration range of 1 
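The `Dose` column appears to mix the MICRO SIGN `µ` (U+00B5) and the Greek letter `μ` (U+03BC), and it combines mass per volume (µg/mL), mass per area (µg/cm2) and molar (µM) units, which cannot be converted into each other without extra information such as well area, medium volume or molar mass. A minimal sketch that normalises and collects the units used per cell; the unit vocabulary and `dose_units` are assumptions for illustration:

```python
import re
import unicodedata

# Mass-per-volume or mass-per-area units, or micromolar (case-insensitive)
UNIT = re.compile(r"(\u03bcg|mg|g)\s*/\s*(ml|l|cm2)\b|(\u03bcm)\b", re.IGNORECASE)
DENOMINATORS = {"ml": "mL", "l": "L", "cm2": "cm2"}


def dose_units(text):
    """Return the set of canonical dose units found in a Dose cell."""
    # NFKC maps MICRO SIGN (U+00B5) to GREEK SMALL LETTER MU (U+03BC)
    text = unicodedata.normalize("NFKC", text)
    units = set()
    for mass, denominator, molar in UNIT.findall(text):
        if molar:
            units.add("\u03bcM")
        else:
            units.add(mass.lower() + "/" + DENOMINATORS[denominator.lower()])
    return units
```

Cells that yield more than one unit, or none (for example `6.25–100 µg/m (with LPS and PGN)`), are the ones to curate by hand.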

### `Index` (Assay)

In [14]:
print("-","\n- ".join(list(set(df['Index']))))

- 
- XTT assay (cell viability),          ELISA and caspase-3-assay (apoptosis)
- LDH assay
- MTT and LDH assay
- Colony formation assays, cell viability and cytotoxicity, morphology
- 2-(4-iodophenyl)-3-(4-nitrophenyl)-5-(2,4-disulfophenyl)-2H-tetrazolium monosodium salt (WST-1)-based assay
- WST-1
- Alamar Blue, Neutral Red, coomissia blue and MTT
- MTT and WST
- FACS (cell viability) and GSH (oxidative stress)
- CCK-8 assay, LDH
- WST-8, Trypan Blue and MTS
- WST-8, sulforhodamine B assay, Propidium iodide (PI) uptake
- PI staining
- LDH, GSH and membrabe integrity, Cytokine Bead Array, ELISA
- Trypan Blue,  Alamar Blue and MTS 
- *CellTiter-Glo luminescent cell viability assay            *Fluorometric terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling (TUNEL) System                                                   *DAPI
- flow cytometry , WST-1 assay, Trypan blue staining.
- MTT, ROS, comet assay, acridine orange staining
- Cell viability methods :SRB, Impedance
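The `Index` cells name the same assays with varying spelling and separators. A minimal sketch of mapping them onto a small controlled vocabulary; `ASSAY_PATTERNS` and `extract_assays` are hypothetical, and the vocabulary only covers assays visible in the printout above (a bare `WST` without `-1` or `-8` is deliberately left unresolved):

```python
import re

# Hypothetical controlled vocabulary: canonical label -> pattern in the free text
ASSAY_PATTERNS = {
    "Alamar Blue": r"\balamar\s+blue\b",
    "CCK-8": r"\bCCK-8\b",
    "LDH": r"\bLDH\b",
    "MTS": r"\bMTS\b",
    "MTT": r"\bMTT\b",
    "Neutral Red": r"\bneutral\s+red\b",
    "PI": r"\bPI\b|\bpropidium\s+iodide\b",
    "SRB": r"\bSRB\b|\bsulforhodamine\s+B\b",
    "Trypan Blue": r"\btrypan\s+blue\b",
    "WST-1": r"\bWST-1\b",
    "WST-8": r"\bWST-8\b",
    "XTT": r"\bXTT\b",
}
COMPILED = {label: re.compile(pattern, re.IGNORECASE) for label, pattern in ASSAY_PATTERNS.items()}


def extract_assays(text):
    """Return the sorted canonical assay labels mentioned in an Index cell."""
    return sorted(label for label, regex in COMPILED.items() if regex.search(text))
```

For example, `extract_assays("Trypan Blue,  Alamar Blue and MTS ")` returns `['Alamar Blue', 'MTS', 'Trypan Blue']`.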

### `Results` (the observed outcome)

In [16]:
print("-","\n- ".join(list(set(df['Results']))))

- Time & Dose (↑)= Cell viability (↓) 
- 1. 20nm more cytotoxic than 100nm                                                                2. Treatment with 6 mg/mL of SiO2EN100(R) and SiO2EN100(−) reduced the viability by 68% and 65%, respectively                                                                          3. T0.6 and 0.9 mg/mL of SiO2EN20(R) reduced the viability  by 90% and 98%,                                                                                                                 4. 0.6 and 0.8 mg/mL of SiO2EN20(−) reduced the viability by 23% and 96%,                                                                                                                         5. Both SiO2 activated caspase-3 and induced DNA fragmentation in U373MG cells, suggesting the induction of apoptosis.                                                        
- Dose-dependent reduction in BRL
- Dose (↑)= Cell viability (↓) 
- Dose (↑)= Cell viability (↓), after 3μg/cm2 is signifi
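Several results follow the notation `Dose (↑)= Cell viability (↓), ROS (↑)`, i.e. a stated correlation between drivers and effects, which is what the `sbdbel` TODO further down wants to model. A minimal sketch of splitting that notation into (term, direction) pairs, assuming the arrows are U+2191 and U+2193 and that `&` joins several drivers; `parse_trend` is a hypothetical helper and free-text results return `None`:

```python
import re

ARROWS = {"\u2191": "increase", "\u2193": "decrease"}  # up and down arrows
ARROW = re.compile(r"\(([\u2191\u2193])\)")
# A term is any text without separators, followed by an arrow in parentheses
TERM = re.compile(r"([^,=()&]+?)\s*\(([\u2191\u2193])\)")


def parse_trend(text):
    """Parse 'X (arrow)= Y (arrow), Z (arrow)' into drivers and effects, else return None."""
    if "=" not in text:
        return None
    lhs, rhs = text.split("=", 1)
    match = ARROW.search(lhs)
    if match is None:
        return None
    direction = ARROWS[match.group(1)]
    # The single arrow on the left applies to every '&'-joined driver
    drivers = [(name.strip(), direction) for name in lhs[:match.start()].split("&") if name.strip()]
    effects = [(name.strip(), ARROWS[arrow]) for name, arrow in TERM.findall(rhs)]
    return {"drivers": drivers, "effects": effects}
```

Trailing free text after the last arrow (such as `after 3μg/cm2 is signifi`) is ignored, so the parsed pairs should be checked against the original cell.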

### `Notes`

In [18]:
print("-","\n- ".join(list(set(df['Notes']))))

- 
- air–liquid interphase (ALI) , pyrogenic samples Py-SiNP
- PS80:polyoxyethylene sorbitan monooleate
- the toxicity of carbon nanotubes increases
significantly when carbonyl (CdO), carboxyl (COOH), and/
or hydroxyl (OH) groups are present on their surface.
- Colloidal-SiNP
- brookite‐type (BTO) and anatase‐type (ATO) nanorods
- 100 nm was slightly more toxic
- NOTE:The ideal test for in vitro cell cytotoxicity must not interfere with the compound to be tested. The indicator dyes used in this study (Commassie, AB, NR, MTT and WST-1) are not appropriate for the quantitative toxicity assessment of carbon nanotubes. 
- 20 nm was slightly more toxic
- Single wall (NT1), multi wall (NT2-3)
- deferoxamine (DFO)
- Colloidal-SiNP and Mesaporous-SiNP
- In combination with lung surfactant, aSNP–plain revealed an increased cytotoxicity in monocultures of A549, aSNP–NH2 caused a slightly augmented toxic effect, whereas aSNP–COOH did not show any toxic alterations. 
- Oxidative stress techniques:

In [25]:
for row_idx, row in df.iterrows():
    # See how many materials are described in each Nanomaterial cell
    material_row = str(row['Nanomaterial'])
    if '(' in material_row:
        if ',' in material_row:
            material_row = material_row.split(") ")
            print(material_row)
    if ',' in material_row:
        material_row = material_row.split(',')
        for j in range(len(material_row)):
            # NOTE: these sub-splits are appended as nested lists, which the
            # final filter drops, so they never reach the printed result
            if 'and' in material_row[j]:
                material_row.append(material_row[j].split(' and '))
            if '&' in material_row[j]:
                material_row.append(material_row[j].split(' & '))
    # Once material_row is a list, `in` tests element equality, not substrings
    if '&' in material_row:
        material_row = material_row.split(' & ')
    elif 'and' in material_row:
        material_row = material_row.split(' and ')
    else:
        material_row = [material_row]
    # Keep only plain strings (a list wrapped in the else branch becomes [] here)
    material_row = [m for m in material_row if type(m) != list]
    print(material_row, type(material_row))


['Reduced GO'] <class 'list'>
['Graphene NMs (GNPs–pristine, COOH and NH2', 'and GOs (SLGO and FLGO', '450 nm']
[] <class 'list'>
['Reduced GO'] <class 'list'>
['GO (2.496 nm)'] <class 'list'>
['Carbon Black (CB)', 'Graphene nanoplate (GP)'] <class 'list'>
['MWCNT (MW1, MW2, MW3, MW4, MW4, MW5, MW6)- regarding the shape and diameter']
[] <class 'list'>
['GO', 'carboxyl graphene (CXYG)'] <class 'list'>
['GO (pGO-5, probe sonicated 5 min', 'and GS (Graphene sheet)']
[] <class 'list'>
['GO (untreated)                      -sonicated GO                         -Base washed GO                   -s', 'bw GO                                   '] <class 'list'>
[] <class 'list'>
['Multi-walled carbon nanotubes (CNT), with/out Fe']
[] <class 'list'>
['SWCNT'] <class 'list'>
['MWCNTs'] <class 'list'>
['MWCNTs'] <class 'list'>
['SWCNT', 'quartz in serum containing (5%)', 'serum-free (0%) medium.'] <class 'list'>
[] <class 'list'>
['SWCNT'] <class 'list'>
['NT-1, NT-2, NT-3 and SWCNT , carbon black
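The loop above appends the `' and '` / `' & '` sub-splits as nested lists and then filters every list out, so those splits never reach the output; a cell first split on `") "` stays a list, gets wrapped and is filtered away, which is where the `[]` lines come from. As a swapped-in technique rather than the notebook's own method, a minimal sketch of a parenthesis-aware splitter, assuming that commas, semicolons, `and` and `&` outside parentheses separate materials (`split_materials` is a hypothetical helper):

```python
import re

# Separators between materials: comma, semicolon, the word "and", or "&"
SEPARATOR = re.compile(r"\s*(?:,|;|\band\b|&)\s*")


def split_materials(text):
    """Split a Nanomaterial cell on separators that sit outside parentheses."""
    # Mask characters inside (...) so separators there are ignored;
    # the masked string has the same length as `text`
    masked, depth = [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        masked.append("_" if depth and ch not in "()" else ch)
    masked = "".join(masked)
    # Cut the original text at the positions of the unmasked separators
    pieces, start = [], 0
    for match in SEPARATOR.finditer(masked):
        pieces.append(text[start:match.start()])
        start = match.end()
    pieces.append(text[start:])
    return [piece.strip() for piece in pieces if piece.strip()]
```

Commas also separate sizes (`"Ag-15, 100 nm"` becomes `["Ag-15", "100 nm"]`), so the result still needs manual review before it is mapped to ontology terms.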

## Retrieve licenses

This step retrieves the license of each journal article from the Unpaywall API.

In [None]:
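def check_open_access(doi):
    """Query Unpaywall for a DOI URL such as https://doi.org/10.xxxx/yyyy.

    Returns (is_oa, license), or (None, 'NA') if the lookup fails.
    """
    try:
        doi_id = doi.split('.org/', 1)[1]  # strip the https://doi.org/ prefix
        api_url = 'https://api.unpaywall.org/v2/{}?email={}'.format(doi_id, EMAIL)
        response = requests.get(api_url, timeout=30)
        response.raise_for_status()
        record = response.json()
    except (IndexError, ValueError, requests.RequestException) as e:
        print('Could not retrieve Unpaywall record for {}: {}'.format(doi, e))
        return None, 'NA'
    if not record.get('is_oa'):
        return False, 'closed'
    # best_oa_location and its license field may both be null
    best_location = record.get('best_oa_location') or {}
    return True, best_location.get('license') or 'NA/closed'

# Annotate every row of a paper with its open-access status and license
for doi in set(df['DOI']):
    if isinstance(doi, str) and 'https' in doi:
        is_oa, license_type = check_open_access(doi)
        if is_oa is not None:
            df.loc[df['DOI'] == doi, 'is_oa'] = is_oa
            df.loc[df['DOI'] == doi, 'license'] = license_type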




Plotting the different license types shows that most of the papers are not open access.
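Since one manuscript spans several rows (the `Manuscript` column has a top frequency of 4), counting licenses row by row would over-weight some papers. A minimal sketch that counts each DOI once, assuming the license loop has added a `license` column; `license_counts` is a hypothetical helper:

```python
from collections import Counter


def license_counts(doi_license_pairs):
    """Count license types once per DOI; pairs without a string license are skipped."""
    per_doi = {}
    for doi, license_type in doi_license_pairs:
        # NaN (no Unpaywall result) is a float, so it is skipped here
        if isinstance(license_type, str):
            per_doi.setdefault(doi, license_type)
    return Counter(per_doi.values())
```

`counts = license_counts(zip(df['DOI'], df['license']))` followed by `pd.Series(counts).plot.bar()` (which needs matplotlib) would draw one bar per license type.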

## Define RDF namespaces




In [None]:
# TODO define sbdbel properly (ontological modelling of stated correlations? stated effects? need to discuss)

In [None]:
sbd = Namespace('https://www.sbd4nano.eu/rdf/#')
sbdbel = Namespace('https://www.sbd4nano.eu/bel/#')
ECO = Namespace('https://evidenceontology.org/#')
kb = Namespace('https://h2020-sbd4nano.github.io/sbd-data-landscape/')
enm = Namespace('http://purl.enanomapper.org/onto/')
ncit = Namespace('http://purl.obolibrary.org/obo/NCIT')
npo = Namespace('http://purl.bioontology.org/ontology/npo#')
pato = Namespace('http://purl.org/obo/owl/PATO#')
cito = Namespace('http://purl.org/spar/cito/')
gracious = Namespace('https://h2020-sbd4nano.github.io/sbd4nano-gracious-owl/gracious.html#')
aop_event = Namespace('https://identifiers.org/aop.events/')
bio = Namespace('http://purl.jp/bio/4/id/')
sio_ns = Namespace('http://semanticscience.org/resource/')  # not `sio`, which would overwrite the silica DataFrame
efo = Namespace('http://www.ebi.ac.uk/efo/')
penm = Namespace('http://purl.placeholder.enanomapper.net/')

## RDFy

## QC SPARQL Queries