# Prepare loading of data objects to App DB

A curated list of data objects referenced from a set of publications is formatted to facilitate loading into the App DB.

Rather than referring only to data, this process refers to **data objects**: any data published to complement a publication. This includes raw data, supplementary data, processing data, tables, images, movies, and compilations containing one or more of these resources. Corrections to publications may also fall into this category, but this still needs to be discussed with stakeholders.

The operations to be performed are:
- get metadata for objects identified with DOIs and arrange it so that it can be loaded into the App DB
- format all objects without a DOI (mostly supplementary materials) to align with the metadata from DOI-identified objects
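
The two operations apply to different subsets of rows, so it can help to picture the split first. The following is a minimal sketch, assuming each row is a dictionary with the `do_doi` and `do_file` columns used later in this notebook; the function name `split_data_objects` and the sample rows are hypothetical and only illustrate the idea.

```python
def split_data_objects(rows):
    """Partition rows into DOI-identified objects and file-only objects.

    `rows` is assumed to be a dict of row-number -> row-dict, similar in
    shape to what the notebook reads from its CSV file.
    """
    with_doi = {}
    without_doi = {}
    for key, row in rows.items():
        if row.get("do_doi", "") != "":
            # object has its own DOI: metadata can be looked up from the DOI
            with_doi[key] = row
        elif row.get("do_file", "") != "":
            # no DOI but a file name: metadata is derived from the file and publication
            without_doi[key] = row
    return with_doi, without_doi


# hypothetical rows, not taken from the real CSV file
sample_rows = {
    1: {"doi": "10.0000/pub-a", "do_doi": "10.0000/dataset-a", "do_file": ""},
    2: {"doi": "10.0000/pub-a", "do_doi": "", "do_file": "supplementary/pub-a_si.pdf"},
    3: {"doi": "10.0000/pub-b", "do_doi": "", "do_file": ""},
}

with_doi, without_doi = split_data_objects(sample_rows)
print(sorted(with_doi), sorted(without_doi))
```

Rows with neither a DOI nor a file name (row 3 in the sample) fall into neither group, which matches the notebook cells, where such rows are left unchanged.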


In [1]:
# library containing read and write functions for csv files
import lib.handle_csv as csvh

# managing files and file paths
from pathlib import Path

# library for handling url searches
import lib.handle_urls as urlh

# add a progress bar (tqdm.tqdm_notebook is deprecated in favour of tqdm.notebook.tqdm)
from tqdm.notebook import tqdm as tqdm_notebook
    
# library for accessing system functions
import os

# import custom functions (common to various notebooks)
import processing_functions as pr_fns

# Connecting to the db
import lib.handle_db as dbh

# path to the app database file, used later to look up publication titles
ukchapp_db = "db_files/app_db.sqlite3"


## Get DOI objects metadata
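
The mapping from a stored metadata record to the `do_*` columns can be sketched in isolation. This is a minimal sketch, assuming the metadata is stored as the string form of a Python dictionary with `title`, `URL`, `DOI`, `type` and an optional `abstract` key, as the cell below reads them. The helper `base_url` is an assumption about what `urlh.getBaseUrl` returns (scheme plus host, judging by the printed output), and `metadata_to_fields` and the example record are hypothetical.

```python
import ast
from urllib.parse import urlsplit


def base_url(url):
    """Return scheme and host of a URL (assumed to mirror urlh.getBaseUrl)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def metadata_to_fields(metadata_str):
    """Map a string-serialised metadata dictionary to data object columns."""
    # literal_eval safely parses the string back into a dictionary
    do_metadata = ast.literal_eval(metadata_str)
    fields = {
        "do_title": do_metadata["title"],
        "do_location": do_metadata["URL"],
        "do_doi": do_metadata["DOI"],
        "do_repository": base_url(do_metadata["URL"]),
        "do_type": do_metadata["type"],
    }
    # the abstract is optional in the metadata record
    if "abstract" in do_metadata:
        fields["do_description"] = do_metadata["abstract"]
    # when the registered type is not 'dataset', record 'dataset' as inferred type
    if do_metadata["type"] != "dataset":
        fields["do_inferred_type"] = "dataset"
    return fields


# hypothetical metadata record, serialised the way it would sit in a CSV cell
example = str({
    "title": "Example dataset",
    "URL": "https://repo.example.org/records/1?lang=en",
    "DOI": "10.0000/example",
    "type": "article",
})
fields = metadata_to_fields(example)
print(fields["do_repository"])
```

Keeping the mapping in a pure function like this makes it possible to check the field handling without calling the DOI resolver.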


In [2]:
# get names and links for references in data mentions
data_reference, _ = csvh.get_csv_data('pub_data_load.csv', 'num')

# ast needed to parse string saved dictionary
import ast

for dr in tqdm_notebook(data_reference):
    # get metadata if it is missing
    if data_reference[dr]['do_metadata'] == "" and data_reference[dr]['do_doi'] != "":
        ref_link = "https://doi.org/" + data_reference[dr]['do_doi']
        data_object = urlh.getObjectMetadata(ref_link)
        data_reference[dr]['do_metadata'] = data_object['metadata']
    if data_reference[dr]['do_metadata'] != "":
        do_metadata = ast.literal_eval(str(data_reference[dr]['do_metadata']))
        data_reference[dr]['do_title'] = do_metadata['title']
        print('Title: ', do_metadata['title'])
        if 'abstract' in do_metadata:
            print('Abstract: ', do_metadata['abstract'])
            data_reference[dr]['do_description'] = do_metadata['abstract']
        print('URL: ', do_metadata['URL'])
        data_reference[dr]['do_location'] = do_metadata['URL']
        print('DOI: ', do_metadata['DOI'])
        data_reference[dr]['do_doi'] = do_metadata['DOI']
        repo_address = urlh.getBaseUrl(do_metadata['URL'])
        print('repository:', repo_address)
        data_reference[dr]['do_repository'] = repo_address
        print('Type:', do_metadata['type'])
        data_reference[dr]['do_type'] = do_metadata['type']
        # the registered type is not 'dataset', so record 'dataset' as the inferred type
        if do_metadata['type'] != 'dataset':
            data_reference[dr]['do_inferred_type'] = 'dataset'

# write to csv file
if len(data_reference) > 0:
    csvh.write_csv_data(data_reference, 'pub_data_load.csv')            


Title:  Tuning of catalytic sites in Pt/TiO2 catalysts for chemoselective hydrogenation of 3-nitrostyrene
Abstract:  The dataset contains the raw data of XPS, STEM, XAS, GC and CO DRIFT analysis of the Pt/TiO2 catalysts prepared by impregnation, heat treated at 450°C, tested for a selective hydrogenation reaction and characterised by various techniques.
URL:  https://research.cardiff.ac.uk/converis/portal/detail/Dataset/79744472?auxfun=&lang=en_GB
DOI:  10.17035/D.2019.0079744472
repository: https://research.cardiff.ac.uk
Type: article
Title:  Impact of nanoparticle-support interactions in co3o4/al2o3 catalysts for the preferential oxidation of carbon monoxide: Raw data for Nyathi et al., ACS Catal., 2019 (10.1021/acscatal.9b00685)
Abstract:  Impact of nanoparticle-support interactions in co3o4/al2o3 catalysts for the preferential oxidation of carbon monoxide. Raw data related to an article by Nyathi <i>et al</i>., <i>ACS Catal.</i>, 2019 (10.1021/acscatal.9b00685) for public access wh

## Add metadata to file objects
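
For objects without a DOI, the title, type and repository are derived from the file name and its download location. The sketch below shows one way to do this with `pathlib` and `urllib.parse`; note that this is a swapped-in technique, since the cell below uses string splitting and `rfind` instead. The function `file_object_fields` is hypothetical, and the directory name in the example `do_file` value is an assumption; the URL and publication title are taken from the output shown further down.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def file_object_fields(do_file, do_location, publication_title):
    """Derive data object columns for a supplementary file without a DOI."""
    path = PurePosixPath(do_file)
    parts = urlsplit(do_location)
    return {
        # file name without its directory, e.g. 'c6cy01129b1.pdf'
        "do_title": path.name,
        "do_description": "Supplementary data for " + publication_title,
        # scheme plus host of the download location
        "do_repository": f"{parts.scheme}://{parts.netloc}",
        # extension without the leading dot; empty if the file has none
        "do_type": path.suffix.lstrip("."),
    }


fields = file_object_fields(
    "supplementary/c6cy01129b1.pdf",  # directory name is an assumption
    "http://www.rsc.org/suppdata/c6/cy/c6cy01129b/c6cy01129b1.pdf",
    "Niobic acid nanoparticle catalysts for the aqueous phase "
    "transformation of glucose and fructose to 5-hydroxymethylfurfural",
)
print(fields["do_title"], fields["do_type"], fields["do_repository"])
```

Unlike taking the second element of a `/` split, `path.name` always returns the last path component, so it behaves the same for file paths nested more than one directory deep.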


In [3]:
# get names and links for references in data mentions
data_reference, _ = csvh.get_csv_data('pub_data_load.csv', 'num')

db_conn = dbh.DataBaseAdapter(ukchapp_db)

for dr in tqdm_notebook(data_reference):
    
    # look up the publication title in the app DB to fill in missing DO fields;
    # get_title returns a one-element tuple, hence publication_title[0] below
    publication_title = db_conn.get_title(data_reference[dr]['doi'])
    if data_reference[dr]['do_doi'] == "":
        if data_reference[dr]['do_file']!="":
            do_title = data_reference[dr]['do_file'].split("/")[1]
            print("Title: ", do_title)
            data_reference[dr]['do_title'] = do_title
            print("Description: Supplementary information for ", publication_title)
            data_reference[dr]['do_description'] = "Supplementary data for " + publication_title[0]
            repo_address = urlh.getBaseUrl(data_reference[dr]['do_location'])
            print('URL:', data_reference[dr]['do_location'])
            print('Repository:', repo_address)
            data_reference[dr]['do_repository'] = repo_address
            do_type = data_reference[dr]['do_file'][data_reference[dr]['do_file'].rfind(".")+1:]
            print("Type: ", do_type)
            data_reference[dr]['do_type'] = do_type
            
# write to csv file
if len(data_reference) > 0:
    csvh.write_csv_data(data_reference, 'pub_data_load.csv')  


Title:  41929_2019_334_MOESM1_ESM.pdf
Description: Supplementary information for  ('Tuning of catalytic sites in Pt/TiO2 catalysts for the chemoselective hydrogenation of 3-nitrostyrene',)
URL: https://static-content.springer.com/esm/art%3A10.1038%2Fs41929-019-0334-3/MediaObjects/41929_2019_334_MOESM1_ESM.pdf
Repository: https://static-content.springer.com
Type:  pdf
Title:  cs9b00685_si_001.pdf
Description: Supplementary information for  ('Impact of Nanoparticle–Support Interactions in Co3O4/Al2O3 Catalysts for the Preferential Oxidation of Carbon Monoxide',)
URL: https://pubs.acs.org/doi/suppl/10.1021/acscatal.9b00685/suppl_file/cs9b00685_si_001.pdf?cookieSet=1
Repository: https://pubs.acs.org
Type:  pdf
Title:  cctc201901268-sup-0001-misc_information.pdf
Description: Supplementary information for  ('In Situ Monitoring of Nanoparticle Formation during Iridium‐Catalysed Oxygen Evolution by Real‐Time Small Angle X‐Ray Scattering',)
URL: https://chemistry-europe.onlinelibrary.wiley.com/

URL: https://static-content.springer.com/esm/art%3A10.1007%2Fs11244-018-0887-4/MediaObjects/11244_2018_887_MOESM1_ESM.docx
Repository: https://static-content.springer.com
Type:  docx
Title:  ae8b00873_si_001.pdf
Description: Supplementary information for  ('Electrochemical Synthesis of Nanostructured Metal-Doped Titanates and Investigation of Their Activity as Oxygen Evolution Photoanodes',)
URL: https://pubs.acs.org/doi/suppl/10.1021/acsaem.8b00873/suppl_file/ae8b00873_si_001.pdf?cookieSet=1
Repository: https://pubs.acs.org
Type:  pdf
Title:  celc201800729-sup-0001-misc_information.pdf
Description: Supplementary information for  ('Mean Intrinsic Activity of Single Mn Sites at LaMnO3 Nanoparticles Towards the Oxygen Reduction Reaction',)
URL: https://chemistry-europe.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fcelc.201800729&file=celc201800729-sup-0001-misc_information.pdf&cookieSet=1
Repository: https://chemistry-europe.onlinelibrary.wiley.com
Type:  pdf
Title:  c7

Title:  c6cy01129b1.pdf
Description: Supplementary information for  ('Niobic acid nanoparticle catalysts for the aqueous phase transformation of glucose and fructose to 5-hydroxymethylfurfural',)
URL: http://www.rsc.org/suppdata/c6/cy/c6cy01129b/c6cy01129b1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  cs6b02369_si_001.pdf
Description: Supplementary information for  ('Platinum-Catalyzed Aqueous-Phase Hydrogenation of d-Glucose to d-Sorbitol',)
URL: https://pubs.acs.org/doi/suppl/10.1021/acscatal.6b02369/suppl_file/cs6b02369_si_001.pdf?cookieSet=1
Repository: https://pubs.acs.org
Type:  pdf
Title:  c6cp01494a1.pdf
Description: Supplementary information for  ('Determination of toluene hydrogenation kinetics with neutron diffraction',)
URL: http://www.rsc.org/suppdata/c6/cp/c6cp01494a/c6cp01494a1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  c6re00140h1.pdf
Description: Supplementary information for  ('Continuous flow gas phase photoreforming of methanol at elevated reacti

Title:  c5sc03494a1.pdf
Description: Supplementary information for  ('Encapsulation of an organometallic cationic catalyst by direct exchange into an anionic MOF',)
URL: http://www.rsc.org/suppdata/c5/sc/c5sc03494a/c5sc03494a1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  c5cc08714g1.pdf
Description: Supplementary information for  ('A mild hydration of nitriles catalysed by copper(ii) acetate',)
URL: http://www.rsc.org/suppdata/c5/cc/c5cc08714g/c5cc08714g1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  c5cc08681g1.pdf
Description: Supplementary information for  ('Conversion of nitroalkanes into carboxylic acids via iodide catalysis in water',)
URL: http://www.rsc.org/suppdata/c5/cc/c5cc08681g/c5cc08681g1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  cssc201501225-sup-0001-misc_information.pdf
Description: Supplementary information for  ('Catalytic Response and Stability of Nickel/Alumina for the Hydrogenation of 5-Hydroxymethylfurfural in Water',)
URL: https://ch

Title:  c5ta08709k1.pdf
Description: Supplementary information for  ('Hierarchically porous BEA stannosilicates as unique catalysts for bulky ketone conversion and continuous operation',)
URL: http://www.rsc.org/suppdata/c5/ta/c5ta08709k/c5ta08709k1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  c6gc01288d1.pdf
Description: Supplementary information for  ('Intensification and deactivation of Sn-beta investigated in the continuous regime',)
URL: http://www.rsc.org/suppdata/c6/gc/c6gc01288d/c6gc01288d1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  c7cy01553d1.pdf
Description: Supplementary information for  ('Towards the upgrading of fermentation broths to advanced biofuels: a water tolerant catalyst for the conversion of ethanol to isobutanol',)
URL: http://www.rsc.org/suppdata/c7/cy/c7cy01553d/c7cy01553d1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  10562_2014_1443_MOESM1_ESM.docx
Description: Supplementary information for  ('The Photocatalytic Window: Photo-Refo

Repository: http://www.rsc.org
Type:  pdf
Title:  c9dt03590g1.pdf
Description: Supplementary information for  ('Understanding the mechanochemical synthesis of the perovskite LaMnO3 and its catalytic behaviour',)
URL: http://www.rsc.org/suppdata/c9/dt/c9dt03590g/c9dt03590g1.pdf
Repository: http://www.rsc.org
Type:  pdf
Title:  cctc201901955-sup-0001-misc_information.pdf
Description: Supplementary information for  ('Influence of Synthesis Conditions on the Structure of Nickel Nanoparticles and their Reactivity in Selective Asymmetric Hydrogenation',)
URL: https://chemistry-europe.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fcctc.201901955&file=cctc201901955-sup-0001-misc_information.pdf&cookieSet=1
Repository: https://chemistry-europe.onlinelibrary.wiley.com
Type:  pdf
Title:  c9sc04905c1.pdf
Description: Supplementary information for  ('Detection of key transient Cu intermediates in SSZ-13 during NH3-SCR deNOx by modulation excitation IR spectroscopy',)
URL: http://ww

In [4]:
publication_title[0]

'A ruthenium(ii) bis(phosphinophosphinine) complex as a precatalyst for transfer-hydrogenation and hydrogen-borrowing reactions'