# Prepare loading of data objects to App DB

A curated list of data objects referenced from a set of publications is formated to facilitate loading into the App DB.

Instead of only referencing to data, these process refer to **data objects**, which are any data which is published to complement the publication, this includes raw data, supplementary data, processing data, tables, images, movies, and compilations containing one or more of such resources (corrections to publications may fall in this category but need to discuss it with stakeholders).

The operations to be performed are: 
- get metadata from objects identifed with DOIs and arrange it in a way that it can be loaded to the AppDB.
- format all objects without DOI (mostly supplementary materials) to align with the metadata from DOI identified objects


In [1]:
# library containign read and write functions to csv file
import lib.handle_csv as csvh

# managing files and file paths
from pathlib import Path

# library for handling url searchs
import lib.handle_urls as urlh

# add a progress bar
from tqdm import tqdm_notebook
    
# library for accessing system functions
import os

# import custom functions (common to various notebooks)
import processing_functions as pr_fns

# Connecting to the db
import lib.handle_db as dbh

# get the publications list from the app database
ukchapp_db = "./db_files/app_db20211122.sqlite3"


## Get DOI objects metadata

In [4]:
# get names and links for references in data mentions
# data_reference, do_keys = csvh.get_csv_data('./new_references202111.csv')
data_reference, do_keys = csvh.get_csv_data('./data_load_201111.csv', 'num')
for dr in tqdm_notebook(data_reference):
    # start copying data to do fields
    if not 'num' in do_keys:
        data_reference[dr]['num'] = dr
    
    if pr_fns.valid_doi(data_reference[dr]['target_id']):
        data_reference[dr]['do_doi'] = data_reference[dr]['target_id']
        data_reference[dr]['do_location'] = "https://doi.org/" + data_reference[dr]['target_id']
        data_reference[dr]['do_metadata'] = ""
    else:
        if not pr_fns.valid_doi(data_reference[dr]['do_doi']):
            data_reference[dr]['do_doi'] = ""
        data_reference[dr]['do_location'] = data_reference[dr]['target_id']
        data_reference[dr]['do_metadata'] = ""

# ast needed to parse string saved dictionary
import ast

for dr in tqdm_notebook(data_reference):
    # get metadata if it is missing
    'do_doi' in do_keys
    if data_reference[dr]['do_metadata'] == "" and data_reference[dr]['do_doi'] != "":
        ref_link = "https://doi.org/" + data_reference[dr]['do_doi']
        data_object = urlh.getObjectMetadata(ref_link)
        if 'metadata' in data_object.keys():
            data_reference[dr]['do_metadata'] = data_object['metadata']
        else:
            data_reference[dr]['do_metadata'] = ""
    if data_reference[dr]['do_metadata'] != "":
        do_metadata = ast.literal_eval(str(data_reference[dr]['do_metadata']))
        data_reference[dr]['do_title'] = do_metadata['title']
        print('Title: ', do_metadata['title'])
        if 'abstract' in do_metadata:
            print('Abstract: ', do_metadata['abstract'])
            data_reference[dr]['do_description'] = do_metadata['abstract']
        print('URL: ', do_metadata['URL'])
        data_reference[dr]['do_location'] = do_metadata['URL']
        print('DOI: ', do_metadata['DOI'])
        data_reference[dr]['do_doi'] = do_metadata['DOI']
        repo_address = urlh.getBaseUrl(do_metadata['URL'])
        print('repository:', repo_address)
        data_reference[dr]['do_repository'] = repo_address
        print('Type:',do_metadata['type']) 
        data_reference[dr]['do_type'] = do_metadata['type']
        if do_metadata['type'] != 'dataset':
            data_reference[dr]['do_inferred_type'] = 'dataset'
# write to csv file
if len(data_reference) > 0:
    csvh.write_csv_data(data_reference, './data_load_201111.csv')            

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for dr in tqdm_notebook(data_reference):


  0%|          | 0/296 [00:00<?, ?it/s]

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for dr in tqdm_notebook(data_reference):


  0%|          | 0/296 [00:00<?, ?it/s]

trying to recover object metadata from https://doi.org/10.2210/pdb6er9/pdb
got something back
resource url https://api.crossref.org/v1/works/10.2210%2Fpdb6er9%2Fpdb/transform
Title:  Crystal structure of cyclohexanone monooxygenase from Rhodococcus sp. Phi1 bound to NADP+
URL:  http://dx.doi.org/10.2210/pdb6er9/pdb
DOI:  10.2210/pdb6er9/pdb
repository: http://dx.doi.org
Type: component
trying to recover object metadata from https://doi.org/10.2210/pdb6era/pdb
got something back
resource url https://api.crossref.org/v1/works/10.2210%2Fpdb6era%2Fpdb/transform
Title:  Crystal structure of cyclohexanone monooxygenase mutant (F249A, F280A and F435A) from Rhodococcus sp. Phi1 bound to NADP+
URL:  http://dx.doi.org/10.2210/pdb6era/pdb
DOI:  10.2210/pdb6era/pdb
repository: http://dx.doi.org
Type: component
trying to recover object metadata from https://doi.org/10.2210/pdb5ojb/pdb
got something back
resource url https://api.crossref.org/v1/works/10.2210%2Fpdb5ojb%2Fpdb/transform
Title:  Structu

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1zngy1
Title:  CCDC 1836934: Experimental Crystal Structure Determination
Abstract:  Related Article: Antonis M. Messinis, Stephen L. J. Luckham, Peter P. Wells, Diego Gianolio, Emma K. Gibson, Harry M. O’Brien, Hazel A. Sparkes, Sean A. Davis, June Callison, David Elorriaga, Oscar Hernandez-Fajardo, Robin B. Bedford |2018|Nat. Catal.|2|123|doi:10.1038/s41929-018-0197-z
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1zngy1&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1ZNGY1
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1zngz2
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1zngz2
Title:  CCDC 1836935: Experimental Crystal Structure Determination
Abstract:  Related Article: Antonis M. Messinis, Stephen L. J. Luckham, Peter P. Wells, Diego Gianolio, Emma K. Gibson, Har

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1znhbg
Title:  CCDC 1836946: Experimental Crystal Structure Determination
Abstract:  Related Article: Antonis M. Messinis, Stephen L. J. Luckham, Peter P. Wells, Diego Gianolio, Emma K. Gibson, Harry M. O’Brien, Hazel A. Sparkes, Sean A. Davis, June Callison, David Elorriaga, Oscar Hernandez-Fajardo, Robin B. Bedford |2018|Nat. Catal.|2|123|doi:10.1038/s41929-018-0197-z
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1znhbg&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1ZNHBG
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1znhch
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1znhch
Title:  CCDC 1836947: Experimental Crystal Structure Determination
Abstract:  Related Article: Antonis M. Messinis, Stephen L. J. Luckham, Peter P. Wells, Diego Gianolio, Emma K. Gibson, Har

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1znhqv
Title:  CCDC 1836958: Experimental Crystal Structure Determination
Abstract:  Related Article: Antonis M. Messinis, Stephen L. J. Luckham, Peter P. Wells, Diego Gianolio, Emma K. Gibson, Harry M. O’Brien, Hazel A. Sparkes, Sean A. Davis, June Callison, David Elorriaga, Oscar Hernandez-Fajardo, Robin B. Bedford |2018|Nat. Catal.|2|123|doi:10.1038/s41929-018-0197-z
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1znhqv&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1ZNHQV
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1znhrw
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1znhrw
Title:  CCDC 1836959: Experimental Crystal Structure Determination
Abstract:  Related Article: Antonis M. Messinis, Stephen L. J. Luckham, Peter P. Wells, Diego Gianolio, Emma K. Gibson, Har

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1q5ndy
Title:  CCDC 1584360: Experimental Crystal Structure Determination
Abstract:  Related Article: Robert J. Newland, Alana Smith, David M. Smith, Natalie Fey, Martin J. Hanton, Stephen M. Mansell|2018|Organometallics|37|1062|doi:10.1021/acs.organomet.8b00063
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1q5ndy&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1Q5NDY
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1q5nfz
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1q5nfz
Title:  CCDC 1584361: Experimental Crystal Structure Determination
Abstract:  Related Article: Robert J. Newland, Alana Smith, David M. Smith, Natalie Fey, Martin J. Hanton, Stephen M. Mansell|2018|Organometallics|37|1062|doi:10.1021/acs.organomet.8b00063
URL:  http://www.ccdc.cam.ac.uk/services/str

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1nt0yf
Title:  CCDC 1543395: Experimental Crystal Structure Determination
Abstract:  Related Article: Fern Sinclair, Johann A. Hlina, Jordann A. L. Wells, Michael P. Shaver, Polly L. Arnold|2017|Dalton Trans.|46|10786|doi:10.1039/C7DT02167D
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1nt0yf&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1NT0YF
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1nt0zg
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1nt0zg
Title:  CCDC 1543396: Experimental Crystal Structure Determination
Abstract:  Related Article: Fern Sinclair, Johann A. Hlina, Jordann A. L. Wells, Michael P. Shaver, Polly L. Arnold|2017|Dalton Trans.|46|10786|doi:10.1039/C7DT02167D
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1nt

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc14f15s
Title:  CCDC 1055214: Experimental Crystal Structure Determination
Abstract:  Related Article: Andrey V. Protchenko, Joshua I. Bates, Liban M. A. Saleh, Matthew P. Blake, Andrew D. Schwarz, Eugene L. Kolychev, Amber L. Thompson, Cameron Jones, Philip Mountford, and Simon Aldridge|2016|J.Am.Chem.Soc.|138|4555|doi:10.1021/jacs.6b00710
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc14f15s&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC14F15S
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc14f16t
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc14f16t
Title:  CCDC 1055215: Experimental Crystal Structure Determination
Abstract:  Related Article: Andrey V. Protchenko, Joshua I. Bates, Liban M. A. Saleh, Matthew P. Blake, Andrew D. Schwarz, Eugene L. Kolychev, Amber L. Th

got something back
resource url https://data.crosscite.org/10.5517%2Fcc1k3xg1
Title:  CCDC 1433733: Experimental Crystal Structure Determination
Abstract:  Related Article: Jennifer A. Garden,  Prabhjot K. Saini,  and Charlotte K. Williams|2015|J.Am.Chem.Soc.|137|15078|doi:10.1021/jacs.5b09913
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/cc1k3xg1&sid=DataCite
DOI:  10.5517/CC1K3XG1
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/cc1jjr1v
got something back
resource url https://data.crosscite.org/10.5517%2Fcc1jjr1v
Title:  CCDC 1416267: Experimental Crystal Structure Determination
Abstract:  Related Article: Charles Romain, Michael S. Bennington, Andrew J. P. White, Charlotte K. Williams, Sally Brooker|2015|Inorg.Chem.|54|11842|doi:10.1021/acs.inorgchem.5b02038
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/cc1jjr1v&sid=DataCite
DOI:  10.5517/CC1JJR1V
repository: ht

got something back
resource url https://data.crosscite.org/10.5517%2Fcc12t7vz
Title:  CCDC 1007371: Experimental Crystal Structure Determination
Abstract:  Related Article: Andrey V. Protchenko,  Deepak Dange,  Matthew P. Blake,  Andrew D. Schwarz,  Cameron Jones,  Philip Mountford,  and Simon Aldridge|2014|J.Am.Chem.Soc.|136|10902|doi:10.1021/ja5062467
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/cc12t7vz&sid=DataCite
DOI:  10.5517/CC12T7VZ
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/cc12t7w0
got something back
resource url https://data.crosscite.org/10.5517%2Fcc12t7w0
Title:  CCDC 1007372: Experimental Crystal Structure Determination
Abstract:  Related Article: Andrey V. Protchenko,  Deepak Dange,  Matthew P. Blake,  Andrew D. Schwarz,  Cameron Jones,  Philip Mountford,  and Simon Aldridge|2014|J.Am.Chem.Soc.|136|10902|doi:10.1021/ja5062467
URL:  http://www.ccdc.cam.ac.uk/services/stru

got something back
resource url https://api.crossref.org/v1/works/10.2210%2Fpdb6qw8%2Fpdb/transform
Title:  Crystal structure of CTX-M-15 complexed with relebactam (16 hour soak)
URL:  http://dx.doi.org/10.2210/pdb6qw8/pdb
DOI:  10.2210/pdb6qw8/pdb
repository: http://dx.doi.org
Type: component
trying to recover object metadata from https://doi.org/10.2210/pdb6qw9/pdb
got something back
resource url https://api.crossref.org/v1/works/10.2210%2Fpdb6qw9%2Fpdb/transform
Title:  Crystal structure of KPC-2 complexed with relebactam (16 hour soak)
URL:  http://dx.doi.org/10.2210/pdb6qw9/pdb
DOI:  10.2210/pdb6qw9/pdb
repository: http://dx.doi.org
Type: component
trying to recover object metadata from https://doi.org/10.2210/pdb6qwa/pdb
got something back
resource url https://api.crossref.org/v1/works/10.2210%2Fpdb6qwa%2Fpdb/transform
Title:  Crystal structure of KPC-3 complexed with relebactam (16 hour soak)
URL:  http://dx.doi.org/10.2210/pdb6qwa/pdb
DOI:  10.2210/pdb6qwa/pdb
repository: http:

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc209fb3
Title:  CCDC 1856104: Experimental Crystal Structure Determination
Abstract:  Related Article: Polly Arnold, Ryan Kerr, Catherine Weetman, Scott Doherty, Julia Rieb, Kai Wang, Christian Jandl, Max McMullon, Alexander Pöthig, Fritz Kühn, Andrew Smith|2018|Chemical Science|9|8035|doi:10.1039/C8SC03312A
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc209fb3&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC209FB3
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc209fc4
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc209fc4
Title:  CCDC 1856105: Experimental Crystal Structure Determination
Abstract:  Related Article: Polly Arnold, Ryan Kerr, Catherine Weetman, Scott Doherty, Julia Rieb, Kai Wang, Christian Jandl, Max McMullon, Alexander Pöthig, Fritz Kühn, Andrew Smith|2018

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1zq16y
Title:  CCDC 1838430: Experimental Crystal Structure Determination
Abstract:  Related Article: Robert J. Newland, Matthew P. Delve, Richard L. Wingad, Stephen M. Mansell|2018|New J.Chem.|42|19625|doi:10.1039/C8NJ03632B
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1zq16y&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1ZQ16Y
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1zq17z
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1zq17z
Title:  CCDC 1838431: Experimental Crystal Structure Determination
Abstract:  Related Article: Robert J. Newland, Matthew P. Delve, Richard L. Wingad, Stephen M. Mansell|2018|New J.Chem.|42|19625|doi:10.1039/C8NJ03632B
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1zq17z&sid=DataCite
DOI:  10.5517

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1nxj69
Title:  CCDC 1546751: Experimental Crystal Structure Determination
Abstract:  Related Article: Clara S. B. Gomes, Alejandro F. G. Ribeiro, Anabela C. Fernandes, Artur Bento, M. Rosário Ribeiro, Gabriele Kociok-Köhn, Sofia I. Pascu, M. Teresa Duarte, Pedro T. Gomes|2017|Cat.Sci.Tech.|7|3128|doi:10.1039/C7CY00875A
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc1nxj69&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC1NXJ69
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc1nxj7b
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc1nxj7b
Title:  CCDC 1546752: Experimental Crystal Structure Determination
Abstract:  Related Article: Clara S. B. Gomes, Alejandro F. G. Ribeiro, Anabela C. Fernandes, Artur Bento, M. Rosário Ribeiro, Gabriele Kociok-Köhn, Sofia I. Pascu, M. Teresa 

got something back
resource url https://data.crosscite.org/10.5258%2Fsoton%2Fd1346
Title:  Dataset for 'Dual-Site Mediated Hydrogenation Catalysis on Pd/NiO: Selective Biomass Transformation and Maintaining Catalytic Activity at Low Pd Loading'.
Abstract:  Dataset for the paper 'Dual-Site Mediated Hydrogenation Catalysis on Pd/NiO: Selective Biomass Transformation and Maintaining Catalytic Activity at Low Pd Loading' in ACS Catalysis. DOI:10.1021/acscatal.0c00414
URL:  http://eprints.soton.ac.uk/id/eprint/439578
DOI:  10.5258/SOTON/D1346
repository: http://eprints.soton.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.17630/306bd3c3-014b-466f-9538-b107628c847d
got something back
resource url https://data.crosscite.org/10.17630%2F306bd3c3-014b-466f-9538-b107628c847d
Title:  Effects of Crystal Size on Methanol to Hydrocarbon Conversion over Single Crystals of ZSM-5 Studied by Synchrotron Infrared Microspectroscopy (dataset)
Abstract:  The following file types

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc24p3jh
Title:  CCDC 1986496: Experimental Crystal Structure Determination
Abstract:  Related Article: Ryan W. F. Kerr, Paul M. D. A. Ewing, Sumesh K. Raman, Andrew D. Smith, Charlotte K. Williams, Polly L. Arnold|2021|ACS Catalysis|11|1563|doi:10.1021/acscatal.0c04858
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc24p3jh&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC24P3JH
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc24486s
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc24486s
Title:  CCDC 1970304: Experimental Crystal Structure Determination
Abstract:  Related Article: Sumesh K. Raman, Arron C. Deacy, Leticia Pena Carrodeguas, Natalia V. Reis, Ryan W. F. Kerr, Andreas Phanopoulos, Sebastian Morton, Matthew G. Davidson, Charlotte K. Williams|2020|Organometallics|39|

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc27l8qv
Title:  CCDC 2073147: Experimental Crystal Structure Determination
Abstract:  Related Article: Wouter Lindeboom, Duncan A. X. Fraser, Christopher B. Durr, Charlotte K. Williams|2021|Chem.-Eur.J.|27|12224|doi:10.1002/chem.202101140
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc27l8qv&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC27L8QV
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc27l8ty
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc27l8ty
Title:  CCDC 2073150: Experimental Crystal Structure Determination
Abstract:  Related Article: Wouter Lindeboom, Duncan A. X. Fraser, Christopher B. Durr, Charlotte K. Williams|2021|Chem.-Eur.J.|27|12224|doi:10.1002/chem.202101140
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc27l8ty&si

got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc271rbd
Title:  CCDC 2057263: Experimental Crystal Structure Determination
Abstract:  Related Article: Arron C. Deacy, Christopher B. Durr, Ryan W. F. Kerr, Charlotte K. Williams|2021|Cat.Sci.Tech.|11|3109|doi:10.1039/D1CY00238D
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc271rbd&sid=DataCite
DOI:  10.5517/CCDC.CSD.CC271RBD
repository: http://www.ccdc.cam.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5517/ccdc.csd.cc271rcf
got something back
resource url https://data.crosscite.org/10.5517%2Fccdc.csd.cc271rcf
Title:  CCDC 2057264: Experimental Crystal Structure Determination
Abstract:  Related Article: Arron C. Deacy, Christopher B. Durr, Ryan W. F. Kerr, Charlotte K. Williams|2021|Cat.Sci.Tech.|11|3109|doi:10.1039/D1CY00238D
URL:  http://www.ccdc.cam.ac.uk/services/structure_request?id=doi:10.5517/ccdc.csd.cc271rcf&sid=DataCite
DOI:  10.

got something back
resource url https://data.crosscite.org/10.5525%2Fgla.researchdata.1092
Title:  Investigation of MoOx/Al2O3 under Cyclic Operation for Oxidative and Non-Oxidative Dehydrogenation of Propane
URL:  http://researchdata.gla.ac.uk/id/eprint/1092
DOI:  10.5525/GLA.RESEARCHDATA.1092
repository: http://researchdata.gla.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.5525/gla.researchdata.1141
got something back
resource url https://data.crosscite.org/10.5525%2Fgla.researchdata.1141
Title:  Operando XAFS investigation on the effect of ash deposition on three-way catalyst used in Gasoline Particulate Filters and the effect of the manufacturing process on the catalytic activity
URL:  http://researchdata.gla.ac.uk/id/eprint/1141
DOI:  10.5525/GLA.RESEARCHDATA.1141
repository: http://researchdata.gla.ac.uk
Type: dataset
trying to recover object metadata from https://doi.org/10.17035/d.2021.0129557921
got something back
resource url https://data.cross

## Add metadata to file objects


In [3]:
# get names and links for references in data mentions
data_reference, do_keys = csvh.get_csv_data('./data_load_201111.csv', 'num')

db_conn = dbh.DataBaseAdapter(ukchapp_db)

for dr in tqdm_notebook(data_reference):
    
    # get publication metadata to fill in missing fields in DO metadata
    ref_link = "https://doi.org/" + data_reference[dr]['pub_doi']
    publication_title = db_conn.get_title(data_reference[dr]['pub_doi'])
    if data_reference[dr]['do_doi'] == "":
        if data_reference[dr]['do_file']!="":
            do_title = data_reference[dr]['do_file'].split("/")[1]
            print("Title: ", do_title)
            data_reference[dr]['do_title'] = do_title
            print("Description: Supplementary information for ", publication_title)
            data_reference[dr]['do_description'] = "Supplementary data for " + publication_title[0]
            repo_address = urlh.getBaseUrl(data_reference[dr]['do_location'])
            print('URL:', data_reference[dr]['do_location'])
            print('Repository:', repo_address)
            data_reference[dr]['do_repository'] = repo_address
            do_type = data_reference[dr]['do_file'][data_reference[dr]['do_file'].rfind(".")+1:]
            print("Type: ", do_type)
            data_reference[dr]['do_type'] = do_type
            
# write to csv file
if len(data_reference) > 0:
    csvh.write_csv_data(data_reference, './data_load_201111.csv')  

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for dr in tqdm_notebook(data_reference):


  0%|          | 0/296 [00:00<?, ?it/s]

Title:  https:
Description: Supplementary information for  ('Combination of Cu/ZnO Methanol Synthesis Catalysts and ZSM-5 Zeolites to Produce Oxygenates from CO2 and H2',)
URL: https://static-content.springer.com/esm/art%3A10.1007%2Fs11244-021-01447-8/MediaObjects/11244_2021_1447_MOESM1_ESM.docx
Repository: https://static-content.springer.com
Type:  docx
Title:  om1c00055_si_001.pdf
Description: Supplementary information for  ('Dinuclear Ce(IV) Aryloxides: Highly Active Catalysts for Anhydride/Epoxide Ring-Opening Copolymerization',)
URL: https://pubs.acs.org/doi/suppl/10.1021/acs.organomet.1c00055/suppl_file/om1c00055_si_001.pdf
Repository: https://pubs.acs.org
Type:  pdf
Title:  ic1c00327_si_001.pdf
Description: Supplementary information for  ('Instantaneous and Phosphine-Catalyzed Arene Binding and Reduction by U(III) Complexes',)
URL: https://pubs.acs.org/doi/suppl/10.1021/acs.inorgchem.1c00327/suppl_file/ic1c00327_si_001.pdf
Repository: https://pubs.acs.org
Type:  pdf
Title:  om9b

## Insert into datasets table

In [40]:
# get names and links for references in data mentions
data_reference, _ = csvh.get_csv_data('./data_load_201111.csv', 'num')

db_conn = dbh.DataBaseAdapter(ukchapp_db)

db_table = "datasets"
table_columns = ["dataset_complete", "dataset_description","dataset_doi","dataset_enddate", "dataset_location",
                  "dataset_name","dataset_startdate","created_at","updated_at", "ds_type", "repository"]
for dr in tqdm_notebook(data_reference):
    if data_reference[dr]['do_location']!= "":
        if data_reference[dr]['do_inferred_type'] != "":
            do_type = data_reference[dr]['do_inferred_type']
        else:
            do_type = data_reference[dr]['do_type']
        table_values = [None, data_reference[dr]['do_description'], data_reference[dr]['do_doi'], None, data_reference[dr]['do_location'],data_reference[dr]['do_title'], data_reference[dr]['target_published'],
                        "2021-11-23 14:17:00", "2021-11-23 14:17:00" , do_type, data_reference[dr]['do_repository']]
        db_conn.put_values_table(db_table, table_columns, table_values)
        #get the id of inserted record
        new_do_id = db_conn.get_value( db_table, "id", "dataset_location", data_reference[dr]['do_location'])[0]
        print(new_do_id)
        linktable = "article_datasets"
        linktable_columns = ["doi", "article_id", "dataset_id", "created_at", "updated_at"]
        linktable_values = [data_reference[dr]['pub_doi'], data_reference[dr]['pub_id'], new_do_id, "2021-11-23 14:17:00", "2021-11-23 14:17:00"]
        db_conn.put_values_table(linktable, linktable_columns, linktable_values)

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for dr in tqdm_notebook(data_reference):


  0%|          | 0/290 [00:00<?, ?it/s]

529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
603
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778


## Fix for adding start date 
Add date of publication as start date

In [41]:
from datetime import date

# create connection to the DB
db_conn = dbh.DataBaseAdapter(ukchapp_db)
# get a list of the datasets in the DB
db_datasets = db_conn.get_full_table('datasets')

for db_ds in db_datasets:
    if db_ds[7] == None or db_ds[7] == "":
        #print (db_ds)
        # get article id
        art_id = db_conn.get_value("article_datasets", "article_id", "dataset_id", db_ds[0])
        
        art_pub_year = db_conn.get_value("articles", "pub_year", "id", art_id[0])[0]
        art_poy = db_conn.get_value("articles", "pub_ol_year", "id", art_id[0])[0]
        art_pom = db_conn.get_value("articles", "pub_ol_month", "id", art_id[0])[0]
        art_pod = db_conn.get_value("articles", "pub_ol_day", "id", art_id[0])[0]
        art_ppy = db_conn.get_value("articles", "pub_print_year", "id", art_id[0])[0]
        art_ppm = db_conn.get_value("articles", "pub_print_month", "id", art_id[0])[0]
        art_ppd = db_conn.get_value("articles", "pub_print_day", "id", art_id[0])[0]
        print (art_id[0],art_pub_year, art_poy, art_pom, art_pod, art_ppy, art_ppm, art_ppd)
        if art_poy != '' and art_pom != '' and art_pod != '' and art_poy != None and art_pom != None and art_pod != None:
            print ("use online date: ", art_poy, art_pom, art_pod, art_ppy, art_ppm, art_ppd)
            print(date(int(art_poy), int(art_pom), int(art_pod)))
            db_conn.set_value_table('datasets', db_ds[0], "dataset_startdate", date(art_poy, art_pom, art_pod).isoformat())
        #db_conn.set_value_table('datasets', db_ds[0], "dataset_startdate",art_pub_year[0])
        elif art_poy != '' and art_pom != '' and art_poy != None and art_pom != None:
            print ("use online date: ", art_poy, art_pom, art_pod, art_ppy, art_ppm, art_ppd)
            print(date(int(art_poy), int(art_pom), 1))
            db_conn.set_value_table('datasets', db_ds[0], "dataset_startdate", date(art_poy, art_pom, 1).isoformat())
        elif art_ppy != '' and art_ppm != '' and art_ppd != '' and art_ppy != None and art_ppm != None and art_ppd != None:
            print ("use print date: ",art_ppy, art_ppm, art_ppd)
            print(date(art_ppy, art_ppm, art_ppd))
            db_conn.set_value_table('datasets', db_ds[0], "dataset_startdate", date(art_ppy, art_ppm, art_ppd).isoformat())
        elif art_ppy != '' and art_ppm != '' and art_ppy != None and art_ppm != None:
            print ("use print date: ",art_ppy, art_ppm, 1)
            print(date(art_ppy, art_ppm, 1))
            db_conn.set_value_table('datasets', db_ds[0], "dataset_startdate", date(art_ppy, art_ppm, 1).isoformat())
        elif art_poy != '' and art_poy != None:
            print ("use online date: ", art_poy, 1, 1)
            db_conn.set_value_table('datasets', db_ds[0], "dataset_startdate", date(art_poy, 1, 1).isoformat())

259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
259 2019 2019 9 23 2019 8 5
use online date:  2019 9 23 2019 8 5
2019-09-23
260 2019 2019 7 8 2019 7 31
use online date:  2019 7 8 2019 7 31
2019-07-08
593 2020 2020 1 29 2020 2 21
use online date:  2020 1 29 2020 2 21
2020-01-29
664 2021 2021 None None None None None
use online date:  2021 1 1
665 2020 2020 11 24 None None None
use online date:  2020 11 24 None None None
2020-11-24
674 2021 2021 6 4 2021 7 14
use online date:  2021 6 4 2021 7 14
2021-06-04
680 20