# Verification of references to UK Catalysis Hub 
A list of articles is obtained from Publish or Perish (PoP). The list contains titles and some identifiers that need to be verified.

The criteria for adding a publication to the database are: 
a) has an explicit acknowledgement of the UK Catalysis Hub
b) mentions one of the UK Catalysis Hub grants
c) has two or more authors affiliated with the UK Catalysis Hub
d) acknowledges support from a scientist affiliated with the UK Catalysis Hub.
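
The four criteria can be written as a small check. This is a sketch only: the `Publication` record, its fields and the `hub_scientists` set are hypothetical (the notebook does not build them), the grant numbers are the ones listed in the acknowledgement referents further down, and it assumes that meeting any one criterion is enough to add a publication.

```python
from dataclasses import dataclass, field

# Grant numbers taken from the referents list used in the acknowledgement search.
HUB_GRANTS = {
    "EP/R026645/1", "EP/K014668/1", "EP/K014714/1", "EP/R026815/1",
    "EP/R026939/1", "EP/M013219/1", "EP/R027129/1", "EP/K014854/1",
    "EP/K014706/2",
}


@dataclass
class Publication:
    # Hypothetical record: the notebook does not produce these fields as-is.
    acknowledgement: str = ""
    author_affiliations: list = field(default_factory=list)  # one string per author
    acknowledged_people: list = field(default_factory=list)


def meets_criteria(pub, hub_scientists):
    """Return the list of criteria (a-d) that a publication satisfies."""
    ack = pub.acknowledgement.lower()
    met = []
    # a) explicit acknowledgement of the hub
    if "uk catalysis hub" in ack:
        met.append("a")
    # b) one of the hub grants is mentioned
    if any(grant.lower() in ack for grant in HUB_GRANTS):
        met.append("b")
    # c) two or more authors with a hub affiliation
    hub_authors = sum("uk catalysis hub" in aff.lower() for aff in pub.author_affiliations)
    if hub_authors >= 2:
        met.append("c")
    # d) support acknowledged from a hub-affiliated scientist
    if any(person in hub_scientists for person in pub.acknowledged_people):
        met.append("d")
    return met
```

Under the "any one criterion" assumption, a publication would be added when `meets_criteria(...)` returns a non-empty list.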

In [1]:
# Libraries
# library containing functions that read and write CSV files
import lib.handle_csv as csvh
# library for connecting to the DB
import lib.handle_db as dbh
# library for handling text matching
import lib.text_comp as txtc
# library for getting data from CrossRef
import lib.crossref_api as cr_api
# library for handling URL searches
import lib.handle_urls as urlh

from pathlib import Path


# input files
new_results_file = 'pop_searches/PoPCites5.csv'
previous_results = 'pop_searches/ukch_pop_prev_res.csv'

# output (working) file: input file name with a "_wf" suffix
nr_wf = new_results_file[:-4] + "_wf.csv"
working_file = wf_fields = None
current_pass = 0
# if a working file already exists, resume from the highest 'ignore' code recorded in it
if Path(nr_wf).is_file():
    working_file, wf_fields = csvh.get_csv_data(nr_wf, 'Num')
    for art_num in working_file:
        if current_pass < int(working_file[art_num]['ignore']):
            current_pass = int(working_file[art_num]['ignore'])
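
The working file doubles as a checkpoint: each pass writes a code into the `ignore` column, and on start-up the notebook takes the largest code present as the pass to resume from. A sketch of that bookkeeping with named codes; the `Status` names and `resume_pass` are hypothetical labels for the values the cells write, not part of `lib.handle_csv`.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical names for the 'ignore' codes written by the notebook cells.
    CANDIDATE = 0        # still under consideration
    PREVIOUS_SEARCH = 1  # title matched a previous search (pass 1)
    FEW_KEYWORDS = 2     # title shares too few words with DB titles (pass 2)
    REJECTED_BY_EYE = 3  # rejected on visual inspection
    NO_DOI = 4           # no DOI found in CrossRef
    ALREADY_IN_DB = 5    # DOI already in the database


def resume_pass(rows):
    """Return the highest 'ignore' code in the working file (0 if empty)."""
    return max((int(row["ignore"]) for row in rows.values()), default=0)
```

Because the pass is inferred from the largest code present, a pass that flags no articles leaves no trace: if, for example, no title matched a previous search, the file holds only zeros and a later run starts again at pass 0. Storing the pass number separately would avoid this.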

## Check whether titles have already been processed
Read the data and check whether the results in the file were already included in previous searches.
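
The two-step match (exact title first, then similarity above 0.80) can be sketched as below. `txtc.similar` is not shown in this notebook; the sketch substitutes `difflib.SequenceMatcher.ratio()` as a stand-in, which may not score titles identically.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # Stand-in for txtc.similar: similarity ratio between 0 and 1.
    return SequenceMatcher(None, a, b).ratio()


def flag_processed(new_titles, prev_titles, threshold=0.80):
    """Return 1 for titles already seen in a previous search, 0 otherwise."""
    prev_set = set(prev_titles)
    flags = []
    for title in new_titles:
        if title in prev_set:
            flags.append(1)  # pass 1a: exact match
        elif any(similar(title, prev) > threshold for prev in prev_titles):
            flags.append(1)  # pass 1b: approximate match
        else:
            flags.append(0)
    return flags
```

The approximate step catches formatting variants of the same title, such as subscripts rendered as separate characters ("N 2 O" for "N2O").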


In [2]:
if current_pass == 0:
    csv_articles, fn_articles = csvh.get_csv_data(new_results_file, 'Num')
    prev_articles, fn_prev = csvh.get_csv_data(previous_results, 'Num')
    prev_titles = [prev['Title'] for prev in prev_articles.values()]
    # pass 1a: exact title match against previous searches (1 = already processed)
    for art_num in csv_articles:
        new_title = csv_articles[art_num]['Title']
        csv_articles[art_num]['ignore'] = 1 if new_title in prev_titles else 0
    # pass 1b: approximate title match for titles without an exact match
    for art_num in csv_articles:
        if csv_articles[art_num]['ignore'] == 0:
            new_title = csv_articles[art_num]['Title']
            for prev_title in prev_titles:
                if txtc.similar(new_title, prev_title) > 0.80:
                    csv_articles[art_num]['ignore'] = 1
                    break
    csvh.write_csv_data(csv_articles, nr_wf)
    # reload the working file (values are read back as strings) and recompute the pass reached
    if Path(nr_wf).is_file():
        working_file, wf_fields = csvh.get_csv_data(nr_wf, 'Num')
        for art_num in working_file:
            if current_pass < int(working_file[art_num]['ignore']):
                current_pass = int(working_file[art_num]['ignore'])

## Check Title Wording
Using the words in the titles of previous Catalysis Hub papers, check whether each new title is likely to belong to a Catalysis Hub paper.
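
The keyword filter counts how many distinct words of a new title also occur in titles already in the database, ignoring a short stop-word list, and drops titles with four or fewer shared words. A standalone sketch of the same scoring; the function names are illustrative, and the DB titles would come from `get_value_list('articles', 'title')`.

```python
STOP_WORDS = {'the', 'of', 'to', 'and', 'a', 'in', 'is', 'it', 'their', 'so', 'as'}


def build_vocabulary(db_titles):
    """Union of lower-cased words (minus stop words) over the titles already in the DB."""
    vocab = set()
    for title in db_titles:
        vocab |= set(title.lower().split()) - STOP_WORDS
    return vocab


def keyword_score(title, vocab):
    """Number of distinct words of `title` that also appear in the DB vocabulary."""
    return len(set(title.lower().split()) & vocab)


def keep_title(title, vocab, min_words=5):
    # Mirrors the notebook rule: titles with 4 or fewer shared words are discarded.
    return keyword_score(title, vocab) >= min_words
```

The threshold is a tuning choice rather than a derived value; the cell also prints the average number of words per DB title, which gives a sense of scale for it.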

In [3]:
if current_pass in [0, 1]:
    # pass 2: check titles for likelihood of being catalysis articles,
    # using keywords from the titles already in the DB
    print("Get word list from DB")
    db_conn = dbh.DataBaseAdapter('ukch_articles.sqlite')
    db_titles = db_conn.get_value_list('articles', 'title')
    title_words = set()
    ignore_words = {'the', 'of', 'to', 'and', 'a', 'in', 'is', 'it', 'their', 'so', 'as'}
    words_sum = 0.0
    for title in db_titles:
        one_title = set(title.lower().split()) - ignore_words
        title_words |= one_title
        words_sum += len(one_title)
    average = words_sum / len(db_titles)
    print("Average words per title:", average)
    for art_num in working_file:
        if int(working_file[art_num]['ignore']) == 0:
            art_title = working_file[art_num]['Title']
            art_words = set(art_title.lower().split())
            occurrences = len(title_words & art_words)
            working_file[art_num]['keywords'] = occurrences
            print("occurrences:", occurrences, "in title:", art_title)
            # titles sharing four or fewer words with DB titles are discarded
            if occurrences <= 4:
                working_file[art_num]['ignore'] = 2
    csvh.write_csv_data(working_file, nr_wf)
    current_pass = 2

In [4]:
if current_pass == 2:
    # manual review of the remaining titles
    i = 0
    for art_num in working_file:
        if working_file[art_num]['ignore'] == '0':
            inspected = False
            while not inspected:
                print('Title:', working_file[art_num]['Title'])
                print('***************************************************************')
                print("Options:\n\ta) add\n\tb) ignore")
                print("selection:")
                usr_select = input().strip().lower()
                if usr_select == 'b':
                    working_file[art_num]['ignore'] = 3  # rejected on visual inspection
                    inspected = True
                elif usr_select == 'a':
                    inspected = True
            i += 1
    print("Inspected:", i, "Pass:", current_pass)
    csvh.write_csv_data(working_file, nr_wf)
    current_pass = 3

## Get DOIs for Articles
The remaining titles need to be further analysed. Recovering their DOIs can help obtain abstracts and acknowledgement statements. 
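
`cr_api.getDOIForTitle` is not shown here. A minimal sketch of a title-to-DOI lookup against the public CrossRef REST API (`/works` with `query.bibliographic`) follows; the function names and the 0.9 similarity threshold are assumptions. The similarity check is an addition: in the log at the end of this notebook some DOIs returned for a title do not match the article URL from PoP (for example 10.29363/nanoge.ngfm.2019.070 for a paper whose URL points to 10.1021/acscatal.9b04221), so accepting only close title matches may avoid attaching the wrong DOI.

```python
import json
from difflib import SequenceMatcher
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title, rows=1):
    """URL for a CrossRef bibliographic search on an article title."""
    return CROSSREF_WORKS + "?" + urlencode({"query.bibliographic": title, "rows": rows})


def doi_from_response(data, title, min_similarity=0.9):
    """Return the DOI of the top hit if its title closely matches `title`, else ''."""
    items = data.get("message", {}).get("items", [])
    if not items:
        return ""
    best = items[0]
    found_title = (best.get("title") or [""])[0]  # CrossRef returns titles as a list
    score = SequenceMatcher(None, title.lower(), found_title.lower()).ratio()
    return best.get("DOI", "") if score >= min_similarity else ""


def get_doi_for_title(title):
    # Performs a network request; only the parsing step above is tested offline.
    with urlopen(crossref_query_url(title), timeout=30) as resp:
        return doi_from_response(json.load(resp), title)
```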

In [5]:
if current_pass == 3:
    i = 0
    for art_num in working_file:
        if working_file[art_num]['ignore']=='0':
            new_title = working_file[art_num]['Title']
            new_doi = cr_api.getDOIForTitle(new_title)
            if new_doi == "":
                print("Missing DOI:", new_title)
                working_file[art_num]['ignore'] = '4'
                i +=1
            else:
                print("DOI found:", new_doi, "for:", new_title)
                working_file[art_num]['DOIcr'] = new_doi
                working_file[art_num]['ignore'] = '0'
    print("without DOI:", i)
    csvh.write_csv_data(working_file, nr_wf)
    current_pass = 4

## Verify DOIs in DB
Check that the articles are not already in the database.
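
`db_conn.get_title` comes from `lib.handle_db` and its query is not shown. A sketch of an equivalent lookup with the standard `sqlite3` module, assuming a hypothetical `articles` table with `doi` and `title` columns. DOIs are case-insensitive, and the log shows the same DOI in different cases (`10.1177/0734242x19899718` from CrossRef, `0734242X19899718` in the SAGE URL), so the comparison is made on lower-cased values.

```python
import sqlite3


def normalise_doi(doi):
    """Lower-case a DOI and drop a leading resolver prefix if present."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def title_for_doi(conn, doi):
    # Assumes a hypothetical 'articles' table with 'doi' and 'title' columns.
    row = conn.execute(
        "SELECT title FROM articles WHERE lower(doi) = ?", (normalise_doi(doi),)
    ).fetchone()
    return None if row is None else row[0]
```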

In [6]:
if current_pass == 4:
    i = 0  # number of articles already in the DB
    db_conn = dbh.DataBaseAdapter('ukch_articles.sqlite')
    for art_num in working_file:
        if working_file[art_num]['ignore'] == '0':
            new_title = working_file[art_num]['Title']
            new_doi = working_file[art_num]['DOIcr']
            db_title = db_conn.get_title(new_doi)
            if db_title is None:
                print("Not in DB:", new_doi, new_title)
            else:
                print("Already in DB:", new_doi, "for:", new_title, db_title)
                working_file[art_num]['ignore'] = '5'
                i += 1
    print("already in DB:", i)
    csvh.write_csv_data(working_file, nr_wf)
    current_pass = 5


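## Get Acknowledgement statements
For each remaining article, retrieve the page through its CrossRef DOI and through the URL from PoP, and search the text for references to the UK Catalysis Hub and its grants.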

In [7]:
if current_pass >= 5:
    # terms and grant numbers searched for in each article page
    referents = ["uk catalysis hub", "uk catalysis", "catalysis hub",
                 'EP/R026645/1', 'resources', 'EP/K014668/1', 'EPSRC', 'EP/K014714/1',
                 'Hub', 'provided', 'grant', 'biocatalysis', 'EP/R026815/1', 'EP/R026939/1',
                 'support', 'membership', 'EP/M013219/1', 'UK', 'kindly', 'Catalysis',
                 'funded', 'EP/R027129/1', 'Consortium', 'thanked', 'EP/K014854/1', 'EP/K014706/2']
    for art_num in working_file:
        if working_file[art_num]['ignore'] == '0':
            article_title = working_file[art_num]['Title']
            article_doi = working_file[art_num]['DOIcr']
            article_url = working_file[art_num]['ArticleURL']
            print("Analysing:", article_title, article_doi, article_url)
            # retrieve the article page through the CrossRef DOI and through the PoP URL,
            # and store any text matching the referents for later review
            found = urlh.findFromDOI(article_title, article_doi, referents)
            working_file[art_num]['checked_doi'] = 1
            working_file[art_num]['ack_doi'] = found
            found = urlh.findFromURI(article_title, article_url, referents)
            working_file[art_num]['checked_url'] = 1
            working_file[art_num]['ack_url'] = found
            print("Ack:", found)
    csvh.write_csv_data(working_file, nr_wf)

Analysing: Catalytic decomposition of N 2 O over Cu–Al–O x mixed metal oxides 10.1039/c8ra10509j https://pubs.rsc.org/en/content/articlehtml/2019/ra/c8ra10509j
Ack: <span class="italic">UK Catalysis Hub, Research Complex at Harwell, Rutherford Appleton Laboratories, Didcot, Oxon OX11 0FA, UK</span>
Analysing: Understanding the mechanochemical synthesis of the perovskite LaMnO 3 and its catalytic behaviour 10.1039/c9dt03590g https://pubs.rsc.org/en/content/articlehtml/2020/dt/c9dt03590g
Ack: <span>The authors acknowledge Diamond Light Source and The UK Catalysis Hub for provision of beam time (Experiment sp15151-1 and sp15151-4). The staff on B18 at Diamond Light Source, particularly Dr Diego Gianolio are thanked for their assistance in collecting data. The RCaH are acknowledged for use of facilities and staff support. Johnson Matthey is acknowledged for their provision of precursor materials and milling equipment. The Johnson Matthey advanced analytical department are thanked for their

Ack: 
Analysing: Molecular behaviour of phenol in zeolite Beta catalysts as a function of acid site presence: a quasielastic neutron scattering and molecular dynamics … 10.1039/c9cy01548e https://pubs.rsc.org/en/content/articlehtml/2019/cy/c9cy01548e
Ack: <span>This work was performed using the computational facilities of the Advanced Research Computing @ Cardiff (ARCCA) Division, Cardiff University, and HPC Wales. <span class="italic">Via</span> our membership of the UK's HEC Materials Chemistry Consortium, which is funded by EPSRC [grant number: EP/L000202], this work used the ARCHER UK National Supercomputing Service (http://www.archer.ac.uk). The authors acknowledge EPSRC [grant number: EP/K009567/2] and NERC [grant number: NE/R009376/1] for funding. C. H. T. thanks Dr. S. E. Ruiz-Hernandez for valuable consultations. A. J. O. M. acknowledges the Ramsay Memorial Trust for the provision of a Ramsay Memorial Fellowship, and Roger and Sue Whorrod for the funding of the Whorrod Fellows

Ack: 
Analysing: Host–Guest Chemistry Meets Electrocatalysis: Cucurbit[6]uril on a Au Surface as a Hybrid System in CO2 Reduction 10.29363/nanoge.ngfm.2019.070 https://pubs.acs.org/doi/abs/10.1021/acscatal.9b04221
Ack: <span class="article_header-suppInfo-text">Supporting Info (1)</span>
Analysing: Sequence-defined multifunctional polyethers via liquid-phase synthesis with molecular sieving 10.1038/s41557-019-0212-2 https://www.nature.com/articles/s41557-018-0169-6
Ack: 
Analysing: Physisorption of water on graphene: subchemical accuracy from many-body electronic structure methods 10.1021/acs.jpclett.8b03679 https://pubs.acs.org/doi/abs/10.1021/acs.jpclett.8b03679
Ack: <span class="article_header-suppInfo-text">Supporting Info (3)</span>
Analysing: A Caged E3 Ligase Ligand for PROTAC-mediated Protein Degradation with Light 10.26434/chemrxiv.11350079.v2 https://chemrxiv.org/ndownloader/files/20140748
Ack: 
Analysing: Highly Sensitive and Selective Molecular Probes for Chromo‐Fluorogenic

Ack: <span class="article_header-suppInfo-text">Supporting Info (1)</span>
Analysing: Highly efficient catalytic pyrolysis of polyethylene waste to derive fuel products by novel polyoxometalate/kaolin composites 10.1177/0734242x19899718 https://journals.sagepub.com/doi/abs/10.1177/0734242X19899718
Ack: <span class="NLM_article-title">Phosphotungstic acid supported on acid-leached porous kaolin for friedel-crafts acylation of anisole</span>
<span class="ref-google"><a class="google-scholar" href="http://scholar.google.com/scholar_lookup?hl=en&amp;publication_year=2015&amp;pages=27-34&amp;author=NM+Basir&amp;author=HO+Lintang&amp;author=S+Endud&amp;title=Phosphotungstic+acid+supported+on+acid-leached+porous+kaolin+for+friedel-crafts+acylation+of+anisole">Google Scholar</a></span>
<span class="NLM_article-title">Synthesis, characterization and catalytic evaluation of SBA-15 supported 12-tungstophosphoric acid mesoporous materials in the oxidation of benzaldehyde to benzoic acid</span>
<sp

Ack: <span>Metal support interaction</span>
<span class="anchor-text">Contact and support</span>
Analysing: Palladium complexes bearing an N‐heterocyclic carbene–sulfonamide ligand for cooligomerization of ethylene and polar monomers 10.1002/pola.29270 https://onlinelibrary.wiley.com/doi/abs/10.1002/pola.29270
Ack: <span>Supporting Information</span>
Analysing: Polyamine-decorated mesocellular silica foam nanocomposites: Effect of the reaction parameters on the grafted polymer content and silica mesostructure 10.1007/s10971-019-05070-8 https://link.springer.com/article/10.1007/s10971-019-05070-8
Ack: 
Analysing: Site-Selective Double and Tetracyclization Routes to Fused Polyheterocyclic Structures by Pd-Catalyzed Carbonylation Reactions 10.1021/acs.orglett.0c00171 https://pubs.acs.org/doi/abs/10.1021/acs.orglett.0c00171
Ack: <span class="article_header-suppInfo-text">Supporting Info (1)</span>
Analysing: Structure, Hirshfeld surface and theoretical study of a new inorganic organic arse

Ack: <span>H. A. A. is grateful to the University of Kufa and the Ministry of Higher Education and Scientific Research (Iraq) for financial support through the award of a PhD scholarship. P. F. and J. V. acknowledge the Research Foundation-Flanders (FWO) for a post-doctoral and doctoral grant, respectively. Calculations were performed on: the University of Birmingham's BlueBEAR high-performance computer (http://www.bear.bham.ac.uk/bluebear); ATHENA at HPC Midlands Plus, which is funded by the EPSRC through grant (EP/P020232); THOMAS, the UK Materials and Molecular Modelling Hub for computational resources, which is partially funded by EPSRC (EP/P020194/1); and ARCHER, the UK National Supercomputing Service (http://www.archer.ac.uk) <span class="italic">via</span> membership of the UK's HPC Materials Chemistry Consortium, which is funded by EPSRC (EP/L000202), and “TOUCAN: Towards an Understanding of Catalysis on Nanoalloys” membership, which is funded by EPSRC under Critical Mass Grant

Ack: <span class="anchor-text">Contact and support</span>
Analysing: Synthesis, crystal structure and bovine serum albumin–binding studies of a new Cd (II) complex incorporating 2, 2′-(propane-1, 3-diyl) bis (1H-imidazole-4, 5 … 10.1177/1747519819895240 https://journals.sagepub.com/doi/abs/10.1177/1747519819895240
Ack: 
Analysing: Hydrophilic microporous membranes for selective ion separation and flow-battery energy storage 10.1038/s41563-019-0536-8 https://idp.nature.com/authorize/casa?redirect_uri=https://www.nature.com/articles/s41563-019-0536-8&casa_token=tx2b8oiDfK0AAAAA:zORBLeozH26mbMl4HnPzUE0YDF2K3dtBOi94x8aNb6Z-R-pEettkC32Gk3T1HylngGNlClA_zZl1kk7V
Ack: 
Analysing: Structural mechanism of DNA-end synapsis in the non-homologous end joining pathway for repairing double-strand breaks: bridge over troubled ends 10.1042/bst20180518 https://portlandpress.com/biochemsoctrans/article-abstract/47/6/1609/221466
Ack: 
Analysing: Understanding supported noble metal catalysts using first-pri

Ack: 
Analysing: Design and evolution of an enzyme with a non-canonical organocatalytic mechanism 10.1038/s41586-019-1262-8 https://idp.nature.com/authorize/casa?redirect_uri=https://www.nature.com/articles/s41586-019-1262-8&casa_token=p-gSTtICHIIAAAAA:olrfXAWarnTMmIgstr_4K31FkT0-Dl_tNVlQne6WHAka8NNp5Gj-zn_Qf34kEXAS-XYvdrSN1GNFo1Gn
Ack: 
Analysing: Insight into the process of product expulsion in cellobiohydrolase Cel6A from Trichoderma reesei by computational modeling 10.1080/07391102.2018.1450164 https://www.tandfonline.com/doi/abs/10.1080/07391102.2018.1450164
Ack: 
Analysing: Materials Informatics for Heat Transfer: Recent Progresses and Perspectives 10.1080/15567265.2019.1576816 https://www.tandfonline.com/doi/abs/10.1080/15567265.2019.1576816
Ack: 
Analysing: One-Step Production of Amine-Functionalized Hollow Mesoporous Silica Microspheres via Phase Separation Induced Cavity in Miniemulsion System for Opaque and … 10.1021/acs.iecr.9b04642 https://pubs.acs.org/doi/abs/10.1021/acs.

Ack: <span class="anchor-text">Contact and support</span>
Analysing: Electrocatalytic and enhanced photocatalytic applications of sodium niobate nanoparticles developed by citrate precursor route 10.1038/s41598-019-40745-w https://www.nature.com/articles/s41598-019-40745-w
Ack: 
Analysing: The role of conserved residues in the catalytic activity of NDM-1: an approach involving site directed mutagenesis and molecular dynamics 10.1039/c9cp02734c https://pubs.rsc.org/lv/content/articlehtml/2019/cp/c9cp02734c
Ack: 
Analysing: TiO 2 nanoparticles potentiated the cytotoxicity, oxidative stress and apoptosis response of cadmium in two different human cells 10.1007/s11356-019-07130-6 https://link.springer.com/article/10.1007/s11356-019-07130-6
Ack: 
Analysing: Purification of fuel ethanol engine exhaust with platinum-loaded Ce0.5Zr0.5O2 catalyst 10.1177/1847980419886673 https://journals.sagepub.com/doi/abs/10.1177/1847980419886673
Ack: <span class="NLM_fn" id="fn3-1847980419886673"><p><span cl

Ack: 
Analysing: Novel adsorptive materials by adenosine 5ʹ‐triphosphate imprinted‐polymer over the surface of polystyrene nanospheres for selective separation of adenosine 5ʹ … 10.1002/jssc.201900583 https://onlinelibrary.wiley.com/doi/abs/10.1002/jssc.201900583
Ack: <span>Supporting Information</span>
Analysing: Recent advancements in sorption technology for solar thermal energy storage applications 10.1016/j.solener.2018.06.102 https://www.sciencedirect.com/science/article/pii/S0038092X18306546
Ack: <span class="anchor-text">Contact and support</span>
Analysing: Nylon 612/TiO2 composites by anionic copolymerization-molding process: Comparative evaluation of thermal and mechanical performance 10.1177/0021998319862345 https://journals.sagepub.com/doi/abs/10.1177/0021998319862345
Ack: 
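
In the log above, many non-empty hits are page furniture such as "Supporting Info (1)", "Supporting Information" or "Contact and support", which appear to be picked up by generic referents like 'support'; the useful hits contain the hub name. A minimal sketch of a stricter pass over the stored fragments, assuming they are HTML snippets like those printed, strips the tags and keeps only strong evidence: the hub name and the grant numbers from the `referents` list. The names are hypothetical. A name match still needs reading, since it can come from an affiliation line (as in the first result) rather than an acknowledgement.

```python
import re
from html.parser import HTMLParser

# Strong evidence only: the hub name and the grant numbers from the referents list.
HUB_NAME = "catalysis hub"
HUB_GRANTS = ("EP/R026645/1", "EP/K014668/1", "EP/K014714/1", "EP/R026815/1",
              "EP/R026939/1", "EP/M013219/1", "EP/R027129/1", "EP/K014854/1",
              "EP/K014706/2")


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # collapse runs of whitespace left behind by nested tags and line breaks
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def hub_evidence(fragment):
    """Return which kinds of strong evidence ('name', 'grant') a fragment contains."""
    text = html_to_text(fragment)
    found = []
    if HUB_NAME in text.lower():
        found.append("name")
    if any(grant in text for grant in HUB_GRANTS):
        found.append("grant")
    return found
```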
