# Molecular Biology Database Metrics Jupyter Notebook
At [UniProt](https://uniprot.org) we have developed some useful metrics over the years to measure and report our progress and impact to funders and policy makers. As part of an administrative supplement awarded by NIH [ODSS](https://datascience.nih.gov/about/odss) we have created [a Google Colab page](https://colab.research.google.com/drive/1aEmSQR9DGQIZmHAIuQV9mLv7Mw9Ppkin) and this Jupyter notebook to enable other data resource providers to easily run these metrics.

Both allow you to investigate how often a resource is mentioned in the full-text literature and how often it is mentioned in papers that acknowledge specific funding sources. You need not limit yourself to database resources: you can use any keywords to investigate the impact of experimental techniques or anything else you can think of.

If you have specific feedback on this notebook then please contact Alex Bateman (<agb@ebi.ac.uk>).
We would like to thank [Europe PubMed Central](https://europepmc.org/) and [ChEMBL](https://www.ebi.ac.uk/chembl/) for their fantastic work in making the APIs that power much of this page.


## Contents

This notebook queries publication ([EuropePMC](https://europepmc.org/)) and patent ([SureChEMBL](https://www.surechembl.org/)) databases to compute and plot the impact of a resource.
In particular, it generates plots that:
- show how the number of mentions changes over time
- group publications by the grant agency they acknowledge
- find in which section of a publication a resource is mentioned.
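
Each of these plots comes from counting hits for the same resource search string with a different constraint appended. The following is a minimal sketch of that composition, mirroring the constraint strings used in the Code cell; the helper names `add_constraints` and `section_query` are hypothetical and not part of the notebook.

```python
def add_constraints(resource, year_from=None, year_to=None, agencies=None):
    """Append optional time and funder constraints to a resource search string."""
    query = resource
    if year_from is not None and year_to is not None:
        # restrict to papers first published within the given years
        query += ' and (FIRST_PDATE:[{}-01-01 TO {}-12-31])'.format(year_from, year_to)
    if agencies:
        # match papers acknowledging any one of the listed agencies
        query += ' and (GRANT_AGENCY:"{}")'.format('" or GRANT_AGENCY:"'.join(agencies))
    return query


def section_query(section, keywords):
    """Search for any of the keywords within one paper section (e.g. METHODS)."""
    clauses = ['({}:"{}")'.format(section, keyword) for keyword in keywords]
    return '({})'.format(' OR '.join(clauses))


print(add_constraints('("UniProt")', 2020, 2022, ['NIH', 'Wellcome Trust']))
print(section_query('METHODS', ['UniProt', 'Swiss-Prot']))
```

The resulting strings are what gets sent as the `query` parameter of the EuropePMC search API.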

# Instructions

Start by executing BOTH the (⏵︎) **Code** and the **Parameters** cells (wherein you can specify what to query for).

After doing that, to (re)generate the plots, execute (⏵︎) the corresponding cells.
Alternatively, after choosing what to query in the **Parameters** section, you can choose **Run all** from the **Runtime** menu at the top of the page.

If this is your first time using an interactive notebook or Colab, you may want to check the [intro to Jupyter notebooks](https://jupyter.org/try-jupyter/retro/notebooks/?path=notebooks/Intro.ipynb) or the [Colab introductory page](https://colab.research.google.com/).

# Caveat

To find all mentions of the data resource it may be important to provide synonyms. Each search term must match exactly, thus “UniProt” and “UniProtKB” will match different sets of papers.
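
As an illustration of how synonyms are combined, the sketch below turns a comma-separated list of terms into a single search phrase in which the quoted terms are joined with `OR`. The helper names are hypothetical; trimming whitespace and dropping duplicates are additions for convenience, whereas the notebook itself splits the string on commas as-is.

```python
def parse_keywords(text):
    """Split a comma-separated string into unique, trimmed search terms."""
    terms = []
    for term in text.split(','):
        term = term.strip()
        if term and term not in terms:
            terms.append(term)
    return terms


def or_phrase(keywords):
    """Quote each term and join them with OR, so that any synonym matches."""
    return '("{}")'.format('" OR "'.join(keywords))


terms = parse_keywords('UniProt, UniProtKB,Swiss-Prot, UniProt')
search = or_phrase(terms)
print(search)
```

Because each quoted term is matched exactly, leaving out a synonym such as "UniProtKB" means papers using only that spelling are not counted.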

If the name of the resource matches a common English word such as STRING or PRINTS then you will get many false positive matches.

Also keep in mind that this tool can only search the full text of the open access portion of EuropePMC, thus the count of papers will be an underestimate of the total number of mentions in the whole research literature.

### Changelog:

    2023.03.12 first colab version with basic query capabilities
    2023.03.13 added plot generation via plotly and parameters selection via form
    2023.03.14 interface and plot improvements; added capability to download data as csv, introduction text and first tests of patent info retrieval
    2023.03.15 improved plot interface, also with possibility to draw on plots; plots can be saved with meaningful filenames; added retrieval of patent information from SureChEMBL
    2023.03.16 added retrieval of total patents in timeframe; added progress bars and error messages if Code has not been initialised
    2023.06.27 changed to ascii progress bar to avoid incompatibility with plotly display
    2023.08.25 converted from colab version to run independently as jupyter notebook
    2023.09.14 added downloadable list of API url links for totals of publication mentions
    2023.11.03 added possibility to specify custom comma separated lists of granters
    2023.11.13 expanded list of granters in dropdown, added link to a list of the top 1000 ones
    2024.01.22 updated to new SureCHEMBL API for retrieving patent data

### Credits:

Code by Alex Bateman, Alex Ignatchenko and [Giuseppe Insana](https://insana.net)

© 2023- [UniProt consortium](https://www.uniprot.org/help/about)

### Tips

- Add your own query search terms: simply modify the `options` array in the `query_widget = widgets.Dropdown` definition.
- Add more funding agencies: simply modify the `options` array in the `granters_widget = widgets.Dropdown` definition.
- Export all generated plots as SVG (Scalable Vector Graphics): look for `plotly_config` at the beginning of the Code cell and, three lines below it, change `'format': 'png'` to `'format': 'svg'`. Once that is done and the cell has been re-run, the *Download plots as* button in the plot interface (the camera icon) will let you export the plots in SVG format.
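
The export format can also be switched programmatically rather than by editing the Code cell. This is only a sketch: the helper `with_image_format` is hypothetical, and the list of allowed formats is taken from the comment next to `'format'` in `plotly_config`.

```python
import copy

# formats listed in the comment next to 'format' in the Code cell
ALLOWED_FORMATS = ('png', 'svg', 'jpeg', 'webp')


def with_image_format(config, fmt):
    """Return a copy of a plotly config dict with a different export format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError('format must be one of {}'.format(', '.join(ALLOWED_FORMATS)))
    new_config = copy.deepcopy(config)  # leave the original dict untouched
    new_config.setdefault('toImageButtonOptions', {})['format'] = fmt
    return new_config


# trimmed-down stand-in for the notebook's plotly_config
plotly_config = {
    'displayModeBar': True,
    'toImageButtonOptions': {'format': 'png', 'filename': 'impact_plot'},
}
svg_config = with_image_format(plotly_config, 'svg')
print(svg_config['toImageButtonOptions'])
```

In the notebook, the plotting function reads the global `plotly_config`, so the result would need to be assigned back to that name before regenerating the plots.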

### Citation

If you find this software useful, please consider citing the [journal article](https://doi.org/10.1093/bioadv/vbad180) ([pubmed 38130879](https://pubmed.ncbi.nlm.nih.gov/38130879)):

Bibtex:
```
@article{10.1093/bioadv/vbad180,
    author = {Insana, Giuseppe and Ignatchenko, Alex and Martin, Maria and Bateman, Alex and UniProt Consortium },
    title = "{MBDBMetrics: an online metrics tool to measure the impact of biological data resources}",
    journal = {Bioinformatics Advances},
    volume = {3},
    number = {1},
    pages = {vbad180},
    year = {2023},
    month = {12},
    issn = {2635-0041},
    doi = {10.1093/bioadv/vbad180},
    url = {https://doi.org/10.1093/bioadv/vbad180},
    eprint = {https://academic.oup.com/bioinformaticsadvances/article-pdf/3/1/vbad180/54717401/vbad180.pdf},
}
```


## Code

In [None]:
#imports
import os
import sys
import re
import pandas as pd
import plotly.graph_objects as go
import requests
from requests.adapters import HTTPAdapter, Retry
import datetime
from time import sleep
from tqdm import tqdm
import plotly.io as pio
pio.renderers.default = 'notebook'
import ipywidgets as widgets #for jupyter input dialogs
from IPython.display import clear_output, display #display is used by show_form()

RED = '\033[91m'
BOLD = '\033[1m'
END = '\033[0m'

#plotly config
plotly_config = {
    'displayModeBar': True,
    'toImageButtonOptions': {
        'format': 'png', # one of png, svg, jpeg, webp
        'filename': 'impact_plot',
        #'width': 1280,
        #'height': 1024,
    },
    'modeBarButtonsToRemove': ['lasso2d'],
    'modeBarButtonsToAdd': ['drawline', 'drawopenpath']
}

#paper sections:
paper_sections = ['ABBR','ACK_FUND','APPENDIX','AUTH_CON','CASE','COMP_INT','FIG','OTHER','REF','TABLE','ABSTRACT','INTRO','KEYWORD','METHODS','RESULTS','DISCUSS','CONCL','SUPPL']
#grant agencies' lists:
granters_nih = ['NIH','NHGRI NIH HHS','NIGMS NIH HHS','NCI NIH HHS',
                'NHLBI NIH HHS','NIAID NIH HHS','NIDDK NIH HHS','NINDS NIH HHS',
                'NIMH NIH HHS','NCRR NIH HHS','NICHD NIH HHS','NIA NIH HHS',
                'NIDA NIH HHS','NEI NIH HHS','NIAMS NIH HHS','NIEHS NIH HHS',
                'NIAAA NIH HHS','NIDCR NIH HHS','NIDCD NIH HHS','NIBIB NIH HHS']
granters_us = ['Department of Energy','National Science Foundation','Bill and Melinda Gates',
            'Department of Defense','Department of Agriculture','Food and Drug Administration',
            'Environmental Protection Agency','Howard Hughes Medical Institute']
granters_uk = ['Wellcome Trust','Medical Research Council','Biotechnology and Biological Sciences Research Council',
            'Engineering and Physical Sciences Research Council','Natural Environment Research Council',
            'Economic and Social Research Council','Arts and Humanities Research Council',
            'Science and Technology Facilities Council']
css_colors = {'aliceblue', 'antiquewhite', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanchedalmond', 'blue', 'blueviolet', 'brown', 'burlywood', 'cadetblue', 'chartreuse', 'chocolate', 'coral', 'cornflowerblue', 'cornsilk', 'crimson', 'cyan', 'darkblue', 'darkcyan', 'darkgoldenrod', 'darkgray', 'darkgrey', 'darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred', 'darksalmon', 'darkseagreen', 'darkslateblue', 'darkslategray', 'darkslategrey', 'darkturquoise', 'darkviolet', 'deeppink', 'deepskyblue', 'dimgray', 'dimgrey', 'dodgerblue', 'firebrick', 'floralwhite', 'forestgreen', 'fuchsia', 'gainsboro', 'ghostwhite', 'gold', 'goldenrod', 'gray', 'grey', 'green', 'greenyellow', 'honeydew', 'hotpink', 'indianred', 'indigo', 'ivory', 'khaki', 'lavender', 'lavenderblush', 'lawngreen', 'lemonchiffon', 'lightblue', 'lightcoral', 'lightcyan', 'lightgoldenrodyellow', 'lightgray', 'lightgrey', 'lightgreen', 'lightpink', 'lightsalmon', 'lightseagreen', 'lightskyblue', 'lightslategray', 'lightslategrey', 'lightsteelblue', 'lightyellow', 'lime', 'limegreen', 'linen', 'magenta', 'maroon', 'mediumaquamarine', 'mediumblue', 'mediumorchid', 'mediumpurple', 'mediumseagreen', 'mediumslateblue', 'mediumspringgreen', 'mediumturquoise', 'mediumvioletred', 'midnightblue', 'mintcream', 'mistyrose', 'moccasin', 'navajowhite', 'navy', 'oldlace', 'olive', 'olivedrab', 'orange', 'orangered', 'orchid', 'palegoldenrod', 'palegreen', 'paleturquoise', 'palevioletred', 'papayawhip', 'peachpuff', 'peru', 'pink', 'plum', 'powderblue', 'purple', 'red', 'rosybrown', 'royalblue', 'rebeccapurple', 'saddlebrown', 'salmon', 'sandybrown', 'seagreen', 'seashell', 'sienna', 'silver', 'skyblue', 'slateblue', 'slategray', 'slategrey', 'snow', 'springgreen', 'steelblue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'whitesmoke', 'yellow', 'yellowgreen'}
HEX_COLOR_REGEX = re.compile(r'^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$')


#helper functions
def eprint(*myargs, **kwargs):
    """
    print to stderr, useful for error messages
    """
    print(*myargs, file=sys.stderr, **kwargs)


def query2publications(query, description='', debug=False, apilinks=None):
    """
    return total number of hits of a query submitted to Europe PMC database
    """
    headers = {'user-agent': 'resourceimpact/0.5', 'Accept' : 'application/json'}
    params = {'resultType': "idlist", 'format': "json", 'pageSize': 1, 'query': query}
    api_url = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    s = requests.Session() #for retries
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])
    s.mount('https://', HTTPAdapter(max_retries=retries)) #api_url is https, so the retry adapter must be mounted on that scheme

    r = s.get(api_url, params=params, headers=headers)
    if r.status_code in (200, 206):
        parsed = r.json()
        hits = parsed['hitCount']
        if debug:
            eprint(r.url)
            eprint('{}: {}'.format(description, hits))
        if apilinks is not None and os.path.isfile(apilinks):
            with open(apilinks, 'a') as apifh:
                apifh.write('<a href="{}">{}</a><br />\n'.format(r.url, description))
        return hits
    else:
        eprint("ERROR: problems querying EuropePMC, http status code is '{}'({})".format(r.status_code, r.reason))
        return -1


def query2patents(query, description='', debug=False):
    """
    return total number of hits of a query submitted to patent website
    query format example: query=(uniprot) AND ((pnctry:(US OR EP OR WO OR JP))) AND (pd:20220608)
    """
    query ='({}) AND ((pnctry:(US OR EP OR WO OR JP)))'.format(query)
    headers = {'user-agent': 'resourceimpact/0.5', 'Accept' : 'application/json'}
    params = {'itemsPerPage': 0, 'page': 1, 'query': query}
    api_url = "https://www.surechembl.org/api/search/content"

    s = requests.Session() #for retries
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])
    s.mount('https://', HTTPAdapter(max_retries=retries)) #api_url is https, so the retry adapter must be mounted on that scheme

    r = s.post(api_url, params=params, headers=headers)
    if r.status_code in (200, 206):
        parsed = r.json()
        hits = parsed['data']['results']['total_hits']
        if debug:
            eprint(r.url)
            eprint('{}: {}'.format(description, hits))
        return hits
    else:
        eprint("ERROR: problems querying SureCHEMBL, http status code is '{}'({})".format(r.status_code, r.reason))
        return -1


def barplot(x, y, title='', xaxis='', resulttype='papers', filename='resource_impact', sort=True):
    """
    plot a bar plot
    return a dataframe with the data used
    """
    layout = go.Layout(title='Number of {} mentioning {}<br />{}'.format(resulttype, query, title),
                   #barmode="stack",
                   xaxis_title = xaxis,
                   #xaxis_showticklabels = False,
                   yaxis_title = 'Number of {}'.format(resulttype))
    trace = go.Bar(x=x, y=y, marker_color=bar_colors, text=y)
    plot = go.Figure([trace], layout)
    plot.update_xaxes(type='category')

    if sort:
        plot.update_xaxes(categoryorder='total ascending')

    if filename:
        #update filename for downloadable plot image
        plotly_config['toImageButtonOptions']['filename']=filename


    data = {}
    data[xaxis] = x
    data[resulttype] = y
    df = pd.DataFrame(data)
    plot.show(config=plotly_config, renderer='notebook')
    return df


def show_form():
    display(query_widget)
    display(years_slider)
    display(granters_widget)
    display(bars_color_scheme_widget)
    
    
def print_parameters():
    clear_output(wait=True)
    show_form()
    print('Your chosen parameters:\n\tquery string is: {}'.format(query))
    print('\ttime frame: from {} to {}'.format(year_from, year_to))
    print('\tgrant agencies: {}'.format(', '.join(grants)))
    
    
def query_handler(change):
    global query, query_short, query_keywords, resource_searchstring
    query = change.new
    query_keywords = query.split(',')
    if len(query_keywords) == 2:
        query_short = ', '.join(query_keywords[0:2])
    elif len(query_keywords) > 2:
        query_short = ', '.join(query_keywords[0:2]) + ', ...'
    else:
        query_short = query_keywords[0]
        
    resource_searchstring = '("{}")'.format('" OR "'.join(query_keywords))
    print_parameters()

    
def years_handler(change):
    global year_from, year_to, query_time_constraint
    year_from, year_to = [date.strftime('%Y') for date in change.new]
    #year_from, year_to = [date.strftime('%Y') for date in years_slider.value]
    if int(year_from) > int(year_to):
        eprint("\t\tATTENTION: 'year_to' cannot be smaller than 'year_from'! Defaulting year_to same as year_from: '{}'".format(year_from))
        year_to = year_from

    if int(year_to) > int(current_year):
        eprint("\t\tNOTICE: 'year_to' cannot be later than current year. Defaulting year_to to '{}'".format(current_year))
        year_to = str(current_year)

    query_time_constraint = ' and (FIRST_PDATE:[{}-01-01 TO {}-12-31])'.format(year_from, year_to)
    print_parameters()
    
    
def granters_handler(change):
    global granters, grants, all_grant_agencies_constraint
    granters = change.new
    if granters == 'US funders':
        grants = granters_nih + granters_us
    #elif granters == 'US not including NIH':
    #  grants = granters_us
    elif granters == 'UK funders':
        grants = granters_uk
    elif granters == 'NIH Institutes':
        grants = granters_nih
    elif granters == 'UK and US incl. NIH':
        grants = granters_nih + granters_us + granters_uk
    elif ',' in granters:
        grants = granters.replace(', ',',').split(',')
    else:
        grants = [granters]
        
    all_grant_agencies_constraint = ' and (GRANT_AGENCY:"{}")'.format('" or GRANT_AGENCY:"'.join(grants))
    print_parameters()
    
    
def color_scheme_handler(change):
    global bars_color_scheme, bar_colors
    bars_color_scheme = change.new
    #colors validator
    if bars_color_scheme == 'multicolor':
        #color list from plotly.express.colors.qualitative.Alphabet
        bar_colors = ['#AA0DFE','#3283FE','#85660D','#782AB6','#565656','#1C8356','#16FF32','#F7E1A0','#E2E2E2','#1CBE4F','#C4451C','#DEA0FD','#FE00FA','#325A9B','#FEAF16','#F8A19F','#90AD1C','#F6222E','#1CFFCE','#2ED9FF','#B10DA1','#C075A6','#FC1CBF','#B00068','#FBE426','#FA0087']
    elif bars_color_scheme == 'default(muted blue)':
        bar_colors = '#1f77b4'
    elif HEX_COLOR_REGEX.search(bars_color_scheme) or bars_color_scheme in css_colors:
        bar_colors = bars_color_scheme
    else:
        eprint("WARNING: specified color is not valid. Defaulting to muted blue")
        bar_colors = '#1f77b4'
    print_parameters()
    
    
#QUERY string
query = 'UniprotKB,Uniprot,Trembl,SwissProt,Swiss-Prot,Uniparc,Uniref,UniRef100,UniRef90,UniRef50'
query_keywords = ['UniprotKB', 'Uniprot', 'Trembl', 'SwissProt', 'Swiss-Prot', 'Uniparc', 'Uniref', 'UniRef100', 'UniRef90', 'UniRef50']
query_short = "UniprotKB, Uniprot, ..."
resource_searchstring = '("UniprotKB" OR "Uniprot" OR "Trembl" OR "SwissProt" OR "Swiss-Prot" OR "Uniparc" OR "Uniref" OR "UniRef100" OR "UniRef90" OR "UniRef50")'
query_widget = widgets.Dropdown(
    options=['UniprotKB,Uniprot,Trembl,SwissProt,Swiss-Prot,Uniparc,Uniref,UniRef100,UniRef90,UniRef50', 'Ensembl,EnsemblGenomes,EnsemblPlants,EnsemblBacteria,EnsemblFungi,EnsemblProtists,EnsemblVertebrates', 'RefSeq', 'OMIM,Online Mendelian Inheritance in Man', 'SGD,Saccharomyces Genome Database,Yeastgenome.org,yeast genome database', 'Zfin,zfin.org,Zebrafish Information Network', 'MGI,MGD,Mouse Genome Informatics,informatics.jax.org', 'Rat genome database,rgd.mcw.edu', 'FlyBase,flybase.org', 'Gene Ontology,geneontology', 'Wormbase,wormbase.org', 'TAIR,arabidopsis.org,Arabidopsis information resource'],
    value=query,
    description='Query:',
    layout={'width': '30em'},
    disabled=False,
)
query_widget.observe(query_handler, names='value')

#YEAR range
current_year = datetime.datetime.now().date().year
year_from, year_to = str(current_year - 5), str(current_year)
query_time_constraint = " and (FIRST_PDATE:[{}-01-01 TO {}-12-31])".format(current_year - 5, current_year)
start_date = datetime.date(1980, 1, 1)
end_date = datetime.date(current_year, 12, 31)

dates = pd.date_range(start_date, end_date, freq='Y')
options = [(date.strftime(' %Y '), date) for date in dates]
index = (0, len(options)-1)

years_slider = widgets.SelectionRangeSlider(
    options=options,
    index=index,
    description='Years range',
    orientation='horizontal',
    layout={'width': '30em'},
)
years_slider.value = (dates[-6],dates[-1]) #default: last 6 years
years_slider.observe(years_handler, names='value')

#GRANTERS dropdown
granters = "US funders"
grants = granters_nih + granters_us
all_grant_agencies_constraint = ' and (GRANT_AGENCY:"{}")'.format('" or GRANT_AGENCY:"'.join(grants))
#keep the default search string consistent with the full default list of query keywords
resource_searchstring = '("{}")'.format('" OR "'.join(query_keywords))
granters_widget = widgets.Dropdown(
    options = ["US funders", "UK funders", "NIH Institutes", "UK and US incl. NIH", "National Natural Science Foundation of China", "NCI NIH HHS", "NIGMS NIH HHS", "NHLBI NIH HHS", "NIAID NIH HHS", "NIDDK NIH HHS", "NINDS NIH HHS", "National Institutes of Health", "NICHD NIH HHS", "NIMH NIH HHS", "NCRR NIH HHS", "NIA NIH HHS", "Medical Research Council", "Wellcome Trust", "NIDA NIH HHS", "National Science Foundation", "NCATS NIH HHS", "National Institute for Health Research (NIHR)", "NEI NIH HHS", "European Research Council", "Deutsche Forschungsgemeinschaft", "NIEHS NIH HHS", "NIAMS NIH HHS", "PHS HHS", "Japan Society for the Promotion of Science", "Biotechnology and Biological Sciences Research Council", "National Research Foundation of Korea", "Canadian Institutes of Health Research", "Intramural NIH HHS", "NIH HHS", "Engineering and Physical Sciences Research Council", "National Key Research and Development Program of China", "NIAAA NIH HHS", "NIBIB NIH HHS", "Ministry of Education, Culture, Sports, Science and Technology", "NIH", "NIDCR NIH HHS", "NIADDK NIH HHS", "NIDCD NIH HHS", "Swiss National Science Foundation", "Fundamental Research Funds for the Central Universities", "CIHR", "National Cancer Institute", "Natural Sciences and Engineering Research Council of Canada", "Conselho Nacional de Desenvolvimento Científico e Tecnológico", "China Postdoctoral Science Foundation", "Cancer Research UK", "National Institute of General Medical Sciences", "Howard Hughes Medical Institute", "Ministry of Science and Technology of the People&apos;s Republic of China", "Coordenação de Aperfeiçoamento de Pessoal de Nível Superior", "NHGRI NIH HHS", "Australian Research Council", "British Heart Foundation", "National Health and Medical Research Council", "Agence Nationale de la Recherche", "National Institute of Mental Health", "Austrian Science Fund FWF", "U.S. Department of Energy", "Dutch Research Council (NWO)", "AHRQ HHS", "FIC NIH HHS", "National Heart, Lung, and Blood Institute", "European Regional Development Fund", "NIMHD NIH HHS", "Ministry of Science and Technology, Taiwan", "European Commission", "National Institute on Aging", "China Scholarship Council", "Chinese Academy of Sciences", "Natural Environment Research Council", "NINR NIH HHS", "Fundação de Amparo à Pesquisa do Estado de São Paulo", "National Institute of Allergy and Infectious Diseases", "Natural Science Foundation of Jiangsu Province", "Novo Nordisk Fonden", "Economic and Social Research Council"],
    value=granters,
    description='Granters:',
    layout={'width': '30em'},
    disabled=False,
)
granters_widget.observe(granters_handler, names='value')

#COLOR scheme
bars_color_scheme = 'default(muted blue)'
bar_colors = '#1f77b4'
bars_color_scheme_widget = widgets.Dropdown(
    options=['default(muted blue)', 'black', 'multicolor', 'darkgreen', 'darkgray', 'goldenrod', 'teal', 'slateblue', 'blueviolet', '#ff7f0e'],
    value=bars_color_scheme,
    description='Color scheme:',
    layout={'width': '30em'},
    disabled=False,
)
bars_color_scheme_widget.observe(color_scheme_handler, names='value')

initialized_code = True

# Parameters

In [None]:
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE cell (the one above this one) in order to use this notebook. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #DISPLAY
    show_form()
    print_parameters()

## Plots

In [None]:
#@title **Generate plot of total results** { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #total result(s)
    hits = []
    names = []
    apilinksfile='totals_api.html'
    open(apilinksfile, 'w').close() #init file
    hits.append(query2publications(resource_searchstring, 'Total papers', apilinks=apilinksfile))
    names.append('Total papers')
    if int(year_from) > int(year_to):
        eprint("ATTENTION: 'year_to' cannot be smaller than 'year_from'!")
    else:
        eprint('Querying publication and patent databases')
        t = tqdm(total=4, unit='query', ascii=" ▖▘▝▗▚▞█")
        desc = 'Papers from {} to {}'.format(year_from, year_to)
        hits.append(query2publications(resource_searchstring + query_time_constraint, desc, apilinks=apilinksfile))
        names.append(desc); t.update(1)

        desc = 'Papers acknowledging "{}"'.format(granters)
        hits.append(query2publications(resource_searchstring + all_grant_agencies_constraint, desc, apilinks=apilinksfile))
        names.append(desc); t.update(1)

        desc = 'Acknowledging "{}", {} to {}'.format(granters, year_from, year_to)
        hits.append(query2publications(resource_searchstring + query_time_constraint + all_grant_agencies_constraint, desc, apilinks=apilinksfile))
        names.append(desc); t.update(1)

        desc = 'Patents from {} to {}'.format(year_from, year_to)
        patent_hits = query2patents(resource_searchstring + ' AND (pd:[{}0101 TO {}1231])'.format(year_from, year_to), desc)
        hits.append(patent_hits)
        names.append(desc); t.update(1)

        t.close()
        
        if patent_hits == -1:
            print("\"{}{}WARNING! Timeout when retrieving patent data.\"{}".format(BOLD, RED, END))
            
        df = barplot(names, hits, title='{}'.format('; '.join(names)), xaxis='Set', filename='totals', resulttype='papers/patents', sort=False)
        df.to_csv('totals.csv', encoding = 'utf-8-sig')

If you wish to download the data from the previous plot (once it is completely generated): [totals.csv](./totals.csv)

If you wish to explore further the list of mentions, check the API URL used for fetching publication data: [totals_api.html](./totals_api.html).  
Please refer to the [EuropePMC API documentation](https://europepmc.org/RestfulWebService) for the available options.
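
As a starting point for such exploration, the sketch below builds a EuropePMC search URL by hand, using the endpoint and parameter names found in the Code cell. The notebook requests `resultType=idlist` with `pageSize=1` because it only needs the hit count; the values `lite` and `25` used here are assumptions about what the API accepts, so please check them against the API documentation. The function name `europepmc_search_url` is hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def europepmc_search_url(query, result_type="idlist", page_size=1):
    """Build (but do not send) a search URL for the EuropePMC REST API."""
    params = {
        "resultType": result_type,
        "format": "json",
        "pageSize": page_size,
        "query": query,
    }
    return API_URL + "?" + urlencode(params)


# assumption: 'lite' returns brief records rather than just identifiers
url = europepmc_search_url('("UniProt")', result_type="lite", page_size=25)
print(url)
```

Opening the printed URL in a browser shows the JSON response, including the `hitCount` field that the notebook reads.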

In [None]:
#@title **Generate plots by year** (both in total and acknowledging any of the grant agencies specified) { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #time plot
    if resource_searchstring:
        if int(year_from) > int(year_to):
            eprint("ATTENTION: 'year_to' cannot be smaller than 'year_from'!")
        else:
            years = list(range(int(year_from), int(year_to) + 1))
            hits_by_year = []
            hits_by_year_selected_grants = []
            eprint('Querying publication database for {} years'.format(len(years)))
            #query and plot only when the year range is valid (otherwise 'years' is undefined)
            for year in tqdm(years, unit='query', ascii=" ▖▘▝▗▚▞█"):
                query_year_constraint = ' and (FIRST_PDATE:[{0}-01-01 TO {0}-12-31])'.format(year)
                hits_by_year.append(query2publications(resource_searchstring + query_year_constraint, '{}'.format(year)))
                hits_by_year_selected_grants.append(query2publications(resource_searchstring + query_year_constraint + all_grant_agencies_constraint, '{}'.format(year)))
            df = barplot(years, hits_by_year, title='from {} to {}'.format(year_from, year_to), xaxis='Year', filename='by_year', sort=False)
            df.to_csv('by_year.csv', encoding = 'utf-8-sig')
            df = barplot(years, hits_by_year_selected_grants, title='acknowledging "{}", from {} to {}'.format(granters, year_from, year_to), xaxis='Year', filename='by_year_grant', sort=False)
            df.to_csv('by_year_grant.csv', encoding = 'utf-8-sig')
    else:
        eprint("ERROR: query string not specified")

If you wish to download the data from the previous plots (once they are completely generated):
[by_year.csv](./by_year.csv)
and
[by_year_grant.csv](./by_year_grant.csv)
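
Once downloaded, the two files can be combined, for example to see what fraction of each year's papers acknowledge the selected funders. The sketch below uses only the standard library and made-up numbers in place of the real files; it assumes the column names `Year` and `papers` that the cell above writes with its default settings. The helper names are hypothetical.

```python
import csv
import io


def read_counts(fh, key='Year', value='papers'):
    """Map each value of the `key` column to its integer count."""
    return {row[key]: int(row[value]) for row in csv.DictReader(fh)}


def grant_share(total, funded):
    """Per-year fraction of papers acknowledging the selected funders.

    Years where the total is zero, or where either query failed (-1), are skipped.
    """
    shares = {}
    for year, n_total in total.items():
        n_funded = funded.get(year, -1)
        if n_total > 0 and n_funded >= 0:
            shares[year] = n_funded / n_total
    return shares


# made-up numbers standing in for by_year.csv and by_year_grant.csv
total_csv = io.StringIO(",Year,papers\n0,2021,200\n1,2022,250\n")
funded_csv = io.StringIO(",Year,papers\n0,2021,50\n1,2022,100\n")
shares = grant_share(read_counts(total_csv), read_counts(funded_csv))
print(shares)
```

With the real files, replace the `io.StringIO` objects with `open('by_year.csv', encoding='utf-8-sig', newline='')` and the equivalent for `by_year_grant.csv`; the `utf-8-sig` encoding matches the one used when the files are written.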

In [None]:
#@title **Generate plots by grant agency** (in the timeframe specified) { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #grants plot
    if resource_searchstring and len(grants) and int(year_from) <= int(year_to):
        #hits_by_grant = []
        hits_by_grant_timeframe = []
        eprint('Querying publication database for {} funders'.format(len(grants)))
        for grant in tqdm(grants, unit='query', ascii=" ▖▘▝▗▚▞█"):
            grant_agency_constraint = ' and (GRANT_AGENCY:"{}")'.format(grant)
            #hits_by_grant.append(query2publications(resource_searchstring + grant_agency_constraint))
            hits_by_grant_timeframe.append(query2publications(resource_searchstring + grant_agency_constraint + query_time_constraint))
        #barplot(grants, hits_by_grant, title='by grant agency', xaxis='Grant agency', sort=True)
        df = barplot(grants, hits_by_grant_timeframe, title='by grant agency acknowledged, from {} to {}'.format(year_from, year_to), xaxis='Grant agency', filename='by_grant_agency', sort=True)
        df.to_csv('by_grant_agency.csv', encoding = 'utf-8-sig')

If you wish to download the data from the previous plot (once it is completely generated): 
[by_grant_agency.csv](./by_grant_agency.csv)

In [None]:
#@title **Generate plot by paper section** { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #paper section plot
    #hits_by_section = []
    hits_by_section_timeframe = []
    eprint('Querying publication database for {} paper sections'.format(len(paper_sections)))
    for section in tqdm(paper_sections, unit='query', ascii=" ▖▘▝▗▚▞█"):
        #one SECTION:"keyword" clause per search term, joined with OR
        section_keywords = ['({}:"{}")'.format(section, keyword) for keyword in query_keywords]
        section_resource_searchstring = '({})'.format(' OR '.join(section_keywords))
        #hits_by_section.append(query2publications(section_resource_searchstring, 'section {}'.format(section)))
        hits_by_section_timeframe.append(query2publications(section_resource_searchstring + query_time_constraint, 'section {}'.format(section)))
    #barplot(paper_sections, hits_by_section, title='by paper section', xaxis='Paper section', sort=True)
    df = barplot(paper_sections, hits_by_section_timeframe, title='by paper section, from {} to {}'.format(year_from, year_to), xaxis='Paper section', filename='by_paper_section', sort=True)
    df.to_csv('by_paper_section.csv', encoding = 'utf-8-sig')

If you wish to download the data from the previous plot (once it is completely generated): [by_paper_section.csv](./by_paper_section.csv)

In [None]:
#@title **Generate plots by year of patent mentions**
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    if resource_searchstring:
        if int(year_from) > int(year_to):
            eprint("ATTENTION: 'year_to' cannot be smaller than 'year_from'!")
        else:
            years = list(range(int(year_from), int(year_to) + 1))
            patents_by_year = []
            eprint('Querying patent database for {} years'.format(len(years)))
            for year in tqdm(years, unit='query', ascii=" ▖▘▝▗▚▞█"):
                patent_hits = query2patents(resource_searchstring + ' AND (pd:[{0}0101 TO {0}1231])'.format(year), year)
                patents_by_year.append(patent_hits)
                if patent_hits == -1:
                    print("\"{}{}WARNING! Timeout when retrieving patent data for {}.\"{}".format(BOLD, RED, year, END))
            df = barplot(years, patents_by_year, title='from {} to {}'.format(year_from, year_to), xaxis='Year', filename='patents_by_year', resulttype='patents', sort=False)
            df.to_csv('patents_by_year.csv', encoding = 'utf-8-sig')
    else:
        eprint("ERROR: query string not specified")

If you wish to download the data from the previous plot (once it is completely generated): [patents_by_year.csv](./patents_by_year.csv)