# Molecular Biology Database Metrics Colab Page
At [UniProt](https://uniprot.org) we have developed some useful metrics over the years to measure and report our progress and impact to funders and policy makers. As part of an administrative supplement awarded by NIH [ODSS](https://datascience.nih.gov/about/odss) we have created this Google Colab page to enable other data resource providers to easily run these metrics.

This page allows you to investigate how often a resource is mentioned in the full text literature and how often it is mentioned in papers that cite specific funding sources. You need not limit yourself to only looking at database resources. You can use any keywords to investigate the impact of experimental techniques or anything else you can think of.

If you have specific feedback on this notebook then please contact Alex Bateman (<agb@ebi.ac.uk>).
We would like to thank [Europe PubMed Central](https://europepmc.org/) and [ChEMBL](https://www.ebi.ac.uk/chembl/) for their fantastic work in making the APIs that power much of this page.


## Contents

This notebook queries publication ([Europe PMC](https://europepmc.org/)) and patent ([SureChEMBL](https://www.surechembl.org/)) databases to compute and plot the impact of a resource.
In particular, plots will be generated that:
- show how the number of mentions changes over time
- group publications by the grant agency they acknowledge
- find in which section of a publication a resource is mentioned.
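
All of the publication plots listed above come from hit counts of Europe PMC searches whose query strings differ only in the constraints appended to the keyword search. The following is a minimal sketch of how such a query string could be assembled. The helper names `build_resource_searchstring`, `time_constraint` and `grant_constraint` are illustrative and are not defined in this notebook; the string formats mirror those used in the **Parameters** cell.

```python
def build_resource_searchstring(keywords):
    """OR together the quoted keywords, e.g. ("UniProt" OR "Swiss-Prot")."""
    return '("{}")'.format('" OR "'.join(keywords))


def time_constraint(year_from, year_to):
    """Restrict hits to papers first published within the given years (inclusive)."""
    return ' and (FIRST_PDATE:[{}-01-01 TO {}-12-31])'.format(year_from, year_to)


def grant_constraint(agencies):
    """Restrict hits to papers acknowledging at least one of the given agencies."""
    return ' and (GRANT_AGENCY:"{}")'.format('" or GRANT_AGENCY:"'.join(agencies))


# Example values chosen for illustration only
keywords = ['UniProt', 'Swiss-Prot']
query = (build_resource_searchstring(keywords)
         + time_constraint(2018, 2023)
         + grant_constraint(['Wellcome Trust', 'NIH']))
print(query)
```

Each constraint starts with ` and `, so constraints can be appended to the keyword search in any order or left out entirely.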

# Instructions

Start by executing BOTH the (⏵︎) **Code** and the **Parameters** cells (wherein you can specify what to query for).

After doing that, to (re)generate the plots, execute (⏵︎) the corresponding cells.
Alternatively, after choosing what to query in the **Parameters** section, you can choose **Run all** from the **Runtime** menu at the top of the page.

If this is your first time using an interactive notebook or Colab, you may want to check the [Colab introductory page](https://colab.research.google.com/).

### Changelog:

    2023.03.12 first version with basic query capabilities
    2023.03.13 added plot generation via plotly and parameters selection via form
    2023.03.14 interface and plot improvements; added capability to download data as csv, introduction text and first tests of patent info retrieval
    2023.03.15 improved plot interface, also with possibility to draw on plots; plots can be saved with meaningful filenames; added retrieval of patent information from SureChEMBL
    2023.03.16 added retrieval of total patents in timeframe; added progress bars and error messages if Code has not been initialised
    2023.06.27 changed to ascii progress bar to avoid incompatibility with plotly display
    2023.08.25 converted from colab version to run independently as jupyter notebook

### Github repository

https://github.com/g-insana/MBDBMetrics
(if you want to download and run it independently in a Jupyter environment)

### Credits:

Code by Alex Bateman, Alex Ignatchenko and Giuseppe Insana

© 2023- [UniProt consortium](https://www.uniprot.org/help/about)

In [None]:
#@title **Code** (execute this first) { display-mode: "form" }
#imports
import re
import sys
import pandas
import plotly.graph_objects as go
import requests
from requests.adapters import HTTPAdapter, Retry
import datetime
from time import sleep
from tqdm import tqdm
import plotly.io as pio
pio.renderers.default = 'colab'


#plotly config
plotly_config = {
    'displayModeBar': True,
    'toImageButtonOptions': {
        'format': 'png', # one of png, svg, jpeg, webp
        'filename': 'impact_plot',
        #'width': 1280,
        #'height': 1024,
    },
    'modeBarButtonsToRemove': ['lasso2d'],
    'modeBarButtonsToAdd': ['drawline', 'drawopenpath']
}

#helper functions
#paper sections:
paper_sections = ['ABBR','ACK_FUND','APPENDIX','AUTH_CON','CASE','COMP_INT','FIG','OTHER','REF','TABLE','ABSTRACT','INTRO','KEYWORD','METHODS','RESULTS','DISCUSS','CONCL','SUPPL']

def eprint(*myargs, **kwargs):
    """
    print to stderr, useful for error messages
    """
    print(*myargs, file=sys.stderr, **kwargs)


def query2publications(query, description='', debug=False):
    """
    return total number of hits of a query submitted to Europe PMC database
    """
    headers = {'user-agent': 'resourceimpact/0.5', 'Accept' : 'application/json'}
    params = {'resultType': "idlist", 'format': "json", 'pageSize': 1, 'query': query}
    api_url = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    #for retries
    s = requests.Session()
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])
    #api_url uses https, so the retry adapter must be mounted on the https prefix to take effect
    s.mount('https://', HTTPAdapter(max_retries=retries))

    r = s.get(api_url, params=params, headers=headers)
    if r.status_code in (200, 206):
        parsed = r.json()
        hits = parsed['hitCount']
        if debug:
            eprint(r.url)
            eprint('{}: {}'.format(description, hits))
        return hits
    else:
        eprint("ERROR: problems querying EuropePMC, http status code is '{}'({})".format(r.status_code, r.reason))
        return -1


def query2patents(query, description='', debug=False):
    """
    return total number of hits of a query submitted to patent website
    query format example: query=(uniprot) AND ((pnctry:(US OR EP OR WO OR JP))) AND (pdates:2022)
    """
    json_data = {
        'query': '{} AND ((pnctry:(US OR EP OR WO OR JP)))'.format(query),
        'dev': "", 'child_of': "", 'data_source': 'PATENTS',
        'save_search': "yes", 'save_search_label': "", 'max_hits': 1#,
        #'search_appendix': [{"authorities":{"authorities-all-annotated":"checked"},"sure-query":"diabet*"},{"authorities":{"key":"Authorities+searched","values":["All+chemically+annotated+authorities"]},"sure-query":"Query"}]
    }
    headers = {'user-agent': 'resourceimpact/0.5', 'Accept' : 'application/json', 'Content-Type': 'application/x-www-form-urlencoded'}
    #for retries
    s = requests.Session()
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 400, 502, 503, 504 ])
    #all SureChEMBL URLs below use https, so the retry adapter must be mounted on the https prefix
    s.mount('https://', HTTPAdapter(max_retries=retries))

    api_url = "https://www.surechembl.org/api/search/content"
    if debug: eprint('querying surechembl with query {}'.format(query))
    r = s.get(api_url, headers=headers, data=json_data)
    attempt = 0
    max_attempts = 5
    search_hash = ''
    search_completed = False
    if r.status_code in (200, 206):
        search_hash = r.json()['data']['hash']
        status_url = 'https://www.surechembl.org/api/search/{}/status'.format(search_hash)
        while attempt < max_attempts:
            attempt += 1
            if debug: eprint('attempt {}/{}'.format(attempt, max_attempts))
            r = s.get(status_url, headers=headers)
            if r.status_code in (200, 206) and r.json()['status'] == 'OK' and r.json()['data']['message'] == 'Searching finished.':
                search_completed = True
                break
            sleep(2) #sleep for 2 seconds and re-try to check whether search has completed
        if search_completed:
            results_url = 'https://www.surechembl.org/api/search/{}/results'.format(search_hash)
            r = s.get(results_url, headers=headers, params={'page': 1, 'max_results': 0})
            if r.status_code in (200, 206):
                parsed = r.json()
                hits = parsed['data']['results']['total_hits']
                if debug:
                    eprint('{}: {}'.format(description, hits))
                return hits
    eprint("ERROR: problems querying SureCHEMBL, http status code is '{}'({}), attempts: {}".format(r.status_code, r.reason, attempt))
    return -1


def barplot(x, y, title='', xaxis='', resulttype='papers', filename='resource_impact', sort=True):
    """
    plot a bar plot
    return a dataframe with the data used
    """
    layout = go.Layout(title='Number of {} mentioning {}<br />{}'.format(resulttype, query, title),
                   #barmode="stack",
                   xaxis_title = xaxis,
                   #xaxis_showticklabels = False,
                   yaxis_title = 'Number of {}'.format(resulttype))
    trace = go.Bar(x=x, y=y, marker_color=bar_colors, text=y)
    plot = go.Figure([trace], layout)
    plot.update_xaxes(type='category')

    if sort:
        plot.update_xaxes(categoryorder='total ascending')

    if filename:
        #update filename for downloadable plot image
        plotly_config['toImageButtonOptions']['filename']=filename


    data = {}
    data[xaxis] = x
    data[resulttype] = y
    df = pandas.DataFrame(data)
    plot.show(config=plotly_config, renderer='colab')
    return df

initialized_code = True

In [None]:
#@title **Parameters** (choose what to query)<br>*Note: for query keyword(s), grant agency and colour parameters you can select from the drop down menus or type in a custom value* { run: "auto" }
#query items:
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE cell (the one above this one) in order to use this notebook. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    query = 'UniprotKB,Uniprot,Trembl,SwissProt,Swiss-Prot,Uniparc,Uniref' #@param ['UniprotKB,Uniprot,Trembl,SwissProt,Swiss-Prot,Uniparc,Uniref', 'Ensembl,EnsemblGenomes,EnsemblPlants,EnsemblBacteria,EnsemblFungi,EnsemblProtists,EnsemblVertebrates', 'RefSeq', 'OMIM,Online Mendelian Inheritance in Man', 'SGD,Saccharomyces Genome Database,Yeastgenome.org,yeast genome database', 'Zfin,zfin.org,Zebrafish Information Network', 'MGI,MGD,Mouse Genome Informatics,informatics.jax.org', 'Rat genome database,rgd.mcw.edu', 'FlyBase,flybase.org', 'Gene Ontology,geneontology', 'Wormbase,wormbase.org', 'TAIR,arabidopsis.org,Arabidopsis information resource'] {allow-input: true}
    query_keywords = query.split(',')
    if len(query_keywords) == 2:
        query_short = ', '.join(query_keywords[0:2])
    elif len(query_keywords) > 2:
        query_short = ', '.join(query_keywords[0:2]) + ', ...'
    else:
        query_short = query_keywords[0]
    resource_searchstring = '("{}")'.format('" OR "'.join(query_keywords))
    print('Your chosen parameters:\n\tquery string is: {}'.format(query))

    current_year = datetime.datetime.now().date().year

    #year_from = 2020 #@param {type:"slider", min:1980, max:2023, step:1}
    #year_to = 2022 #@param {type:"slider", min:1980, max:2023, step:1}
    year_from = "2018" #@param [1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023]
    year_to = "2023" #@param [1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023]

    if int(year_from) > int(year_to):
        eprint("\t\tATTENTION: 'year_to' cannot be smaller than 'year_from'! Defaulting year_to same as year_from: '{}'".format(year_from))
        year_to = year_from

    if int(year_to) > int(current_year):
        eprint("\t\tNOTICE: 'year_to' cannot be later than current year. Defaulting year_to to '{}'".format(current_year))
        year_to = str(current_year)


    query_time_constraint = ' and (FIRST_PDATE:[{}-01-01 TO {}-12-31])'.format(year_from, year_to)

    eprint('\ttime frame: from {} to {}'.format(year_from, year_to))

    granters = 'US funders' #@param ["US funders", "UK funders", "NIH Institutes", "UK and US incl. NIH", "NHGRI NIH HHS", "NIGMS NIH HHS", "NCI NIH HHS", "NHLBI NIH HHS", "NIAID NIH HHS", "NIDDK NIH HHS", "NINDS NIH HHS", "NIMH NIH HHS", "NCRR NIH HHS", "NICHD NIH HHS", "NIA NIH HHS", "NIDA NIH HHS", "NEI NIH HHS", "NIAMS NIH HHS", "NIEHS NIH HHS", "NIAAA NIH HHS", "NIDCR NIH HHS", "NIDCD NIH HHS", "NIBIB NIH HHS"] {allow-input: true}

    #grant agencies' lists:
    granters_nih = ['NIH','NHGRI NIH HHS','NIGMS NIH HHS','NCI NIH HHS',
                    'NHLBI NIH HHS','NIAID NIH HHS','NIDDK NIH HHS','NINDS NIH HHS',
                    'NIMH NIH HHS','NCRR NIH HHS','NICHD NIH HHS','NIA NIH HHS',
                    'NIDA NIH HHS','NEI NIH HHS','NIAMS NIH HHS','NIEHS NIH HHS',
                    'NIAAA NIH HHS','NIDCR NIH HHS','NIDCD NIH HHS','NIBIB NIH HHS']
    granters_us = ['Department of Energy','National Science Foundation','Bill and Melinda Gates',
                'Department of Defense','Department of Agriculture','Food and Drug Administration',
                'Environmental Protection Agency','Howard Hughes Medical Institute']
    granters_uk = ['Wellcome Trust','Medical Research Council','Biotechnology and Biological Sciences Research Council',
                'Engineering and Physical Sciences Research Council','Natural Environment Research Council',
                'Economic and Social Research Council','Arts and Humanities Research Council',
                'Science and Technology Facilities Council']

    if granters == 'US funders':
        grants = granters_nih + granters_us
    #elif granters == 'US not including NIH':
    #  grants = granters_us
    elif granters == 'UK funders':
        grants = granters_uk
    elif granters == 'NIH Institutes':
        grants = granters_nih
    elif granters == 'UK and US incl. NIH':
        grants = granters_nih + granters_us + granters_uk
    else:
        grants = [granters]

    all_grant_agencies_constraint = ' and (GRANT_AGENCY:"{}")'.format('" or GRANT_AGENCY:"'.join(grants))
    eprint('\tgrant agencies: {}'.format(', '.join(grants)))

    #colors validator
    bars_color_scheme = 'default(muted blue)' #@param ['default(muted blue)', 'black', 'multicolor', 'darkgreen', 'darkgray', 'goldenrod', 'teal', 'slateblue', 'blueviolet', '#ff7f0e'] {allow-input: true}
    css_colors = {'aliceblue', 'antiquewhite', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanchedalmond', 'blue', 'blueviolet', 'brown', 'burlywood', 'cadetblue', 'chartreuse', 'chocolate', 'coral', 'cornflowerblue', 'cornsilk', 'crimson', 'cyan', 'darkblue', 'darkcyan', 'darkgoldenrod', 'darkgray', 'darkgrey', 'darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred', 'darksalmon', 'darkseagreen', 'darkslateblue', 'darkslategray', 'darkslategrey', 'darkturquoise', 'darkviolet', 'deeppink', 'deepskyblue', 'dimgray', 'dimgrey', 'dodgerblue', 'firebrick', 'floralwhite', 'forestgreen', 'fuchsia', 'gainsboro', 'ghostwhite', 'gold', 'goldenrod', 'gray', 'grey', 'green', 'greenyellow', 'honeydew', 'hotpink', 'indianred', 'indigo', 'ivory', 'khaki', 'lavender', 'lavenderblush', 'lawngreen', 'lemonchiffon', 'lightblue', 'lightcoral', 'lightcyan', 'lightgoldenrodyellow', 'lightgray', 'lightgrey', 'lightgreen', 'lightpink', 'lightsalmon', 'lightseagreen', 'lightskyblue', 'lightslategray', 'lightslategrey', 'lightsteelblue', 'lightyellow', 'lime', 'limegreen', 'linen', 'magenta', 'maroon', 'mediumaquamarine', 'mediumblue', 'mediumorchid', 'mediumpurple', 'mediumseagreen', 'mediumslateblue', 'mediumspringgreen', 'mediumturquoise', 'mediumvioletred', 'midnightblue', 'mintcream', 'mistyrose', 'moccasin', 'navajowhite', 'navy', 'oldlace', 'olive', 'olivedrab', 'orange', 'orangered', 'orchid', 'palegoldenrod', 'palegreen', 'paleturquoise', 'palevioletred', 'papayawhip', 'peachpuff', 'peru', 'pink', 'plum', 'powderblue', 'purple', 'red', 'rosybrown', 'royalblue', 'rebeccapurple', 'saddlebrown', 'salmon', 'sandybrown', 'seagreen', 'seashell', 'sienna', 'silver', 'skyblue', 'slateblue', 'slategray', 'slategrey', 'snow', 'springgreen', 'steelblue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'whitesmoke', 'yellow', 'yellowgreen'}
    HEX_COLOR_REGEX = re.compile(r'^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$')
    if bars_color_scheme == 'multicolor':
        #color list from plotly.express.colors.qualitative.Alphabet
        bar_colors = ['#AA0DFE','#3283FE','#85660D','#782AB6','#565656','#1C8356','#16FF32','#F7E1A0','#E2E2E2','#1CBE4F','#C4451C','#DEA0FD','#FE00FA','#325A9B','#FEAF16','#F8A19F','#90AD1C','#F6222E','#1CFFCE','#2ED9FF','#B10DA1','#C075A6','#FC1CBF','#B00068','#FBE426','#FA0087']
    elif bars_color_scheme == 'default(muted blue)':
        bar_colors = '#1f77b4'
    elif HEX_COLOR_REGEX.search(bars_color_scheme) or bars_color_scheme in css_colors:
        bar_colors = bars_color_scheme
    else:
        eprint("WARNING: specified color is not valid. Defaulting to muted blue")
        bar_colors = '#1f77b4'

Your chosen parameters:
	query string is: UniprotKB,Uniprot,Trembl,SwissProt,Swiss-Prot,Uniparc,Uniref


	time frame: from 2018 to 2023
	grant agencies: NIH, NHGRI NIH HHS, NIGMS NIH HHS, NCI NIH HHS, NHLBI NIH HHS, NIAID NIH HHS, NIDDK NIH HHS, NINDS NIH HHS, NIMH NIH HHS, NCRR NIH HHS, NICHD NIH HHS, NIA NIH HHS, NIDA NIH HHS, NEI NIH HHS, NIAMS NIH HHS, NIEHS NIH HHS, NIAAA NIH HHS, NIDCR NIH HHS, NIDCD NIH HHS, NIBIB NIH HHS, Department of Energy, National Science Foundation, Bill and Melinda Gates, Department of Defense, Department of Agriculture, Food and Drug Administration, Environmental Protection Agency, Howard Hughes Medical Institute


In [None]:
#@title **Generate plot of total results** { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #total result(s)
    hits = []
    names = []
    hits.append(query2publications(resource_searchstring, 'Total papers'))
    names.append('Total papers')
    if int(year_from) > int(year_to):
        eprint("ATTENTION: 'year_to' cannot be smaller than 'year_from'!")
    else:
        eprint('Querying publication and patent databases')
        t = tqdm(total=4, unit='query', ascii=" ▖▘▝▗▚▞█")
        desc = 'Papers from {} to {}'.format(year_from, year_to)
        hits.append(query2publications(resource_searchstring + query_time_constraint, desc))
        names.append(desc); t.update(1)

        desc = 'Papers acknowledging "{}"'.format(granters)
        hits.append(query2publications(resource_searchstring + all_grant_agencies_constraint, desc))
        names.append(desc); t.update(1)

        desc = 'Acknowledging "{}", {} to {}'.format(granters, year_from, year_to)
        hits.append(query2publications(resource_searchstring + query_time_constraint + all_grant_agencies_constraint, desc))
        names.append(desc); t.update(1)

        desc = 'Patents from {} to {}'.format(year_from, year_to)
        hits.append(query2patents(resource_searchstring + ' AND (pdates:[{}0101 TO {}1231])'.format(year_from, year_to), desc))
        names.append(desc); t.update(1)

        t.close()
        df = barplot(names, hits, title='{}'.format('; '.join(names)), xaxis='Set', filename='totals', resulttype='papers/patents', sort=False)
        df.to_csv('totals.csv', encoding = 'utf-8-sig')

Querying publication and patent databases
100%|██████████| 4/4 [00:05<00:00,  1.36s/query]


If you wish to download the data from the previous plot (once it is completely generated): /content/totals.csv
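
Once downloaded, the file can be read back with any CSV reader. The sketch below uses only the Python standard library and assumes the layout that `df.to_csv('totals.csv', encoding='utf-8-sig')` produces for this plot: an unnamed index column, a `Set` column and a `papers/patents` column. The helper name `read_counts` is illustrative, and the sample file with its counts is an invented placeholder so that the example runs on its own.

```python
import csv
import os
import tempfile

# Placeholder file standing in for totals.csv; the counts are invented for illustration only
sample_path = os.path.join(tempfile.mkdtemp(), 'totals.csv')
with open(sample_path, 'w', encoding='utf-8-sig', newline='') as handle:
    writer = csv.writer(handle)
    writer.writerow(['', 'Set', 'papers/patents'])  # the index is written as an unnamed first column
    writer.writerow([0, 'Total papers', 100])
    writer.writerow([1, 'Papers from 2018 to 2023', 40])


def read_counts(path):
    """Return {label: count} from a CSV with an index, a label and a count column."""
    # 'utf-8-sig' strips the byte order mark that the notebook writes at the start of the file
    with open(path, encoding='utf-8-sig', newline='') as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        return {row[1]: int(row[2]) for row in reader}


counts = read_counts(sample_path)
print(counts)
```

The same approach should work for the other CSV files written by this notebook, since they share the index, label, count layout.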

In [None]:
#@title **Generate plots by year** (both in total and acknowledging any of the grant agencies specified) { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #time plot
    if resource_searchstring:
        if int(year_from) > int(year_to):
            eprint("ATTENTION: 'year_to' cannot be smaller than 'year_from'!")
        else:
            years = list(range(int(year_from), int(year_to) + 1))
            hits_by_year = []
            hits_by_year_selected_grants = []
            eprint('Querying publication database for {} years'.format(len(years)))
            for year in tqdm(years, unit='query', ascii=" ▖▘▝▗▚▞█"):
                query_year_constraint = ' and (FIRST_PDATE:[{0}-01-01 TO {0}-12-31])'.format(year)
                hits_by_year.append(query2publications(resource_searchstring + query_year_constraint, '{}'.format(year)))
                hits_by_year_selected_grants.append(query2publications(resource_searchstring + query_year_constraint + all_grant_agencies_constraint, '{}'.format(year)))
            df = barplot(years, hits_by_year, title='from {} to {}'.format(year_from, year_to), xaxis='Year', filename='by_year', sort=False)
            df.to_csv('by_year.csv', encoding = 'utf-8-sig')
            df = barplot(years, hits_by_year_selected_grants, title='acknowledging "{}", from {} to {}'.format(granters, year_from, year_to), xaxis='Year', filename='by_year_grant', sort=False)
            df.to_csv('by_year_grant.csv', encoding = 'utf-8-sig')
    else:
        eprint("ERROR: query string not specified")

Querying publication database for 6 years
100%|██████████| 6/6 [00:05<00:00,  1.02query/s]


If you wish to download the data from the previous plots (once they are completely generated): /content/by_year.csv /content/by_year_grant.csv

In [None]:
#@title **Generate plots by grant agency** (in the timeframe specified) { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #grants plot
    if resource_searchstring and len(grants) and int(year_from) <= int(year_to):
        #hits_by_grant = []
        hits_by_grant_timeframe = []
        eprint('Querying publication database for {} funders'.format(len(grants)))
        for grant in tqdm(grants, unit='query', ascii=" ▖▘▝▗▚▞█"):
            grant_agency_constraint = ' and (GRANT_AGENCY:"{}")'.format(grant)
            #hits_by_grant.append(query2publications(resource_searchstring + grant_agency_constraint))
            hits_by_grant_timeframe.append(query2publications(resource_searchstring + grant_agency_constraint + query_time_constraint))
        #barplot(grants, hits_by_grant, title='by grant agency', xaxis='Grant agency', sort=True)
        df = barplot(grants, hits_by_grant_timeframe, title='by grant agency acknowledged, from {} to {}'.format(year_from, year_to), xaxis='Grant agency', filename='by_grant_agency', sort=True)
        df.to_csv('by_grant_agency.csv', encoding = 'utf-8-sig')

Querying publication database for 28 funders
100%|██████████| 28/28 [00:13<00:00,  2.03query/s]


If you wish to download the data from the previous plot (once it is completely generated): /content/by_grant_agency.csv

In [None]:
#@title **Generate plot by paper section** { display-mode: "form" }
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    #paper section plot
    #hits_by_section = []
    hits_by_section_timeframe = []
    eprint('Querying publication database for {} paper sections'.format(len(paper_sections)))
    for section in tqdm(paper_sections, unit='query', ascii=" ▖▘▝▗▚▞█"):
        section_keywords = []
        for keyword in query_keywords:
            section_keywords.append('({}:"{}")'.format(section, keyword))
        #build the search string once, after all keywords for this section have been collected
        section_resource_searchstring = '({})'.format(' OR '.join(section_keywords))
        #hits_by_section.append(query2publications(section_resource_searchstring, 'section {}'.format(section)))
        hits_by_section_timeframe.append(query2publications(section_resource_searchstring + query_time_constraint, 'section {}'.format(section)))
    #barplot(paper_sections, hits_by_section, title='by paper section', xaxis='Paper section', sort=True)
    df = barplot(paper_sections, hits_by_section_timeframe, title='by paper section, from {} to {}'.format(year_from, year_to), xaxis='Paper section', filename='by_paper_section', sort=True)
    df.to_csv('by_paper_section.csv', encoding = 'utf-8-sig')

Querying publication database for 18 paper sections
100%|██████████| 18/18 [00:09<00:00,  1.98query/s]


If you wish to download the data from the previous plot (once it is completely generated): /content/by_paper_section.csv
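
For reference, the per-section counts come from searches in which every keyword is restricted to one section field. A minimal sketch of that string construction follows. The function name `section_searchstring` is illustrative and not part of the notebook; the format follows the cell above, and the section codes are those listed in `paper_sections`.

```python
def section_searchstring(section, keywords):
    """Restrict every keyword to one paper section and OR the clauses together."""
    clauses = ['({}:"{}")'.format(section, keyword) for keyword in keywords]
    return '({})'.format(' OR '.join(clauses))


# Example values chosen for illustration only
methods_query = section_searchstring('METHODS', ['UniProt', 'Swiss-Prot'])
print(methods_query)
```

A paper that mentions the resource in several sections is counted once per section, so the bars of this plot are not expected to add up to the total number of papers.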

In [None]:
#@title **Generate plots by year of patent mentions**
try:
    initialized_code
except NameError:
    print('ERROR: You need first to execute the CODE and PARAMETERS cells (top of the page) in order to generate plots. After doing that, re-execute this cell.')
    #raise SystemExit
else:
    if resource_searchstring:
        if int(year_from) > int(year_to):
            eprint("ATTENTION: 'year_to' cannot be smaller than 'year_from'!")
        else:
            years = list(range(int(year_from), int(year_to) + 1))
            patents_by_year = []
            eprint('Querying patent database for {} years'.format(len(years)))
            for year in tqdm(years, unit='query', ascii=" ▖▘▝▗▚▞█"):
                patents_by_year.append(query2patents(resource_searchstring + ' AND (pdates:{})'.format(year), year))
            df = barplot(years, patents_by_year, title='from {} to {}'.format(year_from, year_to), xaxis='Year', filename='patents_by_year', resulttype='patents', sort=False)
            df.to_csv('patents_by_year.csv', encoding = 'utf-8-sig')
    else:
        eprint("ERROR: query string not specified")

Querying patent database for 6 years
100%|██████████| 6/6 [00:22<00:00,  3.67s/query]


If you wish to download the data from the previous plot (once it is completely generated): /content/patents_by_year.csv
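
The patent counts are obtained with a query string in the format shown in the `query2patents` docstring: the keyword search, a publication date constraint and a fixed list of patent authorities. The sketch below only builds that string and sends nothing to SureChEMBL. The function name `patent_query` is illustrative and not part of the notebook.

```python
def patent_query(keywords, year_from, year_to=None):
    """Build a patent search string for a single year or for a range of years."""
    resource = '("{}")'.format('" OR "'.join(keywords))
    if year_to is None:
        dates = '(pdates:{})'.format(year_from)  # single publication year
    else:
        dates = '(pdates:[{}0101 TO {}1231])'.format(year_from, year_to)  # inclusive range
    # authority filter appended to every query, as in query2patents
    return '{} AND {} AND ((pnctry:(US OR EP OR WO OR JP)))'.format(resource, dates)


# Example values chosen for illustration only
single_year = patent_query(['uniprot'], 2022)
year_range = patent_query(['uniprot'], 2018, 2023)
print(single_year)
print(year_range)
```

The single-year form corresponds to the per-year plot above, and the range form to the patent bar in the plot of total results.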