# Count publications in the IDR

This notebook uses the IDR OMERO.web API to fetch the top-level study metadata for all published studies, then fetches PubMed metadata for each study that has a PubMed ID.

The result is a list of journals with the number of IDR studies published in each.

In [1]:
from collections import Counter
import requests
try:
    from tqdm import tqdm
except ImportError:
    tqdm = iter

IDR_BASE_URL = "https://idr.openmicroscopy.org"
INDEX_PAGE = f"{IDR_BASE_URL}/webclient/?experimenter=-1"
MAP_URL = '{base}/webclient/api/annotations/?type=map&{type}={id}'
PUBMED_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id={pubmedid}&retmode=json'

OMERO.web uses session-based authentication by default.
First create an HTTP session using the
[`requests`](http://docs.python-requests.org/en/master/) library:

In [2]:
session = requests.Session()
request = requests.Request('GET', INDEX_PAGE)
prepped = session.prepare_request(request)
response = session.send(prepped)
response.raise_for_status()

Get all studies (projects and screens)

In [3]:
screens = session.get(f'{IDR_BASE_URL}/api/v0/m/screens/').json()
projects = session.get(f'{IDR_BASE_URL}/api/v0/m/projects/').json()
print(f"Found {screens['meta']['totalCount']} screens {projects['meta']['totalCount']} projects")
assert (screens['meta']['totalCount'] <= screens['meta']['limit']), 'Paging required'
assert (projects['meta']['totalCount'] <= projects['meta']['limit']), 'Paging required'

studies = {
    'screen': screens['data'],
    'project': projects['data'],
}

Found 61 screens 58 projects
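The assertions above stop the notebook if a single response does not hold every study. If that happens, the results could be collected page by page. The following is a minimal sketch, assuming the endpoint accepts `offset` and `limit` query parameters and reports `totalCount` in `meta`, as the response above suggests. `fetch_all` is a hypothetical helper, not part of the original notebook, and the default page size of 100 is arbitrary.

```python
def fetch_all(session, url, limit=100):
    """Collect every object from a paged JSON endpoint.

    `session` is any object with a requests-style `get(url, params=...)`.
    Assumes the response has a 'data' list and a 'meta' dict with 'totalCount'.
    """
    items = []
    offset = 0
    while True:
        response = session.get(url, params={'offset': offset, 'limit': limit})
        response.raise_for_status()
        page = response.json()
        items.extend(page['data'])
        offset += len(page['data'])
        # Stop on an empty page or once everything has been collected
        if not page['data'] or offset >= page['meta']['totalCount']:
            return items
```

With this helper, `fetch_all(session, f'{IDR_BASE_URL}/api/v0/m/screens/')` would return the same list as `screens['data']` when one page is enough.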


Get the map annotations for each study; we are interested in the PubMed IDs

In [4]:
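study_pubmedids = {}

for (study_type, container) in studies.items():
    for c in container:
        url = MAP_URL.format(base=IDR_BASE_URL, type=study_type, id=c['@id'])
        annotations = session.get(url).json()['annotations']
        for a in annotations:
            # Only the study-info annotation is of interest here
            if a['ns'] == 'idr.openmicroscopy.org/study/info':
                m = dict(a['values'])
                # Drop anything after the first '/' so all parts of a study share one name
                name = c['Name'].split('/')[0]
                try:
                    # The ID is the first whitespace-separated token of the value
                    pubmedid = m['PubMed ID'].split()[0]
                    if study_pubmedids.get(name, None):
                        # All parts of a study should share one PubMed ID
                        assert study_pubmedids[name] == pubmedid
                    else:
                        study_pubmedids[name] = pubmedid
                except KeyError:
                    # Do not overwrite an ID already found for this study
                    study_pubmedids.setdefault(name, None)
                break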
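The extraction step in the loop above can be tried in isolation. The sketch below assumes each annotation carries an `ns` string and a `values` list of key/value pairs, which is the shape the loop relies on. `extract_pubmed_id` is a hypothetical helper and the sample annotation is invented for illustration; it is not real IDR data. Unlike the loop, the helper also returns `None` for an empty value.

```python
STUDY_INFO_NS = 'idr.openmicroscopy.org/study/info'

def extract_pubmed_id(annotations):
    """Return the PubMed ID from the first study-info annotation, or None."""
    for a in annotations:
        if a.get('ns') != STUDY_INFO_NS:
            continue
        values = dict(a['values'])
        # Keep only the first whitespace-separated token of the value
        tokens = values.get('PubMed ID', '').split()
        return tokens[0] if tokens else None
    return None

# Invented sample with the assumed shape (not real IDR data)
sample = [{
    'ns': STUDY_INFO_NS,
    'values': [
        ['Study Type', 'example'],
        ['PubMed ID', '12345678 https://example.org/12345678'],
    ],
}]
print(extract_pubmed_id(sample))  # prints 12345678
```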

In [5]:
missing = []
for (name, pid) in sorted(study_pubmedids.items()):
    if pid:
        print(name, pid)
    else:
        missing.append(name)

if missing:
    print('\nNo PubmedID found for:\n\t' + '\n\t'.join(missing))

idr0001-graml-sysgro 25373780
idr0002-heriche-condensation 24943848
idr0003-breker-plasticity 23509072
idr0004-thorpe-rad52 18085829
idr0005-toret-adhesion 24446484
idr0006-fong-nuclearbodies 24127217
idr0007-srikumar-sumo 23547032
idr0008-rohn-actinome 21893601
idr0009-simpson-secretion 22660414
idr0010-doil-dnadamage 19203579
idr0012-fuchs-cellmorph 20531400
idr0013-neumann-mitocheck 20360735
idr0015-colin-taraoceans 29087936
idr0016-wawer-bioactivecompoundprofiling 28327978
idr0017-breinig-drugscreen 26700849
idr0019-sero-nfkappab 26148352
idr0020-barr-chtog 26037491
idr0021-lawo-pericentriolarmaterial 23086237
idr0022-koedoot-cellmigration 31278301
idr0023-szymborska-nuclearpore 23845946
idr0025-stadler-proteinatlas 22361696
idr0026-weigelin-immunotherapy 26034288
idr0027-dickerson-chromatin 27609610
idr0028-pascualvargas-rhogtpases 28248929
idr0030-sero-yap 28065575
idr0032-yang-meristem 27212401
idr0033-rohban-pathways 28315521
idr0034-kilpinen-hipsci 28489815
idr0035-caie-drugre

Now fetch the PubMed metadata (see https://www.ncbi.nlm.nih.gov/pmc/tools/get-metadata/)

In [6]:
study_pubmedinfo = {}

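for (name, pmid) in tqdm(study_pubmedids.items()):
    if pmid:
        response = requests.get(PUBMED_URL.format(pubmedid=pmid))
        # Fail early on an HTTP error instead of parsing an error page
        response.raise_for_status()
        pubmed = response.json()
        study_pubmedinfo[name] = pubmed['result'][pmid]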

100%|██████████| 73/73 [00:32<00:00,  2.28it/s]
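The loop above sends one request per study. The E-utilities `esummary` endpoint also accepts a comma-separated list of IDs, so the summaries could be fetched in batches instead. The sketch below is one possible approach and not the notebook's method. `fetch_summaries` is a hypothetical helper, the batch size of 50 is an arbitrary assumption, and `http` can be the `requests` module or any object with a compatible `get` method.

```python
ESUMMARY_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi'

def fetch_summaries(http, pmids, batch_size=50):
    """Return {pmid: summary} for the given PubMed IDs, fetched in batches.

    `http` must provide a requests-style `get(url, params=...)`.
    """
    pmids = list(pmids)
    summaries = {}
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        params = {'db': 'pubmed', 'id': ','.join(batch), 'retmode': 'json'}
        response = http.get(ESUMMARY_URL, params=params)
        response.raise_for_status()
        result = response.json()['result']
        # Keep only the IDs that were requested and returned
        for pmid in batch:
            if pmid in result:
                summaries[pmid] = result[pmid]
    return summaries

# Possible usage with the names from this notebook:
# pmids = [p for p in study_pubmedids.values() if p]
# by_pmid = fetch_summaries(requests, pmids)
```

The result is keyed by PubMed ID rather than study name, so it would need to be mapped back through `study_pubmedids` to rebuild `study_pubmedinfo`.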


Finally, list the journals and the number of studies published in each

In [7]:
journals = Counter(pub['fulljournalname'] for pub in study_pubmedinfo.values())
# Sort by descending count, then alphabetically by journal name
journals = sorted(journals.items(), key=lambda x: (-x[1], x[0]))

for j in journals:
    print(f'{j[0]:>40} {j[1]:3}')

             The Journal of cell biology   7
                   Nature communications   6
                                   eLife   6
               Molecular systems biology   5
                                PloS one   4
                Science (New York, N.Y.)   4
                                  Nature   3
                          Nature methods   3
                         Scientific data   3
                                    Cell   2
                            Cell systems   2
                     Nature cell biology   2
                      Scientific reports   2
                        BMC cell biology   1
                        Cancer discovery   1
                            Cell reports   1
                    Current biology : CB   1
        Development (Cambridge, England)   1
                      Developmental cell   1
                             GigaScience   1
                      Journal of anatomy   1
                   Journal of proteomics   1
          