# PubMed API Search

This script runs a PubMed search and gathers the PubMed IDs of papers that meet all of our concept, affiliation, author, and publication-date criteria.

Uses the Entrez E-utilities: 

https://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Introduction

https://pubmed.ncbi.nlm.nih.gov/help/

Search field tags:
https://www.ncbi.nlm.nih.gov/pmc/about/userguide/

In [104]:
from bs4 import BeautifulSoup
import requests

## 1) Define and tidy inputs

Originally copy/pasted from "TB Center Metrics Analysis - revised April 2023.docx". 

Things I changed manually:

- Replaced `("Kenya Medical Research Institute"[ad] OR Hospital"[ad])` with `"Kenya Medical Research Institute"[ad]` to prevent 'Hospital' being a search term.
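A stray term like `Hospital"[ad]` can be spotted automatically by counting the straight double quotes in each term. The sketch below is one way to do that; the helper name `unbalanced_terms` and the shortened sample query are illustrative and not part of the original notebook.

```python
def unbalanced_terms(query):
    """Return the OR-separated terms whose straight double quotes do not pair up."""
    # Strip surrounding spaces and parentheses so only the term itself is inspected
    terms = [t.strip(' ()') for t in query.split(' OR ')]
    return [t for t in terms if t.count('"') % 2 != 0]

# Shortened sample query (hypothetical), ending with the malformed group from the Word doc
sample = '("Kenya"[ad] OR ("Kenya Medical Research Institute"[ad] OR Hospital"[ad])'
print(unbalanced_terms(sample))
```

Any term it reports has an opening or closing quote missing and is worth checking by hand before the search is submitted.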

### Date (changes quarterly)

In [105]:
pubdate = '2023/07/01:2023/09/30[pdat]'
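Since this value changes every quarter, it could also be generated rather than typed by hand. This is a minimal sketch assuming calendar quarters (Q1 = January to March, and so on); the function name `quarter_pubdate` is hypothetical.

```python
import calendar

def quarter_pubdate(year, quarter):
    """Return a PubMed [pdat] range covering one calendar quarter (1-4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError('quarter must be 1, 2, 3, or 4')
    first_month = 3 * (quarter - 1) + 1
    last_month = first_month + 2
    # monthrange returns (weekday of first day, number of days in the month)
    last_day = calendar.monthrange(year, last_month)[1]
    return f'{year}/{first_month:02d}/01:{year}/{last_month:02d}/{last_day:02d}[pdat]'

print(quarter_pubdate(2023, 3))
```

For 2023 Q3 this reproduces the string used above.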

### Affiliations

In [106]:
affiliation = '("university of washington"[ad] OR "univ washington"[ad] OR "washington.edu"[ad] OR "uw.edu"[ad] OR "fred hutch*"[ad] OR "fred hutchinson"[ad] OR "fredhutch.org"[ad] OR "Tuberculosis Research and Training Center"[ad] OR "Center for Infectious Disease Research"[ad] OR "Infectious Diseases Research Institute"[ad] OR "Infectious Disease Research Institute"[ad] OR "Fhcrc.org"[ad] OR "path"[ad] OR "Veterans Affairs Puget Sound"[ad] OR "VA Puget Sound"[ad] OR "Center for Global Infectious Disease Research"[ad] OR "seattlechild*"[ad] OR "seattlechildrens.org"[ad] OR "Seattle child*"[ad] OR "cidresearch.org"[ad] OR "sbri.org"[ad] OR "bill and melinda"[ad] OR "bill melinda"[ad] OR "gates found*"[ad] OR "gatesfoundation.org"[ad] OR "Harborview"[ad] OR "UW Medical Center"[ad] OR "Seattle Cancer Care Alliance"[ad] OR "TB Discovery Research"[ad] OR "seattle"[ad] OR "puget sound"[ad] OR "seattlebiomed.org"[ad] OR "Seattle biomedical research"[ad] OR "E-Science Institute"[ad] OR "Institute for Systems Biology"[ad] OR "sytemsbiology.org"[ad] OR "Lawrence Berkeley National Laboratory"[ad] OR "isbscience.org"[ad] OR "Seattle Structural Genomics Center"[ad] OR "PAI Life Sciences"[ad] OR "King County"[ad] OR "Kwazulu"[ad] OR "Natal"[ad] OR "Edendale Hospital"[ad] OR "south Africa"[ad] OR "University of Nairobi"[ad] OR "Kenya"[ad] OR "treatmentactiongroup.org"[ad] OR "TAG"[ad] OR "treatment action group"[ad]  OR ("Kenya Medical Research Institute"[ad] OR Hospital"[ad])'

In [107]:
affil_list = affiliation.\
    replace('(', '').\
    replace(')', '').\
    replace('"Kenya Medical Research Institute"[ad] OR Hospital"[ad]', 
            '"Kenya Medical Research Institute"[ad]').\
    replace('  OR ', ' OR ').\
    split(' OR ')
affil_list

['"university of washington"[ad]',
 '"univ washington"[ad]',
 '"washington.edu"[ad]',
 '"uw.edu"[ad]',
 '"fred hutch*"[ad]',
 '"fred hutchinson"[ad]',
 '"fredhutch.org"[ad]',
 '"Tuberculosis Research and Training Center"[ad]',
 '"Center for Infectious Disease Research"[ad]',
 '"Infectious Diseases Research Institute"[ad]',
 '"Infectious Disease Research Institute"[ad]',
 '"Fhcrc.org"[ad]',
 '"path"[ad]',
 '"Veterans Affairs Puget Sound"[ad]',
 '"VA Puget Sound"[ad]',
 '"Center for Global Infectious Disease Research"[ad]',
 '"seattlechild*"[ad]',
 '"seattlechildrens.org"[ad]',
 '"Seattle child*"[ad]',
 '"cidresearch.org"[ad]',
 '"sbri.org"[ad]',
 '"bill and melinda"[ad]',
 '"bill melinda"[ad]',
 '"gates found*"[ad]',
 '"gatesfoundation.org"[ad]',
 '"Harborview"[ad]',
 '"UW Medical Center"[ad]',
 '"Seattle Cancer Care Alliance"[ad]',
 '"TB Discovery Research"[ad]',
 '"seattle"[ad]',
 '"puget sound"[ad]',
 '"seattlebiomed.org"[ad]',
 '"Seattle biomedical research"[ad]',
 '"E-Science Insti

In [108]:
# Turn back into a big string
affil_str = '(' + ' OR '.join(affil_list) + ')'
affil_str

'("university of washington"[ad] OR "univ washington"[ad] OR "washington.edu"[ad] OR "uw.edu"[ad] OR "fred hutch*"[ad] OR "fred hutchinson"[ad] OR "fredhutch.org"[ad] OR "Tuberculosis Research and Training Center"[ad] OR "Center for Infectious Disease Research"[ad] OR "Infectious Diseases Research Institute"[ad] OR "Infectious Disease Research Institute"[ad] OR "Fhcrc.org"[ad] OR "path"[ad] OR "Veterans Affairs Puget Sound"[ad] OR "VA Puget Sound"[ad] OR "Center for Global Infectious Disease Research"[ad] OR "seattlechild*"[ad] OR "seattlechildrens.org"[ad] OR "Seattle child*"[ad] OR "cidresearch.org"[ad] OR "sbri.org"[ad] OR "bill and melinda"[ad] OR "bill melinda"[ad] OR "gates found*"[ad] OR "gatesfoundation.org"[ad] OR "Harborview"[ad] OR "UW Medical Center"[ad] OR "Seattle Cancer Care Alliance"[ad] OR "TB Discovery Research"[ad] OR "seattle"[ad] OR "puget sound"[ad] OR "seattlebiomed.org"[ad] OR "Seattle biomedical research"[ad] OR "E-Science Institute"[ad] OR "Institute for Syste

### Concepts

These don't change regularly. Copy/pasted from "TB Center Metrics Analysis - revised April 2023.docx". 

In [109]:
concept = '("tubercul*"[tw] OR "Antitubercul*"[tw] OR "Anti-Tubercul*"[tw] OR "osteotubercul*"[tw] OR "nephrotubercul*"[tw] OR "anthracosilicotubercul*"[tw] OR "coniotubercul*"[tw] OR "Tuberculin"[tw] OR "tb"[tw] OR "xdr-tb"[tw] OR "xdrtb"[tw] OR "mdr-tb"[tw] OR "mdrtb"[tw] OR "phthisis"[tw] OR "pneumonophthisis"[tw] OR "pneumophthisiology"[tw] OR "silicotubercul*"[tw] OR "bazin disease"[tw] OR "erythema induratum"[tw] OR "white swelling"[tw] OR "king`s evil"[tw] OR "scrofula"[tw] OR "pott disease"[tw] OR "koch`s disease"[tw] OR "Interferon-gamma Release Test"[tw] OR "Tuberculosis"[Mesh] OR "Mycobacterium tuberculosis"[Mesh] OR "Antitubercular Agents"[Mesh] OR "Tuberculin Test"[Mesh] OR "Interferon-gamma Release Tests"[Mesh] OR "Tuberculosis Vaccines"[Mesh])'
concept_list = concept.\
    replace('[Mesh]', '[mh]').\
    split(' OR ')

concept_list

['("tubercul*"[tw]',
 '"Antitubercul*"[tw]',
 '"Anti-Tubercul*"[tw]',
 '"osteotubercul*"[tw]',
 '"nephrotubercul*"[tw]',
 '"anthracosilicotubercul*"[tw]',
 '"coniotubercul*"[tw]',
 '"Tuberculin"[tw]',
 '"tb"[tw]',
 '"xdr-tb"[tw]',
 '"xdrtb"[tw]',
 '"mdr-tb"[tw]',
 '"mdrtb"[tw]',
 '"phthisis"[tw]',
 '"pneumonophthisis"[tw]',
 '"pneumophthisiology"[tw]',
 '"silicotubercul*"[tw]',
 '"bazin disease"[tw]',
 '"erythema induratum"[tw]',
 '"white swelling"[tw]',
 '"king`s evil"[tw]',
 '"scrofula"[tw]',
 '"pott disease"[tw]',
 '"koch`s disease"[tw]',
 '"Interferon-gamma Release Test"[tw]',
 '"Tuberculosis"[mh]',
 '"Mycobacterium tuberculosis"[mh]',
 '"Antitubercular Agents"[mh]',
 '"Tuberculin Test"[mh]',
 '"Interferon-gamma Release Tests"[mh]',
 '"Tuberculosis Vaccines"[mh])']

In [110]:
# Turn back into a big string
concept_str = ' OR '.join(concept_list)
concept_str

'("tubercul*"[tw] OR "Antitubercul*"[tw] OR "Anti-Tubercul*"[tw] OR "osteotubercul*"[tw] OR "nephrotubercul*"[tw] OR "anthracosilicotubercul*"[tw] OR "coniotubercul*"[tw] OR "Tuberculin"[tw] OR "tb"[tw] OR "xdr-tb"[tw] OR "xdrtb"[tw] OR "mdr-tb"[tw] OR "mdrtb"[tw] OR "phthisis"[tw] OR "pneumonophthisis"[tw] OR "pneumophthisiology"[tw] OR "silicotubercul*"[tw] OR "bazin disease"[tw] OR "erythema induratum"[tw] OR "white swelling"[tw] OR "king`s evil"[tw] OR "scrofula"[tw] OR "pott disease"[tw] OR "koch`s disease"[tw] OR "Interferon-gamma Release Test"[tw] OR "Tuberculosis"[mh] OR "Mycobacterium tuberculosis"[mh] OR "Antitubercular Agents"[mh] OR "Tuberculin Test"[mh] OR "Interferon-gamma Release Tests"[mh] OR "Tuberculosis Vaccines"[mh])'
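The split-then-join pattern used for the affiliation and concept strings can be wrapped in two small helpers. This is only a sketch: `split_or_terms` and `join_or_terms` are hypothetical names, and it assumes a flat query with one outer pair of parentheses and no nested groups.

```python
import re

def split_or_terms(query):
    """Split a parenthesised 'A OR B OR C' query into a list of bare terms."""
    inner = query.strip()
    if inner.startswith('(') and inner.endswith(')'):
        inner = inner[1:-1]  # drop only the outermost pair of parentheses
    # \s+ tolerates double spaces around OR, as in the pasted affiliation string
    return [term.strip() for term in re.split(r'\s+OR\s+', inner)]

def join_or_terms(terms):
    """Rebuild a single parenthesised OR query from a list of terms."""
    return '(' + ' OR '.join(terms) + ')'

terms = split_or_terms('("tubercul*"[tw]  OR "tb"[tw] OR "Tuberculosis"[mh])')
print(terms)
print(join_or_terms(terms))
```

Unlike a plain `split(' OR ')`, this leaves no parenthesis attached to the first or last term, so the list is easier to edit before it is joined back together.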

### Author list 

The original list was split by author specialty and had redundancies. It seems that " ` " and " ' " are interchangeable, and that a space in front of the name doesn't matter.

Manually replace:
- ' with `
- `” [Au] “` with `” [Au] OR “`
- `[Au]"` with `"[Au]`
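Much of this manual clean-up could probably be replaced by a regular expression that pulls out each quoted name regardless of quote style, spacing, or the casing of `[Au]`. The sketch below assumes every name is wrapped in straight or curly double quotes and followed by an `[au]` tag; `extract_author_names` is a hypothetical helper, and the sample string is a shortened mix of the patterns seen in the pasted lists.

```python
import re

def extract_author_names(raw):
    """Pull the quoted names out of a '"Name"[Au] OR ...' string, de-duplicated."""
    # Accept straight or curly quotes, optional spaces, and any casing of [au]
    pattern = r'[“”"]\s*([^“”"]+?)\s*[“”"]\s*\[au\]'
    names = re.findall(pattern, raw, flags=re.IGNORECASE)
    # Collapse runs of whitespace so "Fiore-Gartland  A" equals "Fiore-Gartland A"
    cleaned = [' '.join(name.split()) for name in names]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(cleaned))

sample = ('“Nag D”[Au] OR “Fiore-Gartland  A” [Au] or “ Gasper M" [Au] '
          'OR "Buckner FS"[AU] OR “Nag D” [Au]')
print(extract_author_names(sample))
```

Normalising whitespace before de-duplicating matters here: variants that differ only by a doubled or leading space would otherwise be counted as separate authors.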

In [111]:
# Copy/pasted from word doc
authors_clinic = '“Nag D”[Au] OR “Duffy F”[Au] OR “Kearney J”[Au] OR “Trehan I”[Au] OR “Cangelosi G”[Au] OR “Andriesen J”[Au] OR “Coler R”[Au] OR “Kgoadi K”[Au] OR “Fiore-Gartland  A”[Au] OR “Johnson A”[Au] OR “Rais M”[Au] OR “Sanders L”[Au] OR “Attia E”[Au] or “Horne D”[Au] OR “Okwaro E”[Au] OR “Tapley A”[Au] OR “Stein G”[Au] OR “Sorri Y”[Au] OR “Bender Ignacio R”[Au] OR “Wald A”[Au] OR “Jaspan H”[Au] OR “Kumar A”[Au] OR “Pecor T”[Au] OR “Zhu Z”[Au] OR “Mudrock E”[Au] OR “Olson A”[Au] OR “Kublin J”[Au] OR “Randhawa A”[Au] OR “Andersen-N E”[Au] OR “Mayer-Blackwell K”[Au] OR “Kieswetter N”[Au] OR “Harband M”[Au] OR “Harne R”[Au] OR “Akins S”[Au] OR “Larsen Akins S”[Au] OR “Valerie R”[Au] OR “Gottlieb G”[Au] OR “Shapiro A”[Au] OR “LaCourse S”[Au] OR “Graham S”[Au] OR “Smytheman T”[Au] OR “Cowan J”[Au] OR “Iribarren S”[Au]'
authors_compbio = '“Nag D” [Au] OR “Duffy F” [Au] OR “Fiore-Gartland A” [Au] OR “Mayer-Blackwell, K” [Au] OR “Kieswetter N” [Au] OR “Abdelaal H” [Au] OR “Segnitz M” [Au] OR “Ma S” [Au] OR “Pepper E” [Au] OR “Allen R” [Au] OR “Mixon T” [Au] OR “Himmelfarb T” [Au] OR “Bishop E” [Au] OR “Bustad E” [Au] OR “Grosvenor D” [Au]'
authors_epi = '“Nag D” [Au] OR “Kearney J” [Au] OR “Trehan I” [Au] OR “Cangelosi G” [Au] or “Johnson A” [Au] or “Johnson A” [Au] OR “Rais M” [Au] or “Attia E” [Au] OR “Horne D” [Au] OR “Okwaro E” [Au] OR “Sorri Y” [Au] OR “Bender Ignacio R” [Au] OR “Wald A” [Au] OR “Kublin J” [Au] OR “Gottlieb G” [Au] OR “Shapiro A” [Au] OR “LaCourse S” [Au] OR “Graham S” [Au] OR “Motiri F” [Au] OR “LeGrand K” [Au] OR “Cherkos A” [Au] OR “Fajans M” [Au] OR “Tram K” [Au] OR “Basting A” [Au] OR “Ross J” [Au] OR “King`ori B” [Au] OR “Ndoti A” [Au] OR “Black D” [Au] OR “Magomere R” [Au] OR “Kerani R” [Au] OR “Church E” [Au] OR  “Escudero J” [Au] OR “John-Stewart G” [Au] OR “Dalmat R” [Au] OR “Berkoh H” [Au]'
authors_globalhealth = '“Nag D” [Au] OR “Kearney J” [Au] OR “Trehan I” [Au] OR “Cangelosi G” [Au] or “Johnson A” [Au] or “Johnson A” [Au] OR “Rais M” [Au] or “Attia E” [Au] OR “Horne D” [Au] OR “Okwaro E” [Au] OR “Kublin J” [Au] OR “Gottlieb G” [Au] OR “Shapiro A” [Au] OR “LaCourse S” [Au] OR “Graham S” [Au] OR “Motiri F” [Au] OR “LeGrand K” [Au] OR “Cherkos A” [Au] OR “Fajans M” [Au] OR “Tram K” [Au] OR “Basting A” [Au] OR “Ndoti A” [Au] OR “Black D” [Au] OR “Escudero J” [Au] OR “John-Stewart G” [Au] OR “Abdelaal H” [Au] OR “Coler R” [Au] OR “Sanders L” [Au] OR “Tapley A” [Au] OR “Stein G” [Au] OR “Jaspan H” [Au] OR “Kumar A” [Au] OR “Pecor T” [Au] OR “Randhawa A” [Au] OR “Andersen-Nissen E” [Au] OR “Smytheman T” [Au] OR “Cowan J” [Au] OR “Iribarren S” [Au] OR “Sharma M” [Au] OR “Henry N” [Au] OR “Brumwell A” [Au] OR “Buckner F” [Au] OR “Duncombe C” [Au] OR “Armistead B” [Au] OR “Panpradist N” [Au] OR “Hill D” [Au] OR “Miller D” [Au] OR “Haynes A” [Au] OR “Chohan B” [Au] OR “Seshadri C” [Au] OR “Williams B” [Au] OR “Barrett H” [Au] OR “West E” [Au]'
authors_immun = '“Nag D” [Au] OR “Kearney J” [Au] OR “Johnson A” [Au] OR “Rais M” [Au] or “Kublin J” [Au] OR “Ndoti A” [Au] OR “Abdelaal H” [Au] OR “Coler R” [Au] OR “Jaspan H” [Au] OR “Kumar A” [Au] OR “Pecor T” [Au] OR “Randhawa A” [Au] OR “Andersen-Nissen E” [Au] OR “Duncombe C” [Au] OR “Armistead B” [Au] OR “Miller D” [Au] OR “Seshadri C” [Au] OR “Williams B” [Au] OR “Barrett H” [Au] OR “West E” [Au] OR “Church E” [Au] OR  “Duffy F” [Au] OR “Fiore-Gartland A” [Au] OR “Mayer-Blackwell K” [Au] OR “Kieswetter N” [Au] OR “Allen R” [Au] OR “Himmelfarb T” [Au] OR “Bishop E” [Au] OR “Kgoadi K” [Au] OR “Zhu Z” [Au] OR “Harband M” [Au] OR “Harne R” [Au] OR “Akins S” [Au] OR “Larsen Akins S” [Au] OR “Reese V” [Au] OR “Pepple K” [Au] OR “Chendi B” [Au] OR “Subramanian N” [Au] OR “Bhuiyan M” [Au] OR “Kaushansky A” [Au] OR “Anterasian C” [Au] OR “Maciag K” [Au] OR “Fleming L” [Au] OR “Huynh T H” [Au] OR “Harrington W” [Au] OR “Harding C” [Au] OR “Kim H” [Au] OR “Ho K” [Au] OR “Klas J” [Au] OR “Gasper M" [Au] OR “Simmons J” [Au] OR “Fernandez M” [Au] OR “Krug S” [Au] OR “Le C” [Au] OR “Weigel K” [Au] OR “Gern B” [Au] OR “Cross L” [Au] OR “Urdahl K” [Au] OR “Plumlee C” [Au] OR “Cohen S” [Au] OR “Chen S” [Au] OR “Baldwin S” [Au] OR “Gerner M” [Au] OR “Shah J” [Au] OR “Reynolds A” [Au] OR “Winter C” [Au] OR “Shamskhou E” [Au] OR “ Foster K” [Au] OR “Maerz M” [Au] OR “Villagrana P” [Au] OR “Tappen V” [Au] OR “Files M” [Au] OR “Makatsa M” [Au] OR “Hawn T” [Au] OR “Koelle D” [Au] OR “ Ramirez E” [Au] OR “Layton E” [Au] OR “Phan J” [Au] OR  “Yu K” [Au] OR “ Richardson S” [Au]'
authors_micro = '“Nag D” [Au] OR “Kearney J” [Au] OR “Johnson A” [Au] OR “Rais M” [Au] OR “Abdelaal H” [Au] OR “Coler R” [Au] OR “Jaspan H” [Au] OR “Kumar A” [Au] OR “Pecor T” [Au] OR “Armistead B” [Au] OR “Miller D” [Au] OR “Allen R” [Au] OR “Kgoadi K” [Au] OR “Zhu Z” [Au] OR “Subramanian N” [Au] OR “Bhuiyan M” [Au] OR “Kaushansky A” [Au] OR “Fleming L” [Au] OR “Huynh T H” [Au] OR “Harrington W” [Au] OR “Harding C” [Au] OR “Kim H” [Au] OR “Klas J” [Au] OR “ Gasper M” [Au] OR “Simmons J” [Au] OR “Fernandez M” [Au] OR “Krug S” [Au] OR “Le C” [Au] OR “Weigel K” [Au] Or “Cangelosi G” [Au] OR “Sanders L” [Au] OR “Buckner F” [Au] OR “Panpradist N” [Au] OR “Haynes A” [Au] OR “Chohan B” [Au] OR “King`ori B” [Au] OR “Ma S” [Au] OR “Pepper E” [Au] OR “Mixon T” [Au] OR “Olson A” [Au] OR “Galina L” [Au] OR “Boradia V” [Au] OR “Parish T” [Au] OR “Chowdhury S” [Au] OR “Peterson E” [Au] OR “Sherman D” [Au] OR “Hernandez R” [Au] OR “Tieu E” [Au] OR “Deshpande A” [Au] OR “Bhagwat A” [Au] OR “Frando A” [Au] OR “Rodrigues da Costa F” [Au] OR “Eydinova A” [Au] OR “Butts A” [Au] OR “Berube B” [Au] OR “Ames L” [Au] OR “Coldren M” [Au] OR “Lial R” [Au] OR “Jones M” [Au] OR “Grundner C” [Au] OR “Carroll B” [Au] OR “Lamot E” [Au] OR “Fredricks L” [Au] OR “Eldesouky H” [Au] OR “Brache J” [Au] OR “Adams K” [Au] OR “Pruneda A” [Au] OR “Nilles E” [Au] OR “Wier J” [Au] OR “Thong Q” [Au]'
authors_modeling = '“Nag D” [Au] OR “Kearney J” [Au] OR “Abdelaal H” [Au] OR “Duffy F” [Au] OR “Pepple K” [Au] OR “Motiri F” [Au] OR “LeGrand K” [Au] OR “Cherkos A” [Au] OR “Fajans M” [Au] OR “Tram K” [Au] OR “Basting A” [Au] OR “Sharma M” [Au] OR “Henry N” [Au] OR “Ross J” [Au]'
authors_pubhealth = '“Nag D” [Au] OR “Kearney J” [Au] OR “Motiri F” [Au] OR “LeGrand K” [Au] OR “Cherkos A” [Au] OR “Fajans M” [Au] OR “Tram K” [Au] OR “Basting A” [Au] OR “Johnson A” [Au] OR “Rais M” [Au] OR “Armistead B” [Au] OR “Sanders L” [Au] OR “Panpradist N” [Au] OR “ King`ori B” [Au] OR “Ndoti A” [Au] OR “Trehan I” [Au] OR “Attia E” [Au] OR “Horne D” [Au] OR “Okwaro E” [Au] OR “Black D” [Au] OR “Tapley A” [Au] OR “Stein G” [Au] OR “Brumwell A” [Au] OR “Hill D” [Au] OR “Sorri Y” [Au] OR “Bender Ignacio R” [Au] OR “Wald A” [Au] OR “Magomere R” [Au] OR “Kerani R” [Au] OR “Ranade D” [Au] OR “Segura P” [Au] OR “Budd K” [Au] OR “Ghassemieh B” [Au] OR “Wood R” [Au]'
authors_sysbio = '“Nag D” [Au] OR “Abdelaal H” [Au] OR “Duffy F” [Au] OR “Coler R” [Au] OR “Kgoadi K” [Au] OR “Subramanian N” [Au] OR “Bhuiyan M” [Au] OR “Kaushansky A” [Au] OR “Ma S” [Au] OR “Pepper E” [Au] OR “Mortiz R” [Au] OR “Peterson E” [Au] OR “Sherman D” [Au] OR “Duncombe C” [Au] OR “Fiore-Gartland A” [Au] OR “Anterasian C” [Au] OR “Maciag K” [Au] OR “Ong S” [Au]'
authors_other = '"Buckner FS"[AU] OR "Buckner Fred*"[AU] OR “Mortiz R” [Au] OR “Trehan I” [Au] OR “Brumwell A” [Au] OR “Ranade D” [Au] OR “Pepple K” [Au] OR “Sharma M” [Au] OR “Cangelosi G” [Au] OR “Galina L” [Au] OR “Boradia V” [Au] OR “Parish T” [Au] OR “Chowdhury S” [Au] OR “Chendi B” [Au] OR “Segnitz M” [Au] OR “Andriesen J” [Au]'

Tidy up more

In [112]:
# Combine into one long string, then split into a list of names
authors = ' OR '.join([authors_clinic, authors_compbio, authors_epi, authors_globalhealth,
            authors_immun, authors_micro, authors_modeling, authors_pubhealth,
            authors_sysbio, authors_other]).\
                replace(' or ', ' OR ').\
                replace(' Or ', ' OR ').\
                replace('AU', 'au').\
                replace('Au', 'au').\
                replace('\" [au] \"', '[au] OR ').\
                split('[au] OR ')

# Remove whitespace
author_names = [s.strip() for s in authors]  # leading and trailing

# Handle special (last) case
author_names2 = [s.replace('“Andriesen J” [au]', '“Andriesen J”') for s in author_names] 

# Remove duplicates
author_names3 = list(set(author_names2))

print(len(author_names3))
author_names3

175


['“Urdahl K”',
 '“Boradia V”',
 '“Frando A”',
 '“Berube B”',
 '“Cherkos A”',
 '“Williams B”',
 '“Segnitz M”',
 '“Seshadri C”',
 '“Lamot E”',
 '“Himmelfarb T”',
 '“Fajans M”',
 '“Gerner M”',
 '“Zhu Z”',
 '“Akins S”',
 '“Tieu E”',
 '“Nag D”',
 '“Wier J”',
 '“Wood R”',
 '“Pecor T”',
 '“Bhagwat A”',
 '“Krug S”',
 '“Simmons J”',
 '“Ames L”',
 '“West E”',
 '“Files M”',
 '“Chohan B”',
 '“Mayer-Blackwell, K”',
 '“Rais M”',
 '“Nilles E”',
 '“Hawn T”',
 '“Grundner C”',
 '“Duffy F”',
 '“Kumar A”',
 '“Abdelaal H”',
 '“Kearney J”',
 '“Cross L”',
 '“Barrett H”',
 '“Kieswetter N”',
 '“Tappen V”',
 '“LeGrand K”',
 '“Harband M”',
 '“Layton E”',
 '“Harrington W”',
 '“Magomere R”',
 '“Budd K”',
 '“Carroll B”',
 '“Church E”',
 '“Okwaro E”',
 '“Gottlieb G”',
 '“Graham S”',
 '“Brumwell A”',
 '“Larsen Akins S”',
 '“Ranade D”',
 '“Maerz M”',
 '“Rodrigues da Costa F”',
 '“Reynolds A”',
 '“Fiore-Gartland  A”',
 '“Shapiro A”',
 '“Haynes A”',
 '“King`ori B”',
 '“Coler R”',
 '“Bishop E”',
 '“Weigel K”',
 '“Yu K”',

Split into two big strings. With 99 or more names in a single query, the URL request doesn't return anything.

In [113]:
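# Tag every name with [au]; joining on '[au] OR ' would leave the last name in each string untagged
authors_str1 = '(' + ' OR '.join(name + '[au]' for name in author_names3[:88]) + ')'
authors_str2 = '(' + ' OR '.join(name + '[au]' for name in author_names3[88:]) + ')'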

authors_str1

'(“Urdahl K”[au] OR “Boradia V”[au] OR “Frando A”[au] OR “Berube B”[au] OR “Cherkos A”[au] OR “Williams B”[au] OR “Segnitz M”[au] OR “Seshadri C”[au] OR “Lamot E”[au] OR “Himmelfarb T”[au] OR “Fajans M”[au] OR “Gerner M”[au] OR “Zhu Z”[au] OR “Akins S”[au] OR “Tieu E”[au] OR “Nag D”[au] OR “Wier J”[au] OR “Wood R”[au] OR “Pecor T”[au] OR “Bhagwat A”[au] OR “Krug S”[au] OR “Simmons J”[au] OR “Ames L”[au] OR “West E”[au] OR “Files M”[au] OR “Chohan B”[au] OR “Mayer-Blackwell, K”[au] OR “Rais M”[au] OR “Nilles E”[au] OR “Hawn T”[au] OR “Grundner C”[au] OR “Duffy F”[au] OR “Kumar A”[au] OR “Abdelaal H”[au] OR “Kearney J”[au] OR “Cross L”[au] OR “Barrett H”[au] OR “Kieswetter N”[au] OR “Tappen V”[au] OR “LeGrand K”[au] OR “Harband M”[au] OR “Layton E”[au] OR “Harrington W”[au] OR “Magomere R”[au] OR “Budd K”[au] OR “Carroll B”[au] OR “Church E”[au] OR “Okwaro E”[au] OR “Gottlieb G”[au] OR “Graham S”[au] OR “Brumwell A”[au] OR “Larsen Akins S”[au] OR “Ranade D”[au] OR “Maerz M”[au] OR “Rodri
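If the author list grows, a fixed split into two halves may no longer stay under the limit. The sketch below generalises the split to any number of batches; `chunked_author_queries` is a hypothetical helper, and the default of 88 names per batch simply mirrors the split point used in this notebook.

```python
def chunked_author_queries(names, max_names=88):
    """Return one parenthesised '[au]' OR-query per batch of at most max_names names."""
    queries = []
    for start in range(0, len(names), max_names):
        batch = names[start:start + max_names]
        # Append the tag to each name so the last name in a batch is tagged too
        queries.append('(' + ' OR '.join(name + '[au]' for name in batch) + ')')
    return queries

# Three names in batches of two, for illustration
demo = chunked_author_queries(['"Nag D"', '"Duffy F"', '"Coler R"'], max_names=2)
print(demo)
```

Each returned string can be searched separately, and the resulting ID lists combined with a set union.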

## 2) Build the URLs

When all the terms are combined into one query, the URL is too long, so the search is broken up into separate queries and the resulting PubMed IDs are intersected afterwards.

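Request up to 1 million records per query (retmax=1000000). ESearch itself returns at most 10,000 PubMed IDs per request, which is still enough here: UW publishes fewer than 5,000 papers in a 3-month span.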

In [114]:
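import os

# Read the NCBI account API key from the environment instead of hard-coding it.
# NCBI_API_KEY is an assumed variable name; set it to your own key before running.
api_key = os.environ.get('NCBI_API_KEY', '')

prefix = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=' + pubdate + '+AND+'
suffix = '&retmax=1000000&usehistory=y'
if api_key:
    suffix += '&api_key=' + api_key  # E-utilities read the key from the api_key parameter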

In [115]:
affil_url = prefix + affil_str + suffix
concept_url = prefix + concept_str + suffix
author1_url = prefix + authors_str1 + suffix
author2_url = prefix + authors_str2 + suffix
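The URLs above are built by plain string concatenation, so spaces, quotes, and brackets are left for the HTTP client to encode. To see how long a query really is once encoded, it can be assembled with `urllib.parse.urlencode`. This is a sketch under assumptions: `build_esearch_url` is a hypothetical helper, and the short term block is a stand-in for the real strings.

```python
from urllib.parse import urlencode

ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

def build_esearch_url(term_block, pubdate, retmax=10000):
    """Return a percent-encoded ESearch URL for one block of terms within a date range."""
    params = {
        'db': 'pubmed',
        'term': pubdate + ' AND ' + term_block,
        'retmax': retmax,
        'usehistory': 'y',
    }
    # urlencode percent-encodes quotes, brackets, and slashes and turns spaces into '+'
    return ESEARCH_URL + '?' + urlencode(params)

url = build_esearch_url('("tb"[tw] OR "Tuberculosis"[mh])', '2023/07/01:2023/09/30[pdat]')
print(len(url))
print(url)
```

Printing the encoded length for each query gives a quick way to check which ones are approaching the size at which requests stop returning results.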

## 3) Submit the searches

In [116]:
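def get_pubmed_ids(url):
    """Submit one ESearch URL and return the PubMed IDs listed in the XML response."""
    page = requests.get(url).text
    result = BeautifulSoup(page, 'xml')  # the 'xml' parser needs lxml installed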
    IDlist = [i.text for i in result.find_all('Id')]
    print(len(IDlist))
    return IDlist
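The E-utilities also accept HTTP POST, which NCBI recommends for long queries because the terms travel in the request body rather than in the URL. The sketch below is an alternative to `get_pubmed_ids`, not the notebook's method: it uses only the standard library, and the names `parse_id_list` and `post_esearch` are hypothetical. The sample XML is hand-written to exercise the parser offline and is not real search output.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

def parse_id_list(xml_data):
    """Extract the PubMed IDs from an ESearch XML response (str or bytes)."""
    root = ET.fromstring(xml_data)
    return [node.text for node in root.findall('./IdList/Id')]

def post_esearch(term, retmax=10000, timeout=60):
    """Send one ESearch query as a POST request and return its PubMed IDs."""
    body = urllib.parse.urlencode({'db': 'pubmed', 'term': term, 'retmax': retmax})
    # Passing data= makes urlopen issue a POST; HTTP errors raise urllib.error.HTTPError
    with urllib.request.urlopen(ESEARCH_URL, data=body.encode('utf-8'), timeout=timeout) as response:
        return parse_id_list(response.read())

# Offline check of the parser on a hand-written response
sample_xml = ('<eSearchResult><Count>2</Count>'
              '<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>')
print(parse_id_list(sample_xml))
```

With POST, splitting the terms into several searches may become unnecessary, although that would need to be confirmed against the real query strings.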

In [117]:
affil_ids = get_pubmed_ids(affil_url)
concpet_ids = get_pubmed_ids(concept_url)
author1_ids = get_pubmed_ids(author1_url)
author2_ids = get_pubmed_ids(author2_url)

# Combine author results
author_ids = list(set(author1_ids + author2_ids))
print(len(author_ids))

0


2876
2163
4075
6178


## 4) Overlap the filters

Which PubMed IDs appear in all the lists?

In [118]:
filtered = set.intersection(set(affil_ids),
                            set(concpet_ids),
                            set(author_ids))
len(filtered)

0
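When the intersection comes back empty, it helps to see how many IDs survive after each filter is applied, since a single empty list is enough to empty the whole result. A minimal sketch, assuming the ID lists are passed in a dict keyed by filter name; `intersect_with_report` is a hypothetical helper and the IDs are toy values.

```python
def intersect_with_report(named_id_lists):
    """Intersect ID lists in order, printing how many IDs survive each filter."""
    remaining = None
    for name, ids in named_id_lists.items():
        id_set = set(ids)
        remaining = id_set if remaining is None else remaining & id_set
        print(f'{name}: {len(id_set)} unique IDs, {len(remaining)} left after this filter')
    return remaining or set()

# Toy IDs for illustration only, not real PubMed results
toy = {
    'affiliation': ['1', '2', '3'],
    'concept': ['2', '3', '4'],
    'author': ['3', '5'],
}
common = intersect_with_report(toy)
print(sorted(common))
```

In this notebook the call would take something like `{'affiliation': affil_ids, 'concept': concpet_ids, 'author': author_ids}`, and the first line reporting zero remaining IDs points at the filter to investigate.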

In [119]:
out = ';'.join(list(filtered))
print(out)


