**Step 1:** Get text from [PubMed](https://www.ncbi.nlm.nih.gov/research/coronavirus)
You can use the `requests` library to do this.
Printing the full response (JavaScript, CSS, and text) may exceed the space available to render this notebook, so we omit the print statement here.


In [1]:
# import statements
import requests
import pandas as pd

In [2]:
# fetch web page
r = requests.get("https://www.ncbi.nlm.nih.gov/research/coronavirus-api/search/?filters=%7B%7D&sort=score%20desc")
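The query string in the request above is percent-encoded: `%7B%7D` decodes to `{}` and `%20` decodes to a space. Below is a minimal sketch of building the same URL from readable values with the standard library. The helper name `build_search_url` is hypothetical, and the meaning of `filters` and `sort` is inferred from their names, not from API documentation.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "https://www.ncbi.nlm.nih.gov/research/coronavirus-api/search/"


def build_search_url(filters: str = "{}", sort: str = "score desc") -> str:
    """Return the search URL with percent-encoded query parameters (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would emit '+'
    query = urlencode({"filters": filters, "sort": sort}, quote_via=quote)
    return f"{SEARCH_ENDPOINT}?{query}"


url = build_search_url()
print(url)
```

The resulting string can be passed to `requests.get(url, timeout=10)`, ideally followed by `r.raise_for_status()` so that HTTP errors surface before the JSON is parsed.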

**Step 2:** Parse the JSON returned by the API

In [12]:
# Parse the response body once and reuse the result
payload = r.json()
print(payload.keys())
articles = payload['results']
print(articles[0])

dict_keys(['results', 'facets', 'page_size', 'current', 'count', 'total_pages'])
{'pmid': 32338254, 'title': 'Olfactory and rhinological evaluations in SARS-CoV-2 patients complaining of olfactory loss.', 'journal': 'Rhinology', 'authors': ['Ottaviano, G', 'Carecchio, M', 'Scarpa, B', 'Marchese-Ragona, R'], 'date': '2020-04-28T11:00:00Z', '_id': '32338254', 'countries': ['China'], 'topics': ['Diagnosis'], 'text_hl': None}
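The `current` and `total_pages` keys suggest that a single response holds only one page of results. Below is a rough sketch of walking every page, under two assumptions: page numbering starts at 1, and the caller supplies a `fetch_page` function that returns the parsed JSON for a given page. The query parameter that selects a page is not shown in this notebook, so it is deliberately left out; `iter_all_articles` and `fake_fetch_page` are hypothetical names.

```python
from typing import Callable, Dict, Iterator


def iter_all_articles(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every article across pages, using the 'results' and 'total_pages' keys."""
    page = 1  # assumption: the first page is numbered 1
    while True:
        payload = fetch_page(page)
        yield from payload.get("results", [])
        if page >= payload.get("total_pages", 1):
            break
        page += 1


# Stand-in for a real request, so the sketch runs without network access
def fake_fetch_page(page: int) -> Dict:
    pages = {1: [{"pmid": 1}, {"pmid": 2}], 2: [{"pmid": 3}]}
    return {"results": pages[page], "current": page, "total_pages": 2}


all_articles = list(iter_all_articles(fake_fetch_page))
print(len(all_articles))
```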


**Step 3:** Extract fields from an article

In [12]:
# Extract article PMID
articles[0].get('pmid')

32335184

In [13]:
# Extract article title
articles[0].get('title')

'Coronavirus Disease 2019 (COVID-19) and Radiology Education-Strategies for Survival.'

In [14]:
# Extract article journal
articles[0].get('journal')

'J Am Coll Radiol'

In [15]:
# Extract article authors
articles[0].get('authors')

['Slanetz, Priscilla J', 'Parikh, Ujas', 'Chapman, Teresa', 'Moutzas, Cari']

In [16]:
# Extract published date
articles[0].get('date').split("T")[0]

'2020-04-27'

In [17]:
# Extract topics
articles[0].get('topics')

['Prevention']

In [18]:
# Extract text Highlights
articles[0].get('text_hl')
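The individual lookups above can be gathered into one helper that flattens an article into a record. This is a minimal sketch: the function name `article_to_record` is hypothetical, the output keys mirror the column names used later in this notebook, and the sample dictionary is the first article printed in Step 2.

```python
def article_to_record(article: dict) -> dict:
    """Collect the fields used in this notebook into one flat dictionary."""
    date = article.get("date") or ""
    return {
        "PMID": article.get("pmid"),
        "Title": article.get("title"),
        "Journal": article.get("journal"),
        "Authors": article.get("authors") or [],
        # Keep only the calendar date, dropping the time after 'T'
        "PublishedDate": date.split("T")[0],
        "Topics": article.get("topics") or [],
        "HighLights": article.get("text_hl"),
    }


sample = {
    "pmid": 32338254,
    "title": "Olfactory and rhinological evaluations in SARS-CoV-2 patients complaining of olfactory loss.",
    "journal": "Rhinology",
    "authors": ["Ottaviano, G", "Carecchio, M", "Scarpa, B", "Marchese-Ragona, R"],
    "date": "2020-04-28T11:00:00Z",
    "topics": ["Diagnosis"],
    "text_hl": None,
}
record = article_to_record(sample)
print(record["PMID"], record["PublishedDate"])
```

Using `.get()` with fallbacks means a missing key yields `None` or an empty value instead of raising `KeyError`.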

## Create dataset from All Articles

In [25]:
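# Get details (abstract and DOI) from the publication detail endpoint
def get_details(docid: int):
    print("getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/" + str(docid))
    try:
        d = requests.get("https://www.ncbi.nlm.nih.gov/research/coronavirus-api/publication/" + str(docid), timeout=10)
        details = d.json()
        # The abstract is read from index 1 of the 'text' list
        return details.get('text')[1], details.get('doi')
    except (requests.RequestException, ValueError, TypeError, IndexError):
        # Network error, non-JSON body, or missing/short 'text' field.
        # The caller unpacks two values, so return an (abstract, doi) pair.
        return "N/A", "N/A"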
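Separating the parsing from the network call makes the abstract lookup easy to check without hitting the API. The sketch below assumes only what the function above relies on: the detail payload has a `text` list whose element at index 1 is the abstract, and a `doi` key. The name `parse_details` and the sample payloads are hypothetical.

```python
from typing import Tuple


def parse_details(payload: dict) -> Tuple[str, str]:
    """Return (abstract, doi) from a detail payload, falling back to 'N/A'."""
    text = payload.get("text")
    # Guard against a missing 'text' key or a list with fewer than two items
    if isinstance(text, (list, tuple)) and len(text) > 1:
        abstract = text[1]
    else:
        abstract = "N/A"
    doi = payload.get("doi") or "N/A"
    return abstract, doi


print(parse_details({"text": ["first item", "An abstract."], "doi": "10.1000/example"}))
print(parse_details({}))
```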

In [26]:
# Create data list
data = list()
for article in articles:
    details, doi = get_details(article.get('pmid'))
    data.append(
        [article.get('pmid'),
         doi,
         article.get('title'),
         article.get('journal'),
         article.get('authors'),
         details,
         article.get('date'),
         article.get('topics'),
         article.get('text_hl')
    ])

getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32338254
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32337113
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32336594
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32336833
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32336243
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32339089
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32337139
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32336725
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32336398
getting abstract from https://www.ncbi.nlm.nih.gov/research/coronavirus/publication/32337584


In [28]:
# Create pandas dataframe
df = pd.DataFrame(data, columns = ['PMID', 'DOI', 'Title', 'Journal', 'Authors', 'Abstract', 'PublishedDate', 'Topics', 'HighLights'])
df.head()

| | PMID | DOI | Title | Journal | Authors | Abstract | PublishedDate | Topics | HighLights |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 32338254 | 10.4193/Rhin20.136 | Olfactory and rhinological evaluations in SARS... | Rhinology | [Ottaviano, G, Carecchio, M, Scarpa, B, Marche... | Since December 2019, a novel coronavirus SARS-... | 2020-04-28T11:00:00Z | [Diagnosis] | |
| 1 | 32337113 | 10.7759/cureus.7386 | Brief Review on COVID-19: The 2020 Pandemic Ca... | Cureus | [Valencia, Damian N] | Severe acute respiratory syndrome coronavirus ... | 2020-04-28T11:00:00Z | [Diagnosis, Treatment, Transmission, Mechanism] | |
| 2 | 32336594 | 10.1016/j.resinv.2020.03.006 | COVID-19 outbreak: An elusive enemy. | Respir Investig | [Kikuchi, Toshiaki] | A novel coronavirus, officially termed as seve... | 2020-04-28T11:00:00Z | [General Info] | |
| 3 | 32336833 | 10.1590/0100-3984.2020.53.2e1 | Information about the new coronavirus disease ... | Radiol Bras | [Lima, Claudio Marcio Amaral de Oliveira] | Radiol Bras | 2020-04-28T11:00:00Z | [Diagnosis] | |
| 4 | 32336243 | 10.1177/1089253220921590 | Anesthesia and COVID-19: What We Should Know a... | Semin Cardiothorac Vasc Anesth | [Tang, Linda Y, Wang, Jingping] | Coronavirus disease 2019 (COVID-19), caused by... | 2020-04-28T11:00:00Z | [Diagnosis, Prevention, Treatment] | |


In [29]:
# Export CSV
df.to_csv('data/pubmed_articles.csv')
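Two details of the CSV export are worth knowing. By default `to_csv` also writes the row index, which shows up as an `Unnamed: 0` column when the file is read back, and list-valued columns such as `Authors` and `Topics` are stored as their string representation. The sketch below shows one way to handle both, using a small illustrative frame and an in-memory buffer instead of the real file; `df_demo` and its contents are made up for the example.

```python
import ast
import io

import pandas as pd

# Small stand-in for the real dataframe, with one list-valued column
df_demo = pd.DataFrame({"PMID": [32338254], "Topics": [["Diagnosis", "Treatment"]]})

buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)  # index=False avoids an extra index column
buffer.seek(0)

# ast.literal_eval turns the stored text "['Diagnosis', 'Treatment']" back into a list
restored = pd.read_csv(buffer, converters={"Topics": ast.literal_eval})
print(restored.columns.tolist())
print(restored.loc[0, "Topics"])
```

When writing to `data/pubmed_articles.csv` as above, the `data` directory has to exist beforehand, since `to_csv` does not create missing directories.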