**Step 1:** Get text from [PubMed](https://www.ncbi.nlm.nih.gov/research/coronavirus)
You can use the `requests` library to fetch the search results.
The full response is large and could overload the space available to load this notebook, so we omit a print statement here.


In [9]:
# import statements
import requests
import pandas as pd

In [10]:
# Fetch search results from the API
r = requests.get("https://www.ncbi.nlm.nih.gov/research/coronavirus-api/search/?filters=%7B%7D&sort=score%20desc")
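
The query string in the URL above is percent-encoded by hand, which is hard to read and edit. As a minimal sketch, the same URL can be built from plain parameters with the standard library. The helper name `build_search_url` is hypothetical; the parameter names `filters` and `sort` are taken from the URL above, and nothing else about the API is assumed.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/research/coronavirus-api/search/"


def build_search_url(filters=None, sort="score desc"):
    """Return the search URL with percent-encoded query parameters."""
    params = {
        "filters": json.dumps(filters or {}),  # an empty filter encodes as %7B%7D
        "sort": sort,
    }
    # quote_via=quote encodes spaces as %20 rather than "+", matching the URL above.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url()
print(url)
```

The result can be passed to `requests.get(url, timeout=30)`. Calling `raise_for_status()` on the response before parsing it turns an HTTP error into an exception instead of a confusing JSON failure later on.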

**Step 2:** Parse the JSON response from the API

In [11]:
articles = r.json()['results']
print(articles[0])

{'pmid': 32335184, 'title': 'Coronavirus Disease 2019 (COVID-19) and Radiology Education-Strategies for Survival.', 'journal': 'J Am Coll Radiol', 'authors': ['Slanetz, Priscilla J', 'Parikh, Ujas', 'Chapman, Teresa', 'Moutzas, Cari'], 'date': '2020-04-27T11:00:00Z', '_id': '32335184', 'topics': ['Prevention'], 'text_hl': None}
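
Indexing `r.json()['results']` directly raises a bare `KeyError` if the response has a different shape than expected. The sketch below shows one way to check the payload first. The function name `extract_results` and the sample payload are hypothetical, with the sample abbreviated from the record printed above.

```python
def extract_results(payload):
    """Return the list stored under 'results', or raise a descriptive error."""
    if not isinstance(payload, dict) or "results" not in payload:
        raise ValueError("response JSON has no 'results' key")
    results = payload["results"]
    if not isinstance(results, list):
        raise ValueError("'results' is not a list")
    return results


# Hypothetical payload shaped like the response above (fields abbreviated).
sample_payload = {"results": [{"pmid": 32335184, "journal": "J Am Coll Radiol"}]}
articles_checked = extract_results(sample_payload)
print(articles_checked[0]["pmid"])
```

In the notebook this would be called as `extract_results(r.json())`.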


**Step 3:** Extract the fields of a single article

In [12]:
# Extract article PMID
articles[0].get('pmid')

32335184

In [13]:
# Extract article title
articles[0].get('title')

'Coronavirus Disease 2019 (COVID-19) and Radiology Education-Strategies for Survival.'

In [14]:
# Extract article journal
articles[0].get('journal')

'J Am Coll Radiol'

In [15]:
# Extract article authors
articles[0].get('authors')

['Slanetz, Priscilla J', 'Parikh, Ujas', 'Chapman, Teresa', 'Moutzas, Cari']

In [16]:
# Extract published date
articles[0].get('date')

'2020-04-27T11:00:00Z'
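
The date comes back as a string. If a real date object is needed later, for sorting or filtering, it can be parsed with the standard library. This is a sketch that assumes every date follows the `YYYY-MM-DDTHH:MM:SSZ` pattern seen above; the helper name `parse_published_date` is hypothetical.

```python
from datetime import datetime, timezone


def parse_published_date(value):
    """Convert 'YYYY-MM-DDTHH:MM:SSZ' to a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing "Z" means UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


published = parse_published_date("2020-04-27T11:00:00Z")
print(published.date())
```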

In [17]:
# Extract topics
articles[0].get('topics')

['Prevention']

In [18]:
# Extract text highlights (None for this article, so the cell shows no output)
articles[0].get('text_hl')
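
The cells above use `.get()` rather than square brackets, so a missing key yields `None` instead of an error. One detail worth knowing is that a default passed to `.get()` only applies when the key is absent, not when it is present with the value `None`, as `text_hl` is here. The record below is a hypothetical abbreviation of the article above, and the `abstract` key is made up purely to show the missing-key case.

```python
# Hypothetical record shaped like the article above (fields abbreviated).
article = {"pmid": 32335184, "topics": ["Prevention"], "text_hl": None}

missing = article.get("abstract")              # key absent: returns None
fallback = article.get("abstract", "n/a")      # key absent: the default is used
present_none = article.get("text_hl", "n/a")   # key present with None: returns None

print(missing, fallback, present_none)
```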

## Create dataset from All Articles

In [19]:
# Create data list
data = list()
for article in articles:
    data.append(
        [article.get('pmid'),
         article.get('title'),
         article.get('journal'),
         article.get('authors'),
         article.get('date'),
         article.get('topics'),
         article.get('text_hl')
    ])
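
The loop above repeats `article.get(...)` once per field. An equivalent sketch keeps the field names in one tuple, so that adding or removing a column is a one-line change. The names `FIELDS` and `article_to_row` are hypothetical, and the sample record is abbreviated from the output shown earlier.

```python
FIELDS = ("pmid", "title", "journal", "authors", "date", "topics", "text_hl")


def article_to_row(article, fields=FIELDS):
    """Return the article's values in a fixed field order, None where missing."""
    return [article.get(field) for field in fields]


# Hypothetical record with some fields left out on purpose.
sample_articles = [
    {"pmid": 32335184, "journal": "J Am Coll Radiol", "topics": ["Prevention"]},
]
rows = [article_to_row(a) for a in sample_articles]
print(rows[0])
```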

In [20]:
# Create pandas dataframe
df = pd.DataFrame(data, columns=['PMID', 'Title', 'Journal', 'Authors', 'PublishedDate', 'Topics', 'HighLights'])
df.head()

Unnamed: 0,PMID,Title,Journal,Authors,PublishedDate,Topics,HighLights
0,32335184,Coronavirus Disease 2019 (COVID-19) and Radiol...,J Am Coll Radiol,"[Slanetz, Priscilla J, Parikh, Ujas, Chapman, ...",2020-04-27T11:00:00Z,[Prevention],
1,32335585,The Coronavirus Pandemic: What Does the Eviden...,J Nepal Health Res Counc,"[Paudel, Shishir, Dangal, Ganesh, Chalise, Ani...",2020-04-27T11:00:00Z,"[Diagnosis, Mechanism, Treatment, Prevention]",
2,32335168,Dynamics of anti-SARS-Cov-2 IgM and IgG antibo...,J Infect,"[Lee, Yu-Lin, Liao, Chia-Hung, Liu, Po-Yu, Che...",2020-04-27T11:00:00Z,"[Treatment, Diagnosis]",
3,32335416,Coronavirus Disease 2019 (COVID-19) and dermat...,Ecotoxicol Environ Saf,"[Emadi, Seyed-Naser, Abtahi-Naeini, Bahareh]",2020-04-27T11:00:00Z,[Prevention],
4,32335233,Personal protective equipment in the response ...,Int J Surg,"[Ali, Yousif, Alradhawi, Mohammad, Shubber, No...",2020-04-27T11:00:00Z,[Prevention],
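
The `Authors` and `Topics` columns hold Python lists and `PublishedDate` holds strings. Lists are written to CSV as their text representation, which is awkward to read back. The sketch below shows one possible cleanup on a one-row frame built from values in the table above. The `"; "` separator is an arbitrary choice, not something the API prescribes.

```python
import pandas as pd

# One-row frame using values from the table above.
sample = pd.DataFrame(
    {
        "Authors": [["Emadi, Seyed-Naser", "Abtahi-Naeini, Bahareh"]],
        "PublishedDate": ["2020-04-27T11:00:00Z"],
        "Topics": [["Prevention"]],
    }
)

# Join each list into a single delimited string.
sample["Authors"] = sample["Authors"].apply(lambda names: "; ".join(names))
sample["Topics"] = sample["Topics"].apply(lambda topics: "; ".join(topics))

# Parse the ISO 8601 strings into timezone-aware timestamps.
sample["PublishedDate"] = pd.to_datetime(sample["PublishedDate"], utc=True)

print(sample.dtypes)
```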


In [21]:
# Export CSV
df.to_csv('data/pubmed_articles.csv')
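
`to_csv` fails if the `data/` directory does not exist, and by default it also writes the row index as an unnamed first column. The sketch below shows one way to handle both. The helper name `export_csv` is hypothetical, and the demonstration writes to a temporary directory so that it does not touch the notebook's own files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_csv(frame, path):
    """Write frame to path without the index, creating parent folders first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    return path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "pubmed_articles.csv"
    written = export_csv(pd.DataFrame({"PMID": [32335184]}), target)
    header = written.read_text().splitlines()[0]

print(header)
```

In the notebook this would be called as `export_csv(df, 'data/pubmed_articles.csv')`.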