**Step 1:** Get articles from [ACP](https://www.acpjournals.org/topic/category/coronavirus).
We can use the requests library to do this.

In [17]:
# import statements
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [18]:
# fetch web page; stop early if the server returns an HTTP error status
r = requests.get("https://www.acpjournals.org/topic/category/coronavirus")
r.raise_for_status()

**Step 2:** Use BeautifulSoup to parse the HTML.
Use the "lxml" parser rather than "html5lib".
Printing the whole parsed page would produce more output than this notebook can comfortably display, so we omit a print statement here.


In [19]:
soup = BeautifulSoup(r.text, "lxml")

**Step 3:** Find all articles.
Use BeautifulSoup's find_all method to select elements by tag type and class name. In Chrome, you can right-click an item on the web page and click "Inspect" to view its HTML.
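
To see how selection by tag and class behaves without fetching the live page, here is a minimal sketch on a hand-written HTML fragment. The fragment and its `h5` tags are invented for illustration and only mimic the class names used in this notebook; the sketch uses Python's built-in "html.parser" so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that only mimics the class names used in this notebook
sample_html = """
<ul>
  <li class="search__item"><h5 class="issue-item__title"><a href="/doi/10.0000/example-1">First title</a></h5></li>
  <li class="search__item"><h5 class="issue-item__title"><a href="/doi/10.0000/example-2">Second title</a></h5></li>
  <li class="other">Not an article</li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Select every <li> whose class attribute includes "search__item"
sample_items = sample_soup.find_all("li", class_="search__item")

# Pull the link text out of each matched item
sample_titles = [item.select_one(".issue-item__title a").get_text() for item in sample_items]
print(len(sample_items), sample_titles)
```

The third `li` is skipped because its class does not match, which is the same filtering the real page relies on.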

In [20]:
# Find all articles (find_all is the current method name; findAll is a legacy alias)
articles = soup.find_all("li", {"class":"search__item"})

In [21]:
# Extract Type
articles[1].select_one(".meta__heading").get_text()

'Web Exclusives'

In [22]:
# Extract article title
articles[1].select_one(".issue-item__title").select_one("a").get_text()

'Annals Graphic Medicine - What I Learned From COVID-19 (Until Now)'

In [23]:
# Extract article Availability
articles[1].select_one(".issue-item__title").select_one("span").get_text()

'FREE'

In [24]:
# Extract article Pub Date
articles[1].select_one(".meta__epubDate").get_text()

'12 May 2020'
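
The publication date comes back as plain text. If you later want to sort or filter by date, one option is to parse it with the standard library. This sketch assumes every date follows the day, full English month name, year pattern seen above; `parse_pub_date` is a hypothetical helper name, not part of the notebook.

```python
from datetime import datetime, date

def parse_pub_date(text: str) -> date:
    """Parse a date such as '12 May 2020'.

    Assumed format: day, full month name, year. Month names are matched
    using the current locale, which is English by default.
    """
    return datetime.strptime(text.strip(), "%d %B %Y").date()

parsed = parse_pub_date("12 May 2020")
print(parsed.isoformat())
```

A string in any other format raises `ValueError`, which makes unexpected date layouts easy to spot.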

In [25]:
# Extract article DOI
articles[1].select_one(".issue-item__title").select_one("a")["href"].replace("/doi/", "")

'10.7326/G20-0045'

In [26]:
# Extract article Authors
articles[1].select_one(".issue-item__authors").get_text()

'Lucia Briatore, MD, PhD, Ilaria Pozzi'

In [42]:
# Extract article Link
link = articles[4].select_one(".issue-item__title").select_one("a")["href"]
link

'/doi/10.7326/L20-0354'
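
The href is site-relative, so it needs the site's origin in front of it before it can be requested. A small sketch using the standard library's `urljoin`; `BASE_URL` and `absolute_url` are names introduced here for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.acpjournals.org"  # origin of the page fetched in Step 1

def absolute_url(href: str) -> str:
    """Resolve a site-relative href against BASE_URL.

    An href that is already an absolute URL is returned unchanged.
    """
    return urljoin(BASE_URL, href)

print(absolute_url("/doi/10.7326/L20-0354"))
```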

## Create dataset from All Articles

In [31]:
# Create data list: one row per article, in the column order used for the DataFrame
data = list()
for article in articles:
    # The title link carries both the title text and the article href
    title_link = article.select_one(".issue-item__title").select_one("a")
    data.append([
        title_link.get_text(),                                               # Title
        title_link["href"].replace("/doi/", ""),                             # DOI
        article.select_one(".meta__heading").get_text(),                     # Type
        article.select_one(".issue-item__title").select_one("span").get_text(),  # Availability
        article.select_one(".meta__epubDate").get_text(),                    # PublishedDate
        article.select_one(".issue-item__authors").get_text(),               # Authors
        "",                                                                  # Abstract (left empty here)
        title_link["href"],                                                  # Link
    ])
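
The loop above assumes every article contains all of these elements. `select_one` returns `None` when nothing matches, and calling `.get_text()` on `None` raises `AttributeError`. A minimal sketch of a defensive helper follows; `text_or_default` is a hypothetical name, and the HTML fragment is invented for the demonstration.

```python
from bs4 import BeautifulSoup

def text_or_default(node, selector, default=""):
    """Return the stripped text of the first match for selector, or default if nothing matches."""
    match = node.select_one(selector)
    return match.get_text(strip=True) if match is not None else default

# Invented fragment: the item has a title but no author list
demo_item = BeautifulSoup(
    '<li class="search__item"><h5 class="issue-item__title"><a href="#">A title</a></h5></li>',
    "html.parser",
)

print(repr(text_or_default(demo_item, ".issue-item__title a")))
print(repr(text_or_default(demo_item, ".issue-item__authors")))
```

Swapping calls like this into the loop would trade a crash on an incomplete item for an empty cell in the resulting row.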

In [32]:
# Create pandas dataframe
df = pd.DataFrame(data, columns = ['Title', 'DOI', 'Type', 'Availability', 'PublishedDate', 'Authors', 'Abstract', 'Link'])
df.head()

| | Title | DOI | Type | Availability | PublishedDate | Authors | Abstract | Link |
|---|---|---|---|---|---|---|---|---|
| 0 | Annals On Call - Clinical Reasoning and Testin... | 10.7326/A19-0031 | Web Exclusives | FREE | 12 May 2020 | Robert M. Centor, MD, Rabih Geha, MD, Reza Man... | | /doi/10.7326/A19-0031 |
| 1 | Annals Graphic Medicine - What I Learned From ... | 10.7326/G20-0045 | Web Exclusives | FREE | 12 May 2020 | Lucia Briatore, MD, PhD, Ilaria Pozzi | | /doi/10.7326/G20-0045 |
| 2 | Cytokine Levels in the Body Fluids of a Patien... | 10.7326/L20-0354 | Letters | FREE | 12 May 2020 | Changsong Wang, PhD, Kai Kang, MD, Yan Gao, Ph... | | /doi/10.7326/L20-0354 |
| 3 | Pharmacokinetics of Lopinavir and Ritonavir in... | 10.7326/M20-1550 | Letters | FREE | 12 May 2020 | Christian Schoergenhofer, MD, PhD, Bernd Jilma... | | /doi/10.7326/M20-1550 |
| 4 | Annals Consult Guys - Hydroxychloroquine: Upda... | 10.7326/W19-0038 | Web Exclusives | FREE | 12 May 2020 | Geno J. Merli, MD, Howard H. Weitz, MD | | /doi/10.7326/W19-0038 |


In [57]:
# Export CSV
df.to_csv('data/emea_articles.csv')

In [59]:
# Get Abstract from link
def get_abstract(link):
    # link is site-relative (e.g. "/doi/..."), so prepend the site origin
    d = requests.get("https://www.acpjournals.org" + link)
    content_soup = BeautifulSoup(d.text, "lxml")
    abst = content_soup.find("div", {"class":"hlFld-Fulltext"})
    if abst is None:
        # No full-text container on this page, so there is nothing to print
        return
    # Print the first five paragraphs of the article body
    for par in abst.find_all("p")[:5]:
        print(par.get_text())

In [60]:
get_abstract(link)

Background: Some patients with coronavirus disease 2019 (COVID-19) progress rapidly to acute respiratory distress syndrome (ARDS), septic shock, and multiple organ failure (1). Some experts attribute this sequence of events to a large increase in cytokines (cytokine storm) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) or a secondary infection by another organism.
Objective: To report cytokine levels in multiple body fluids from a patient with COVID-19 and ARDS, septic shock, and multiple organ failure.
Case Report: On 20 January 2020, a 66-year-old man who had been exposed to a patient with COVID-19 developed cough and fever and treated himself at home. On 2 February, his cough and fever gradually worsened, his body temperature reached 38.7 °C, and he developed diarrhea and vomiting. He was treated at a local hospital, where his medical history included vitiligo, gastric ulcer, coronary heart disease, and chronic obstructive pulmonary disease. He developed 
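
Because `get_abstract` prints rather than returns, the `Abstract` column in the dataset stays empty. A hedged sketch of a variant that returns the text instead is shown here. It works on an HTML string that has already been downloaded, so it can be tried without network access; `extract_abstract`, `max_paragraphs`, and the demo HTML are introduced for illustration, and the sketch assumes the article body sits in the same `hlFld-Fulltext` container used above.

```python
from bs4 import BeautifulSoup

def extract_abstract(html, max_paragraphs=5):
    """Return the first few paragraphs of the full-text container as one string.

    Assumes the article body lives in <div class="hlFld-Fulltext">;
    returns an empty string when that container is absent.
    """
    page = BeautifulSoup(html, "html.parser")
    body = page.find("div", class_="hlFld-Fulltext")
    if body is None:
        return ""
    # limit stops the search once enough paragraphs have been found
    paragraphs = body.find_all("p", limit=max_paragraphs)
    return "\n".join(p.get_text(strip=True) for p in paragraphs)

# Invented page fragment for the demonstration
demo_html = '<div class="hlFld-Fulltext"><p>Background: one.</p><p>Objective: two.</p></div>'
print(extract_abstract(demo_html))
```

Given the HTML of an article page (for example the `.text` of a `requests` response), the returned string could be stored in that article's `Abstract` cell.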