**Step 1:** Get articles from [EMA](https://www.ema.europa.eu/en/).
We can use the requests library to do this.

In [35]:
# import statements
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [36]:
# fetch web page (fail fast on network problems or HTTP errors)
r = requests.get("https://www.ema.europa.eu/en/human-regulatory/overview/public-health-threats/coronavirus-disease-covid-19/covid-19-whats-new", timeout=30)
r.raise_for_status()
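If the notebook is re-run often, transient server errors can interrupt the fetch. One option is a `requests.Session` that retries failed requests automatically. This is a minimal sketch layered on top of the notebook, not part of the original workflow: the helper name `make_session`, the retry count, the backoff factor and the list of status codes are illustrative assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Hypothetical helper: a Session that retries transient HTTP failures."""
    retry = Retry(
        total=total_retries,                         # overall cap on retry attempts
        backoff_factor=backoff,                      # wait time grows between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # retry on these status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)  # apply the retry policy to HTTPS URLs
    session.mount("http://", adapter)   # ... and to plain HTTP URLs
    return session


# Usage (this part performs a real network request, so it is left commented out):
# session = make_session()
# r = session.get("https://www.ema.europa.eu/en/", timeout=30)
# r.raise_for_status()
```

Mounting the adapter on a session keeps the retry policy in one place, so every later `session.get` call inherits it.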

**Step 2:** Parse the HTML with BeautifulSoup.
We use the "lxml" parser rather than "html5lib" because it is much faster; it requires the lxml package to be installed.
Printing the entire parsed page would make this notebook very large, so we omit a print statement here.


In [37]:
soup = BeautifulSoup(r.text, "lxml")
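Because "lxml" is a third-party package, one option when it might not be installed is to fall back to Python's built-in "html.parser". The sketch below assumes a hypothetical helper `make_soup`; it relies on BeautifulSoup raising `FeatureNotFound` when the requested parser is unavailable. The HTML snippet is made up for illustration and is not taken from the EMA page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html: str) -> BeautifulSoup:
    """Hypothetical helper: parse with lxml if installed, else html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python itself
        return BeautifulSoup(html, "html.parser")


# get_text() drops the tags and keeps only the text content
snippet = "<p>EMA <b>update</b></p>"
print(make_soup(snippet).get_text())  # → EMA update
```

Note that different parsers can build slightly different trees from malformed HTML, so results are most reproducible when one parser is used consistently.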

**Step 3:** Find all articles
The articles are the rows of a table with the class "ecl-table". Use BeautifulSoup's find method (and select_one, which takes a CSS selector) to select elements by tag type and class name. In Chrome, you can right-click an item on the web page and choose "Inspect" to view its HTML.

In [49]:
# Find all articles: each row of the table body is one article
table = soup.find("table", {"class": "ecl-table"}).select_one("tbody")
articles = list(table.find_all("tr"))
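The `find(...)` / `select_one("tbody")` / loop chain above can also be written as a single CSS selector passed to `select`. The HTML below is a hypothetical stand-in that only mimics the structure the notebook relies on (a `<table class="ecl-table">` whose `<tbody>` holds one `<tr>` per article); it uses "html.parser" so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML mimicking the table layout used by the notebook
html = """
<table class="ecl-table">
  <thead><tr><th>Date</th><th>Title</th></tr></thead>
  <tbody>
    <tr><td>28 April</td><td><a href="/a">First</a></td></tr>
    <tr><td>24 April</td><td><a href="/b">Second</a></td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# ">" means "direct child", so the header row inside <thead> is not matched
rows = soup.select("table.ecl-table > tbody > tr")
print(len(rows))  # → 2
```

Restricting the match to `<tbody>` is what keeps the header row out of the article list, just as `select_one("tbody")` does in the original cell.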

In [50]:
# Extract date
articles[1].find_all("td")[0].get_text().strip()

'24 April'

In [51]:
# Extract article title
articles[1].find_all("td")[1].get_text().strip()

'Availability of medicines during COVID-19 pandemic'

In [52]:
# Extract citation
articles[1].find_all("td")[2].get_text().strip()

'EMA provided an update on the measures EU authorities are putting in place to support the continued availability of medicines during the pandemic, following a meeting of the EU Executive Steering Group on Shortages of Medicines Caused by Major Events.'

In [53]:
# Extract link
articles[1].find_all("td")[1].select_one("a")['href']

'https://www.ema.europa.eu/en/human-regulatory/overview/public-health-threats/coronavirus-disease-covid-19/availability-medicines-during-covid-19-pandemic'

In [54]:
# Extract the "More information" column text
articles[1].find_all("td")[3].get_text()

'EU actions to support availability of medicines during COVID-19 pandemic – update #3\xa0(24/04/2020)'
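The cells above are read by position, and two details are worth handling when this is applied to every row: `select_one("a")` returns `None` when a cell has no link (so `['href']` would raise a `TypeError`), and the text can contain non-breaking spaces such as the `\xa0` in the output above. A minimal sketch, assuming a hypothetical `parse_row` helper and the same four-column layout (date, title, abstract, more information); NFKC normalization is one way to turn `\xa0` into a regular space, though it also rewrites other compatibility characters such as ligatures.

```python
import unicodedata

from bs4 import BeautifulSoup


def clean(text: str) -> str:
    """Replace non-breaking spaces (U+00A0) with normal spaces and trim."""
    return unicodedata.normalize("NFKC", text).strip()


def parse_row(tr) -> dict:
    """Hypothetical helper: turn one <tr> into a dict, tolerating a missing link."""
    cells = tr.find_all("td")
    link = cells[1].select_one("a")  # None if the title cell has no <a>
    return {
        "Title": clean(cells[1].get_text()),
        "Link": link["href"] if link is not None else None,
        "Abstract": clean(cells[2].get_text()),
        "PublishedDate": clean(cells[0].get_text()),
        "MoreInfoLink": clean(cells[3].get_text()),
    }


# Hypothetical row mirroring the column layout seen above
row_html = (
    "<table><tr><td> 24 April </td><td><a href='/x'>Title</a></td>"
    "<td>Abstract</td><td>update #3\xa0(24/04/2020)</td></tr></table>"
)
tr = BeautifulSoup(row_html, "html.parser").find("tr")
print(parse_row(tr))
```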

## Create a dataset from all articles

In [55]:
# Create data list: one row per article, in the column order used below
data = list()
for article in articles:
    cells = article.find_all("td")  # date, title, abstract, more information
    data.append([
        cells[1].get_text().strip(),       # Title
        cells[1].select_one("a")['href'],  # Link
        cells[2].get_text().strip(),       # Abstract
        cells[0].get_text().strip(),       # PublishedDate
        cells[3].get_text(),               # MoreInfoLink
    ])

In [56]:
# Create pandas dataframe
df = pd.DataFrame(data, columns=['Title', 'Link', 'Abstract', 'PublishedDate', 'MoreInfoLink'])
df.head()
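The `PublishedDate` column holds only a day and month (for example "24 April"), so it stays a plain string column. The "More information" text of one row carries a full date (24/04/2020), which suggests these entries are from 2020, but the table itself does not state the year; the year below is therefore an explicit assumption. A minimal sketch on a hypothetical frame with the same date format:

```python
import pandas as pd

# Hypothetical frame using the same PublishedDate format as df above
sample = pd.DataFrame({"PublishedDate": ["28 April", "24 April", "23 April"]})

ASSUMED_YEAR = 2020  # assumption: the EMA table omits the year
sample["PublishedDate"] = pd.to_datetime(
    sample["PublishedDate"] + f" {ASSUMED_YEAR}",  # e.g. "24 April 2020"
    format="%d %B %Y",                             # day, full month name, year
)
print(sample["PublishedDate"].dt.date.tolist())
```

Converting to real datetimes makes sorting and date filtering reliable; sorting the raw strings would order "23 April" before "3 May" only by accident.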

| | Title | Link | Abstract | PublishedDate | MoreInfoLink |
|---|---|---|---|---|---|
| 0 | International Coalition of Medicines Regulator... | https://www.ema.europa.eu/en/partners-networks... | EMA endorsed a joint statement by the members ... | 28 April | International regulators pledge collective sup... |
| 1 | Availability of medicines during COVID-19 pand... | https://www.ema.europa.eu/en/human-regulatory/... | EMA provided an update on the measures EU auth... | 24 April | EU actions to support availability of medicine... |
| 2 | Public-health advice during COVID-19 pandemic | https://www.ema.europa.eu/en/human-regulatory/... | EMA and the national competent authorities rem... | 24 April | Reporting suspected side effects of medicines ... |
| 3 | EMA’s governance during COVID-19 pandemic | https://www.ema.europa.eu/en/human-regulatory/... | EMA announced that essential work to combat th... | 23 April | Essential work to combat the COVID-19 pandemic... |
| 4 | Public-health advice during COVID-19 pandemic | https://www.ema.europa.eu/en/human-regulatory/... | EMA reminded patients and healthcare professio... | 23 April | COVID-19: reminder of risk of serious side eff... |


In [57]:
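# Export CSV (create the output folder first; skip the 0..n-1 index column)
import os
os.makedirs('data', exist_ok=True)
df.to_csv('data/emea_articles.csv', index=False)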
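By default, `to_csv` also writes the DataFrame index as an unnamed first column; when such a file is loaded with `read_csv`, that column comes back as `Unnamed: 0`. Writing with `index=False`, or reading with `index_col=0`, avoids the extra column. The sketch below demonstrates the behavior on a small hypothetical frame in a temporary directory, not on the notebook's own output file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame standing in for df
df_demo = pd.DataFrame({"Title": ["A", "B"], "PublishedDate": ["28 April", "24 April"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "articles.csv")
    df_demo.to_csv(path)  # default index=True: index written as an unnamed column

    # Read back naively: the index reappears as an "Unnamed: 0" column
    cols_default = pd.read_csv(path).columns.tolist()

    # Read back treating the first column as the index instead
    cols_indexed = pd.read_csv(path, index_col=0).columns.tolist()

print(cols_default)  # → ['Unnamed: 0', 'Title', 'PublishedDate']
print(cols_indexed)  # → ['Title', 'PublishedDate']
```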