**Step 1:** Get articles from [Elsevier](https://www.sciencedirect.com/search?qs=%22COVID-19%22%20OR%20Coronavirus%20OR%20%22Corona%20virus%22%20OR%20Coronaviruses%20OR%20%222019-nCoV%22%20OR%20%22SARS-CoV%22%20OR%20%22MERS-CoV%22%20OR%20%E2%80%9CSevere%20Acute%20Respiratory%20Syndrome%E2%80%9D%20OR%20%E2%80%9CMiddle%20East%20Respiratory%20Syndrome%E2%80%9D&show=100&ent=true).
Elsevier provides an [API](https://dev.elsevier.com/apikey/manage) to access its ScienceDirect articles.
We can use the requests library to do this.

In [45]:
# import statements
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [5]:
# Query the ScienceDirect search API
r = requests.get("https://api.elsevier.com/content/search/sciencedirect?query=covid&apiKey=7f59af901d2d86f78a1fd60c1bf9426a")
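The request above writes the query and the API key directly into the URL string. As a minimal sketch, the same URL can be assembled from its parts with the standard library, which keeps the key out of the notebook and URL-encodes the query. The environment variable name `ELSEVIER_API_KEY` and the helper `build_search_url` are assumptions made for illustration; they are not part of the Elsevier API.

```python
import os
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.elsevier.com/content/search/sciencedirect"


def build_search_url(query: str, api_key: str) -> str:
    """Return the search URL with the query and API key URL-encoded."""
    return SEARCH_ENDPOINT + "?" + urlencode({"query": query, "apiKey": api_key})


# ELSEVIER_API_KEY is an assumed variable name chosen for this sketch.
api_key = os.environ.get("ELSEVIER_API_KEY", "YOUR_API_KEY")
url = build_search_url("covid", api_key)

# The actual request would then look like this (not executed here):
# r = requests.get(url, timeout=10)
# r.raise_for_status()  # fail early on an HTTP error status
```

Calling `raise_for_status()` before parsing makes an invalid key or a rate limit show up as an HTTP error rather than as a confusing failure in a later cell.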

**Step 2:** Parse the JSON response returned by the API

In [24]:
articles = r.json().get('search-results').get('entry')
print(articles[0].keys())

dict_keys(['@_fa', 'load-date', 'link', 'dc:identifier', 'prism:url', 'dc:title', 'dc:creator', 'prism:publicationName', 'prism:coverDate', 'prism:doi', 'openaccess', 'pii', 'authors'])
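Chaining `.get('search-results').get('entry')` raises an `AttributeError` if the response has no `search-results` key, for example when the API returns an error body. A small sketch of a more defensive lookup follows; `extract_entries` is a hypothetical helper and the sample payload is made up for illustration.

```python
def extract_entries(payload: dict) -> list:
    """Return the list of article entries, or [] if the expected keys are absent."""
    results = payload.get("search-results") or {}
    return results.get("entry") or []


# Made-up payload that mimics the shape of the search response.
sample = {"search-results": {"entry": [{"dc:title": "Example title"}]}}

print(len(extract_entries(sample)))
print(extract_entries({}))
```

In the notebook this would be used as `articles = extract_entries(r.json())`.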


**Step 3:** Extract the fields of the first article

In [28]:
# Extract Article ID
articles[0].get('dc:identifier')

'DOI:10.1016/j.neurol.2020.04.004'

In [29]:
# Extract Published Date
articles[0].get('load-date').split("T")[0]

'2020-04-20'
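The `load-date` field is a timestamp string such as `2020-04-20T00:00:00.000Z`. Splitting on `"T"` keeps the date part as text; a sketch of an alternative that yields a real `date` object is shown here. The helper name `parse_load_date` is hypothetical, and the format string assumes every entry uses the same layout as the one above.

```python
from datetime import date, datetime


def parse_load_date(value: str) -> date:
    """Parse a timestamp like '2020-04-20T00:00:00.000Z' into a date."""
    # Assumes the fixed layout seen in the response: fractional seconds plus a literal 'Z'.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").date()


print(parse_load_date("2020-04-20T00:00:00.000Z"))  # 2020-04-20
```

A `date` object sorts and compares correctly, which helps if the articles are later filtered by publication period.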

In [30]:
# Extract Title
articles[0].get('dc:title')

'Guidance for the care of neuromuscular patients during the COVID-19 pandemic outbreak from the French Rare Health Care for Neuromuscular Diseases Network'

In [33]:
# Extract Pub Name
articles[0].get('prism:publicationName')

'Revue Neurologique'

In [39]:
# Extract Creator
articles[0].get('dc:creator')

'G. Solé'

In [38]:
# Extract Authors
articles[0].get('authors')

{'author': [{'$': 'G. Solé'},
  {'$': 'E. Salort-Campana'},
  {'$': 'Y. Pereon'},
  {'$': 'T. Stojkovic'},
  {'$': 'FILNEMUS COVID-19 study group'}]}
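The `authors` field is a nested structure: a dict whose `author` key holds a list of `{'$': name}` dicts. A minimal sketch for flattening it into a plain list of names follows. `author_names` is a hypothetical helper, and the handling of a missing field or a single author returned as a dict is a guess about possible response shapes, not confirmed behaviour of the API.

```python
def author_names(authors) -> list:
    """Flatten {'author': [{'$': name}, ...]} into [name, ...]."""
    if not authors:
        return []
    entries = authors.get("author") or []
    # Assumption: a lone author might arrive as a dict rather than a list.
    if isinstance(entries, dict):
        entries = [entries]
    return [e["$"] for e in entries if isinstance(e, dict) and e.get("$")]


authors = {'author': [{'$': 'G. Solé'},
                      {'$': 'E. Salort-Campana'},
                      {'$': 'Y. Pereon'},
                      {'$': 'T. Stojkovic'},
                      {'$': 'FILNEMUS COVID-19 study group'}]}
print("; ".join(author_names(authors)))
```

Storing the joined string instead of the raw dict makes the `Authors` column easier to read once it has been written to CSV.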

In [34]:
# Extract Link
articles[0].get('prism:url')

'https://api.elsevier.com/content/article/pii/S0035378720305233'

In [35]:
# Extract DOI
articles[0].get('prism:doi')

'10.1016/j.neurol.2020.04.004'

In [36]:
# Extract openaccess
articles[0].get('openaccess')

False

In [37]:
# Extract PII
articles[0].get('pii')

'S0035378720305233'

## Create dataset from All Articles

In [94]:
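def get_abstract(url: str) -> str:
    """Fetch the article at `url` and return its abstract (the `dc:description` field)."""
    print("getting abstract from " + url)
    # Ask for JSON explicitly and give up after 10 seconds
    d = requests.get(str(url) + "?apiKey=7f59af901d2d86f78a1fd60c1bf9426a",
                     headers={'Accept': 'application/json'},
                     timeout=10)
    return d.json()["full-text-retrieval-response"]["coredata"]["dc:description"]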
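The `dc:description` value can come back with a leading "Abstract" label and runs of whitespace, and it may be missing entirely for some articles. A sketch of a clean-up step is given here; `clean_abstract` is a hypothetical helper, and the exact layout of the raw text is an assumption based on the truncated values shown in the dataframe preview.

```python
import re


def clean_abstract(text) -> str:
    """Collapse whitespace and drop a leading 'Abstract' label, if present."""
    if not text:
        return ""
    collapsed = re.sub(r"\s+", " ", text).strip()
    # \b keeps words that merely start with 'Abstract' (e.g. 'Abstraction') intact.
    return re.sub(r"^Abstract\b\s*", "", collapsed)


print(clean_abstract("\n   Abstract\n  \n  The virus spreads."))  # The virus spreads.
```

It could be applied to the return value of `get_abstract` before the row is appended to `data`.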

In [95]:
# Create data list
data = list()
for article in articles:
    data.append(
        [article.get('dc:identifier'),
         article.get('load-date'),
         article.get('dc:title'),
         get_abstract(article.get('prism:url')),
         article.get('prism:publicationName'),
         article.get('dc:creator'),
         article.get('authors'),
         article.get('prism:url'),
         article.get('prism:doi'),
         'Open' if bool(article.get('openaccess')) else 'Private',
         article.get('pii')
    ])

getting abstract from https://api.elsevier.com/content/article/pii/S0035378720305233
getting abstract from https://api.elsevier.com/content/article/pii/S0003497520305877
getting abstract from https://api.elsevier.com/content/article/pii/S0049384820301407
getting abstract from https://api.elsevier.com/content/article/pii/S2468024920311700
getting abstract from https://api.elsevier.com/content/article/pii/S0003426620300627
getting abstract from https://api.elsevier.com/content/article/pii/S1525861020303479
getting abstract from https://api.elsevier.com/content/article/pii/S001502822030385X
getting abstract from https://api.elsevier.com/content/article/pii/S0022522320310114
getting abstract from https://api.elsevier.com/content/article/pii/S0040595720300688
getting abstract from https://api.elsevier.com/content/article/pii/S1201971220302770
getting abstract from https://api.elsevier.com/content/article/pii/S1386653220301219
getting abstract from https://api.elsevier.com/content/article/pi
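The loop above builds each row as a positional list, so the order of values has to match the column list given to `pd.DataFrame` later. One alternative, sketched here, is to build each row as a dict keyed by column name. `article_to_row` is a hypothetical helper; it takes the abstract as an argument so that the mapping itself involves no network call.

```python
def article_to_row(article: dict, abstract: str = "") -> dict:
    """Map one search-result entry to a row keyed by column name."""
    return {
        "ID": article.get("dc:identifier"),
        "PublishedDate": article.get("load-date"),
        "Title": article.get("dc:title"),
        "Abstract": abstract,
        "PublicationName": article.get("prism:publicationName"),
        "Creator": article.get("dc:creator"),
        "Authors": article.get("authors"),
        "Link": article.get("prism:url"),
        "DOI": article.get("prism:doi"),
        "Availability": "Open" if article.get("openaccess") else "Private",
        "PII": article.get("pii"),
    }


# Values copied from the first article shown earlier.
example = {
    "dc:identifier": "DOI:10.1016/j.neurol.2020.04.004",
    "prism:publicationName": "Revue Neurologique",
    "openaccess": False,
    "pii": "S0035378720305233",
}
row = article_to_row(example)
print(row["Availability"], row["PII"])  # Private S0035378720305233
```

`pd.DataFrame` accepts a list of such dicts directly, for example `pd.DataFrame([article_to_row(a, get_abstract(a.get('prism:url'))) for a in articles])`.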

In [96]:
# Create pandas dataframe
df = pd.DataFrame(data, columns = ['ID', 'PublishedDate', 'Title', 'Abstract', 'PublicationName', 'Creator', 'Authors', 'Link', 'DOI', 'Availability', 'PII'])
df.head()

| | ID | PublishedDate | Title | Abstract | PublicationName | Creator | Authors | Link | DOI | Availability | PII |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DOI:10.1016/j.neurol.2020.04.004 | 2020-04-20T00:00:00.000Z | Guidance for the care of neuromuscular patient... | `\n Abstract\n \n ...` | Revue Neurologique | G. Solé | `{'author': [{'$': 'G. Solé'}, {'$': 'E. Salort...` | https://api.elsevier.com/content/article/pii/S... | 10.1016/j.neurol.2020.04.004 | Private | S0035378720305233 |
| 1 | DOI:10.1016/j.athoracsur.2020.04.007 | 2020-04-27T00:00:00.000Z | Adult Cardiac Surgery and the COVID-19 Pandemi... | `\n Abstract\n \n ...` | The Annals of Thoracic Surgery | Daniel T. Engelman | `{'author': [{'$': 'Daniel T. Engelman'}, {'$':...` | https://api.elsevier.com/content/article/pii/S... | 10.1016/j.athoracsur.2020.04.007 | Private | S0003497520305877 |
| 2 | DOI:10.1016/j.thromres.2020.04.024 | 2020-04-23T00:00:00.000Z | Venous and arterial thromboembolic complicatio... | `\n Abstract\n \n ...` | Thrombosis Research | Corrado Lodigiani | `{'author': [{'$': 'Corrado Lodigiani'}, {'$': ...` | https://api.elsevier.com/content/article/pii/S... | 10.1016/j.thromres.2020.04.024 | Private | S0049384820301407 |
| 3 | DOI:10.1016/j.ekir.2020.04.001 | 2020-04-04T00:00:00.000Z | Management of Patients on Dialysis and With Ki... | The severe acute respiratory syndrome coronavi... | Kidney International Reports | Federico Alberici | `{'author': [{'$': 'Federico Alberici'}, {'$': ...` | https://api.elsevier.com/content/article/pii/S... | 10.1016/j.ekir.2020.04.001 | Open | S2468024920311700 |
| 4 | DOI:10.1016/j.ando.2020.04.005 | 2020-04-21T00:00:00.000Z | Renin-angiotensin-aldosterone system and COVID... | `\n Abstract\n \n ...` | Annales d'Endocrinologie | J Alexandre | `{'author': [{'$': 'J Alexandre'}, {'$': 'JL Cr...` | https://api.elsevier.com/content/article/pii/S... | 10.1016/j.ando.2020.04.005 | Private | S0003426620300627 |


In [97]:
# Export CSV
df.to_csv('data/elsevier_articles.csv')
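`to_csv` fails if the `data/` directory does not exist yet, and by default it also writes the row index as an extra column, which shows up as `Unnamed: 0` when the file is read back. A small sketch that addresses both points follows; `prepare_output_path` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path


def prepare_output_path(path: str) -> Path:
    """Create the parent directory of `path` if needed and return it as a Path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out


# Usage in the notebook (not executed here, since it needs the dataframe):
# df.to_csv(prepare_output_path('data/elsevier_articles.csv'), index=False)
```

Passing `index=False` is a design choice: it suits this dataset because the `ID` column already identifies each row.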