<p style="padding: 10px;
          color:#FFA500;
          font-weight: bold;
          text-align: center;
          background-color:#008000;
          font-size:260%;">
Extract data from PubMed using Python
     </p>

![Image%202-3-22%20at%2010.27%20AM.jpg](attachment:Image%202-3-22%20at%2010.27%20AM.jpg)

The U.S. government’s [PubMed website](https://pubmed.ncbi.nlm.nih.gov) is a treasure trove of biomedical information.<br>  Articles can be searched on the website by keyword, as shown below.

![Image%202-3-22%20at%2010.36%20AM.jpg](attachment:Image%202-3-22%20at%2010.36%20AM.jpg)

<font size="5">This notebook uses the Metapub library to extract data from PubMed.</font>

![image.png](attachment:image.png)

<p style="padding: 10px;
          color:#FFA500;
          font-weight: bold;
          text-align: center;
          background-color:#008000;
          font-size:260%;">
Install the Metapub Python library
     </p>

In [1]:
%pip install metapub

<p style="padding: 10px;
          color:#FFA500;
          font-weight: bold;
          text-align: center;
          background-color:#008000;
          font-size:260%;">
Importing Libraries
     </p>

In [1]:
import pandas as pd

<p style="padding: 10px;
          color:#FFA500;
          font-weight: bold;
          text-align: center;
          background-color:#008000;
          font-size:150%;">
Initialise the search keyword and the number of search results
     </p>

In [1]:
# initialise the keyword to search for and the number of articles to retrieve

keyword = "sepsis"
num_of_articles = 3
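
The keyword does not have to be a single word: PubMed accepts field tags and Boolean operators in the same query string. The sketch below composes such a string; `build_query` is a hypothetical helper written for this illustration, not part of Metapub, and it covers only the `[tiab]` (title/abstract) and `[dp]` (date of publication) tags as examples.

```python
def build_query(term, field=None, year=None):
    """Compose a PubMed query string from a term and optional filters.

    Hypothetical helper for illustration; not part of Metapub.
    """
    # Attach a PubMed field tag, e.g. "tiab" for title/abstract, to the term
    query = f"{term}[{field}]" if field else term
    # Restrict by publication date using the [dp] tag
    if year is not None:
        query = f"{query} AND {year}[dp]"
    return query


keyword = build_query("sepsis", field="tiab", year=2021)
print(keyword)  # sepsis[tiab] AND 2021[dp]
```

The resulting string can be used as the `keyword` value in exactly the same way as the plain word `"sepsis"`.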


<p style="padding: 10px;
          color:#FFA500;
          font-weight: bold;
          text-align: center;
          background-color:#008000;
          font-size:150%;">
Fetch the PMID, the unique ID of each article
     </p>

In [1]:
from metapub import PubMedFetcher
fetch = PubMedFetcher()

# get the PMIDs of the first num_of_articles results for the keyword
pmids = fetch.pmids_for_query(keyword, retmax=num_of_articles)

# fetch the article record for each PMID
articles = {}
for pmid in pmids:
    articles[pmid] = fetch.article_by_pmid(pmid)
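
Requests to NCBI can fail transiently, and NCBI limits clients without an API key to roughly three requests per second, so a long list of PMIDs may benefit from a small retry wrapper. The following is a minimal sketch under those assumptions: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and `flaky_fetch` only stands in for `fetch.article_by_pmid` so that the example runs offline.

```python
import time


def fetch_with_retry(fetch_fn, pmid, retries=3, delay=0.5):
    """Call fetch_fn(pmid), retrying after a fixed pause if it raises."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch_fn(pmid)
        except Exception as error:  # deliberately broad: this is only a sketch
            last_error = error
            # Pause before the next attempt, but not after the final one
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error


# Offline stand-in for fetch.article_by_pmid: fails once, then succeeds
calls = {"count": 0}


def flaky_fetch(pmid):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("temporary failure")
    return f"article {pmid}"


result = fetch_with_retry(flaky_fetch, "12345", delay=0)
print(result)  # article 12345
```

In the notebook, the loop body could then read `articles[pmid] = fetch_with_retry(fetch.article_by_pmid, pmid)`.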

In [1]:
# get the title of each article (reusing the records in `articles`):
titles = {}
for pmid in pmids:
    titles[pmid] = articles[pmid].title
Title = pd.DataFrame(list(titles.items()), columns=['pmid', 'Title'])
Title

In [1]:
# get the abstract of each article:
abstracts = {}
for pmid in pmids:
    abstracts[pmid] = articles[pmid].abstract
Abstract = pd.DataFrame(list(abstracts.items()), columns=['pmid', 'Abstract'])
Abstract

In [1]:
# get the authors of each article:
authors = {}
for pmid in pmids:
    authors[pmid] = articles[pmid].authors
Author = pd.DataFrame(list(authors.items()), columns=['pmid', 'Author'])
Author
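
Each cell of the `Author` column holds a Python list of names, and pandas writes such a cell to CSV as the list's text form (for example `['Smith J', 'Doe A']`). If a plain string is preferred, the names can be joined first. This is a minimal sketch; `format_authors` is a hypothetical helper and the names are made up.

```python
def format_authors(authors, separator="; "):
    """Join author names into one string; return "" for None or an empty list."""
    if not authors:
        return ""
    return separator.join(authors)


# Made-up names, used only to show the output format
print(format_authors(["Smith J", "Doe A"]))  # Smith J; Doe A
```

Applied to the notebook, `Author['Author'].apply(format_authors)` would convert the whole column.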

In [1]:
# get the publication year of each article:
years = {}
for pmid in pmids:
    years[pmid] = articles[pmid].year
Year = pd.DataFrame(list(years.items()), columns=['pmid', 'Year'])
Year

In [1]:
# get the journal volume of each article:
volumes = {}
for pmid in pmids:
    volumes[pmid] = articles[pmid].volume
Volume = pd.DataFrame(list(volumes.items()), columns=['pmid', 'Volume'])
Volume

In [1]:
# get the journal issue of each article:
issues = {}
for pmid in pmids:
    issues[pmid] = articles[pmid].issue
Issue = pd.DataFrame(list(issues.items()), columns=['pmid', 'Issue'])
Issue

In [1]:
# get the journal name of each article:
journals = {}
for pmid in pmids:
    journals[pmid] = articles[pmid].journal
Journal = pd.DataFrame(list(journals.items()), columns=['pmid', 'Journal'])
Journal

In [1]:
# get the formatted citation of each article:
citations = {}
for pmid in pmids:
    citations[pmid] = articles[pmid].citation
Citation = pd.DataFrame(list(citations.items()), columns=['pmid', 'Citation'])
Citation

In [1]:
# build the PubMed link of each article from its PMID:
links = {}
for pmid in pmids:
    links[pmid] = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
Link = pd.DataFrame(list(links.items()),columns = ['pmid','Link'])
Link

In [1]:
from functools import reduce

# merge the per-field frames into one table, joining on pmid
data_frames = [Title, Abstract, Author, Year, Volume, Issue, Journal, Citation, Link]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['pmid'],
                                                how='outer'), data_frames)
df_merged
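
Building one frame per field and merging them works, but the same table can also be assembled in a single pass by creating one record per article. The sketch below shows the idea with stand-in objects so that it runs offline; `article_to_row`, `COLUMNS` and `demo_articles` are hypothetical names, and `SimpleNamespace` merely imitates an article object with attributes.

```python
from types import SimpleNamespace

# Output column name -> attribute name on the article object
COLUMNS = {
    "Title": "title",
    "Abstract": "abstract",
    "Author": "authors",
    "Year": "year",
    "Volume": "volume",
    "Issue": "issue",
    "Journal": "journal",
    "Citation": "citation",
}


def article_to_row(pmid, article):
    """Return one flat record for an article; missing attributes become None."""
    row = {"pmid": pmid}
    for column, attribute in COLUMNS.items():
        row[column] = getattr(article, attribute, None)
    row["Link"] = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
    return row


# Made-up stand-in for the `articles` dictionary
demo_articles = {
    "111": SimpleNamespace(title="Example title", year="2021", journal="Example Journal"),
}
rows = [article_to_row(pmid, article) for pmid, article in demo_articles.items()]
print(rows[0]["Title"], rows[0]["Link"])
```

With the real `articles` dictionary, `pd.DataFrame([article_to_row(p, a) for p, a in articles.items()])` should give the same columns as `df_merged` without any merge step.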

In [1]:
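# save the merged table as a CSV file; index=False leaves out the unnamed row-index column
df_merged.to_csv('pubmed_articles.csv', index=False)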

In [1]:
#import webbrowser
#webbrowser.open('https://pubmed.ncbi.nlm.nih.gov/35089989/') 