# Crawling PubMed: a toy example

## Collecting PMIDs from the abstract list

In [60]:
import requests

# Fetch a web page
r = requests.get("https://www.ncbi.nlm.nih.gov/pubmed/trending/")
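
A bare `requests.get` call has no timeout by default and does not raise on HTTP error statuses, so a stalled or failed request can go unnoticed. The following is a minimal sketch of a more defensive fetch. The helper name `fetch_html`, the 10-second timeout, and the User-Agent string are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Return the body of `url` as text, raising on HTTP errors.

    `fetch_html`, the timeout value, and the User-Agent string are
    illustrative assumptions, not part of the original notebook.
    """
    # Reuse a caller-supplied session if given; otherwise use the module-level API.
    http = session if session is not None else requests
    response = http.get(
        url,
        timeout=timeout,  # give up instead of hanging on a stalled connection
        headers={"User-Agent": "pubmed-toy-crawler/0.1 (educational example)"},
    )
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

With this helper, the cell above could be written as `r_text = fetch_html("https://www.ncbi.nlm.nih.gov/pubmed/trending/")`, and the returned string passed to Beautiful Soup in place of `r.text`.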


In [61]:
from bs4 import BeautifulSoup

# Parse the HTML into a searchable tree with the Beautiful Soup library
soup = BeautifulSoup(r.text, "html5lib")
# print(soup.get_text())  # uncomment to dump the page text with tags stripped

In [62]:
# Find the list of papers in the result page
paper_list = soup.find_all("div", class_="rslt")
print("size: ", len(paper_list))
print(paper_list[0])
print("------------")
print(paper_list[1])

size:  20
<div class="rslt"><p class="title"><a href="/pubmed/29785052" ref="ordinalpos=1&amp;ncbi_uid=29785052&amp;link_uid=29785052&amp;linksrc=docsum_title">Observation of anisotropic magneto-Peltier effect in nickel.</a></p><div class="supp"><p class="desc">Uchida KI, Daimon S, Iguchi R, Saitoh E.</p><p class="details"><span class="jrnl" title="Nature">Nature</span>. 2018 May 21. doi: 10.1038/s41586-018-0143-x. [Epub ahead of print]</p></div><div class="aux"><div class="resc"><dl class="rprtid"><dt>PMID:</dt> <dd>29785052</dd> </dl></div><p class="links nohighlight"><a href="/pubmed?linkname=pubmed_pubmed&amp;from_uid=29785052" ref="ordinalpos=1">Similar articles</a> </p></div></div>
------------
<div class="rslt"><p class="title"><a href="/pubmed/29785878" ref="ordinalpos=2&amp;ncbi_uid=29785878&amp;link_uid=29785878&amp;linksrc=docsum_title">Five-Year Outcomes with PCI Guided by Fractional Flow Reserve.</a></p><div class="supp"><p class="desc">Xaplanteris P, Fournier S, Pijls NHJ
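
Each result item shown above carries its PMID in a `<dd>` inside `<dl class="rprtid">`, so the whole list can be collected in one pass. This is a minimal sketch that assumes every item follows the markup printed above. The function name `collect_pmids` and the use of the standard-library `"html.parser"` backend (instead of the notebook's `"html5lib"`) are choices made for this illustration.

```python
from bs4 import BeautifulSoup


def collect_pmids(html):
    """Return the PMID, title and link of every result item in a listing page."""
    # "html.parser" ships with Python; the notebook itself uses "html5lib".
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.find_all("div", class_="rslt"):
        pmid_tag = item.select_one("dl.rprtid dd")
        title_tag = item.select_one("p.title a")
        if pmid_tag is None or title_tag is None:
            continue  # skip items that do not match the expected layout
        records.append(
            {
                "pmid": pmid_tag.get_text(strip=True),
                "title": title_tag.get_text(strip=True),
                "href": title_tag.get("href"),
            }
        )
    return records
```

Calling `collect_pmids(r.text)` on the trending page would return one dictionary per matching item. Items without a PMID block or title link are skipped silently, which keeps the sketch short but would hide layout changes in a real crawler.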

In [63]:
# Extract the outgoing link from the first item in the list
tag = paper_list[0].select_one("a")
print("extracted hyperlink: ", tag.get("href"))
link = "https://www.ncbi.nlm.nih.gov" + tag.get("href")
print("URL to visit: ", link)

extracted hyperlink:  /pubmed/29785052
URL to visit:  https://www.ncbi.nlm.nih.gov/pubmed/29785052
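
String concatenation works here because the extracted `href` is site-relative and starts with `/`. As a hedged alternative, `urllib.parse.urljoin` from the standard library resolves relative and absolute links alike. The names `BASE_URL` and `absolute_url` are introduced only for this sketch.

```python
from urllib.parse import urljoin

# Base of the site being crawled, as used in the cell above.
BASE_URL = "https://www.ncbi.nlm.nih.gov"


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from the page against the site's base URL."""
    return urljoin(base, href)


# A site-relative link is resolved against the base URL.
print(absolute_url("/pubmed/29785052"))
# An already absolute link is returned unchanged.
print(absolute_url("https://www.ncbi.nlm.nih.gov/pubmed/29785052"))
```

Both calls print `https://www.ncbi.nlm.nih.gov/pubmed/29785052`.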


### Requesting the abstract page

In [64]:
# Request the abstract page from the parsed hyperlink
abstr = requests.get(link)
soup_abstr = BeautifulSoup(abstr.text, "html5lib")
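
The cell above requests a single abstract page. To visit several links from the list, the same request can be repeated with a pause between calls, which is a common courtesy toward the server. The sketch below is one possible shape for that loop. The function name `fetch_all`, the injected `fetch` callable, and the one-second default delay are assumptions made for illustration, not values taken from the notebook or from any published rate limit.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any callable that takes a URL and returns the page text,
    for example `lambda u: requests.get(u, timeout=10).text`.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

Passing the fetch function as an argument keeps the loop independent of the HTTP library and makes it easy to try out with a stub before touching the network.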

### Parsing the title and abstract text from the page

In [65]:
# Find all <h1> elements; one of them holds the article title
h1s = soup_abstr.find_all("h1")
print(h1s)

[<h1 class="img_logo"><a class="pmlogo offscreen" href="/pubmed" title="PubMed">PubMed</a></h1>, <h1>Observation of anisotropic magneto-Peltier effect in nickel.</h1>]


In [66]:
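# Parse the article title: the first <h1> is the PubMed logo, the second is the title.
# get_text() avoids slicing the tag's string form, which breaks if the <h1> gains attributes.
title = h1s[1].get_text().strip()
print("Title: ", title)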

Title:  Observation of anisotropic magneto-Peltier effect in nickel.


In [67]:
# Find the <div> elements that contain the abstract text
abstr_text = soup_abstr.find_all("div", class_="abstr")
print(abstr_text)

[<div class="abstr"><h3>Abstract</h3><div class=""><p>The Peltier effect, discovered in 1834, converts a charge current into a heat current in a conductor, and its performance is described by the Peltier coefficient, which is defined as the ratio of the generated heat current to the applied charge current<sup>1,2</sup>. To exploit the Peltier effect for thermoelectric cooling or heating, junctions of two conductors with different Peltier coefficients have been believed to be indispensable. Here we challenge this conventional wisdom by demonstrating Peltier cooling and heating in a single material without junctions. This is realized through an anisotropic magneto-Peltier effect in which the Peltier coefficient depends on the angle between the directions of a charge current and magnetization in a ferromagnet. By using active thermography techniques<sup>3-10</sup>, we observe the temperature change induced by this effect in a plain nickel slab. We find that the thermoelectric properties o

In [68]:
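# Parse the abstract text from the first <p> inside the abstract block.
# Note: get_text() keeps superscript citation markers inline (e.g. "current1,2").
fin_text = abstr_text[0].select_one("p").get_text().strip()
print("Abstract: ", fin_text)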

Abstract:  The Peltier effect, discovered in 1834, converts a charge current into a heat current in a conductor, and its performance is described by the Peltier coefficient, which is defined as the ratio of the generated heat current to the applied charge current1,2. To exploit the Peltier effect for thermoelectric cooling or heating, junctions of two conductors with different Peltier coefficients have been believed to be indispensable. Here we challenge this conventional wisdom by demonstrating Peltier cooling and heating in a single material without junctions. This is realized through an anisotropic magneto-Peltier effect in which the Peltier coefficient depends on the angle between the directions of a charge current and magnetization in a ferromagnet. By using active thermography techniques3-10, we observe the temperature change induced by this effect in a plain nickel slab. We find that the thermoelectric properties of the ferromagnet can be redesigned simply by changing the config
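
In the output above, the superscript citation markers run into the surrounding words ("current1,2", "techniques3-10") because `get_text()` keeps the contents of the `<sup>` tags. A minimal sketch of one way to drop them before extracting the text follows. The function name `extract_abstract` is hypothetical, and the sketch assumes the `div.abstr` layout shown earlier and uses the standard-library `"html.parser"` backend rather than the notebook's `"html5lib"`.

```python
from bs4 import BeautifulSoup


def extract_abstract(html):
    """Return the abstract text without superscript markers, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", class_="abstr")
    if block is None:
        return None  # the page has no abstract block
    for sup in block.find_all("sup"):
        sup.decompose()  # remove the <sup> tag together with its contents
    # Join all paragraphs, in case the abstract has more than one.
    paragraphs = [p.get_text().strip() for p in block.find_all("p")]
    return "\n".join(paragraphs)
```

Calling `extract_abstract(abstr.text)` on the page requested earlier would apply the same cleanup to the full abstract. Removing every `<sup>` also drops legitimate superscripts such as exponents, so this choice suits citation-heavy abstracts but is not safe in general.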