# <a>Longevity InTime BioTech - Biomark Tracker</a>
    
This notebook searches the [PubMed website](https://pubmed.ncbi.nlm.nih.gov/) for a disease and counts how often each biomarker from a reference list is mentioned in the abstracts returned.
    
The list of biomarkers was extracted from [this download on cirion.com](http://www.cirion.com/DirectDownload.aspx?nav_id=714&lang_id=E).
    
Once the extraction methodology is validated and a better biomarker list is available, one could build a simple application with Streamlit or Flask (both with a Python backend) to turn this into a production-ready system.
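
The PubMed search described above is a plain GET request whose query string carries the disease term, a publication-year filter, the result format, and the page size. Below is a minimal sketch of assembling that URL with the standard library's `urllib.parse.urlencode`, so that spaces and reserved characters in the disease name are escaped. `build_search_url` is a hypothetical helper, and the parameter names are the ones used by the search function in this notebook.

```python
from urllib.parse import urlencode

PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def build_search_url(disease, start_year, end_year, size):
    """Build a PubMed search URL with escaped query parameters (hypothetical helper)."""
    params = {
        "term": disease,
        "filter": f"years.{start_year}-{end_year}",
        "format": "abstract",
        "size": size,
    }
    # urlencode applies quote_plus by default, so spaces become '+'
    # and characters such as an apostrophe are percent-encoded.
    return PUBMED_URL + "?" + urlencode(params)


url = build_search_url("diabetes mellitus", 2020, 2022, 10)
print(url)
```

For a simple two-word term this yields the same URL as replacing spaces with `+` by hand; the difference only shows up for names containing characters such as `&` or `'`.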

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import re
import requests

In [2]:
list_biomarkers = pd.read_csv('./data/raw/biomarkers.csv')

list_biomarkers.head()

| Unnamed: 0 | adrenal corticotrophic hormone (acth) |
|---|---|
| 0 | alkaline phosphatase (alp) |
| 1 | alpha-foetoprotein |
| 2 | alpha-gst |
| 3 | aminoterminal propeptide type 1 collagen |
| 4 | anti hbs |
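
The header in the output above is itself a biomarker name, which suggests the CSV has no true header line and its first record is being consumed as column names. Below is a minimal sketch of a loader that keeps every row. It assumes the two-column layout shown above (a row index, then the biomarker name) and an empty first cell on the first line, which is what pandas reports as `Unnamed: 0`. `load_biomarkers` is a hypothetical helper, and it uses the standard-library `csv` module rather than pandas.

```python
import csv
import io


def load_biomarkers(csv_file):
    """Return biomarker names from a two-column (index, name) CSV.

    Assumption: the file has no true header, so the first row is data too.
    """
    names = []
    for row in csv.reader(csv_file):
        # Skip blank lines and rows with no name in the second column.
        if len(row) < 2 or not row[1].strip():
            continue
        names.append(row[1].strip().lower())
    return names


# In-memory sample mirroring the assumed layout of the real file.
sample = io.StringIO(
    ",adrenal corticotrophic hormone (acth)\n"
    "0,alkaline phosphatase (alp)\n"
    "1,alpha-foetoprotein\n"
)
biomarkers = load_biomarkers(sample)
print(biomarkers)
```

With the real file, the same helper could be called on `open('./data/raw/biomarkers.csv', newline='')`, and the resulting list passed directly as the biomarker list.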


In [3]:
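def return_biomarkers_for_disease(biomarker_list, disease, start_date, end_date, pages):
    """Count biomarker mentions in PubMed abstracts for a disease.

    start_date and end_date are publication years; pages is sent as PubMed's
    `size` query parameter. Returns (biomarker, count) pairs with count > 0.
    """
    main_url = 'https://pubmed.ncbi.nlm.nih.gov/?term={}&filter=years.{}-{}&format=abstract&size={}'
    
    search_disease = disease.replace(" ", "+")
    
    # Send the search request and fail early on an HTTP error
    req = requests.get(main_url.format(search_disease, start_date, end_date, pages),
                       headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
    req.raise_for_status()
    
    soup = BeautifulSoup(req.text, 'html.parser')
    
    abstracts = soup.find_all('div', {"class": "abstract"})
    
    # Lowercase and tokenize each abstract once, not once per biomarker
    tokenized = [str(abstract).lower().split() for abstract in abstracts]
    
    result = {biomarker: 0 for biomarker in biomarker_list}
    
    # Tokens are whitespace-separated, so only single-word biomarkers can match
    for biomarker in result:
        for tokens in tokenized:
            result[biomarker] += tokens.count(biomarker)
    
    return [(biomarker, count) for biomarker, count in result.items() if count > 0]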
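
Because each abstract is split on whitespace before counting, only single-word biomarkers can ever match: a name such as "alkaline phosphatase (alp)" is never equal to a single token, and a token with attached punctuation or markup does not match either. Below is a minimal sketch of a phrase-aware alternative, assuming the abstract has already been reduced to plain text (for example with BeautifulSoup's `get_text()`). `count_phrase` is a hypothetical helper, not part of the notebook, and using it would produce counts that differ from the outputs recorded here.

```python
import re


def count_phrase(text, phrase):
    """Count case-insensitive occurrences of phrase in text as a whole term."""
    # Escape each word so characters such as parentheses match literally,
    # and let any run of whitespace in the text stand for a space in the phrase.
    words = [re.escape(word) for word in phrase.lower().split()]
    if not words:
        return 0
    # The lookarounds reject matches embedded in a longer word.
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return len(re.findall(pattern, text.lower()))


abstract_text = (
    "Fasting glucose, insulin and alkaline  phosphatase were measured. "
    "Glucose fell; one patient had an insulinoma."
)
glucose_count = count_phrase(abstract_text, "glucose")
insulin_count = count_phrase(abstract_text, "insulin")
alp_count = count_phrase(abstract_text, "alkaline phosphatase")
print(glucose_count, insulin_count, alp_count)
```

The lookarounds are used instead of `\b` so that phrases ending in a non-word character, such as a closing parenthesis, are still bounded correctly.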
    

In [4]:
disease = 'diabetes mellitus'
biomarkers_disease = return_biomarkers_for_disease(list_biomarkers.values.flatten(), disease, 2020, 2022, 10)

f'Biomarkers for {disease}: {biomarkers_disease}'

"Biomarkers for diabetes mellitus: [('glucose', 2), ('insulin', 1)]"

In [5]:
disease = 'pneumonia'
biomarkers_disease = return_biomarkers_for_disease(list_biomarkers.values.flatten(), disease, 2020, 2022, 50)

f'Biomarkers for {disease}: {biomarkers_disease}'

"Biomarkers for pneumonia: [('hiv', 1)]"