
nfflow/pubmedflow

PUBMED-FLOW

An open-source data collection tool for fetching data from PubMed.

Contribute and Support

License: MIT. Pull requests are welcome.

🎮 Features

  • Fetch PubMed IDs (PMIDs) for a keyword query (queries with multiple keywords are supported)
  • Fetch the abstracts of research papers from PubMed by PMID
  • Download the full PDF for a PMID if it is available on PubMed Central (PMC)
  • If the PDF is not available on PMC, download it from Sci-Hub instead
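This README does not show how the keyword search is performed internally. As a rough sketch of the idea behind the first feature, PMIDs can be retrieved from NCBI's public E-utilities `esearch` endpoint. The helper names `build_esearch_url` and `parse_pmids` are hypothetical, and joining several keywords with `AND` is an assumption about how a multi-keyword query might be combined. This is not pubmedflow's own code.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for searching PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(keywords, retmax=20, api_key=None):
    """Build an esearch URL that ANDs several keywords into one PubMed query.

    Combining keywords with AND is an assumption made for this sketch.
    """
    # Quote each keyword so multi-word phrases are searched as phrases.
    term = " AND ".join(f'"{kw}"' for kw in keywords)
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(esearch_json):
    """Extract the list of PMIDs (as strings) from an esearch JSON response body."""
    data = json.loads(esearch_json)
    return data["esearchresult"]["idlist"]


# Fetching live results (requires network access):
# from urllib.request import urlopen
# with urlopen(build_esearch_url(["chronic disease", "diabetes"])) as resp:
#     pmids = parse_pmids(resp.read())
```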

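How to obtain an NCBI API key?

Sign in to (or register for) an NCBI account, open Account Settings, and create a key in the "API Key Management" section.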
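For context on why the key matters: NCBI's E-utilities allow about 3 requests per second without an API key and 10 per second with one, and the key is sent as the `api_key` query parameter. The sketch below shows one way a client might stay under those limits. The `EutilsThrottle` class is hypothetical and not part of pubmedflow, whose `LazyPubmed` takes the key through its own `api_key` argument.

```python
import time


class EutilsThrottle:
    """Hypothetical helper: space out E-utilities requests by a minimum interval.

    NCBI allows about 3 requests/second without an API key and 10 with one.
    """

    def __init__(self, api_key=None):
        self.api_key = api_key
        self.min_interval = 1 / 10 if api_key else 1 / 3
        self._last = None

    def params(self, **extra):
        """Return request parameters, adding api_key when one is configured."""
        if self.api_key:
            extra["api_key"] = self.api_key
        return extra

    def wait(self):
        """Sleep just long enough to respect the minimum interval between calls."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = EutilsThrottle(api_key=None)  # pass your key to raise the limit
throttle.wait()  # call before each request
```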

Installation

From PyPI

pip install pubmedflow

From source

git clone https://github.com/nfflow/pubmedflow
cd pubmedflow
python setup.py install

Or install directly from GitHub:

pip install git+https://github.com/nfflow/pubmedflow
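To confirm that either installation method worked, the installed version can be read with the standard library's `importlib.metadata` (Python 3.8+). This assumes the distribution is named `pubmedflow`, matching the `pip install` command; `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name="pubmedflow"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())  # a version string, or None if not installed
```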

How to use the API?

Arguments:

Name          Input           Description
folder_name   Optional, str   Path where the output data is stored

Quick Start:

Download PubMed articles as PDFs and a DataFrame:

from pubmedflow import LazyPubmed

title_query = 'chronic diseases'  # example keyword query

pb = LazyPubmed(title_query,
                folder_name='pubmed_data',  # where output data is stored
                api_key='',                 # your NCBI API key
                max_documents=None,
                download_pdf=True,
                scihub=False)

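The abstracts that pubmedflow collects come from PubMed records. As an illustration only (not pubmedflow's code), NCBI's `efetch` endpoint returns PubMed records as XML when called with `db=pubmed` and `retmode=xml`, and the abstract text can be extracted with the standard library. `parse_abstracts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_abstracts(efetch_xml):
    """Map each PMID in a PubMed efetch XML response to its abstract text."""
    root = ET.fromstring(efetch_xml)
    abstracts = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # Structured abstracts split the text across several AbstractText
        # elements; itertext() also keeps text inside inline tags like <i>.
        parts = ["".join(node.itertext()).strip()
                 for node in article.iter("AbstractText")]
        abstracts[pmid] = " ".join(p for p in parts if p)
    return abstracts
```

Records without an abstract map to an empty string rather than being dropped, so every requested PMID remains visible in the result.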
Run unsupervised training on the collected data to further pre-train a model:

pb.pubmed_train(model_name='sentence-transformers/all-mpnet-base-v2',
                model_output_path='pubmedflow_model',
                model_architecture='ct')

Run question answering over the downloaded text to extract answer spans from each article:

qa_results = pb.pubmed_qa(qa_query='What are the chronic diseases')
print(qa_results)

Summarise each article:

summ_results = pb.pubmed_summarise()
print(summ_results)

Extract entities from each article:

ents = pb.pubmed_entity_extraction()
print(ents)