Pubmed Toolkit

A bundle of Python scripts for searching PubMed, downloading PDFs, and analyzing the results.

Installation

Before installing, make sure Python 3 and pip are installed.

Clone the repository

git clone https://github.com/kaaass/PubmedToolkit.git
cd PubmedToolkit

Install the dependencies using pip

pip install -r requirements.txt

pubmed_central.py

Download PDFs from PubMed Central, given either PMIDs on the command line or a PMID source file. See "PMID Source File Schema" for details on the source file format.

Supports resuming an interrupted download
Supports retrying failed tasks
Supports a proxy pool to work around anti-crawler measures

Usage

usage: pubmed_central.py [-h] [-o OUTPUT_DIR] [--resume] [--retry] [--use-proxy]
                         [PMIDs or PMID source file [PMIDs or PMID source file ...]]

Download PDFs from pubmed central by PMIDs

positional arguments:
  PMIDs or PMID source file
                        PMIDs to download, or filepath of PMID source file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        output directory
  --resume              Allow resume from an exist lock file
  --retry               Retry the tasks in the failed file
  --use-proxy           Use proxy pool to access Pubmed Central

Examples

Download PDFs for PMIDs 29138661 and 29123944

python pubmed_central.py 29138661 29123944

Download PDFs for the PMIDs in a source file

python pubmed_central.py data.json

Resume an interrupted task

python pubmed_central.py --resume data.json

PMID Source File Schema

A PMID source file is a JSON file that stores an array of objects. It can be generated by pubmed_search.py.

[
    {
        "pmid": 0, // PMID
        // Other attributes will be ignored
    },
    // ...
]

For other formats, you might need to edit the function load_source_file.
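As a rough sketch of what reading this schema involves, the helper below pulls the PMIDs out of a source file. The name load_pmids is hypothetical and is not the toolkit's load_source_file; the one-PMID-per-line fallback for non-JSON files is likewise an assumption, based on the Todo item rather than on anything the script currently does.

```python
import json
from pathlib import Path


def load_pmids(path):
    """Return the PMIDs found in a source file as a list of strings.

    Hypothetical helper for illustration, not the toolkit's load_source_file.
    """
    source = Path(path)
    text = source.read_text(encoding="utf-8")
    if source.suffix.lower() == ".json":
        # Documented schema: an array of objects, each with a "pmid" attribute.
        # Any other attributes on the objects are ignored.
        records = json.loads(text)
        return [str(record["pmid"]) for record in records]
    # Assumed alternative format: one PMID per line, blank lines skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Returning strings keeps the result uniform whether the file stores PMIDs as numbers or as text.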

Todo

Support schema: one PMID per line
Support schema: BibTeX library

pubmed_search.py

WARNING: This script is incomplete; you might need to edit the source code to use it.

Search PubMed with a given query and save the resulting entries as a JSON file.

Usage

Set the variable query to your search query. The query can be built with https://www.ncbi.nlm.nih.gov/pubmed/advanced.

Change the parameter max_results to specify the maximum number of results.

Run python pubmed_search.py. The results will be stored in data.json.
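For readers who want to see how a query and a result limit can turn into a PMID source file, here is a minimal sketch built on the public NCBI E-utilities ESearch endpoint. It is not the code in pubmed_search.py, which may use a different library or store more attributes. The function names build_search_url, to_source_records and search_to_file are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(query, max_results=20):
    """Build an ESearch URL that returns matching PMIDs as JSON."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": max_results,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


def to_source_records(esearch_response):
    """Convert a decoded ESearch JSON response into source file records."""
    id_list = esearch_response["esearchresult"]["idlist"]
    return [{"pmid": int(pmid)} for pmid in id_list]


def search_to_file(query, max_results=20, out_path="data.json"):
    """Run the search over the network and write the records to out_path."""
    with urlopen(build_search_url(query, max_results)) as response:
        records = to_source_records(json.load(response))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4)
    return records
```

Only search_to_file touches the network; the two helpers can be tried offline with a hand-written response dictionary.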

[WIP] pubmed_info.py

Download metadata and figures, and extract text from PDFs.
