
Pubmed Downloader

This code helps you start a thematic collection of articles by automatically downloading them from PubMed.

It uses PubMed's official API, so there is no scraping involved and usage is completely in accordance with PubMed's usage policy.

Downloading

To start using it, follow the steps below.

from fetch import SearchAndCapture

# your email identifies you to the PubMed API; the second argument is a PubMed query string
S = SearchAndCapture('your@email.com', '((zika microcephaly) NOT zika[author])')
S.update_multiple_searches()        # fetch the article records matching the query
S.update_citations_concurrently()   # update the citation network for the collected PMIDs

That's all! You may want to run it once a day to include recently published articles. Each run also updates a separate collection with the citation network for all the PMIDs in the article collection.

You need to have a local MongoDB server running.
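
Once the server is running and an update has finished, you can inspect the results with pymongo. The snippet below is only a rough sketch: the database and collection names (pubmed, articles, citations) are assumptions, not taken from this README, so adjust them to whatever the fetch module actually uses.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)   # local MongoDB server
db = client["pubmed"]                      # hypothetical database name

# count the article records and citation documents fetched so far
print("articles:", db["articles"].count_documents({}))    # hypothetical collection name
print("citations:", db["citations"].count_documents({}))  # hypothetical collection name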

Downloading to PostgreSQL

If you don't want to work with MongoDB, you can build your corpus of articles on PostgreSQL instead.

To do so, customize the query strings in the fetch2pgsql.py script and run it from the command line:

$ ./fetch2pgsql.py

This script also enables full-text indexing on the corpus.
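
Once the corpus is in PostgreSQL you can query it with the full-text search machinery. The sketch below is only illustrative: the table and column names (articles, pmid, title, abstract) and the connection details are assumptions, not taken from fetch2pgsql.py, so adjust them to your schema.

import psycopg2

conn = psycopg2.connect(dbname="pubmed", user="postgres")  # adjust to your setup
cur = conn.cursor()

# match the query terms against the abstract text with PostgreSQL full-text search
cur.execute(
    "SELECT pmid, title FROM articles "
    "WHERE to_tsvector('english', abstract) @@ plainto_tsquery('english', %s) "
    "LIMIT 10;",
    ("zika microcephaly",),
)
for pmid, title in cur.fetchall():
    print(pmid, title)

conn.close()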

Exporting network data

You can export CSV files with data structured for network analysis:

$ python3 network_analysis.py

This script only works with the MongoDB databases. It should not be difficult to adapt it to work with corpora stored in PostgreSQL.
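
If you want to work with the exported network in Python, the CSV files can be loaded into networkx. The file name and column names used below are assumptions; adjust them to match what network_analysis.py actually writes.

import csv
import networkx as nx

G = nx.DiGraph()
with open("citations.csv", newline="") as f:        # hypothetical file name
    for row in csv.DictReader(f):
        # each row is assumed to hold a citing PMID and a cited PMID
        G.add_edge(row["source"], row["target"])     # hypothetical column names

print(G.number_of_nodes(), "articles,", G.number_of_edges(), "citation links")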

