Paper Fetcher

Given a list of DOIs, one per line in a file named dois.txt, php fetch.php will read each DOI, attempt to fetch bibliographic JSON from dx.doi.org and HTML from the publisher via dx.doi.org, then parse the HTML to find a PDF URL and fetch that PDF.

If successful, the files are stored in the data folder.

Alongside each file a JSON file is stored that contains metadata for the HTTP request/response.

The XPath selectors to find the PDF URL (or next HTML URL, if it's an interstitial page) are in selectors.json. They work for most publishers, but not all - it will almost certainly be necessary to add domain-specific rules to get 100% coverage.

Note that if page content is generated dynamically or fetched with JavaScript, it won't be captured.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
Article.php		Article.php
README.md		README.md
Util.php		Util.php
fetch.php		fetch.php
selectors.json		selectors.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Paper Fetcher

About

Releases

Packages

Languages

hubgit/paper-fetcher

Folders and files

Latest commit

History

Repository files navigation

Paper Fetcher

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages