Automatically Search and Download Papers


SoPaper, So Easy

This is a project designed for researchers to conveniently access papers they need.

The command line tool sopaper can automatically search for and download a paper from the Internet, given its title. The downloaded paper gets a readable file name (I wrote the tool in the first place because I was tired of seeing random strings as file names). It mainly supports searching for papers in computer science.

How to Use

Install command line dependencies:

  • pdftk command line executable.
    • Using pdftk on OS X 10.11 might lead to hangs.
  • poppler-utils (optional)

Install the Python package: pip install --user sopaper

Usage:

$ sopaper --help
$ sopaper "Distinctive image features from scale-invariant keypoints"
$ sopaper "https://arxiv.org/abs/1606.06160"

NOTE: If you are not on a campus network, you may need to configure a proxy through the environment variables http_proxy and https_proxy to be able to download from certain sites (such as dl.acm.org).
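As a minimal sketch of the note above, the proxy variables can be set for a single sopaper run from Python instead of exporting them in the shell. The helper names `build_proxy_env` and `download_with_proxy` are hypothetical and not part of sopaper. The sketch only assumes that the `sopaper` executable is on the PATH.

```python
import os
import subprocess


def build_proxy_env(proxy_url, base_env=None):
    """Return a copy of the environment with http_proxy and https_proxy set."""
    env = dict(os.environ if base_env is None else base_env)
    env["http_proxy"] = proxy_url
    env["https_proxy"] = proxy_url
    return env


def download_with_proxy(query, proxy_url):
    """Run `sopaper <query>` with the proxy applied to that process only.

    Hypothetical helper: it simply shells out to the sopaper command.
    """
    return subprocess.run(["sopaper", query], env=build_proxy_env(proxy_url))
```

For example, `download_with_proxy("https://arxiv.org/abs/1606.06160", "http://127.0.0.1:8080")` would run the download through a proxy at that placeholder address without changing the environment of the calling shell.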

Features

The searcher module performs a fuzzy search and analyses the results from

  • Google Scholar
  • Google

and the fetcher module will further analyse the results and download papers from the sources it supports, such as arxiv.org and dl.acm.org.

Searcher and Fetcher are extensible to support more websites.
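One common way to make such a component extensible is a small registry that maps URL patterns to handler classes. The sketch below illustrates the idea only. `register_fetcher`, `find_fetcher`, and `ArxivFetcher` are hypothetical names and do not describe sopaper's actual internal API.

```python
import re

# Hypothetical registry of (compiled URL pattern, fetcher class) pairs.
FETCHERS = []


def register_fetcher(url_pattern):
    """Class decorator that registers a fetcher for URLs matching url_pattern."""
    def decorator(cls):
        FETCHERS.append((re.compile(url_pattern), cls))
        return cls
    return decorator


@register_fetcher(r"^https?://arxiv\.org/(abs|pdf)/")
class ArxivFetcher:
    def __init__(self, url):
        self.url = url

    def pdf_url(self):
        # arXiv serves the PDF of an /abs/<id> page under /pdf/<id>.
        return self.url.replace("/abs/", "/pdf/", 1)


def find_fetcher(url):
    """Return an instance of the first registered fetcher that matches, or None."""
    for pattern, cls in FETCHERS:
        if pattern.search(url):
            return cls(url)
    return None
```

With this layout, supporting another website would mean adding one more decorated class, while the lookup code stays unchanged.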

The command line tool will directly download the paper with a clean file name. All downloaded papers will be compressed using ps2pdf, if available (note that ps2pdf is distributed with Ghostscript rather than with poppler-utils).
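To illustrate what a clean file name can mean in practice, here is a minimal sketch that derives a file name from a paper title. The function `title_to_filename` and its rules are assumptions made for illustration. The actual sanitizing rules used by sopaper may differ.

```python
import re


def title_to_filename(title, max_length=200):
    """Turn a paper title into a readable, filesystem-safe PDF file name."""
    # Replace characters that are invalid on common filesystems with a space.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', " ", title)
    # Collapse whitespace runs, then trim surrounding spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a generic name if nothing usable is left.
    return (cleaned[:max_length].rstrip() or "paper") + ".pdf"
```

The length cap is there because many filesystems limit a file name to 255 bytes. The default of 200 characters is an arbitrary safety margin.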

TODO

  • Fetcher dedup: when both the arXiv abs and pdf links appear in the search results, the page is downloaded twice (maybe add a cache for requests)
  • Don't trust arxiv link from google scholar
  • Is title correctly updated for dlacm?
  • Extract title from bibtex -- more accurate?
  • Fetcher for other sites
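The first TODO item could be approached by mapping every search result to a canonical key before fetching, so that the abs and pdf links of the same arXiv paper collapse into one entry. The sketch below is one possible approach, not the project's implementation. `canonical_key` and `dedup_results` are hypothetical names, and only new-style arXiv identifiers (such as 1606.06160) are handled.

```python
import re

# Matches new-style arXiv identifiers in abs/pdf URLs, with an optional
# version suffix (v2) and an optional .pdf extension.
_ARXIV_RE = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?$"
)


def canonical_key(url):
    """Map arXiv abs/pdf URLs of the same paper to one key; other URLs map to themselves."""
    match = _ARXIV_RE.search(url)
    return "arxiv:" + match.group(1) if match else url


def dedup_results(urls):
    """Keep the first URL seen for each canonical key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Note that this sketch treats different versions of the same arXiv paper as duplicates. Keeping the version in the key would be the alternative if versions should be fetched separately.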