shakespeare

Identify relevant scientific papers with simple machine learning techniques

Installation

Copy shakespeare.py, data, and content_sources to your PYTHONPATH.

To install an example knowledge set, copy the contents of examples to $HOME/.shakespeare

Depends on the bibtexparser, feedparser, and scikit-learn packages, which can be installed via pip:

pip install bibtexparser scikit-learn feedparser
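
A quick way to confirm that the three packages are importable is sketched below. The helper name missing_packages is hypothetical and not part of shakespeare; the only facts it relies on are the import names, and note that scikit-learn is imported as sklearn.

```python
import importlib.util

# pip name -> import name; scikit-learn is imported as "sklearn".
REQUIRED = {
    "bibtexparser": "bibtexparser",
    "feedparser": "feedparser",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, try: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```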

Features

- Fetch functions for the following journals:
  - Phys Rev A-X
  - PRL
  - PNAS
  - Nature and Nature sub-journals (Nature Materials, Nature Physics, Nature Nanotechnology)
  - Science
  - Small
  - ACS Nano, Nano Letters
  - Soft Matter
  - Langmuir
  - Angewandte Chemie
  - JCP, JCP B
- Fetch functions for arXiv
- Support for BibTeX files
- Naive Bayes training and classification
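
To illustrate what naive Bayes training and classification means for paper titles, here is a minimal standard-library sketch. It is not shakespeare's implementation (the tool depends on scikit-learn); the class TinyNaiveBayes, the tokenizer, and the sample titles are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyNaiveBayes:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Start from the log prior, then add one log likelihood per known word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + self.alpha)
                                  / (total + self.alpha * len(self.vocab)))
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


model = TinyNaiveBayes()
model.train(
    ["Self-assembly of colloidal nanoparticles",
     "Soft matter dynamics of polymer gels",
     "Dark matter searches at the collider",
     "Quark gluon plasma in heavy ion collisions"],
    ["good", "good", "bad", "bad"],
)
print(model.classify("Colloidal polymer gels"))
```

The same idea carries over to abstracts or author lists by changing which text is passed to train and classify.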

Usage

The very first thing to do is to let the code know where the 'bad stuff' is. When training, every source other than the file passed with -g is treated as bad:

./shakespeare.py -g good.bib -k examples/ --overwrite-knowledge --train

Train the naive Bayes classifier on a good and a bad BibTeX file:

./shakespeare.py -g thegoodstuff.bib -b thebadstuff.bib -k examples --train

Find papers from Nature Nanotechnology and PNAS and write them to cool_papers.md:

./shakespeare.py -j natnano pnas -o cool_papers.md

Find papers from the arXiv categories cond-mat.soft and math, then review the algorithm's selection:

./shakespeare.py -a cond-mat.soft math --feedback
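
The review step can be pictured as the loop below. This is only a sketch of the behavior described for --feedback and --review_all in the help printout, not the tool's code: the function collect_feedback, the "good"/"bad" labels, and the y/n prompt are all assumptions.

```python
def collect_feedback(papers, predictions, review_all=False, ask=input):
    """Ask the user to confirm or correct predicted labels.

    Returns a list of (paper, label) pairs for the papers that were reviewed.
    By default only papers predicted as "good" are shown; review_all=True
    shows every paper. `ask` can be swapped out for testing.
    """
    reviewed = []
    for paper, predicted in zip(papers, predictions):
        if predicted != "good" and not review_all:
            continue  # skip papers the classifier rejected
        answer = ask("{} [predicted: {}] relevant? (y/n) ".format(paper, predicted))
        label = "good" if answer.strip().lower().startswith("y") else "bad"
        reviewed.append((paper, label))
    return reviewed
```

The corrected pairs could then be fed back into training so that later runs reflect the reviewer's choices.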

Help printout

usage: shakespeare.py [-h] [-o OUTPUT] [-b [BIBFILES [BIBFILES ...]]]
                      [-j [JOURNALS [JOURNALS ...]]] [-a [ARXIV [ARXIV ...]]]
                      [--all_sources] [--all_good_sources] [--train]
                      [-g GOOD_SOURCE] [-m METHOD] [-k KNOWLEDGE]
                      [--overwrite-knowledge] [--feedback] [--review_all]
optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file name. only supports markdown right now.
  -b [BIBFILES [BIBFILES ...]], --bibtex [BIBFILES [BIBFILES ...]]
                        bibtex files to fetch
  -j [JOURNALS [JOURNALS ...]], --journals [JOURNALS [JOURNALS ...]]
                        journals to fetch. Currently supports physreve
                        physrevd jchemphysb physreva physrevc pnas nature
                        jchemphys science natmat physrevb acsnano jphyschem
                        nanoletters natphys prl small angewantechemie langmuir
                        physrevx natnano.
  -a [ARXIV [ARXIV ...]], --arXiv [ARXIV [ARXIV ...]]
                        arXiv categories to fetch
  --all_sources         flag to search from all sources.
  --all_good_sources    flag to search from good sources. Specfied in your
                        config file.
  --train               flag to train. All sources beside "--train-input-good"
                        are treated as bad/irrelevant papers
  -g GOOD_SOURCE, --train_input_good GOOD_SOURCE
                        bibtex file containing relevant articles.
  -m METHOD, --method METHOD
                        Methods to try to find relevent papers. Right now,
                        only all, title, author, and abstract are valid fields
  -k KNOWLEDGE, --knowledge KNOWLEDGE
                        path to database containing information about good and
                        bad keywords. If you are training, you must specifiy
                        this, as it will be where your output is written
  --overwrite-knowledge
                        flag to overwrite knowledge,if training
  --feedback            flag to give feedback after sorting content
  --review_all          review all the new selections. Otherwise, you will
                        only review the good selections
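
The printout above is consistent with a standard argparse parser. The sketch below is a reconstruction from the help text alone, not the tool's source: build_parser is a hypothetical name, the dest names are inferred from the metavars, and defaults are left unset because the printout does not state them.

```python
import argparse


def build_parser():
    """Rebuild a parser that accepts the flags listed in the help printout."""
    parser = argparse.ArgumentParser(prog="shakespeare.py")
    parser.add_argument("-o", "--output",
                        help="output file name. only supports markdown right now.")
    parser.add_argument("-b", "--bibtex", dest="bibfiles", nargs="*",
                        help="bibtex files to fetch")
    parser.add_argument("-j", "--journals", nargs="*", help="journals to fetch")
    parser.add_argument("-a", "--arXiv", dest="arxiv", nargs="*",
                        help="arXiv categories to fetch")
    parser.add_argument("--all_sources", action="store_true",
                        help="flag to search from all sources.")
    parser.add_argument("--all_good_sources", action="store_true",
                        help="flag to search from good sources.")
    parser.add_argument("--train", action="store_true", help="flag to train.")
    parser.add_argument("-g", "--train_input_good", dest="good_source",
                        help="bibtex file containing relevant articles.")
    parser.add_argument("-m", "--method",
                        help="all, title, author, or abstract")
    parser.add_argument("-k", "--knowledge",
                        help="path to the knowledge database")
    parser.add_argument("--overwrite-knowledge", action="store_true",
                        help="flag to overwrite knowledge, if training")
    parser.add_argument("--feedback", action="store_true",
                        help="flag to give feedback after sorting content")
    parser.add_argument("--review_all", action="store_true",
                        help="review all the new selections")
    return parser


args = build_parser().parse_args(["-j", "natnano", "pnas", "-o", "cool_papers.md"])
print(args.journals, args.output)
```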

TODO

Train a bunch and see if this is worth any more time
Make a nice installer
Add support for a config file for setting defaults (which journals to search, etc)
