Add suggestions from our JOSS Submission Review #164

Merged
merged 10 commits into from
Aug 5, 2016
2 changes: 1 addition & 1 deletion LICENSE
@@ -187,7 +187,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright 2016 MSMBuilder

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
39 changes: 28 additions & 11 deletions README.md
@@ -3,7 +3,7 @@ Osprey
[![Build Status](https://travis-ci.org/msmbuilder/osprey.svg?branch=master)](https://travis-ci.org/msmbuilder/osprey)
[![Coverage Status](https://coveralls.io/repos/github/msmbuilder/osprey/badge.svg?branch=master)](https://coveralls.io/github/msmbuilder/osprey?branch=master)
[![PyPi version](https://badge.fury.io/py/osprey.svg)](https://pypi.python.org/pypi/osprey/)
[![License](https://img.shields.io/badge/license-ASLv2.0-red.svg?style=flat)] (https://pypi.python.org/pypi/osprey/)
[![License](https://img.shields.io/badge/license-ASLv2.0-red.svg?style=flat)](http://www.apache.org/licenses/LICENSE-2.0)
[![DOI](https://zenodo.org/badge/9890/msmbuilder/osprey.svg)](https://zenodo.org/badge/latestdoi/9890/msmbuilder/osprey)
[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg?style=flat)](http://msmbuilder.org/osprey)

@@ -19,7 +19,7 @@ parallel optimization of model hyperparameters.

Documentation
------------
For full documentation, please visit the [Osprey homepage](http://msmbuilder.org/osprey/development).
For full documentation, please visit the [Osprey homepage](http://msmbuilder.org/osprey/).

Installation
------------
@@ -29,15 +29,16 @@ If you have an Anaconda Python distribution, installation is as easy as:
$ conda install -c omnia osprey
```

You can also install with `pip`:
You can also install Osprey with `pip`:
```
$ pip install git+git://github.com/pandegroup/osprey.git
$ pip install osprey
```

Alternatively, you can install directly from this GitHub repo:
```
$ git clone https://github.com/msmbuilder/osprey.git
$ cd osprey && python setup.py install
$ cd osprey && git checkout 1.1.0
$ python setup.py install
```


@@ -113,16 +114,32 @@ You can dump the database to JSON or CSV with `osprey dump`.

Dependencies
------------
- `six`
- `pyyaml`
- `numpy`
- `scikit-learn`
- `sqlalchemy`
- `six>=1.10.0`
- `pyyaml>=3.11`
- `numpy>=1.10.4`
- `scipy>=0.17.0`
- `scikit-learn>=0.17.0`
- `sqlalchemy>=1.0.10`
- `bokeh>=0.12.0`
- `matplotlib>=1.5.0`
- `GPy` (optional, required for `gp` strategy)
- `scipy` (optional, required for `gp` strategy)
- `hyperopt` (optional, required for `hyperopt_tpe` strategy)
- `nose` (optional, for testing)


Contributing
------------

In case you encounter any issues with this package, please consider submitting
a ticket to the [GitHub Issue Tracker](https://github.com/msmbuilder/osprey/issues).
We also welcome any feature requests and highly encourage users to
[submit pull requests](https://help.github.com/articles/creating-a-pull-request/)
for bug fixes and improvements.

For more detailed information, please refer to our
[documentation](http://msmbuilder.org/osprey/contributing.html).


Citing
------

6 changes: 4 additions & 2 deletions devtools/conda-recipe/meta.yaml
@@ -23,6 +23,8 @@ requirements:
- scipy
- scikit-learn
- sqlalchemy
- bokeh
- matplotlib

test:

@@ -34,8 +36,8 @@ test:
- mdtraj
- coverage
- python-coveralls
- matplotlib
- bokeh



imports:
- osprey
54 changes: 48 additions & 6 deletions docs/config_file.rst
@@ -47,7 +47,7 @@ Search Space
The search space describes the space of hyperparameters to search over
to find the best model. It is specified as the product space of
bounded intervals for different variables, which can either be of type
``int``, ``float``, ``jump``, or ``enum``. Variables of type ``float`` can also
``int``, ``float``, or ``enum``. Variables of type ``float`` can also
be warped into log-space, which means that the optimization will be
performed on the log of the parameter instead of the parameter itself.

@@ -67,15 +67,35 @@ Example: ::
type: enum


You can also transform ``float`` and ``int`` variables into enumerables by
declaring a ``jump`` variable:

Example: ::

  search_space:
    logistic__C:
      min: 1e-3
      max: 1e3
      num: 10
      type: jump
      var_type: float
      warp: log

In the example above, we have declared a ``jump`` variable ``C`` for the
``logistic`` estimator. This variable is essentially an ``enum`` with
10 possible ``float`` values that are evenly spaced apart in log-space within
the given ``min`` and ``max`` range.
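
As a rough illustration of what the ``jump`` declaration above expands to, the sketch below computes 10 values evenly spaced in log-space between ``min`` and ``max``. The helper name ``jump_values`` is hypothetical and not part of Osprey's API, and Osprey's own implementation may differ in detail.

```python
import math


def jump_values(lo, hi, num, warp=None):
    """Return `num` evenly spaced values in [lo, hi].

    With warp="log" the spacing is even in log10 space instead.
    This is a hypothetical helper, not Osprey's implementation.
    """
    if num < 2:
        raise ValueError("num must be at least 2")
    if warp == "log":
        lo, hi = math.log10(lo), math.log10(hi)
    step = (hi - lo) / (num - 1)
    values = [lo + i * step for i in range(num)]
    if warp == "log":
        # Map the evenly spaced exponents back to the original scale.
        values = [10.0 ** v for v in values]
    return values


# Mirrors the config above: min=1e-3, max=1e3, num=10, warp=log.
candidates = jump_values(1e-3, 1e3, 10, warp="log")
```

Each consecutive pair of candidates differs by the same multiplicative factor, which is what "evenly spaced in log-space" means in practice.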


.. _strategy:

Strategy
--------

Three probablistic search strategies are supported. First, random search
(``strategy: {name: random}``) can be used, which samples hyperparameters randomly
from the search space at each model-building iteration. Random search has
`been shown to be <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`_ significantly more effiicent than pure grid search. Example: ::
Three probabilistic search strategies and grid search are supported. First,
random search (``strategy: {name: random}``) can be used, which samples
hyperparameters randomly from the search space at each model-building iteration.
Random search has `been shown to be <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`_ significantly more efficient than pure grid search. Example: ::

  strategy:
    name: random
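
To make the idea of random search concrete, here is a minimal sketch of drawing one point per iteration from a search space of ``int``, ``float`` (optionally log-warped), and ``enum`` variables. The ``SEARCH_SPACE`` dictionary, its key names, and ``sample_point`` are illustrative assumptions, not Osprey's internal data structures.

```python
import math
import random

# Hypothetical search space, loosely mirroring the config format.
SEARCH_SPACE = {
    "n_components": {"type": "int", "min": 1, "max": 10},
    "C": {"type": "float", "min": 1e-3, "max": 1e3, "warp": "log"},
    "kernel": {"type": "enum", "choices": ["linear", "rbf"]},
}


def sample_point(space, rng):
    """Draw one hyperparameter setting uniformly from `space`."""
    point = {}
    for name, spec in space.items():
        if spec["type"] == "int":
            point[name] = rng.randint(spec["min"], spec["max"])
        elif spec["type"] == "float":
            if spec.get("warp") == "log":
                # Sample the exponent uniformly, then map back.
                lo = math.log10(spec["min"])
                hi = math.log10(spec["max"])
                point[name] = 10.0 ** rng.uniform(lo, hi)
            else:
                point[name] = rng.uniform(spec["min"], spec["max"])
        elif spec["type"] == "enum":
            point[name] = rng.choice(spec["choices"])
        else:
            raise ValueError("unknown variable type: %r" % spec["type"])
    return point


rng = random.Random(0)  # seeded only so the sketch is repeatable
trials = [sample_point(SEARCH_SPACE, rng) for _ in range(5)]
```

With the log warp, values of ``C`` near ``1e-3`` are as likely to be drawn as values near ``1e3``, which a plain uniform draw over the raw interval would almost never produce.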
@@ -87,14 +107,22 @@ package `hyperopt <https://github.com/hyperopt/hyperopt>`_ be installed. Example
  strategy:
    name: hyperopt_tpe

Finally, ``osprey`` supports a Gaussian process expected improvement search
``osprey`` supports a Gaussian process expected improvement search
strategy, using the package `GPy <https://github.com/SheffieldML/GPy>`_, with
``strategy: {name: gp}``.
Example: ::

  strategy:
    name: gp

Finally, and perhaps simplest of all, is the
`grid search strategy <https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search>`_
(``strategy: {name: grid}``). Example: ::

  strategy:
    name: grid

Please note that grid search only supports ``enum`` and ``jump`` variables.
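
The restriction to ``enum`` and ``jump`` variables follows from how a grid is built: every variable must contribute a finite list of values, and the grid is the Cartesian product of those lists. A minimal sketch, assuming a hypothetical ``grid_space`` in which a ``jump`` variable has already been expanded to its values (``iter_grid`` is not an Osprey function):

```python
import itertools

# Hypothetical enumerated values: one enum variable and one
# jump variable that has already been expanded to a list.
grid_space = {
    "penalty": ["l1", "l2"],
    "C": [0.01, 0.1, 1.0, 10.0],
}


def iter_grid(space):
    """Yield every combination of values as a dict, in a stable order."""
    names = sorted(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


grid = list(iter_grid(grid_space))
```

The number of grid points is the product of the list lengths (here 2 x 4 = 8), so grids grow quickly as variables are added.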

.. _dataset_loader:

@@ -126,6 +154,7 @@ To access the other iterators, use the ``name`` and ``params`` keywords: ::
    params:
      n_iter: 5
      test_size: 0.5
      random_state: 42

Here's a complete list of supported iterators, along with their ``name`` mappings:

@@ -137,6 +166,19 @@ Here's a complete list of supported iterators, along with their ``name`` mapping

Random Seed
-----------
If you need reproducible Osprey trials, you can also include an
optional random seed, as shown below:

Example: ::

  random_seed: 42

Please note that a fixed seed makes parallel trials redundant, so it is not
recommended when scaling across multiple jobs.

.. _trials:

Trials Storage
--------------

14 changes: 14 additions & 0 deletions docs/contributing.rst
@@ -0,0 +1,14 @@
Contributing
============

In case you encounter any issues with this package, please consider submitting
a ticket to the `GitHub Issue Tracker <https://github.com/msmbuilder/osprey/issues>`_.
We also welcome any feature requests and highly encourage users to
`submit pull requests <https://help.github.com/articles/creating-a-pull-request/>`_
for bug fixes and improvements.

When submitting a pull request, please include a brief summary of any bug fixes
or added features. Also be sure to include relevant unit tests, as this
greatly speeds up the submission process. We strive to adhere to current
best-practices in code development, using `Travis CI <https://travis-ci.com>`_
to run our unit tests and ensure code quality.
16 changes: 11 additions & 5 deletions docs/index.rst
@@ -4,9 +4,11 @@ Osprey
Osprey is a tool for practical hyperparameter optimization of machine learning
algorithms. It's designed to provide a practical, **easy to use** way for
application scientists to find parameters that maximize the cross-validation
score of a model on their dataset. Osprey is being developed by researchers at
Stanford University with primary application areas in computational protein
dynamics and drug design.
score of a model on their dataset. Osprey is actively developed by
researchers around the world, with primary application areas in computational
protein dynamics and drug design, and is distributed under the
`Apache License (v2.0) <https://www.apache.org/licenses/LICENSE-2.0>`_.
All development takes place on `GitHub <https://github.com/msmbuilder/osprey>`_.

Overview
--------
@@ -18,14 +20,18 @@ and storage for the :ref:`results <trials>`.


Related tools include `spearmint <https://github.com/JasperSnoek/spearmint>`_,
`hyperopt <http://hyperopt.github.io/hyperopt/>`_, and
`GPy <http://sheffieldml.github.io/GPy/>`_. Both hyperopt and GPy can serve as backend
`hyperopt <https://hyperopt.github.io/hyperopt/>`_, and
`GPy <https://sheffieldml.github.io/GPy/>`_. Both hyperopt and GPy can serve as backend
:ref:`search strategies <strategy>` for osprey.


To get started, run ``osprey skeleton`` to create an example config file, and
then boot up one or more parallel instances of ``osprey worker``.

If you run into any issues while using Osprey or would like to suggest a
new feature, please take a moment to read our :ref:`Contributing <contributing>`
section.


.. raw:: html

72 changes: 46 additions & 26 deletions docs/installation.rst
@@ -1,51 +1,71 @@
Installation
============

Osprey is written in Python, and can be installed with standard python
machinery
Osprey is written in Python, and can be installed with standard Python
machinery; we highly recommend using an
`Anaconda Python distribution <https://www.continuum.io/downloads>`_.


Development Version
-------------------
Release Version
---------------


With Anaconda, installation is as easy as:

.. code-block:: bash

# grab the latest version from github
$ pip install git+git://github.com/pandegroup/osprey.git
$ conda install -c omnia osprey

You can also install Osprey with ``pip``:

.. code-block:: bash

# or clone the repo yourself and run `setup.py`
$ git clone https://github.com/pandegroup/osprey.git
$ cd osprey && python setup.py install
$ pip install osprey

Release Version
---------------
Alternatively, you can install directly from our
`GitHub repository <https://github.com/msmbuilder/osprey>`_:

.. code-block:: bash

$ git clone https://github.com/msmbuilder/osprey.git
$ cd osprey && git checkout 1.1.0
$ python setup.py install

Currently, **we recommend that you use the development version**, since things are
moving fast. However, release versions from PyPI can be installed using ``pip``.

Development Version
-------------------

To grab the latest version from GitHub, run:

.. code-block:: bash

$ pip install git+https://github.com/msmbuilder/osprey.git

Or clone the repo yourself and run ``setup.py``:

.. code-block:: bash

$ git clone https://github.com/msmbuilder/osprey.git
$ cd osprey && python setup.py install


Dependencies
------------
- ``six``
- ``pyyaml``
- ``numpy``
- ``scikit-learn``
- ``sqlalchemy``
- ``hyperopt`` (recommended, required for ``engine=hyperopt_tpe``)
- ``GPy`` (recommended, required for ``engine=gp``)
- ``scipy`` (optional, for testing)
- ``nose`` (optional, for testing)
- ``six>=1.10.0``
- ``pyyaml>=3.11``
- ``numpy>=1.10.4``
- ``scipy>=0.17.0``
- ``scikit-learn>=0.17.0``
- ``sqlalchemy>=1.0.10``
- ``bokeh>=0.12.0``
- ``matplotlib>=1.5.0``
- ``GPy`` (optional, required for ``gp`` strategy)
- ``hyperopt`` (optional, required for ``hyperopt_tpe`` strategy)
- ``nose`` (optional, for testing)
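
If you want to check an existing environment against these minimum versions, a small script along the following lines may help. It is only a sketch: ``MINIMUMS`` copies the required entries from the list above, the helper names are hypothetical, the version comparison handles plain dotted numbers rather than the full PEP 440 grammar, and ``importlib.metadata`` requires Python 3.8 or newer.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "six": "1.10.0",
    "pyyaml": "3.11",
    "numpy": "1.10.4",
    "scipy": "0.17.0",
    "scikit-learn": "0.17.0",
    "sqlalchemy": "1.0.10",
    "bokeh": "0.12.0",
    "matplotlib": "1.5.0",
}


def version_tuple(version):
    """Turn '1.10.4' into (1, 10, 4); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare numerically, so that 1.10 counts as newer than 1.9."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = meets_minimum(installed, minimum)
    return report
```

Calling ``check_dependencies()`` returns a dictionary that can be printed or inspected; packages mapped to ``None`` or ``False`` are the ones to install or upgrade.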

You can grab most of them with conda. ::

$ conda install six pyyaml numpy scikit-learn sqlalchemy nose
$ conda install six pyyaml numpy scipy scikit-learn sqlalchemy nose bokeh matplotlib

Hyperopt can be installed with pip. ::

@@ -63,4 +83,4 @@ To use ``gp`` search, you must install GPy on the machines you use to run
osprey. For easy installation, use the conda binary packages that
we've compiled. ::

conda install -c omnia gp
conda install -c omnia gpy
4 changes: 3 additions & 1 deletion osprey/cli/parser_skeleton.py
@@ -2,6 +2,8 @@
from argparse import ArgumentDefaultsHelpFormatter


from ..execute_skeleton import TEMPLATES

def func(args, parser):
    # delay import of the rest of the module to improve `osprey -h` performance
    from ..execute_skeleton import execute
@@ -15,7 +17,7 @@ def configure_parser(sub_parsers):
    p.add_argument('-t', '--template', help=(
        "which skeleton to create. 'msmbuilder' is a skeleton config file for "
        "MSMBuilder molecular dynamics / Markov state model based "
        "projects."), choices=['msmbuilder', 'sklearn'], default='msmbuilder',)
        "projects."), choices=TEMPLATES.keys(), default='msmbuilder')
    p.add_argument('-f', '--filename', help='config filename to create',
                   default='config.yaml')
    p.set_defaults(func=func)
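
The change above replaces a hard-coded list of template names with the keys of a mapping, so that adding a template automatically makes it a valid ``--template`` choice. A standalone sketch of the same pattern follows; the ``TEMPLATES`` dictionary here is a hypothetical stand-in with made-up values, not the real ``osprey.execute_skeleton.TEMPLATES``, and ``build_parser`` is an illustrative name.

```python
import argparse

# Hypothetical stand-in for the real TEMPLATES mapping; the
# values are placeholders chosen for this example only.
TEMPLATES = {
    "msmbuilder": "msmbuilder_skeleton_config.yaml",
    "sklearn": "sklearn_skeleton_config.yaml",
}


def build_parser():
    """Build a parser whose --template choices track TEMPLATES."""
    parser = argparse.ArgumentParser(prog="skeleton-demo")
    parser.add_argument(
        "-t", "--template",
        choices=sorted(TEMPLATES),  # sorted list gives stable help output
        default="msmbuilder",
        help="which skeleton to create",
    )
    parser.add_argument(
        "-f", "--filename",
        default="config.yaml",
        help="config filename to create",
    )
    return parser


args = build_parser().parse_args(["-t", "sklearn"])
```

Passing ``sorted(TEMPLATES)`` rather than ``TEMPLATES.keys()`` is a small design choice in this sketch: both work as ``choices``, but a sorted list keeps the order shown in ``--help`` deterministic.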