In [None]:
from __future__ import print_function
from matplotlib import pyplot as plt
%matplotlib inline


## VOEventDB
### and 
## Sustainable Software

### Tim Staley  / [4pisky.org](http://4pisky.org)

Hotwiring the Transient Universe V

Villanova, PA, Oct 2016

Hi, I'm... For the past five years I've been working on 4 Pi Sky, which is a project focused on detection of radio-transients, and co-ordination of their follow-up. For this talk I was originally just going to present our recently published archiving and query tool, VOEventDB, but:
* Since the committee were kind enough to give me an extended slot, I thought I should perhaps try to talk on a slightly broader theme
* And: there's a load of documentation online already, which you can digest at your leisure!

So, instead I thought I'd try to use VOEventDB as a sort of window onto a more general theme I've been thinking about recently, which I've put under the vague heading of sustainable software.

## Sustainable software? 

![Buzzword alert](img/sust.png)

## Sustainable software? 

* can be used by others
* can be reused outside original context
* can be modified by other devs
* is robust to changing dependencies

### NB: kind of a Platonic ideal.

Definitions are many, fuzzy and varied. Here's mine:
Sustainable software:
* can be used by someone other than the dev that wrote it
 (Without the dev sat looking over their shoulder). This includes installation!
* can be reused outside of its original context 
(perhaps with minor modification) 
* can be modified by someone other than the dev that wrote it
* is robust to changing dependencies - i.e. it won't break because some other library releases a new 'latest-and-greatest' version with an altered interface.

It's a set of goals to bear in mind - no project is perfect, and there's a cost/reward trade-off.


## Why is this relevant?


* There have always been large multi-person / long-term software projects in astronomy.
* (I think) we're seeing an explosion in the number of public 'smaller' codes (cf [ASCL](http://ascl.net/))
* This is a **really good thing**, but comes with difficulties of success
* The easier it is to evaluate, re-use, modify and recycle these codes, the better.

* There have always been a few large multi-person software projects in astronomy.
* (I think) we're seeing an explosion in the number of 'small' codes, (previously, projects that would have been written by a PhD student and then disappeared?)
* This is a **really good thing**, but comes with difficulties of success - hard to know what's out there, and whether it's any good.
* The easier it is to evaluate, re-use, modify and recycle these codes, the better.

## What I'll try to cover

* VOEventDB - what it's for, what it does
* A few items on the 'sustainable software checklist'
* (Python) Tooling to make your life easier



... try to persuade you that opening up your code can be done with minimal time investment.

## VOEventDB, in brief



### Context
* [VOEvent](http://voevent.rtfd.io/) is a standardised format for astronomical transient alerts.
* [NASA-GCN](http://gcn.gsfc.nasa.gov/) have been transmitting alerts in this format for over 2 years.
* Previously, there was no public archive for alerts in this format.

* VOEvent standard has always referred to a 'registry' of 'repositories' - clear gap to fill.

### VOEventDB: Spec
 * Store raw VOEvent XML, provide XML content at a persistent URL
 * Store a common subset of VOEvent metadata in regular database schema
 * Make queries based on this common subset
 * Including spatial (cone-search) and citation-based queries
 * 'RESTful' web-API
 * Python client-library for remote-queries
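
To make the 'cone-search' item concrete: a cone search keeps only those events whose position lies within some angular radius of a target position. The sketch below shows that geometry in plain Python. It is purely illustrative; the function names are made up for this example, and it is not how the server does it (the server delegates spatial queries to a Postgres extension).

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding error just above unity
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def in_cone(ra, dec, cone_ra, cone_dec, radius_deg):
    """True if (ra, dec) lies within radius_deg of the cone centre."""
    return angular_separation_deg(ra, dec, cone_ra, cone_dec) <= radius_deg
```

Note that the haversine form copes with the RA wrap-around at 0/360 degrees without any special-casing.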

## Schema
![schema](img/dbschema.png)



### Implementation
 * Postgres + SQLAlchemy
 * Spatial queries powered by [q3c](https://github.com/segasai/q3c) Postgres extension.
 * Flask-powered RESTful interface
 * Partially-autogenerated [documentation](http://voeventdb.rtfd.io).
 * Extensive test-suite using pytest fixtures.

http://voeventdb.4pisky.org/

In [16]:
!yes | pip uninstall -q voeventdb.remote requests 

Proceed (y/n)? Proceed (y/n)? yes: standard output: Broken pipe


## Getting started: Client installation

In [17]:
!pip install voeventdb.remote

Collecting voeventdb.remote
Collecting requests (from voeventdb.remote)
  Using cached requests-2.11.1-py2.py3-none-any.whl
Installing collected packages: requests, voeventdb.remote
Successfully installed requests-2.11.1 voeventdb.remote-1.0.0


In [26]:
import voeventdb.remote.apiv1 as api
api.count()

1271241

In [19]:
api.map_stream_count()

{u'com.dc3/dc3.broker': 22570,
 u'nasa.gsfc.gcn/AGILE': 6174,
 u'nasa.gsfc.gcn/AMON': 4,
 u'nasa.gsfc.gcn/CALET': 79,
 u'nasa.gsfc.gcn/CAlet': 1,
 u'nasa.gsfc.gcn/COUNTERPART': 113,
 u'nasa.gsfc.gcn/Fermi': 41556,
 u'nasa.gsfc.gcn/GRO': 6569,
 u'nasa.gsfc.gcn/HETE': 6014,
 u'nasa.gsfc.gcn/INTEGRAL': 33120,
 u'nasa.gsfc.gcn/IPN': 486,
 u'nasa.gsfc.gcn/KONUS': 449,
 u'nasa.gsfc.gcn/MAXI': 6369,
 u'nasa.gsfc.gcn/MOA': 1553,
 u'nasa.gsfc.gcn/SNEWS': 44,
 u'nasa.gsfc.gcn/SUZAKU': 17,
 u'nasa.gsfc.gcn/SWIFT': 1117763,
 u'nasa.gsfc.gcn/UNRECOGNIZED_TYPE': 2,
 u'nvo.caltech/voeventnet/catot': 66,
 u'nvo.caltech/voeventnet/mlsot': 147,
 u'svomcgft.naoc/VOEVENTTEST': 3091,
 u'voevent.4pisky.org/ALARRM-OBSTEST': 5780,
 u'voevent.4pisky.org/ALARRM-REQUEST': 42,
 u'voevent.4pisky.org/ASASSN': 1747,
 u'voevent.4pisky.org/GAIA': 1272,
 u'voevent.4pisky.org/TEST': 10,
 u'voevent.4pisky.org/TEST-RESPONSE': 14,
 u'voevent.4pisky.org/TEST-TRIGGER': 14,
 u'voevent.4pisky.org/voevent-broadcast': 7761,
 u'v

In [20]:
filters={api.FilterKeys.role:'observation'}
api.map_stream_count(filters)

{u'nasa.gsfc.gcn/AMON': 4,
 u'nasa.gsfc.gcn/CALET': 79,
 u'nasa.gsfc.gcn/CAlet': 1,
 u'nasa.gsfc.gcn/COUNTERPART': 113,
 u'nasa.gsfc.gcn/Fermi': 8072,
 u'nasa.gsfc.gcn/INTEGRAL': 1111,
 u'nasa.gsfc.gcn/IPN': 486,
 u'nasa.gsfc.gcn/KONUS': 449,
 u'nasa.gsfc.gcn/MAXI': 269,
 u'nasa.gsfc.gcn/MOA': 1553,
 u'nasa.gsfc.gcn/SUZAKU': 17,
 u'nasa.gsfc.gcn/SWIFT': 1042121,
 u'nvo.caltech/voeventnet/catot': 66,
 u'nvo.caltech/voeventnet/mlsot': 147,
 u'voevent.4pisky.org/ASASSN': 1747,
 u'voevent.4pisky.org/GAIA': 1272}
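
The result is an ordinary Python dictionary, so standard tools apply. As a small sketch, here are a few entries copied by hand from the filtered output above, ranked by count (`rank_streams` is just a helper name invented for this example):

```python
# A few entries copied from the filtered output above
stream_counts = {
    'nasa.gsfc.gcn/SWIFT': 1042121,
    'nasa.gsfc.gcn/Fermi': 8072,
    'nasa.gsfc.gcn/MOA': 1553,
    'voevent.4pisky.org/ASASSN': 1747,
    'voevent.4pisky.org/GAIA': 1272,
}


def rank_streams(counts, n=3):
    """Return the n (stream, count) pairs with the largest counts."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


for stream, count in rank_streams(stream_counts):
    print(stream, count)
```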

### Extensive examples:
http://voeventdbremote.readthedocs.io/


Back up a moment...

In [None]:
!pip install voeventdb.remote

### What just happened?
* Pip fetched the relevant source-code package from the Python Package Index
* Read *setup.py* and parsed the list of dependencies
* Checked what was already installed, and fetched anything missing (possibly from the local cache)
* Installed each of those dependencies in turn,
* then installed our package
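
The 'check what's installed' step can be sketched with the standard library alone (Python 3.8+). This is a rough illustration, not pip's actual logic: pip also handles version constraints, extras and so on. The helper name `missing_requirements` is invented for this example.

```python
from importlib import metadata


def missing_requirements(requirements):
    """Return the names in `requirements` that have no installed distribution."""
    missing = []
    for name in requirements:
        try:
            # Raises PackageNotFoundError if nothing by that name is installed
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Which of these would have to be fetched in the current environment?
print(missing_requirements(['requests', 'astropy', 'six']))
```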


### Packaging 
* Encourages re-use as a component
* Removes 'install friction': just add a package to your requirements list
* Adoption has historically been slowed by a fragmented ecosystem and a lack of good docs.
* Good, short, up-to-date tutorial on packaging your code: http://python-packaging.readthedocs.io/

### One snag...

*setup.py*:

In [None]:
#!/usr/bin/env python
from setuptools import setup, find_packages

install_requires = [
    'iso8601',
    'pytz',
    'requests',
    'simplejson',
    'astropy',
    'six',
]
packages = find_packages()
setup(
    name="voeventdb.remote",
    version="0.1",  # hard-coded: must be edited by hand for every release
    description="Client-lib for remote queries...",
    author="Tim Staley",
    author_email="github@timstaley.co.uk",
    url="https://github.com/timstaley/voeventdb.remote",
    packages=packages,
    install_requires=install_requires,
)


* This is (a condensed version of) the setup.py for voeventdb.remote.
* It's almost all boilerplate which you could easily adapt to any other project: author details, a list of dependencies, and so on.
* Mostly, you can set this up once and then forget about it.
* However, I'd like to draw attention to the version, because keeping it up to date by hand can quickly become a pain.

### Package Versioning

[Versioneer](https://github.com/warner/python-versioneer/blob/master/INSTALL.md):
* Adds a standalone Python module to your codebase
* Automatically sets version number according to most recent git-tag
* Git commit-id also available as a string in your library.
* Super convenient, keeps everything in sync

*setup.cfg*:
```
[versioneer]
VCS = git
style = pep440
versionfile_source = voeventdb/remote/_version.py
versionfile_build = voeventdb/remote/_version.py
tag_prefix =
parentdir_prefix = voeventdb.remote-
```

*setup.py* with Versioneer:

In [None]:
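#!/usr/bin/env python
from setuptools import setup
import versioneer

setup(
    name="voeventdb.remote",
    # Version string is derived from the most recent git tag
    version=versioneer.get_version(),
    cmdclass=versioneer.get_cmdclass(),
    # ... remaining arguments as before
)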

In [22]:
import voeventdb.remote
print("Git tag:", voeventdb.remote.__version__)
print("Git commit-id:", voeventdb.remote.__versiondict__['full-revisionid'])

Git tag: 1.0.0
Git commit-id: 02b727d168797a9ae9bc6835c15b37e384ea1557
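
To give a feel for what 'version from git tag' means in practice, here is a simplified sketch of turning a string in the format produced by `git describe --tags --long --dirty` into a PEP 440 style version. It is written in the spirit of the `pep440` style selected in *setup.cfg*, but it is not Versioneer's own code; the function name and regular expression are assumptions made for this example, and edge cases (such as tags that already contain a `+`) are ignored.

```python
import re

# Expected form: <tag>-<commits since tag>-g<short hash>[-dirty]
_DESCRIBE = re.compile(
    r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def pep440_from_describe(describe):
    """Convert 'git describe --tags --long --dirty' style text to a version."""
    match = _DESCRIBE.match(describe)
    if match is None:
        raise ValueError("unrecognised describe output: %r" % describe)
    version = match.group("tag")
    distance = int(match.group("distance"))
    dirty = match.group("dirty") is not None
    # Exactly on a tag with a clean tree: the tag itself is the version.
    # Otherwise, record the extra detail as a PEP 440 'local version' suffix.
    if distance or dirty:
        version += "+%d.g%s" % (distance, match.group("sha"))
        if dirty:
            version += ".dirty"
    return version


print(pep440_from_describe("1.0.0-0-g02b727d"))
print(pep440_from_describe("1.0.0-3-g02b727d-dirty"))
```

The point is that a tagged release gets a clean version number, while anything installed from an untagged or modified checkout is visibly marked as such.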


## Documentation


### Minimal docs:
* Description of what your package does (+ links for context!)
* One or two brief usage examples
* One big README is typically fine


### Extended docs:
* Maybe also some autogenerated-API docs (see [sphinx](http://www.sphinx-doc.org/), [sphinx-napoleon](https://sphinxcontrib-napoleon.readthedocs.io)).
* **Put it on [ReadTheDocs](https://docs.readthedocs.io/)**


### Read The Docs:
* Free hosting for Sphinx-generated documentation
* Links to a Github repository
* Every git-push results in a new documentation build

* API documentation is semi-automated - write docs next to the code
* See also [sphinx-napoleon](https://sphinxcontrib-napoleon.readthedocs.io) (nicer formatting).


In [None]:
def valid_as_v2_0(voevent):
    """Tests if a voevent conforms to the schema.

    Args:
        voevent(:class:`Voevent`): Root node of a VOEvent etree.
    Returns:
        bool: True if VOEvent is valid
    """
    _return_to_standard_xml(voevent)
    valid_bool = voevent_v2_0_schema.validate(voevent)
    _remove_root_tag_prefix(voevent)
    return valid_bool

![sphinx](img/sphinx.gif)


### Documenting examples
* Examples are very useful... until the code changes and they go stale
* Python notebooks are a great format for writing examples - but tricky to publish.

### Documenting examples with nbsphinx & RTD
* [nbsphinx](https://nbsphinx.readthedocs.io/) lets you generate docs from notebooks.
* The notebooks are re-run with every docs-build - so if the examples are broken, you'll notice.
* This is how the [voeventdb client-docs](http://voeventdbremote.readthedocs.io/) are generated.
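
If notebooks feel like overkill, the same 'examples that get re-run' idea is available on a smaller scale via the standard-library `doctest` module, which executes the examples embedded in docstrings. This is a different tool to nbsphinx, shown here only as a sketch of the principle; the function, the helper `run_examples` and the identifier in the docstring are all made up for illustration.

```python
import doctest


def stream_from_ivorn(ivorn):
    """Return the stream part of an identifier of the form ivo://<stream>#<id>.

    >>> stream_from_ivorn("ivo://example.org/stream#event-1")
    'example.org/stream'
    """
    body = ivorn[len("ivo://"):]
    return body.split("#", 1)[0]


def run_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(run_examples(stream_from_ivorn))
```

If the function's behaviour changes and the docstring is not updated, the failure count becomes non-zero, so the stale example gets noticed.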

Note: The cake is a lie. This isn't **actually** how the voeventdb.remote docs are generated, because nbsphinx was only released around the same time as I was writing those docs. I hacked something together before finding nbsphinx. But if I were writing those docs today, I'd use nbsphinx, and I'll likely migrate the existing docs at some point.

### Deployment & Hosting
 

* Served via Apache + mod_wsgi
* Use [Comet](http://comet.readthedocs.io/) to receive latest VOEvents.
* Hosted on a virtual machine in the cloud (Digital Ocean)
* (This required a little bit of care and tracking down RAM usage) 

![deploy](img/vonode.gif)


* Deployments scripted with [Ansible](http://docs.ansible.com/ansible/index.html)
* Deployment scripts are [open-source!](https://github.com/4pisky/4pisky-voeventdb)
* Can test-drive locally using [VirtualBox](https://www.virtualbox.org/wiki/Downloads) and [Vagrant](https://www.vagrantup.com/)

For multi-component systems, deployment details are **crucial**.

## Related issues I've not had time to cover



(But you should be aware of them if you're new to astronomy, or academia in general. And it's perhaps worth reminding those of you with tenure.)

* The basics / extended checklist for writing 'good' software - to all new grad students I would recommend   
    * [Wilson et al 2014, Best Practices for Scientific Computing](http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745), and 
    * [Wilson et al 2016, Good Enough Practices...](https://arxiv.org/abs/1609.00037).

* Lack of software development training for grad-students
 
 (What do we drop, to replace with [software-carpentry](http://software-carpentry.org/)?)

Grads get lectures in electromagnetism, GR, stellar evolution, etc. What do we cut out, or make optional?

* Lack of long-term career-path for 'research software engineers'

(This is changing, slowly, e.g. [UCL's RSE team](https://www.ucl.ac.uk/research-it-services/research-software-development))

There have always been a few employed on large multi-year projects (space missions, telescopes, etc), though this is perhaps a slightly different line of work to software development for general research.

It's notable that STScI now employs people to work solely on Astropy, though I don't know if those positions are permanent or time-limited.

A few universities / departments have started to employ 'departmental software specialists' or even 'in-house consultancy teams' (cf [UCL](https://www.ucl.ac.uk/research-it-services/research-software-development)).

## Summary

### VOEventDB
* Provides a 'turn-key' queryable repository for transient alerts.
* Extensive user-docs at http://voeventdbremote.rtfd.io/
* Overview paper:  [arXiv:1606.03735](https://arxiv.org/abs/1606.03735)

### Packaging
* Make use of your packaging ecosystem
* Think about use of your code as a component
* Keep versioning information in your version control system! - automate package versioning

### Documentation
* Minimum: description + example usage + install requirements
* Documentation goes stale - test your examples
* In Python, notebooks are a great format for this - try nbsphinx!

## Fin
Thanks!