PIG 18: Documentation #2463

Merged
merged 15 commits into from Nov 6, 2019
1 change: 1 addition & 0 deletions docs/development/pigs/index.rst
@@ -27,5 +27,6 @@ label`_ .
pig-012
pig-013
pig-016
pig-018

.. _pull requests with the "pig" label: https://github.com/gammapy/gammapy/issues?q=label%3Apig
322 changes: 322 additions & 0 deletions docs/development/pigs/pig-018.rst
@@ -0,0 +1,322 @@
.. include:: ../../references.txt

.. _pig-018:

**********************
PIG 18 - Documentation
**********************

* Author: Christoph Deil, Axel Donath, José Enrique Ruiz
* Created: Oct 16, 2019
* Accepted: Nov 6, 2019
* Status: accepted
* Discussion: `GH 2463`_

Abstract
========

Over the past years the Gammapy package and documentation have grown organically.
At the moment there is a lot of duplicated and missing content, especially for
recently added functionality such as datasets, the high-level interface, and the
restructured Gammapy package. We propose to spend significant effort on
reorganising and improving the Gammapy documentation in Nov and Dec 2019, roughly
following the plan outlined here. Further discussion and planning will take place in
GitHub issues and pull requests, and will be summarised on the `Documentation
Github project`_ board.

Introduction
============

Gammapy started in 2013 and since then the package and documentation have
continuously evolved (see `Gammapy changelog`_). The oldest version of the
documentation that is still readily available online is for Gammapy v0.6 from
April 2017 (https://docs.gammapy.org/0.6). The current version of the
documentation is for Gammapy v0.14 from September 2019
(https://docs.gammapy.org/0.14).

In 2018, following other projects such as Astropy and SunPy, we created a
"project webpage" at https://gammapy.org, which is hand-written and not
versioned, in addition to https://docs.gammapy.org, which is versioned and
auto-generated by Sphinx. We also introduced a new setup for tutorials (written
as Jupyter notebooks and integrated into the Sphinx documentation), and ``gammapy
download`` as the way for users to download versioned tutorial notebooks, example
Python scripts and example datasets in a reproducible conda environment (see
:ref:`pig-004`, `ADASS XVIII proceedings`_).

Currently there are 19 tutorial notebooks, plus 7 listed as "extra topics".
Among the notebooks there is a lot of duplicated content, but on the other hand
there is also still a lot of missing documentation (e.g. recently implemented
large changes in Gammapy such as :ref:`pig-012` and :ref:`pig-016` are not
completely documented yet). In addition to the Jupyter notebook tutorials, we
have RST documentation pages for each Gammapy sub-package. In some cases there
is a lot of content and examples (e.g. `maps`_ or `modeling`_), in other cases
there is only a sentence or two and the API docs (e.g. `cube`_). The more
technical documentation of the API classes, methods and objects is
auto-generated from the Python docstrings in the code.

The tutorials usually have the following structure: introduction, setup, main
content, and sometimes at the end a summary, exercises or links to other
documentation. The sub-package RST pages usually have the following structure,
following the Astropy docs: Introduction (overview description), Getting Started
(first examples), Using (links to tutorials and sometimes sub-pages), API
reference.

We will not discuss in detail how other projects structure their documentation,
but we did look at a small list of projects, and think it is useful to compare
and contrast them to work out a good documentation structure for Gammapy:

- https://www.tensorflow.org/tutorials/
- https://scikit-learn.org/stable/documentation.html
- https://jupyter.org/ and https://jupyterlab.readthedocs.io
- https://www.astropy.org/ and https://docs.astropy.org
- https://sunpy.org/ and https://docs.sunpy.org
- http://cxc.harvard.edu/sherpa/ and https://sherpa.readthedocs.io
- https://fermi.gsfc.nasa.gov/ssc/data/analysis/
- https://fermipy.readthedocs.io
- http://cta.irap.omp.eu/ctools
- https://cta-observatory.github.io/ctapipe
- https://www.djangoproject.com/ and https://docs.djangoproject.com

Generally one has to be aware that Gammapy is both a flexible and extensible
library of building blocks that experts can use to implement advanced
analyses, and a science tool package for IACTs (CTA, H.E.S.S.) with most
analysis use cases pre-implemented, so that users only need to configure and run
them. Some of the examples considered share this dual nature (e.g. JupyterLab),
while others (e.g. scikit-learn or Astropy) are purely libraries, and thus their
documentation is structured partly differently.

Proposal
========

Guidelines and specific actions
-------------------------------

We propose to undertake a minor general restructure of the **Getting started**
section, described below, while mostly keeping the existing Gammapy documentation
setup (i.e. maintaining part of the documentation in RST pages and another part in
Jupyter notebooks), though we admit that there is no clear separation between
the content of the two. We will take the following items as guidelines and actions
to improve the documentation:

- More content should be moved to Jupyter notebooks. Currently the RST pages for
  maps, modeling, catalog, detect, etc. contain a few code examples; those should
  be moved to corresponding notebooks ``maps.ipynb``, ``modeling.ipynb``,
  ``catalog.ipynb`` and ``detect.ipynb``, so that in many cases there is a
  hands-on tutorial introduction for each sub-package. More cross-links between
  IPYNB, RST and API docs should be created.
- Sub-package RST pages will be kept short, with links to relevant hands-on
  tutorials or Python scripts at the top, and the API docs at the bottom. Some
  pages have significant content in between that is not related to code examples
  (e.g. for maps, modeling or IRFs there is a description of the design).
- Where possible, the notebooks should use the high-level interface (e.g. for
  automatic data reduction), and the lower-level API only at the end, for the
  specific use case at hand. The aim is to have shorter notebooks that get to
  the point.
- Add a Gammapy overview page to the RST docs, where the general data
  analysis concepts are explained (DL3, Datasets, Model, Fitting). This page
  would be similar to the description of Gammapy in the paper that we also plan
  to write now, and the same figures would be used for both.
- Add a HowTo RST page with a list of short, specific tasks, each linking to the
  notebook subsection that shows the solution.
- Add a few examples of how to use Gammapy with Python scripts, and provide these
  scripts with ``gammapy download``.
- Extend the `Glossary`_ in the References section with some non-obvious but
  common terms used throughout the documentation and tutorials.
- Some effort will be put into reviewing the completeness and consistency of the
  API docstrings.
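
As an illustration of the last item, a docstring completeness check could be
sketched roughly as follows. This is not an existing Gammapy tool: the function
name ``find_missing_docstrings`` and the in-memory ``demo`` module are
assumptions made for this example, and a real check would be pointed at a
Gammapy sub-package instead.

```python
import inspect
import types


def find_missing_docstrings(module):
    """Return sorted names of public functions, classes and methods lacking a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip objects that are only imported from another module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if not obj.__doc__:
            missing.append(name)
        if inspect.isclass(obj):
            # Only look at methods defined directly on this class.
            for attr_name, attr in vars(obj).items():
                if attr_name.startswith("_") or not inspect.isfunction(attr):
                    continue
                if not attr.__doc__:
                    missing.append(f"{name}.{attr_name}")
    return sorted(missing)


# Hypothetical stand-in module, built in memory for demonstration only.
demo = types.ModuleType("demo")
exec(
    '''
def documented():
    """Do something."""

def undocumented():
    pass

class Thing:
    """A documented class."""

    def method(self):
        pass
''',
    demo.__dict__,
)

print(find_missing_docstrings(demo))
```

Such a report only covers completeness; consistency of the docstrings would
still have to be reviewed by hand.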

Getting started section restructuring
-------------------------------------

Gammapy overview
~~~~~~~~~~~~~~~~

We suggest adding an overview page at the beginning of the section. It is a
ten-minute, non-hands-on introduction to Gammapy, explaining the details of
data analysis and giving an overview of concepts such as Datasets, Fit,
Models etc. and how those play together in a Gammapy analysis. People could skip
this section, go directly to the installation or hands-on tutorial notebooks,
and come back to that page later if they prefer.

Installation
~~~~~~~~~~~~

Keep as it is.

Getting started
~~~~~~~~~~~~~~~

Keep as it is.

Tutorials
~~~~~~~~~

The tutorials will be reorganised into the following groups (items) and individual
notebooks (sub-items).

- First analysis

  + Config-driven 1D and 3D analysis of Crab (evolution of current ``hess.ipynb`` that could be renamed)
  + Extended source analysis, also showing the lower-level API with customisation options for background modeling (to be implemented)

The group below will be a "starting page" for people from CTA, HESS and Fermi,
and possibly other instruments in the future. We could remove
https://gammapy.org/cta.html (which has very little, and outdated, information),
since it is better to have one starting page for CTA users instead of two.

- What data can I analyse?

  + Observations and Datasets (to be implemented)
  + CTA, mention prod3 and DC1, show what the IRFs look like (``cta_data_analysis.ipynb``)
  + HESS, mention DR1, show what the IRFs look like (to be implemented)
  + Fermi-LAT, show how to create map and spectrum dataset using 3FHL example (``fermi_lat.ipynb``)

- What analyses can I do?

  + IACT data selection and grouping (to be implemented)
  + IACT 3D cube analysis (data reduction and fitting)
  + IACT 2D image analysis (``image_analysis.ipynb``)
  + IACT 1D spectrum analysis (data reduction and fitting)
  + IACT light curve computation (``light_curve.ipynb``)
  + Flux point fitting (``sed_fitting_gammacat_fermi.ipynb``)
  + Binned simulation 1D / 3D (``spectrum_simulation.ipynb`` and ``simulate_3d.ipynb`` combined)
  + Binned sensitivity computation (``cta_sensitivity.ipynb``)
  + Pulsar analysis (``pulsar_analysis.ipynb``)
  + Naima model fitting (to be implemented)
  + Joint Crab analysis using Fermi + HESS + some flux points (to be implemented)

For many Gammapy sub-packages, we plan to have a corresponding notebook that is
a hands-on, executable tutorial introduction that complements the description
and API docs on the sub-package RST page. These notebooks are listed in the
group below.

- Gammapy package

  + Overview (short section with one example for each sub-package, hands-on, an evolution of the current ``getting_started.ipynb``)
  + Maps (``gammapy.maps``) (``maps.ipynb``)
  + Models (``gammapy.modeling.models``) (``models.ipynb``)
  + Modeling (``gammapy.modeling``) (to be implemented)
  + Statistics (``gammapy.stats``) (to be implemented, explains about likelihood, TS values, significance, ...)
  + Source detection (``gammapy.detect``) (``detect_ts.ipynb``, ``cwt.ipynb``)
  + Source catalogs (``gammapy.catalog``) (to be implemented)

- **Scripts**

This group will contain a few examples of how to use Gammapy from Python scripts (e.g. to make a CTA 1DC
survey counts map or to run some other long-running analysis or simulation). The Python scripts could be provided
as links and also via ``gammapy download``, as is the case for the notebooks.

- Extra topics

  + MCMC sampling (``mcmc_sampling.ipynb``)
  + Dark matter models (``gammapy.astro.darkmatter``) (``astro_dark_matter.ipynb``)
  + Dark matter analysis (to be implemented)
  + Light curve simulation (to be implemented)
  + Source population modelling (``gammapy.astro.population``) (``source_population_model.ipynb``)
  + Background model making (``background_model.ipynb``)
  + Sherpa for Gammapy users (``image_fitting_with_sherpa.ipynb``, ``spectrum_fitting_with_sherpa.ipynb``)?
  + HESS Galactic Plane Survey data (``hgps.ipynb``)

These are more specialised notebooks, in some cases of lower quality.

- **Basics**

Leave the basics section at the end of the tutorials page, pretty much as-is.

How To
~~~~~~

We suggest adding a HowTo RST file with short entries explaining how to do
something specific in Gammapy. Each HowTo entry should probably be a short section
with just 1-2 sentences and links to tutorials, to specific sections of the
tutorials, or to the API docs. If entries grow into mini-tutorials with code
snippets, they could go on sub-pages. The `HowTo documentation page`_
shows a preliminary version of the content of this page.
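
To make the intended shape of such a page concrete, here is a minimal sketch
that renders a list of entries into RST. It is only an illustration of the
"short section plus link" layout described above: the ``HowTo`` class, the
``render_howto_page`` function, the example entry and its link target are all
hypothetical, and the actual page may well be written by hand.

```python
from dataclasses import dataclass


@dataclass
class HowTo:
    title: str    # section heading of the entry
    summary: str  # the 1-2 sentence explanation
    link: str     # hypothetical target, e.g. a notebook section


def render_howto_page(entries):
    """Render HowTo entries as one RST page with a short section per entry."""
    lines = ["How To", "======", ""]
    for entry in entries:
        lines += [
            entry.title,
            "-" * len(entry.title),  # RST underline must match the title length
            "",
            f"{entry.summary} See `{entry.title} <{entry.link}>`__.",
            "",
        ]
    return "\n".join(lines)


# Hypothetical entry; the link is a placeholder, not a real documentation URL.
entries = [
    HowTo(
        title="Plot a map",
        summary="Maps can be plotted directly from the map object.",
        link="notebooks/maps.html#placeholder-anchor",
    )
]
page = render_howto_page(entries)
print(page)
```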

Reference
~~~~~~~~~
Keep as it is and extend the Glossary.

Changelog
~~~~~~~~~
Keep as it is.


Alternatives
============

We could try to change to documentation maintained purely as Jupyter notebooks
(e.g. the "Python Data Science Handbook" is written entirely as Jupyter notebooks).
Or we could change the documentation system and write all documentation as RST or
MD, and then have a documentation processor that auto-generates notebooks. For
example, `Jupytext`_ does this, and the `scikit-learn`_ docs partly do this for
their tutorials: they are maintained as Python scripts and RST files.
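
To give an idea of what "auto-generating notebooks" from text sources involves,
the following sketch converts a Python script with ``# %%`` cell markers into a
minimal notebook structure, using only the standard library. It is not how
Jupytext or the scikit-learn docs are implemented; the function names and the
small example script are assumptions made for illustration.

```python
import json


def _make_cell(kind, lines):
    """Build one notebook cell dict from the collected source lines."""
    source = "\n".join(lines).strip("\n")
    if kind == "markdown":
        # Markdown cells are stored as comments in the script: drop the "# " prefix.
        text = "\n".join(
            line[2:] if line.startswith("# ") else line.lstrip("#")
            for line in source.split("\n")
        )
        return {"cell_type": "markdown", "metadata": {}, "source": text}
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }


def script_to_notebook(script):
    """Convert a '# %%'-delimited script into a minimal nbformat 4 notebook dict."""
    cells = []
    kind, lines = "code", []
    for line in script.splitlines():
        if line.startswith("# %%"):
            # A marker closes the previous cell (if it has any content).
            if any(item.strip() for item in lines):
                cells.append(_make_cell(kind, lines))
            kind = "markdown" if "[markdown]" in line else "code"
            lines = []
        else:
            lines.append(line)
    if any(item.strip() for item in lines):
        cells.append(_make_cell(kind, lines))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical example script, not taken from the Gammapy tutorials.
script = """# %% [markdown]
# # Maps tutorial
# A short introduction.

# %%
import math
print(math.pi)
"""

notebook = script_to_notebook(script)
print(json.dumps(notebook, indent=1))
```

The trade-off with this kind of setup is that the text source becomes the single
file to review and edit, while the notebooks become a build product.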

There are many ways to structure the documentation, or to set a different focus.

Outlook
=======

This is a short-term proposal to quickly improve the Gammapy documentation
within the next 1-2 months, with the limited contributors we already have. In
early 2020, we should run a Gammapy user survey and gather feedback on the
Gammapy package and documentation. Examples of previous user surveys exist, e.g.
from CTA 1DC or the `Scipy documentation user survey`_, which we can use as
a reference for how to get useful feedback.

We should also try to attract or hire contributors to Gammapy that have a strong
interest in documentation. One concrete idea could be to participate in `Google
season of docs`_, to get a junior technical writer for a few months, if someone
from the Gammapy team has time to work on and mentor the project.

Another thing to keep in mind is that we should work towards a setup and
structure for the Gammapy package that supports CTA as well as possible, and that
makes it easy for CTAO to choose and adapt Gammapy as a prototype of the CTA
science tools, and to evolve and maintain it. This PIG doesn't propose a solution
for this; that is left for later.

Implementation
==============

Implementing this PIG is a lot of work, roughly 2 months of full-time work. We
suggest that, after the PIG is accepted, one coordinator spends a few days on
quick additions / removals / renames / rearrangements, so that the structure of
the RST and IPYNB files we want is in place. We then propose that the coordinator
fills the `Documentation Github project`_ with a list of 20-30 tasks that
should be done (each 1-2 days of work, not longer) and asks for help. Each task
is usually to edit one notebook or RST page, and needs one author and one
reviewer. It is then up to those two people to coordinate their work: they can
open a GitHub issue to discuss, or they can just have a phone call or meet.
Eventually there is a pull request, and when it is in a state where both are
happy, it is merged. Whether to use "notes" or "issues" for each task will
be discussed during the development of the PIG and will basically be up to
the documentation coordinator.

Decision
========

The PIG was extensively discussed and received a lot of feedback on `GH 2463`_.
The main suggestions received were incorporated. There was some controversy, e.g.
about whether we should have more, shorter pages and notebooks or fewer, longer
ones. This PIG was accepted on Nov 6, 2019. We would like to note that the
outline and changes described above are not set in stone: we expect the
documentation to evolve and improve in an iterative fashion over the coming
weeks, but also in 2020 and after.

.. _GH 172: https://github.com/gammapy/gammapy/issues/172
.. _GH 241: https://github.com/gammapy/gammapy/issues/241
.. _GH 242: https://github.com/gammapy/gammapy/issues/242
.. _GH 1540: https://github.com/gammapy/gammapy/issues/1540
.. _GH 1577: https://github.com/gammapy/gammapy/issues/1577
.. _GH 1597: https://github.com/gammapy/gammapy/issues/1597
.. _GH 690: https://github.com/gammapy/gammapy/issues/690
.. _GH 2164: https://github.com/gammapy/gammapy/issues/2164
.. _GH 2175: https://github.com/gammapy/gammapy/issues/2175
.. _GH 2221: https://github.com/gammapy/gammapy/issues/2221
.. _GH 2463: https://github.com/gammapy/gammapy/pull/2463
.. _80-20 rule: https://en.wikipedia.org/wiki/Pareto_principle
.. _Scipy documentation user survey: https://forms.gle/eK3x2ohJs1sLPJEk8
.. _Google season of docs: https://developers.google.com/season-of-docs/
.. _documentation principles: https://www.writethedocs.org/guide/writing/docs-principles/
.. _Gammapy changelog: https://docs.gammapy.org/dev/changelog.html
.. _maps: https://docs.gammapy.org/0.14/maps/index.html
.. _modeling: https://docs.gammapy.org/0.14/modeling/index.html
.. _cube: https://docs.gammapy.org/0.14/cube/index.html
.. _ADASS XVIII proceedings: http://www.aspbooks.org/publications/523/357.pdf
.. _Glossary: https://docs.gammapy.org/0.14/references.html#glossary
.. _Jupytext: https://jupytext.readthedocs.io
.. _Documentation Github project: https://github.com/gammapy/gammapy/projects/1
.. _HowTo documentation page: https://docs.gammapy.org/dev/howto.html