doc: write section 3 and 4
misho104 committed Mar 3, 2019
1 parent 42a031a commit 7bb51f1
Showing 7 changed files with 313 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/_templates/latex.tex_t
@@ -1,4 +1,4 @@
\documentclass[a4paper,11pt,parskip=half-]{scrartcl}
\documentclass[a4paper,11pt,DIV=12,parskip=half-]{scrartcl}
\pdfoutput=1
\hyphenpenalty=10000

1 change: 1 addition & 0 deletions docs/_themes/mysphinx.sty
@@ -1603,6 +1603,7 @@
\providecommand\sphinxtableofcontents{\tableofcontents}
\newenvironment{sphinxtheindex}{\begin{theindex}}{\end{theindex}}
\providecommand{\sphinxsymbolsname}{}
\providecommand{\capstart}{}
% index colors
\protected\def\toreplace#1{#1}
1 change: 1 addition & 0 deletions docs/conf.py
@@ -64,6 +64,7 @@
"sphinx.ext.ifconfig",
"sphinx.ext.viewcode",
"sphinx.ext.githubpages",
"sphinx.ext.graphviz",
"docs._themes.conf_bibtex",
"sphinxcontrib.bibtex",
"docs._themes.latex_writer",
192 changes: 191 additions & 1 deletion docs/use_as_package.rst
@@ -1,4 +1,194 @@
Usage: as a Python package
==========================

To be written...
.. graphviz::
   :caption: Conceptual structure of data and classes.

   digraph class {
       rankdir="LR";
       {
           rank=same;
           File [shape=box];
           filedata [label="pandas.DataFrame"];
       };
       {
           rank=same;
           Table [shape=record; label="<1>Table|<2>Table|..."]
           FileInfo [shape=box]
           ColumnInfo [shape=none; label=<<table BORDER="0" CELLBORDER="1" CELLSPACING="0"><tr><td>ColumnInfo</td></tr><tr><td>ColumnInfo</td></tr><tr><td>...</td></tr></table>>];
       }
       {
           rank=same;
           tabledata [label="pandas.DataFrame"]
           ParameterInfo [shape=record; label="ParameterInfo|ParameterInfo|..."]
           ValueInfo [shape=record; label="ValueInfo|ValueInfo|..."]
           CrossSectionAttributes [shape=box]
       };
       File->Table [label=".tables (dict)"];
       File->filedata [label=" .raw_data"];
       File->FileInfo [label=".info"]
       Table:1->CrossSectionAttributes [label=".attributes"]
       Table:1->tabledata [label="._df"]
       FileInfo->ValueInfo [label="values (list)"];
       FileInfo->ParameterInfo [label="parameters (list)"];
       FileInfo->ColumnInfo:1 [label="columns (list)"];
   }

Grid-data file and Info file
----------------------------

The fundamental objects of this package are the `File` and `Table` classes, representing the files and the cross-section grid tables, respectively.
A `File` instance carries two files as paths: `!File.table_path` for the grid-data file and `!File.info_path` for the "info" file.
A grid-data file contains a table representing one or more cross sections.
The content of a grid-data file is read and parsed by `pandas.read_csv`, which can parse most table-format files [#dsv]_ with proper `!reader_options` specified in the "info" file.
The resulting `pandas.DataFrame` object is stored as-is in `!File.raw_table` for further interpretation.

.. [#dsv] Parsable formats include comma-separated values (CSV), tab-separated values (TSV), and space-separated values (SSV); in addition, fixed-width formatted tables are usually parsable.

An "info" file corresponds to a `FileInfo` instance and is provided in JSON format :cite:`json`.
It carries the data for `ColumnInfo`, `ParameterInfo`, and `ValueInfo` objects in addition to `!reader_options`.
These three types of information are used to interpret the `!File.raw_table` data.
The detailed specification of "info" files is described below.

One grid table has multiple columns, where the name and unit of each column are specified by `ColumnInfo`.
Some columns are "parameters" for cross sections, such as the masses of relevant particles, which are specified by `ParameterInfo`.
The other columns are "values", and `ValueInfo` is used to define them.
A `ValueInfo` uses one column as the central value and one or more columns as uncertainties, which can be relative or absolute and symmetric or asymmetric.
Multiple columns for an uncertainty are combined in quadrature, i.e., :math:`\sigma_1\oplus\sigma_2 := \sqrt{\sigma_1^2 + \sigma_2^2}`.
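
For illustration, the quadrature combination can be sketched in a few lines of plain Python; the numbers below are made up and do not come from any shipped table:

.. code-block:: python

   import math

   # two hypothetical absolute uncertainties (in pb) for a single grid point
   scale_unc = 0.021
   pdf_unc = 0.013

   # combined in quadrature, as done when several uncertainty columns are given
   combined = math.sqrt(scale_unc ** 2 + pdf_unc ** 2)
   print(f"{combined:.4f} pb")  # 0.0247 pb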

For each `ValueInfo`, the interpreter constructs one :class:`~pandas.DataFrame` object.
It is indexed by a :py:class:`~pandas.Index` or :py:class:`~pandas.MultiIndex` of the parameters and has three columns, ``value``, ``unc+``, and ``unc-``, which respectively contain the central value of the cross section, the combined positive absolute uncertainty, and the (absolute value of the) combined negative absolute uncertainty.
Each :class:`~pandas.DataFrame` is wrapped by the `Table` class and stored in `!File.tables` (a :typ:`dict`), keyed by the ``name`` of the corresponding value column.
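
The layout of such a table can be mimicked with plain pandas as follows; the parameter names and numbers are hypothetical and only illustrate the index and column structure:

.. code-block:: python

   import pandas

   # hypothetical two-parameter grid with the three standard columns
   index = pandas.MultiIndex.from_tuples([(500, 500), (500, 600)], names=["ms", "mgl"])
   df = pandas.DataFrame(
       {"value": [0.30, 0.21], "unc+": [0.04, 0.03], "unc-": [0.03, 0.02]},
       index=index,
   )
   print(df)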

This is an example of data handling:

.. code-block:: python

   from susy_cross_section import utility
   from susy_cross_section.table import File, Table

   grid_path, info_path = utility.get_paths("13TeV.n2x1+.wino")
   file = File(grid_path, info_path)
   xsec_table = file.tables["xsec"]

Here, the utility function `get_paths` is used to look up the paths for the key ``13TeV.n2x1+.wino``, and from those paths a `File` instance is constructed.
Then the table with the column name ``xsec`` is taken from the `!tables` dictionary.
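
Since `!File.tables` is an ordinary :typ:`dict`, the available value names can be inspected directly; continuing the example above (the output shown is illustrative):

.. code-block:: python

   # list the value columns provided by this file
   print(list(file.tables.keys()))  # e.g. ['xsec']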

Interpolation
-------------

The table interpolation is handled by the `susy_cross_section.interp` subpackage.
This subpackage first performs an axes transformation using the `axes_wrapper` module and then uses one of the interpolators defined in the `interpolator` module.
Detailed information is available in the API document of each module.

Cross-section data with one mass parameter are usually fitted well by a negative power of the mass, i.e., :math:`\sigma(m)\propto m^{-n}`.
For such cases, interpolating the data by piecewise lines in log-log axes works well, which is implemented as

.. code-block:: python

   from susy_cross_section.interp.interpolator import Scipy1dInterpolator

   xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(xsec_table)
   print(xs(500), xs.fp(500), xs.fm(500), xs.unc_p_at(500), xs.unc_m_at(500))

One can implement more complicated interpolators by extending `AbstractInterpolator`.
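
Conceptually, such a log-log linear interpolation is ordinary linear interpolation applied to :math:`\log m` and :math:`\log\sigma`.
The following standalone sketch reproduces the idea with plain SciPy and made-up grid points, independently of this package:

.. code-block:: python

   import numpy as np
   from scipy import interpolate

   # hypothetical one-parameter grid: masses in GeV, cross sections in pb
   masses = np.array([300.0, 400.0, 500.0, 700.0, 1000.0])
   xsecs = np.array([2.0e1, 4.5e0, 1.4e0, 2.2e-1, 2.6e-2])

   # piecewise-linear interpolation in log-log axes
   f = interpolate.interp1d(np.log(masses), np.log(xsecs), kind="linear")

   def sigma(m):
       """Return the interpolated cross section (pb) at mass m (GeV)."""
       return float(np.exp(f(np.log(m))))

   print(sigma(450.0))  # lies between the 400 and 500 GeV grid values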

A proposal for INFO file format
-------------------------------

An info file is a JSON file whose content is a single dict object; a schematic example is given at the end of this section.
The dict has six keys: ``document``, ``attributes`` (optional), ``columns``, ``reader_options`` (optional), ``parameters``, and ``values``.

``document`` as :typ:`dict(str, str)`:

This dictionary may contain any values and no specification is given, but the content should be used only for documentation purposes; i.e., programs should not change their behavior based on the content of ``document``.
Data on which programs do rely should be stored not in ``document`` but in ``attributes``.

Possible keys are: ``title``, ``authors``, ``calculator``, ``source``, and ``version``.

``attributes`` as :typ:`dict(str, str)`:

This dictionary contains *the default values* for `CrossSectionAttributes`, which is attached to each of the values.
These default values are overridden by the ``attributes`` defined in the respective entries of ``values``.

`CrossSectionAttributes` stores, contrary to ``document``, non-documental information, based on which programs may change their behavior.
Therefore the content must be neat and in machine-friendly formats.
The proposed keys are: ``processes``, ``collider``, ``ecm``, ``order``, and ``pdf_name``.
For details, see the API document of `CrossSectionAttributes`.

``columns`` as a list of :typ:`dict(str, str)`:

This is a list of dictionaries used to construct `ColumnInfo` objects; the :m:`n`-th element defines the :m:`n`-th column of the grid-data file.
The length of this list thus matches the number of columns.
Each dictionary must have two keys, ``name`` and ``unit``, which respectively specify the name and the unit of the column.
The names must be unique within one file.
For a dimensionless column, ``unit`` is an empty string.

``reader_options`` as :typ:`dict(str, Any)`:

This dictionary is passed directly to :func:`pandas.read_csv` as keyword arguments.

``parameters`` as a list of :typ:`dict(str, Any)`:

This list defines the parameters for indexing.
Each element is a dictionary with two keys, ``column`` and ``granularity``, and constructs a `ParameterInfo` object.
The value for ``column`` must match the ``name`` of one of the ``columns``.
The value for ``granularity`` is a number used to quantize the parameter grid; for details see the API document of `ParameterInfo`.

``values`` as a list of dictionary:

This list defines the cross-section values.
Each element is a dictionary and constructs a `ValueInfo` object.
The dictionary may have the keys ``column``, ``unc``, ``unc+``, ``unc-``, and ``attributes``.
``column`` is mandatory and its value must match the ``name`` of one of the ``columns``; that column is used as the central value of the cross section.
``attributes`` is optional and its value is a :typ:`dict(str, Any)`; it is used to construct a `CrossSectionAttributes` object, overriding the file-wide default values.

The other three keys are used to specify uncertainties.
``unc`` specifies a symmetric uncertainty, while the pair ``unc+`` and ``unc-`` specifies an asymmetric uncertainty; ``unc`` must not be present together with ``unc+`` or ``unc-``.
Each value of ``unc``, ``unc+``, and ``unc-`` is *a list of dictionaries*, :typ:`list(dict(str, str))`.
Each element of the list, a dictionary with two keys ``column`` and ``type``, describes one source of uncertainty.
The value for ``column`` must match the ``name`` of one of the ``columns``; that column is used as the source.
The value for ``type`` specifies the type of uncertainty; for details see the API document of `ValueInfo`.
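
To make the specification concrete, the following sketch assembles a hypothetical info file for a one-parameter grid and writes it out as JSON.
All names, units, and numbers here are invented for illustration; in particular, the ``type`` string is a placeholder, and the allowed values are documented in the `ValueInfo` API document:

.. code-block:: python

   import json

   # a hypothetical info file for a grid with the columns: mgl, xsec, unc
   info = {
       "document": {"title": "example grid", "authors": "..."},
       "attributes": {"collider": "pp", "ecm": "13TeV", "pdf_name": "..."},
       "columns": [
           {"name": "mgl", "unit": "GeV"},
           {"name": "xsec", "unit": "pb"},
           {"name": "unc", "unit": "pb"},
       ],
       "reader_options": {"sep": ";", "skiprows": 1},
       "parameters": [{"column": "mgl", "granularity": 1}],
       "values": [
           {
               "column": "xsec",
               # the "type" value below is a placeholder; see the ValueInfo API
               "unc": [{"column": "unc", "type": "absolute"}],
           }
       ],
   }

   with open("mydata/table_grid.info", "w") as f:
       json.dump(info, f, indent=2)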


How to use your own tables
--------------------------

Users may use this package to handle their own cross-section grid tables, once they provide an INFO file.
The procedure is summarized as follows.

1. Find proper `!reader_options` to read the table.

   This package uses :func:`pandas.read_csv` to read the grid table, for which proper options should be specified.
   The following script may be useful to find the proper options for your table.
   Possible keys for `!reader_options` are found in the API document of :func:`pandas.read_csv`.

   .. code-block:: python

      import pandas

      reader_options = {
          "sep": ";",
          "skiprows": 1,
      }
      grid_path = "mydata/table_grid.txt"
      data_frame = pandas.read_csv(grid_path, **reader_options)
      print(data_frame)

2. Write the INFO file.

   One should be careful especially about the "type" of uncertainties and the "unit" of each column.

3. Verify that the file is correctly read.

   The :ref:`show sub-command <cmd_show>` is useful for this purpose; for example,

   .. code-block:: console

      $ susy-xs show mydata/table_grid.txt mydata/table_grid.info

   After verifying with the show sub-command, users can use the :ref:`get sub-command <cmd_get>` or read the data in their own code as:

   .. code-block:: python

      from susy_cross_section.table import File

      my_grid = File("mydata/table_grid.txt", "mydata/table_grid.info")
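
As a follow-up, the custom table can then be interpolated in the same way as the bundled ones; the snippet below assumes that the INFO file defines a value column named ``xsec`` and a single mass parameter (replace them with your own names):

.. code-block:: python

   from susy_cross_section.interp.interpolator import Scipy1dInterpolator
   from susy_cross_section.table import File

   my_grid = File("mydata/table_grid.txt", "mydata/table_grid.info")
   xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(my_grid.tables["xsec"])
   print(xs(800))  # central value at a hypothetical parameter point of 800 GeV
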
118 changes: 117 additions & 1 deletion docs/use_as_script.rst
@@ -1,4 +1,120 @@
Usage: as a command-line script
===============================

To be written...
The package provides one terminal script, ``susy-xs``, which accepts the following flags and sub-commands.

- ``susy-xs --help`` gives a short help and a list of sub-commands,
- ``susy-xs --version`` returns the package version,
- ``susy-xs list`` displays a list of available table-grid data files,
- ``susy-xs show`` shows the information of a specified data file,
- ``susy-xs get`` obtains a cross section value from a table, with interpolation if necessary.

Details of these sub-commands are explained below; they are also available from the terminal via the ``--help`` flag, for example ``susy-xs get --help``.


.. _cmd_list:

list
----

.. code-block:: console

   $ susy-xs list (options) (substr substr ...)

This sub-command displays a list of available cross-section tables.
If `!substr` is specified, only the tables that include it in the table name or file paths are displayed.

By default, this command lists only the files with pre-defined table keys.
In addition to these commonly-used table grids, this package contains much more cross-section data.
These additional files can be found with the ``--all`` option.

With the ``--full`` option, the full paths to the files are displayed, which is useful for further operations, for example,

.. code-block:: console

   $ susy-xs list --all --full gg 7TeV CTEQ
   /Users/misho/ (...) /data/nllfast/7TeV/gg_nllnlo_cteq6.grid
   /Users/misho/ (...) /data/nllfast/7TeV/gg_nllnlo_hm_cteq6.grid
   $ susy-xs show /Users/misho/ (...) /data/nllfast/7TeV/gg_nllnlo_hm_cteq6.grid
   ------------------------------------------------------------------------
   TABLE "xsec_lo" (unit: pb)
   ------------------------------------------------------------------------
              value          unc+          unc-
   ms  mgl
   200 200     3.400000e+02  1.411437e+02  9.385184e+01
   ...

.. _cmd_show:

show
----

.. code-block:: console

   $ susy-xs show (options) table

This sub-command shows the data and information of the table specified by `!table`.
`!table` is either one of the pre-defined table keys, which can be displayed by the :ref:`list sub-command <cmd_list>`, or a path to a grid-data file.
The displayed information includes the grid tables in the file, the physical attributes associated with each table, and the documenting information associated with the file.

A grid-data file is read together with an associated "info" file, whose name is by default resolved by replacing the suffix of the data file with ``.info``.
One can override this default behavior with the ``--info`` option.

.. _cmd_get:

get
---

.. code-block:: console

   $ susy-xs get (options) table (args ...)

This sub-command gets a cross-section value from the table specified by `!table` and the ``--name`` option, where `!args` are used as the physical parameters.
Without `!args`, this sub-command displays the meaning of `!args` and of the ``--name`` option, for example:

.. code-block:: console

   $ susy-xs get 8TeV.gg
   Usage: get [OPTIONS] 8TeV.gg MS MGL
   Parameters: MS   [unit: GeV]
               MGL  [unit: GeV]
   Table-specific options: --name=xsec_lo   [unit: pb]
                           --name=xsec_nlo  [unit: pb]
                           --name=xsec      [unit: pb] (default)

In this case, users are asked to specify the squark mass (which is assumed to be degenerate in this grid) as the first `!args` and the gluino mass as the second `!args`, both in GeV.
It is also shown that the LO and NLO cross sections can be obtained with the ``--name`` option; otherwise the default ``xsec`` grid is used.
So, for example, the cross section :math:`\sigma_{8 \mathrm{TeV}}(pp\to\tilde g\tilde g)` with 1 TeV gluino and 1.2 TeV squark can be obtained by

.. code-block:: console

   $ susy-xs get 8TeV.gg 1200 1000
   (0.0126 +0.0023 -0.0023) pb

Here, the default ``xsec`` grid in the table file ``8TeV.gg`` is used.
One can check with the :ref:`show sub-command <cmd_show>` that this grid is calculated by the NLL-fast collaboration at NLO+NLL order using MSTW2008nlo68cl as the parton distribution functions (PDFs), and thus this 12.6 fb is the NLO+NLL cross section.

The value is calculated by interpolation if necessary.
This sub-command uses linear interpolation with all the parameters and values in logarithmic scale.
For example, the interpolating function for a one-parameter grid table is obtained as piecewise lines in a log-log plot.
To use other interpolation methods, users have to import this package into their own Python code, as explained in `Section 4`_.
For details, consult the API document of `susy_cross_section.interp`.

`!table` is either one of the pre-defined table keys, which can be displayed by the :ref:`list sub-command <cmd_list>`, or a path to a grid-data file.
A grid-data file is read together with an associated "info" file, whose name is by default resolved by replacing the suffix of the data file with ``.info``.
One can override this default behavior with the ``--info`` option.

Additionally, several options are provided to control the output format; they are described in the ``--help`` output.

.. caution::

   In principle, one can get cross sections for various model points by repeating this sub-command.
   However, this is not recommended because the sub-command constructs an interpolating function on every invocation.
   For such use cases, users should instead import this package in their own Python code, as explained in `Section 4`_; a minimal sketch of the recommended pattern follows.
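
The sketch below constructs the interpolating function once and then evaluates it for several parameter points; the table key and masses are illustrative:

.. code-block:: python

   from susy_cross_section import utility
   from susy_cross_section.interp.interpolator import Scipy1dInterpolator
   from susy_cross_section.table import File

   grid_path, info_path = utility.get_paths("13TeV.n2x1+.wino")
   xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(
       File(grid_path, info_path).tables["xsec"]
   )

   # the interpolator construction above is done only once; evaluations are cheap
   for mass in [300, 400, 500, 600]:
       print(mass, xs(mass))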

.. _Section 4:
use_as_package
2 changes: 1 addition & 1 deletion susy_cross_section/base/table.py
@@ -143,7 +143,7 @@ def _read_csv(self, path):
# type: (pathlib.Path)->pandas.DataFrame
"""Read a csv file and return the content.
Internally, call :meth:`pandas.read_csv()` with `!reader_options`.
Internally, call `pandas.read_csv` with `!reader_options`.
"""
reader_options = {
"skiprows": [0],
2 changes: 1 addition & 1 deletion susy_cross_section/interp/__init__.py
@@ -32,7 +32,7 @@
satisfying :math:`f({\boldsymbol x}_n)=y_n`.
Caution
Warning
-------
One should distinguish an interpolation :math:`f` from fitting functions. An
