doc: write section 3 and 4
misho104 committed Mar 3, 2019
1 parent 42a031a commit 7bb51f1
Showing 7 changed files with 313 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/_templates/latex.tex_t
@@ -1,4 +1,4 @@
\documentclass[a4paper,11pt,parskip=half-]{scrartcl}
\documentclass[a4paper,11pt,DIV=12,parskip=half-]{scrartcl}
\pdfoutput=1
\hyphenpenalty=10000

1 change: 1 addition & 0 deletions docs/_themes/mysphinx.sty
@@ -1603,6 +1603,7 @@
\providecommand\sphinxtableofcontents{\tableofcontents}
\newenvironment{sphinxtheindex}{\begin{theindex}}{\end{theindex}}
\providecommand{\sphinxsymbolsname}{}
\providecommand{\capstart}{}
% index colors
\protected\def\toreplace#1{#1}
1 change: 1 addition & 0 deletions docs/conf.py
@@ -64,6 +64,7 @@
"sphinx.ext.ifconfig",
"sphinx.ext.viewcode",
"sphinx.ext.githubpages",
"sphinx.ext.graphviz",
"docs._themes.conf_bibtex",
"sphinxcontrib.bibtex",
"docs._themes.latex_writer",
192 changes: 191 additions & 1 deletion docs/use_as_package.rst
@@ -1,4 +1,194 @@
Usage: as a Python package
==========================

To be written...
.. graphviz::
   :caption: Conceptual structure of data and classes.

   digraph class {
       rankdir="LR";
       {
           rank=same;
           File [shape=box];
           filedata [label="pandas.DataFrame"];
       };
       {
           rank=same;
           Table [shape=record; label="<1>Table|<2>Table|..."]
           FileInfo [shape=box]
           ColumnInfo [shape=none; label=<<table BORDER="0" CELLBORDER="1" CELLSPACING="0"><tr><td>ColumnInfo</td></tr><tr><td>ColumnInfo</td></tr><tr><td>...</td></tr></table>>];
       }
       {
           rank=same;
           tabledata [label="pandas.DataFrame"]
           ParameterInfo [shape=record; label="ParameterInfo|ParameterInfo|..."]
           ValueInfo [shape=record; label="ValueInfo|ValueInfo|..."]
           CrossSectionAttributes [shape=box]
       };
       File->Table [label=".tables (dict)"];
       File->filedata [label=" .raw_data"];
       File->FileInfo [label=".info"]
       Table:1->CrossSectionAttributes [label=".attributes"]
       Table:1->tabledata [label="._df"]
       FileInfo->ValueInfo [label="values (list)"];
       FileInfo->ParameterInfo [label="parameters (list)"];
       FileInfo->ColumnInfo:1 [label="columns (list)"];
   }

Grid-data file and Info file
----------------------------

The fundamental objects of this package are the `File` and `Table` classes, representing the files and the cross-section grid tables, respectively.
A `File` instance carries two files as paths: `!File.table_path` for the grid-data file and `!File.info_path` for the "info" file.
A grid-data file contains a table representing one or more cross sections.
The content of a grid-data file is read and parsed by `pandas.read_csv`, which can parse most table-format files [#dsv]_ with proper `!reader_options` specified in the "info" file.
The resulting `pandas.DataFrame` object is stored as-is in `!File.raw_table` for further interpretation.

.. [#dsv] Parsable formats include comma-separated values (CSV), tab-separated values (TSV), and space-separated values (SSV); in addition, fixed-width formatted tables are usually parsable.

An "info" file corresponds to a `FileInfo` instance and is provided in JSON format :cite:`json`.
It carries the data for `ColumnInfo`, `ParameterInfo`, and `ValueInfo` objects in addition to `!reader_options`.
These three types of information are used to interpret the `!File.raw_table` data.
The detailed specification of "info" files is described below.

One grid table has multiple columns, where the name and unit of each column are specified by `ColumnInfo`.
Some columns are "parameters" for cross sections, such as the masses of relevant particles, which are specified by `ParameterInfo`.
The other columns are "values", and `ValueInfo` is used to define them.
A `ValueInfo` uses one column as the central value and one or more columns as uncertainties, which can be relative or absolute and symmetric or asymmetric.
Multiple columns for an uncertainty are combined in quadrature, i.e., :math:`\sigma_1\oplus\sigma_2 := \sqrt{\sigma_1^2 + \sigma_2^2}`.
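
For illustration, the quadrature combination can be sketched in a few lines of plain Python; the numbers below are made up and do not come from any shipped table:

.. code-block:: python

   import math

   # two hypothetical absolute uncertainties (in pb) for a single grid point
   scale_unc = 0.021
   pdf_unc = 0.013

   # combined in quadrature, as done when several uncertainty columns are given
   combined = math.sqrt(scale_unc ** 2 + pdf_unc ** 2)
   print(f"{combined:.4f} pb")  # 0.0247 pb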

For each `ValueInfo`, the interpreter constructs one :class:`~pandas.DataFrame` object.
It is indexed by a :py:class:`~pandas.Index` or :py:class:`~pandas.MultiIndex` of the parameters and has three columns, ``value``, ``unc+``, and ``unc-``, which respectively contain the central value of the cross section, the combined positive absolute uncertainty, and the (absolute value of the) combined negative absolute uncertainty.
Each :class:`~pandas.DataFrame` is wrapped by the `Table` class and stored in `!File.tables` (a :typ:`dict`), keyed by the ``name`` of the corresponding value column.
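
The layout of such a table can be mimicked with plain pandas as follows; the parameter names and numbers are hypothetical and only illustrate the index and column structure:

.. code-block:: python

   import pandas

   # hypothetical two-parameter grid with the three standard columns
   index = pandas.MultiIndex.from_tuples([(500, 500), (500, 600)], names=["ms", "mgl"])
   df = pandas.DataFrame(
       {"value": [0.30, 0.21], "unc+": [0.04, 0.03], "unc-": [0.03, 0.02]},
       index=index,
   )
   print(df)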

This is an example of data handling:

.. code-block:: python

   from susy_cross_section import utility
   from susy_cross_section.table import File, Table

   grid_path, info_path = utility.get_paths("13TeV.n2x1+.wino")
   file = File(grid_path, info_path)
   xsec_table = file.tables["xsec"]

Here, the utility function `get_paths` is used to look up the paths for the key ``13TeV.n2x1+.wino``, and from those paths a `File` instance is constructed.
Then the table with the column name ``xsec`` is taken from the `!tables` dictionary.
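
Since `!File.tables` is an ordinary :typ:`dict`, the available value names can be inspected directly; continuing the example above (the output shown is illustrative):

.. code-block:: python

   # list the value columns provided by this file
   print(list(file.tables.keys()))  # e.g. ['xsec']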

Interpolation
-------------

The table interpolation is handled by the `susy_cross_section.interp` subpackage.
This subpackage first performs an axes transformation using the `axes_wrapper` module and then uses one of the interpolators defined in the `interpolator` module.
Detailed information is available in the API document of each module.

Cross-section data with one mass parameter are usually fitted well by a negative power of the mass, i.e., :math:`\sigma(m)\propto m^{-n}`.
For such cases, interpolating the data by piecewise lines in log-log axes works well, which is implemented as

.. code-block:: python

   from susy_cross_section.interp.interpolator import Scipy1dInterpolator

   xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(xsec_table)
   print(xs(500), xs.fp(500), xs.fm(500), xs.unc_p_at(500), xs.unc_m_at(500))

One can implement more complicated interpolators by extending `AbstractInterpolator`.
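
Conceptually, such a log-log linear interpolation is ordinary linear interpolation applied to :math:`\log m` and :math:`\log\sigma`.
The following standalone sketch reproduces the idea with plain SciPy and made-up grid points, independently of this package:

.. code-block:: python

   import numpy as np
   from scipy import interpolate

   # hypothetical one-parameter grid: masses in GeV, cross sections in pb
   masses = np.array([300.0, 400.0, 500.0, 700.0, 1000.0])
   xsecs = np.array([2.0e1, 4.5e0, 1.4e0, 2.2e-1, 2.6e-2])

   # piecewise-linear interpolation in log-log axes
   f = interpolate.interp1d(np.log(masses), np.log(xsecs), kind="linear")

   def sigma(m):
       """Return the interpolated cross section (pb) at mass m (GeV)."""
       return float(np.exp(f(np.log(m))))

   print(sigma(450.0))  # lies between the 400 and 500 GeV grid values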

A proposal for INFO file format
-------------------------------

An info file is a JSON file whose content is a single dict object; a schematic example is given at the end of this section.
The dict has six keys: ``document``, ``attributes`` (optional), ``columns``, ``reader_options`` (optional), ``parameters``, and ``values``.

``document`` as :typ:`dict(str, str)`:

This dictionary may contain any values and no specification is given, but the content should be used only for documentation purposes; i.e., programs should not change their behavior based on the content of ``document``.
Data on which programs do rely should be stored not in ``document`` but in ``attributes``.

Possible keys are: ``title``, ``authors``, ``calculator``, ``source``, and ``version``.

``attributes`` as :typ:`dict(str, str)`:

This dictionary contains *the default values* for `CrossSectionAttributes`, which is attached to each of the values.
These default values are overridden by the ``attributes`` defined in the respective entries of ``values``.

`CrossSectionAttributes` stores, contrary to ``document``, non-documental information, based on which programs may change their behavior.
Therefore the content must be neat and in machine-friendly formats.
The proposed keys are: ``processes``, ``collider``, ``ecm``, ``order``, and ``pdf_name``.
For details, see the API document of `CrossSectionAttributes`.

``columns`` as a list of :typ:`dict(str, str)`:

This is a list of dictionaries used to construct `ColumnInfo` objects; the :m:`n`-th element defines the :m:`n`-th column of the grid-data file.
The length of this list thus matches the number of columns.
Each dictionary must have two keys, ``name`` and ``unit``, which respectively specify the name and the unit of the column.
The names must be unique within one file.
For a dimensionless column, ``unit`` is an empty string.

``reader_options`` as :typ:`dict(str, Any)`:

This dictionary is passed directly to :func:`pandas.read_csv` as keyword arguments.

``parameters`` as a list of :typ:`dict(str, Any)`:

This list defines the parameters for indexing.
Each element is a dictionary with two keys, ``column`` and ``granularity``, and constructs a `ParameterInfo` object.
The value for ``column`` must match the ``name`` of one of the ``columns``.
The value for ``granularity`` is a number used to quantize the parameter grid; for details see the API document of `ParameterInfo`.

``values`` as a list of dictionary:

This list defines the cross-section values.
Each element is a dictionary and constructs a `ValueInfo` object.
The dictionary may have the keys ``column``, ``unc``, ``unc+``, ``unc-``, and ``attributes``.
``column`` is mandatory and its value must match the ``name`` of one of the ``columns``; that column is used as the central value of the cross section.
``attributes`` is optional and its value is a :typ:`dict(str, Any)`; it is used to construct a `CrossSectionAttributes` object, overriding the file-wide default values.

The other three keys are used to specify uncertainties.
``unc`` specifies a symmetric uncertainty, while the pair ``unc+`` and ``unc-`` specifies an asymmetric uncertainty; ``unc`` must not be present together with ``unc+`` or ``unc-``.
Each value of ``unc``, ``unc+``, and ``unc-`` is *a list of dictionaries*, :typ:`list(dict(str, str))`.
Each element of the list, a dictionary with two keys ``column`` and ``type``, describes one source of uncertainty.
The value for ``column`` must match the ``name`` of one of the ``columns``; that column is used as the source.
The value for ``type`` specifies the type of uncertainty; for details see the API document of `ValueInfo`.
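
To make the specification concrete, the following sketch assembles a hypothetical info file for a one-parameter grid and writes it out as JSON.
All names, units, and numbers here are invented for illustration; in particular, the ``type`` string is a placeholder, and the allowed values are documented in the `ValueInfo` API document:

.. code-block:: python

   import json

   # a hypothetical info file for a grid with the columns: mgl, xsec, unc
   info = {
       "document": {"title": "example grid", "authors": "..."},
       "attributes": {"collider": "pp", "ecm": "13TeV", "pdf_name": "..."},
       "columns": [
           {"name": "mgl", "unit": "GeV"},
           {"name": "xsec", "unit": "pb"},
           {"name": "unc", "unit": "pb"},
       ],
       "reader_options": {"sep": ";", "skiprows": 1},
       "parameters": [{"column": "mgl", "granularity": 1}],
       "values": [
           {
               "column": "xsec",
               # the "type" value below is a placeholder; see the ValueInfo API
               "unc": [{"column": "unc", "type": "absolute"}],
           }
       ],
   }

   with open("mydata/table_grid.info", "w") as f:
       json.dump(info, f, indent=2)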


How to use your own tables
--------------------------

Users may use this package to handle their own cross-section grid tables, once they provide an INFO file.
The procedure is summarized as follows.

1. Find proper `!reader_options` to read the table.

   This package uses :func:`pandas.read_csv` to read the grid table, for which proper options should be specified.
   The following script may be useful to find the proper options for your table.
   Possible keys for `!reader_options` are found in the API document of :func:`pandas.read_csv`.

   .. code-block:: python

      import pandas

      reader_options = {
          "sep": ";",
          "skiprows": 1,
      }
      grid_path = "mydata/table_grid.txt"
      data_frame = pandas.read_csv(grid_path, **reader_options)
      print(data_frame)

2. Write the INFO file.

   One should be careful especially about the "type" of uncertainties and the "unit" of each column.

3. Verify that the file is correctly read.

   The :ref:`show sub-command <cmd_show>` is useful for this purpose; for example,

   .. code-block:: console

      $ susy-xs show mydata/table_grid.txt mydata/table_grid.info

   After verifying with the show sub-command, users can use the :ref:`get sub-command <cmd_get>` or read the data in their own code as:

   .. code-block:: python

      from susy_cross_section.table import File

      my_grid = File("mydata/table_grid.txt", "mydata/table_grid.info")
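
As a follow-up, the custom table can then be interpolated in the same way as the bundled ones; the snippet below assumes that the INFO file defines a value column named ``xsec`` and a single mass parameter (replace them with your own names):

.. code-block:: python

   from susy_cross_section.interp.interpolator import Scipy1dInterpolator
   from susy_cross_section.table import File

   my_grid = File("mydata/table_grid.txt", "mydata/table_grid.info")
   xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(my_grid.tables["xsec"])
   print(xs(800))  # central value at a hypothetical parameter point of 800 GeV
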
118 changes: 117 additions & 1 deletion docs/use_as_script.rst
@@ -1,4 +1,120 @@
Usage: as a command-line script
===============================

To be written...
The package provides one terminal script, ``susy-xs``, which accepts the following flags and sub-commands.

- ``susy-xs --help`` gives a short help and a list of sub-commands,
- ``susy-xs --version`` returns the package version,
- ``susy-xs list`` displays a list of available table-grid data files,
- ``susy-xs show`` shows the information of a specified data file,
- ``susy-xs get`` obtains a cross section value from a table, with interpolation if necessary.

Details of these sub-commands are explained below; they are also available from the terminal via the ``--help`` flag, for example ``susy-xs get --help``.


.. _cmd_list:

list
----

.. code-block:: console

   $ susy-xs list (options) (substr substr ...)

This sub-command displays a list of available cross-section tables.
If `!substr` is specified, only the tables that include it in the table name or file paths are displayed.

By default, this command lists only the files with pre-defined table keys.
In addition to these commonly-used table grids, this package contains much more cross-section data.
These additional files can be found with the ``--all`` option.

With the ``--full`` option, the full paths to the files are displayed, which is useful for further operations, for example,

.. code-block:: console

   $ susy-xs list --all --full gg 7TeV CTEQ
   /Users/misho/ (...) /data/nllfast/7TeV/gg_nllnlo_cteq6.grid
   /Users/misho/ (...) /data/nllfast/7TeV/gg_nllnlo_hm_cteq6.grid
   $ susy-xs show /Users/misho/ (...) /data/nllfast/7TeV/gg_nllnlo_hm_cteq6.grid
   ------------------------------------------------------------------------
   TABLE "xsec_lo" (unit: pb)
   ------------------------------------------------------------------------
              value          unc+          unc-
   ms  mgl
   200 200     3.400000e+02  1.411437e+02  9.385184e+01
   ...

.. _cmd_show:

show
----

.. code-block:: console

   $ susy-xs show (options) table

This sub-command shows the data and information of the table specified by `!table`.
`!table` is either one of the pre-defined table keys, which can be displayed by the :ref:`list sub-command <cmd_list>`, or a path to a grid-data file.
The displayed information includes the grid tables in the file, the physical attributes associated with each table, and the documenting information associated with the file.

A grid-data file is read together with an associated "info" file, whose name is by default resolved by replacing the suffix of the data file with ``.info``.
One can override this default behavior with the ``--info`` option.

.. _cmd_get:

get
---

.. code-block:: console

   $ susy-xs get (options) table (args ...)

This sub-command gets a cross-section value from the table specified by `!table` and the ``--name`` option, where `!args` are used as the physical parameters.
Without `!args`, this sub-command displays the meaning of `!args` and of the ``--name`` option, for example:

.. code-block:: console

   $ susy-xs get 8TeV.gg
   Usage: get [OPTIONS] 8TeV.gg MS MGL
   Parameters: MS   [unit: GeV]
               MGL  [unit: GeV]
   Table-specific options: --name=xsec_lo   [unit: pb]
                           --name=xsec_nlo  [unit: pb]
                           --name=xsec      [unit: pb] (default)

In this case, users are asked to specify the squark mass (which is assumed to be degenerate in this grid) as the first `!args` and the gluino mass as the second `!args`, both in GeV.
It is also shown that the LO and NLO cross sections can be obtained with the ``--name`` option; otherwise the default ``xsec`` grid is used.
So, for example, the cross section :math:`\sigma_{8 \mathrm{TeV}}(pp\to\tilde g\tilde g)` with 1 TeV gluino and 1.2 TeV squark can be obtained by

.. code-block:: console

   $ susy-xs get 8TeV.gg 1200 1000
   (0.0126 +0.0023 -0.0023) pb

Here, the default ``xsec`` grid in the table file ``8TeV.gg`` is used.
One can check with the :ref:`show sub-command <cmd_show>` that this grid is calculated by the NLL-fast collaboration at NLO+NLL order using MSTW2008nlo68cl as the parton distribution functions (PDFs), and thus this 12.6 fb is the NLO+NLL cross section.

The value is calculated by interpolation if necessary.
This sub-command uses linear interpolation with all the parameters and values in logarithmic scale.
For example, the interpolating function for a one-parameter grid table is obtained as piecewise lines in a log-log plot.
To use other interpolation methods, users have to import this package into their own Python code, as explained in `Section 4`_.
For details, consult the API document of `susy_cross_section.interp`.

`!table` is either one of the pre-defined table keys, which can be displayed by the :ref:`list sub-command <cmd_list>`, or a path to a grid-data file.
A grid-data file is read together with an associated "info" file, whose name is by default resolved by replacing the suffix of the data file with ``.info``.
One can override this default behavior with the ``--info`` option.

Additionally, several options are provided to control the output format; they are described in the ``--help`` output.

.. caution::

   In principle, one can get cross sections for various model points by repeating this sub-command.
   However, this is not recommended because the sub-command constructs an interpolating function on every invocation.
   For such use cases, users should instead import this package in their own Python code, as explained in `Section 4`_; a minimal sketch of the recommended pattern follows.
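
The sketch below constructs the interpolating function once and then evaluates it for several parameter points; the table key and masses are illustrative:

.. code-block:: python

   from susy_cross_section import utility
   from susy_cross_section.interp.interpolator import Scipy1dInterpolator
   from susy_cross_section.table import File

   grid_path, info_path = utility.get_paths("13TeV.n2x1+.wino")
   xs = Scipy1dInterpolator(axes="loglog", kind="linear").interpolate(
       File(grid_path, info_path).tables["xsec"]
   )

   # the interpolator construction above is done only once; evaluations are cheap
   for mass in [300, 400, 500, 600]:
       print(mass, xs(mass))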

.. _Section 4:
use_as_package
2 changes: 1 addition & 1 deletion susy_cross_section/base/table.py
@@ -143,7 +143,7 @@ def _read_csv(self, path):
# type: (pathlib.Path)->pandas.DataFrame
"""Read a csv file and return the content.
Internally, call :meth:`pandas.read_csv()` with `!reader_options`.
Internally, call `pandas.read_csv` with `!reader_options`.
"""
reader_options = {
"skiprows": [0],
2 changes: 1 addition & 1 deletion susy_cross_section/interp/__init__.py
@@ -32,7 +32,7 @@
satisfying :math:`f({\boldsymbol x}_n)=y_n`.
Caution
Warning
-------
One should distinguish an interpolation :math:`f` from fitting functions. An
