
Commit

Reformat docs
yalsayyad committed Nov 18, 2019
1 parent 1d992fa commit 3d69605
Showing 2 changed files with 61 additions and 57 deletions.
88 changes: 46 additions & 42 deletions python/lsst/pipe/tasks/functors.py
@@ -23,13 +23,13 @@ def init_fromDict(initDict, basePath='lsst.pipe.tasks.functors', typeKey='functo
----------
initDict : dictionary
Dictionary describing object's initialization. Must contain
an entry keyed by `typeKey` that is the name of the object,
relative to `basePath`.
an entry keyed by ``typeKey`` that is the name of the object,
relative to ``basePath``.
basePath : str
Path relative to module in which ``initDict[typeKey]`` is defined.
typeKey : str
Key of `initDict` that is the name of the object
(relative to ``basePath``).
Key of ``initDict`` that is the name of the object
(relative to `basePath`).
"""
initDict = initDict.copy()
# TO DO: DM-21956 We should be able to define functors outside this module
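For illustration (not part of the diff), a minimal sketch of how `init_fromDict` might be called, assuming the remaining dictionary entries are forwarded as keyword arguments to the named functor class; the column and filter names here are hypothetical:

from lsst.pipe.tasks.functors import init_fromDict

# 'functor' is the default typeKey; the other keys are assumed to be
# passed through as keyword arguments to the Mag functor.
initDict = {'functor': 'Mag', 'col': 'modelfit_CModel', 'filt': 'HSC-G'}
magFunctor = init_fromDict(initDict)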
@@ -44,15 +44,17 @@ def init_fromDict(initDict, basePath='lsst.pipe.tasks.functors', typeKey='functo


class Functor(object):
"""Define and execute a calculation on a deepCoadd_obj ParquetTable
"""Define and execute a calculation on a ParquetTable
The `__call__` method accepts a `ParquetTable` object, and returns the result
of the calculation as a single column. Each functor defines what columns are needed
for the calculation, and only these columns are read from the ``ParquetTable``.
The `__call__` method accepts a `ParquetTable` object, and returns the
result of the calculation as a single column. Each functor defines what
columns are needed for the calculation, and only these columns are read
from the `ParquetTable`.
The action of `__call__` consists of two steps: first, loading the necessary
columns from disk into memory as a ``pandas.DataFrame` object; and second, performing
the computation on this dataframe and returning the result.
The action of `__call__` consists of two steps: first, loading the
necessary columns from disk into memory as a `pandas.DataFrame` object;
and second, performing the computation on this dataframe and returning the
result.
To define a new `Functor`, a subclass must define a `_func` method,
@@ -63,40 +65,39 @@ class Functor(object):
* `name`: A name appropriate for a figure axis label
* `shortname`: A name appropriate for use as a dictionary key
On initialization, a `Functor` should declare what filter (`filt` kwarg) and dataset
(e.g. `'ref'`, `'meas'`, `'forced_src'`) it is intended to be applied to.
This enables the `_get_cols` method to extract the proper columns from the parquet file.
If not specified, the dataset will fall back on the `_defaultDataset` attribute.
If filter is not specified and `dataset` is anything other than `'ref'`, then an error
will be raised when trying to perform the calculation.
As currently implemented, `Functor` is only set up to expect a `ParquetTable`
of the format of the `deepCoadd_obj` dataset; that is, a `MultilevelParquetTable`
with the levels of the column index being `filter`, `dataset`, and `column`.
This is defined in the `_columnLevels` attribute, as well as being implicit in
the role of the `filt` and `dataset` attributes defined at initialization.
In addition, the `_get_cols` method that
reads the dataframe from the `ParquetTable` will return a dataframe with column
index levels defined by the `_dfLevels` attribute; by default, this is `column`.
On initialization, a `Functor` should declare what filter (`filt` kwarg)
and dataset (e.g. `'ref'`, `'meas'`, `'forced_src'`) it is intended to be
applied to. This enables the `_get_cols` method to extract the proper
columns from the parquet file. If not specified, the dataset will fall back
on the `_defaultDataset` attribute. If filter is not specified and `dataset`
is anything other than `'ref'`, then an error will be raised when trying to
perform the calculation.
As currently implemented, `Functor` is only set up to expect a
`ParquetTable` of the format of the `deepCoadd_obj` dataset; that is, a
`MultilevelParquetTable` with the levels of the column index being `filter`,
`dataset`, and `column`. This is defined in the `_columnLevels` attribute,
as well as being implicit in the role of the `filt` and `dataset` attributes
defined at initialization. In addition, the `_get_cols` method that reads
the dataframe from the `ParquetTable` will return a dataframe with column
index levels defined by the `_dfLevels` attribute; by default, this is
`column`.
The `_columnLevels` and `_dfLevels` attributes should generally not need to
be changed, unless `_func` needs columns from multiple filters or datasets
to do the calculation.
An example of this is the ``lsst.pipe.tasks.functors.Color` functor, for which
`_dfLevels = ('filter', 'column')`, and `_func` expects the dataframe it gets to
have those levels in the column index.
While not currently implemented, it would be
relatively straightforward to generalize the base `Functor` class to be able to
accept arbitrary `ParquetTable` formats (other than that of `deepCoadd_obj`).
An example of this is the `lsst.pipe.tasks.functors.Color` functor, for
which `_dfLevels = ('filter', 'column')`, and `_func` expects the dataframe
it gets to have those levels in the column index.
Parameters
----------
filt : str
Filter upon which to do the calculation
dataset : str
Dataset upon which to do the calculation (e.g., 'ref', 'meas', 'forced_src').
Dataset upon which to do the calculation
(e.g., 'ref', 'meas', 'forced_src').
"""

@@ -203,12 +204,13 @@ def shortname(self):
class CompositeFunctor(Functor):
"""Perform multiple calculations at once on a catalog
The role of a `CompositeFunctor` is to group together computations from multiple
functors. Instead of returning ``pandas.Series`` a `CompositeFunctor` returns
a ``pandas.Dataframe``, with the column names being the keys of `funcDict`.
The role of a `CompositeFunctor` is to group together computations from
multiple functors. Instead of returning `pandas.Series` a
`CompositeFunctor` returns a `pandas.Dataframe`, with the column names
being the keys of `funcDict`.
The `columns` attribute of a `CompositeFunctor` is the union of all columns in all
the component functors.
The `columns` attribute of a `CompositeFunctor` is the union of all columns
in all the component functors.
A `CompositeFunctor` does not use a `_func` method itself; rather,
when a `CompositeFunctor` is called, all its columns are loaded
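A sketch of typical usage (illustrative; the file path and column names are hypothetical):

from lsst.pipe.tasks.functors import CompositeFunctor, Mag
from lsst.pipe.tasks.parquetTable import MultilevelParquetTable

parq = MultilevelParquetTable('deepCoadd_obj.parq')   # hypothetical path
funcDict = {'gmag': Mag('modelfit_CModel', filt='HSC-G'),
            'rmag': Mag('modelfit_CModel', filt='HSC-R')}
funcs = CompositeFunctor(funcDict)
df = funcs(parq)   # pandas.DataFrame with columns 'gmag' and 'rmag'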
@@ -513,7 +515,8 @@ class Mag(Functor):
col : `str`
Name of flux column from which to compute magnitude. Can be parseable
by `lsst.pipe.tasks.functors.fluxName` function---that is, you can pass
`'modelfit_CModel'` instead of `'modelfit_CModel_instFlux'`) and it will understand.
`'modelfit_CModel'` instead of `'modelfit_CModel_instFlux'`) and it will
understand.
calib : `lsst.afw.image.calib.Calib` (optional)
Object that knows zero point.
"""
@@ -639,7 +642,7 @@ class Color(Functor):
----------
col : str
Name of flux column from which to compute; same as would be passed to
``lsst.pipe.tasks.functors.Mag``.
`lsst.pipe.tasks.functors.Mag`.
filt2, filt1 : str
Filters from which to compute magnitude difference.
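A sketch of constructing a color functor (illustrative; the sense of the difference, filt2 minus filt1, is an assumption here):

from lsst.pipe.tasks.functors import Color

# Hypothetical g - r color from CModel fluxes.
gr = Color('modelfit_CModel', 'HSC-G', 'HSC-R')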
Expand Down Expand Up @@ -913,7 +916,8 @@ def getFilterAliasName(row):


class Photometry(Functor):
AB_FLUX_SCALE = (0 * u.ABmag).to_value(u.nJy) # AB to NanoJansky (3631 Jansky)
# AB to NanoJansky (3631 Jansky)
AB_FLUX_SCALE = (0 * u.ABmag).to_value(u.nJy)
LOG_AB_FLUX_SCALE = 12.56
FIVE_OVER_2LOG10 = 1.085736204758129569
# TO DO: DM-21955 Replace hard coded photometic calibration values
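For reference, a short check of the constant above (assuming astropy's AB magnitude unit): 0 mag AB corresponds to roughly 3631 Jy, i.e. about 3.631e12 nJy, whose base-10 logarithm is about 12.56.

import astropy.units as u
import numpy as np

ab_flux_scale = (0 * u.ABmag).to_value(u.nJy)
print(ab_flux_scale)              # ~3.631e12 nJy (3631 Jy)
print(np.log10(ab_flux_scale))    # ~12.56, matching LOG_AB_FLUX_SCALE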
30 changes: 15 additions & 15 deletions python/lsst/pipe/tasks/parquetTable.py
@@ -151,20 +151,22 @@ class MultilevelParquetTable(ParquetTable):
because there is not a convenient way to request specific table subsets
by level via Parquet through pyarrow, as there is with a `pandas.DataFrame`.
Additionally, pyarrow stores multilevel index information in a very strange way.
Pandas stores it as a tuple, so that one can access a single column from a pandas
dataframe as `df[('ref', 'HSC-G', 'coord_ra')]`. However, for some reason
pyarrow saves these indices as "stringified" tuples, such that in order to read this
same column from a table written to Parquet, you would have to do the following:
Additionally, pyarrow stores multilevel index information in a very strange
way. Pandas stores it as a tuple, so that one can access a single column
from a pandas dataframe as `df[('ref', 'HSC-G', 'coord_ra')]`. However, for
some reason pyarrow saves these indices as "stringified" tuples, such that
in order to read this same column from a table written to Parquet, you would
have to do the following:
pf = pyarrow.ParquetFile(filename)
df = pf.read(columns=["('ref', 'HSC-G', 'coord_ra')"])
See also https://github.com/apache/arrow/issues/1771, where I've raised this issue.
I don't know if this is a bug or intentional, and it may be addressed in the future.
See also https://github.com/apache/arrow/issues/1771, where we've raised
this issue.
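A small sketch of the behavior described above (results may depend on the pandas/pyarrow versions installed):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

columns = pd.MultiIndex.from_tuples([('ref', 'HSC-G', 'coord_ra')],
                                    names=['dataset', 'filter', 'column'])
df = pd.DataFrame([[1.0]], columns=columns)

# Writing through pyarrow stringifies the tuple column names...
pq.write_table(pa.Table.from_pandas(df), 'example.parq')

# ...so reading back a single column requires the stringified form.
pf = pq.ParquetFile('example.parq')
table = pf.read(columns=["('ref', 'HSC-G', 'coord_ra')"])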
As multilevel-indexed dataframes can be very useful to store data like multiple filters'
worth of data in the same table, this case deserves a wrapper to enable easier access;
As multilevel-indexed dataframes can be very useful to store data like
multiple filters' worth of data in the same table, this case deserves a
wrapper to enable easier access;
that's what this object is for. For example,
parq = MultilevelParquetTable(filename)
@@ -175,8 +177,8 @@ class MultilevelParquetTable(ParquetTable):
will return just the coordinate columns; the equivalent of calling
`df['meas']['HSC-G'][['coord_ra', 'coord_dec']]` on the total dataframe,
but without having to load the whole frame into memory---this reads just those
columns from disk. You can also request a sub-table; e.g.,
but without having to load the whole frame into memory---this reads just
those columns from disk. You can also request a sub-table; e.g.,
parq = MultilevelParquetTable(filename)
columnDict = {'dataset':'meas',
@@ -185,14 +187,12 @@ class MultilevelParquetTable(ParquetTable):
and this will be the equivalent of `df['meas']['HSC-G']` on the total dataframe.
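Putting the pieces above together, a consolidated sketch of the sub-table read (the filename is hypothetical):

from lsst.pipe.tasks.parquetTable import MultilevelParquetTable

parq = MultilevelParquetTable('deepCoadd_obj.parq')   # hypothetical path
columnDict = {'dataset': 'meas',
              'filter': 'HSC-G',
              'column': ['coord_ra', 'coord_dec']}
df = parq.toDataFrame(columns=columnDict)   # equivalent of df['meas']['HSC-G'][['coord_ra', 'coord_dec']]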
Parameters
----------
filename : str
filename : str, optional
Path to Parquet file.
dataFrame : dataFrame, optional
"""

def __init__(self, *args, **kwargs):
super(MultilevelParquetTable, self).__init__(*args, **kwargs)

