# Analysis Tools Examples

Introduction to generating science performance diagnostic plots and metrics with the `analysis_tools` package using a small test dataset from HSC.

Contact author: TBD

Last verified to run: 2022-07-28

LSST Science Pipelines version: Weekly 2022_31

Container size: medium

**Targeted learning level:** intermediate

**Description:** An introduction to generating science performance diagnostic plots and metrics using the `analysis_tools` package.

**Skills:** 
* Generate a science performance diagnostic plot and corresponding metric values interactively in a notebook and as part of a pipeline (simple pipeline executor). 
* Adjust the configuration used to produce these diagnostics. 
* Retrieve persisted plots and metrics with the Butler. 
* Reconstitute input data products that were used to create plots and metrics for further investigation.

## Preliminaries

In [None]:
# Basic imports
import numpy as np

import matplotlib
import matplotlib.pyplot as plt

# Interactive plots in notebook
%matplotlib widget

### Getting set up at USDF

* Point to documentation

### Processing rc2_subset

* Point to documentation about the rc2_dataset and processing instructions
* Might need to point to a shell script

### Setting up the analysis_tools package

Check the version of the stack you are using:

In [None]:
!eups list -s | grep lsst_distrib

The `analysis_tools` package was added to `lsst_distrib` in August 2022, and accordingly, if you have set up the LSST Stack version `w_2022_32` or later, then you should be able to import `analysis_tools` directly in the notebook.

In [None]:
import lsst.analysis.tools
print(lsst.analysis.tools.__file__)

If you are doing development on the `analysis_tools` package and want to test in a notebook, follow the guidance [here](https://nb.lsst.io/science-pipelines/development-tutorial.html). Brief version below (for work on the RSP at USDF):

1. In the terminal, clone the [analysis_tools](https://github.com/lsst/analysis_tools) repo and set up the package:

```
source /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib

# Choose file location for your repo
cd ~/repos/
git clone https://github.com/lsst/analysis_tools.git
cd analysis_tools
setup -k -r .
scons
```

2. Add the following line to `~/notebooks/.user_setups`

```
setup -k -r ~/repos/analysis_tools
```

Your local version of `analysis_tools` should now be accessible in a notebook.

## Generating consistent metric values and visualizations

### Load data for testing

In [None]:
from lsst.daf.butler import Butler

repo = "/project/sandbox/bechtol/rc2_subset/SMALL_HSC"
collection = "u/bechtol/step3"
#repo = "/project/sandbox/jcarlin/repos/rc2_subset/SMALL_HSC"
#collection = "u/jcarlin/step4"
butler = Butler(repo, collections=[collection])
registry = butler.registry

In [None]:
# List the name of every dataset type registered in the repo
for d in sorted(registry.queryDatasetTypes()):
    print(d.name)

In [None]:
#sorted(registry.queryDatasets("objectTable"))
sorted(registry.queryDatasets("objectTable_tract"))

In [None]:
dataId = {"tract": 9813, "instrument": "hsc"}
objectTable = butler.get("objectTable_tract", dataId=dataId)
objectTable

In [None]:
objectTable.columns.values

### Generate a metric

* Instantiate a butler, load some data
* Pass loaded data to an analysis_tool to generate a metric
* Change some configuration and generate the metric again

In [None]:
from lsst.analysis.tools.analysisMetrics import ShapeSizeFractionalMetric
from lsst.analysis.tools.tasks.base import _StandinPlotInfo

In [None]:
metric = ShapeSizeFractionalMetric()

In [None]:
results = metric(objectTable, band='i')

In [None]:
results

### Generate a plot

* Use same data from example above
* Pass data to an analysis_tool to generate a plot and visualize in notebook
* Confirm that displayed values are consistent

In [None]:
from lsst.analysis.tools.analysisPlots import ShapeSizeFractionalDiffScatterPlot

In [None]:
plot = ShapeSizeFractionalDiffScatterPlot()
# set some configs, we will go into this later
plot.produce.addSummaryPlot = False

In [None]:
# Known issue: this call currently raises an error that has not yet been investigated.
# The skymap and plotInfo keyword arguments should not be required in future versions.
results = plot(objectTable, band='i', skymap=None, plotInfo=_StandinPlotInfo())

## How it works 

### Terminology
* ConfigurableAction - a generic interface for function-like objects (actions) whose state can be set during configuration
* AnalysisAction - a ConfigurableAction subclass specialized for actions used in analysis contexts
* AnalysisTool - a top-level "container" of multiple AnalysisActions that performs one type of analysis
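
To make the first term concrete, here is a minimal sketch of the "configure, then call" pattern in plain Python. `ToyScaleAction`, its fields, and the `table` dictionary are hypothetical stand-ins invented for illustration; this is not the `analysis_tools` API, only the general idea of a callable object whose state is set before it is applied to data.

```python
from dataclasses import dataclass


@dataclass
class ToyScaleAction:
    """Hypothetical stand-in for a configurable action (not the lsst API)."""

    # Configuration state: set before the action is called on data.
    vectorKey: str = "{band}_flux"
    factor: float = 1.0

    def __call__(self, data, **kwargs):
        # Resolve the templated column name, then apply the configured scaling.
        key = self.vectorKey.format(**kwargs)
        return [self.factor * value for value in data[key]]


# Made-up input table with one column per band.
table = {"i_flux": [1.0, 2.0, 4.0], "r_flux": [3.0, 6.0, 9.0]}

action = ToyScaleAction()
action.factor = 10.0  # "configure" the action
scaled_i = action(table, band="i")  # then use it like a function
scaled_r = action(table, band="r")
```

The same configured object is reused for both bands; only the call-time keyword changes.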

Below we dive into the latter two in more detail.

### Using AnalysisActions

* These are the atomic bits of analysis_tools; they can be combined to make more complex actions, or used as part of an AnalysisTool
* Show some examples of using configurable actions like standalone functions. This is intended to give users more intuition about how configurable actions work.
* Examples with KeyedDataActions, VectorActions (including selectors), and ScalarActions
* Show examples of configuration

In [None]:
from lsst.analysis.tools.actions.vector import CalcShapeSize, MagColumnNanoJansky

In [None]:
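# An action can be instantiated and called in one step, like a function
size = CalcShapeSize()(objectTable, band='i')
# ...or instantiated first and called later
mag = MagColumnNanoJansky()

# The column key can be templated on the band, which is supplied at call time
mag = MagColumnNanoJansky(vectorKey='{band}_psfFlux')(objectTable, band='i')
# Alternatively, give the full column name, in which case no band is needed
mag = MagColumnNanoJansky(vectorKey='i_psfFlux')(objectTable)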
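
For intuition about what a flux-to-magnitude action of this kind computes, the sketch below applies the standard AB magnitude relation for fluxes in nanojanskys, m = -2.5 log10(f) + 31.4. It is an assumption that `MagColumnNanoJansky` uses exactly this relation; `njy_to_ab_mag` is a hypothetical helper written for this example, and the treatment of non-positive fluxes is a choice made here, not taken from the package.

```python
import math


def njy_to_ab_mag(flux_njy):
    """AB magnitude for a flux in nanojanskys (hypothetical helper)."""
    if flux_njy <= 0:
        # A logarithm is undefined here; returning NaN is this sketch's choice.
        return float("nan")
    return -2.5 * math.log10(flux_njy) + 31.4


# A templated key resolves to a concrete column name once the band is known.
column = "{band}_psfFlux".format(band="i")

# Made-up fluxes: 1 nJy and 10^4 nJy.
faint = njy_to_ab_mag(1.0)
brighter = njy_to_ab_mag(1.0e4)
```

A factor of 10^4 in flux corresponds to a difference of 10 magnitudes, which is why the axis limits used for the scatter plots span only a few units.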

In [None]:
plt.figure()
plt.scatter(mag, size, s=1)
plt.xlim(17.5, 30.)
plt.ylim(0, 5)

In [None]:
from lsst.analysis.tools.actions.vector import StarSelector, DownselectVector

In [None]:
star_selection = StarSelector()(objectTable, band='i')

In [None]:
assert len(star_selection) == len(mag)

In [None]:
plt.figure()
# Plot size against magnitude for the selected stars only
plt.scatter(mag[star_selection], np.asarray(size)[star_selection], s=1)
plt.xlim(17.5, 30.)
plt.ylim(0, 5)

In [None]:
# Compose a more advanced example
#DownselectVector(vectorKey='{band}_psfFlux', selector=StarSelector())(objectTable, band='i')
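
The selector cells above rely on a selector returning one boolean per row, which is what the length check confirms. A minimal plain-Python sketch of that idea, and of composing a selector with a downselection step, follows. `toy_star_selector`, `toy_downselect`, the extendedness-style flag, and the 0.5 threshold are all hypothetical; the real `StarSelector` and `DownselectVector` may be implemented differently.

```python
def toy_star_selector(extendedness, threshold=0.5):
    """Return one boolean per row: True where the source looks point-like."""
    return [value < threshold for value in extendedness]


def toy_downselect(vector, mask):
    """Keep only the entries of `vector` whose mask entry is True."""
    if len(vector) != len(mask):
        raise ValueError("mask and vector must have the same length")
    return [value for value, keep in zip(vector, mask) if keep]


# Made-up rows: magnitudes and an extendedness-like flag (0 = star, 1 = galaxy).
mags = [18.2, 21.5, 23.0, 19.7]
extendedness = [0.0, 1.0, 1.0, 0.0]

mask = toy_star_selector(extendedness)
star_mags = toy_downselect(mags, mask)
```

Keeping the selection (which rows) separate from the vector (which column) lets the same mask be reused across several columns.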

In [None]:
from lsst.analysis.tools.actions.keyedData import KeyedDataSelectorAction

### Three conceptual steps in an `AnalysisTool`: prep, process, produce

* Walk through the three stages of running an analysis tool in sequential lines of code, passing the output of one step as input to the next step
* Examine intermediate results
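
As a rough mental model of the three stages, the sketch below chains three plain functions, each consuming the previous one's output. It assumes that prep selects rows, process derives the quantity of interest, and produce reduces it to metric values. The function bodies, column names, and numbers are made up for illustration and are not the `analysis_tools` implementation.

```python
import statistics


def prep(data, sn_threshold):
    """Selection stage: keep only rows above a signal-to-noise cut."""
    keep = [sn > sn_threshold for sn in data["sn"]]
    return {
        key: [value for value, k in zip(values, keep) if k]
        for key, values in data.items()
    }


def process(prepped):
    """Derivation stage: fractional difference between source and PSF size."""
    frac = [(s - p) / p for s, p in zip(prepped["size"], prepped["psf_size"])]
    return {"fracDiff": frac}


def produce(processed):
    """Output stage: reduce the derived vector to scalar metric values."""
    values = processed["fracDiff"]
    return {"median": statistics.median(values), "count": len(values)}


# Made-up catalog with four sources.
catalog = {
    "sn": [50, 500, 800, 1000],
    "size": [2.2, 2.1, 1.9, 2.0],
    "psf_size": [2.0, 2.0, 2.0, 2.0],
}

prepped = prep(catalog, sn_threshold=400)  # intermediate result 1
processed = process(prepped)               # intermediate result 2
metric = produce(processed)                # final metric values
```

Because each stage returns an ordinary dictionary, `prepped` and `processed` can be inspected directly before moving on to the next step.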

## Workflow examples

### Running analysis_tools as part of a pipeline

* **All examples in this notebook should use the simple pipeline executor** (here is how you do it in a notebook)
* We have a task for each data product. A pipeline can run multiple AnalysisTools that each produce a set of plots or set of metrics
* Discuss an example yaml pipeline file (load the yaml)
* Provide the command to run the pipeline
* Show how to configure the pipeline, e.g., turning on or off different metrics and plots or changing other parameters

## Pipeline
```
description: |
  Tier1 plots and metrics to assess coadd quality
tasks:
  analyzeObjectTableCore:
    class: lsst.analysis.tools.tasks.ObjectTableTractAnalysisTask
    config:
      connections.outputName: objectTableCore
      plots.shapeSizeFractionalDiffScatter: ShapeSizeFractionalDiffScatterPlot
      metrics.shapeSizeFractionalMetric: ShapeSizeFractionalMetric
      plots.e1DiffScatter: E1DiffScatterPlot
      metrics.e1DiffScatterMetric: E1DiffMetric
      plots.e2DiffScatter: E2DiffScatterPlot
      metrics.e2DiffScatterMetric: E2DiffMetric
      metrics.skyFluxStatisticMetric: SkyFluxStatisticMetric
      metrics.skyFluxStatisticMetric.applyContext: CoaddContext
      python: |
        from lsst.analysis.tools.analysisPlots import *
        from lsst.analysis.tools.analysisMetrics import *
        from lsst.analysis.tools.contexts import *
  catalogMatchTract:
    class: lsst.analysis.tools.tasks.catalogMatch.CatalogMatchTask
    config:
      bands: ['u', 'g', 'r', 'i', 'z', 'y']
  refCatObjectTract:
    class: lsst.analysis.tools.tasks.refCatObjectAnalysis.RefCatObjectAnalysisTask
    config:
      bands: ['u', 'g', 'r', 'i', 'z', 'y']
```

In [None]:
from lsst.ctrl.mpexec import SimplePipelineExecutor
from lsst.pipe.base import Pipeline

# set up an output collection with your username
outputCollection = "u/nate2/analysisToolsExample"

# this can be skipped if you already have a read-write butler set up
butlerRW = SimplePipelineExecutor.prep_butler(repo, inputs=[collection], output=outputCollection)

# load in the pipeline to run
pipeline = Pipeline.from_uri("$ANALYSIS_TOOLS_DIR/pipelines/coaddQualityCore.yaml")

# override a configuration within a certain AnalysisTool
configKey = "plots.shapeSizeFractionalDiffScatter.prep.selectors.snSelector.threshold"
pipeline.addConfigOverride("analyzeObjectTableCore", configKey, 400)

# restrict processing to the same dataId used above
whereString = "tract=9813 AND instrument='hsc'"
executor = SimplePipelineExecutor.from_pipeline(pipeline, where=whereString, butler=butlerRW)
quanta = executor.run(True)
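
The override key used above is a dotted path into the task's nested configuration: tool, then stage, then selector, then field. The sketch below shows the general idea with a toy config built from `SimpleNamespace` objects. `set_by_dotted_key`, the toy config, and its default threshold of 200 are hypothetical; the real config classes validate fields and types in ways this does not.

```python
from types import SimpleNamespace


def set_by_dotted_key(config, dotted_key, value):
    """Walk nested attributes named by a dotted key and set the final one."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = getattr(node, name)  # AttributeError on a misspelled level
    if not hasattr(node, leaf):
        raise AttributeError(f"no config field named {leaf!r}")
    setattr(node, leaf, value)


# Hypothetical nested config; the default threshold of 200 is made up.
toy_config = SimpleNamespace(
    plots=SimpleNamespace(
        shapeSizeFractionalDiffScatter=SimpleNamespace(
            prep=SimpleNamespace(
                selectors=SimpleNamespace(snSelector=SimpleNamespace(threshold=200))
            )
        )
    )
)

key = "plots.shapeSizeFractionalDiffScatter.prep.selectors.snSelector.threshold"
set_by_dotted_key(toy_config, key, 400)
tool_config = toy_config.plots.shapeSizeFractionalDiffScatter
new_threshold = tool_config.prep.selectors.snSelector.threshold
```

In this toy version a misspelled path fails loudly instead of silently creating a new field.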

### Inspect the results

* Inspect metrics and plots persisted in butler, display results

### Reconstitute the inputs to an analysis_tool

* Provide a few examples, including an example with callback from photometric repeatability
* Inspect results

## Make a custom analysis tool

* Import python file in the same directory that inherits from analysis_tools to define a custom analysis tool
* Add to a custom pipeline and run