Add FAQ guidance Re: writing injected outputs
leeskelvin committed Apr 26, 2024
1 parent d26fd9d commit 62856f3
Showing 1 changed file with 61 additions and 0 deletions: doc/lsst.source.injection/faqs.rst

Please also note that it is not currently possible to perform source injection on the command line without first ingesting an input catalog into the data butler.
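
A catalog can be ingested ahead of time from within Python using the ``ingest_injection_catalog`` function. The minimal sketch below assumes a writeable butler repository ``REPO`` and an in-memory Astropy table ``my_injection_catalog``:

.. code-block:: python

    import os

    from lsst.daf.butler import Butler
    from lsst.source.injection import ingest_injection_catalog

    user = os.getenv("USER")

    # A writeable butler is required to ingest datasets into the repository.
    writeable_butler = Butler(REPO, writeable=True)

    # Ingest the input catalog into a dedicated input collection.
    my_injected_datasetRefs = ingest_injection_catalog(
        writeable_butler=writeable_butler,
        table=my_injection_catalog,
        band="i",
        output_collection=f"u/{user}/my_injection_inputs",
    )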

.. _lsst.source.injection-faqs-write-into-butler:

Is it possible to write injected exposures back into a data butler?
===================================================================

When working in a command line environment, running ``pipetask run`` automatically writes injected outputs back into the data butler.
However, when working in a Python/Jupyter notebook environment, running injection tasks directly will not write any outputs back into the data butler.
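
For example, the coadd-level injection task may be called directly, in which case its outputs are only returned in memory. A minimal sketch, assuming an ``input_exposure`` and an ``injection_catalog`` have already been retrieved from the butler:

.. code-block:: python

    from lsst.source.injection import CoaddInjectConfig, CoaddInjectTask

    # Instantiate the coadd-level injection task.
    inject_config = CoaddInjectConfig()
    inject_task = CoaddInjectTask(config=inject_config)

    # Outputs are returned in memory only; nothing is written to the butler.
    injected_output = inject_task.run(
        injection_catalogs=[injection_catalog],
        input_exposure=input_exposure.clone(),
        psf=input_exposure.getPsf(),
        photo_calib=input_exposure.getPhotoCalib(),
        wcs=input_exposure.getWcs(),
    )
    injected_exposure = injected_output.output_exposure
    injected_catalog = injected_output.output_catalog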

Whilst it is *possible* to write bespoke injected outputs into the data butler, e.g., to facilitate onward processing downstream, this is not normally recommended.
Rather, injecting sources into data within a notebook environment is intended for quick-look analyses only.
Datasets ingested into the butler must satisfy strict format requirements; without these, it would be impossible to guarantee the integrity of a given pipeline.

If you do wish to perform source injection inside a Python/Jupyter notebook environment and store the outputs in the data butler for subsequent processing, it is strongly recommended to make use of the `SimplePipelineExecutor`.
The `SimplePipelineExecutor` is a lightweight high-level executor for running pipelines in Python.
The example below demonstrates how to use the `SimplePipelineExecutor` to run a source injection task from a previously constructed pipeline (see :ref:`lsst.source.injection-ref-make`) and write the outputs back into the data butler:

.. code-block:: python

    import os

    from lsst.ctrl.mpexec import SimplePipelineExecutor
    from lsst.pipe.base import Pipeline

    user = os.getenv("USER")

    # Create a Butler instance with collections appropriate for processing.
    butler = SimplePipelineExecutor.prep_butler(
        REPO,
        inputs=["HSC/runs/RC2/w_2024_10/DM-43178", "injection/defaults"],
        output=f"u/{user}/my_injected_outputs",
    )

    # Load the pipeline from a YAML file.
    pipeline = Pipeline.fromFile(
        os.path.join("PATH", "TO", "DRP-RC2+injection.yaml#inject_coadd")
    )

    # Optionally configure a task.
    pipeline.addConfigOverride(
        label="inject_coadd",
        key="process_all_data_ids",
        value=True,
    )

    # Check that the config has been applied.
    inject_coadd_task = pipeline.to_graph().tasks.get("inject_coadd")
    print(inject_coadd_task.config.process_all_data_ids)  # returns: True

    # Create an executor; build a QuantumGraph from an in-memory pipeline.
    executor = SimplePipelineExecutor.from_pipeline(
        pipeline=pipeline,
        where="instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=42 AND band='i'",
        butler=butler,
    )

    # Run all quanta in the QuantumGraph.
    quanta = executor.run(register_dataset_types=True)
    print(f"number of quanta executed: {len(quanta)}")

    # Get the run outputs.
    dataset_refs = {dtype.name: ref[0] for dtype, ref in quanta[0].outputs.items()}
    print(f"available dataset types: {list(dataset_refs.keys())}")
    injected_deepCoadd = butler.get(dataset_refs["injected_deepCoadd"])
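
Because the run outputs are now persisted in the butler, they can also be retrieved in a later session directly from the output collection. A minimal sketch, assuming the repository and collection names used above:

.. code-block:: python

    from lsst.daf.butler import Butler

    # Instantiate a fresh butler pointing at the output collection.
    butler = Butler(REPO, collections=f"u/{user}/my_injected_outputs")

    # Retrieve the injected coadd by dataset type and data ID.
    injected_deepCoadd = butler.get(
        "injected_deepCoadd",
        dataId={"skymap": "hsc_rings_v1", "tract": 9813, "patch": 42, "band": "i"},
    )
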
.. _lsst.source.injection-faqs-restrictions-guidelines-conventions:

Are there restrictions or guidelines for ingesting source injection images and/or catalogs into the butler?
============================================================================================================