# Pipeline Context 

## Overview


The **Pipeline Context** holds all pipeline state. It consists of metadata describing the observing project and the individual execution blocks (EBs), objects describing the pipeline calibration state, the tree of `Results` class objects summarising the results of each Pipeline stage, and a series of other internal pipeline variables and objects.

During the initialization of a Pipeline Processing Session (PPS), a [Pipeline Context class](https://open-bitbucket.nrao.edu/projects/PIPE/repos/pipeline/browse/pipeline/infrastructure/launcher.py#25) instance is created. The project-level and EB-level metadata are scraped from on-disk files during the `importdata` stage, and the pipeline processing state is updated on a per-stage basis as each Pipeline processing stage completes.


## General Use Cases

Below we use a realistic, minimal live example of a `Pipeline Context` instance to illustrate its general internal content and use cases, with a code snippet near the end of this subsection. We use the variable name `ctx` to denote the example object.

In general, `context` serves the following main purposes:

* as the container/medium for metadata of the observing project, observing sessions, and individual execution blocks from the online system and archive (IDs, desired imaging goals): these are harvested upon data ingestion from on-disk MeasurementSets or ASDMs, and are mostly read-only ever since, e.g. `ctx.project_summary`.
* as in-memory caches for certain metadata, held as Python domain objects, e.g. [`ctx.observing_run`](#observingrun) (so we can avoid revisiting the CASA ms/msmd tables frequently).
* processing session state tracking / configuration:
    * `ctx.calimlist`, `ctx.callibrary`
    * the casalog, `ctx.products_dir`
    * the context name, `ctx.name`
    * recipes, processing job names
    * configuration and state of the pipeline processing job, workflow instructions: e.g., what has been done, which stage we are in
    * the current / last stage, `ctx.stage`
    * information left behind for cross-stage communication
* individual stage results, QA
    * results of individual pipeline data processing stages: statistics, QA evaluations, etc., in `ctx.results`
    * The content is a mixture of what we traditionally call "metadata" plus job state and results. Practically speaking, it is a single class instance, meshed underneath with various live objects (classes, functions, arrays), so it is not well defined.

* caching certain data properties from computationally intensive tasks to avoid repeated passes through the data, e.g. `ctx.per_spw_cont_sensitivities_all_chan`, `synthesized_beams`.

* As the use of inter-stage communication grows (i.e. stage B wants to know certain information from stage A), private attributes are also added on demand as a backdoor channel inside the context, e.g. `ctx.selfcal_targets`, or `ctx.vla_skip_mfs_and_cube_imaging` for late cross-stage decision making. In some instances, large arrays (e.g. statistics or even images) are attached to the context, which also causes trouble (e.g. ~GiB `np.array`s end up under the Python object and get serialized to disk, see PIPE-1698). We do have a ticket to document the current state of the various uses and misuses of the context (PIPE-2160); maybe we will see some movement before this year's PL f2f meeting.
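As a rough illustration of the last point, the sketch below uses a plain `types.SimpleNamespace` as a hypothetical stand-in for the context (it is not the real Pipeline `Context` class). The stage functions and the `big_stats` attribute name are made up for this example, and a `bytearray` stands in for a large NumPy array so that only the standard library is needed. It shows how an attribute attached ad hoc by one stage becomes visible to a later stage, and how it inflates the pickled context.

```python
import pickle
from types import SimpleNamespace


def stage_a(ctx):
    """Hypothetical stage that leaves a large payload behind for later stages."""
    # 8 MB of zeros standing in for a large statistics array or image
    ctx.big_stats = bytearray(8_000_000)
    ctx.results.append("stage_a done")


def stage_b(ctx):
    """Hypothetical later stage that reads what stage A attached."""
    # getattr with a default guards against the attribute never being set
    payload = getattr(ctx, "big_stats", None)
    n_bytes = 0 if payload is None else len(payload)
    ctx.results.append(f"stage_b saw {n_bytes} bytes")


# Toy context: a name plus a list of per-stage results
ctx = SimpleNamespace(name="toy-session", results=[])

size_before = len(pickle.dumps(ctx))
stage_a(ctx)
stage_b(ctx)
size_after = len(pickle.dumps(ctx))

print(ctx.results)
print(f"pickled size grew by {size_after - size_before} bytes")
```

The growth in pickled size is at least the size of the attached payload, which is why attaching array-like data to a long-lived session object scales poorly.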

Loading and inspecting a saved context:

    import os
    import sys

    # Remember the launch directory before changing into the working directory.
    wd = os.environ.get('PWD', os.getcwd())

    # Make the local helper modules importable.
    for lib_ver in [os.path.join(wd, '../paris'),
                    '~/Workspace/sync/nrao/gitlab/plutils/examples/scripts/']:
        sys.path.insert(0, os.path.abspath(os.path.expanduser(lib_ver)))
    import pd_tools as pdt
    from rich import inspect as rinspect

    os.chdir('/home/rxue/Workspace/nvme/nrao/tickets/PIPE-1669/working')

    # Read the saved context from disk and display its attributes.
    ctx = pdt.read_context('pipeline-procedure_hifa_calimage.context')
    rinspect(ctx)

Output:

    2024-10-21 10:58:47 INFO: Reading context: pipeline-procedure_hifa_calimage.context
    2024-10-21 10:58:47 INFO: Tracking execution duration for context: pipeline-procedure_hifa_calimage
    2024-10-21 10:58:47 INFO: Setting plot level to 'default'
    2024-10-21 10:58:47 DEBUG: Found 1 MSes with type DataType.REGCAL_CONTLINE_SCIENCE


While some properties and attributes are native Python objects, we highlight some Pipeline class instances below.

### Pipeline IntervalCalLibrary (Link)

    rinspect(ctx.callibrary)
    rinspect(ctx.callibrary.active)

### Pipeline Calimlist (Link)

    # rinspect is rich.inspect, imported above as `from rich import inspect as rinspect`
    rinspect(ctx.calimlist, methods=True, private=False, all=True, dunder=False)
    print(ctx.calimlist)

Output:

    <pipeline.infrastructure.imagelibrary.ImageLibrary object at 0x7a1ce690e9e0>


### <a name="observingrun"></a> Pipeline ObservingRun / MeasurementSet


    # rinspect is rich.inspect, imported above as `from rich import inspect as rinspect`
    rinspect(ctx.observing_run)
    rinspect(ctx.observing_run.measurement_sets[0])

    rinspect(ctx.project_summary)
    rinspect(ctx.project_performance_parameters)

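The object reference graph of the context can be written with `objgraph` and rendered with Graphviz:

    import inspect  # standard-library inspect, distinct from rich.inspect
    from subprocess import check_call

    import objgraph

    # Write the reference graph of ctx to a .dot file, skipping class objects.
    objgraph.show_refs(ctx, max_depth=3, filter=lambda x: not inspect.isclass(x),
                       highlight=inspect.isclass, filename='ctx_graph.dot')
    # objgraph.show_refs(ctx, max_depth=3, filter=None, highlight=None, filename='ctx_graph.dot')

    # Render the .dot file to PNG with the Graphviz `dot` executable.
    # No shell is involved, so the "!" in the size attribute needs no backslash.
    check_call(['dot', '-Gsize=30,5!', '-Gdpi=200', '-Gratio=compress',
                '-Tpng', 'ctx_graph.dot', '-o', 'ctx_graph.png'])

    # Alternative rendering via pydot:
    # import pydot
    # (graph,) = pydot.graph_from_dot_file('ctx_graph.dot')
    # graph.write_png('somefile.png')

Output:

    Graph written to ctx_graph.dot (21 nodes)
    0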


### Pipeline Domain Objects

The Pipeline objectifies the observing run, MeasurementSets, etc. (see [Pipeline ObservingRun / MeasurementSet](#observingrun)).

### Pipeline callibrary/imagelibrary

### Pipeline Task Results

### runtime/on-disk

The context can be serialized to and deserialized from disk (via Python `pickle`) for several purposes: saving and resuming a session, debugging, and mpiserver processing (as a communication channel via a shared storage system, working around certain limitations of the OpenMPI messaging protocol).

Because persistence relies on `pickle`, it comes with the usual quirks: for example, certain objects are not picklable.
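The pickling quirk can be reproduced with a minimal sketch. It again uses a `types.SimpleNamespace` as a hypothetical stand-in for the context, and the helper names `strip_unpicklable`, `save_context` and `load_context` are invented for this example; this is not how the Pipeline itself saves a context. A `threading.Lock` plays the role of a live object that `pickle` rejects.

```python
import os
import pickle
import tempfile
import threading
from types import SimpleNamespace


def strip_unpicklable(ctx):
    """Return a copy of ctx without the attributes that pickle rejects."""
    kept = {}
    for key, value in vars(ctx).items():
        try:
            pickle.dumps(value)  # trial run; wasteful for large values
        except (TypeError, AttributeError, pickle.PicklingError):
            continue  # skip live objects such as locks or open handles
        kept[key] = value
    return SimpleNamespace(**kept)


def save_context(ctx, path):
    """Write the picklable part of ctx to disk."""
    with open(path, "wb") as fh:
        pickle.dump(strip_unpicklable(ctx), fh)


def load_context(path):
    """Read a previously saved toy context back from disk."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Toy context holding plain state plus one live, unpicklable object
ctx = SimpleNamespace(name="toy-session", stage=5, lock=threading.Lock())

try:
    pickle.dumps(ctx)
    direct_pickle_ok = True
except TypeError:
    direct_pickle_ok = False  # the lock makes the whole object unpicklable

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy-session.context")
    save_context(ctx, path)
    restored = load_context(path)

print(direct_pickle_ok, vars(restored))
```

Dropping attributes silently is only acceptable in a toy; a real implementation would have to decide which live objects to recreate after loading.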

## Limitations

* Since the context class and the classes of the underlying objects (e.g. domain object classes) are loosely defined and not officially maintained as public APIs, and the underlying code keeps changing with ongoing development, retrieving a saved context and cross-branch interoperation break from time to time. Because the context mainly serves as a transient runtime/session object and data pool, this primarily causes only small inconveniences during the development process.

* Size: you can't reliably deserialize a context after some time has passed. We also essentially use a pickle file for cross-node communication, which needs a shared file system, allows no concurrent writes, etc. Overall, the current implementation is really for internal use and is not a proper interface/API. This is probably where Jeff's ideas came in: put everything into a database, strip out the loosely defined class instances/methods, and only record data. But that would likely mean a full or partial rewrite of the Pipeline tasks (since they rely on context objects everywhere), which can be a large effort.
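The breakage described in the first bullet can be reproduced in isolation. The sketch below is an assumption-laden toy: the module `toy_domain` and the class `MeasurementSetV1` are invented here and do not correspond to real Pipeline names. It pickles an instance, simulates a later code revision in which the class has been renamed or removed, and shows that the old pickle no longer loads, whereas plain data does.

```python
import pickle
import sys
import types

# Throwaway module, so the example does not depend on how the script is run
toy_domain = types.ModuleType("toy_domain")
sys.modules["toy_domain"] = toy_domain


class MeasurementSetV1:
    """Hypothetical domain class as it existed when the context was saved."""

    def __init__(self, name):
        self.name = name


# Register the class under the throwaway module so pickle can locate it
MeasurementSetV1.__module__ = "toy_domain"
MeasurementSetV1.__qualname__ = "MeasurementSetV1"
toy_domain.MeasurementSetV1 = MeasurementSetV1

saved_object = pickle.dumps(MeasurementSetV1("example.ms"))
saved_plain = pickle.dumps({"name": "example.ms"})  # data only, no custom class

# Simulate a later branch in which the class was renamed or removed
del toy_domain.MeasurementSetV1

try:
    pickle.loads(saved_object)
    outcome = "loaded"
except AttributeError as exc:
    outcome = f"failed: {exc}"

restored_plain = pickle.loads(saved_plain)

print(outcome)
print(restored_plain)

# Clean up the throwaway module
sys.modules.pop("toy_domain", None)
```

A pickle stores only the module path and class name of each object, not the class definition, which is why recording plain data is more robust across code versions.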

## Future 

* However, with the broadening scope of asynchronous processing and longer processing gaps, it is time to consider splitting the class from its custom underpinnings: standardize the context as a database or an API with a formalized schema, and separate the methods from the data (class/method/attribute/property -> schema/metadata/version code -> asynchronous access from multiple instances).

* Interoperability with the legacy Pipeline: because of the historical heuristics in the legacy pipeline implementation, their central reliance on the context, and the uncertain timeline of RADPS imaging and calibration, transitioning the existing pipeline heuristics to use the new database concept directly would be a lengthy and costly process without significant benefits. One possible approach is to dip into the database transition first, without creating a general context-equivalent database schema, and to start exploring options for a thin translation layer that lets the still-in-use legacy pipeline tasks communicate with the to-be-determined database realization as an interim solution. This still takes advantage of the benefits of the database concept without entirely isolating the legacy pipeline from the initial database design.
**Or** someone can write an adaptation/translation layer that mimics the behavior of the context object in a live process but is backed by a database.
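
The adaptation layer suggested above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the class name `DbBackedContext`, the single key/value table, the use of SQLite, and JSON as the storage format are not part of any existing Pipeline or RADPS design. It routes attribute reads and writes to a database while keeping the familiar `ctx.attribute` syntax.

```python
import json
import sqlite3


class DbBackedContext:
    """Hypothetical adapter: attribute access is routed to a key/value table."""

    def __init__(self, path=":memory:"):
        # object.__setattr__ bypasses our own __setattr__ for internal state
        object.__setattr__(self, "_db", sqlite3.connect(path))
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS context (key TEXT PRIMARY KEY, value TEXT)"
        )

    def __setattr__(self, key, value):
        # Only JSON-serializable data is accepted; live objects raise TypeError
        payload = json.dumps(value)
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO context (key, value) VALUES (?, ?)",
                (key, payload),
            )

    def __getattr__(self, key):
        # Called only when normal lookup fails, i.e. for stored attributes
        if key.startswith("_"):
            raise AttributeError(key)  # avoid recursion on internal names
        row = self._db.execute(
            "SELECT value FROM context WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise AttributeError(key)
        return json.loads(row[0])


ctx = DbBackedContext()
ctx.stage = 12
ctx.products_dir = "../products"
print(ctx.stage, ctx.products_dir)
```

One consequence of this design is that values are returned as copies, so an in-place change such as `ctx.results.append(...)` would not be persisted; legacy code relying on mutable attributes would need the layer to track such changes explicitly.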