<a href="https://colab.research.google.com/github/casangi/ngcasa/blob/master/docs/ngcasa_development.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Development

Radio interferometry data analysis applications and algorithms may be assembled from CNGI and ngCASA building blocks. A user may choose to implement their own analysis scripts, use a pre-packaged task similar to those in current CASA, or embed ngCASA and CNGI methods in a production pipeline DAG.


## TBDs

(1) Are img_datasets expected to contain one image at a time, or the entire set of images used for imaging?

- Some methods need to see sets of associated images, whereas others need just one image.
- Some methods change the shape of the image, but it is still associated with the same reconstruction run.

==> It seems best to restrict an img_dataset to contain only one image array.

Example: Cube and MFS have different image definitions. For mtmfs, the major cycles work on cubes and the minor cycles work on MTMFS images. Should both definitions reside inside the same img_dataset, with the deconvolver knowing to look for some naming convention? Is there a cleaner way to do this?
    
(2) How can the 'stateless function' approach be reconciled with the need to maintain iteration-control state right through an image reconstruction run? => The application layer will have to manage return values, input values and 'state'?
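One possible shape for this, as a minimal sketch only: the iteration-control state is an immutable record that the application layer owns, and every function takes the state as input and returns a new one. All names here (`IterState`, `minor_cycle`, `run_reconstruction`) are hypothetical and not part of ngCASA or CNGI, and the "minor cycle" is a toy peak-reduction loop, not a real deconvolver.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IterState:
    """Hypothetical iteration-control record, owned by the application layer."""
    niter_done: int = 0
    peak_residual: float = float("inf")


def minor_cycle(residual, state, gain=0.1, niter=10):
    """Stateless step: returns a new residual and a new state, mutating neither input."""
    new_residual = list(residual)
    for _ in range(niter):
        # Toy update: scale down the current peak by the loop gain.
        peak_idx = max(range(len(new_residual)), key=lambda i: abs(new_residual[i]))
        new_residual[peak_idx] -= gain * new_residual[peak_idx]
    peak = max(abs(v) for v in new_residual)
    new_state = replace(state, niter_done=state.niter_done + niter, peak_residual=peak)
    return new_residual, new_state


def run_reconstruction(residual, threshold, max_iter):
    """Application layer: threads the state through repeated stateless calls."""
    state = IterState()
    while state.niter_done < max_iter and state.peak_residual > threshold:
        niter = min(10, max_iter - state.niter_done)
        residual, state = minor_cycle(residual, state, niter=niter)
    return residual, state


residual, state = run_reconstruction([1.0, 0.5], threshold=0.1, max_iter=100)
```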

(3) When should parameters to a single method be used, versus having multiple methods? The former allows for cleaner application code. The latter allows for a more atomic code structure. => Decide on a case-by-case basis?

(4) Are there rules for which data-array names are allowed in the XDS and Zarr datasets? Can we implement 'versioning' of arrays simply by picking different names, e.g. versions of corrected_data or flags? This is something in the MS V3 that we have needed for a long time. All the above usage examples assume this is possible.
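As an illustration of what name-based versioning could look like, here is a small sketch using a plain `xarray.Dataset`. The `_V<n>` suffix convention, the `add_version` helper, and the toy dimension names are assumptions made for this example; whether CNGI places any restrictions on array names is exactly the open question above.

```python
import numpy as np
import xarray as xr

# Toy visibility dataset; variable and dimension names are illustrative only.
rng = np.random.default_rng(0)
vis = xr.Dataset(
    {
        "DATA": (("time", "chan"), rng.normal(size=(4, 3))),
        "FLAG": (("time", "chan"), np.zeros((4, 3), dtype=bool)),
    },
    coords={"time": np.arange(4), "chan": np.arange(3)},
)


def add_version(xds, base_name, values, version, dims=("time", "chan")):
    """Return a new dataset with `values` stored as '<base_name>_V<version>'."""
    name = f"{base_name}_V{version}"
    if name in xds.data_vars:
        raise ValueError(f"{name} already exists")
    return xds.assign({name: (dims, values)})


# Two 'versions' live side by side with the original arrays.
vis = add_version(vis, "CORRECTED_DATA", vis["DATA"].values * 2.0, 1)
vis = add_version(vis, "FLAG", vis["DATA"].values > 1.0, 1)

# Listing the versions of an array is just a filter on names.
versions = sorted(n for n in vis.data_vars if n.startswith("CORRECTED_DATA"))
```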

(5) More examples are needed in CNGI:

- Example of how to merge img_datasets that may have different image shapes? (A join operation for xds.)
  - Needed for linear_mosaic, merge_models_across_image_fields...
- Example of how to merge vis_datasets? (A join operation for xds.)
  - Needed for the pipeline use case of doing different operations on different subsets of the data and then joining the results. A simple 'regrid' call?
  
- MS-Selection replacement examples: how to connect the zarr and xarray metadata to selection data.

- ComponentList replacement examples: just a dictionary that we have to define and decide upon?

- Reverse operation for time/chan average or rebin/regrid, or topo-lsrk.
  - E.g. vis.timeaverage() followed by 'edit flags' and then write back to the original dataset? The FLAG value is to be copied during expansion.
  - E.g. caltable solutions need interpolation (and extrapolation) to get back to the original data resolution.
  - E.g. vis.chanaverage() followed by 'edit CORRECTED_DATA' (e.g. uvcontfit+uvsub), and then write back to the original dataset. This makes no sense, so how do we prevent it?


- Visualization?
  - Point to available Python libraries for data visualization...
  - With array 'versioning' implemented as separate arrays, it is easy to plot and explore versions. On-the-fly application of operations (averaging, coord-shifts, cal-apply, model-predict) is also possible simply by not saving to disk before visualization.
  - It can be inserted between any sequence of steps in any user application script.
  - Will apply to vis and cal datasets, and img datasets too. This will get us scatter and raster displays for all kinds of datasets...
  
      
(6) List of prototypes: a demonstration that ngCASA algorithms and science use cases can be implemented and scale as expected.

Imaging: Weighting, Gridding, Imaging (with mosaic?). (For imaging, this is enough.)

Flagging: Implement manual flags using native Python selections and demonstrate that this scales with realistic flag command sizes. (Autoflags are structurally simpler.)
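To make "native Python selections" concrete, the sketch below applies a list of manual flag commands as NumPy boolean masks. The command format (a dict of inclusive ranges per axis), the axis names, and the `(time, baseline, chan)` layout are assumptions for illustration; they do not correspond to an existing ngCASA flag-command format, and the toy sizes say nothing about scaling.

```python
import numpy as np

# Toy axes; real datasets would take these from the dataset coordinates.
time = np.arange(10.0)
baseline = np.arange(3)
chan = np.arange(8)
flag = np.zeros((time.size, baseline.size, chan.size), dtype=bool)

# Hypothetical manual flag commands: inclusive (low, high) ranges per axis.
# An axis that is not mentioned is selected in full.
flag_commands = [
    {"time": (2.0, 4.0), "chan": (0, 1)},
    {"time": (7.0, 9.0), "baseline": (1, 1)},
]


def command_mask(cmd):
    """Build a (time, baseline, chan) boolean mask for one flag command."""
    def in_range(values, key):
        lo, hi = cmd.get(key, (values.min(), values.max()))
        return (values >= lo) & (values <= hi)

    t = in_range(time, "time")[:, None, None]
    b = in_range(baseline, "baseline")[None, :, None]
    c = in_range(chan, "chan")[None, None, :]
    return t & b & c            # broadcasting yields the full 3-D mask


# Apply every command; flags accumulate with a logical OR.
for cmd in flag_commands:
    flag |= command_mask(cmd)
```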

Calibration: Implement a gain solve and apply. Demonstrate a complicated sequence of data preprocessing and reshaping steps in between a series of cal solve/apply steps.

Visualization: Use off-the-shelf (OTS) Python libraries to demonstrate interactive plotting of large volumes of data.

Simulation: Generate a simulated dataset to mimic a real observed dataset and show that the metadata are accurate.

Pipeline Usage Modes: Generation of outputs, editing of input datasets, array/flag names/versions, pipeline as a DAG. String together the above prototypes for a full pipeline demo!

## Layers

- Application
- Functional Blocks
- Lower Level

This document will describe the functional block level. This level will be used by:

- CASA Developers
- Pipeline Developers
- Advanced users and algorithm developers

A functional block is the smallest reasonable unit of data reduction work.



## ngCASA Function Design



## Chunking

Zarr Chunking and Dask chunking


## ngCASA Pitfalls

## Loading Data from AWS S3

## Creating Graphs

## Dask Bokeh Dashboard

## More Information

Development environment, process and rules are inherited from CNGI and may be found here:

https://cngi-prototype.readthedocs.io/en/latest/development.html