# VLBI-cwl Tutorial

This tutorial shows how to run the LOFAR VLBI pipeline implemented in CWL. \
The LOFAR VLBI documentation is available at https://lofar-vlbi.readthedocs.io/en/latest/before.html \
The VLBI-cwl wiki page is available at https://git.astron.nl/RD/VLBI-cwl/-/wikis/home

## Set up

#### Directories

First of all you need to set up the following directories: 
- INSTALL_DIR = will contain all the necessary LOFAR software;
- INPUT_DIR = will contain the data to be processed, in particular:
  - the observation MeasurementSet (MS) data;
  - the solution tables generated by the LINC pipeline (h5 format file);
  - a catalogue containing a suitable delay calibrator in CSV format;
- WORK_DIR = will contain the intermediate files produced during pipeline operation;
- OUTPUT_DIR = will contain the data products produced by the pipeline.
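
The directory layout above could be created along the following lines. This is only a sketch: `BASE_DIR` is a hypothetical variable introduced here for illustration (it defaults to a throwaway temporary directory), and the sub-directory names are assumptions that you should adapt to your own system.

```shell
#!/bin/bash
# BASE_DIR is a hypothetical helper variable, not part of VLBI-cwl.
# Set it to a location on your own system; it defaults to a temporary directory.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

export INSTALL_DIR="${BASE_DIR}/software/VLBI-cwl"
export INPUT_DIR="${BASE_DIR}/input"
export WORK_DIR="${BASE_DIR}/work"
export OUTPUT_DIR="${BASE_DIR}/output"

# mkdir -p creates missing parent directories and does not fail
# if a directory already exists.
mkdir -p "${INSTALL_DIR}" "${INPUT_DIR}" "${WORK_DIR}" "${OUTPUT_DIR}"

# Report the directories that are now available.
for d in "${INSTALL_DIR}" "${INPUT_DIR}" "${WORK_DIR}" "${OUTPUT_DIR}"; do
  echo "ready: ${d}"
done
```

Exporting the variables keeps them available to any script started from the same shell.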

#### Software

The LOFAR software and singularity containers can then be installed as follows:

##### Note: The latest version of VLBI-cwl also requires the LINC pipeline to be in the INSTALL_DIR

#### Catalogue

As previously mentioned, the pipeline also needs a catalogue containing a suitable delay calibrator in CSV format. \
This can be created using the [plot_field.py](https://github.com/jwpetley/lofar_plotting/blob/main/plot_field.py) script.

Here is an example of a script you could use to create the catalogue:

```shell
#!/bin/bash -i

# Placeholder paths: replace them with the locations on your own system.
export INSTALL_DIR=$HOME/software/VLBI-cwl
export INPUT_DIR=/../input
export WORK_DIR=/../work
export OUTPUT_DIR=/../output

export PATH=${INSTALL_DIR}:$PATH
export PYTHONPATH=${INSTALL_DIR}:$PYTHONPATH

# Run plot_field.py inside the container, with the directories bound
# so that they are visible from within it.
# Replace xxx.xx with the target coordinates and name.MS with your MeasurementSet.
singularity exec \
  --bind ${INSTALL_DIR},${INPUT_DIR},${OUTPUT_DIR},${WORK_DIR} \
  --env PATH="${INSTALL_DIR}/vlbi/scripts:\$PATH" \
  --env PYTHONPATH="${INSTALL_DIR}/vlbi/scripts:\$PYTHONPATH" \
  --env VLBI_DATA_ROOT="${INSTALL_DIR}/vlbi" \
  ${INSTALL_DIR}/vlbi.sif \
  python3 ${INPUT_DIR}/plot_field.py --targRA xxx.xx --targDEC xxx.xx --MS ${INPUT_DIR}/name.MS
```

Save it as **catalogue_generator.sh**. To make it executable and run it, use the following commands:

```shell
chmod +x catalogue_generator.sh
./catalogue_generator.sh
```

This will create several catalogues:
- lbcs_catalogue.csv - list of all LBCS sources in the field
- lotss_catalogue.csv - list of all LoTSS sources in the field
- extreme_catalogue.csv - list of bright sources outside the field
- image_catalogue.csv - LoTSS catalogue restricted in flux and radius
- delay_calibrators.csv - list of suitable delay calibrators
  
What we need is **delay_calibrators.csv**. \
This catalogue usually contains several sources, sorted by distance from the field centre. The pipeline, however, only uses the source in the first row, so it is up to you to choose which calibrator you prefer and either move it to the first row or remove the others. \
To find out which source is a good calibrator, you can check the [Long-baseline calibrator survey](https://lofar-surveys.org/lbcs.html); the end of that page describes each column and its values. Briefly, you want a calibrator with as many **P** values as possible in the **Goodness** column (a P means a good detection of fringes on an international station) and as close as possible to your target.
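
Moving the preferred calibrator to the first row can be scripted. The sketch below is not part of VLBI-cwl: `promote_calibrator` is a hypothetical helper, it assumes a plain comma-separated file without quoted fields and with a `Source_id` column in the header, and the source names and coordinates in the demonstration are made up.

```shell
#!/bin/bash
# promote_calibrator CSV SOURCE_ID   (hypothetical helper, not part of VLBI-cwl)
# Rewrites CSV so that the row whose Source_id column equals SOURCE_ID comes
# directly after the header; the other rows keep their relative order.
# Assumes simple comma-separated fields without quoting.
promote_calibrator() {
  local csv="$1" source_id="$2" tmp
  tmp="$(mktemp)"
  if awk -F',' -v id="${source_id}" '
      NR == 1 {
        # Locate the Source_id column from the header line.
        for (i = 1; i <= NF; i++) if ($i == "Source_id") col = i
        header = $0
        next
      }
      # Remember the first matching row and keep it out of the remainder.
      col && $col == id && chosen == "" { chosen = $0; next }
      { rest[++n] = $0 }
      END {
        # Fail without output if the column or the source is missing.
        if (!col || chosen == "") exit 1
        print header
        print chosen
        for (j = 1; j <= n; j++) print rest[j]
      }
    ' "${csv}" > "${tmp}"; then
    mv "${tmp}" "${csv}"
  else
    rm -f "${tmp}"
    echo "could not promote '${source_id}' in ${csv}" >&2
    return 1
  fi
}

# Demonstration on a throwaway file with made-up entries.
demo_csv="$(mktemp)"
printf '%s\n' 'Source_id,RA,DEC' 'cal_A,1.0,2.0' 'cal_B,3.0,4.0' 'cal_C,5.0,6.0' > "${demo_csv}"
promote_calibrator "${demo_csv}" cal_B
cat "${demo_csv}"
```

The original file is only replaced when the requested source is found, so a typo in the source name leaves the catalogue untouched.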

**Note:** if the field is not in LoTSS, the script copies the entries of lbcs_catalogue.csv to delay_calibrators.csv. The source name then sits under the column "Observation", while the VLBI pipeline looks for a column called "Source_id". In that case you have to add the additional columns, named as in the lotss_catalogue.csv file, and fill them with an arbitrary source name and placeholder values (all 1), except for the coordinates, which must be those of the calibrator. (Simply renaming the column from "Observation" to "Source_id" does not work well: the name is read with each letter separated by a comma, and the files are not recognised later in the pipeline.)

## Running the pipeline

The VLBI pipeline consists of two main workflows:

### Delay calibration

The delay calibration pipeline does the following:
- applies LINC solutions to target data,
- reduces the total dataset,
- creates a MeasurementSet with data phase-shifted to a given delay calibrator,
- performs the direction-independent calibration of the international antenna array based on this phase-shifted data.

#### Inputs:

**msin** \
type: Directory[] \
doc: The raw input data in a MeasurementSet version 2.0 format.

**solset**\
type: File\
doc: The solution tables generated by the LINC target pipeline in an HDF5 format.

**delay_calibrator**\
type: File\
doc: A delay calibrator catalogue in CSV format.

**ddf_solset**\
type: File?\
doc: The solution tables generated by the DDF pipeline in an HDF5 format.

**filter_baselines**\
type: string?\
default: "*&"\
doc: The default filter constraints for the dp3_prep_target step. Usage instructions can be found in the lofar-vlbi documentation.

**flag_baselines**\
type: string?\
default: "[]"\
doc: The pattern used by DP3 to flag baselines, e.g. "[ CS013HBA*&&* ]". Usage instructions can be found in the lofar-vlbi documentation.

**phasesol**\
type: string?\
default: TGSSphase\
doc: The name of the target solution table to use from solset.

**configfile**\
type: File\
doc: Settings for the delay calibration in delay_solve.

**selfcal**\
type: Directory\
doc: Path of external calibration scripts.

**h5merger**\
type: Directory\
doc: External LOFAR helper scripts for merging h5 files.

**reference_stationSB**\
type: int?\
default: 104\
doc: Subbands are concatenated in concatenate-flag relative to this station subband.

**number_cores**\
type: int?\
default: 12\
doc: Number of cores to use per job for tasks with high I/O or memory.

**max_dp3_threads**\
type: int?\
default: 5\
doc: The number of threads per DP3 process.

NOTE: Input parameters of which the type is marked with a ? are optional and can be adjusted according to user-specific needs.

To generate the input file for the pipeline, you can use the **generate_input.sh** script, located in the *vlbi/scripts* directory. Run it as follows:

```shell
bash ${INSTALL_DIR}/vlbi/scripts/generate_input.sh ${INPUT_DIR} ${INSTALL_DIR}
```

It will create the **input.yaml** file in the current working directory, listing the MS files, the delay calibrator catalogue and the location of the required software.
Optional input parameters, such as flagged baselines or solutions generated by the Direction-Dependent Faceting (DDF) pipeline, should be added to this file by hand.
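
As an illustration of adding optional parameters, the sketch below appends two entries to an input file. It is not part of VLBI-cwl: `add_optional_inputs` is a hypothetical helper, the solution path is a placeholder, and it assumes that **input.yaml** is written in plain block-style YAML with top-level keys starting in the first column. If your generated file uses a different layout (for example JSON-style braces), edit it by hand instead.

```shell
#!/bin/bash
# add_optional_inputs YAML_FILE DDF_SOLSET_PATH   (hypothetical helper)
# Appends a flag_baselines string and a ddf_solset File entry to YAML_FILE.
# Assumes block-style YAML; File inputs use the CWL "class: File" form.
add_optional_inputs() {
  local yaml="$1" ddf_solset="$2"
  cat >> "${yaml}" <<EOF
flag_baselines: "[ CS013HBA*&&* ]"
ddf_solset:
  class: File
  path: ${ddf_solset}
EOF
}

# Demonstration on a throwaway file; the existing entry and the
# solution path below are placeholders, not real data.
demo_yaml="$(mktemp)"
printf '%s\n' 'phasesol: TGSSphase' > "${demo_yaml}"
add_optional_inputs "${demo_yaml}" /hypothetical/path/ddf_solutions.h5
cat "${demo_yaml}"
```

The baseline pattern is the example given in the **flag_baselines** description above; replace it with the stations you actually need to flag.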

**Note:** check the **summary.log** file produced by LINC to see whether any subbands or baselines have been flagged. \
If some subbands have been flagged, you have to remove the corresponding MS files from the **input.yaml** file.

#### How to run it

Here is an example script that runs the delay calibration workflow inside the Singularity container. It can be saved as an executable file.

```shell
#!/bin/bash -i

# Example paths: adapt them to your own system.
export INSTALL_DIR=$HOME/software/VLBI-cwl
export INPUT_DIR=/data/nbiava/MS0735/data_short/MS0735
export WORK_DIR=/data/nbiava/MS0735/vlbi-cwl/work
export OUTPUT_DIR=/data/nbiava/MS0735/vlbi-cwl/results
export LOG_DIR=/data/nbiava/MS0735/vlbi-cwl/log
export VLBI_DATA_ROOT=${INSTALL_DIR}/vlbi

# Make the VLBI-cwl scripts visible on the host.
export PATH=${INSTALL_DIR}:$PATH
export PYTHONPATH=${INSTALL_DIR}:$PYTHONPATH
export PATH=${INSTALL_DIR}/vlbi/scripts:$PATH
export PYTHONPATH=${INSTALL_DIR}/vlbi/scripts:$PYTHONPATH

# Run cwltool inside the container. --no-container stops cwltool from
# starting further containers for the individual steps, and --leave-tmpdir
# keeps the intermediate files in WORK_DIR for inspection.
singularity exec \
  --bind ${INSTALL_DIR},${INPUT_DIR},${OUTPUT_DIR},${WORK_DIR},${LOG_DIR} \
  --env PATH="${INSTALL_DIR}/vlbi/scripts:\$PATH" \
  --env PYTHONPATH="${INSTALL_DIR}/vlbi/scripts:\$PYTHONPATH" \
  --env VLBI_DATA_ROOT="${INSTALL_DIR}/vlbi" \
  ${INSTALL_DIR}/vlbi.sif \
  cwltool \
  --no-container \
  --preserve-entire-environment \
  --parallel \
  --timestamps \
  --outdir ${OUTPUT_DIR} \
  --tmpdir-prefix ${WORK_DIR}/ \
  --log-dir ${LOG_DIR} \
  --leave-tmpdir \
  --debug \
  ${INSTALL_DIR}/vlbi/workflows/delay-calibration.cwl \
  input.yaml
```
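
Before launching a long run, it can help to confirm that every path the script relies on exists. The following is a minimal sketch, not part of VLBI-cwl: `preflight_check` is a hypothetical helper that reads the same environment variables as the run script, and the demonstration builds an empty throwaway tree rather than a real installation.

```shell
#!/bin/bash
# preflight_check INPUT_YAML   (hypothetical helper, not part of VLBI-cwl)
# Returns 0 only if the directories and files used by the run script exist.
preflight_check() {
  local input_yaml="$1" status=0 path
  for path in "${INSTALL_DIR}" "${INPUT_DIR}" "${WORK_DIR}" "${OUTPUT_DIR}" "${LOG_DIR}"; do
    if [ ! -d "${path}" ]; then
      echo "missing directory: '${path}'" >&2
      status=1
    fi
  done
  for path in "${INSTALL_DIR}/vlbi.sif" \
              "${INSTALL_DIR}/vlbi/workflows/delay-calibration.cwl" \
              "${input_yaml}"; do
    if [ ! -e "${path}" ]; then
      echo "missing file: '${path}'" >&2
      status=1
    fi
  done
  return "${status}"
}

# Demonstration on a throwaway tree with empty stand-in files.
demo_root="$(mktemp -d)"
export INSTALL_DIR="${demo_root}/install"
export INPUT_DIR="${demo_root}/input"
export WORK_DIR="${demo_root}/work"
export OUTPUT_DIR="${demo_root}/results"
export LOG_DIR="${demo_root}/log"
mkdir -p "${INSTALL_DIR}/vlbi/workflows" "${INPUT_DIR}" "${WORK_DIR}" "${OUTPUT_DIR}" "${LOG_DIR}"
touch "${INSTALL_DIR}/vlbi.sif" \
      "${INSTALL_DIR}/vlbi/workflows/delay-calibration.cwl" \
      "${demo_root}/input.yaml"

if preflight_check "${demo_root}/input.yaml"; then
  echo "all paths present"
fi
```

In a real run script, the function could be called just before `singularity exec`, exiting early when it returns a non-zero status.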


**Note:** check whether the latest bug fix is present in your pipeline version:\
open the **vlbi/steps/dp3_make_parset.cwl** file; at line 56, **filter** should be the first step, like this:

### Split directions

The split directions pipeline is a calibration and imaging pipeline that is performed on data that has been processed in the delay calibration pipeline. It does the following:
- It splits MeasurementSet formatted data into various target directions.
- It applies delay calibrator solutions.
- It (optionally) performs self-calibration on the target directions.


#### Inputs:

**msin** \
type: Directory[] \
doc: The input MS. This should have coverage of the target directions.

**delay_solset** \
type: File \
doc: The solution tables generated by the VLBI delay calibration workflow in an HDF5 format.

**image_cat** \
type: File \
default: lotss_catalogue.csv \
doc: The image catalogue (in CSV format) containing the target directions.

**max_dp3_threads** \
type: int? \
default: 4 \
doc: The number of threads per DP3 process.

**numbands** \
type: int? \
default: -1 \
doc: The number of bands to group. -1 means all bands.

**truncateLastSBs** \
type: boolean? \
default: true \
doc: Whether to truncate the last subbands of the MSs to the same length.

**do_selfcal** \
type: boolean? \
default: false \
doc: Whether to do selfcal on the direction concat MSs.

**configfile** \
type: File \
doc: The configuration file to be used to run facetselfcal.py during the target_solve step.

**h5merger** \
type: Directory \
doc: The h5merger directory.

**selfcal** \
type: Directory \
doc: The selfcal directory.

NOTE: Input parameters of which the type is marked with a ? are optional and can be adjusted according to user-specific needs.

You should modify the **input.yaml** file of the previous pipeline:
- change the *msin* paths to the MS files produced by the delay calibration pipeline, **output/out_\*MHz_uv.dp3-concat**
- replace the *solset* input with **delay_solset**, pointing it to the h5 solutions file produced by the delay calibration pipeline
- replace the *delay_calibrator* input with **image_cat**, pointing it to the catalogue of the source(s) you wish to image. By default the pipeline uses the *lotss_catalogue.csv* produced by the plot_field.py script, but you can create your own file. It requires at least the **Source_id**, **RA** and **DEC** columns; for the other columns you can try filling in placeholder values of 1.
- change the *configfile* to **target_selfcal_config.txt**
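
The two key renames in the list above can be scripted as a starting point. This sketch is not part of VLBI-cwl: `make_split_input` is a hypothetical helper, the paths in the demonstration are placeholders, and it assumes block-style YAML with top-level keys starting in the first column. It only renames the keys; the paths under *msin*, **delay_solset**, **image_cat** and *configfile* still have to be edited by hand.

```shell
#!/bin/bash
# make_split_input OLD_YAML NEW_YAML   (hypothetical helper, not part of VLBI-cwl)
# Copies OLD_YAML to NEW_YAML, renaming the top-level keys
# solset -> delay_solset and delay_calibrator -> image_cat.
# The ^ anchor keeps other keys such as ddf_solset untouched.
make_split_input() {
  sed -e 's/^solset:/delay_solset:/' \
      -e 's/^delay_calibrator:/image_cat:/' \
      "$1" > "$2"
}

# Demonstration on a throwaway file with placeholder paths.
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/input.yaml" <<'EOF'
msin:
  - class: Directory
    path: /hypothetical/input/example_SB000_uv.MS
solset:
  class: File
  path: /hypothetical/input/linc_solutions.h5
delay_calibrator:
  class: File
  path: /hypothetical/input/delay_calibrators.csv
EOF

make_split_input "${demo_dir}/input.yaml" "${demo_dir}/input_split.yaml"
cat "${demo_dir}/input_split.yaml"
```

Writing to a new file keeps the delay calibration input intact in case that workflow has to be rerun.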

Create new **work**, **log** and **output** directories, then run the pipeline as before, specifying the new input file and the new workflow, as below: