# processMeerKAT Jupyter Notebook Tutorial

processMeerKAT implements a CASA-based, wide-band, full-Stokes calibration pipeline (in the linear basis). Broadly, the pipeline aims to “do the right thing”: by keeping the steps as general as possible, we believe there should be no need for fine-tuning to obtain a well-calibrated dataset.

This Jupyter Notebook Tutorial is based on the [processMeerKAT](https://idia-pipelines.github.io/docs/processMeerKAT) documentation for the IDIA processMeerKAT Pipeline.

In [1]:
casa['build']['version']

'5.5.0-149  '

In [2]:
from __future__ import print_function
import sys
import os
import subprocess

home = os.getcwd()  # current working directory (equivalent to `!pwd`)

## singularity data path:
#data_path = os.path.join(home, 'data/')

## singularity pipeline path:
pipeline_path = os.path.join(home, 'pipelines/processMeerKAT/')
functions_path = os.path.join(pipeline_path, 'cal_scripts/')

## singularity module paths:
sys.path.append(pipeline_path)
sys.path.append(functions_path)


In [3]:
### View config file

with open('myconfig.txt', 'r') as f:  # file is closed automatically
    print(f.read())

[crosscal]
minbaselines = 4                  # Minimum number of baselines to use while calibrating
specavg = 1                       # Number of channels to average after calibration (during split)
timeavg = '8s'                    # Time interval to average after calibration (during split)
keepmms = True                    # Output MMS (True) or MS (False) during split
spw = '0:860~1700MHz'             # Spectral window / frequencies to extract for MMS
calcrefant = True                 # Calculate reference antenna in program (overwrites 'refant')
refant = 'm005'                   # Reference antenna name / number
standard = 'Perley-Butler 2010'   # Flux density standard for setjy
badants = []                      # List of bad antenna numbers (to flag)
badfreqranges = [ '935~947MHz',   # List of bad frequency ranges (to flag)
	'1160~1310MHz',
	'1476~1611MHz',
	'1670~1700MHz']

[slurm]
nodes = 6
ntasks_per_node = 4
plane = 2
mem = 236
partition = 'Main'
time = '12:00:00'
submit = False

# Input validation

This script performs a few basic validity checks on the config file and on the input MS. Before the pipeline continues to the next steps, it verifies that the input MS exists and that the inputs specified in the config file have the correct data types. If reference antenna calculation is not requested, it also checks that the specified reference antenna exists in the MS; otherwise, the reference antenna is calculated as described in the next section.
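A minimal sketch of checks of this kind, in standalone Python 3: it parses the `[crosscal]` section with `configparser`, converts each value with `ast.literal_eval`, checks the types, and verifies `refant` only when `calcrefant` is `False`. This is not the code in `validate_input.py`; the type table `EXPECTED_TYPES`, the helpers `parse_section` and `validate_crosscal`, and the antenna names are hypothetical (in practice the antenna names would come from the MS itself).

```python
import ast
import configparser

# Hypothetical type table for a subset of the [crosscal] keys in myconfig.txt
EXPECTED_TYPES = {
    'minbaselines': int,
    'specavg': int,
    'timeavg': str,
    'keepmms': bool,
    'spw': str,
    'calcrefant': bool,
    'refant': str,
    'standard': str,
    'badants': list,
    'badfreqranges': list,
}


def parse_section(text, section):
    """Parse one config section, turning each value string into a Python literal."""
    # inline_comment_prefixes strips trailing '# ...' comments like those in myconfig.txt
    parser = configparser.ConfigParser(inline_comment_prefixes=('#',))
    parser.read_string(text)
    return {key: ast.literal_eval(value) for key, value in parser.items(section)}


def validate_crosscal(params, antenna_names):
    """Raise if a parameter is missing, has the wrong type, or refant is unknown."""
    for key, expected in EXPECTED_TYPES.items():
        if key not in params:
            raise KeyError('missing parameter: {0}'.format(key))
        if not isinstance(params[key], expected):
            raise TypeError('{0} should be of type {1}'.format(key, expected.__name__))
    # refant only needs to exist if it will not be recalculated
    if not params['calcrefant'] and params['refant'] not in antenna_names:
        raise ValueError('reference antenna {0} not found in MS'.format(params['refant']))


if __name__ == '__main__':
    config_text = """
[crosscal]
minbaselines = 4          # Minimum number of baselines
specavg = 1
timeavg = '8s'
keepmms = True
spw = '0:860~1700MHz'
calcrefant = False
refant = 'm005'
standard = 'Perley-Butler 2010'
badants = []
badfreqranges = [ '935~947MHz',
\t'1160~1310MHz']
"""
    params = parse_section(config_text, 'crosscal')
    validate_crosscal(params, antenna_names=['m000', 'm005', 'm019'])
    print(params['badfreqranges'])
```

Multi-line values such as `badfreqranges` work because `configparser` joins indented continuation lines before `ast.literal_eval` sees them.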

In [4]:
run pipelines/processMeerKAT/cal_scripts/validate_input.py --config myconfig.txt

2019-08-07 10:54:14,012 INFO: This is version 1.0 of the pipeline


# Reference antenna calculation 

If the `calcrefant` parameter in the config file is set to `True`, this script is executed. For each antenna in the array, the algorithm calculates the median and standard deviation of all the visibility amplitudes on that antenna's baselines. Outlier antennas, in the top 2 and bottom 5 percent of this distribution, are then flagged. The reference antenna is selected as the unflagged antenna with the smallest visibility rms.
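The selection logic can be sketched in a few lines of Python 3 and numpy. This is a simplified illustration, not `calc_refant.py`: it assumes the per-antenna amplitudes are already in memory, applies the percentile cuts (above the 98th or below the 5th percentile) to the distribution of per-antenna medians, and uses the standard deviation as the "rms". The function name `choose_refant` is hypothetical.

```python
import numpy as np


def choose_refant(amplitudes, low_pct=5.0, high_pct=98.0):
    """Pick a reference antenna from per-antenna visibility amplitudes.

    amplitudes: dict mapping antenna ID -> 1-D array of visibility amplitudes
    on that antenna's baselines. Returns (refant, bad_antennas).
    """
    ants = sorted(amplitudes)
    medians = np.array([np.median(amplitudes[a]) for a in ants])
    scatter = np.array([np.std(amplitudes[a]) for a in ants])

    # Flag antennas whose median lies in the outer tails of the median distribution
    lo, hi = np.percentile(medians, [low_pct, high_pct])
    bad = [a for a, m in zip(ants, medians) if m < lo or m > hi]

    # Reference antenna: the unflagged antenna with the smallest scatter
    good = [(s, a) for a, s in zip(ants, scatter) if a not in bad]
    refant = min(good)[1]
    return refant, bad
```

Flagging outliers before choosing the minimum matters: an antenna with anomalously low amplitudes (for example, a dead antenna) can also have a very small scatter and would otherwise be picked.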

In [5]:
run pipelines/processMeerKAT/cal_scripts/calc_refant.py --config myconfig.txt

2019-08-07 10:54:14,587 INFO: Flux field scan no: 1
2019-08-07 10:54:14,595 INFO: Antenna statistics on total flux calibrator
2019-08-07 10:54:14,596 INFO: (flux in Jy averaged over scans & channels, and over all of each antenna's baselines)
2019-08-07 10:54:14,597 INFO: ant median rms 
2019-08-07 10:54:48,526 INFO: All 2.25  17.55
2019-08-07 10:54:48,533 INFO: 19  1.57  7.21 (best antenna)
2019-08-07 10:54:48,535 INFO: 1   1.74  8.36 (1st good antenna)
2019-08-07 10:54:48,536 INFO: setting reference antenna to: 19
2019-08-07 10:54:48,538 INFO: Bad antennas: [0, 10, 27, 34, 37, 57, 58, 59, 61, 62]


# Data partition

The input measurement set (MS) is partitioned into a [multi-measurement set (MMS)](https://casa.nrao.edu/casadocs/casa-5.4.1/uv-manipulation/data-partition) using the CASA task `partition`. This task splits the main MS into smaller sub-MSs (SUBMSs) that together form one logical MMS, with one SUBMS created per scan in the input MS. Partitioning the data in this way makes more efficient use of compute resources under MPI, since each SUBMS can be operated on independently by a different MPI worker.
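The idea behind per-scan partitioning can be illustrated without CASA. The Python 3 sketch below groups visibility rows by scan number (one group per scan, analogous to one SUBMS per scan) and hands each group to an independent worker. Threads stand in for MPI workers here; `partition_by_scan` and `process_in_parallel` are hypothetical names, and this is not how CASA implements `partition`.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def partition_by_scan(scan_numbers):
    """Group row indices by scan number, one group per scan."""
    groups = defaultdict(list)
    for row, scan in enumerate(scan_numbers):
        groups[int(scan)].append(row)
    return {scan: np.array(rows) for scan, rows in sorted(groups.items())}


def process_in_parallel(data, scan_numbers, func, max_workers=4):
    """Apply func independently to each scan's rows, one task per group."""
    parts = partition_by_scan(scan_numbers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {scan: pool.submit(func, data[rows]) for scan, rows in parts.items()}
        return {scan: fut.result() for scan, fut in futures.items()}


if __name__ == '__main__':
    data = np.arange(10.0)
    scans = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
    print(process_in_parallel(data, scans, np.mean))
```

Because no group depends on another, the tasks can run in any order on any worker, which is the property that lets each SUBMS be processed by a separate MPI process.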

In [6]:
run pipelines/processMeerKAT/cal_scripts/partition.py --config myconfig.txt

# Flagging (round 1) 

This is the first of two rounds of pre-calibration flagging. Any antennas and frequency ranges listed in `badants` and `badfreqranges` in the config file are flagged (either list may be empty). All autocorrelations are also flagged, using `mode='manual'` and `autocorr=True` in the `flagdata` parameters.

Next, `flagdata` is run on the calibrators and target sources with conservative limits to clip out the worst RFI (the data are clipped at 50 Jy), followed by a single `tfcrop` pass that flags data at a $6\sigma$ limit. `tfcrop` is preferred here because it fits a piecewise polynomial across the band, which takes care of the as-yet uncalibrated bandpass shape.
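To illustrate why fitting a smooth function across the band makes a sigma threshold meaningful on uncalibrated data, here is a simplified Python 3 and numpy sketch for one amplitude spectrum: an absolute clip at 50 Jy, followed by a $6\sigma$ cut on the residuals from a single low-order polynomial fit. This is not CASA's `tfcrop`, which fits piecewise polynomials in both time and frequency; the function name `flag_spectrum`, the polynomial order and the MAD-based noise estimate are assumptions.

```python
import numpy as np


def flag_spectrum(amp, freqs, clip_level=50.0, nsigma=6.0, poly_order=3):
    """Illustrative two-stage flagging of one amplitude spectrum.

    1. Clip: flag any channel whose amplitude exceeds clip_level (Jy).
    2. Fit a smooth polynomial to the remaining channels (standing in for the
       uncalibrated bandpass shape) and flag channels whose residual exceeds
       nsigma times a robust noise estimate.
    Returns a boolean array, True where flagged.
    """
    flags = amp > clip_level
    x = (freqs - freqs.mean()) / (freqs.max() - freqs.min())  # normalise for a stable fit
    good = ~flags
    coeffs = np.polyfit(x[good], amp[good], poly_order)
    resid = amp - np.polyval(coeffs, x)
    # 1.4826 * MAD approximates the standard deviation for Gaussian noise
    mad = np.median(np.abs(resid[good] - np.median(resid[good])))
    sigma = 1.4826 * mad
    flags |= np.abs(resid) > nsigma * sigma
    return flags
```

A fixed amplitude threshold alone would have to sit above the highest point of the bandpass, missing RFI elsewhere in the band; measuring deviations from the fitted shape avoids that.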

In [7]:
run pipelines/processMeerKAT/cal_scripts/flag_round_1.py --config myconfig.txt

# setjy 

The `setjy` task is run on the specified primary calibrators. This step runs once before the first (parallel-hand) round of calibration and again before the second (cross-hand) round.

By default, the ‘Perley-Butler 2010’ flux scale is used, since it is the only one that includes the popular southern calibrator PKS B1934-638. If the calibrator J0408-6545 is present in the data, it is preferred instead, and a broadband Stokes I model for it is set via the `manual` mode of `setjy`.
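A broadband Stokes I model of this kind is commonly written as a flux density at a reference frequency plus a spectral index and curvature terms, the quantities that `setjy` takes through its `fluxdensity`, `reffreq` and `spix` parameters when `standard='manual'`. The Python 3 sketch below evaluates such a curved power law; the natural-log convention for the curvature terms and any values you plug in are assumptions, not the pipeline's J0408-6545 coefficients.

```python
import numpy as np


def curved_power_law(freq_hz, flux_ref, ref_freq_hz, spix):
    """Stokes I flux density of a curved power-law model.

    S(nu) = flux_ref * (nu/nu0) ** (spix[0] + spix[1]*ln(nu/nu0) + spix[2]*ln(nu/nu0)**2 + ...)
    """
    ratio = np.asarray(freq_hz, dtype=float) / ref_freq_hz
    log_ratio = np.log(ratio)
    # spix[0] is the spectral index; higher entries are curvature terms
    exponent = sum(coef * log_ratio**k for k, coef in enumerate(spix))
    return flux_ref * ratio**exponent


if __name__ == '__main__':
    freqs = np.linspace(860e6, 1700e6, 5)
    # Hypothetical model values, for illustration only
    print(curved_power_law(freqs, 15.0, 1.28e9, [-1.0, -0.1]))
```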

In [8]:
run pipelines/processMeerKAT/cal_scripts/setjy.py --config myconfig.txt

# Parallel hand calibration 

Standard parallel-hand delay, bandpass and gain calibration is run on the data so that the second round of flagging operates on calibrated data with better statistics.
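The delay (K) term is the simplest of these solutions to picture: a residual delay $\tau$ produces a phase that rises linearly across the band, $\phi(\nu) = 2\pi\nu\tau + \phi_0$. The Python 3 sketch below recovers $\tau$ from a single phase spectrum with a straight-line fit. `fit_delay` is a hypothetical helper; CASA's `gaincal` with `gaintype='K'` solves for antenna-based delays jointly over baselines rather than from one spectrum like this.

```python
import numpy as np


def fit_delay(freq_hz, phase_rad):
    """Estimate a delay (seconds) from the slope of phase versus frequency.

    The phases are unwrapped before the straight-line fit so that
    2*pi jumps do not bias the slope.
    """
    unwrapped = np.unwrap(phase_rad)
    slope, _ = np.polyfit(freq_hz, unwrapped, 1)  # slope = 2*pi*tau
    return slope / (2 * np.pi)


if __name__ == '__main__':
    freqs = np.linspace(860e6, 1700e6, 4096)
    tau = 2e-9  # 2 ns, an illustrative value
    phase = np.angle(np.exp(1j * (2 * np.pi * freqs * tau + 0.3)))
    print(fit_delay(freqs, phase))
```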

In [9]:
run pipelines/processMeerKAT/cal_scripts/xx_yy_solve.py --config myconfig.txt

2019-08-07 12:00:37,307 INFO:  starting antenna-based delay (kcorr)
 -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.kcal
2019-08-07 12:02:21,277 INFO:  starting bandpass -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.bcal
2019-08-07 12:02:53,597 INFO:  starting gain calibration
 -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.gcal


In [10]:
run pipelines/processMeerKAT/cal_scripts/xx_yy_apply.py --config myconfig.txt

2019-08-07 12:05:11,566 INFO:  applying calibration -> primary calibrator


*** Error *** Error in data selection specification: MSSelectionNullSelection : The selected table has zero rows.

2019-08-07 12:05:38,549 INFO:  applying calibration -> secondary calibrator


*** Error *** Error in data selection specification: MSSelectionNullSelection : The selected table has zero rows.

2019-08-07 12:07:14,529 INFO:  applying calibration -> target calibrator


*** Error *** Error in data selection specification: MSSelectionNullSelection : The selected table has zero rows.

# Flagging (round 2) 

As in the first round, the `tfcrop` algorithm is run independently on the primary calibrator, the secondary calibrator and the target(s). The thresholds are lower than in the first round because the algorithm now operates on calibrated data.

In [11]:
run pipelines/processMeerKAT/cal_scripts/flag_round_2.py --config myconfig.txt

In [12]:
run pipelines/processMeerKAT/cal_scripts/setjy.py --config myconfig.txt

# Cross hand calibration 

The full-Stokes calibration is performed across as much of the SPW as is requested in the config file; by default, the entire SPW (spanning about 800 MHz) is calibrated as a single unit. The caveat is that CASA does not support true wideband, full-polarization calibration. For example, the frequency dependence of Stokes Q and U for a source with a non-zero rotation measure (RM) is not correctly accounted for. CASA instead assumes that the band is split into several narrower SPWs (as is the case for the VLA and ALMA), within each of which the Stokes parameters can be treated as constant. We have identified workarounds and will implement a fix in upcoming versions of the pipeline.
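To see why a single Q, U pair per SPW is a poor approximation across this band, the Python 3 sketch below evaluates fractional Q and U for a Faraday-rotated source, using the standard relations $\chi(\lambda) = \chi_0 + \mathrm{RM}\,\lambda^2$ and $Q + iU = p\,e^{2i\chi}$. The function name `rotate_qu` and the example values are illustrative only.

```python
import numpy as np

C = 299792458.0  # speed of light, m/s


def rotate_qu(freq_hz, p_frac, chi0_rad, rm):
    """Fractional Stokes Q and U of a source with rotation measure rm (rad/m^2).

    Faraday rotation turns the polarization angle as chi = chi0 + rm * lambda^2,
    so Q + iU = p * exp(2i * chi) changes across the band.
    """
    lam_sq = (C / np.asarray(freq_hz, dtype=float)) ** 2
    chi = chi0_rad + rm * lam_sq
    pol = p_frac * np.exp(2j * chi)
    return pol.real, pol.imag


if __name__ == '__main__':
    freqs = np.array([860e6, 1280e6, 1700e6])
    q, u = rotate_qu(freqs, p_frac=0.05, chi0_rad=0.0, rm=10.0)
    print(q, u)
```

With RM = 10 rad m⁻², for example, the polarization angle turns by about 52° between 860 and 1700 MHz, so Q and U change substantially even though the polarized fraction is constant.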

The cross-hand calibration performs the following steps:
* Delay calibration (the K term), time averaged, parallel hand
* Bandpass calibration (the B term), time averaged, parallel hand
* Cross hand delay calibration (the KCROSS term), time averaged, cross hand
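The order matters because, in the usual CASA practice, each solve pre-applies the tables derived before it (through the `gaintable` parameter), so later terms are solved on data already corrected for earlier ones. The Python 3 sketch below only spells out that bookkeeping; `solve_chain` is a hypothetical helper, and the table extensions simply mirror those in the log output of the cross-hand solve cell.

```python
def solve_chain(caltable_prefix):
    """Return an ordered list of (term, output_table, tables_applied_on_the_fly)."""
    steps = [('K', '.kcal'), ('B', '.bcal'), ('KCROSS', '.xdel')]
    plan = []
    applied = []
    for term, ext in steps:
        table = caltable_prefix + ext
        # Copy the list so each step records only the tables solved before it
        plan.append((term, table, list(applied)))
        applied.append(table)
    return plan


if __name__ == '__main__':
    for term, table, prior in solve_chain('mydata'):
        print(term, table, prior)
```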

After the cross-hand delay calibration, the time-dependent gains are solved for iteratively. First, the gains are calculated for the primary and secondary calibrators as a function of time and parallactic angle. The polarization properties of the secondary are assumed to be unknown, and are determined from the variation of the gains with parallactic angle. This fit is performed by `qufromgain`, a function in the `almapolhelpers` module, which can be imported in CASA with

```
from almapolhelpers import *
```

This imports several helper functions written for ALMA polarization calibration, which are general enough to work with any telescope that has linear feeds.
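The principle behind fitting Q and U from the gains can be sketched as follows. If the gains are solved assuming an unpolarized source, the source polarization leaks into them: with parallactic angle $\psi$, the squared X/Y gain amplitude ratio is $r = (1 + q\cos 2\psi + u\sin 2\psi)/(1 - q\cos 2\psi - u\sin 2\psi)$, so $(r-1)/(r+1) = q\cos 2\psi + u\sin 2\psi$, which is linear in $q$ and $u$. This Python 3 sketch is not the `almapolhelpers` source: it assumes the intrinsic X/Y gain ratio is unity, ignores any feed position-angle offset, and uses one common sign convention for linear feeds. The function name is hypothetical.

```python
import numpy as np


def fit_qu_from_gain_ratio(parang_rad, gain_ratio):
    """Least-squares fit of fractional q, u from the X/Y gain amplitude ratio.

    parang_rad: parallactic angle of each solution interval (radians).
    gain_ratio: |gx| / |gy| for each interval, solved with an unpolarized model.
    """
    r = np.asarray(gain_ratio, dtype=float) ** 2
    y = (r - 1.0) / (r + 1.0)  # equals q*cos(2 psi) + u*sin(2 psi)
    design = np.column_stack([np.cos(2 * parang_rad), np.sin(2 * parang_rad)])
    (q, u), *_ = np.linalg.lstsq(design, y, rcond=None)
    return q, u
```

The fit is only well conditioned if the calibrator is observed over a reasonable range of parallactic angle; with a narrow range, $\cos 2\psi$ and $\sin 2\psi$ are nearly degenerate.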

Once the fractional Q and U values of the secondary (phase) calibrator are determined, the gain solutions are recomputed with the fractional polarization as an input, which in principle yields more accurate gains. This is followed by a call to `xyamb`, also in `almapolhelpers`, which resolves the 180° ambiguity in the X-Y phase solutions, using the Q and U values from `qufromgain` as the reference. These can be cross-checked against the solutions obtained by running `gaincal` with `gaintype='XYf+QU'`, which solves for the X-Y phase as a function of frequency assuming unknown source Q and U. We then run `polcal` in `Dflls` mode to calculate the polarization leakage (the D term) as a function of frequency (f) using a linear least-squares algorithm (lls). Finally, the flux scale is bootstrapped from the primary to the secondary using `fluxscale`.
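The ambiguity check can be illustrated with the values printed by `xyamb` in the output of the cross-hand solve cell (Expected QU, Found QU, and the X-Y phase being converted by 180°). A 180° error in the X-Y phase flips the sign of the fitted Q and U, so one simple test is whether the found (Q, U) points away from the expected (Q, U). This Python 3 sketch assumes that dot-product test; it is not the `xyamb` source, and the function name is hypothetical.

```python
import numpy as np


def resolve_xy_ambiguity(expected_qu, found_qu, xy_phase_deg):
    """Resolve the 180-degree ambiguity in an X-Y phase solution.

    If the found (Q, U) points away from the expected (Q, U) (negative dot
    product), shift the X-Y phase by 180 degrees and negate Q and U.
    Returns (xy_phase_deg, (q, u)) with the phase wrapped into [-180, 180).
    """
    expected = np.asarray(expected_qu, dtype=float)
    found = np.asarray(found_qu, dtype=float)
    if np.dot(expected, found) < 0:
        xy_phase_deg = xy_phase_deg + 180.0
        found = -found
    wrapped = (xy_phase_deg + 180.0) % 360.0 - 180.0
    return float(wrapped), (float(found[0]), float(found[1]))


if __name__ == '__main__':
    # Values taken from the xyamb output in this notebook
    print(resolve_xy_ambiguity((-0.012887761252330349, -0.031960989042421978),
                               (0.01388603, 0.03897931), -13.0793966761))
```

Applied to the numbers in the log, this reproduces the conversion of the X-Y phase from about -13.08° to about 166.92° and the sign flip of Q and U.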

In [13]:
run pipelines/processMeerKAT/cal_scripts/xy_yx_solve.py --config myconfig.txt

2019-08-07 13:44:46,377 INFO:  starting antenna-based delay (kcorr)
 -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.kcal
2019-08-07 13:46:31,880 INFO:  starting bandpass -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.bcal
2019-08-07 13:47:03,272 INFO:  starting cross hand delay -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.xdel
2019-08-07 13:49:10,547 INFO:  starting gaincal -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.mms.g1cal
2019-08-07 13:52:16,610 INFO: 
 Solve for Q, U from initial gain solution
2019-08-07 13:52:16,748 INFO: (-0.012887761252330349, -0.031960989042421978)
2019-08-07 13:52:16,750 INFO: 
 Starting x-y phase calibration
 -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.mms.xyambcal


Latitude =  -30.7124007766
Found as many as 4 fields.
Can't discern an ALMA bandname from: none
Found as many as 1 spws.
Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
Fld= 0 Spw= 0 Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
(B=none, PA offset=0.0deg) Gx/Gy= 1.00766914578 Q= 0.00511203916839 U= -0.00186378425788 P= 0.00544119804998 X= -10.0156051604
For field id =  0  there are  1 good spws.
Spw mean: Fld= 0 Q= 0.00511203916839 U= -0.00186378425788 (rms= 0.0 0.0 ) P= 0.00544119804998 X= -10.0156051604
Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
Fld= 2 Spw= 0 Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
(B=none, PA offset=0.0deg) Gx/Gy= 0.998190111509 Q= -0.0128877612523 U= -0.0319609890424 P= 0.0344615613498 X= -55.9804868374
For field id =  2  there are  1 goo

2019-08-07 13:54:51,488 INFO: 
 Check for x-y phase ambiguity.
2019-08-07 13:54:51,573 INFO: Model for polarization calibrator S = [1.0
2019-08-07 13:54:51,575 INFO: Fractional polarization = 0.03446


Expected QU =  (-0.012887761252330349, -0.031960989042421978)
Spw = 0: Found QU = [ 0.01388603  0.03897931]
   ...CONVERTING X-Y phase from -13.0793966761 to 166.920603324 deg
Ambiguity resolved (spw mean): Q= -0.0138860289007 U= -0.0389793068171 (rms= 0.0 0.0 ) P= 0.0413788370858 X= -54.8039929247
Returning the following Stokes vector: [1.0, -0.013886028900742531, -0.038979306817054749, 0.0]


2019-08-07 13:55:24,769 INFO: 
 solution for secondary with parang = true
2019-08-07 13:58:01,459 INFO: 
 now re-solve for Q,U from the new gainfile
 -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.gcal
2019-08-07 13:58:01,592 INFO: (-0.0015349984966918217, -0.0056420298192221725)
2019-08-07 13:58:01,593 INFO: starting 'Dflls' polcal -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.pcal


Latitude =  -30.7124007766
Found as many as 4 fields.
Can't discern an ALMA bandname from: none
Found as many as 1 spws.
Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
Fld= 0 Spw= 0 Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
(B=none, PA offset=0.0deg) Gx/Gy= 1.00766914578 Q= 0.00511203916839 U= -0.00186378425788 P= 0.00544119804998 X= -10.0156051604
For field id =  0  there are  1 good spws.
Spw mean: Fld= 0 Q= 0.00511203916839 U= -0.00186378425788 (rms= 0.0 0.0 ) P= 0.00544119804998 X= -10.0156051604
Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
Fld= 2 Spw= 0 Can't discern an ALMA bandname from: none
Unresolved bandname: default band position angle set to 0.0
(B=none, PA offset=0.0deg) Gx/Gy= 0.996432925595 Q= -0.00153499849669 U= -0.00564202981922 P= 0.00584711218174 X= -52.609895528
For field id =  2  there are  1 g

2019-08-07 14:00:56,948 INFO:  starting fluxscale -> /scratch/mightee/MeerKAT-IRIS/caltables/XMMLSS12_1539286252_tiny.fluxscale


In [14]:
run pipelines/processMeerKAT/cal_scripts/xy_yx_apply.py --config myconfig.txt

2019-08-07 14:00:57,679 INFO: applying calibrations: primary calibrator


*** Error *** Error in data selection specification: MSSelectionNullSelection : The selected table has zero rows.

2019-08-07 14:01:34,177 INFO:  applying calibrations: secondary calibrators


*** Error *** Error in data selection specification: MSSelectionNullSelection : The selected table has zero rows.

2019-08-07 14:03:53,424 INFO:  applying calibrations: target fields


*** Error *** Error in data selection specification: MSSelectionNullSelection : The selected table has zero rows.

# Splitting out calibrated data 

Finally, the calibrated data are averaged in time and frequency by the amounts specified in the config file (`timeavg` and `specavg`), and the target(s) and calibrators are split out into separate MMSs (or MSs, if `keepmms = False`) for imaging and further processing.
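As an illustration of what settings such as `timeavg = '8s'` and `specavg` mean for the data, here is a Python 3 and numpy sketch of flag-aware time and channel averaging for one baseline and correlation. It is not CASA's implementation: real averaging also carries data weights and respects scan and field boundaries, and the function name `average_vis` and the equal-weight assumption are hypothetical.

```python
import numpy as np


def average_vis(vis, flags, times, timebin_s, chan_factor):
    """Average visibilities in time and frequency, ignoring flagged samples.

    vis, flags: arrays of shape (ntime, nchan); times: array of times in seconds.
    Integrations are grouped into bins of width timebin_s and channels into
    blocks of chan_factor. Each output cell is the mean of its unflagged
    inputs, and is flagged if all of its inputs were flagged.
    """
    ntime, nchan = vis.shape
    nchan_out = nchan // chan_factor  # leftover channels are dropped
    nkeep = nchan_out * chan_factor
    weights = (~flags[:, :nkeep]).astype(float)  # 1 for good samples, 0 for flagged

    # Sum blocks of chan_factor adjacent channels
    vsum = (vis[:, :nkeep] * weights).reshape(ntime, nchan_out, chan_factor).sum(axis=2)
    wsum = weights.reshape(ntime, nchan_out, chan_factor).sum(axis=2)

    # Assign each integration to a time bin and accumulate
    tbin = np.floor((times - times.min()) / timebin_s).astype(int)
    nbins = tbin.max() + 1
    acc_v = np.zeros((nbins, nchan_out), dtype=vsum.dtype)
    acc_w = np.zeros((nbins, nchan_out))
    np.add.at(acc_v, tbin, vsum)
    np.add.at(acc_w, tbin, wsum)

    out_flags = acc_w == 0
    out = np.zeros_like(acc_v)
    np.divide(acc_v, acc_w, out=out, where=~out_flags)
    return out, out_flags
```

With `specavg = 1`, as in `myconfig.txt`, `chan_factor` is 1 and only the time averaging changes the data.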

In [15]:
run pipelines/processMeerKAT/cal_scripts/split.py --config myconfig.txt

In [16]:
run pipelines/processMeerKAT/cal_scripts/quick_tclean.py --config myconfig.txt

In [17]:
run pipelines/processMeerKAT/cal_scripts/plot_solutions.py --config myconfig.txt



Combining all plots into multi-page PDF "plots/bpass_freq_amp_all.pdf"
Combining all plots into multi-page PDF "plots/bpass_freq_phase_all.pdf"
Combining all plots into multi-page PDF "plots/phasecal_time_amp_all.pdf"
Combining all plots into multi-page PDF "plots/phasecal_time_phase_all.pdf"
