# MeerKAT Data

MeerKAT observations range in size from a few GB to several TB. [katdal](https://github.com/ska-sa/katdal) provides an interface for accessing such data sets. The data products are available in the *MeerKAT Visibility Format (MVF)*. [katdal](https://github.com/ska-sa/katdal) additionally offers a script that converts *MVF* data sets to CASA measurement sets (*.ms*).

For further details on using [katdal](https://github.com/ska-sa/katdal), see its [documentation](https://katdal.readthedocs.io/en/latest/index.html). In addition, the [MeerKAT-Cookbook](https://github.com/ska-sa/MeerKAT-Cookbook) provides examples of different [katdal](https://github.com/ska-sa/katdal) use cases.

In [None]:
#!pip install katdal
import katdal
import time, yaml

The observation we used to run this Notebook is:

- Observer: Operator
- Experiment ID: 20210605-0009
- Description: 'Meertime phase up with flatten bandpass'
- Observed from 2021-06-06 01:06:13.567 CEST to 2021-06-06 01:13:09.646 CEST
- Dump rate / period: 0.12498 Hz / 8.002 s

If you want to run the same configuration, please insert the corresponding token in `config.yaml` under the key `MeerKAT_archive_token`.

# Data Access

MeerKAT observation files are available through the [MeerKAT archive](https://archive.sarao.ac.za/). Access requires registration and login. A detailed description is available in the [Archive Interface User Guide](https://archive.sarao.ac.za/statics/Archive_Interface_User_Guide.pdf).

## katdal Visibility Data V4

This section provides a rough overview of how to access the data directly using `katdal.visdatav4.VisibilityDataV4`. If you are only interested in how to get a CASA measurement set, you can skip this section and continue with the chapter [Convert to CASA Measurement Set](#1).

The big advantage of using [katdal](https://github.com/ska-sa/katdal) is that you can quickly get an overview of the metadata, as well as a subset of the full data, without first having to download the entire data set. [katdal](https://github.com/ska-sa/katdal) provides lazy data objects that load only the metadata of the data set. Various filter functions can then be applied to them, which limits the amount of data that is actually downloaded.

To get access through [katdal](https://github.com/ska-sa/katdal) you have to generate an *rdb-link with token* on the [MeerKAT archive](https://archive.sarao.ac.za/). The *rdb-link* is defined as:

*https<span>://archive-gw-1.kat.ac.za/\<captureblockID>/\<captureblockID>_sdp_l0.full.rdb?token=\<tokenString>*

Pass the URL without the `?token=...` part as the first argument of `katdal.open`, and pass \<tokenString> as its `token` argument.

**Note: all tokens have expiry dates**
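
The split described above can be done with the Python standard library. The following is a minimal sketch, assuming the *rdb-link* has exactly the form shown above; the helper name `split_rdb_link` and the token value `EXAMPLE_TOKEN` are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_rdb_link(rdb_link):
    """Split an rdb-link into (url_without_token, token)."""
    parts = urlsplit(rdb_link)
    # parse_qs returns a list per key; take the first token if present.
    token = parse_qs(parts.query).get('token', [None])[0]
    # Rebuild the URL without query string and fragment.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    return base_url, token


# Placeholder link with a made-up token, for illustration only.
example_link = ('https://archive-gw-1.kat.ac.za/1622934371/'
                '1622934371_sdp_l0.full.rdb?token=EXAMPLE_TOKEN')
example_url, example_token = split_rdb_link(example_link)
print(example_url)
print(example_token)
```

The two returned values correspond to the first argument and the `token` argument of `katdal.open`.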

In [None]:
with open('../config.yaml', 'r') as stream:
    token = yaml.safe_load(stream)['MeerKAT_archive_token']

stime = time.time()
data = katdal.open('https://archive-gw-1.kat.ac.za/1622934371/1622934371_sdp_l0.full.rdb',
                   s3_endpoint_url='https://archive-gw-1.kat.ac.za',
                   token=token)
print('time to read file = {} s'.format(time.time() - stime))
print('(dumps x channels x baselines) = {}'.format(data.shape))
print(data.vis.dataset)

### Observation Details

In [None]:
print(data)

data.shape\[0\]: dumps

data.shape\[1\]: channels

data.shape\[2\]: correlation products

In [None]:
data.shape

Estimated memory of the full visibilities in bytes: dumps \* channels \* correlation_products \* 64 (bits per complex64 sample) / 8

In [None]:
vis_memory = int(data.shape[0]*data.shape[1]*data.shape[2]*64/8)
vis_memory
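
A raw byte count is hard to read for large observations. The sketch below wraps the same estimate in a function and formats the result; the helper names and the example shape `(10, 4096, 100)` are hypothetical and not taken from the observation above.

```python
def estimate_vis_bytes(shape, bytes_per_sample=8):
    """Estimate memory for complex64 visibilities (8 bytes per sample)."""
    n_dumps, n_channels, n_corrprods = shape
    return n_dumps * n_channels * n_corrprods * bytes_per_sample


def format_bytes(n_bytes):
    """Format a byte count with binary units (KiB, MiB, ...)."""
    size = float(n_bytes)
    for unit in ('B', 'KiB', 'MiB', 'GiB'):
        if size < 1024:
            return '{:.2f} {}'.format(size, unit)
        size /= 1024
    return '{:.2f} TiB'.format(size)


# Hypothetical shape, for illustration only.
example_shape = (10, 4096, 100)
example_bytes = estimate_vis_bytes(example_shape)
print(format_bytes(example_bytes))
```

With an open dataset, `format_bytes(estimate_vis_bytes(data.shape))` gives the same estimate as `vis_memory` in readable form.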

### Filtering & Access the Data

[katdal](https://github.com/ska-sa/katdal) provides filtering through `data.select`. This pre-filters the data so that not all of it has to be loaded when `data.vis` is accessed. Indexing `data.vis` then returns a `numpy.ndarray`, on which you can continue working as usual.

The following cells show a small example of data access. Many parameter settings can be derived from the output of `print(data)`.

select `timerange` from **Scans**

select `targets` from **Targets**

In [None]:
data.select(timerange=('2021-06-05 23:11:53', '2021-06-05 23:13:05'), targets=[0])

select `spw` from **Spectral Windows**

select `subarray` from **Subarrays**

In [None]:
data.select(spw=0, subarray=0)

select `ants` from **ants** or **Antennas**

select `pol` (polarisation) from {'H', 'V', 'HH', 'VV', 'HV', 'VH'}

select `scans` from **Scans ScanState**

select `freqrange` as a `(start, end)` pair of frequencies in Hz

In [None]:
data.select(ants='m000,m001,m002,m003,m005,m006,m007,m008,m009',
            pol='H', scans=(0,1,2), freqrange=(1700e6, 1800e6))

[katdal](https://github.com/ska-sa/katdal) provides more filter functionality. The code can be found [here](https://github.com/ska-sa/katdal/blob/master/katdal/dataset.py) in `DataSet`.
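
One detail is worth checking before chaining several `data.select` calls. As far as the katdal documentation describes it, `select` accepts a `reset` keyword whose default (`'auto'`) clears earlier selections on every dimension that the new call touches, so a later time-based criterion such as `scans` could replace an earlier `timerange`. Please verify this against your installed version. A cautious sketch is to collect the criteria in one dictionary and apply them in a single call. The helper `apply_selection` and the stand-in class are hypothetical; the keyword values are the ones used in this notebook, with `scans` left out to avoid combining it with `timerange`.

```python
# Selection criteria from this notebook, gathered in one place.
selection = {
    'timerange': ('2021-06-05 23:11:53', '2021-06-05 23:13:05'),
    'targets': [0],
    'spw': 0,
    'subarray': 0,
    'ants': 'm000,m001,m002,m003,m005,m006,m007,m008,m009',
    'pol': 'H',
    'freqrange': (1700e6, 1800e6),
}


def apply_selection(dataset, criteria):
    """Apply all criteria in one select() call and return the new shape."""
    dataset.select(**criteria)
    return dataset.shape


class _FakeDataSet:
    """Stand-in so the sketch runs without archive access (not katdal)."""
    shape = (0, 0, 0)

    def select(self, **kwargs):
        # Only record the keywords; a real dataset would update its masks.
        self.last_selection = kwargs


fake = _FakeDataSet()
print(apply_selection(fake, selection))
# With a real dataset: apply_selection(data, selection)
```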

Note the `[:]` indexing, as the `vis` and `timestamps` properties are special `LazyIndexer` objects that only give you the actual data when you use indexing, in order not to inadvertently load the entire array into memory.

For the example dataset with no selection applied, the `vis` array has the shape `data.shape`. The time dimension is labelled by `data.timestamps`, the frequency dimension by `data.channel_freqs` and the correlation product dimension by `data.corr_products`.

In [None]:
#vis = data.vis[:] # loads the actual visibilities to `vis` as numpy.ndarray
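
If even the selected visibilities are too large to load at once, they can be read in blocks along the dump axis. The following is a minimal sketch of that idea; the helper `iter_dump_slices` and the chunk size of 16 are assumptions chosen for illustration, and the katdal part is left as comments because it needs archive access.

```python
def iter_dump_slices(n_dumps, chunk_size):
    """Yield slice objects that cover range(n_dumps) in blocks of chunk_size."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be at least 1')
    for start in range(0, n_dumps, chunk_size):
        yield slice(start, min(start + chunk_size, n_dumps))


print(list(iter_dump_slices(5, 2)))

# With an open dataset, each block could be loaded and processed in turn:
# for dump_slice in iter_dump_slices(data.shape[0], 16):
#     vis_block = data.vis[dump_slice, :, :]  # loads only these dumps
```

Each iteration then holds only one block in memory, at the cost of one archive request per block.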

## Convert to CASA Measurement Set <a class="anchor" id="1"></a>

The conversion from *MVF* to *.ms* files is done by `mvftoms.py`. Once you have installed [katdal](https://github.com/ska-sa/katdal) using pip, `mvftoms.py` is available in the `bin` directory of your environment, so you should be able to call it directly from your shell.

The datasets may be local filenames or archive URLs (*rdb-link* including access tokens). If there are multiple datasets they will be concatenated via katdal before conversion.

The conversion has the following form:

- mvftoms.py \[options\] \<dataset> \[\<dataset2>\]*

An example of this is:

- mvftoms.py -o myms https<span>://archive-gw-1.kat.ac.za/\<captureblockID>/\<captureblockID>_sdp_l0.full.rdb?token=\<tokenString>
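
The same call can be prepared from Python, for example inside this notebook. The sketch below only builds and prints the command; the helper `build_mvftoms_command` is hypothetical, and the link keeps the placeholders from the example above. Printing with `shlex.quote` also shows that the link should be quoted on the shell, since it contains a `?`.

```python
import shlex


def build_mvftoms_command(rdb_link, output_ms):
    """Build the argument list for: mvftoms.py -o <output_ms> <rdb_link>."""
    return ['mvftoms.py', '-o', output_ms, rdb_link]


# Placeholders as in the example above; replace them with real values.
example_cmd = build_mvftoms_command(
    'https://archive-gw-1.kat.ac.za/<captureblockID>/'
    '<captureblockID>_sdp_l0.full.rdb?token=<tokenString>',
    'myms')
print(' '.join(shlex.quote(arg) for arg in example_cmd))

# To actually run the conversion (requires katdal and a valid token):
# import subprocess
# subprocess.run(example_cmd, check=True)
```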
    
Using \[options\] you can pre-configure the measurement set. An overview of the available options can be found in the [MeerKAT-Cookbook](https://github.com/ska-sa/MeerKAT-Cookbook/blob/master/archive/Convert%20MVF%20dataset(s)%20to%20MeasurementSet.ipynb) or directly in [mvftoms.py](https://github.com/ska-sa/katdal/blob/master/scripts/mvftoms.py). If you just want the full measurement set, you can download it directly using the *download-link* instead of the *rdb-link* on the [MeerKAT archive](https://archive.sarao.ac.za/).
    
According to [katdal issue 218](https://github.com/ska-sa/katdal/issues/218), katdal unfortunately does not provide a way to pre-select on `katdal.VisibilityDataV4` (`data`) and then create a measurement set from that selection. However, it may be possible if you add the `data.select` call directly to your own copy of [mvftoms.py](https://github.com/ska-sa/katdal/blob/master/scripts/mvftoms.py).