# MeerKAT Data

MeerKAT observations range in size from a few GB to several TB. To provide access to such data sets, [katdal](https://github.com/ska-sa/katdal) offers a Python interface. The data products are available in the *MeerKAT Visibility Format (MVF)*. [katdal](https://github.com/ska-sa/katdal) additionally ships a script that converts *MVF* data sets to CASA MeasurementSet (*.ms*) files.

In [None]:
#!pip install katdal
import katdal
import time, yaml

## Access to Data

MeerKAT observation files are available through the [MeerKAT archive](https://archive.sarao.ac.za/). Access requires registration and login. A detailed description is given in the [Archive Interface User Guide](https://archive.sarao.ac.za/statics/Archive_Interface_User_Guide.pdf).

To get access through [katdal](https://github.com/ska-sa/katdal), generate an *rdb-link with token* on the [MeerKAT archive](https://archive.sarao.ac.za/) and pass the token to `katdal.open` via its `token` argument. The URL of a given observation should not need updating, since it is not expected to change; only the token has to be renewed.
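
The link copied from the archive usually carries the token in its query string. The following is a minimal sketch of separating the two parts, assuming the token is supplied as a `?token=...` query parameter (an assumption about the link layout, not something the archive guarantees). The helper name `split_rdb_link` and the placeholder token are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_rdb_link(link):
    """Split an archive link into (url_without_query, token_or_None).

    Assumes the token is passed as a `token` query parameter.
    """
    parts = urlsplit(link)
    # parse_qs maps each query key to a list of values
    link_token = parse_qs(parts.query).get('token', [None])[0]
    # Rebuild the URL without the query string or fragment
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    return url, link_token


# Placeholder token for illustration only, not a real credential
link = ('https://archive-gw-1.kat.ac.za/1622934371/'
        '1622934371_sdp_l0.full.rdb?token=PLACEHOLDER')
url, link_token = split_rdb_link(link)
print(url)
```

The extracted token can then be stored outside the notebook (for example in a config file) so that it does not end up in version control.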

**Note: all tokens have expiry dates**
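
To see when a token expires before starting a long job, its payload can be inspected locally. This is a rough sketch that assumes the token is a JSON Web Token (JWT) with a standard `exp` claim; that is an assumption about the archive's token format. The helper `token_expiry` is hypothetical, and it only reads the payload without verifying the signature.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token):
    """Return the expiry time (UTC) of a JWT-style token, or None if unavailable.

    Assumes three dot-separated base64url segments with an `exp` claim
    in the second one. The signature is NOT verified.
    """
    try:
        payload_b64 = token.split('.')[1]
        # JWT segments omit base64 padding; restore it before decoding
        payload_b64 += '=' * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        exp = claims.get('exp')
    except (IndexError, ValueError, AttributeError):
        return None
    if not isinstance(exp, (int, float)):
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)


# Usage (with the real token loaded from the config file):
# print('token expires at', token_expiry(token))
```

If the function returns `None`, the token does not follow the assumed layout and the expiry date has to be looked up on the archive instead.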

In [None]:
with open('../config.yaml', 'r') as stream:
    token = yaml.safe_load(stream)['MeerKAT_archive_token']

stime = time.time()
data = katdal.open('https://archive-gw-1.kat.ac.za/1622934371/1622934371_sdp_l0.full.rdb',
                   s3_endpoint_url='https://archive-gw-1.kat.ac.za',
                   token=token)
print('time to read file = {} s'.format(time.time() - stime))
print('(dumps x channels x correlation products) = {}'.format(data.shape))
print(data.vis.dataset)

## Observation Details

In [None]:
print(data)

`data.shape[0]`: dumps

`data.shape[1]`: channels

`data.shape[2]`: correlation products

In [None]:
data.shape
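
Because observations can reach the TB range, it helps to estimate the memory footprint of the full visibility array from its shape before loading anything. The sketch below assumes 8 bytes per sample (complex64 visibilities); this is an assumption that should be checked against the actual data type. The function name and the example shape are hypothetical.

```python
import math


def estimate_vis_bytes(shape, bytes_per_sample=8):
    """Estimate the memory needed to hold a full visibility array.

    bytes_per_sample=8 assumes complex64 samples (an assumption).
    """
    return math.prod(shape) * bytes_per_sample


# Hypothetical (dumps, channels, correlation products) shape for illustration
example_shape = (1000, 4096, 8128)
print('{:.1f} GiB'.format(estimate_vis_bytes(example_shape) / 2**30))

# With an opened data set, pass its real shape instead:
# print('{:.1f} GiB'.format(estimate_vis_bytes(data.shape) / 2**30))
```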

## Use the Data

Note the `[:]` indexing, as the `vis` and `timestamps` properties are special `LazyIndexer` objects that only give you the actual data when you use indexing, in order not to inadvertently load the entire array into memory.

With no selection applied, the `vis` array of the example data set has the shape `data.shape`. The time dimension is labelled by `data.timestamps`, the frequency dimension by `data.channel_freqs` and the correlation product dimension by `data.corr_products`.
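
Since indexing is what triggers the actual read, a small slice along the time axis is a cheap way to inspect the data without pulling the whole array. The helper below is a minimal sketch; `load_first_dumps` is a hypothetical name, and it relies only on the `vis` and `timestamps` properties described above.

```python
def load_first_dumps(dataset, n_dumps=10):
    """Load only the first n_dumps time samples instead of the whole array."""
    # Indexing triggers the read; a short slice keeps memory use modest
    vis = dataset.vis[:n_dumps]
    timestamps = dataset.timestamps[:n_dumps]
    return vis, timestamps


# Usage with the `data` object opened earlier (needs archive access):
# vis, timestamps = load_first_dumps(data, n_dumps=10)
# print(vis.shape, len(data.channel_freqs), len(data.corr_products))
```

Slicing the first axis keeps all channels and correlation products for the selected dumps, so the result still has three dimensions.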

In [None]:
# vis = data.vis[:] # loads the actual visibilities to `vis` as numpy.ndarray