# xradio and MSv4

# Table of contents
+ [Overview](#overview)
+ [xarray](#xarray)
   - [Variable and DataArray](#Variable-and-DataArray)
   - [Dataset](#Dataset)
+ [MSv4 schema](#MSv4-schema)
   - [MSv2 to MSv4 conversion](#MSv2-to-MSv4-conversion)
   - [Inspect a MSv4 dataset](#Inspect-a-MSv4-dataset)
   - [Data selection in MSv4](#Data-selection-in-MSv4)
   - [Plotting with external tools](#Plotting-with-external-tools)

## Overview

[xradio](https://xradio.readthedocs.io/en/latest/) provides a reference implementation of the new MSv4 visibility format, along with schemas for associated data products such as images and calibration tables. This tutorial is based on the xradio [documentation](https://xradio.readthedocs.io/en/latest/).

xradio currently supports two **data types**:
+ Datasets - native labelled multi-dimensional data provided by xarray, and
+ Processing Sets - a custom data structure that holds a collection of xarray Datasets. Processing Sets might be replaced in the future with xarray's native DataTree.

xradio currently supports five types of **schemas**:
+ MSv4 - schema for interferometry and single-dish data
+ images - sky and aperture images
+ tables - calibration tables
+ aperture models - antenna dish models using Zernike polynomials
+ component lists - sky model representation

## xarray

Since xradio is built on top of xarray, this section gives a brief overview of the most important xarray concepts. See the [xarray documentation](https://docs.xarray.dev/en/latest/user-guide/terminology.html) for more information.



### Variable and DataArray

An xarray Variable is a low-level xarray class that contains dimensions, data, and additional attributes that describe a single array. 

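A DataArray is a multi-dimensional array with labelled dimensions. It adds metadata such as dimension names, coordinates, and attributes to an underlying unlabelled data structure (such as a numpy or dask array). Each DataArray has an underlying Variable that can be accessed as `DataArray.variable`.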

In [1]:
import xarray as xr
import numpy as np

# A 3D numpy array to hold temperature measured somewhere
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)

# Create a DataArray
temp = xr.DataArray(data=temperature)
temp

In the code snippet, all we have done is encapsulate an unlabelled 3D numpy array inside a DataArray. 
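As a quick check of what the wrapper adds, the sketch below rebuilds the same array and inspects the underlying `Variable` together with the default dimension names that xarray assigns when none are given. It is a minimal illustration and regenerates the data so that it runs on its own.

```python
import numpy as np
import xarray as xr

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
temp = xr.DataArray(data=temperature)

# The low-level Variable that backs the DataArray
print(type(temp.variable).__name__)

# Without explicit names, xarray falls back to dim_0, dim_1, ...
print(temp.dims)

# The wrapped values are the values of the original numpy array
print(np.array_equal(temp.values, temperature))
```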

Next, let's name the dimensions of this 3D numpy array. Let's say that this 3D array represents a temperature map measured at three different times. So, the dimensions are labelled `x`, `y`, and `time`.

In [2]:
temp = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"]
)
temp

We can now use the dimension labels as shown below.

In [3]:
# Find the average temperature at each location, i.e. the mean along the time axis.
temp.mean("time")
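To make the benefit of named dimensions concrete, the following sketch compares the labelled reduction with its plain numpy counterpart, where the time axis has to be addressed by position (`axis=2`). The array is regenerated here so that the block is self-contained.

```python
import numpy as np
import xarray as xr

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
temp = xr.DataArray(data=temperature, dims=["x", "y", "time"])

by_name = temp.mean("time")             # reduce over the named dimension
by_position = temperature.mean(axis=2)  # same reduction, positional axis

# The time dimension is gone; x and y remain
print(by_name.dims)
print(np.allclose(by_name.values, by_position))
```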

We can do more by adding coordinates to each axis in the data. Think of this as adding WCS headers to a FITS image.

In [4]:
import pandas as pd

lon = [-10, 10]
lat = [0, -5]
time = pd.date_range("2014-09-06", periods=3)

temp = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time
    )
)
temp

We can then do things like find the minimum temperature and the coordinates where this minimum temperature occurred. The example shows that the minimum temperature is `7.18177696` at coordinates `x=-10`, `y=-5`, and `time=2014-09-08`. 

In [5]:
temp.isel(temp.argmin(...))
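The one-liner above packs two steps together. As a sketch of what happens, `argmin(...)` returns one integer index per dimension, and `isel()` then uses those indices to pick the matching element along with its coordinates. The block regenerates the array from the earlier cells so that it runs on its own.

```python
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
temp = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(x=[-10, 10], y=[0, -5], time=pd.date_range("2014-09-06", periods=3)),
)

# Step 1: one index per dimension, returned as a dict
indices = temp.argmin(...)
print({name: int(index) for name, index in indices.items()})

# Step 2: positional selection with those indices
coldest = temp.isel(indices)
print(float(coldest), int(coldest.x), int(coldest.y), coldest.time.values)
```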

xarray makes a distinction between dimension and non-dimension coordinates. The three coordinates we have been working with above are dimension coordinates since they have been assigned to specific dimensions in the 3D data. xarray also allows you to specify non-dimension coordinates, which mainly act as auxiliary labels. For example, we can have a non-dimension coordinate `detector` which keeps track of the instrument that was used to make the temperature measurements. `DataArray.dims` lists the dimension names and `DataArray.coords` lists both dimension and non-dimension coordinates.

In [7]:
lon = [-10, 10]
lat = [0, -5]
time = pd.date_range("2014-09-06", periods=3)

temp = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
        detector="sensor 1"
    )
)
print(temp.dims)
print(temp.coords)
temp

('x', 'y', 'time')
Coordinates:
  * x         (x) int64 16B -10 10
  * y         (y) int64 16B 0 -5
  * time      (time) datetime64[ns] 24B 2014-09-06 2014-09-07 2014-09-08
    detector  <U8 32B 'sensor 1'
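One way to tell the two kinds of coordinates apart programmatically is to check whether a coordinate's name matches a dimension. A minimal sketch, reusing the `detector` example with placeholder data:

```python
import numpy as np
import pandas as pd
import xarray as xr

temp = xr.DataArray(
    data=np.zeros((2, 2, 3)),  # placeholder values; only the labels matter here
    dims=["x", "y", "time"],
    coords=dict(
        x=[-10, 10],
        y=[0, -5],
        time=pd.date_range("2014-09-06", periods=3),
        detector="sensor 1",
    ),
)

# Split the coordinate names by whether they match a dimension name
dimension_coords = [name for name in temp.coords if name in temp.dims]
non_dimension_coords = [name for name in temp.coords if name not in temp.dims]

print(dimension_coords)
print(non_dimension_coords)

# A scalar non-dimension coordinate is not attached to any dimension
print(temp.coords["detector"].dims)
```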


Two DataArray objects can be concatenated using `xarray.concat()`. 

In [8]:
lon = [-10, 10]
lat = [0, -5]

time = pd.date_range("2014-09-06", periods=3)
temp_1 = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
    )
)

# Use the following three days so that the time labels do not repeat after concatenation
time = pd.date_range("2014-09-09", periods=3)
temp_2 = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
    )
)

# Concat the two DataArray objects along the time axis
temp_cat = xr.concat([temp_1, temp_2], dim="time")

temp_cat
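`xarray.concat()` can also stack arrays along a dimension that does not exist yet. The sketch below first joins two arrays along `time`, then stacks two arrays that share a time axis along a new `detector` dimension by passing a `pandas.Index` as `dim`. The helper `make_temp`, the second time range, and the detector labels are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

def make_temp(start):
    """Build a small random temperature cube starting on the given date."""
    return xr.DataArray(
        data=15 + 8 * np.random.randn(2, 2, 3),
        dims=["x", "y", "time"],
        coords=dict(x=[-10, 10], y=[0, -5], time=pd.date_range(start, periods=3)),
    )

# Join along the existing time axis: 3 + 3 time steps
first = make_temp("2014-09-06")
later = make_temp("2014-09-09")
along_time = xr.concat([first, later], dim="time")
print(dict(along_time.sizes))

# Stack along a new dimension: both arrays cover the same three days
other = make_temp("2014-09-06")
detectors = pd.Index(["sensor 1", "sensor 2"], name="detector")
stacked = xr.concat([first, other], dim=detectors)
print(dict(stacked.sizes))
```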

### Dataset

A Dataset is a dict-like collection of DataArray objects with aligned dimensions. This implies that any operation that can be performed on a given DataArray dimension can be performed on the same dimension on the Dataset.

The example below shows how two DataArray objects can be combined into a Dataset. 

In [9]:
lon = [-10, 10]
lat = [0, -5]
time = pd.date_range("2014-09-06", periods=3)

temperature = 15 + 8 * np.random.randn(2, 2, 3)
temp = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
    )
)

precipitation = np.random.randn(2, 2, 3)
precip = xr.DataArray(
    data=precipitation,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
    )
)

# Create a Dataset
temp_ds = xr.Dataset(
    data_vars=dict(
        temperature=temp,
        precipitation=precip
    )
)

temp_ds
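Because the variables in a Dataset share dimensions, a single call can select from or reduce all of them at once. A short sketch, using the same `temperature` and `precipitation` variable names with random placeholder values:

```python
import numpy as np
import pandas as pd
import xarray as xr

coords = dict(x=[-10, 10], y=[0, -5], time=pd.date_range("2014-09-06", periods=3))
dims = ["x", "y", "time"]

temp_ds = xr.Dataset(
    data_vars=dict(
        temperature=(dims, 15 + 8 * np.random.randn(2, 2, 3)),
        precipitation=(dims, np.random.randn(2, 2, 3)),
    ),
    coords=coords,
)

# Dict-like access returns a DataArray
print(type(temp_ds["temperature"]).__name__)

# A reduction over "time" is applied to every data variable
time_mean = temp_ds.mean("time")
print(list(time_mean.data_vars), dict(time_mean.sizes))

# Label-based selection likewise applies to all variables
at_x10 = temp_ds.sel(x=10)
print(dict(at_x10.sizes))
```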

In the example above, both the input DataArray objects have the same coordinate grid. The example below shows two DataArray objects with different coordinate grids combined into a single Dataset. The coordinate grid of the resulting Dataset is a combination of the coordinate grids of the input DataArray objects. Any missing data value in the new coordinate grid is indicated by a NaN. 

In [10]:
lon = [-10, 10]
lat = [0, -5]
time = pd.date_range("2014-09-06", periods=3)

temperature = 15 + 8 * np.random.randn(2, 2, 3)
temp = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
    )
)

lon = [20, 30]
lat = [0, -5]

precipitation = np.random.randn(2, 2, 3)
precip = xr.DataArray(
    data=precipitation,
    dims=["x", "y", "time"],
    coords=dict(
        x=lon,
        y=lat,
        time=time,
    )
)

# Create a Dataset
temp_ds = xr.Dataset(
    data_vars=dict(
        temperature=temp,
        precipitation=precip
    )
)

temp_ds
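To see the effect of combining mismatched grids, the sketch below rebuilds the two arrays and checks where the NaN padding ends up. Following the description above, the expectation is that `x` becomes the union of both longitude lists and that each variable is NaN wherever only the other one has data.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2014-09-06", periods=3)
dims = ["x", "y", "time"]

temp = xr.DataArray(
    15 + 8 * np.random.randn(2, 2, 3),
    dims=dims,
    coords=dict(x=[-10, 10], y=[0, -5], time=time),
)
precip = xr.DataArray(
    np.random.randn(2, 2, 3),
    dims=dims,
    coords=dict(x=[20, 30], y=[0, -5], time=time),
)

combined = xr.Dataset(data_vars=dict(temperature=temp, precipitation=precip))

# The x coordinate now covers both grids
print(sorted(combined.x.values.tolist()))

# Count the padded (NaN) entries per variable
print(int(combined.temperature.isnull().sum()))
print(int(combined.precipitation.isnull().sum()))
```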

## MSv4 schema

### MSv2 to MSv4 conversion

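The function `convert_msv2_to_processing_set()` converts a correlated dataset from MSv2 format to MSv4. The cells below import it from `xradio.vis`, as in the xradio version used to run this notebook; the xradio documentation refers to it as `xradio.measurement_set.convert_msv2_to_processing_set()`.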

First, download a small MeerKAT dataset that comes with xradio for testing purposes. 

In [11]:
import toolviper
toolviper.utils.data.download("small_meerkat.ms")

[2024-11-19 20:27:36,019]     INFO    viperlog:  Updating file metadata information ...
[2024-11-19 20:27:38,062]     INFO    viperlog:  File exists: small_meerkat.ms


In [12]:
from xradio.vis import convert_msv2_to_processing_set
msv2_name = "small_meerkat.ms"
msv4_name = "small_meerkat.zarr"

convert_msv2_to_processing_set(
    in_file=msv2_name,
    out_file=msv4_name,
    overwrite=True
)

[2024-11-19 20:27:39,350]     INFO    viperlog:  Partition scheme that will be used: ['DATA_DESC_ID', 'OBS_MODE', 'OBSERVATION_ID', 'FIELD_ID']
[2024-11-19 20:27:39,387]     INFO    viperlog:  Number of partitions: 3
[2024-11-19 20:27:39,388]     INFO    viperlog:  OBSERVATION_ID [0], DDI [0], STATE [1], FIELD [0], SCAN [1]
[2024-11-19 20:27:40,604]     INFO    viperlog:  OBSERVATION_ID [0], DDI [0], STATE [2], FIELD [1], SCAN [2 4 6]
[2024-11-19 20:27:41,974]     INFO    viperlog:  OBSERVATION_ID [0], DDI [0], STATE [3], FIELD [2], SCAN [3 5]
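This tutorial mentions two namespaces for the converter, `xradio.measurement_set` in the text and `xradio.vis` in the import. Which of them is available presumably depends on the installed xradio version; that is an assumption, not something confirmed here. A defensive sketch that tries both and returns `None` if neither can be imported:

```python
import importlib

def find_converter(name="convert_msv2_to_processing_set"):
    """Return the converter from whichever xradio namespace provides it.

    The two module paths are the ones that appear in this tutorial;
    find_converter itself is a hypothetical helper, not part of xradio.
    """
    for module_name in ("xradio.measurement_set", "xradio.vis"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # namespace (or xradio itself) is not installed
        converter = getattr(module, name, None)
        if callable(converter):
            return converter
    return None

converter = find_converter()
print("converter found" if converter is not None else "xradio converter not available")
```

If a converter is found, it can be called with the same `in_file`, `out_file`, and `overwrite` arguments as in the cell above.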


### Inspect a MSv4 dataset

The function `read_processing_set()` lazily loads an MSv4 dataset. In xradio parlance, a collection of one or more MSv4 datasets is referred to as a processing set, so the visibility data is represented by a `ProcessingSet` object.

Remember that `read_processing_set()` only does a lazy load. If you want the entire processing set in memory, use `load_processing_set()`. 

In [13]:
from xradio.vis import read_processing_set
vis_data = read_processing_set(msv4_name)
type(vis_data)

xradio.vis._processing_set.processing_set

`ProcessingSet.summary()` is roughly the equivalent of CASA's `listobs` or miriad's `prthd` task. It summarizes the contents of the MSv4 dataset:

In [14]:
vis_data.summary()

| | name | obs_mode | shape | polarization | scan_number | spw_name | field_name | source_name | line_name | field_coords | start_frequency | end_frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | small_meerkat_0 | [CALIBRATE_BANDPASS, CALIBRATE_FLUX] | (74, 6, 50, 4) | [XX, XY, YX, YY] | [1] | spw_0 | [J1939-6342_0] | [J1939-6342_0] | [] | [fk5, 19h39m25.03s, -63d42m45.60s] | 3265869000.0 | 3276337000.0 |
| 0 | small_meerkat_1 | [CALIBRATE_PHASE, CALIBRATE_AMPLI] | (42, 6, 50, 4) | [XX, XY, YX, YY] | [2, 4, 6] | spw_0 | [J1619-8418_1] | [J1619-8418_1] | [] | [fk5, 16h19m33.97s, -84d18m19.10s] | 3265869000.0 | 3276337000.0 |
| 2 | small_meerkat_2 | [TARGET] | (223, 6, 50, 4) | [XX, XY, YX, YY] | [3, 5] | spw_0 | [J0358-8103_2] | [J0358-8103_2] | [] | [fk5, 3h58m31.50s, -81d03m45.70s] | 3265869000.0 | 3276337000.0 |


As you can see from the above output, the original MSv2 dataset has been split into 3 partitions in the new MSv4 dataset. `len()` returns the number of partitions.

In [15]:
len(vis_data)

3

`ProcessingSet.keys()` lists the names of each partition.

In [16]:
vis_data.keys()

dict_keys(['small_meerkat_1', 'small_meerkat_0', 'small_meerkat_2'])
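Since a processing set exposes `len()` and `keys()` like a dictionary, a plain `dict` of xarray Datasets is a reasonable stand-in for experimenting with the access pattern. The sketch below uses synthetic partitions whose names and shapes are copied from the summary above; the dimension names and the zero-filled `VISIBILITY` arrays are placeholders, not output from xradio.

```python
import numpy as np
import xarray as xr

# Partition names and shapes taken from the summary table;
# everything else in this block is synthetic.
shapes = {
    "small_meerkat_0": (74, 6, 50, 4),
    "small_meerkat_1": (42, 6, 50, 4),
    "small_meerkat_2": (223, 6, 50, 4),
}
# Assumed dimension names for illustration
dims = ("time", "baseline_id", "frequency", "polarization")

mock_ps = {
    name: xr.Dataset({"VISIBILITY": (dims, np.zeros(shape, dtype=np.complex64))})
    for name, shape in shapes.items()
}

print(len(mock_ps))

# Loop over the partitions and report the number of time samples in each
for name, partition in mock_ps.items():
    print(name, partition.sizes["time"])

# Pick the partition with the most time samples
largest = max(mock_ps, key=lambda name: mock_ps[name].sizes["time"])
print(largest)
```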

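Each partition is an xarray Dataset. In the xradio version used to run this notebook, `type()` reports a plain `xarray.core.dataset.Dataset`; the xradio documentation names the partition class `xradio.measurement_set.measurement_set_xds.MeasurementSetXds`.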

In [17]:
type(vis_data["small_meerkat_0"])

xarray.core.dataset.Dataset

Inspecting the content of a partition produces something quite similar to the structure of the `xarray.Dataset` we saw above.

In [18]:
vis_data["small_meerkat_0"]

Output (summary of the dask-backed arrays in the Dataset repr; each array is stored as 1 chunk in 2 graph layers, so the chunk size equals the array size):

| Bytes | Shape | Data type |
|---|---|---|
| 96 B | (6,) | |
| 96 B | (6,) | |
| 592 B | (74,) | int64 numpy.ndarray |
| 3.47 kiB | (74, 6) | float64 numpy.ndarray |
| 86.72 kiB | (74, 6, 50, 4) | bool numpy.ndarray |
| 3.47 kiB | (74, 6) | float64 numpy.ndarray |
| 10.41 kiB | (74, 6, 3) | float64 numpy.ndarray |
| 693.75 kiB | (74, 6, 50, 4) | complex64 numpy.ndarray |
| 346.88 kiB | (74, 6, 50, 4) | float32 numpy.ndarray |


It might be useful to compare the above output with this schema layout (taken from the xradio documentation), which notes: "Optional datasets are indicated by round brackets. Data variables are capitalized. The suffix ‘_xds’ denotes an xarray dataset, while ‘_info’ indicates dictionaries."

<center><img src="MSv4_Schema_Overview.png" width=400 /></center>

We can recognize a lot of familiar names in the output above. For example, what used to be the ANTENNA subtable in MSv2 has now become the `antenna_xds` attribute in MSv4. 

In [19]:
vis_data["small_meerkat_0"].antenna_xds

Output (summary of the dask-backed arrays in the `antenna_xds` repr; each array is stored as 1 chunk in 2 graph layers, so the chunk size equals the array size):

| Bytes | Shape | Data type |
|---|---|---|
| 72 B | (3,) | |
| 24 B | (3, 2) | |
| 48 B | (3,) | |
| 24 B | (3,) | float64 numpy.ndarray |
| 72 B | (3, 3) | float64 numpy.ndarray |
| 72 B | (3, 3) | float64 numpy.ndarray |
| 96 B | (3, 2, 2) | float64 numpy.ndarray |
| 48 B | (3, 2) | float64 numpy.ndarray |


In the same way, you can access other useful information as described below:

+ `vis_data["small_meerkat_0"].frequency` - Channel frequencies in the visibility data
+ `vis_data["small_meerkat_0"].VISIBILITY` - What used to be the visibility column in MSv2
+ `vis_data["small_meerkat_0"].FLAG` - What used to be the flag column in MSv2. MSv4 has gotten rid of MSv2's FLAG_ROW column.
+ `vis_data["small_meerkat_0"].UVW` - What used to be the uvw column in MSv2. 
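A common first step with these variables is to mask flagged visibilities. The sketch below does this with `DataArray.where()` on a small synthetic Dataset that mimics the `VISIBILITY` and `FLAG` variables. The dimension names other than `frequency`, the array sizes, and the flagged channel are assumptions made for illustration, so treat it as a pattern rather than as xradio's own API.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Assumed dimension names and a deliberately tiny shape
dims = ("time", "baseline_id", "frequency", "polarization")
shape = (4, 3, 5, 2)

vis = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
flag = np.zeros(shape, dtype=bool)
flag[:, :, 2, :] = True  # pretend that channel 2 is bad everywhere

mock_xds = xr.Dataset(
    data_vars=dict(VISIBILITY=(dims, vis), FLAG=(dims, flag)),
    coords=dict(frequency=np.linspace(3.2659e9, 3.2667e9, shape[2])),
)

# Keep unflagged samples; flagged ones become NaN
unflagged = mock_xds.VISIBILITY.where(~mock_xds.FLAG)

# Mean amplitude per channel, skipping the NaNs left by the mask
mean_amp = np.abs(unflagged).mean(dim=("time", "baseline_id", "polarization"))
print(mean_amp.values)
```

The fully flagged channel comes out as NaN in `mean_amp`, which makes it easy to spot in later plots or statistics.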

### Data selection in MSv4

From the previous section, we see that the MeerKAT test dataset contains channels in the 3.265869e+09 - 3.276337e+09 Hz frequency range. The `.sel()` and `.isel()` methods provide a flexible way to filter data: `.sel()` selects by coordinate value, while `.isel()` selects by integer position. For example, we can create a new visibility dataset containing channels between 3265 and 3266 MHz as follows:

In [20]:
selected_data = vis_data["small_meerkat_0"].sel(frequency=slice(3.265e9, 3.266e9))
selected_data.frequency

In the same way, we can select the first 2 channels as shown below:

In [21]:
selected_data = vis_data["small_meerkat_0"].isel(frequency=slice(0, 2))
selected_data.frequency
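The difference between the two selectors is easy to miss: `.sel()` works on coordinate values, and a label-based `slice` includes both end points, whereas `.isel()` works on integer positions and follows Python's half-open slicing. The sketch below demonstrates this on a synthetic frequency axis spanning the range quoted above. The 50 evenly spaced channels only approximate the real channelization, the `AMPLITUDE` variable is a placeholder, and `method="nearest"` is shown as a further option for picking a single channel.

```python
import numpy as np
import xarray as xr

# 50 evenly spaced channels across the quoted frequency range (synthetic)
frequency = np.linspace(3.265869e9, 3.276337e9, 50)
mock_xds = xr.Dataset(
    data_vars=dict(AMPLITUDE=("frequency", np.arange(50.0))),  # placeholder data
    coords=dict(frequency=frequency),
)

# Label-based: every channel whose frequency lies between 3.270 and 3.272 GHz
by_label = mock_xds.sel(frequency=slice(3.270e9, 3.272e9))
print(by_label.sizes["frequency"])

# Position-based: channels 0 and 1 (the stop index is excluded)
by_index = mock_xds.isel(frequency=slice(0, 2))
print(by_index.frequency.values)

# Single channel closest to a requested frequency
nearest = mock_xds.sel(frequency=3.27e9, method="nearest")
print(float(nearest.frequency))
```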

### Plotting with external tools