## Walk through H5 files

Isaac Sihlangu<br>
Email: isihlangu@ska.ac.za

In [2]:
# Importing useful packages
import katdal  # MeerKAT data access library, used to read the H5 files
import h5py

### We use katdal, the MeerKAT data access library, to open the H5 files produced by the MeerKAT telescope.

In [3]:
# Opening the H5 file to obtain the data set object
h5 = katdal.open('/var/kat/archive2/data/MeerKATAR1/telescope_products/2018/02/18/1518941264.h5')

### The contents of the file can be inspected by printing the object.

In [4]:
print(h5)

Name: /var/kat/archive2/data/MeerKATAR1/telescope_products/2018/02/18/1518941264.h5 (version 3.0)
Observer: sarah  Experiment ID: 20180218-0003
Description: 'MKAIV-405 Generic AR1 phaseup'
Observed from 2018-02-18 10:07:45.570 SAST to 2018-02-18 10:11:21.479 SAST
Dump rate / period: 0.12505 Hz / 7.997 s
Subarrays: 1
  ID  Antennas                            Inputs  Corrprods
   0  m000,m002,m003,m006,m007,m008,m011,m012,m013,m019,m022,m023,m027,m029,m032,m034  32      544
Spectral Windows: 1
  ID Band Product  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
   0 L    bc856M4k   1284.000         856.000           4096       208.984
-------------------------------------------------------------------------------
Data selected according to the following criteria:
  subarray=0
  ants=['m019', 'm008', 'm003', 'm002', 'm012', 'm013', 'm007', 'm006', 'm029', 'm023', 'm022', 'm032', 'm027', 'm011', 'm000', 'm034']
  spw=0
------------------------------------------------------------

The first segment of the printout displays the static information of the data set, including observer, dump rate and all the available subarrays and spectral windows in the data set. The second segment (between the dashed lines) highlights the active selection criteria. The last segment displays dynamic information that is influenced by the selection, including the overall visibility array shape, antennas, channel frequencies, targets and scan info.



The data set is built around the concept of a three-dimensional visibility array with dimensions of time, frequency and correlation product. This is reflected in the shape of the dataset:



In [5]:
print(h5.shape)

(27, 4096, 544)


This means the data set holds 27 dumps by 4096 frequency channels by 544 correlation products.
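As a quick illustration of the three dimensions, the shape tuple can be unpacked and the total number of visibilities counted. The 8 bytes per visibility used for the size estimate is an assumption (one single-precision complex value each); flags and weights are not counted.

```python
# Shape reported by h5.shape for this file: (time, frequency, correlation product)
shape = (27, 4096, 544)
n_dumps, n_chans, n_corrprods = shape

# Total number of complex visibilities in the unselected data set
n_vis = n_dumps * n_chans * n_corrprods

# Assumption: 8 bytes per visibility (one complex64 value);
# flags and weights are not included in this estimate.
bytes_per_vis = 8

print(n_dumps, n_chans, n_corrprods)  # 27 4096 544
print(n_vis)                          # 60162048
print(n_vis * bytes_per_vis / 1e6, "MB")
```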

#### Dumps
The dump period, as read from the printout above, is approximately 8 s (a dump rate of about 0.125 Hz), which means one visibility dump is recorded every 8 seconds. The observation ran from 10:07:45.570 to 10:11:21.479 SAST, i.e. approximately 3 minutes and 36 seconds $\approx$ 216 seconds. <br>

So, 
\begin{eqnarray}
\begin{aligned}
\text{Dumps} &= \frac{\text{length of observation}}{\text{dump period}} \\
&= \frac{216\ \text{s}}{8\ \text{s}} \\
&= 27\ \text{dumps}
\end{aligned}
\end{eqnarray}

This gives 27 dumps, which matches the first dimension of the shape. <br >
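The same count can be obtained from the exact start and end times in the printout rather than the rounded 216 s and 8 s. This is a sketch that assumes the printed start and end times bracket all the dumps; the result is rounded to the nearest whole dump.

```python
from datetime import datetime

# Start/end times and dump period copied from the printout above
fmt = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2018-02-18 10:07:45.570", fmt)
end = datetime.strptime("2018-02-18 10:11:21.479", fmt)
dump_period = 7.997  # seconds

# Observation length in seconds (about 215.909 s)
duration = (end - start).total_seconds()

# Number of dumps, rounded to the nearest integer
n_dumps = int(round(duration / dump_period))
print(duration, n_dumps)
```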

#### Frequency Channels

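We are observing with the L-band receiver. The printout above gives a centre frequency of 1284 MHz and a bandwidth of 856 MHz, so the band spans 856 MHz to 1712 MHz. The centre frequencies of the channels run from 856 MHz (first channel) to 1711.791 MHz (last channel), and the channel width is 208.984 kHz. So, <br >

\begin{eqnarray}
\begin{aligned}
freqRange &= endFreq - startFreq \\
          &= 1711.791\ \text{MHz} - 856\ \text{MHz} \\
          &= 855.791\ \text{MHz}
\end{aligned}
\end{eqnarray}

Because both endpoints are channel centres, freqRange covers 4095 channel widths, one fewer than the number of channels, so 1 must be added: <br >

\begin{eqnarray}
\begin{aligned}
Frequency\ channels &= \frac{freqRange}{channel\ width} + 1 \\
                    &= \frac{855791\ \text{kHz}}{208.984\ \text{kHz}} + 1 \\
                    &\approx 4095 + 1 = 4096
\end{aligned}
\end{eqnarray}

Equivalently, dividing the bandwidth by the channel width gives $856\ \text{MHz} / 208.984\ \text{kHz} \approx 4096$. This matches the second dimension of the shape of the H5 data set object.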
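The channel count can be checked numerically by building the channel centre frequencies. This sketch assumes that channel `n_chans // 2` sits on the centre frequency, a convention chosen because it reproduces the 856 MHz and 1711.791 MHz endpoints quoted above; it is not taken from katdal's code.

```python
# Values from the spectral-window line of the printout
centre_freq = 1284.0e6   # Hz
bandwidth = 856.0e6      # Hz
n_chans = 4096
channel_width = bandwidth / n_chans  # 208984.375 Hz

# Assumed convention: channel n_chans // 2 sits on the centre frequency,
# which puts channel 0 on the lower band edge.
freqs = [centre_freq + (i - n_chans // 2) * channel_width for i in range(n_chans)]

first_mhz = freqs[0] / 1e6   # 856.0
last_mhz = freqs[-1] / 1e6   # about 1711.791

# Counting channels from the first and last centre needs the fencepost "+1"
n_from_range = int(round((freqs[-1] - freqs[0]) / channel_width)) + 1
print(first_mhz, last_mhz, n_from_range)
```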

#### Correlation products

The correlation products are divided into: <br >
- Auto-correlation: an antenna's signal correlated with itself.
- Cross-correlation: an antenna's signal correlated with that of another antenna. <br >

The number of baselines (distinct pairs of antennas) is computed as follows: <br >

\begin{eqnarray}
\begin{aligned}
No. \hspace{0.25 cm}baseline = & \frac{N(N-1)}{2}
\end{aligned}
\end{eqnarray}

where $N$ is the number of antennas. In our case we have 16 antennas, so the number of baselines is $16 \times 15 / 2 = 120$. <br >
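The baseline formula can be checked by enumerating every unordered pair of distinct antennas from the list in the printout, using the standard library's `itertools.combinations`.

```python
from itertools import combinations

# Antenna names from the Subarrays line of the printout
ants = ("m000 m002 m003 m006 m007 m008 m011 m012 m013 "
        "m019 m022 m023 m027 m029 m032 m034").split()

# Every unordered pair of distinct antennas is one baseline
baselines = list(combinations(ants, 2))
N = len(ants)

# Agrees with N(N-1)/2 = 16 * 15 / 2
assert len(baselines) == N * (N - 1) // 2
print(N, len(baselines))  # 16 120
```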

The MeerKAT feeds are linearly polarized, i.e. each antenna has a horizontal (H) and a vertical (V) polarization. <br >

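Correlating the two polarizations of one antenna with those of another (or the same) antenna gives 4 polarization products: HH, VV, HV and VH. This holds for both auto-correlations and cross-correlations. For an auto-correlation, HV and VH come from the same antenna and are complex conjugates of each other, so they carry the same information; both are nevertheless counted among the stored products. <br >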

In our example we have 16 antennas, so we shall have <br >

- Auto-correlation <br >
16 $\times$ 4 = 64 products

- Cross-correlation <br >
120 $\times$ 4 = 480 products <br >

Thus the total number of correlation products is 64 + 480 = 544. <br >
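The counting above can be checked by enumerating the products directly. Pairing each antenna with itself and with every other antenna gives $N(N+1)/2 = 136$ antenna pairs, and $136 \times 4 = 544$. The input names (antenna name plus a polarization letter, e.g. `m000h`) are an assumed naming used only for illustration.

```python
from itertools import combinations_with_replacement

# Antenna names from the Subarrays line of the printout
ants = ("m000 m002 m003 m006 m007 m008 m011 m012 m013 "
        "m019 m022 m023 m027 m029 m032 m034").split()
pol_pairs = [("h", "h"), ("h", "v"), ("v", "h"), ("v", "v")]

# Assumed input naming: antenna name + polarization letter, e.g. 'm000h'.
# combinations_with_replacement yields each antenna pair once, including
# every antenna paired with itself (the auto-correlations).
corrprods = [(a + p, b + q)
             for a, b in combinations_with_replacement(ants, 2)
             for p, q in pol_pairs]

# Strip the trailing polarization letter to compare antennas
autos = [cp for cp in corrprods if cp[0][:-1] == cp[1][:-1]]
cross = [cp for cp in corrprods if cp[0][:-1] != cp[1][:-1]]
inputs = sorted({name for cp in corrprods for name in cp})

print(len(autos), len(cross), len(corrprods), len(inputs))  # 64 480 544 32
```

The 32 distinct inputs agree with the Inputs column of the printout (16 antennas × 2 polarizations).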


The attributes of the data set object can be listed with tab completion in IPython: type `h5.` and press Tab. A subset of the data can be selected with the data set's `select` method; for detailed documentation, run `h5.select?` in IPython.
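Outside IPython (for example in a plain script), a similar list of public attribute names can be obtained with Python's built-in `dir()`. The helper name `public_attributes` and the `Example` class are hypothetical stand-ins; with a real data set one would call `public_attributes(h5)`. The list may differ slightly from what IPython's completer offers.

```python
def public_attributes(obj):
    """Return the public (non-underscore) attribute names of obj, via dir()."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Hypothetical stand-in object, used only to demonstrate the helper
class Example(object):
    shape = (27, 4096, 544)

    def select(self, **criteria):
        pass


print(public_attributes(Example()))  # ['select', 'shape']
```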

### Select Method

We can use the `select` method to choose the subset of the data we need, based on time, frequency and correlation products. It applies a set of selection criteria to the data set and updates the data set properties and attributes to match the selection. In other words, `timestamps` and `vis` will return only the selected subset of the data, while attributes such as `ants`, `channel_freqs` and `shape` are updated. The method returns nothing; it modifies the existing data set in place. <br >

The selection criteria are divided into groups, based on whether they
affect the time, frequency or correlation product dimension:

* Time: `dumps`, `timerange`, `scans`, `compscans`, `targets`
* Frequency: `channels`, `freqrange`
* Correlation product: `corrprods`, `ants`, `inputs`, `pol` <br >
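The grouping above can be written as a lookup table that reports which dimensions a given set of selection keywords touches. This is an illustrative helper (the names `CRITERIA_BY_DIMENSION` and `dimensions_affected` are hypothetical, not part of katdal); it only encodes the list above.

```python
# Selection keywords grouped by the dimension they affect (from the list above)
CRITERIA_BY_DIMENSION = {
    "time": {"dumps", "timerange", "scans", "compscans", "targets"},
    "frequency": {"channels", "freqrange"},
    "correlation product": {"corrprods", "ants", "inputs", "pol"},
}


def dimensions_affected(**criteria):
    """Return the set of dimensions touched by the given selection keywords."""
    touched = set()
    for keyword in criteria:
        for dim, keywords in CRITERIA_BY_DIMENSION.items():
            if keyword in keywords:
                touched.add(dim)
    return touched


# The selection applied in this notebook leaves the frequency axis untouched
print(sorted(dimensions_affected(corrprods="cross",
                                 pol=["HH", "VV", "HV", "VH"],
                                 scans="track")))
# ['correlation product', 'time']
```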

To reset: <br >

Calling `h5.select()` without any arguments resets the selection to the original data set.
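The in-place behaviour of `select` and its reset can be illustrated with a small toy class. `ToyDataSet` is hypothetical and is not katdal's implementation: it only mimics the two behaviours described above (the call returns `None` but changes the reported shape, and a call without arguments restores the full data set). As a simplification, every call here starts again from the full data set.

```python
from itertools import combinations_with_replacement


# Hypothetical toy model (NOT katdal): mimics in-place selection and reset only.
class ToyDataSet(object):
    def __init__(self, n_dumps, n_chans, corrprods):
        self._all_dumps = list(range(n_dumps))
        self._n_chans = n_chans
        self._all_corrprods = list(corrprods)
        self.select()  # start from the full, unselected data set

    def select(self, dumps=None, corrprods=None):
        # Simplification: always begin from the full data set
        self._dumps = list(self._all_dumps)
        self._corrprods = list(self._all_corrprods)
        if dumps is not None:
            wanted = set(dumps)
            self._dumps = [d for d in self._dumps if d in wanted]
        if corrprods in ("auto", "cross"):
            want_auto = corrprods == "auto"
            # Antenna name = input name without its trailing polarization letter
            self._corrprods = [(a, b) for a, b in self._corrprods
                               if (a[:-1] == b[:-1]) == want_auto]
        # No return statement: the object itself is modified

    @property
    def shape(self):
        return (len(self._dumps), self._n_chans, len(self._corrprods))


ants = ["m%03d" % i for i in (0, 2, 3, 6, 7, 8, 11, 12, 13, 19, 22, 23, 27, 29, 32, 34)]
corrprods = [(a + p, b + q)
             for a, b in combinations_with_replacement(ants, 2)
             for p, q in (("h", "h"), ("h", "v"), ("v", "h"), ("v", "v"))]

ds = ToyDataSet(27, 4096, corrprods)
print(ds.select(corrprods="cross"))  # None
print(ds.shape)                      # (27, 4096, 480)
ds.select()                          # no arguments: reset
print(ds.shape)                      # (27, 4096, 544)
```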


In the example below we select only the cross-correlation products with HH, VV, HV and VH polarization, and only the tracking scans (scan state 'track').

In [6]:
# Applying the selection criteria
h5.select(corrprods='cross', pol=['HH', 'VV', 'HV', 'VH'], scans='track')

In [7]:
print(h5)

Name: /var/kat/archive2/data/MeerKATAR1/telescope_products/2018/02/18/1518941264.h5 (version 3.0)
Observer: sarah  Experiment ID: 20180218-0003
Description: 'MKAIV-405 Generic AR1 phaseup'
Observed from 2018-02-18 10:07:45.570 SAST to 2018-02-18 10:11:21.479 SAST
Dump rate / period: 0.12505 Hz / 7.997 s
Subarrays: 1
  ID  Antennas                            Inputs  Corrprods
   0  m000,m002,m003,m006,m007,m008,m011,m012,m013,m019,m022,m023,m027,m029,m032,m034  32      544
Spectral Windows: 1
  ID Band Product  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
   0 L    bc856M4k   1284.000         856.000           4096       208.984
-------------------------------------------------------------------------------
Data selected according to the following criteria:
  corrprods='cross'
  pol=['HH', 'VV', 'HV', 'VH']
  subarray=0
  scans='track'
  spw=0
-------------------------------------------------------------------------------
Shape: (15 dumps, 4096 channels, 480 correlation 

Note that the second segment (between the dashed lines) now lists the active selection criteria, including all of the criteria we applied.

In [8]:
# Print the shape of the updated dataset
print(h5.shape)

(15, 4096, 480)


We now have 15 dumps, compared with 27 in the original data set, because only the tracking scans were selected. The number of frequency channels is unchanged because we applied no frequency selection. The number of correlation products is now 480, as expected from the explanation under Correlation products above: 120 baselines × 4 polarization products, with the 64 auto-correlation products excluded.