In [None]:
%pylab inline

This tutorial will introduce you to basic interferometric data capture from KAT-4 and show you how to use
built-in Python tools to explore and manipulate the data.


## A data perspective on interferometers

You should already have some familiarity with single-dish radio telescopes and the type of data they produce. An interferometer is a collection
of single-dish radio telescopes whose output voltages are cross-correlated in pairs, with each pair of antennas forming a baseline.

If we exclude redundant pairs it is easy to see that N antennas produce $N(N-1)/2$ baselines. We can also correlate the signal from a single antenna with itself, producing an auto-correlation. Including the N auto-correlations gives a total of $N(N+1)/2$ baselines in our data.
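As a quick check of these formulas, the short sketch below enumerates antenna pairs with Python's standard `itertools` module. The helper name `baseline_pairs` is our own, introduced only for illustration.

```python
from itertools import combinations, combinations_with_replacement


def baseline_pairs(n_ants, include_autos=True):
    """Return every unique (i, j) antenna pair.

    With include_autos=True the pairs satisfy i <= j (auto-correlations
    included); otherwise i < j (cross-correlations only).
    """
    ants = range(n_ants)
    if include_autos:
        return list(combinations_with_replacement(ants, 2))
    return list(combinations(ants, 2))


n = 4  # KAT-4 has four antennas
cross = baseline_pairs(n, include_autos=False)
everything = baseline_pairs(n)
print(len(cross), n * (n - 1) // 2)       # 6 6
print(len(everything), n * (n + 1) // 2)  # 10 10
```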

These baseline pairs are produced by a device known as the correlator. In addition to performing correlation, the correlator also channelises and averages the data. Further, each antenna has two polarisations, producing four possible polarisation products for each baseline.

The upshot of all this is that the output data from the correlator consists of a number of baselines, split into a number of channels, each with 4 polarisation products. A block of this data is emitted by the correlator every integration period (typically once per second).
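One way to picture this structure is as a NumPy array with one axis per dimension. The sketch below is only an illustration: the sizes, the axis order and the polarisation labels ('HH', 'HV', 'VH', 'VV') are assumptions, and the real KAT-4 layout is the one shown in the diagram below.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_ants, n_chans, n_dumps = 4, 8, 3
n_baselines = n_ants * (n_ants + 1) // 2  # auto- plus cross-correlations
pol_products = ["HH", "HV", "VH", "VV"]   # assumed labels for the four products

# One complex visibility per (dump, channel, baseline, polarisation product)
vis = np.zeros((n_dumps, n_chans, n_baselines, len(pol_products)), dtype=np.complex64)

# A single integration: one block emitted by the correlator
dump = vis[0]
print(dump.shape)      # (8, 10, 4)

# The spectrum of baseline 3, first polarisation product, for that dump
spectrum = vis[0, :, 3, 0]
print(spectrum.shape)  # (8,)
```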

The diagram below shows the data structure for the KAT-4 correlator:

<img src='files/images/dbe_format.png' />

## The data on disk

For KAT-4 we store the data from the correlator in a standard format known as HDF5. The format specifies how the data is physically laid out on disk and how it should be accessed. By using a standard format, instead of writing the data directly to disk in our own custom format, we can make use of externally developed tools to access and view the data.

In particular, HDF5 has good Python bindings allowing us to easily import data into a Python session.

## An example HDF5 file

For this tutorial we have prepared an example HDF5 file that contains a short snapshot of interferometric data captured from the KAT-4 telescope. This file, named **kat4_int_example_github.hdf5.bz2**, is available in your tutorial working directory.

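For convenience, let's copy it to another file and then decompress it. Working on a copy keeps the original intact, because `bzip2 -d` deletes the compressed file it expands.
(We only have to run this command once. If you re-run this notebook you will get an error that the decompressed file already exists. That's fine.)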

In [None]:
!cp kat4_int_example_github.hdf5.bz2 kat4_int_example.hdf5.bz2
!bzip2 -d kat4_int_example.hdf5.bz2

The decompressed interferometric data file is **kat4_int_example.hdf5**. We can start off by using command-line tools to explore the structure of the file; the **h5ls** command will help. You can run Linux shell commands from this notebook by simply prefixing them with `!`.

Try the following in the cell below: 

    !h5ls kat4_int_example.hdf5
    
You should see output that looks like:

    Antennas        Group
    Correlator      Group
    Scan            Group

In [None]:
!h5ls kat4_int_example.hdf5

You can think of a group in much the same way as a directory in a normal file system. It is a named placeholder for further content below it. In this case we have three groups:

 * Antennas - This holds information about the physical antennas captured during the experiment, such as where each antenna was pointing at any particular time.
 * Correlator - This holds information about how the correlator was configured during the experiment. For example, the number of output channels is recorded.
 * Scan - This holds the actual correlator data for the experiment. A scan is a generic name given to any observation made with the telescope.

For more detail, run **h5ls** in recursive mode (**h5ls -r**; **h5ls -h** lists the available options) to see all the items contained in the groups.

As you can see there is quite a bit of information in the file, but we are only really interested in some parts of it.

The next step is to open the file from Python in this session so that we can start using more sophisticated tools to explore the data:

In [None]:
import h5py  # tells Python that we are going to use the h5py module in our code

In [None]:
f = h5py.File('kat4_int_example.hdf5', 'r')  # open the example file read-only and keep a reference to it

You can now navigate through the file using indexing syntax `[]` and tab completion. Try it: type **f['** in your session and hit the tab key. The completions shown should include *Antennas*, *Correlator* and *Scan*. You can explore the file using this technique. Try to find the following sections:

    f['Scan']['data']
    f['Antennas']['Antenna1']['Sensors']['pos_actual_scan_azim']

In addition to groups and datasets, an HDF5 file can contain attributes. Attributes are key-value pairs attached to a group or dataset, used to store ancillary information. They can be accessed with the *.attrs['* syntax, in a similar fashion to the *f['* syntax used above. For example:

    f['Scan'].attrs['

Try to find the description attribute for the *pos_actual_scan_azim* dataset you found earlier.

Once we have located a group or dataset we can assign it to a variable for convenient future reference. Try to assign the following variables to the corresponding datasets:

    data assign to /Scan/data
    ts assign to /Scan/timestamps


You can check you have done the right thing by inspecting your new variables. Check the shape and the dtype and make sure you understand what these are telling you.

In [None]:
print(ts.shape)  # should be (900,)
print(data.shape)  # should be (900, 512)

We have now assigned a reference to the data in the scan and the timestamps for this data. Timestamps are very important for a radio telescope as we need to know at exactly which time a particular data point was captured so that we can correctly align it with other pieces of information such as the telescope pointing direction.
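A minimal sketch of working with such timestamps, assuming (as the Task 2 notes and the katpoint example later in this notebook suggest) that they are in milliseconds since the Unix epoch. The values here are made up; with the real file you would use `ts[:]` instead.

```python
from datetime import datetime, timezone

import numpy as np

# Hypothetical timestamps in milliseconds since the Unix epoch, one per dump
ts_ms = np.array([1262304000000.0, 1262304001000.0, 1262304002000.0])

ts_s = ts_ms / 1000.0                    # convert to seconds
dump_period = np.median(np.diff(ts_s))   # typical spacing between consecutive dumps
start_utc = datetime.fromtimestamp(ts_s[0], tz=timezone.utc)

print(dump_period)            # 1.0
print(start_utc.isoformat())  # 2010-01-01T00:00:00+00:00
```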

Now that we have loaded some data, let's have a look at how the correlator data is actually being stored. You have already had some experience with NumPy arrays (or should have). The dataset you have loaded is an h5py object that behaves much like a NumPy array: it has a data type and dimensions, and slicing it returns a NumPy array. Try to check the data type and the dimensions of your loaded data.

*Note: If you need a refresher on what a NumPy array can do, remember that tab complete will give you hints on the available attributes and methods for an object.*

The data array is two-dimensional. The first axis is time, so each consecutive correlator dump is a new row in this array. The second axis is frequency channel; all the channels in the dataset combined form a frequency band of data. How many channels does your dataset contain?

Each channel in the dataset has a specific width and frequency. For many aspects of the data we need additional data (called meta-data) to interpret them properly. For example, the channel widths and frequencies are stored as attributes in the *Correlator* group. A key feature of the HDF5 file is that it contains a wide variety of meta-data used to interpret the telescope data saved in it.

You will see from the dtype that the *data* array is not a simple array of numbers: each cell in the array has a compound type. For this particular experiment we used 4 antennas, each with a single polarisation. The possible baseline combinations are labelled '0' through '11' in the output file. For each baseline we have a single complex value (NumPy type complex64).
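To see what such a compound type looks like, the sketch below builds a small NumPy structured array with the same field layout. It is a stand-in with made-up contents and a reduced size, not the real dataset, but slicing the real `data` returns the same kind of structured array, so `row['3']` and similar expressions work the same way.

```python
import numpy as np

# A compound dtype like the one described above: one complex64 field per
# baseline, named '0' through '11'
baseline_dtype = np.dtype([(str(b), np.complex64) for b in range(12)])

# Small stand-in for /Scan/data (the real array is 900 dumps x 512 channels)
n_dumps, n_chans = 5, 16
data = np.zeros((n_dumps, n_chans), dtype=baseline_dtype)
data['3'] = 1 + 1j  # fill one baseline with a constant test value

row = data[0]       # one correlator dump: all channels, all baselines
bline = row['3']    # complex spectrum of baseline '3' for that dump
print(bline.shape, bline.dtype)  # (16,) complex64
print(data['3'].shape)           # (5, 16): time x frequency for baseline '3'
```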

## Task 1 - Plot a single correlator time dump

In this task you will plot a single time snapshot of data and make sure that the axes are correct and properly labelled. To do this you will need to perform the following basic steps:

 1. Select a single row from the data array
 2. Select a specific baseline from that row
 3. Find the attributes that describe the bandwidth, number of channels and center frequency of the entire band
 4. Create an array of center frequencies for each channel
 5. Create a plot of absolute value vs frequency (make sure to label the axes)

You may find that channel 0 contains a large DC offset and should be excluded from your plot.

Some hints have been placed in the cells below, but they will only work if you fill in the rest correctly :)

*Note: The data is stored in 64-bit complex numbers by default. The NumPy functions np.abs and np.angle will give you the absolute value and phase respectively of a NumPy array of complex numbers.*
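The two functions in the note can be tried on a tiny hand-made spectrum. The values below are invented; channel 0 stands in for the DC spike and is dropped before plotting.

```python
import numpy as np

# Hypothetical single-baseline spectrum with a large DC spike in channel 0
spectrum = np.array([100 + 0j, 3 + 4j, 0 + 2j, -1 + 0j], dtype=np.complex64)

amp = np.abs(spectrum)      # magnitude of each complex sample
phase = np.angle(spectrum)  # phase in radians, in the range (-pi, pi]

# Drop channel 0 so the DC offset does not dominate the y-axis of the plot
amp_no_dc = amp[1:]
print(amp_no_dc)            # [5. 2. 1.]
```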

In [None]:
bline_data = row['0']  # step 2: baseline '0' from the dump you selected in step 1 (row)

In [None]:
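chan_bw_hz = float(bw_hz) / nc  # channel width from step 3; float() avoids integer division
# step 4: center frequency of each channel in MHz (// keeps range() arguments integers)
cfs_mhz = [(cf_hz + chan_bw_hz * c + 0.5 * chan_bw_hz) / 1e6
           for c in range(-int(nc) // 2, int(nc) // 2)]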
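If you prefer to avoid the Python loop, the same channel centres can be computed with NumPy in one vectorised expression. This is a sketch under the same assumptions as the cell above (an even number of channels, with channel 0 at the lowest frequency); the function name `channel_centres_mhz` and the example band are our own.

```python
import numpy as np


def channel_centres_mhz(cf_hz, bw_hz, nc):
    """Center frequency (MHz) of each of nc channels spanning bw_hz around cf_hz.

    Assumes nc is even and that channel 0 is the lowest frequency.
    """
    nc = int(nc)
    chan_bw_hz = float(bw_hz) / nc
    offsets = np.arange(nc) - nc // 2  # -nc/2 ... nc/2 - 1
    return (cf_hz + chan_bw_hz * (offsets + 0.5)) / 1e6


# Hypothetical band: 8 channels of 1 MHz centred on 1000 MHz
freqs = channel_centres_mhz(1000e6, 8e6, 8)
print(freqs)
```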

## Task 2 - Create a waterfall plot and measure the fringe rate

A plot that we use very often when trying to debug or understand interferometric data is a spectrogram (or waterfall plot). This plots time against frequency, with the value at each point shown using a colormap. By plotting the phase of the complex data against time and frequency we can get a feel for how the phase of a particular baseline changes as our observation proceeds.

The following hints may help you with this task:

 * Remember you can get help on any python method by adding a `?` after the method name. (e.g. to get help on the matplotlib plot command type `plot?` and hit enter)
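 * The imshow matplotlib function can plot 2-D arrays of data
 * Some of the baselines are auto-correlations and some are cross-correlations. The auto-correlation baselines always have zero phase, because an antenna's signal multiplied by its own complex conjugate is real and non-negative.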
 * I suggest you plot baseline '3' from the data which is the cross-correlation between antenna 1 and antenna 2.
 * Remember that you have complex data and need to use a NumPy function to get the phase...
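The sketch below shows the kind of array you would hand to imshow for a single baseline. It builds a synthetic cross-correlation whose phase rotates at a channel-dependent rate (a stand-in for `data['3']`, not real KAT-4 data) and also confirms that a real, positive auto-correlation has zero phase. The function name `phase_waterfall` is our own.

```python
import numpy as np


def phase_waterfall(vis, skip_dc=True):
    """Phase in radians of a (time, channel) complex visibility array.

    Channel 0 is dropped by default because of its large DC offset.
    """
    phases = np.angle(vis)
    return phases[:, 1:] if skip_dc else phases


# Synthetic cross-correlation: the phase rotates faster at higher channel
# index, mimicking fringes whose rate scales with frequency (illustration only)
n_dumps, n_chans = 120, 16
t = np.arange(n_dumps)[:, None]                    # dump index
rate = 0.01 * (1 + np.arange(n_chans) / n_chans)   # cycles per dump, per channel
vis = np.exp(2j * np.pi * rate * t).astype(np.complex64)

phases = phase_waterfall(vis)
# In the notebook (with %pylab inline) you could then display it with, e.g.:
#   figure(figsize=(10, 8)); imshow(phases, aspect='auto'); colorbar()
#   xlabel('Channel'); ylabel('Dump')

auto = np.full(n_chans, 4.0 + 0j)  # an auto-correlation is real and >= 0
print(np.angle(auto).max())        # 0.0
```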

Your phase plot should end up looking something like this:

<img src='files/images/phase_plot.png' />

Try and make something similar...

*Note: It may be helpful to create a figure of a decent size first, so you can easily see the fringes. The help for the figure command is your friend.*

In order to read the fringe frequency for a particular channel (needed below), it may be helpful to produce a zoomed-in version of the plot.

*Note: the imshow argument aspect='auto' lets the image fill the axes instead of forcing square pixels, which makes it easier to see detail in the fringes.*

The phase gradient you see has a number of causes, but the most easily explained is the rotation of the Earth. Imagine a pair of antennas, each pointed directly at the same source in the sky. If the source is directly overhead, the path length from the source to each antenna is the same. However, as the Earth rotates the source appears lower and lower in the sky. By the time it reaches the horizon (assuming it sets along the direction of the baseline), the path from the source to the more distant antenna is longer than the path to the nearer antenna by the full length of the baseline.

This change in projected baseline with time gives rise to a characteristic 'fringe' pattern in the phase of the correlation between the two antennas. Clearly the faster the projected baseline changes the faster the phase will change.

For any given channel the phase will change by $2 \pi$ when the path-length difference between the two antennas changes by 1 wavelength. If we measure the period of this phase change (and know some information about our source and antennas) we can calculate how fast the source is moving across the sky. If it is a celestial source we would expect it to move at the Earth's rotational angular velocity (about $7.29 \times 10^{-5}$ rad/s).
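One way to measure the fringe frequency from the data rather than by eye is to unwrap the phase of one channel and fit a straight line to it. This is a sketch of that approach, not the method used to make the plot above; it assumes the phase changes by less than π between consecutive dumps, and `fringe_frequency` is a name of our own. For the real data you could pass `np.angle(data['3'][:, chan])` together with the timestamps converted to seconds.

```python
import numpy as np


def fringe_frequency(phase_rad, t_s):
    """Estimate the fringe frequency (Hz) of one channel from its wrapped phase.

    np.unwrap removes the 2*pi jumps, a straight-line fit gives the phase
    rate in rad/s, and dividing by 2*pi converts that to cycles per second.
    """
    unwrapped = np.unwrap(phase_rad)
    slope_rad_s, _ = np.polyfit(t_s, unwrapped, 1)
    return slope_rad_s / (2 * np.pi)


# Synthetic check: a fringe with a 25 s period sampled once per second
t_s = np.arange(0.0, 100.0, 1.0)
phase = np.angle(np.exp(2j * np.pi * t_s / 25.0))
nu_f = fringe_frequency(phase, t_s)
print(round(nu_f, 4), round(1 / nu_f, 1))  # 0.04 25.0
```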

To check this we can use the following formula:

   \begin{equation}
      \nu_{F} = -\omega_{e} u \cos{\delta}
   \end{equation}

This relates the fringe frequency $\nu_{F}$ to the $u$ coordinate of the projected baseline (in wavelengths), the angular velocity $\omega_{e}$ at which the source moves across the sky, and the current declination $\delta$ of the source in question. In our case we want to solve for $\omega_{e}$. We can measure the fringe frequency from our plot, but we need to do a bit of work to get the declination and baseline. The first step is to extract some more meta-data from our file.
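Rearranged for the unknown, the formula gives $\omega_{e} = -\nu_{F} / (u \cos{\delta})$. The worked example below uses invented numbers to check the arithmetic: it generates the fringe frequency a celestial source would produce and recovers the Earth's rotation rate from it. The function name is ours, and the sign of $\nu_{F}$ depends on conventions, so if you only measured the fringe period you can compare magnitudes instead.

```python
import math

OMEGA_EARTH_RAD_S = 7.2921e-5  # Earth's sidereal rotation rate, for comparison


def angular_velocity(fringe_hz, u_wavelengths, dec_rad):
    """Solve nu_F = -omega * u * cos(dec) for omega, in rad/s.

    fringe_hz: measured fringe frequency in Hz (cycles per second)
    u_wavelengths: u coordinate of the projected baseline, in wavelengths
    dec_rad: source declination in radians
    """
    return -fringe_hz / (u_wavelengths * math.cos(dec_rad))


# Worked example with made-up numbers: u = 200 wavelengths, dec = -30 degrees
u_wl, dec = 200.0, math.radians(-30.0)
nu_f = -OMEGA_EARTH_RAD_S * u_wl * math.cos(dec)  # fringe rate a celestial source would give
omega = angular_velocity(nu_f, u_wl, dec)
print(omega / OMEGA_EARTH_RAD_S)  # ratio close to 1 for a celestial source
```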

We will need to get the following pieces of information (and assign them to convenient variables):

 * The timestamp at the start of the interval over which we have measured the fringe frequency. (Note that the timestamps inside the 'Scan' group are in ms)
 * A description of the source that we are observing. This is stored as a text string as an attribute of the 'Scan' group.
 * A description of each of the two antennas for the particular baseline we are measuring. The description for a particular antenna can be found within the 'Antennas' group.
 * The frequency of the channel in which you have made the fringe frequency measurement (in Hz)
 
Some hints are provided in the cell below, but you will need to define additional variables to make things work.

In [None]:
ant1 = f['Antennas/Antenna1'].attrs['description']
source = f['Scan'].attrs['target']

Once you have these variables (in my case called ts_start, source, ant1, ant2 and nu respectively) and have measured the fringe frequency, you can use the following code to find the source declination and projected baseline:

 

In [None]:
import math
import katpoint  # our source and coordinate library

tgt = katpoint.Target(source)  # object describing the observation target
ant1_obj = katpoint.Antenna(ant1)  # the two antennas that form the baseline
ant2_obj = katpoint.Antenna(ant2)
t_start = ts_start / 1000.0  # timestamps are in ms; katpoint expects seconds
(u, v, w) = tgt.uvw(ant2_obj, t_start, ant1_obj)  # baseline coordinates in metres
wavelength = katpoint.lightspeed / nu  # observing wavelength in metres
u_wl = u / wavelength  # u coordinate in wavelengths (the u in the fringe-rate formula)
B = math.sqrt((u / wavelength)**2 + (v / wavelength)**2)  # projected baseline length in wavelengths
(ra, dec) = tgt.radec(t_start, ant1_obj)  # Right Ascension and Declination (radians) at that time

You now have the declination (dec, in radians), the projected baseline length (B) in wavelengths, and its $u$ coordinate (u_wl), also in wavelengths.

See if you can work out the source's angular velocity (it will be in radians per second).

Does this show that the source is celestial?