<center><strong><font size=+3>Power spectra at different stages of the calibration pipeline</font></strong></center>
<br><br>
<center><strong><font size=+2>Matyas Molnar</font><br></strong></center>
<br><center><strong><font size=+1>6th January 2020</font></strong></center>

This Jupyter notebook is designed to be run when logged into NRAO. In addition to the standard Python libraries, it requires pyuvdata, hera_cal, hera_qm, and hera_pspec. For more information on access, see the HERA wiki.

In [None]:
import pyuvdata

In [None]:
import os
import sys

import numpy as np
from matplotlib import pyplot as plt

import hera_pspec as hp
from hera_cal.io import HERAData
from pyuvdata import UVData
from pyuvdata import utils as uvutils

In [None]:
plt.rcParams["figure.figsize"] = (12,8)
%matplotlib inline

In [None]:
analysis_folder = '/lustre/aoc/projects/hera/H1C_IDR2/IDR2_2/2458098'
H1C_IDR2_folder = '/lustre/aoc/projects/hera/H1C_IDR2/IDR2_2'
raw_data_fn = 'zen.2458098.43869.HH.uvh5'

hera_pkgs = '/users/heramgr/hera_software/'
working_dir = '/lustre/aoc/projects/hera/mmolnar/hc_analysis/sample_calib'
if not os.path.exists(analysis_folder): # working locally
    analysis_folder = '/Users/matyasmolnar/Downloads/HERA_Data/robust_cal/'
    hera_pkgs = '/Users/matyasmolnar/Downloads/HERA_Data/hera_packages/'
    working_dir = '/Users/matyasmolnar/Downloads/HERA_Data/hc_analysis/sample_calib'
    
os.chdir(working_dir)

The day-by-day data, along with the pipeline settings file used (makeflow/idr2_2.toml), and the LST-binned data can be found in:

In [None]:
!ls /lustre/aoc/projects/hera/H1C_IDR2/IDR2_2

H1C IDR 2.2 includes 18 nearly-consecutive nights of data. All data products are sorted by JD into the following folders on the NRAO servers, which also contain softlinks to raw H1C IDR 2 data in the `.uvh5` format and antenna metrics released by the commissioning team. Each night has 73 raw visibility files. Each file has 4 polarizations, 1024 frequency channels and (usually) 60 integrations, each 10.7374 seconds.
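
As a quick sanity check on these numbers, the time covered by one file and by one night follows directly from the integration count and integration length. This is only a back-of-the-envelope sketch: it takes the nominal 60 integrations per file at face value, and the variable names are illustrative.

```python
# Figures quoted in the text above
n_files_per_night = 73
n_integrations_per_file = 60      # "usually" 60, so treat this as nominal
integration_time_s = 10.7374

# Time spanned by a single file and by a full night of files
file_duration_s = n_integrations_per_file * integration_time_s
night_duration_h = n_files_per_night * file_duration_s / 3600.0

print(f"One file spans about {file_duration_s / 60:.2f} minutes")
print(f"One night spans about {night_duration_h:.2f} hours")
```

So each file covers a little under 11 minutes and a night of 73 files covers roughly 13 hours.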

In [None]:
# this is the baseline and file we examine throughout this notebook
raw_data_fn = 'zen.2458098.43869.HH.uvh5'
raw_data_file = os.path.join(analysis_folder, raw_data_fn)
bl = (38, 39, 'ee')

In [None]:
jd = os.path.basename(raw_data_file).split('.')[1]
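
The integer JD is taken from the second dot-separated field of the file name. A minimal sketch of that parsing is shown here, assuming the raw-file naming pattern `zen.<JD>.<fraction>.<...>.uvh5`; `split_jd` is a hypothetical helper, not part of hera_cal or pyuvdata, and it would not apply to the LST-binned file names, which follow a different pattern.

```python
import os

def split_jd(path):
    """Hypothetical helper: return (integer JD string, full JD float)
    from a file named zen.<JD>.<fraction>.<...>.uvh5."""
    parts = os.path.basename(path).split('.')
    jd_int, jd_frac = parts[1], parts[2]
    return jd_int, float(jd_int + '.' + jd_frac)

# The folder here is a placeholder; only the base name matters
jd_int, jd_full = split_jd('/some/folder/zen.2458098.43869.HH.uvh5')
print(jd_int, jd_full)
```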

In [None]:
hd = HERAData(raw_data_file)
print('This file has', len(hd.times), 'integrations', 'and', len(hd.freqs), 'frequency channels.')

In [None]:
# only loads a single bl; default loads all bls
data, flags, nsamples = hd.read(bls=[bl])

In [None]:
plt.figure(figsize=(12,3), dpi=100)
plt.imshow(np.angle(data[bl]) / (~flags[bl]), aspect='auto', cmap='twilight',
           extent=[hd.freqs[0]/1e6, hd.freqs[-1]/1e6, hd.times[-1]-int(jd), hd.times[0]-int(jd)])
plt.ylabel('JD - {}'.format(jd))
plt.xlabel('Frequency (MHz)')
plt.title('Raw Visibility Phases: ' + str(bl))
plt.colorbar(label='Phase (Radians)', aspect=8, pad=.025);
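
The plot above hides flagged samples by dividing the phases by the inverted flag array: unflagged samples are divided by 1 and flagged samples by 0, which yields non-finite values that `imshow` generally leaves blank. The sketch below illustrates the trick on small made-up arrays (not HERA data) and compares it with a more explicit `np.where` alternative.

```python
import numpy as np

# Toy phases and flags (True = flagged), made up for illustration
phases = np.array([[0.5, -1.0, 0.0],
                   [2.0,  0.0, -0.3]])
flags = np.array([[False, True, True],
                  [False, False, True]])

# Division by the inverted flags: unflagged -> x / 1, flagged -> x / 0
with np.errstate(divide='ignore', invalid='ignore'):
    by_division = phases / (~flags)

# More explicit alternative: replace flagged samples with NaN
by_where = np.where(flags, np.nan, phases)

print(by_division)
print(by_where)
```

Both approaches leave the unflagged samples unchanged; the `np.where` form avoids the divide-by-zero warnings that the division form can emit.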

### Calculating the delay spectrum
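
At its core, a delay spectrum is the Fourier transform of a visibility along the frequency axis, usually with a taper applied first to limit spectral leakage. The NumPy-only sketch below shows that step on a toy visibility with a single known delay. It is not the hera_pspec implementation, which also handles normalisation, the beam and cosmological unit conversions. The 100 MHz to 200 MHz band split into 1024 channels is an assumption made for illustration, and the Blackman-Harris taper is written out by hand.

```python
import numpy as np

# Assumed channelisation: 1024 channels across 100-200 MHz (illustrative)
chan_width_hz = 100e6 / 1024
chans = np.arange(300, 400)              # a 100-channel spectral window
n_chan = chans.size
freqs = 100e6 + chan_width_hz * chans

# Toy visibility: a single delay, chosen to sit exactly on a delay bin
tau_true = 2.0 / (n_chan * chan_width_hz)      # 204.8 ns
vis = np.exp(2j * np.pi * freqs * tau_true)

# Four-term Blackman-Harris taper, written out to stay NumPy-only
x = 2 * np.pi * np.arange(n_chan) / (n_chan - 1)
taper = (0.35875 - 0.48829 * np.cos(x)
         + 0.14128 * np.cos(2 * x) - 0.01168 * np.cos(3 * x))

# Delay transform: FFT along frequency, shifted so zero delay is central
delays = np.fft.fftshift(np.fft.fftfreq(n_chan, d=chan_width_hz))
vis_delay = np.fft.fftshift(np.fft.fft(taper * vis))
power = np.abs(vis_delay) ** 2

peak_delay = delays[np.argmax(power)]
print(f"Input delay: {tau_true * 1e9:.1f} ns, "
      f"recovered peak: {peak_delay * 1e9:.1f} ns")
```

The delay resolution is the inverse of the window bandwidth, so wider spectral windows give finer delay bins.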

In [None]:
# Load beam model
beamfile = os.path.join(hera_pkgs, 'hera_pspec/hera_pspec/data/HERA_NF_dipole_power.beamfits')
cosmo = hp.conversions.Cosmo_Conversions()
uvb = hp.pspecbeam.PSpecBeamUV(beamfile, cosmo=cosmo)

In [None]:
def pspec_calc(dfile):

    # Load data into UVData objects
    uvd = UVData()
    uvd.read_uvh5(dfile)
    
    # find conversion factor from Jy to mK
    Jy_to_mK = uvb.Jy_to_mK(np.unique(uvd.freq_array), pol='xx')

    # reshape to appropriately match a UVData.data_array object and multiply in!
    uvd.data_array *= Jy_to_mK[None, None, :, None]

    # We only have 1 data file here, so split it into even- and odd-indexed integrations;
    # cross-multiplying the two interleaved sets avoids the noise bias (not normally needed!)
    uvd1 = uvd.select(times=np.unique(uvd.time_array)[:-1:2], inplace=False)
    uvd2 = uvd.select(times=np.unique(uvd.time_array)[1::2], inplace=False)

    # Create a new PSpecData object
    ds = hp.PSpecData(dsets=[uvd1, uvd2], wgts=[None, None], beam=uvb)
    
    # Because we are forming power spectra between datasets that are offset in LST there will be some
    # level of decoherence (and therefore signal loss) of the EoR signal. For short baselines and small
    # LST offsets this is typically negligible, but it is still good to try to recover what coherency
    # we can, simply by phasing (i.e. fringe-stopping) the datasets before forming the power spectra. 
    # This can be done with the rephase_to_dset method, and can only be done once.    
    ds.rephase_to_dset(0) # Phase to the zeroth dataset
    
    # change units of UVData objects
    ds.dsets[0].vis_units = 'mK'
    ds.dsets[1].vis_units = 'mK'

    # Find list of baselines pairs to calculate power spectra for
    uvd_ant_copy = uvd.copy()
    uvd_ant_copy.select(times=uvd_ant_copy.time_array[0])

    # Returned values: list of redundant groups, corresponding mean baseline vectors, baseline lengths. 
    # No conjugates included, so conjugates is None.
    tol = 0.5  # Tolerance in meters
    baseline_groups, vec_bin_centers, lengths = uvutils.get_baseline_redundancies(
        uvd_ant_copy.baseline_array, uvd_ant_copy.uvw_array, tol=tol)

    # Select the shortest (~14.6 m) EW baseline group.
    # If every group contains exactly one baseline, the data have already been reduced to
    # one baseline per unique separation (as omnical does), so that single baseline is
    # paired with itself. Otherwise, all baseline pairs within the group are formed.
    if len(baseline_groups) == len([bl for bl_group in baseline_groups for bl in bl_group]):
        bls_to_include = baseline_groups[0]
        bls1 = [uvutils.baseline_to_antnums(bls_to_include[0], len(uvd.get_ants()))]
        bls2 = bls1
    else:
        bls_to_include = baseline_groups[1]
        
        # Converting to antnum tuples to be used to construct_blpairs later on
        ant_pairs_to_include = [uvutils.baseline_to_antnums(i, len(uvd.get_ants())) for i in bls_to_include]
        bls1, bls2, blp = hp.utils.construct_blpairs(ant_pairs_to_include, exclude_permutations=False, exclude_auto_bls=True)

    # Power spectrum calculation
    uvp = ds.pspec(bls1, bls2, (0, 1), [('xx', 'xx')], spw_ranges=[(300, 400), (600,721)], input_data_weight='identity', norm='I', 
                   taper='blackman-harris', verbose=False)

    blpairs = np.unique(uvp.blpair_array)
    blps = list(blpairs)
    
    return uvp, blps
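
The reason `pspec_calc` splits the file into even- and odd-indexed integrations is that multiplying a noisy visibility by its own conjugate adds the noise variance to the power estimate, whereas cross-multiplying two sets with independent noise does not. The toy simulation below is a sketch of that effect only. It assumes a constant sky signal and Gaussian noise, and it ignores the sky drift between adjacent integrations that the rephasing step addresses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_times = 20000            # many integrations so the averages converge
signal = 1.0 + 0.0j        # constant "sky" visibility (toy assumption)
sigma = 1.0                # noise standard deviation per integration

# Complex Gaussian noise, independent between integrations
noise = sigma / np.sqrt(2) * (rng.standard_normal(n_times)
                              + 1j * rng.standard_normal(n_times))
vis = signal + noise

# Same interleaving as in pspec_calc: even- and odd-indexed integrations
even = vis[:-1:2]
odd = vis[1::2]

# Auto-product: noise multiplies itself, biasing the power high by sigma**2
auto_power = np.mean(np.abs(vis) ** 2)
# Cross-product: the two noise realisations are independent and average out
cross_power = np.mean((even * np.conj(odd)).real)

print(f"true power : {abs(signal) ** 2:.2f}")
print(f"auto power : {auto_power:.2f}")
print(f"cross power: {cross_power:.2f}")
```

With these settings the auto-product converges to roughly twice the true power, while the cross-product stays close to it.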

### LST-binned data

In [None]:
lstbinned_file = os.path.join(H1C_IDR2_folder, 'LSTBIN/one_group/grp1/zen.grp1.of1.LST.1.40949.HH.OCRSL.uvh5')
lst_uvp, lst_blps = pspec_calc(lstbinned_file)

In [None]:
# Plot the spectra averaged over baseline-pairs, keeping the time bins separate
fig, ax = plt.subplots(figsize=(12, 8))
fig = hp.plot.delay_spectrum(lst_uvp, [lst_blps,], spw=0, pol=('xx','xx'), average_blpairs=True, \
                             average_times=False, ax=ax)
plt.tight_layout()
plt.show()

### Raw data power spectrum

In [None]:
raw_uvp, raw_blps = pspec_calc(raw_data_file)

In [None]:
# Plot the spectra averaged over baseline-pairs and times
fig, ax = plt.subplots(figsize=(12, 8))
fig = hp.plot.delay_spectrum(raw_uvp, [raw_blps,], spw=0, pol=('xx','xx'), average_blpairs=True, \
                             average_times=True, ax=ax)
plt.tight_layout()
plt.show()

In [None]:
# Repeat for the fully processed data: redundantly calibrated, absolutely calibrated,
# XRFI-flagged, smooth-calibrated and delay-filtered

In [None]:
delay_filtered_fn = raw_data_fn.replace('uvh5', 'OCRSD.uvh5')
delay_filtered_file = os.path.join(analysis_folder, delay_filtered_fn)

In [None]:
dly_uvp, dly_blps = pspec_calc(delay_filtered_file)

In [None]:
# Plot the spectra averaged over baseline-pairs and times
fig, ax = plt.subplots(figsize=(12, 8))
fig = hp.plot.delay_spectrum(dly_uvp, [dly_blps,], spw=0, pol=('xx','xx'), average_blpairs=True, \
                             average_times=True, ax=ax)
plt.tight_layout()
plt.show()

### Alternative averaging options

In [None]:
# # The UVData files contain 3 time bins; let's average over baseline-pairs but keep the time bins intact. 
# # We can also use the shorthand 'xx' to specify the matching polarization pair ('xx', 'xx')
# fig = hp.plot.delay_spectrum(uvp, [blps,], spw=0, pol='xx', average_blpairs=True, average_times=False)
# fig.set_size_inches(12, 8)
# plt.show()

In [None]:
# # And now let's try the opposite: average over times, but keep the baseline-pairs separate.
# fig = hp.plot.delay_spectrum(uvp, [blps,], spw=0, pol='xx', average_blpairs=False, average_times=True)
# fig.set_size_inches(12, 8)
# plt.show()