# Test the zpix (or ztile) VAC after creation
Stephanie Juneau (NOIRLab)


## Overview

This notebook compares the new VAC to the original redshift catalog. The goals are to:
- Verify that columns that should not have changed are indeed identical
- Verify that columns that have been modified changed in the expected way
- Quantify the number of rows affected by each change
- Visualize the data range of certain columns for data quality assurance
- Document a few remaining issues/features for future action

### Redshift catalogs for EDR (SPECPROD = 'fuji'):
- Original redshift catalogs: 
    - `zall-pix-fuji.fits`
    - `zall-tilecumulative-fuji.fits`
- VAC:
    - `zall-pix-edr-vac.fits`
    - `zall-tilecumulative-edr-vac.fits`
    
### Note

Two cells below must be set before running the tests: (1) the choice of VAC to test (`'healpix'` for zpix or `'cumulative'` for ztile), and (2) the JupyterLab platform in use (`'datalab'` for Astro Data Lab or `'nersc'` for NERSC).
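
Because every later cell branches on these two settings, a typo in either one silently selects the `else` branch. The following is a minimal sketch of a guard against that; the helper name `check_settings` and the two tuples are hypothetical and are not part of the notebook.

```python
# Hypothetical guard for the two notebook settings (not part of the original notebook)
VALID_ZCAT_TYPES = ('healpix', 'cumulative')
VALID_PLATFORMS = ('datalab', 'nersc')

def check_settings(zcat_type, platform):
    """Raise ValueError if either setting is not one of the supported values."""
    if zcat_type not in VALID_ZCAT_TYPES:
        raise ValueError(f"zcat_type must be one of {VALID_ZCAT_TYPES}, got {zcat_type!r}")
    if platform not in VALID_PLATFORMS:
        raise ValueError(f"platform must be one of {VALID_PLATFORMS}, got {platform!r}")
    return zcat_type, platform

# Example with the defaults used in this notebook
print(check_settings('healpix', 'datalab'))
```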

### DESI Kernel Version

This notebook was tested with the official Fuji version of the DESI kernel (`22.5`).

## Imports

In [1]:
import numpy as np
from astropy.io import fits
from astropy.table import Table, join, setdiff
from desitarget.targets import decode_targetid

#-----------------------------------------------------
# Below, these could potentially be useful for additional tests but they are not yet used

# For FIBERSTATUS
#from desispec.fiberbitmasking import get_all_fiberbitmask_with_amp, get_all_nonamp_fiberbitmask_val, get_justamps_fiberbitmask

# DESI targeting masks
#from desitarget.sv1 import sv1_targetmask    # For SV1
#from desitarget.sv2 import sv2_targetmask    # For SV2
#from desitarget.sv3 import sv3_targetmask    # For SV3

## Initial Setup (NERSC or Astro Data Lab?)

In [2]:
zcat_type = 'healpix'
#zcat_type = 'cumulative'

In [3]:
platform = 'datalab'
#platform = 'nersc'

## Read in files

In [4]:
specprod = "fuji"    # Internal name for the EDR

In [5]:
if platform=='datalab':
    # Relative path for DL users with access to file mount (available to DESI members upon request for testing purposes)
    desi_dir = "../../../../../DESI/"
    vac_dir = f"../../{specprod}/nersc/"
else:
    desi_dir = "/global/cfs/cdirs/desi/"
    vac_dir = f"{desi_dir}public/edr/vac/edr/zcat/{specprod}/v1.0/"
    # Currently in gqp for testing
    #vac_dir = f"{desi_dir}science/gqp/vac/edr/zcat/{specprod}/v1.0/"

In [6]:
# Original redshift catalog
path_before = f"{desi_dir}spectro/redux/{specprod}/zcatalog/"

# Public version
#path_before = f"{desi_dir}public/edr/spectro/redux/{specprod}/zcatalog/"

if zcat_type=='healpix':
    file_before = path_before+"zall-pix-fuji.fits"
else:
    file_before = path_before+"zall-tilecumulative-fuji.fits"

In [7]:
# VAC
if zcat_type=='healpix':
    file_after = vac_dir+"zall-pix-edr-vac.fits"
else:
    file_after = vac_dir+"zall-tilecumulative-edr-vac.fits"    

In [8]:
fits.info(file_before)

Filename: ../../../../../DESI/spectro/redux/fuji/zcatalog/zall-pix-fuji.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()      
  1  ZCATALOG      1 BinTableHDU    333   2847435R x 130C   [K, 7A, 6A, J, J, D, D, K, D, 10D, K, 6A, 20A, K, D, J, D, D, E, E, E, K, B, 3A, D, J, I, 8A, J, J, 4A, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, I, E, E, E, E, K, 2A, E, E, E, E, 1A, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, D, D, I, E, I, I, E, E, E, E, D, E, D, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, J, L, K, L]   


In [9]:
fits.info(file_after)

Filename: ../../fuji/nersc/zall-pix-edr-vac.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()      
  1  ZCATALOG      1 BinTableHDU    384   2451325R x 135C   [K, 7A, 6A, J, J, D, D, K, D, 10D, K, 6A, 20A, K, D, J, D, D, E, E, E, K, B, 3A, D, J, I, 8A, J, J, 4A, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, I, E, E, E, E, K, 2A, E, E, E, E, 1A, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, D, D, I, E, I, I, E, E, E, E, D, E, D, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, J, L, K, L, D, D, D, J, J]   


In [10]:
%%time
tb = Table.read(file_before)

CPU times: user 5.92 s, sys: 1.06 s, total: 6.97 s
Wall time: 6.97 s


In [11]:
%%time
tz = Table.read(file_after)

CPU times: user 7.34 s, sys: 1.15 s, total: 8.49 s
Wall time: 8.49 s


## Print and compare column names

In [12]:
colnames = tz.colnames
print(colnames)

['TARGETID', 'SURVEY', 'PROGRAM', 'HEALPIX', 'SPGRPVAL', 'Z', 'ZERR', 'ZWARN', 'CHI2', 'COEFF', 'NPIXELS', 'SPECTYPE', 'SUBTYPE', 'NCOEFF', 'DELTACHI2', 'COADD_FIBERSTATUS', 'TARGET_RA', 'TARGET_DEC', 'PMRA', 'PMDEC', 'REF_EPOCH', 'FA_TARGET', 'FA_TYPE', 'OBJTYPE', 'SUBPRIORITY', 'OBSCONDITIONS', 'RELEASE', 'BRICKNAME', 'BRICKID', 'BRICK_OBJID', 'MORPHTYPE', 'EBV', 'FLUX_G', 'FLUX_R', 'FLUX_Z', 'FLUX_W1', 'FLUX_W2', 'FLUX_IVAR_G', 'FLUX_IVAR_R', 'FLUX_IVAR_Z', 'FLUX_IVAR_W1', 'FLUX_IVAR_W2', 'FIBERFLUX_G', 'FIBERFLUX_R', 'FIBERFLUX_Z', 'FIBERTOTFLUX_G', 'FIBERTOTFLUX_R', 'FIBERTOTFLUX_Z', 'MASKBITS', 'SERSIC', 'SHAPE_R', 'SHAPE_E1', 'SHAPE_E2', 'REF_ID', 'REF_CAT', 'GAIA_PHOT_G_MEAN_MAG', 'GAIA_PHOT_BP_MEAN_MAG', 'GAIA_PHOT_RP_MEAN_MAG', 'PARALLAX', 'PHOTSYS', 'PRIORITY_INIT', 'NUMOBS_INIT', 'CMX_TARGET', 'DESI_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET', 'SV1_DESI_TARGET', 'SV1_BGS_TARGET', 'SV1_MWS_TARGET', 'SV1_SCND_TARGET', 'SV2_DESI_TARGET', 'SV2_BGS_TARGET', 'SV2_MWS_TARGE

In [13]:
colnames_before = tb.colnames
print(colnames_before)

['TARGETID', 'SURVEY', 'PROGRAM', 'HEALPIX', 'SPGRPVAL', 'Z', 'ZERR', 'ZWARN', 'CHI2', 'COEFF', 'NPIXELS', 'SPECTYPE', 'SUBTYPE', 'NCOEFF', 'DELTACHI2', 'COADD_FIBERSTATUS', 'TARGET_RA', 'TARGET_DEC', 'PMRA', 'PMDEC', 'REF_EPOCH', 'FA_TARGET', 'FA_TYPE', 'OBJTYPE', 'SUBPRIORITY', 'OBSCONDITIONS', 'RELEASE', 'BRICKNAME', 'BRICKID', 'BRICK_OBJID', 'MORPHTYPE', 'EBV', 'FLUX_G', 'FLUX_R', 'FLUX_Z', 'FLUX_W1', 'FLUX_W2', 'FLUX_IVAR_G', 'FLUX_IVAR_R', 'FLUX_IVAR_Z', 'FLUX_IVAR_W1', 'FLUX_IVAR_W2', 'FIBERFLUX_G', 'FIBERFLUX_R', 'FIBERFLUX_Z', 'FIBERTOTFLUX_G', 'FIBERTOTFLUX_R', 'FIBERTOTFLUX_Z', 'MASKBITS', 'SERSIC', 'SHAPE_R', 'SHAPE_E1', 'SHAPE_E2', 'REF_ID', 'REF_CAT', 'GAIA_PHOT_G_MEAN_MAG', 'GAIA_PHOT_BP_MEAN_MAG', 'GAIA_PHOT_RP_MEAN_MAG', 'PARALLAX', 'PHOTSYS', 'PRIORITY_INIT', 'NUMOBS_INIT', 'CMX_TARGET', 'DESI_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET', 'SV1_DESI_TARGET', 'SV1_BGS_TARGET', 'SV1_MWS_TARGET', 'SV1_SCND_TARGET', 'SV2_DESI_TARGET', 'SV2_BGS_TARGET', 'SV2_MWS_TARGE

In [14]:
common_colnames = set(colnames).intersection(set(colnames_before))

In [15]:
for col in common_colnames:
    dtype_comp = tz[col].dtype==tb[col].dtype
    if not dtype_comp: 
        print(f"Different datatype for column = {col}")
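
The printed column lists above are long and hard to compare by eye. As a complement to the datatype check, here is a small sketch that reports which names were added or removed. The helper `compare_column_names` and the toy lists are hypothetical; with the real tables one would pass `tz.colnames` and `tb.colnames`.

```python
def compare_column_names(cols_after, cols_before):
    """Return (added, removed, common) column names, preserving the input order."""
    before = set(cols_before)
    after = set(cols_after)
    added = [c for c in cols_after if c not in before]
    removed = [c for c in cols_before if c not in after]
    common = [c for c in cols_after if c in before]
    return added, removed, common

# Toy column lists (a hypothetical subset, not the full catalogs)
toy_before = ['TARGETID', 'SURVEY', 'PROGRAM', 'Z']
toy_after = ['TARGETID', 'SURVEY', 'PROGRAM', 'Z', 'MIN_MJD', 'MAX_MJD']

added, removed, common = compare_column_names(toy_after, toy_before)
print("Added:  ", added)
print("Removed:", removed)
```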
        

## Define subsets of columns of interest with expected results

In [16]:
# Columns that shouldn't have changed (except removing COEFF because it doesn't work with Table.setdiff)

common_cols = ['TARGETID', 'SURVEY', 'PROGRAM', 'SPGRPVAL', 'Z', 'ZERR', 'ZWARN', 'CHI2', \
             'NPIXELS', 'SPECTYPE', 'SUBTYPE', 'NCOEFF', 'DELTACHI2', 'COADD_FIBERSTATUS', \
             'TARGET_RA', 'TARGET_DEC', 'PMRA', 'PMDEC', 'REF_EPOCH', 'FA_TARGET', 'FA_TYPE', 'OBJTYPE', 'SUBPRIORITY', \
             'OBSCONDITIONS', 'RELEASE', 'BRICKNAME', 'BRICKID', 'BRICK_OBJID', 'MORPHTYPE', 'EBV', \
             'FLUX_G', 'FLUX_R', 'FLUX_Z', 'FLUX_W1', 'FLUX_W2', 'FLUX_IVAR_G', 'FLUX_IVAR_R', 'FLUX_IVAR_Z', 'FLUX_IVAR_W1', 'FLUX_IVAR_W2', \
             'FIBERFLUX_G', 'FIBERFLUX_R', 'FIBERFLUX_Z', 'FIBERTOTFLUX_G', 'FIBERTOTFLUX_R', 'FIBERTOTFLUX_Z', \
             'MASKBITS', 'SERSIC', 'SHAPE_R', 'SHAPE_E1', 'SHAPE_E2', 'REF_ID', 'REF_CAT', 'GAIA_PHOT_G_MEAN_MAG', 'GAIA_PHOT_BP_MEAN_MAG', \
             'GAIA_PHOT_RP_MEAN_MAG', \
             'PARALLAX', 'PHOTSYS', 'PRIORITY_INIT', 'NUMOBS_INIT', 'CMX_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET', \
             'SV1_BGS_TARGET', \
             'SV2_DESI_TARGET', 'SV2_BGS_TARGET', 'SV2_MWS_TARGET', 'SV2_SCND_TARGET', \
             'SV3_BGS_TARGET', 'SV3_MWS_TARGET', \
             'PLATE_RA', 'PLATE_DEC', 'COADD_NUMEXP', 'COADD_EXPTIME', 'COADD_NUMNIGHT', 'COADD_NUMTILE', \
             'TSNR2_GPBDARK_B', 'TSNR2_ELG_B', 'TSNR2_GPBBRIGHT_B', 'TSNR2_LYA_B', 'TSNR2_BGS_B', 'TSNR2_GPBBACKUP_B', \
             'TSNR2_QSO_B', 'TSNR2_LRG_B', 'TSNR2_GPBDARK_R', 'TSNR2_ELG_R', 'TSNR2_GPBBRIGHT_R', 'TSNR2_LYA_R', \
             'TSNR2_BGS_R', 'TSNR2_GPBBACKUP_R', 'TSNR2_QSO_R', 'TSNR2_LRG_R', 'TSNR2_GPBDARK_Z', 'TSNR2_ELG_Z', \
             'TSNR2_GPBBRIGHT_Z', 'TSNR2_LYA_Z', 'TSNR2_BGS_Z', 'TSNR2_GPBBACKUP_Z', 'TSNR2_QSO_Z', 'TSNR2_LRG_Z', \
             'TSNR2_GPBDARK', 'TSNR2_ELG', 'TSNR2_GPBBRIGHT', 'TSNR2_LYA', 'TSNR2_BGS', 'TSNR2_GPBBACKUP', 'TSNR2_QSO', \
             'TSNR2_LRG', 'SV_NSPEC', 'SV_PRIMARY', 'ZCAT_NSPEC', 'ZCAT_PRIMARY']

if zcat_type=='cumulative':
    common_cols.extend(['LASTNIGHT','TILEID'])
    
if zcat_type=='healpix':
    common_cols.extend(['HEALPIX'])
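
`COEFF` is left out of `common_cols` because it is a multidimensional column. One possible way to still compare it is to align the rows on a key and compare the arrays directly with NumPy. This is only a sketch under simplifying assumptions: a single unique integer key and the same set of rows in both tables. In the real catalogs the key would be the combination of `TARGETID`, `SURVEY`, and `PROGRAM`, and the two files do not have the same number of rows, so the rows would first need to be matched. The function name `multidim_column_unchanged` is hypothetical.

```python
import numpy as np

def multidim_column_unchanged(keys_after, values_after, keys_before, values_before):
    """Compare a 2-D column between two tables after aligning rows on a unique key."""
    keys_after = np.asarray(keys_after)
    keys_before = np.asarray(keys_before)
    order_after = np.argsort(keys_after)
    order_before = np.argsort(keys_before)
    if not np.array_equal(keys_after[order_after], keys_before[order_before]):
        raise ValueError("the two tables do not contain the same keys")
    a = np.asarray(values_after, dtype=float)[order_after]
    b = np.asarray(values_before, dtype=float)[order_before]
    # equal_nan=True treats NaN entries at the same position as equal
    return np.array_equal(a, b, equal_nan=True)

# Toy data: same rows in a different order (hypothetical values)
keys_before = [3, 1, 2]
coeff_before = [[0.3, 0.0], [0.1, 0.0], [0.2, np.nan]]
keys_after = [1, 2, 3]
coeff_after = [[0.1, 0.0], [0.2, np.nan], [0.3, 0.0]]

print(multidim_column_unchanged(keys_after, coeff_after, keys_before, coeff_before))
```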

In [17]:
# Always keep the 3 identifier columns plus any column(s) to be tested

# Should not change (sanity check)
fix1_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', 'CMX_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET']
N1_correct = 0


In [18]:
# New columns
new_cols = ['FIRSTNIGHT', 'LASTNIGHT', 'MIN_MJD', 'MEAN_MJD', 'MAX_MJD']

In [19]:
# To use Table.setdiff, we need to fill in missing string values for SUBTYPE and REF_CAT
tz['SUBTYPE'].fill_value = '--'
tz['REF_CAT'].fill_value = '--'

tb['SUBTYPE'].fill_value = '--'
tb['REF_CAT'].fill_value = '--'

tz = tz.filled()
tb = tb.filled()

## Sanity checks for columns that shouldn't change

In [20]:
# Sanity check that the number of rows with OBJTYPE=TGT is the same
Ntgt_z = len(tz[tz['OBJTYPE']=='TGT'])
Ntgt_b = len(tb[tb['OBJTYPE']=='TGT'])

print(f"N(rows) with OBJTYPE=TGT before = {Ntgt_b}")
print(f"N(rows) with OBJTYPE=TGT after  = {Ntgt_z}")

N(rows) with OBJTYPE=TGT before = 2044588
N(rows) with OBJTYPE=TGT after  = 2044588


In [21]:
%%time
# Compare for the common columns (should stay identical)
diff_zcat = setdiff(tz[common_cols], tb[common_cols])

CPU times: user 1min 25s, sys: 6.01 s, total: 1min 31s
Wall time: 1min 31s


In [22]:
# Check results
Ndiff = len(diff_zcat)

print(" ")
if Ndiff==0: print(f"SUCCESS: Values unchanged as expected for: {common_cols}")
else: print("ERROR: unexpected changes in the new VAC")  #Raise exception here?
                   
# For debugging (if there are > 0 results)
print(" ")
print(np.unique(diff_zcat['SURVEY']))
print(np.unique(diff_zcat['PROGRAM']))

 
SUCCESS: Values unchanged as expected for: ['TARGETID', 'SURVEY', 'PROGRAM', 'SPGRPVAL', 'Z', 'ZERR', 'ZWARN', 'CHI2', 'NPIXELS', 'SPECTYPE', 'SUBTYPE', 'NCOEFF', 'DELTACHI2', 'COADD_FIBERSTATUS', 'TARGET_RA', 'TARGET_DEC', 'PMRA', 'PMDEC', 'REF_EPOCH', 'FA_TARGET', 'FA_TYPE', 'OBJTYPE', 'SUBPRIORITY', 'OBSCONDITIONS', 'RELEASE', 'BRICKNAME', 'BRICKID', 'BRICK_OBJID', 'MORPHTYPE', 'EBV', 'FLUX_G', 'FLUX_R', 'FLUX_Z', 'FLUX_W1', 'FLUX_W2', 'FLUX_IVAR_G', 'FLUX_IVAR_R', 'FLUX_IVAR_Z', 'FLUX_IVAR_W1', 'FLUX_IVAR_W2', 'FIBERFLUX_G', 'FIBERFLUX_R', 'FIBERFLUX_Z', 'FIBERTOTFLUX_G', 'FIBERTOTFLUX_R', 'FIBERTOTFLUX_Z', 'MASKBITS', 'SERSIC', 'SHAPE_R', 'SHAPE_E1', 'SHAPE_E2', 'REF_ID', 'REF_CAT', 'GAIA_PHOT_G_MEAN_MAG', 'GAIA_PHOT_BP_MEAN_MAG', 'GAIA_PHOT_RP_MEAN_MAG', 'PARALLAX', 'PHOTSYS', 'PRIORITY_INIT', 'NUMOBS_INIT', 'CMX_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET', 'SV1_BGS_TARGET', 'SV2_DESI_TARGET', 'SV2_BGS_TARGET', 'SV2_MWS_TARGET', 'SV2_SCND_TARGET', 'SV3_BGS_TARGET', 'SV3_
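
The check above only prints a message and leaves open whether an exception should be raised. If the notebook were run unattended, a failing check could be made to stop execution. The sketch below shows one way to do that; the exception class `VACValidationError` and the helper `require_no_differences` are hypothetical names, not part of the notebook.

```python
class VACValidationError(Exception):
    """Raised when a catalog comparison does not match the expectation."""

def require_no_differences(n_diff, columns):
    """Raise if any rows differ; otherwise print a short success message."""
    if n_diff != 0:
        raise VACValidationError(
            f"{n_diff} unexpected changed row(s) in columns: {columns}"
        )
    print(f"SUCCESS: values unchanged as expected for {len(columns)} column(s)")

# Example: zero differing rows passes silently apart from the message
require_no_differences(0, ['TARGETID', 'SURVEY', 'PROGRAM'])
```

In the notebook, the first argument would be `len(diff_zcat)` and the second `common_cols` or `fix_cols`.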

In [23]:
%%time
fix_cols = fix1_cols
diff_zcat = setdiff(tz[fix_cols], tb[fix_cols])

CPU times: user 17.1 s, sys: 1.81 s, total: 18.9 s
Wall time: 18.9 s


In [24]:
# fix1: 'CMX_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET' --> 0

Ndiff = len(diff_zcat)

print(" ")
if Ndiff==0: print(f"SUCCESS: Values unchanged as expected for: {fix_cols}")
else: print("ERROR: unexpected changes in the new VAC")  #Raise exception here?

 
SUCCESS: Values unchanged as expected for: ['TARGETID', 'SURVEY', 'PROGRAM', 'CMX_TARGET', 'BGS_TARGET', 'MWS_TARGET', 'SCND_TARGET']


## Checks on columns that have been modified

### Function that prints out some basic comparisons

In [25]:
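def check_fix_cols(fix_cols, N_correct, large_diff=None):
    """Compare `fix_cols` between the VAC (`tz`) and the original catalog (`tb`).

    The first three entries of `fix_cols` must be the identifier columns
    (TARGETID, SURVEY, PROGRAM); the remaining entries are the columns to test.
    `N_correct` is the expected number of differing rows. If `large_diff` is set,
    per-column differences are counted only above that relative threshold.
    Returns the table of differing rows, or [0] if there are none.
    """
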
    diff_zcat = setdiff(tz[fix_cols], tb[fix_cols])
    Ndiff = len(diff_zcat)
    
    print(fix_cols)
    print(f"N(rows) with a difference = {Ndiff}")
    print("Number as expected? ",Ndiff==N_correct)
    print(" ")
    print("Differences found in the following Surveys & Programs: ")
    print(np.unique(diff_zcat['SURVEY']))
    print(np.unique(diff_zcat['PROGRAM']))

    if Ndiff==0:
        return([0])
    # Table '1' = after; Table '2' = before
    test = join(diff_zcat, tb[fix_cols], join_type='left', keys=['TARGETID','SURVEY','PROGRAM'], table_names=['1','2'])

    print("======== STATS FOR INDIVIDUAL COLUMNS ========")
    for col in fix_cols[3:]:
        
        if large_diff:
            # This is not robust to dividing by zero 
            # (well, the value will be excluded so it's fine but it'll print an error)
            # possible improvement: divide by max([col1, col2])
            is_diff = abs((test[col+'_1']-test[col+'_2'])/test[col+'_1'])>large_diff          
        else: 
            is_diff = test[col+'_1']!=test[col+'_2']

        # Is the new value smaller or larger than before the changes?
        is_smaller = is_diff&(test[col+'_1']<test[col+'_2'])
        is_larger = is_diff&(test[col+'_1']>test[col+'_2']) 
            
        Ndiff = len(test[is_diff])
        Nsm = len(test[is_smaller])
        Nlarg = len(test[is_larger])


        if Ndiff>0:
            print(f"Column = {col}")
            print(f"   N(rows with minimum relative difference>{large_diff}) = {Ndiff}")
        if Nsm>0:
            print(f"   N(rows with new smaller value) = {len(test[is_smaller])}, with median={np.median(test[col+'_1'][is_smaller]-test[col+'_2'][is_smaller])}")
        else:
            print(f"   N(rows with new smaller value) = {len(test[is_smaller])}")
        if Nlarg>0:
            print(f"   N(rows with new larger value) = {len(test[is_larger])}, with median={np.median(test[col+'_1'][is_larger]-test[col+'_2'][is_larger])}")
        else:
            print(f"   N(rows with new larger value) = {len(test[is_larger])}")
    
    return(diff_zcat)
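
The comments inside the function note that the relative difference is not robust to division by zero and suggest dividing by the larger of the two values. A minimal sketch of that idea follows. It is an alternative definition, not the one used in `check_fix_cols` (which divides by the new value), and the helper name `relative_difference` is hypothetical.

```python
import numpy as np

def relative_difference(new, old):
    """|new - old| / max(|new|, |old|), defined as 0 where both values are 0."""
    new = np.asarray(new, dtype=float)
    old = np.asarray(old, dtype=float)
    diff = np.abs(new - old)
    scale = np.maximum(np.abs(new), np.abs(old))
    # Divide only where the scale is positive; other entries keep the initial 0
    return np.divide(diff, scale, out=np.zeros_like(diff), where=scale > 0)

# Toy values, including a pair of zeros and a zero new value
rel = relative_difference([0.0, 1.0, 2.0, 0.0], [0.0, 1.1, 1.0, 5.0])
is_large = rel > 0.10
print(rel)
print(is_large)
```

With this definition the result always lies between 0 and 1 for finite inputs, so a threshold such as `large_diff=0.10` has the same meaning whichever of the two values is larger.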

### Run one test per cell (~20-25 sec each)

In [26]:
%%time
# Should change only for CMX targets, which are now set to DESI_TARGET=0
fix2_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', 'DESI_TARGET']

if zcat_type=='healpix':
    N2_correct = 1039
else:
    N2_correct = 0
    
survey2_correct = 'cmx'

diff_zcat = check_fix_cols(fix2_cols, N2_correct)

['TARGETID', 'SURVEY', 'PROGRAM', 'DESI_TARGET']
N(rows) with a difference = 1039
Number as expected?  True
 
Differences found in the following Surveys & Programs: 
SURVEY
------
   cmx
PROGRAM
-------
  other
Column = DESI_TARGET
   N(rows with minimum relative difference>None) = 1039
   N(rows with new smaller value) = 1039, with median=-4294967296.0
   N(rows with new larger value) = 0
CPU times: user 22.8 s, sys: 486 ms, total: 23.3 s
Wall time: 23.2 s
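
The median difference reported above, -4294967296, equals -2**32, which is what one would see if a single bit of the targeting bitmask were cleared. To see which bits differ between two mask values, the before and after values can be XORed. The sketch below uses hypothetical mask values and assumes non-negative integers; `changed_bits` is not part of the notebook.

```python
def changed_bits(value_before, value_after):
    """Return the sorted bit positions that differ between two non-negative bitmasks."""
    xor = int(value_before) ^ int(value_after)
    return [bit for bit in range(xor.bit_length()) if (xor >> bit) & 1]

# Hypothetical before/after values that differ by exactly 2**32
before = 2**32 + 2**8
after = 2**8
print(after - before)               # -4294967296
print(changed_bits(before, after))  # [32]
```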


In [27]:
%%time
# Changes for just 42 rows compared to zall-pix-fuji:
#   39 rows for (SV1_DESI_TARGET + SV1_SCND_TARGET) + 3 rows for just SV1_MWS_TARGET
fix3_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', \
             'SV1_DESI_TARGET', 'SV1_MWS_TARGET', 'SV1_SCND_TARGET']

if zcat_type=='healpix':
    N3_correct = 42
    survey3_correct = 'sv1'

    diff_zcat = check_fix_cols(fix3_cols, N3_correct)

    print("==============================================================")
    print("CHECK: New values should always be LARGER and in SV1-dark only")
    print("==============================================================")

['TARGETID', 'SURVEY', 'PROGRAM', 'SV1_DESI_TARGET', 'SV1_MWS_TARGET', 'SV1_SCND_TARGET']
N(rows) with a difference = 42
Number as expected?  True
 
Differences found in the following Surveys & Programs: 
SURVEY
------
   sv1
PROGRAM
-------
   dark
Column = SV1_DESI_TARGET
   N(rows with minimum relative difference>None) = 40
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 40, with median=4.611686018427388e+18
Column = SV1_MWS_TARGET
   N(rows with minimum relative difference>None) = 12
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 12, with median=2.0
Column = SV1_SCND_TARGET
   N(rows with minimum relative difference>None) = 39
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 39, with median=34359738368.0
CHECK: New values should always be LARGER and in SV1-dark only
CPU times: user 23.3 s, sys: 644 ms, total: 24 s
Wall time: 23.9 s


In [28]:
%%time

# Expecting changes in SV3 (bright|dark) for ToO targets (RELEASE=9999 from decode_targetid)
fix4_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', \
             'SV3_DESI_TARGET', 'SV3_BGS_TARGET', 'SV3_MWS_TARGET', 'SV3_SCND_TARGET']

if zcat_type=='healpix':
    N4_correct = 230
    survey4_correct = 'sv3'
    release4_correct = 9999

    diff_zcat = check_fix_cols(fix4_cols, N4_correct)

    print("==============================================================")
    print("If the change is for ToO then RELEASE = 9999 from TARGETID")
    _,_,release,_,_,_ = decode_targetid(diff_zcat['TARGETID'])
    print(np.unique(release))

['TARGETID', 'SURVEY', 'PROGRAM', 'SV3_DESI_TARGET', 'SV3_BGS_TARGET', 'SV3_MWS_TARGET', 'SV3_SCND_TARGET']
N(rows) with a difference = 230
Number as expected?  True
 
Differences found in the following Surveys & Programs: 
SURVEY
------
   sv3
PROGRAM
-------
 bright
   dark
Column = SV3_DESI_TARGET
   N(rows with minimum relative difference>None) = 230
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 230, with median=4.611686018427388e+18
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 0
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 0
Column = SV3_SCND_TARGET
   N(rows with minimum relative difference>None) = 230
   N(rows with new smaller value) = 0
   N(rows with new larger value) = 230, with median=1.152921504606847e+18
If the change is for ToO then RELEASE = 9999 from TARGETID
TARGETID
--------
    9999
CPU times: user 24.5 s, sys: 611 ms, total: 25.1 s
Wall time: 25 s
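
The `RELEASE` value is packed into the `TARGETID` itself, which is why `decode_targetid` can recover it. The sketch below illustrates the idea with plain bit operations. The bit layout (22 bits for the object ID, 20 bits for the brick ID, 16 bits for the release) is an assumption about the encoding rather than something stated in this notebook, but it does reproduce the `BRICK_OBJID`, `BRICKID`, and `RELEASE` values of the first `NPIXELS=0` example row printed later in this notebook. `decode_targetid_sketch` is a hypothetical stand-in and does not replace `desitarget.targets.decode_targetid`.

```python
def decode_targetid_sketch(targetid):
    """Extract (objid, brickid, release) from a TARGETID, assuming a 22/20/16-bit layout."""
    targetid = int(targetid)
    objid = targetid & ((1 << 22) - 1)            # assumed: lowest 22 bits
    brickid = (targetid >> 22) & ((1 << 20) - 1)  # assumed: next 20 bits
    release = (targetid >> 42) & ((1 << 16) - 1)  # assumed: next 16 bits
    return objid, brickid, release

# TARGETID of the first NPIXELS=0 example row in this notebook
objid, brickid, release = decode_targetid_sketch(39628473198708395)
print(objid, brickid, release)  # 1707 494512 9010
```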


In [29]:
# These columns will change for most surveys/programs. MEAN_PSF_TO_FIBER_SPECFLUX might
# be expected to always be greater than or equal to the original value (to be checked).
fix5_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', \
             'MEAN_DELTA_X', 'RMS_DELTA_X', 'MEAN_DELTA_Y', 'RMS_DELTA_Y', \
             'MEAN_PSF_TO_FIBER_SPECFLUX']

#- TODO: investigate the numbers more closely because small differences at numerical precision level 
#  can influence the exact numbers (expected behavior described below is consistent so far)
if zcat_type=='healpix':
    N5_correct = 150153
else:
    N5_correct = 69220
    
diff_zcat = check_fix_cols(fix5_cols, N5_correct)
print("==============================================================")
print(" Expecting the following for Differences: ")
print("  - Most values of RMS_DELTA_{X|Y} should be smaller (some unchanged)")
print("  - Values of MEAN_DELTA_{X|Y} should be ~equally smaller or larger")
print("  - Most values of MEAN_PSF_TO_FIBER_SPECFLUX should be larger (some unchanged)")

['TARGETID', 'SURVEY', 'PROGRAM', 'MEAN_DELTA_X', 'RMS_DELTA_X', 'MEAN_DELTA_Y', 'RMS_DELTA_Y', 'MEAN_PSF_TO_FIBER_SPECFLUX']
N(rows) with a difference = 138509
Number as expected?  False
 
Differences found in the following Surveys & Programs: 
 SURVEY
-------
    cmx
special
    sv1
    sv2
    sv3
PROGRAM
-------
 backup
 bright
   dark
  other
Column = MEAN_DELTA_X
   N(rows with minimum relative difference>None) = 138332
   N(rows with new smaller value) = 70830, with median=-0.5089362263679504
   N(rows with new larger value) = 67502, with median=0.505854070186615
Column = RMS_DELTA_X
   N(rows with minimum relative difference>None) = 138377
   N(rows with new smaller value) = 132371, with median=-1.0288488864898682
   N(rows with new larger value) = 6006, with median=0.0010273351799696684
Column = MEAN_DELTA_Y
   N(rows with minimum relative difference>None) = 138287
   N(rows with new smaller value) = 68680, with median=-0.5090000033378601
   N(rows with new larger value) = 696

In [30]:
# Should change for most rows because of cos(dec) term
fix6_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', 'STD_FIBER_RA']

#- TODO: investigate the numbers more closely because small differences at numerical precision level 
#  can influence the exact numbers (expected behavior described below is consistent so far)
if zcat_type=='healpix':
    N6_correct = 1434375
else:
    N6_correct = 1197838

diff_zcat = check_fix_cols(fix6_cols, N6_correct)
print("==============================================================")
print("Expecting most values of STD_FIBER_RA to be smaller")

['TARGETID', 'SURVEY', 'PROGRAM', 'STD_FIBER_RA']
N(rows) with a difference = 1434375
Number as expected?  True
 
Differences found in the following Surveys & Programs: 
 SURVEY
-------
    cmx
special
    sv1
    sv2
    sv3
PROGRAM
-------
 backup
 bright
   dark
  other
Column = STD_FIBER_RA
   N(rows with minimum relative difference>None) = 1434375
   N(rows with new smaller value) = 1433215, with median=-0.03680609166622162
   N(rows with new larger value) = 1160, with median=0.008712729439139366
Expecting most values of STD_FIBER_RA to be smaller
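
The expectation that `STD_FIBER_RA` becomes smaller follows from the cos(dec) term: a given scatter in the RA coordinate corresponds to a smaller on-sky scatter away from the equator. The exact formula used to build the VAC is not shown in this notebook, so the sketch below only illustrates the scaling; `ra_scatter_on_sky` is a hypothetical helper.

```python
import math

def ra_scatter_on_sky(std_ra, dec_deg):
    """Scale a scatter in the RA coordinate by cos(dec) to get the on-sky scatter."""
    return std_ra * math.cos(math.radians(dec_deg))

# The factor is 1 at the equator and decreases toward the poles
for dec in (0.0, 30.0, 60.0):
    print(dec, round(ra_scatter_on_sky(1.0, dec), 4))
```

Since |cos(dec)| is at most 1, the scaled value can never exceed the unscaled one in this simplified picture.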


In [31]:
# Without the STD_FIBER_RA which has a cos(dec) term
fix7_cols =  ['TARGETID', 'SURVEY', 'PROGRAM', \
             'MEAN_FIBER_RA', 'MEAN_FIBER_DEC', 'STD_FIBER_DEC']

if zcat_type=='healpix':
    N7_correct = 155330
else:
    N7_correct = 516125

diff_zcat = check_fix_cols(fix7_cols, N7_correct)
print("==============================================================")
print(" Expecting the following for Differences: ")
print("  - Most values of STD_FIBER_DEC should be smaller (some unchanged)")
print("  - Values of MEAN_FIBER_{RA|DEC} should be ~equally smaller or larger")
print("==============================================================")
print(" ")
print(" Besides small differences, a subset will have >10% changes ")
print(" Namely, ~10-15k rows should have larger MEAN_FIBER_{RA|DEC}")
print(" ")
print(" Warning: this calculation is not relevant for STD_FIBER_DEC. ")
print(" ")

# Call again with a threshold of 10% difference
diff_zcat = check_fix_cols(['TARGETID', 'SURVEY', 'PROGRAM', \
             'MEAN_FIBER_RA', 'MEAN_FIBER_DEC'], N7_correct,large_diff=0.10)


['TARGETID', 'SURVEY', 'PROGRAM', 'MEAN_FIBER_RA', 'MEAN_FIBER_DEC', 'STD_FIBER_DEC']
N(rows) with a difference = 155330
Number as expected?  True
 
Differences found in the following Surveys & Programs: 
 SURVEY
-------
    cmx
special
    sv1
    sv2
    sv3
PROGRAM
-------
 backup
 bright
   dark
  other
Column = MEAN_FIBER_RA
   N(rows with minimum relative difference>None) = 155326
   N(rows with new smaller value) = 69893, with median=-0.002655959640179617
   N(rows with new larger value) = 85433, with median=0.004128272000656352
Column = MEAN_FIBER_DEC
   N(rows with minimum relative difference>None) = 155327
   N(rows with new smaller value) = 73457, with median=-0.002289701038964864
   N(rows with new larger value) = 81870, with median=0.0027925135916015975
Column = STD_FIBER_DEC
   N(rows with minimum relative difference>None) = 155328
   N(rows with new smaller value) = 151217, with median=-13.61166763305664
   N(rows with new larger value) = 4111, with median=0.010217368602

## Check on new columns

In [32]:
# New columns
new_cols = ['FIRSTNIGHT', 'LASTNIGHT', 'MIN_MJD', 'MEAN_MJD', 'MAX_MJD']

In [33]:
# Convert table to dataframe to use describe()
df_new = tz[new_cols].to_pandas()

In [34]:
res = df_new.describe()

In [35]:
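# Add a column with the difference between the summary statistics of MAX_MJD and MIN_MJD
# (this subtracts the statistics; it is not the statistics of the per-row difference)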
res['MAX_MIN_DIFF'] = res['MAX_MJD'] - res['MIN_MJD']

In [36]:
res

| | FIRSTNIGHT | LASTNIGHT | MIN_MJD | MEAN_MJD | MAX_MJD | MAX_MIN_DIFF |
|---|---|---|---|---|---|---|
| count | 2451325.0 | 2451325.0 | 2451325.0 | 2451325.0 | 2451325.0 | 0.0 |
| mean | 20210020.0 | 20210160.0 | 59301.67 | 59306.09 | 59312.24 | 10.565186 |
| std | 1761.005 | 1491.326 | 41.60221 | 38.46983 | 34.91973 | -6.68248 |
| min | 20201210.0 | 20201220.0 | 59198.18 | 59198.2 | 59198.22 | 0.040346 |
| 25% | 20210310.0 | 20210330.0 | 59289.12 | 59293.14 | 59305.34 | 16.223611 |
| 50% | 20210410.0 | 20210410.0 | 59315.22 | 59317.27 | 59318.35 | 3.132923 |
| 75% | 20210420.0 | 20210500.0 | 59330.13 | 59333.68 | 59338.3 | 8.168697 |
| max | 20210610.0 | 20210610.0 | 59376.21 | 59376.21 | 59376.21 | 0.0 |
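
Note that `MAX_MIN_DIFF` above subtracts one summary statistic from another, which in general differs from the statistics of the per-row time span `MAX_MJD - MIN_MJD`. The sketch below uses hypothetical toy values to show the difference and to add a per-row sanity check that the span is never negative.

```python
import numpy as np

# Hypothetical toy values (not taken from the catalog)
min_mjd = np.array([0.0, 0.0, 10.0])
max_mjd = np.array([1.0, 20.0, 11.0])

# Per-row span, then its statistics
span = max_mjd - min_mjd
median_of_span = np.median(span)

# Difference of the column medians, as in the MAX_MIN_DIFF column
diff_of_medians = np.median(max_mjd) - np.median(min_mjd)

print(median_of_span, diff_of_medians)  # 1.0 11.0
print("All spans non-negative:", bool(np.all(span >= 0)))
```

With the real table, `tz['MAX_MJD'] - tz['MIN_MJD']` would play the role of `span`.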


## Check on edge case tile

Tile 80870 is duplicated: it has two values of `LASTNIGHT` (20210512 and 20210513) rather than a single one. Check that both entries still exist, which implies N=10,000 rows instead of the 5,000 rows of a normal single tile.

In [37]:
if zcat_type=='cumulative':
    tiledup = 80870
    
    istile = tz['TILEID']==tiledup
    
    print(len(tz[istile]))
    print(np.unique(tz['LASTNIGHT'][istile]))
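
The same idea can be generalized to look for any tile with more than one `LASTNIGHT`, rather than checking a single known tile. This is a sketch on hypothetical toy rows; `nights_per_tile` is not part of the notebook, and with the real table one would pass `tz['TILEID']` and `tz['LASTNIGHT']`.

```python
import numpy as np

def nights_per_tile(tileid, lastnight):
    """Map each TILEID to the sorted unique LASTNIGHT values found for it."""
    tileid = np.asarray(tileid)
    lastnight = np.asarray(lastnight)
    return {int(t): np.unique(lastnight[tileid == t]).tolist()
            for t in np.unique(tileid)}

# Toy rows: tile 80870 appears with two nights; tile 1 is a made-up normal tile
toy_tileid = [80870, 80870, 80870, 1]
toy_lastnight = [20210512, 20210513, 20210512, 20210101]

nights = nights_per_tile(toy_tileid, toy_lastnight)
duplicated = {t: n for t, n in nights.items() if len(n) > 1}
print(duplicated)  # {80870: [20210512, 20210513]}
```

The boolean selection inside the loop is simple but scans the column once per tile, so for the full catalog a grouped operation would likely be faster.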

## Remaining issues for future action

In [38]:
npixzero = tz['NPIXELS']==0

print(np.unique(tz['ZWARN'][npixzero]))
#print(np.unique(tz['COADD_FIBERSTATUS'][npixzero]))  # a lot of different values

print(f"Cases with NPIXELS=0 and ZWARN=0: {len(tz[npixzero&(tz['ZWARN']==0)])}")
print(f"Cases with NPIXELS=0 and COADD_FIBERSTATUS=0: {len(tz[npixzero&(tz['COADD_FIBERSTATUS']==0)])}")

ZWARN
-----
 1570
 1571
 1698
 1699
 3618
 3619
 3746
 3747
Cases with NPIXELS=0 and ZWARN=0: 0
Cases with NPIXELS=0 and COADD_FIBERSTATUS=0: 614
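
The eight `ZWARN` values listed above share most of their bits. A quick way to see this is to AND them together and list the bit positions that are always set, plus those that vary. The sketch below works on the printed values only and deliberately does not attach names to the bits, since the bit definitions are not given in this notebook; `set_bits` is a hypothetical helper.

```python
from functools import reduce

# ZWARN values printed above for rows with NPIXELS=0
zwarn_values = [1570, 1571, 1698, 1699, 3618, 3619, 3746, 3747]

def set_bits(value):
    """Return the sorted bit positions that are set in a non-negative integer."""
    value = int(value)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

common_mask = reduce(lambda a, b: a & b, zwarn_values)   # bits set in every value
any_mask = reduce(lambda a, b: a | b, zwarn_values)      # bits set in at least one value

print(common_mask, set_bits(common_mask))        # 1570 [1, 5, 9, 10]
print(set_bits(any_mask ^ common_mask))          # [0, 7, 11]
```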


In [39]:
tz[npixzero][:5]

TARGETID,SURVEY,PROGRAM,HEALPIX,SPGRPVAL,Z,ZERR,ZWARN,CHI2,COEFF,NPIXELS,SPECTYPE,SUBTYPE,NCOEFF,DELTACHI2,COADD_FIBERSTATUS,TARGET_RA,TARGET_DEC,PMRA,PMDEC,REF_EPOCH,FA_TARGET,FA_TYPE,OBJTYPE,SUBPRIORITY,OBSCONDITIONS,RELEASE,BRICKNAME,BRICKID,BRICK_OBJID,MORPHTYPE,EBV,FLUX_G,FLUX_R,FLUX_Z,FLUX_W1,FLUX_W2,FLUX_IVAR_G,FLUX_IVAR_R,FLUX_IVAR_Z,FLUX_IVAR_W1,FLUX_IVAR_W2,FIBERFLUX_G,FIBERFLUX_R,FIBERFLUX_Z,FIBERTOTFLUX_G,FIBERTOTFLUX_R,FIBERTOTFLUX_Z,MASKBITS,SERSIC,SHAPE_R,SHAPE_E1,SHAPE_E2,REF_ID,REF_CAT,GAIA_PHOT_G_MEAN_MAG,GAIA_PHOT_BP_MEAN_MAG,GAIA_PHOT_RP_MEAN_MAG,PARALLAX,PHOTSYS,PRIORITY_INIT,NUMOBS_INIT,CMX_TARGET,DESI_TARGET,BGS_TARGET,MWS_TARGET,SCND_TARGET,SV1_DESI_TARGET,SV1_BGS_TARGET,SV1_MWS_TARGET,SV1_SCND_TARGET,SV2_DESI_TARGET,SV2_BGS_TARGET,SV2_MWS_TARGET,SV2_SCND_TARGET,SV3_DESI_TARGET,SV3_BGS_TARGET,SV3_MWS_TARGET,SV3_SCND_TARGET,PLATE_RA,PLATE_DEC,COADD_NUMEXP,COADD_EXPTIME,COADD_NUMNIGHT,COADD_NUMTILE,MEAN_DELTA_X,RMS_DELTA_X,MEAN_DELTA_Y,RMS_DELTA_Y,MEAN_FIBER_RA,STD_FIBER_RA,MEAN_FIBER_DEC,STD_FIBER_DEC,MEAN_PSF_TO_FIBER_SPECFLUX,TSNR2_GPBDARK_B,TSNR2_ELG_B,TSNR2_GPBBRIGHT_B,TSNR2_LYA_B,TSNR2_BGS_B,TSNR2_GPBBACKUP_B,TSNR2_QSO_B,TSNR2_LRG_B,TSNR2_GPBDARK_R,TSNR2_ELG_R,TSNR2_GPBBRIGHT_R,TSNR2_LYA_R,TSNR2_BGS_R,TSNR2_GPBBACKUP_R,TSNR2_QSO_R,TSNR2_LRG_R,TSNR2_GPBDARK_Z,TSNR2_ELG_Z,TSNR2_GPBBRIGHT_Z,TSNR2_LYA_Z,TSNR2_BGS_Z,TSNR2_GPBBACKUP_Z,TSNR2_QSO_Z,TSNR2_LRG_Z,TSNR2_GPBDARK,TSNR2_ELG,TSNR2_GPBBRIGHT,TSNR2_LYA,TSNR2_BGS,TSNR2_GPBBACKUP,TSNR2_QSO,TSNR2_LRG,SV_NSPEC,SV_PRIMARY,ZCAT_NSPEC,ZCAT_PRIMARY,MIN_MJD,MEAN_MJD,MAX_MJD,FIRSTNIGHT,LASTNIGHT
int64,bytes7,bytes6,int32,int32,float64,float64,int64,float64,float64[10],int64,bytes6,bytes20,int64,float64,int32,float64,float64,float32,float32,float32,int64,uint8,bytes3,float64,int32,int16,bytes8,int32,int32,bytes4,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,int16,float32,float32,float32,float32,int64,bytes2,float32,float32,float32,float32,bytes1,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,int64,float64,float64,int16,float32,int16,int16,float32,float32,float32,float32,float64,float32,float64,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,int32,bool,int64,bool,float64,float64,float64,int32,int32
39628473198708395,cmx,other,2154,2154,-0.0019956912923479,4.131149357334911e-48,1570,8.999999999999996e+99,0.0 .. 0.0,0,STAR,CV,3,1.942668892225729e+84,512,23.66196767736725,29.84758879289675,0.0,0.0,2020.9597,9007199254742016,1,TGT,0.3743222091683128,7,9010,,494512,1707,DEV,0.056008916,0.8742358,4.4879527,14.53286,40.183647,23.470558,846.09424,161.24467,27.071745,-1.0,-1.0,0.30432662,1.5622828,5.0589743,0.30432662,1.5622828,5.0589743,0,4.0,1.4857041,-0.47312373,0.34610084,0,--,0.0,0.0,0.0,0.0,S,3200,1,9007199254742016,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23.66196767736725,29.84758879289675,0,0.0,0,0,0.0,0.0,0.0,0.0,23.6619676773673,0.0,29.8475887928968,0.0,0.7702122,334.5758,0.23833227,63.154266,251.84634,1200.8414,489.7509,6.6365247,1.7938427,29931.836,67.48044,5253.2964,0.107736714,5998.398,33749.49,20.86845,95.85028,4.4280867e-05,226.88919,8.178434e-06,0.0,9751.99,5.995135e-05,48.05161,102.743744,30266.412,294.60797,5316.4507,251.95409,16951.23,34239.24,75.55658,200.38788,0,False,1,True,59200.06640136,59200.095110125,59200.12381137,20201216,20201216
39628473198711342,cmx,other,2152,2152,-0.0019956912923479,4.131149357334911e-48,1570,8.999999999999996e+99,0.0 .. 0.0,0,STAR,CV,3,1.942668892225729e+84,512,23.80220668826011,29.832150182607567,0.0,0.0,2020.9597,1280,1,TGT,0.7489200605083568,5,9010,,494512,4654,DEV,0.053667612,2.6539938,11.6347475,23.4123,50.11407,35.76747,595.7886,118.594986,25.252254,-1.0,-1.0,1.2104546,5.3064675,10.678067,1.2104546,5.3064675,10.678067,0,4.0,0.6114804,-0.016180348,-0.099382535,0,--,0.0,0.0,0.0,0.0,S,3200,1,1280,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23.80220668826011,29.832150182607567,0,0.0,0,0,0.0,0.0,0.0,0.0,23.8022066882601,0.0,29.832150182607588,0.0,0.7802256,336.3489,0.24101405,63.501137,254.75517,1214.5858,492.71896,6.7145753,1.8078632,30322.682,68.46811,5322.6733,0.10913243,6050.4844,34205.188,21.129179,97.14442,4.468534e-05,229.07835,8.253198e-06,0.0,9876.345,6.0501185e-05,48.593376,103.95344,30659.031,297.78748,5386.1743,254.8643,17141.414,34697.906,76.43713,202.90572,0,False,1,True,59200.06640136,59200.095110125,59200.12381137,20201216,20201216
39628473202903539,cmx,other,2152,2152,-0.0019956912923479,4.131149357334911e-48,1570,8.999999999999996e+99,0.0 .. 0.0,0,STAR,CV,3,1.942668892225729e+84,512,23.9976460870853,29.8298292545281,0.0,0.0,2020.9597,4096,1,TGT,0.0739111871605955,1,9010,,494513,2547,PSF,0.04923722,0.36979324,0.9976344,1.748319,2.46375,3.3211248,1973.6221,577.8516,77.217896,-1.0,-1.0,0.28765,0.776027,1.3599598,0.28765,0.776027,1.3599598,0,0.0,0.0,0.0,0.0,0,--,0.0,0.0,0.0,0.0,S,3400,1,4096,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23.9976460870853,29.8298292545281,0,0.0,0,0,0.0,0.0,0.0,0.0,23.9976460870853,0.0,29.8298292545281,0.0,0.789,354.91718,0.2617178,66.83243,279.138,1302.0255,514.57477,7.2949877,1.9679965,32698.225,72.3826,5724.0327,0.12486173,6486.4995,36558.098,22.485813,103.08629,5.2723684e-05,243.56364,9.7335205e-06,0.0,10398.31,7.1218026e-05,51.462486,110.08269,33053.14,316.20795,5790.865,279.26285,18186.834,37072.67,81.243286,215.13696,0,False,1,True,59200.06640136,59200.095110125,59200.12381137,20201216,20201216
39628473207095521,cmx,other,2153,2153,-0.0019956912923479,4.131149357334911e-48,1570,8.999999999999996e+99,0.0 .. 0.0,0,STAR,CV,3,1.942668892225729e+84,512,24.163411373260956,29.79216658799389,0.0,0.0,2020.9597,2048,1,TGT,0.555666204166491,3,9010,,494514,225,PSF,0.045480132,0.30380702,0.3845816,1.4267591,8.489307,7.9283123,1792.5405,496.92365,77.64431,-1.0,-1.0,0.23662426,0.29953662,1.1112508,0.23662426,0.29953662,1.1112508,0,0.0,0.0,0.0,0.0,0,--,0.0,0.0,0.0,0.0,S,3000,1,2048,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24.163411373260956,29.79216658799389,0,0.0,0,0,0.0,0.0,0.0,0.0,24.163411373261,0.0,29.792166587993894,0.0,0.789,337.05124,0.2431455,63.58909,253.58008,1227.6969,492.40305,6.760502,1.8641728,31537.506,69.669136,5534.6167,0.12165418,6253.054,35545.855,21.641685,99.22704,5.071971e-05,230.93983,9.3810295e-06,0.0,9997.022,6.897441e-05,49.029175,104.93886,31874.557,300.8521,5598.2056,253.70174,17477.773,36038.258,77.431366,206.03008,0,False,1,True,59200.06640136,59200.095110125,59200.12381137,20201216,20201216
39628478437397782,cmx,other,2154,2154,-0.0019956912923479,4.131149357334911e-48,1570,8.999999999999996e+99,0.0 .. 0.0,0,STAR,CV,3,1.942668892225729e+84,512,23.004779136437485,30.119407147896,0.0,0.0,2020.9597,2048,1,TGT,0.8550797457575564,3,9010,,495761,5398,DEV,0.06340205,0.66084456,1.152647,3.5934534,9.887055,9.529856,921.0356,261.8495,31.15059,-1.0,-1.0,0.24227671,0.42257974,1.3174204,0.24281985,0.45946148,1.447525,0,4.0,1.2109119,0.2759375,-0.31842917,0,--,0.0,0.0,0.0,0.0,S,3000,1,2048,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23.004779136437485,30.119407147896,0,0.0,0,0,0.0,0.0,0.0,0.0,23.0047791364375,0.0,30.119407147896005,0.0,0.77242905,657.1011,0.2452533,121.30457,132.81993,1427.7833,882.08527,6.589169,2.9969666,31845.826,75.80818,5568.016,0.09411048,6172.1987,35468.086,22.631115,105.63433,5.724519e-05,226.35516,1.0555282e-05,0.0,10588.217,7.692063e-05,49.46934,107.10196,32502.928,302.4086,5689.321,132.91405,18188.2,36350.17,78.68962,215.73325,0,False,1,True,59200.06640136,59200.095110125,59200.12381137,20201216,20201216


## Compare with targeting patch file

The patch file covers rare edge cases with conflicting or missing information, including a few secondary targets and ToO (target of opportunity) observations. It is provided alongside the VAC for information and completeness, but the corrections are *already applied*.

In [40]:
patchfile = vac_dir+"zall-pix-targeting-patch-edr.fits"
tg = Table.read(patchfile)
print(f"Check if N=272: {(len(tg)==272)}")

Check if N=272: True


In [41]:
fits.info(patchfile)

Filename: ../../fuji/nersc/zall-pix-targeting-patch-edr.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()      
  1  TARGETPATCH    1 BinTableHDU     49   272R x 20C   [K, 7A, 6A, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K, K]   


In [42]:
# The patch file is only used for zpix and not for ztile; this cell is for information
# (reporting how many rows should have been patched).
# Uncomment the lines below to examine example rows, or add checks to compare values.

if zcat_type=='healpix':
    surveys = np.unique(tg['SURVEY'])
    for survey in surveys:

        programs = np.unique(tg['PROGRAM'][tg['SURVEY']==survey])
        for program in programs:
            tg_sel = tg[(tg['SURVEY']==survey)&(tg['PROGRAM']==program)]
            tz_sel = tz[(tz['SURVEY']==survey)&(tz['PROGRAM']==program)]
            need_patch = np.isin(tz_sel['TARGETID'], tg_sel['TARGETID'])

            Npatch = len(tz_sel[need_patch])
            print(f"In Survey={survey}; Program={program}; Nb that need targeting correction= {Npatch} [{np.round(Npatch/len(tz_sel)*100,3)}%]")

            # Uncomment to print the first 10 results of the corrected table and patching table
    #        print(tz_sel['TARGETID','SV1_DESI_TARGET','SV1_MWS_TARGET','SV3_DESI_TARGET'][need_patch][:10])
    #        print(tg_sel['TARGETID','SV1_DESI_TARGET','SV1_MWS_TARGET','SV3_DESI_TARGET'][:10])

In Survey=sv1; Program=dark; Nb that need targeting correction= 42 [0.011%]
In Survey=sv3; Program=bright; Nb that need targeting correction= 143 [0.025%]
In Survey=sv3; Program=dark; Nb that need targeting correction= 87 [0.012%]
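
The three counts above add up to 42 + 143 + 87 = 272, which matches the number of rows in the patch file. The counting itself relies on `np.isin` to flag catalog rows whose `TARGETID` appears in the patch table. A minimal sketch of that membership test on hypothetical toy IDs:

```python
import numpy as np

# Hypothetical toy IDs (not real TARGETIDs)
catalog_ids = np.array([101, 102, 103, 104, 105])
patch_ids = np.array([103, 105, 999])   # 999 is absent from the catalog

# True for catalog rows whose ID appears in the patch list
need_patch = np.isin(catalog_ids, patch_ids)
n_patch = int(need_patch.sum())
pct = 100.0 * n_patch / len(catalog_ids)

print(need_patch)
print(f"{n_patch} row(s) need patching [{pct:.1f}%]")
```

Patch IDs with no match in the catalog, such as 999 here, are simply ignored by this test, so comparing the total count against the patch file length is a useful extra check.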
