In [10]:
# means that my matplotlib graphs will be included in the notebook, next to the code
%matplotlib inline

import os

import math
import astropy
import random
import numpy as np
import tables as tb
import pandas as pd
import matplotlib.pyplot as plt

from astropy.table import Table, Column, join
from astropy.coordinates import SkyCoord
from astropy.io import fits
import astropy.units as u

from hetdex_tools.get_spec import get_spectra
from hetdex_api.config import HDRconfig
from hetdex_api.detections import Detections
from hetdex_api.elixer_widget_cls import ElixerWidget

In [2]:
# not sure why the code below is here, it was in the Detections database and API notebook
# https://github.com/HETDEX/hetdex_api/blob/master/notebooks/api-notebooks/03-Detections_Database_and_API.ipynb

In [3]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>

### Opens the catalogs and turns them into dataframes

I like to open both catalogs separately since they are both big (HDR3 especially!)

In [4]:
# Opening the HDR3 source catalog (TODO: double check that this is the detections catalog) and converting it into a pandas DF
# HDR3 holds the detections HETDEX found
HDR_source_cat = fits.open('/home/jovyan/Hobby-Eberly-Telesco/hdr3/catalogs/source_catalog_3.0.1.fits', memmap = True)
HDR3_data = HDR_source_cat[1].data
HDR3_DF = pd.DataFrame(HDR3_data, columns=HDR3_data.columns.names)

In [5]:
# Opening H20 NEP catalog and converting it into a pandas DF
# H20 is strictly photometric sources
H20_NEP_catalog = fits.open('H20_NEP_VIRUS_OVERLAP_CAT_10_2021.fits', memmap = True)
H20_NEP_data = H20_NEP_catalog[1].data
H20_NEP_DF = pd.DataFrame(H20_NEP_data, columns=H20_NEP_data.columns.names)

In [6]:
# Columns we will then take from the entire data set (it was huge so we needed to determine what we wanted to look at specifically).
# As the name suggests, these are the ones that are useful to us!
useful_hdr3_cols = ['source_id', 'detectid',  'selected_det', 'ra_mean', 'dec_mean', 'fwhm', 'shotid', 'field',  'ra', 'dec', 'wave', 'wave_err', 'flux', 'flux_err', 'sn', 'sn_err', 'chi2', 'chi2_err',
'linewidth', 'linewidth_err', 'plya_classification', 'z_hetdex', 'z_hetdex_conf', 'combined_plae']

# For now, the only useful columns for us in H20 NEP are RA, DEC, and the VALID_SOURCE_MODELING flag.
useful_h20nep_cols = ['RA_MODELING', 'DEC_MODELING', 'VALID_SOURCE_MODELING']

# From the original DF, taking the useful columns
reduced_hdr3_df = HDR3_DF.loc[:, useful_hdr3_cols]
reduced_h20nep_df = H20_NEP_DF.loc[:, useful_h20nep_cols]

### Cleaning up the data

In [11]:
# Removing shots taken before 2018 (shotid < 20180000000) because that data isn't good (not useful to us)
# No need to do this for H20 NEP
removed_bad_shots_hdr3_df = reduced_hdr3_df[reduced_hdr3_df.shotid.values >= 20180000000]
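The cutoff above relies on how a shotid is built. A minimal sketch, assuming the shotid is the observation date (YYYYMMDD) followed by a three-digit observation number, which matches the shot IDs printed in the extraction logs in this notebook (e.g. 20210608013). The helper name `split_shotid` is hypothetical and not part of hetdex_api:

```python
def split_shotid(shotid):
    """Split a shotid into (year, month, day, obs), assuming the YYYYMMDDNNN layout."""
    date, obs = divmod(int(shotid), 1000)    # last three digits: observation number
    year, monthday = divmod(date, 10000)     # leading four digits: year
    month, day = divmod(monthday, 100)
    return year, month, day, obs


# Under this layout every shot from 2018 onward is >= 20180000000,
# so the filter keeps 2018 and later.
print(split_shotid(20210608013))
```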

### Filtering data. For HDR3 we use a signal-to-noise ratio greater than 6.5, and for H20 NEP we check that VALID_SOURCE_MODELING is true

In [12]:
# This gives high-confidence detections, which is something we want as well. Open question: what is the sn threshold that Valentina's code has trouble with?
# The reason: we want high-confidence Lya. If a detection passes the sn cut and one other filter, we consider it high-confidence Lya.
# Once we have the noise sample and the high-confidence sample, we can start expanding on Valentina's code and do our own work.
hdr3_signal_to_noise_interval = removed_bad_shots_hdr3_df[removed_bad_shots_hdr3_df['sn'] > 6.5]

# For now, no need to specify a field. But once trained, we want to run this for the NEP field!
# The VALID_SOURCE_MODELING column tells us whether the model was able to converge and get fluxes from the source.
# False means the model used to measure the fluxes failed somehow, so we cannot use that galaxy reliably.
# We want the True ones, since we know the model was able to find a galaxy and we can use it for
# our imaging counterpart identification (i.e. to check whether there is a galaxy at our new extraction coordinate)
h20_valid_source_check = reduced_h20nep_df[reduced_h20nep_df['VALID_SOURCE_MODELING'] == True]

## Making two skycoord objects for both catalogs
We do this (instead of looking at the dataframes directly) because when we extract we get fibers from nearby as well. We don't want that.

In [157]:
hdr3_skycoord = SkyCoord(hdr3_signal_to_noise_interval['ra'] * u.deg, hdr3_signal_to_noise_interval['dec'] * u.deg)
h20_skycoord = SkyCoord(h20_valid_source_check['RA_MODELING'] * u.deg, h20_valid_source_check['DEC_MODELING'] * u.deg)

In [274]:
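# Getting a random coordinate from the H20 catalog. I didn't draw from the HDR3 catalog
# because HDR3 is so much bigger that we could land somewhere without imaging overlap.
# random.randrange excludes the upper bound, so the index is always valid
random_coord = h20_skycoord[random.randrange(h20_skycoord.size)]
# Creating an offset skycoord. Note: directional_offset_by takes (position_angle, separation),
# so the first 200 arcsec is the position angle and the second is the on-sky distance
offset_skycoord = random_coord.directional_offset_by(200 * u.arcsec, 200 * u.arcsec)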

In [275]:
# Separation between the offset coordinate and every source in each catalog (returns arrays of angles)
sep_hdr3 = offset_skycoord.separation(hdr3_skycoord)
sep_h20 = offset_skycoord.separation(h20_skycoord)

In [276]:
# Checking if the separation is less than 1.2 arcseconds (arbitrary value)
check_hdr3 = (sep_hdr3 < 1.2 * u.arcsec)
check_h20 = (sep_h20 < 1.2 * u.arcsec)

In [278]:
# If any check is true, a catalog source lies close to the offset position, so we need to find a new position
if check_hdr3.any() or check_h20.any():
    print('true')
else:
    # no nearby source in either catalog, so we extract!!
    print('extract')
    extraction = get_spectra(offset_skycoord)

[INFO - 2022-10-21 03:27:54,545] Finding shots of interest


extract


[INFO - 2022-10-21 03:28:00,685] Number of shots of interest: 7
[INFO - 2022-10-21 03:28:00,687] Extracting 7 sources
[INFO - 2022-10-21 03:28:00,857] Working on shot: 20210608013
[INFO - 2022-10-21 03:28:00,857] Working on shot: 20210611013
[INFO - 2022-10-21 03:28:00,857] Working on shot: 20210616018
[INFO - 2022-10-21 03:28:00,858] Working on shot: 20210612014
[INFO - 2022-10-21 03:28:00,858] Working on shot: 20210604005
[INFO - 2022-10-21 03:28:00,858] Working on shot: 20210514007
[INFO - 2022-10-21 03:28:00,869] Working on shot: 20210708012
[INFO - 2022-10-21 03:28:01,913] Extracting 1
[INFO - 2022-10-21 03:28:02,461] Extraction of sources completed in 0.03 minutes.
[INFO - 2022-10-21 03:28:02,509] Retrieved 1 spectra.


0 spectra: we change the offset or pick another random coordinate.

1 spectrum: great! Makes our job easier, we have our noise spectrum.

2 or more spectra: what do we do? Bounce ideas around. "Weighted stacking" might be an option! However, it might not be ideal for comparing to real data.
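The three cases above can be sketched as one helper. This is only an illustration of what "weighted stacking" could look like, assuming inverse-variance weights built from `spec_err`, strictly positive finite errors, and a shared wavelength grid. The function name `combine_spectra` is hypothetical, and NaN or masked pixels are not handled:

```python
import numpy as np


def combine_spectra(spec, spec_err):
    """Combine n extracted spectra (shape (n, n_wave)) into a single spectrum.

    Returns (flux, err), or None when there are no spectra to combine.
    """
    spec = np.atleast_2d(np.asarray(spec, dtype=float))
    spec_err = np.atleast_2d(np.asarray(spec_err, dtype=float))
    if spec.size == 0:
        return None                      # 0 spectra: no fiber coverage, try elsewhere
    if spec.shape[0] == 1:
        return spec[0], spec_err[0]      # 1 spectrum: use it as is
    # 2 or more: inverse-variance weighted mean at each wavelength
    weights = 1.0 / spec_err**2
    flux = np.sum(weights * spec, axis=0) / np.sum(weights, axis=0)
    err = np.sqrt(1.0 / np.sum(weights, axis=0))
    return flux, err
```

With the extraction table, the inputs would presumably be `extraction['spec']` and `extraction['spec_err']`. Note that a stacked spectrum has lower noise than any single-shot spectrum, which is one reason it may not compare cleanly to real single detections.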

In [279]:
extraction

ID,shotid,wavelength,spec,spec_err,apcor,flag,gal_flag,amp_flag,meteor_flag
Unnamed: 0_level_1,Unnamed: 1_level_1,Angstrom,1e-17 erg / (Angstrom cm2 s),1e-17 erg / (Angstrom cm2 s),Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
int64,int64,float64[1036],float64[1036],float64[1036],float64[1036],int64,int64,int64,int64
1,20210611013,3470.0 .. 5540.0,-1.0780397447165955 .. 0.5174640528847851,5.454460321738501 .. 1.9303348480370321,0.8050533501696622 .. 0.8708031443982315,1,1,1,1


In [54]:
# Picks a random source for us; n = 1 means a single random source.
# As written this returns a different source on every run. Passing random_state
# to sample() would make the draw reproducible.
# Or just run this once
random_source = hdr3_signal_to_noise_interval.sample(n = 1)

In [55]:
random_source

Unnamed: 0,source_id,detectid,selected_det,ra_mean,dec_mean,fwhm,shotid,field,ra,dec,...,sn,sn_err,chi2,chi2_err,linewidth,linewidth_err,plya_classification,z_hetdex,z_hetdex_conf,combined_plae
781047,3010000795359,3007753020,True,222.453506,53.972984,1.730452,20200513016,dex-spring,222.453506,53.972984,...,9.15,0.91,1.87,0.22,8.62,1.83,0.519791,-3.066024e-07,0.8,6.703052


In [56]:
# We are using an offset to try to pick an area where there is no source.
# This is a value we are experimenting with, going to go with 20 arcseconds for now
offset = 20 * u.arcsec
# need to convert to degrees so I can add to the ra and dec in catalog,
# the ra and dec in catalog are in degrees
offset = offset.to('deg')

In [57]:
# Applying an offset, then we check if there is a source at the offset!
# Selecting by column name with .iloc[0] gives the full-precision value instead of the truncated/rounded display
# Note: adding the same angle to ra ignores the cos(dec) factor, so the on-sky shift in RA is smaller than 20 arcsec
delta_ra = random_source['ra'].iloc[0] + offset.value
delta_dec = random_source['dec'].iloc[0] + offset.value
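Adding the same angle to both ra and dec does not move the position by that angle in each direction on the sky, because a step in RA shrinks by cos(dec). A minimal sketch of a flat-sky (small-angle) offset that accounts for this. The function name `offset_radec` is hypothetical, and the approximation breaks down near the poles and for large offsets:

```python
import math


def offset_radec(ra_deg, dec_deg, east_arcsec, north_arcsec):
    """Shift a position by small on-sky offsets (flat-sky approximation)."""
    d_dec = north_arcsec / 3600.0
    # Divide by cos(dec) so the on-sky eastward shift equals east_arcsec
    d_ra = east_arcsec / 3600.0 / math.cos(math.radians(dec_deg))
    return (ra_deg + d_ra) % 360.0, dec_deg + d_dec


# Coordinates of the random source shown above, shifted 20 arcsec east and north
new_ra, new_dec = offset_radec(222.453506, 53.972984, 20.0, 20.0)
```

For an exact spherical offset, astropy's `SkyCoord.directional_offset_by(position_angle, separation)`, already used elsewhere in this notebook, avoids the approximation.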

### Checking if the delta ra and delta dec are in either of the catalogs

In [73]:
# This truth_check df checks whether delta_ra and delta_dec are in the catalog.
# If the size of this df is 0, then there is no source in this catalog with exactly that ra and dec.
# Caveat: exact float equality only catches a source at precisely these coordinates, not one nearby.
truth_check_hdr3 = hdr3_signal_to_noise_interval[(hdr3_signal_to_noise_interval['ra'] == delta_ra)
                                                 & (hdr3_signal_to_noise_interval['dec'] == delta_dec)]

# Same logic as above
truth_check_h20 = h20_valid_source_check[(h20_valid_source_check['RA_MODELING'] == delta_ra)
                                         & (h20_valid_source_check['DEC_MODELING'] == delta_dec)]

### If the size of both DF's is 0, now we extract!

Got a bit confused here with the slack messages. Also, wanted to make sure my previous code was right before continuing.

"You should have two RA and DEC skycoords for every coordinate in both catalogs and compare the new coordinate to these two"


In [61]:
sky_coords = SkyCoord(delta_ra, delta_dec, frame = 'icrs', unit = 'deg')

In [62]:
sky_coords

<SkyCoord (ICRS): (ra, dec) in deg
    (222.45906203, 53.97853987)>

### Ran multiple times: sometimes I get 0 spectra and sometimes 1 or 2. Do we want 0?

In [63]:
get_spectra(sky_coords)

[INFO - 2022-10-14 03:19:25,614] Finding shots of interest
[INFO - 2022-10-14 03:19:32,088] Number of shots of interest: 1
[INFO - 2022-10-14 03:19:32,090] Extracting 1 sources
[INFO - 2022-10-14 03:19:32,292] Working on shot: 20200513016
[INFO - 2022-10-14 03:19:36,488] Extraction of sources completed in 0.07 minutes.
[INFO - 2022-10-14 03:19:36,590] Retrieved 0 spectra.


ID,shotid,wavelength,spec,spec_err,apcor,flag,gal_flag,amp_flag,meteor_flag
Unnamed: 0_level_1,Unnamed: 1_level_1,Angstrom,1e-17 erg / (Angstrom cm2 s),1e-17 erg / (Angstrom cm2 s),Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
float64,float64,float64,float64,float64,float64,int64,int64,int64,int64


Once we have the spectrum table, check its length. If it is 0, there was no fiber coverage at that position, so we need to change things around (the offset or the coordinate) to get a spectrum.

## NOTES

We want no source. This catalog is the HETDEX detections one.

    Want to make sure:
        1. No HETDEX detection.
        
        2. No imaging counterpart. Do some cross-matching. If that gives us 0 matches, THEN we extract. We want to extract in basically empty space. Start with 100.
        
            Still working with coordinates. Trying to see if there is no match with the .fits file.

Start with detection. One approach was fits file with coordinates. 

Or 

Use this but expand upon it. Find RA and DEC of each shot. And randomly extract.

    delta ra and delta dec. Double check whether there is a source there.
    
For the noise sample, no need to run through Valentina's code. Only focus on high-z after filtering through Valentina's code.

Once we have the noise sample:

Run it through Valentina's code. Hopefully it classifies them all as high-z, because they are neither low-z nor stars.

Two skycoords. Check the coordinates to see if a HETDEX detection is there. Compare the minimum separation: if it is smaller than 3 arcseconds, then there is a source there, so do not extract there.