## Make multiwavelength light curves using archival data

### Summary:
 - model the plots after van Velzen et al. 2021 (https://arxiv.org/pdf/2111.09391.pdf)
 
### Input:
 - a catalog of changing-look AGN (CLAGN) from the literature

### Output:
 - an archival optical + IR + neutrino light curve
 
### Technical Goals:
 - should be able to run from a clean checkout from GitHub
 - should be able to automatically download all catalogs & images used
 - need to have all photometry in the same physical unit
 - need to have a data structure that is easy to use but holds light curve information (time and units) and is extendable to ML applications
 - need to have a curated list of catalogs to search for photometry that is generalizable to other input catalogs
 
### Authors:
IPAC SP team

### Acknowledgements:
Suvi Gezari, Antara Basu-zych,
MAST, HEASARC, & IRSA Fornax teams

In [1]:
import numpy as np
import time

from astroquery.ipac.ned import Ned
from astroquery.heasarc import Heasarc
from astroquery.gaia import Gaia

from astropy.coordinates import SkyCoord
import astropy.units as u
from astropy.table import Table, vstack, hstack

## 1. Define the Sample

In [2]:
# Use the following paper to make a sample of CLAGN: https://iopscience.iop.org/article/10.3847/1538-4357/aaca3a

# This sample can later be switched out for a different/larger sample of "interesting" targets

# Use ADS to find the refcode (bibcode) for this paper
CLAGN = Ned.query_refcode('2018ApJ...862..109Y')


### What is the best data structure for this work?
 - needs to hold multiwavelength light curves
 - understands both time and units on fluxes
 - would like to know if whatever we choose can be scaled up to make light curves of the whole WISE sample
 - some things to look into
     - astropy has a light curve class
         - would probably need some development work to make this work for a multiwavelength application
     - the LINCC people are interested in this and might have some suggestions on a 6-month timescale
     - xarray
     - pandas might have more unit support now than before
     - what is ZTF using?
     - what did Dave do in his WISE parquet files?

- One suggestion is that instead of one large dataframe with the multiwavelength information, we keep separate astropy light curves for each band, do the feature extraction on each light curve, and keep the features in one large dataframe.

In [3]:
type(CLAGN)

astropy.table.table.Table

In [4]:
#### Build a list of skycoords from target ra and dec #####
coords_list = [
    SkyCoord(ra, dec, frame='icrs', unit='deg')
    for ra, dec in zip(CLAGN['RA'], CLAGN['DEC'])
]


## 2. Find photometry for these targets in NASA catalogs
- look at the NAVO use cases for help with tools to do this, although they mostly use PyVO
- deciding up front to use astroquery instead of PyVO
    - astroquery is apparently more user-friendly
- data access concerns:
    - can't ask the archives to search their entire holdings
        - not good enough metadata
        - not clear that the data are all vetted and good enough to include for science
        - all catalogs have differently named columns, so how would we know which columns to keep?
    - instead, work with a curated list of catalogs for each archive
        - focus on general surveys
        - try to ensure that this list is also appropriate for generalizing this use case to other input catalogs
        - could astroquery.ipac.ned be useful in finding a generalized curated list?
- How do we know we have a match that is good enough to include in our light curve?
     - look at nway for the high energy catalogs
     - probably need to generate a table of search radii for each catalog based on bandpass
         - need domain knowledge for that
     


### HEASARC (Krick)
- asked Antara for help making a curated list of catalogs
- Suvi mentioned that it would be scientifically sensible to include Fermi gamma-ray photometry

In [None]:
#list all the available HEASARC missions
heasarc = Heasarc()
mission_table = heasarc.query_mission_list()
#mission_table.pprint_all()



In [None]:
#figure out what the column names are in one of the catalogs
cols = heasarc.query_mission_cols(mission='fermi3fgl')
#cols


In [None]:
# Eventually: loop over all CLAGN coords in the paper
c = 1  # for now, just test astroquery query_region on a single target
# do a query on position
mission = 'fermi3fgl'
radius = 0.1*u.degree
results = heasarc.query_region(coords_list[c], mission=mission, radius=radius, sortvar='SEARCH_OFFSET_')
# if there is a good match (where "good" is still to be defined),
# save the found photometry in the chosen data structure

### IRSA

Use astroquery.ipac.irsa.

 - need to make a curated list of catalogs here
     - ZTF (Faisst)
 
     - WISE (Krick)
         - use Dave Shupe's light curve catalog parquet file /irsa-data-download10/parquet-work/NEOWISE-R/neowise_lc_half.parquet
         - can use existing code in https://github.com/IPAC-SW/ipac-sp-notebooks/blob/main/catwise_variables/nhel_xgboost.ipynb to access and work with this catalog
         - will need to work out how to search that catalog efficiently, since it is too big to fit in memory
             - redo the earlier work with Vaex, Dask, and Spark
         - Do we need updates to this catalog from Dave?
             - one concern is that it covers only half the sky; hopefully enough of our targets are in the catalog


### MAST (Krick)

- astroquery.mast doesn't require a catalog input, but we might want to specify one to narrow things down
    - which catalogs are interesting?
        - Pan-STARRS
        - need to ask someone at MAST for a curated list of catalogs to search

- MAST has a copy of the ATLAS all-sky stellar reference catalog, but it is not searchable
     - it might be available through astroquery.vizier
    


## 3. Find photometry for these targets in relevant, non-NASA catalogs


### Gaia (Faisst)
- astroquery.gaia will presumably work out of the box for this

In [6]:
############ EXTRACT GAIA DATA FOR OBJECTS ##########

## Select Gaia table (EDR3; "gaiadr3.gaia_source" is the DR3 table)
Gaia.MAIN_GAIA_TABLE = "gaiaedr3.gaia_source"

## Define search radius
radius = u.Quantity(20, u.arcsec)

## Search and cross-match.
# This could be done more efficiently by cross-matching the catalogs on the Gaia server, or by grouping
# the sources and searching a larger area.

# get catalog
gaia_table = Table()
t1 = time.time()
for cc, coord in enumerate(coords_list):
    print(len(coords_list) - cc, end=" ")  # countdown of remaining targets

    gaia_search = Gaia.cone_search_async(coordinate=coord, radius=radius, background=True)
    result = gaia_search.get_data()  # fetch the results table once and reuse it
    result["dist"].unit = "deg"
    result["dist"] = result["dist"].to(u.arcsec)  # change distance unit from degrees to arcseconds

    # match: keep only the nearest source, and only if it lies within 1 arcsec
    if len(result["dist"]) > 0:
        result["input_object_name"] = CLAGN["Object Name"][cc]  # add input object name to catalog
        sel_min = np.where((result["dist"] < 1*u.arcsec) & (result["dist"] == np.nanmin(result["dist"])))[0]
    else:
        sel_min = []

    #print("Number of sources matched: {}".format(len(sel_min)))

    # an empty selection adds no rows, so the same vstack works whether or not there was a match
    gaia_table = vstack([gaia_table, result[sel_min]])

print("\nSearch completed in {:.2f} seconds".format(time.time() - t1))
print("Number of objects matched: {} out of {}.".format(len(gaia_table), len(CLAGN)))

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
Search completed in 76.60 seconds
Number of objects matched: 28 out of 31.
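
The loop above sends one cone search per target. The code comment's suggestion of matching on the Gaia server could look like a single ADQL query that joins an uploaded target table against the Gaia source table. This is a sketch: the uploaded table's name (`targets`) and columns (`object_name`, `ra`, `dec`) are hypothetical, the query returns every Gaia source within the radius (the nearest one per target would still be selected afterwards), and the `launch_job_async` call is left commented out because its upload arguments are an assumption to check against the astroquery.gaia documentation.

```python
def build_gaia_xmatch_query(upload_name="targets", radius_arcsec=1.0,
                            gaia_table="gaiadr3.gaia_source"):
    """ADQL for a server-side positional crossmatch against an uploaded target table.

    Assumes the uploaded table has columns object_name, ra, dec (degrees).
    """
    radius_deg = radius_arcsec / 3600.0
    return (
        "SELECT t.object_name, g.source_id, "
        "g.phot_bp_mean_mag, g.phot_g_mean_mag, g.phot_rp_mean_mag, "
        "DISTANCE(POINT('ICRS', t.ra, t.dec), POINT('ICRS', g.ra, g.dec)) * 3600 AS dist_arcsec "
        f"FROM tap_upload.{upload_name} AS t "
        f"JOIN {gaia_table} AS g "
        "ON 1 = CONTAINS(POINT('ICRS', g.ra, g.dec), "
        f"CIRCLE('ICRS', t.ra, t.dec, {radius_deg:.8f}))"
    )


query = build_gaia_xmatch_query()
print(query)

# Assumed usage (needs network access; "targets.xml" is a hypothetical VOTable of targets):
# from astroquery.gaia import Gaia
# job = Gaia.launch_job_async(query, upload_resource="targets.xml", upload_table_name="targets")
# matches = job.get_results()
```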


In [7]:
########## EXTRACT PHOTOMETRY #########
# Note that the fluxes are in e-/s, which is not very useful. The catalog also provides mean magnitudes
# (Vega-based Gaia photometric system), but without errors, so we derive the magnitude errors from the flux errors.

## Define keys (columns) that will be used later. Also add wavelength in angstroms for each filter
mag_keys = ["phot_bp_mean_mag" , "phot_g_mean_mag" , "phot_rp_mean_mag"]
magerr_keys = ["phot_bp_mean_mag_error" , "phot_g_mean_mag_error" , "phot_rp_mean_mag_error"]
flux_keys = ["phot_bp_mean_flux" , "phot_g_mean_flux" , "phot_rp_mean_flux"]
fluxerr_keys = ["phot_bp_mean_flux_error" , "phot_g_mean_flux_error" , "phot_rp_mean_flux_error"]
mag_lambda = [5319.90, 6735.42, 7992.90] * u.AA  # same order as mag_keys (BP, G, RP)

## Get photometry. Note that this includes only objects that are 
# matched to the catalog. We have to add the missing ones later.
_phot = gaia_table[mag_keys]
_err = hstack( [ 2.5/np.log(10) * gaia_table[e]/gaia_table[f] for e,f in zip(fluxerr_keys,flux_keys) ] )
gaia_phot2 = hstack( [_phot , _err] )

## Clean up (change units and column names)
for m, f in zip(magerr_keys, fluxerr_keys):
    gaia_phot2.rename_column(f, m)
for key in magerr_keys:
    gaia_phot2[key].unit = "mag"
gaia_phot2["input_object_name"] = gaia_table["input_object_name"].copy()

## Also add objects for which we don't have photometry.
# Add NaN for now; we still need to think about the proper format, and there are probably smarter ways to do this.
# We match the object names from the original catalog to the photometry catalog, then add
# an entry [np.nan, ...] if an object is not there. We start from an empty table with the same
# column names and dtypes so that rows can be stacked onto it.
gaia_phot = Table(names=gaia_phot2.keys(), dtype=gaia_phot2.dtype)
for ii in range(len(CLAGN)):
    sel = np.where(CLAGN["Object Name"][ii] == gaia_phot2["input_object_name"])[0]
    if len(sel) > 0:
        gaia_phot = vstack([gaia_phot, gaia_phot2[sel]])
    else:
        tmp = Table(np.repeat(np.nan, len(gaia_phot2.keys())), names=gaia_phot2.keys(), dtype=gaia_phot2.dtype)
        gaia_phot = vstack([gaia_phot, tmp])
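
The magnitude errors computed in the cell above come from first-order error propagation of the magnitude definition, which is valid when the fractional flux error is small and ignores any zero-point uncertainty:

```latex
m = -2.5\log_{10}F + \mathrm{ZP}
\quad\Rightarrow\quad
\left|\frac{\partial m}{\partial F}\right| = \frac{2.5}{F\,\ln 10}
\quad\Rightarrow\quad
\sigma_m \approx \frac{2.5}{\ln 10}\,\frac{\sigma_F}{F} \approx 1.0857\,\frac{\sigma_F}{F}
```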

In [8]:
gaia_phot

| phot_bp_mean_mag | phot_g_mean_mag | phot_rp_mean_mag | phot_bp_mean_mag_error | phot_g_mean_mag_error | phot_rp_mean_mag_error | input_object_name |
|---|---|---|---|---|---|---|
| mag | mag | mag | mag | mag | mag | |
| float32 | float32 | float32 | float64 | float64 | float64 | str25 |
| | | | | | | |
| 19.334736 | 20.655428 | 18.006554 | 0.0489463475382072 | 0.01195253816419854 | 0.03560993262228968 | WISEA J012648.10-083948.0 |
| 19.742887 | 20.841955 | 18.55511 | 0.09464535378467852 | 0.02097086422081428 | 0.04248351283430384 | 2MASS J01595763+0033105 |
| 19.742887 | 20.841955 | 18.55511 | 0.09464535378467852 | 0.02097086422081428 | 0.04248351283430384 | WISEA J015957.63+003310.3 |
| | | | | | | |
| 18.682938 | 19.121965 | 17.443657 | 0.06859739379512571 | 0.021330523804058837 | 0.03162004672292963 | WISEA J083132.25+364617.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 18.879412 | 20.113039 | 17.437674 | 0.055547993309524764 | 0.01583562186684592 | 0.021049902715324503 | WISEA J153355.99+011029.7 |
| 18.094048 | 19.134172 | 16.777948 | 0.0252111433628053 | 0.015455575156292103 | 0.017681559703418063 | WISEA J154529.63+251127.9 |
| 18.89239 | 19.558643 | 17.498657 | 0.03719779255012713 | 0.014864182693170745 | 0.025478937985966617 | WISEA J155017.23+413902.4 |


### ASAS-SN (All-Sky Automated Survey for Supernovae) has a website that can be searched manually (Faisst)
- see if astroquery.vizier can find it



### IceCube has a 2008-2018 catalog, which is small and can be downloaded (Faisst)
- https://icecube.wisc.edu/data-releases/2021/01/all-sky-point-source-icecube-data-years-2008-2018/

## 4. Make plots of luminosity as a function of time
- time could be days since peak, days since first observation, or something else (still to be decided)

## Image extension: look for archival images of these targets
- NASA NAVO use cases should help us to learn how to do this
- can use the cutout service, now in astropy, from the first Fornax use case

## ML Extension 
Consider training a ML model to do light curve classification based on this sample of CLAGN
 - once we figure out which bands these are likely to be observed in, we could then build an optical + IR light curve classifier
 - what would the features of the light curve be?
 - what models are reasonable to test as light curve classifiers?
 - could we also make a sample of TDEs, SNe, and flaring AGN, and then train the model to distinguish between these classes?
 - need a sample of non-flaring light curves
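
As one possible starting point for the feature question, a handful of common variability statistics can be computed per band with numpy alone. This is an illustrative sketch, not a vetted feature set: the selection of features and the helper name are assumptions, and the von Neumann ratio assumes the points are sorted in time.

```python
import numpy as np


def variability_features(t, flux, flux_err):
    """A handful of common light-curve features (illustrative, not a vetted set).

    Assumes t is sorted in increasing order and the light curve is not constant.
    """
    t, x, e = (np.asarray(a, dtype=float) for a in (t, flux, flux_err))
    var = x.var(ddof=1)  # sample variance
    dev = x - x.mean()
    return {
        "peak_to_peak": x.max() - x.min(),
        "std": np.sqrt(var),
        # population skewness m3 / m2^1.5
        "skew": np.mean(dev**3) / np.mean(dev**2) ** 1.5,
        # von Neumann ratio: mean squared successive difference / variance
        "von_neumann": np.sum(np.diff(x) ** 2) / (len(x) - 1) / var,
        # normalized excess variance: variability beyond the measurement errors
        "excess_var": (var - np.mean(e**2)) / x.mean() ** 2,
        # slope of a linear fit, in flux units per unit time
        "slope": np.polyfit(t, x, 1)[0],
    }


feats = variability_features([0, 1, 2, 3, 4], [1, 2, 3, 4, 5], [0.1] * 5)
print(feats)
```

A smooth monotonic trend like this toy example gives a small von Neumann ratio, while uncorrelated noise gives a value near 2, which is one reason the ratio is a useful discriminator between flaring and noisy light curves.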
 
After training the model:
 - we would then need a sample of optical + IR light curves for "all" galaxies (i.e., big data) to run the model on.

Some resources to consider:
- https://github.com/dirac-institute/ZTF_Boyajian
- https://ui.adsabs.harvard.edu/abs/2022AJ....164...68S/abstract
- https://ui.adsabs.harvard.edu/abs/2019ApJ...881L...9F/abstract

