## Make multiwavelength light curves using archival data

### Summary:
 - model plots after van Velzen et al. 2021, https://arxiv.org/pdf/2111.09391.pdf
 
### Input:
 - a catalog of CLAGN from the literature

### Output:
 - an archival optical + IR + neutrino light curve
 
### Technical Goals:
 - should be able to run from a clean checkout from github
 - should be able to automatically download all catalogs & images used
 - need to have all photometry in the same physical unit
 - need to have a data structure that is easy to use but holds light curve information (time and units) and is extendable to ML applications
 - need to have a curated list of catalogs to search for photometry that is generalizable to other input catalogs
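
As a minimal sketch of the "same physical unit" goal, the snippet below converts AB magnitudes to flux densities in mJy using the standard AB zero point of 3631 Jy. The function names are hypothetical, and choosing mJy as the common unit is an assumption, not a decision made in this notebook. Catalogs that report Vega magnitudes (WISE, for example) would need their own per-band zero points, which are not included here.

```python
import math

AB_ZERO_POINT_JY = 3631.0  # flux density of a 0-mag source in the AB system


def ab_mag_to_mjy(mag):
    """Convert an AB magnitude to a flux density in millijansky."""
    return AB_ZERO_POINT_JY * 10 ** (-0.4 * mag) * 1e3


def ab_magerr_to_mjy(mag, mag_err):
    """Propagate a small magnitude error to a flux density error in mJy."""
    # First-order propagation: |d(flux)/d(mag)| = 0.4 * ln(10) * flux
    return 0.4 * math.log(10) * ab_mag_to_mjy(mag) * mag_err
```

In the notebook itself, `astropy.units` would likely be the more natural tool for this; the plain-Python version is only meant to make the arithmetic explicit.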
 
### Authors:
IPAC SP team

### Acknowledgements:
Suvi Gezari, Antara Basu-Zych,
MAST, HEASARC, & IRSA Fornax teams

In [19]:
import astropy.units as u  # needed for the search radius in the HEASARC query below
from astropy.coordinates import SkyCoord
from astroquery.heasarc import Heasarc
from astroquery.ipac.ned import Ned


## 1. Define the Sample

In [3]:
# Use the following paper to make a sample of CLAGN:
# https://iopscience.iop.org/article/10.3847/1538-4357/aaca3a

# This sample can later be switched out for a different/larger sample of "interesting" targets

# Use ADS to find the refcode for this paper
CLAGN = Ned.query_refcode('2018ApJ...862..109Y')


### What is the best data structure for this work?
 - needs to hold multiwavelength light curves
 - understands both time and units on fluxes
 - would like to know if whatever we choose can be scaled up to make light curves of the whole WISE sample
 - some things to look into
     - astropy has a light curve class
         - would probably need some development work to make this work for a multiwavelength application
     - LINCC people are interested in this and might have some suggestions on a 6mo. timescale
     - xarray
     - pandas might have more unit support now than before
     - what is ZTF using?
     - what did Dave do in his WISE parquet files?
     
- One suggestion is that instead of one large dataframe with the multiwavelength information, we keep them as separate astropy light curves for each band, do the feature extraction on each light curve, and keep the features in one large dataframe.
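
The per-band suggestion above could look roughly like the following sketch. It uses a plain dataclass as a stand-in for an astropy light curve object, and the class name, field names, units (MJD, mJy), and chosen features are all illustrative assumptions rather than a settled design. The toy values are invented only to exercise the functions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class BandLightCurve:
    """One light curve in one band; names and units are illustrative."""
    object_id: str
    band: str
    time_mjd: list       # observation times, Modified Julian Date
    flux_mjy: list       # flux densities, mJy
    flux_err_mjy: list   # 1-sigma uncertainties, mJy


def extract_features(lc):
    """Return a flat dict of simple per-band features for one light curve."""
    avg = mean(lc.flux_mjy)
    return {
        "object_id": lc.object_id,
        "band": lc.band,
        "n_obs": len(lc.flux_mjy),
        "baseline_days": max(lc.time_mjd) - min(lc.time_mjd),
        "mean_flux_mjy": avg,
        "std_flux_mjy": pstdev(lc.flux_mjy),
        "max_over_mean": max(lc.flux_mjy) / avg,
    }


def feature_table(light_curves):
    """Collect one feature row per (object, band) light curve."""
    return [extract_features(lc) for lc in light_curves]


# Toy data, invented purely for illustration.
curves = [
    BandLightCurve("obj1", "W1", [58000.0, 58180.0, 58360.0],
                   [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]),
    BandLightCurve("obj1", "r", [58010.0, 58020.0],
                   [0.5, 0.7], [0.05, 0.05]),
]
rows = feature_table(curves)
```

The list of row dicts can be passed directly to `pandas.DataFrame(rows)` to get the single large feature table, while each band keeps its own time sampling.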

In [39]:
type(CLAGN)


astropy.table.table.Table

In [18]:
# Build a list of skycoords from target ra and dec
coords_list = [
    SkyCoord(ra, dec, frame='icrs', unit='deg')
    for ra, dec in zip(CLAGN['RA'], CLAGN['DEC'])
]


## 2. Find photometry for these targets in NASA catalogs
- look at NAVO use cases to get help with tools to do this - although they mostly use pyvo
- deciding up front to use astroquery instead of pyvo
    - astroquery is apparently more user friendly
- data access concerns:
    - can't ask the archives to search their entire holdings
        - metadata is not good enough
        - not clear that the data is all vetted and good enough to include for science
        - all catalogs have differently named columns so how would we know which columns to keep
    - instead work with a curated list of catalogs for each archive
        - focus on general surveys
        - try to ensure that this list is also appropriate for a generalization of this use case to other input catalogs
        - could astroquery.NED be useful in finding a generalized curated list?
- How do we know we have a match that is good enough to include in our light curve?
     - look at nway for the high energy catalogs
     - probably need to generate a table of search radii for each catalog based on bandpass
         - need domain knowledge for that
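
A table of search radii plus a simple positional cut could be sketched as below. This is a plain radius cut, not the probabilistic matching that nway performs. The regime names and radius values are placeholders invented for illustration and are not vetted; real values need the domain knowledge mentioned above.

```python
import math

# Placeholder radii in arcseconds. These are NOT vetted values; real numbers
# depend on each survey's positional accuracy and resolution.
SEARCH_RADIUS_ARCSEC = {
    "optical": 1.0,
    "infrared": 3.0,
    "gamma_ray": 360.0,
}


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two positions given in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    half_dra = (ra2 - ra1) / 2.0
    half_ddec = (dec2 - dec1) / 2.0
    a = (math.sin(half_ddec) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(half_dra) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(a))) * 3600.0


def is_good_match(target, candidate, regime):
    """True if candidate (ra, dec) is within the regime's radius of target."""
    sep = angular_separation_arcsec(*target, *candidate)
    return sep <= SEARCH_RADIUS_ARCSEC[regime]
```

In the notebook, `SkyCoord.separation` would do the same geometry; the explicit formula is shown only so the sketch has no dependencies.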
     


### HEASARC
- asked Antara for help making a curated list of catalogs
- Suvi mentioned that it is scientifically sensible to include Fermi gamma-ray photometry

In [35]:
#list all the available HEASARC missions
heasarc = Heasarc()
mission_table = heasarc.query_mission_list()
#mission_table.pprint_all()



In [36]:
#figure out what the column names are in one of the catalogs
cols = heasarc.query_mission_cols(mission='fermi3fgl')
#cols




In [None]:
# Eventually: loop over all CLAGN coords from the paper
c = 1  # for now, just try query_region on a single target
# do a query on position
mission = 'fermi3fgl'
radius = 0.1 * u.degree
# coords_list is a Python list, so index it with [] rather than calling it
results = heasarc.query_region(coords_list[c], mission=mission, radius=radius, sortvar='SEARCH_OFFSET_')
# TODO: check whether there is a good match (criteria for "good" still to be defined)
# TODO: save the found photometry in the chosen data structure
        

### IRSA

astroquery.ipac.irsa 

 - need to make a curated list of catalogs here
     - ZTF
 
     - WISE
         - use Dave Shupe's light curve catalog parquet file /irsa-data-download10/parquet-work/NEOWISE-R/neowise_lc_half.parquet
         - can use existing code in https://github.com/IPAC-SW/ipac-sp-notebooks/blob/main/catwise_variables/nhel_xgboost.ipynb to access and work with this catalog
         - will need to work on how to efficiently search that catalog since it is too big to fit in memory
             - revisit earlier work with Vaex, Dask, and Spark
         - Do we need updates to this catalog from Dave?
             - one concern is that it is only half sky; hopefully enough of our targets are in the catalog


### MAST

- astroquery MAST doesn't require a catalog input but we might want it to narrow things down?
    - which catalogs are interesting?
        - Pan-STARRS
        - need to ask someone at MAST for a curated list of catalogs to search
        
- MAST has a copy of the ATLAS all-sky stellar reference catalog, but it is not searchable
     - might be available through astroquery.vizier
    


## 3. Find photometry for these targets in relevant, non-NASA catalogs


### Gaia
- astroquery.gaia will presumably work out of the box for this

### ASAS-SN (All-Sky Automated Survey for Supernovae) has a website that can be manually searched
- see if astroquery.vizier can find it



### IceCube has a 2008-2018 catalog, which is small and can be downloaded
- https://icecube.wisc.edu/data-releases/2021/01/all-sky-point-source-icecube-data-years-2008-2018/

## 4. Make plots of luminosity as a function of time
- time could be days since peak, or days since first observation, or??
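
The two time-axis options above can be sketched as follows. The function names are hypothetical, times are assumed to be in days (MJD, for example) in the observer frame, and the toy values are invented for illustration. Correcting to the rest frame by dividing by (1 + z) is not shown.

```python
def days_since_first(times_mjd):
    """Shift times so the earliest observation is at day 0."""
    t0 = min(times_mjd)
    return [t - t0 for t in times_mjd]


def days_since_peak(times_mjd, fluxes):
    """Shift times so the brightest observation is at day 0."""
    peak_index = max(range(len(fluxes)), key=lambda i: fluxes[i])
    t_peak = times_mjd[peak_index]
    return [t - t_peak for t in times_mjd]


# Toy values, invented for illustration only.
times = [58000.0, 58010.0, 58025.0]
fluxes = [1.0, 4.0, 2.0]
```

Taking the single brightest point as the peak is sensitive to outliers; a smoothed or fitted peak may be preferable for noisy light curves.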

## Image extension: look for archival images of these targets
- NASA NAVO use cases should help us to learn how to do this
- can use the cutout service now in astropy from the first fornax use case

## ML Extension 
Consider training an ML model to do light curve classification based on this sample of CLAGN
 - once we figure out which bands these are likely to be observed in, we could then have an optical + IR light curve classifier
 - what would the features of the light curve be?
 - what models are reasonable to test as light curve classifiers?
 - could we also make a sample of TDEs, SNe, and flaring AGN, then train the model to distinguish between them?
 - need a sample of non-flaring light curves
 
After training the model:
 - would then need a sample of optical + IR light curves for "all" galaxies = big data to run the model on.

Some resources to consider:
- https://github.com/dirac-institute/ZTF_Boyajian
- https://ui.adsabs.harvard.edu/abs/2022AJ....164...68S/abstract
- https://ui.adsabs.harvard.edu/abs/2019ApJ...881L...9F/abstract

