# Day 5: finding samples of SDSS-V spectra with allspec

In this notebook, we will figure out how to use the allspec database on SciServer Compute to:

- find a spectrum for each object within a certain distance on the sky of a target location
- find objects matching signal-to-noise and redshift thresholds
- find objects with a certain number of observed spectra

Then, for the lab, you will adapt this code to measure C IV lines over time for a quasar with 10+ observed spectra and make plots of their properties.

In [None]:
import os
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import astropy.io.fits
import astropy.coordinates
import fitsio
import sdss_access

matplotlib.rcParams['text.usetex'] = True
matplotlib.rcParams['font.size'] = 14

## Do not edit the following code adapted from the SDSS allspec tutorial

In [None]:
sdss_path = sdss_access.path.Path(release='dr19', verbose=True)
access = sdss_access.Access(release='dr19', verbose=True)

In [None]:
allspec_file = sdss_path.full('allspec', vers='1.0.1', release='dr19')

if not sdss_path.exists('',full=allspec_file):
    # if the file does not exist locally, this code will download the data.
    access.remote()
    access.add('allspec', vers='1.0.1', release='dr19')
    access.set_stream()
    access.commit()
print(allspec_file)

In [None]:
# this step is slow; give it time, because the allspec file is enormous
allspec_hdus = astropy.io.fits.open(allspec_file)
allspec = np.array(allspec_hdus[1].data)

## 1. Matching by position to `allspec`

Find a sky position, e.g. from your exploration of the sky using SkyServer on Monday, and enter its RA and Dec below in degrees.

In [None]:
center_ra = [XX]
center_dec = [XX]
center_coords = astropy.coordinates.SkyCoord(center_ra, center_dec, unit='deg', frame='icrs')

unique_sdss_id, unique_indx = np.unique(allspec['sdss_id'], return_index=True)
unique_ra = allspec['ra'][unique_indx]
unique_dec = allspec['dec'][unique_indx]

isfinite = np.isfinite(unique_ra) & np.isfinite(unique_dec)
unique_indx = unique_indx[isfinite]
unique_ra = unique_ra[isfinite]
unique_dec = unique_dec[isfinite]
unique_coords = astropy.coordinates.SkyCoord(unique_ra, unique_dec, unit='deg', frame='icrs')

indx, sep, s3 = unique_coords.match_to_catalog_sky(center_coords)
sep = sep.degree   # plain float array in degrees, so later comparisons need no astropy units

### modify the following code to find all objects within 0.5 degrees of the target location

In [None]:
isep = XX

# no need to modify, finding sdss_ids and index locations of matches
match_indx = unique_indx[isep]
sdss_id = allspec['sdss_id'][match_indx]
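
The separation returned by `match_to_catalog_sky` is an angle on the sky. As a rough illustration of what gets thresholded, the sketch below computes angular separations by hand with the haversine formula for a few made-up coordinates and builds a boolean mask for a chosen radius. The function name `angular_separation_deg`, the coordinates, and the 1 degree radius are illustrative assumptions and are not part of the allspec tutorial.

```python
import numpy as np

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = (
        np.radians(np.asarray(x, dtype=float)) for x in (ra1, dec1, ra2, dec2)
    )
    dra = ra2 - ra1
    ddec = dec2 - dec1
    a = np.sin(ddec / 2.0) ** 2 + np.cos(dec1) * np.cos(dec2) * np.sin(dra / 2.0) ** 2
    return np.degrees(2.0 * np.arcsin(np.sqrt(a)))

# Made-up catalog positions (degrees) and a made-up target position
toy_ra = np.array([150.0, 150.2, 151.5, 149.9])
toy_dec = np.array([2.0, 2.1, 2.0, 3.5])
target_ra, target_dec = 150.0, 2.0

toy_sep = angular_separation_deg(toy_ra, toy_dec, target_ra, target_dec)

# Boolean mask: True where the object lies within the chosen radius (1 degree here)
toy_within = toy_sep < 1.0
print(toy_sep)
print(toy_within)
```

A boolean mask like `toy_within` can index any array of the same length, which is how `isep` is used on `unique_indx` in the cell above.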

## 2. Get the list of files for one spectrum per matching object

In [None]:
### find one spectrum per matching object
XX
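
One possible way to pick a single row per object is `np.unique(..., return_index=True)`, the same call used earlier to build `unique_indx`: it returns the index of the first occurrence of each distinct value. The arrays below are invented for illustration and do not come from allspec.

```python
import numpy as np

# Made-up table: several spectra (rows) per object identifier
toy_ids = np.array([11, 42, 11, 7, 42, 42])
toy_files = np.array(['a.fits', 'b.fits', 'c.fits', 'd.fits', 'e.fits', 'f.fits'])

# first_rows[k] is the first row in which toy_unique_ids[k] appears
toy_unique_ids, first_rows = np.unique(toy_ids, return_index=True)

# One file per distinct object, ordered by sorted identifier
one_file_per_object = toy_files[first_rows]
print(toy_unique_ids, one_file_per_object)
```

Note that "first occurrence" is simply the earliest row in the table; if you want, say, the highest signal-to-noise spectrum per object, you would need to sort or compare within each group instead.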

## 3. Now, instead of matching by location, let's find objects using redshift and signal-to-noise cuts

In [None]:
# this step loads data from the spall file and so also takes a little bit to run
spall_file = sdss_path.full('spAll', run2d='v6_1_3')

if not sdss_path.exists('',full=spall_file):
    # if the file does not exist locally, this code will download the data.
    access.remote()
    access.add('spAll', run2d='v6_1_3')
    access.set_stream()
    access.commit()
print(spall_file)

spall_columns = ['SDSS_ID', 'CARTON_TO_TARGET_PK', 'MJD', 'CLASS', 'SUBCLASS', 'Z', 'ZWARNING', 'SN_MEDIAN_ALL', 'PSFMAG']
spall = fitsio.read(spall_file, columns=spall_columns)

Here `spall['Z']` contains redshifts, `spall['PSFMAG']` contains ugriz magnitudes for the spectrum, and `spall['SN_MEDIAN_ALL']` contains a summary signal-to-noise.

### Write a condition (e.g. a logical expression passed to `np.where`) that selects all objects for which C IV should be visible in the observed spectrum, the g-band magnitude (`spall['PSFMAG'][:,1]`) is less than 18, and the signal-to-noise is greater than 20

In [None]:
# write your code here instead of XX
indx = np.where(XX)
sdssid_sample = spall['SDSS_ID'][indx]
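
For the C IV requirement, the line is visible when its observed wavelength, the rest wavelength times (1 + z), falls inside the wavelength coverage of the spectrum. The sketch below turns that into a redshift window and combines it with another cut on a made-up structured array. The rest wavelength of about 1549 Angstroms is the usual value quoted for C IV; the coverage limits of 3600 to 10400 Angstroms are an approximate assumption for BOSS that you should check against the spectra themselves, and the toy values, the `SN` field name, and the threshold are invented.

```python
import numpy as np

CIV_REST = 1549.0                    # Angstroms, approximate C IV rest wavelength
LAM_MIN, LAM_MAX = 3600.0, 10400.0   # Angstroms, ASSUMED approximate coverage

# Redshift window in which CIV_REST * (1 + z) lies between LAM_MIN and LAM_MAX
z_lo = LAM_MIN / CIV_REST - 1.0
z_hi = LAM_MAX / CIV_REST - 1.0

# Made-up table with a redshift and a signal-to-noise column
toy = np.zeros(4, dtype=[('Z', 'f8'), ('SN', 'f8')])
toy['Z'] = [0.5, 2.0, 2.5, 6.5]
toy['SN'] = [30.0, 5.0, 12.0, 40.0]

# Combine conditions element-wise with &; each condition needs its own parentheses
toy_keep = (toy['Z'] > z_lo) & (toy['Z'] < z_hi) & (toy['SN'] > 10.0)
toy_rows = np.where(toy_keep)[0]
print(z_lo, z_hi, toy_rows)
```

Using `&` rather than `and` matters here: `and` tries to reduce a whole array to a single truth value and raises an error for arrays with more than one element.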

In [None]:
# for some list of matching objects, let's find those with many spectra
for i, sdssid in enumerate(sdssid_sample):
    iallmatch = np.where(allspec['sdss_id'] == sdssid)[0]
    files = allspec['allspec_id'][iallmatch]
    nfiles = len(files)

    ### add code here to leave the loop if an object has more than 3 spectra
    XX

# i, sdssid, files, and nfiles describe the object at which the loop stopped
print(i, sdssid, files, nfiles)
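
As an alternative to looping over objects one at a time, the number of spectra per object can be counted in a single pass with `np.unique(..., return_counts=True)`. This is only a sketch on made-up identifiers, and the threshold of 2 is arbitrary.

```python
import numpy as np

# Made-up identifier column with one entry per spectrum
toy_ids = np.array([11, 42, 11, 7, 42, 42, 11, 42])

# toy_counts[k] is the number of rows carrying toy_unique_ids[k]
toy_unique_ids, toy_counts = np.unique(toy_ids, return_counts=True)

# Identifiers that appear more than twice
toy_many = toy_unique_ids[toy_counts > 2]

# Row indices of all spectra of the first such object
toy_rows_for_first = np.where(toy_ids == toy_many[0])[0]
print(toy_many, toy_rows_for_first)
```

On the real table you would probably restrict the rows to your sample first, for example with `np.isin`, before counting.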

## 4. Looking at the spectra (code from the DR19 allspec tutorial; hopefully it shouldn't need to be modified)

We can track down the spectra on disk quite easily. The `sas_url` tells us the path. We just need to change the root of the tree to a local file path as follows.

If the data don't already exist on disk (e.g., if you're not running this notebook on SciServer), we can download the data easily with `sdss_access`.

In [None]:
url_root = 'https://data.sdss.org/sas'
local_root = os.getenv('SAS_BASE_DIR')
spectrum_files = list()
download_files = list()

for p in allspec["sas_url"][iallmatch]:
    local_path = p.decode().replace(url_root, local_root)
    spectrum_files.append(local_path)
    if not os.path.exists(local_path):
        download_files.append(local_path)

if len(download_files) > 0:
    print("fetching files, please stand by")
    access.remote()
    for local_path in download_files:
        access.add_file(local_path, input_type='filepath')

    access.set_stream()

    # disable follow_symlinks
    access.commit(follow_symlinks=False)
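
The URL-to-path mapping in the loop above is a plain string substitution. The sketch below isolates it in a small helper so its behaviour is easy to check. The helper name `url_to_local`, the example URL, and the `/tmp/sas` root are hypothetical, and the explicit errors for an unset root or an unexpected URL are additions for illustration, not part of the tutorial.

```python
def url_to_local(url, url_root, local_root):
    """Swap the remote root of a URL for a local directory root."""
    if local_root is None:
        # os.getenv returns None when the environment variable is undefined
        raise ValueError("local root is not set (is SAS_BASE_DIR defined?)")
    if not url.startswith(url_root):
        raise ValueError("URL does not start with the expected root: " + url)
    # Keep everything after the remote root and append it to the local root
    return local_root.rstrip('/') + url[len(url_root):]

# Hypothetical URL and local root, for illustration only
example_url = 'https://data.sdss.org/sas/some/dir/spec-example.fits'
example_path = url_to_local(example_url, 'https://data.sdss.org/sas', '/tmp/sas')
print(example_path)
```

Checking for `None` up front gives a clearer message than the `TypeError` that `str.replace` raises when `SAS_BASE_DIR` is not defined.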

Here then are the paths in the local SAS directory structure:

In [None]:
for f in spectrum_files:
    print(f)

We can open one of the BOSS files up to see what it has in it. We'll first just look at what the HDUs are called.

In [None]:
spec_hdulist = astropy.io.fits.open(spectrum_files[0])

In [None]:
for ihdu, spec_hdu in enumerate(spec_hdulist):
    if('extname' in spec_hdu.header):
        print(spec_hdu.header['extname'])
    else:
        print("HDU{i}".format(i=ihdu))

It looks like "COADD" actually has the spectrum. This is a table, and the columns have the fluxes, wavelengths, masks, etc. Really you should look at the data model at: https://data.sdss.org/datamodel/files/BOSS_SPECTRO_REDUX/RUN2D/spectra/PLATE4/spec.html

In [None]:
coadd = np.array(spec_hdulist['COADD'].data)
coadd_header = spec_hdulist['COADD'].header

Now we can plot and label our plot:

In [None]:
# Let's set sensible limits; the model is better than the data for that
gd = coadd['IVAR'] > 0
gdmax = coadd['MODEL'][gd].max()

plt.plot(10.**coadd['LOGLAM'], coadd['FLUX'], linewidth=1, color='black')
plt.plot(10.**coadd['LOGLAM'], coadd['MODEL'], linewidth=1, color='red')
plt.ylim(np.array([-0.05, 1.3]) * gdmax)

plt.xlabel(r'\rm Wavelength (Angstroms)')
plt.ylabel(r'$f_\lambda$ \rm ($10^{-17}$ erg cm$^{-2}$ s$^{-1}$ \AA$^{-1}$)')
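
The plotting cell above relies on two conventions of the COADD table: `LOGLAM` stores log10 of the wavelength in Angstroms, and `IVAR` is the inverse variance of the flux, with zero marking pixels that should be ignored. A minimal sketch on a made-up four-pixel spectrum (all values invented) shows both conversions; the per-pixel signal-to-noise estimate `flux * sqrt(ivar)` follows from the definition of inverse variance.

```python
import numpy as np

# Made-up four-pixel "spectrum" with the same column names as COADD
toy_coadd = np.zeros(4, dtype=[('LOGLAM', 'f8'), ('FLUX', 'f8'), ('IVAR', 'f8')])
toy_coadd['LOGLAM'] = [3.60, 3.70, 3.80, 3.90]
toy_coadd['FLUX'] = [2.0, 4.0, 100.0, 3.0]
toy_coadd['IVAR'] = [1.0, 4.0, 0.0, 0.25]   # third pixel is flagged as bad

# Wavelength in Angstroms
toy_wave = 10.0 ** toy_coadd['LOGLAM']

# Good pixels have positive inverse variance
toy_good = toy_coadd['IVAR'] > 0

# Per-pixel signal-to-noise, left at zero for bad pixels
toy_snr = np.zeros(len(toy_coadd))
toy_snr[toy_good] = toy_coadd['FLUX'][toy_good] * np.sqrt(toy_coadd['IVAR'][toy_good])

# The maximum over good pixels ignores the wild value in the bad pixel
toy_max_good_flux = toy_coadd['FLUX'][toy_good].max()
print(toy_wave, toy_snr, toy_max_good_flux)
```

This is also why the plot limits are taken from good pixels only: a single bad pixel can otherwise dominate the axis range.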

We have several visits (the cell below reads up to ten of them), so we can compare them.

In [None]:
boss_visits = []
mjds = []
for file in spectrum_files[:10]:
    # the context manager closes each file once its data have been copied out
    with astropy.io.fits.open(file) as spec_hdulist:
        visit = np.array(spec_hdulist['COADD'].data)
        boss_visits.append(visit)
        mjds.append(spec_hdulist[0].header['MJD'])

In [None]:
# Let's set sensible limits; the model is better than the data for that
allmax = 0.0
for mjd, visit in zip(mjds, boss_visits):
    gd = visit['IVAR'] > 0
    gdmax = visit['MODEL'][gd].max()
    if(gdmax > allmax):
        allmax = gdmax

    plt.plot(10.**visit['LOGLAM'], visit['FLUX'], linewidth=1, label=str(mjd))

plt.ylim(np.array([-0.05, 1.3]) * allmax)
plt.xlabel(r'\rm Wavelength (Angstroms)')
plt.ylabel(r'$f_\lambda$ \rm ($10^{-17}$ erg cm$^{-2}$ s$^{-1}$ \AA$^{-1}$)')
plt.legend()