# APOGEE on FIRE on SciServer

SciServer now hosts both the raw FITS files (via SAS) and database versions (via CasJobs) of the APOGEE-on-FIRE VAC for SDSS DR17 (see
https://www.sdss.org/dr17/data_access/value-added-catalogs/?vac_id=apogee-on-fire-simulation-mocks).

This notebook presents a brief tutorial on accessing the APOGEE-FIRE database with the SciServer CasJobs client, along with some information on what the catalog contains.

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from SciServer import CasJobs as cj

### APOGEE-FIRE database context

All catalog tables exist in a database context, which is a unit of access in CasJobs. The default database context in CasJobs is MyDB (your personal storage space and staging area), so to query APOGEE-FIRE we must specify its context explicitly. The context defined here is passed to every query below.

In [None]:
db_context = 'apogee_fire_test'

### Simulation tables

The APOGEE-FIRE suite has 4 simulated catalogs for each of 3 Milky-Way-like galaxies: 3 full catalogs at different local standard of rest (LSR) locations, and one 1%-sampled test catalog at one of the LSR positions. That gives a total of 12 tables containing over 40B records.

In [None]:
pd.DataFrame(cj.getTables(db_context)).query('Name.str.contains("m12")', engine='python')

### Explore schema

Each table has over 130 columns. The naming scheme largely groups them by type of measurement, so we can visualize the columns as a tree. This is mostly an exercise for fun :)

For more details on the columns see (research note reference, if available)

In [None]:
schema = cj.executeQuery('SELECT TOP 1 * FROM m12f_test WHERE parallax > 0 AND feh_apogee IS NOT NULL', db_context)
len(schema.columns)

To make the visualization we use the graphviz package, so install the dependencies using pip. The `%%capture` magic suppresses the installer output.

In [None]:
%%capture 
%pip install graphviz pydot

In [None]:
import pydot, graphviz

def create_tree_view(df, filt=lambda x: True):
    """Build a pydot graph of column names, nested by underscore-separated prefix."""
    graph = pydot.Dot()
    graph.add_node(pydot.Node('root', shape='box'))
    for col in [c for c in df.columns if filt(c)]:
        parts = col.split('_')
        for j in range(len(parts)):
            name = '_'.join(parts[:j+1])
            # Only prefixes that are themselves columns have a dtype to show.
            dtype = df[name].dtype if name in df else ''
            if not graph.get_node(name):
                # Color by kind of quantity; uncolored intermediate nodes are drawn dashed.
                color = 'palegreen1' if 'true' in name else 'khaki1' if 'obs' in name else 'coral' if 'error' in name else ''
                style = 'filled' if color else 'dashed' if j != len(parts) - 1 else ''
                graph.add_node(pydot.Node(name, label=f'{name}\n{dtype}', shape='box', fontsize='7', fillcolor=color, style=style))
            # Link each node to its parent prefix (or to the root for the first part).
            parent = 'root' if j == 0 else '_'.join(parts[:j])
            if not graph.get_edge(parent, name):
                graph.add_edge(pydot.Edge(parent, name, weight='1.2', color='gray'))
    graph.set_ranksep('0.2')
    graph.set_nodesep('0.2')
    return graph

In [None]:
graphviz.Source(create_tree_view(schema).to_string())
# Or try with a filter on columns:
#graphviz.Source(create_tree_view(schema, lambda x: x.startswith('v')).to_string())
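
If graphviz is not available, the same prefix grouping can be inspected as plain text. The sketch below is a minimal illustration, not part of the catalog tooling: `build_prefix_tree` and `render_tree` are hypothetical helpers, and the column list is a small sample of names taken from the queries in this notebook.

```python
def build_prefix_tree(columns):
    """Nest column names by their underscore-separated parts."""
    tree = {}
    for name in columns:
        node = tree
        for part in name.split('_'):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, indent=0):
    """Return the nested dict as indented lines, one name part per line."""
    lines = []
    for part in sorted(tree):
        lines.append('  ' * indent + part)
        lines.extend(render_tree(tree[part], indent + 1))
    return lines


# Sample of column names that appear in the queries in this notebook.
sample_columns = ['vrho_cyl_obs', 'vphi_cyl_obs', 'vz_cyl_obs',
                  'dhel_obs', 'phot_g_mean_mag', 'feh_apogee', 'mgfe_apogee']
tree = build_prefix_tree(sample_columns)
print('\n'.join(render_tree(tree)))
```

With the real table, `schema.columns` could be passed in place of `sample_columns`.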

### Generate a Toomre diagram for the "local" stellar population

Select stars within 3 kpc observed heliocentric distance, magnitude-limited to 17 in G. To get good dynamic range on the abundance plot, we discard stars with very low metallicity, which make up a very small portion of the sample. Given the dataset size, some queries return too many records to download and work with locally. In this case we offload the 2-d histogram binning to CasJobs, calculating stellar density and mean [Fe/H] on a 1 km/s x 1 km/s grid in velocity, which can then be plotted using matplotlib's hexbin.

In [None]:
sql = '''
SELECT count(*) N
FROM m12f_lsr0
WHERE dhel_obs BETWEEN 0 AND 3 AND phot_g_mean_mag < 17 AND vrho_cyl_obs IS NOT NULL AND feh_apogee > -2
'''
cj.executeQuery(sql, db_context).N

In [None]:
sql = '''
SELECT t_Y, t_X, COUNT(*) AS N, AVG(FEH) AS FEH
FROM (
    SELECT FLOOR(SQRT((POWER(U, 2) + POWER(W, 2)))) AS t_Y, FLOOR(V) as t_X, FEH
    FROM (
      SELECT vrho_cyl_obs AS U, vphi_cyl_obs AS V, vz_cyl_obs AS W, feh_apogee AS FEH
      FROM m12f_lsr1
      WHERE dhel_obs BETWEEN 0 AND 3 AND phot_g_mean_mag < 17 AND vrho_cyl_obs IS NOT NULL AND feh_apogee > -2
    ) AS unbinned
) AS binned
GROUP BY t_Y, t_X
'''
toomre_data = cj.executeQuery(sql, db_context)
toomre_data.head(3)
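
For a sample small enough to download, the same binning can be reproduced locally, which is a convenient way to check what the SQL does. The sketch below mirrors the query using only the standard library. `bin_toomre` is a hypothetical helper, and the three input stars are made-up values, not catalog data.

```python
import math
from collections import defaultdict


def bin_toomre(stars):
    """Bin (U, V, W, feh) tuples onto a 1 km/s x 1 km/s grid.

    Returns {(t_Y, t_X): (N, mean_feh)}, matching the columns of the SQL query.
    """
    sums = defaultdict(lambda: [0, 0.0])  # bin -> [count, sum of feh]
    for u, v, w, feh in stars:
        t_y = math.floor(math.sqrt(u ** 2 + w ** 2))
        t_x = math.floor(v)
        sums[(t_y, t_x)][0] += 1
        sums[(t_y, t_x)][1] += feh
    return {key: (n, total / n) for key, (n, total) in sums.items()}


# Made-up stars: (vrho, vphi, vz, [Fe/H]); the first two fall in the same bin.
stars = [(3.0, 220.4, 4.0, -0.1), (3.2, 220.9, 4.1, -0.3), (-30.0, -15.5, 40.0, -1.2)]
binned = bin_toomre(stars)
for key in sorted(binned):
    print(key, binned[key])
```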

In [None]:
f,a = plt.subplots(1, 2, figsize=[16, 6])
div_r = np.arange(-np.pi / 2 , np.pi / 2, 0.05)
div_x = 220 + 220 * np.sin(div_r)
div_y = 220 * np.cos(div_r)
plt.sca(a[0])
plt.hexbin(toomre_data.t_X, toomre_data.t_Y, C=toomre_data.N, mincnt=1, bins='log', cmap='jet')
plt.plot(div_x, div_y, c='silver', lw=3, ls='--')
plt.xlabel(r'V$_\phi$')
plt.ylabel(r'$\sqrt{V_\rho^{2} + V_z^{2}}$')
plt.colorbar(label='Density')
plt.sca(a[1])
plt.hexbin(toomre_data.t_X, toomre_data.t_Y, C=toomre_data.FEH, cmap='jet')
plt.plot(div_x, div_y, c='k', lw=3, ls='--')
plt.xlabel(r'V$_\phi$')
plt.ylabel(r'$\sqrt{V_\rho^{2} + V_z^{2}}$')
_ = plt.colorbar(label='mean [Fe/H]')
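
One caveat, offered as a sketch rather than a correction of the plot: `hexbin` combines all `C` values that land in a hexagon with its `reduce_C_function`, which defaults to `np.mean`. Several 1 km/s bins usually share a hexagon, so the right panel shows an unweighted mean of bin means. If star-weighted values are wanted, the per-bin means can be combined with their counts first. `weighted_mean` is a hypothetical helper and the numbers are made up.

```python
def weighted_mean(values, counts):
    """Mean of per-bin means, weighted by the number of stars in each bin."""
    total = sum(counts)
    return sum(v * n for v, n in zip(values, counts)) / total


# Two bins in one hexagon: 99 stars at [Fe/H] = 0.0 and 1 star at -1.0.
bin_feh = [0.0, -1.0]
bin_n = [99, 1]

unweighted = sum(bin_feh) / len(bin_feh)   # plain mean of the two bin means
weighted = weighted_mean(bin_feh, bin_n)   # mean over the 100 underlying stars
print(unweighted, weighted)
```

For the density panel, passing `reduce_C_function=np.sum` to `hexbin` would give total counts per hexagon instead of the mean count per bin.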

### Alpha-Fe relationship

Using a similar binning scheme, but with a resolution of 0.001 dex, we can plot the trend of alpha abundance (using [Mg/Fe] as a proxy) against [Fe/H] in the same stellar population.

In [None]:
sql = '''
SELECT MGFE, FEH, COUNT(*) AS N 
FROM (
 SELECT FLOOR(mgfe_apogee*1000)/1000 AS MGFE, FLOOR(feh_apogee*1000)/1000 as FEH 
 FROM m12f_lsr0
 WHERE dhel_obs BETWEEN 0 AND 3 AND phot_g_mean_mag < 17 AND vrho_cyl_obs IS NOT NULL AND feh_apogee > -2
) AS a
GROUP BY MGFE, FEH
'''
mgfe = cj.executeQuery(sql, db_context)
mgfe.head(3)
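
The `FLOOR(x*1000)/1000` expression maps each value to the lower edge of its 0.001 dex bin. The sketch below reproduces it in Python to show why `FLOOR` matters here: for negative abundances, truncating toward zero would move values into the neighbouring bin. `bin_edge` is a hypothetical helper name and the inputs are made-up values.

```python
import math


def bin_edge(value, bins_per_dex=1000):
    """Lower edge of the bin containing value, as FLOOR(value*1000)/1000 does."""
    return math.floor(value * bins_per_dex) / bins_per_dex


positive = bin_edge(0.2345)             # lower edge of the bin holding 0.2345
negative = bin_edge(-0.2345)            # floor moves away from zero for negatives
truncated = int(-0.2345 * 1000) / 1000  # truncation moves toward zero instead
print(positive, negative, truncated)
```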

In [None]:
from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset
plt.subplots(1, 1, figsize=[16, 8])
plt.hexbin(mgfe.FEH, mgfe.MGFE, C=mgfe.N, mincnt=1, bins='log', gridsize=(400,400))
plt.colorbar(label='Density')
plt.xlabel('[Fe/H] (dex)')
plt.ylabel('[Mg/Fe] (dex)')
outer = plt.gca()
inset = inset_axes(outer, width='50%', height='40%', loc=3, borderpad=4)
plt.sca(inset)
plt.hexbin(mgfe.FEH, mgfe.MGFE, C=mgfe.N, mincnt=1, bins='log', gridsize=(600, 600))
plt.xlim(-1,0.5), plt.ylim(0.1, 0.45)
_ = mark_inset(outer, inset, loc1=2, loc2=4, ec='0.5')

### HEALPix IDs

The catalog also contains HEALPix identifiers to aid spatial searches and aggregations. The column `heal20id` holds the id of the pixel in which the object lies on a map with *nside* of `2^20`.

We can use this to downsample to a coarser resolution by integer-dividing the pixel id by `(2^20)^2 / N^2`, where `N` is the desired value of *nside*. For example, we can create an all-sky map by grouping on the downsampled pixel id and plotting the result with the healpy visualization utilities.

**Note** that the HEALPix ids use the **nested** scheme.
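
As a quick check of the arithmetic, the sketch below degrades a nested pixel id from *nside* 2^20 to a coarser *nside* with integer division. It relies on the nested scheme, in which each pixel splits into four children every time *nside* doubles. `degrade_nested` and the example id are hypothetical.

```python
ORDER_FULL = 20   # heal20id is defined at nside = 2**20


def degrade_nested(pixel_id, nside):
    """Parent pixel of a nested nside=2**20 id on a coarser nside map."""
    factor = (2 ** ORDER_FULL) ** 2 // nside ** 2   # fine pixels per coarse pixel
    return pixel_id // factor


nside = 64
factor = (2 ** ORDER_FULL) ** 2 // nside ** 2
example_id = 5 * factor + 123   # made-up id that lies inside coarse pixel 5
print(degrade_nested(example_id, nside))

# Equivalent bit shift: two bits are dropped for every halving of nside.
shift = 2 * (ORDER_FULL - (nside.bit_length() - 1))
print(example_id >> shift)
```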

In [None]:
%%capture
%pip install healpy

In [None]:
import healpy

Calculate the downsample denominator. Here we will get a map with *nside* of 64.

In [None]:
nside = 64
downsample = (2**20)**2 // nside**2  # integer division keeps the factor an exact int

Here we use the test dataset, because the calculations involved mean this query cannot finish within the timeout for interactive queries on the larger dataset:

In [None]:
hpmap = cj.executeQuery(f'''
SELECT hp, count(*) N, avg(feh) FEH FROM (
 SELECT heal20id / {downsample} as hp, feh_apogee as feh from m12f_test WHERE feh_apogee BETWEEN -1.5 AND 0.5
) as A
GROUP BY hp
ORDER BY hp ASC
''', db_context)

In [None]:
hpmap.shape
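
`mollview` expects one value per pixel, so a map with *nside* of 64 needs 12 × 64² = 49152 entries. The query only returns pixels that contain at least one selected star, so if `hpmap.shape` reports fewer rows, the values need to be placed at their pixel index first. A minimal sketch, with a hypothetical helper `to_full_map` and made-up rows; the fill value matches `healpy.UNSEEN`, which healpy treats as masked.

```python
UNSEEN = -1.6375e30   # same value as healpy.UNSEEN


def to_full_map(pixel_ids, values, nside, fill=UNSEEN):
    """Place sparse (pixel id, value) pairs into a full-length HEALPix map."""
    npix = 12 * nside ** 2
    full = [fill] * npix
    for pix, val in zip(pixel_ids, values):
        full[int(pix)] = val
    return full


# Made-up query result: three populated pixels out of 49152.
full_n = to_full_map([0, 7, 49151], [10, 3, 8], nside=64)
print(len(full_n))
```

With the real result, `to_full_map(hpmap['hp'], hpmap['N'], nside)` could be converted with `np.asarray` and passed to `mollview` in place of `hpmap['N']`.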

#### Spatial density map

For example, we can map the spatial distribution of object counts.

In [None]:
f = plt.figure(figsize=[14,6])
healpy.visufunc.mollview(hpmap['N'], nest=True, norm='log', cmap='jet', fig=f.number)

#### [Fe/H] distribution

Or we can plot how the mean metallicity varies across the sky.

In [None]:
f = plt.figure(figsize=[14,6])
healpy.visufunc.mollview(hpmap['FEH'], nest=True, norm=None, cmap='jet', fig=f.number)