# Selecting Gaia-detectable stars

This notebook walks you through selecting stars detectable in Gaia. Depending on your use case, this can be quite straightforward and fast, or sophisticated and computationally expensive. Let's start by generating a mock sample and saving it to a file. Please note that some steps in this notebook require the Python packages scanninglaw and GaiaUnlimited, which are not available on Windows because they rely on the healpy package. They can, however, be installed in Windows Subsystem for Linux (WSL).

In [None]:
#Import what you need
import numpy as np
from speedystar import starsample
from speedystar.eject import Hills
from speedystar.utils.mwpotential import MWPotential
from galpy.potential import MWPotential2014
import astropy.units as u
from galpy import potential
import mwdust

In [9]:
#Create a fairly large mock HVS sample, because Gaia-detectable stars are rare especially in the radial velocity catalogue
ejectionmodel = Hills(rate=2e-3/u.yr)

# Eject a sample of stars from Sgr A*. 
mysample = starsample(ejectionmodel, name='My Hills catalogue')

default_potential = MWPotential2014
potential.turn_physical_on(default_potential)

mysample.propagate(potential = default_potential)
mysample.dust = mwdust.Combined15()
mysample.photometry()

mysample.save('./cat_photometry_forGaiastuff.fits')

Evolving HVSs: 100%|██████████| 17176/17176 [00:35<00:00, 477.12it/s]
Propagating...: 100%|██████████| 11455/11455 [08:47<00:00, 21.73it/s]
Calculating magnitudes: 100%|██████████| 6/6 [01:32<00:00, 15.44s/it]


## The simple case

Reminder that Gaia-detectable stars can be selected using mysample.subsample(cut). By default, stars detectable in the radial velocity catalogue are selected with simple magnitude and temperature cuts:
 
 cut = 'Gaia_6D_DR2' -> Gaia_GRVS < 12 & T_eff < 6900 K

 cut = 'Gaia_6D_EDR3' -> Gaia_GRVS < 12 & T_eff < 6900 K

 cut = 'Gaia_6D_DR3' -> (Gaia_GRVS < 14 & T_eff < 6900 K) OR (Gaia_GRVS<12 & T_eff < 14500 K)

 cut = 'Gaia_6D_DR4' -> (Gaia_GRVS < 16.2 & T_eff < 6900 K) OR (Gaia_GRVS<14)

T_eff, if not already an attribute of mysample, can be computed with mysample.evolve(), and Gaia_GRVS is computed with mysample.photometry()

For the astrometric catalogue, the selection cuts are all the same:

cut = 'Gaia_*' -> Gaia_G < 20.7

where * is one of 'DR2', 'EDR3', 'DR3', 'DR4'. Gaia_G can be computed with mysample.photometry(). All of the 'Gaia_DR*' cuts will therefore yield samples of identical size, the only difference being the astrometric errors (see below for more on that.)
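
As a rough illustration of what these cuts amount to, the sketch below writes the DR3 cuts as plain Python predicates. The helper names are hypothetical and are not part of speedystar; only the thresholds are taken from the list above, and the input values are made up.

```python
def in_gaia_6d_dr3(grvs, teff):
    """Simple Gaia DR3 radial velocity cut quoted above (hypothetical helper)."""
    cool_and_faint = grvs < 14 and teff < 6900    # cool stars, fainter limit
    hot_and_bright = grvs < 12 and teff < 14500   # hotter stars, brighter limit
    return cool_and_faint or hot_and_bright


def in_gaia_astrometry(g):
    """Simple astrometric cut, identical for DR2, EDR3, DR3 and DR4."""
    return g < 20.7


# Illustrative (G_RVS, T_eff [K], G) values, not taken from any catalogue
stars = [
    (11.5, 12000.0, 12.0),
    (13.5, 6000.0, 14.2),
    (13.5, 9000.0, 13.8),
    (19.0, 5500.0, 20.0),
]
flags_6d = [in_gaia_6d_dr3(grvs, teff) for grvs, teff, _ in stars]
flags_5d = [in_gaia_astrometry(g) for _, _, g in stars]
print(flags_6d)
print(flags_5d)
```

In practice mysample.subsample(cut) applies the equivalent selection to the whole sample at once, so a helper like this is only useful for checking individual stars by hand.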

In [None]:
#Load a pre-existing sample with photometry. 
mysample = starsample('./cat_photometry_forGaiastuff.fits')

print('Faintest star in sample is at G magnitude {:.1f}'.format(np.nanmax(mysample.Gaia_G)))
print('Number of stars in sample: '+str(mysample.size))

#Determine which stars would be in principle detectable in Gaia DR3
mysample.subsample('Gaia_DR3')

#Save the cut sample
mysample.save('./cat_gaiaDR3.fits')

print('Faintest star in Gaia DR3 is at G magnitude {:.1f}'.format(np.max(mysample.Gaia_G)))
print('Number of stars in Gaia DR3: '+str(mysample.size))

#Recall that mysample.subsample() is a destructive operation, so we need to reload the original sample
mysample = starsample('./cat_photometry_forGaiastuff.fits')

#Determine which stars would be in principle detectable in Gaia DR3 6D
mysample.subsample('Gaia_6D_DR3')

#Save the cut sample
mysample.save('./cat_gaiaDR3_6D.fits')

print('Faintest star in Gaia DR3 (6D) is at G_RVS magnitude {:.1f}'.format(np.max(mysample.Gaia_GRVS)))
print('Number of stars in Gaia DR3 (6D): '+str(mysample.size))

## Selection functions using GaiaUnlimited

Depending on your use case, the selection functions above may not be sufficient. Many stars fainter than the quoted faint-end magnitude limit can appear in the catalogue, and the faint-end completeness limit can be brighter in highly crowded and/or dust-extincted regions of the sky.

[GaiaUnlimited](https://gaia-unlimited.org/) is a powerful Python package for creating and querying the selection function of the Gaia data releases and their subsamples. Please see the package documentation for more information on how it works. Speedystar allows integration with GaiaUnlimited, though it requires some disk space to download or create the selection functions and querying them is significantly slower than the simple magnitude cuts.

### Querying the pre-built radial velocity selection functions

GaiaUnlimited provides prebuilt selection functions for the Gaia DR2 and DR3 subsets with measured radial velocities. When cut == 'Gaia_6D_DR2', 'Gaia_6D_EDR3' (because it shares a radial velocity selection function with DR2) or 'Gaia_6D_DR3', these selection functions will be queried in starsample.subsample() if the use_rvs_sf Boolean argument is True. In the following, we will select Gaia DR3 6D-detectable stars using the prebuilt selection function as an example.

Calling the cuts like this with the prebuilt selection function assigns each star an attribute 'obsprob', which is the probability of being observed in the given data release. There are then two options for selecting which stars would be included in the catalogue:

1) By default, a random number between 0 and 1 is drawn for each star, and the star is included in the catalogue if the random number is less than obsprob.
2) If an argument 'probcut' is supplied, which is a float between 0 and 1, all stars with an obsprob>=probcut are selected.

Option 1 above is likely more realistic but has some randomness. Option 2 is more flexible and gives reproducible results.
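
The two options can be sketched in a few lines of plain Python. This is a minimal illustration of the logic described above, not speedystar's own implementation; the function name, the seed argument, and the probabilities are all made up for the example.

```python
import random


def select_by_obsprob(obsprob, probcut=None, seed=None):
    """Return a list of booleans marking which stars are kept.

    Hypothetical helper illustrating the two options above.
    """
    if probcut is not None:
        # Option 2: deterministic threshold on the observation probability
        return [p >= probcut for p in obsprob]
    # Option 1: one uniform draw per star, keep the star if draw < obsprob
    rng = random.Random(seed)
    return [rng.random() < p for p in obsprob]


obsprob = [0.0, 0.35, 0.8, 0.95, 1.0]   # made-up probabilities
kept_threshold = select_by_obsprob(obsprob, probcut=0.8)
kept_random = select_by_obsprob(obsprob, seed=42)
print(kept_threshold)
print(kept_random)
```

With the random draw, a star with obsprob of 0 is never kept and a star with obsprob of 1 is always kept; everything in between varies from run to run unless the random seed is fixed.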

In [None]:
#Downloading the radial velocity selection function requires ~0.5 GB of disk space. By default these will be saved in the current directory. This can be changed by setting the environment variable GAIAUNLIMITED_DATADIR to the desired directory.

#This can otherwise be done directly in speedystar by uncommenting the following:
#mysample.config_rvssf('/path/to/directory/')

#Calling mysample.subsample() with the argument use_rvs_sf=True where cut='Gaia_6D_DR2' or 'Gaia_6D_EDR3' or 'Gaia_6D_DR3' will apply the radial velocity selection function to the sample.

mysample = starsample('./cat_photometry_forGaiastuff.fits')
mysample.subsample('Gaia_6D_DR3', use_rvs_sf=True)

#Save the cut sample
mysample.save('./cat_gaiaDR3_6D_rvssf.fits')

print('Faintest star in Gaia DR3 (6D) with radial velocity selection function is at G_RVS magnitude {:.1f}'.format(np.max(mysample.Gaia_GRVS)))
print('Number of stars in Gaia DR3 (6D) with radial velocity selection function: '+str(mysample.size))

#Do the same thing again but demonstrate the functionality with probcut
mysample = starsample('./cat_photometry_forGaiastuff.fits')
mysample.subsample('Gaia_6D_DR3', use_rvs_sf=True, probcut=0.8)

#Save the cut sample
mysample.save('./cat_gaiaDR3_6D_rvssf2.fits')
print('Faintest star in Gaia DR3 (6D) with radial velocity selection function and probcut=0.8 is at G_RVS magnitude {:.1f}'.format(np.max(mysample.Gaia_GRVS)))
print('Number of stars in Gaia DR3 (6D) with radial velocity selection function and probcut=0.8: '+str(mysample.size))

### Creating your own selection functions with GaiaUnlimited

The radial velocity selection functions are the only ones which come pre-built in GaiaUnlimited at this time. You can, however, construct your own selection function for any criteria that you want. Here we will construct a selection function for all stars in Gaia DR3 with measured five-parameter astrometry.
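
A custom selection function of this kind is tabulated in bins of sky position (HEALPix pixel), magnitude and colour, so it helps to know how many bins a given configuration creates before running the query. The sketch below counts them for the binning used in the example in this section (HEALPix level 4, G from 12 to 20 in steps of 0.25, G-RP from 0.2 to 1.8 in steps of 0.2). It assumes uniform bins of width step between min and max, which may not match GaiaUnlimited's internal edge handling exactly; the helper names are hypothetical.

```python
def n_healpix_pixels(level):
    """Number of HEALPix pixels at a given level: 12 * nside**2 with nside = 2**level."""
    nside = 2 ** level
    return 12 * nside ** 2


def n_bins(lo, hi, step):
    """Number of uniform bins of width `step` between lo and hi (assumed binning)."""
    return round((hi - lo) / step)


n_pix = n_healpix_pixels(4)        # sky pixels at HEALPix level 4
n_mag = n_bins(12, 20, 0.25)       # G magnitude bins
n_col = n_bins(0.2, 1.8, 0.2)      # G-RP colour bins
n_cells = n_pix * n_mag * n_col    # total sky/magnitude/colour bins
print(n_pix, n_mag, n_col, n_cells)
```

Each additional HEALPix level multiplies the number of bins by four, which is why too fine a resolution quickly leaves many bins with too few stars to give a reliable estimate.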

In [None]:
from gaiaunlimited.utils import get_healpix_centers
from gaiaunlimited.selectionfunctions.subsample import SubsampleSelectionFunction

#Define the dependencies and resolutions of the selection function
#'healpix' is the HEALPix level at which the selection function is defined. The higher the level, the higher the sky resolution. The risk with too high a resolution is that not enough stars will populate each healpix/colour/magnitude bin, which can lead to noisy or undefined selection functions.
#'phot_g_mean_mag' is the range of G magnitudes covered by the selection function. The range is defined as [min, max, step].
#'g_rp' is the range of G-RP colours covered by the selection function. 
inDict = {'healpix': 4, 'phot_g_mean_mag': [12 ,20 ,0.25] , 'g_rp': [ 0.2 ,1.8,0.2]}

#Create the selection function
#subsample_query is the query used to select the stars from the Gaia DR3 database, i.e. the ADQL condition you would use when querying the Gaia archive yourself. In this case we are selecting stars with measured parallaxes and proper motions. This may take a long time to run and will be saved as "file_name".csv in the same GAIAUNLIMITED_DATADIR directory as the radial velocity selection functions (see above).
#If the file_name already exists and has an inDict matching the one above, the selection function will be loaded. This will not take long.

#dr3AstrometrySF = SubsampleSelectionFunction(subsample_query = "parallax is not null and pmra is not null and pmdec is not null",file_name = "par_pm", hplevel_and_binning = inDict)

dr3AstrometrySF = SubsampleSelectionFunction(subsample_query = "parallax is not null and pmra is not null and pmdec is not null",file_name = "par_pm_hp4_g12_20_0.25_grp_0.2_1.8_0.2", hplevel_and_binning = inDict)

mysample = starsample('./cat_photometry_forGaiastuff.fits')
mysample.subsample(dr3AstrometrySF, probcut=0.8)

mysample.save('./cat_gaiaDR3_astsf.fits')

print('Faintest star in Gaia DR3 with custom selection function is at G magnitude {:.1f}'.format(np.max(mysample.Gaia_G)))
print('Number of stars in Gaia with custom selection function: '+str(mysample.size))

#The following code can be useful for debugging, it will calculate which sky/magnitude/colour bin each star falls into and the number of stars in the bin and the number of stars in the bin that satisfy the selection function.


## Gaia errors

Above we have selected stars detectable in Gaia or subsamples of it. It might also be important to know what kind of astrometric and radial velocity errors these stars would have. 

Similar to the Gaia selection itself, estimation of the errors can be done in two ways: one fast and simple, and one slower but more accurate.

### The simple case

In the first and default method, astrometric errors are estimated using the Python package pygaia based on each star's G-band magnitude. Note that these estimates don't depend on sky position, so they will be less reliable in observationally tricky regions of the sky (especially the Galactic Centre.) Similarly, radial velocity errors are estimated based on each star's G_RVS magnitude and temperature and surface gravity. Errors can be estimated as follows:

In [13]:
#Load a sample with photometry
mysample = starsample('./cat_photometry_forGaiastuff.fits')

#Get the Gaia DR3 errors
#A data release must be specified. 
#Options are 'DR2', 'EDR3', 'DR3', 'DR4', 'DR5'.
mysample.get_Gaia_errors(release='DR3')

#The chosen data release will also be recorded as a metavariable:

#Save the sample with errors
mysample.save('./cat_photometry_DR3errors.fits')

#The errors are stored in the following attributes by default:
#mysample.e_par, mysample.e_pmra, mysample.e_pmdec, mysample.e_vlos

#To return only selected errors, pass a list of strings to the errors argument.
#This will not save much computational time, but will save disk space.
mysample = starsample('./cat_photometry_forGaiastuff.fits')

#Calculate only parallax errors
mysample.get_Gaia_errors(release='DR3', errors=['e_par'])

#NOTE that errors are calculated without regard to whether or not the star would actually be detectable in the chosen data release. To keep only the detectable stars, call mysample.subsample() with the desired data release.

### The more complicated case

Astrometric errors can also be estimated using the Gaia astrometric spread function. Similar to the selection function, this astrometric spread function can be queried at a particular sky position and magnitude using the [scanninglaw](https://github.com/gaiaverse/scanninglaw) package. It returns the full astrometric covariance matrix for a source at that position and magnitude, i.e. the position/parallax/proper motion variances and the covariances among them. This is slower but more accurate than using pygaia. The differences in the error estimates will be largest for bright sources.

The DR2 and (E)DR3 spread functions are available to query. If DR4 or DR5 errors are being calculated, the (E)DR3 errors are calculated and scaled down appropriately based on the mission duration.
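
One common way to do such a scaling is sketched below. It assumes that parallax uncertainties shrink as T^(-1/2) and proper motion uncertainties as T^(-3/2) with the observing baseline T, and that the baselines are roughly 34 months for (E)DR3, 66 months for DR4 and 120 months for DR5. Both the rule and the baselines are assumptions made for illustration, as is the function name; the factors speedystar actually applies may differ.

```python
# Approximate data-collection baselines in months (assumed values)
BASELINE_MONTHS = {'DR3': 34.0, 'DR4': 66.0, 'DR5': 120.0}


def scale_dr3_errors(e_par, e_pm, release):
    """Scale (E)DR3 parallax and proper motion errors to a later release.

    Hypothetical helper: parallax errors scale as T**-0.5 and
    proper motion errors as T**-1.5 with the baseline T.
    """
    ratio = BASELINE_MONTHS['DR3'] / BASELINE_MONTHS[release]
    return e_par * ratio ** 0.5, e_pm * ratio ** 1.5


# Made-up DR3 errors: 0.10 mas in parallax, 0.10 mas/yr in proper motion
e_par_dr4, e_pm_dr4 = scale_dr3_errors(0.10, 0.10, 'DR4')
print(e_par_dr4, e_pm_dr4)
```

Under these assumptions proper motions improve much faster than parallaxes, because a longer baseline both adds measurements and lengthens the lever arm over which the motion is measured.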

Functionality to query the astrometric spread function is built into speedystar and can be invoked using the use_ast_sf Boolean argument in starsample.get_Gaia_errors(). The get_correlations Boolean argument can also be set to True to return the correlations among the astrometric errors -- they are not returned by default to save disk space.
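
For reference, the relation between a covariance matrix, the standard errors and the correlation coefficients is sketched below: each standard error is the square root of the corresponding diagonal element, and each correlation is the covariance divided by the product of the two standard errors. The 3x3 matrix is a made-up parallax/proper motion block used for brevity (the full astrometric matrix also includes the two position terms), and the function name is hypothetical.

```python
import math


def errors_and_correlations(cov):
    """Split a covariance matrix into standard errors and a correlation matrix."""
    n = len(cov)
    sigma = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(n)]
            for i in range(n)]
    return sigma, corr


# Made-up covariance block for (parallax, pmra, pmdec)
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, -0.03],
       [0.00, -0.03, 0.16]]
sigma, corr = errors_and_correlations(cov)
print(sigma)
print(corr[0][1], corr[1][2])
```

Keeping the correlations matters when the errors are propagated into derived quantities such as Galactocentric velocities, since ignoring them treats the astrometric parameters as independent.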

In [None]:
#Load a sample with photometry
mysample = starsample('./cat_photometry_forGaiastuff.fits')

#If the spread function has never been called before, uncomment this line to download the DR2 and DR3 astrometric spread functions. Together they're about 500 MB.

#mysample.config_astsf('./path/where/you/want/to/save/the/selection/functions')
mysample.get_Gaia_errors(release='DR3',use_ast_sf=True, get_correlations=True)

mysample.save('./cat_photometry_DR3errors_astsf.fits')