In [None]:
import bayes_match

This notebook gives a rough guide to running a Bayesian cross-match with the included code. All matches are made to a Gaia DR2 subset, and the code assumes the Gaia file has the headers found in the file "". A file with a different format will break the code.

The first step in the process is to do the initial cross-match to the external catalog.

In [None]:
# external catalog that has already been pre-queried around all of the Gaia sources
# this file MUST be sorted by RA
external_file = 'PAN_STARRS_star_pm040mas_30as_radius_extrap_epoch_sorted.txt'

# chunk size for large file loading
chunck_size = 4e6

# number of rows in external file
external_file_length = 203974312

# column number for the external catalog ids
id_col = [0]

# column numbers for [ra, dec, epoch]
# it is assumed that the epoch is in mjd
ra_dec_epoch_cols = [1,2,5]

# file name where to store initial best matches
initial_best_save = 'PAN_STARRS_GAIADR2_star_pm040mas_best_matches.txt'

# file where to store all matches within 15"
all_match_save = 'PAN_STARRS_GAIADR2_star_pm040mas_all_matches.txt'

# column numbers of all magnitudes and magnitude errors (alternating)
mag_cols = [6,7,8,9,10,11,12,13,14,15]

# the file name of the Gaia sources the external catalog is being matched to
match_file = 'GAIADR2_star_pm040mas.txt'

best_epochs = bayes_match.cross_match(external_file, chunck_size, external_file_length, id_col,
                                      ra_dec_epoch_cols, initial_best_save,
                                      all_match_save, mag_cols, match_file)
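
Because `cross_match` needs the external file sorted by RA and its exact row count, it can be worth checking both before starting a long run. The following is a minimal sketch, assuming a whitespace-delimited text file with no header row. The helper name `check_sorted_and_count` and the delimiter are assumptions for illustration and are not part of `bayes_match`.

```python
def check_sorted_and_count(filename, ra_col=1):
    """Stream through a text catalog once and return (n_rows, is_sorted_by_ra)."""
    n_rows = 0
    is_sorted = True
    previous_ra = float('-inf')
    with open(filename) as catalog:
        for line in catalog:
            if not line.strip():
                continue  # ignore blank lines
            # assumption: columns are separated by whitespace
            ra = float(line.split()[ra_col])
            if ra < previous_ra:
                is_sorted = False
            previous_ra = ra
            n_rows += 1
    return n_rows, is_sorted

# hypothetical usage with the variables defined in the cell above:
# n_rows, is_sorted = check_sorted_and_count(external_file, ra_col=ra_dec_epoch_cols[0])
```

The file is read line by line, so memory use stays small even for a catalog with hundreds of millions of rows.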

Next, do the same for the displaced sample.

In [None]:
# external catalog that has already been pre-queried around all of the Gaia sources
# by displacing the search location by +/- 2 arcminutes
# this file MUST be sorted by RA
external_file = 'PAN_STARRS_star_pm040mas_dis_30as_radius_extrap_epoch_sorted.txt'

# chunk size for large file loading
chunck_size = 4e6

# number of rows in external file
external_file_length = 203974312

# column number for the external catalog ids
id_col = [0]

# column numbers for [ra, dec, epoch]
# it is assumed that the epoch is in mjd
ra_dec_epoch_cols = [1,2,5]

# file name where to store initial best matches
initial_best_save = 'PAN_STARRS_GAIADR2_star_pm040mas_dis_best_matches.txt'

# file where to store all matches within 15"
all_match_save = 'PAN_STARRS_GAIADR2_star_pm040mas_dis_all_matches.txt'

# column numbers of all magnitudes and magnitude errors (alternating)
mag_cols = [6,7,8,9,10,11,12,13,14,15]

# the file name of the Gaia sources the external catalog is being matched to
match_file = 'GAIADR2_star_pm040mas.txt'

bayes_match.cross_match_dis(external_file, chunck_size, external_file_length, id_col,
                            ra_dec_epoch_cols, initial_best_save,
                            all_match_save, mag_cols, match_file)

Note that the two steps above can take a long time to run (on the order of a day or more), depending on the size of your catalogs and the machine you are using. You have been warned! (Stay tuned for possible future optimizations that may reduce this.)

Next, recalculate the angular separations at the mean epoch of the best matches and rank the matches.

In [None]:
import numpy as np

mean_epoch = np.mean(best_epochs[(best_epochs > 1950.) & (best_epochs < 2050.)])

path = ''
# file where all matches stored
file = 'PAN_STARRS_GAIADR2_star_pm040mas_all_matches.txt'
# file where ranks are saved
file_save = 'PAN_STARRS_GAIADR2_star_pm040mas_all_matches_ranks.txt'

bayes_match.rank_match_check_dups(path, file, file_save, mean_epoch, flag=None, mag_cols=None)

# file where all matches stored for displaced sample
file = 'PAN_STARRS_GAIADR2_star_pm040mas_dis_all_matches.txt'
# file where ranks are saved for displaced sample
file_save = 'PAN_STARRS_GAIADR2_star_pm040mas_dis_all_matches_ranks.txt'

bayes_match.rank_match_check_dups(path, file, file_save, mean_epoch, flag=None, mag_cols=None)
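
The cut between 1950 and 2050 in the cell above suggests that `best_epochs` holds decimal years, while the external catalog epochs are given in MJD. How `bayes_match` converts between the two is not shown in this notebook, so the following is only a sketch of one standard convention (the Julian epoch), with a hypothetical helper name `mjd_to_julian_year`.

```python
import numpy as np

def mjd_to_julian_year(mjd):
    """Convert Modified Julian Date to a Julian epoch (decimal year)."""
    # J2000.0 corresponds to MJD 51544.5, and a Julian year is 365.25 days
    return 2000.0 + (np.asarray(mjd, dtype=float) - 51544.5) / 365.25

# synthetic epochs; the last value mimics a missing or placeholder entry
epochs = mjd_to_julian_year([51544.5, 57388.5, 0.0])

# same style of cut as in the cell above: keep only plausible epochs
valid = (epochs > 1950.) & (epochs < 2050.)
mean_epoch_demo = np.mean(epochs[valid])
```

Averaging only the epochs that pass the cut keeps placeholder values from dragging the mean epoch away from the true observation dates.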

Next, we will create files that contain all of the frequency distributions, which are the starting point for finding our Bayesian probability distributions. This requires that a directory called "Distribution_Files" exists in the current working directory.

These distributions are divided by various cuts in Gaia G and Galactic latitude (b), as described in Medan, Lepine & Hartman (2021). The cuts are hard-coded into these functions, so if they do not suit your needs they will need to be changed within the functions.
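
The required output directory can be created from within the notebook before running the next cell. This short sketch uses only the standard library and does nothing if the directory already exists.

```python
import os

# create the output directory expected by the distribution-fitting step
os.makedirs('Distribution_Files', exist_ok=True)
```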

In [None]:
path = ''
file = 'PAN_STARRS_GAIADR2_star_pm040mas_all_matches.txt'
file_rank = 'PAN_STARRS_GAIADR2_star_pm040mas_all_matches_ranks.txt'
file_dis = 'PAN_STARRS_GAIADR2_star_pm040mas_dis_all_matches.txt'
file_dis_rank = 'PAN_STARRS_GAIADR2_star_pm040mas_dis_all_matches_ranks.txt'
name = 'PAN_STARRS'
mag_cols = [15,17,19,21,23]
# bins for frequency distribution for angular separation axis
xbins = np.arange(0,20.2,0.2)
# bins for frequency distribution for mag difference axis
ybins = np.arange(-30,30.2,0.2)


bayes_match.fit_mag_ang_dists(path, file, file_rank, file_dis, file_dis_rank,
                              name, mag_cols, xbins, ybins)
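
To make the role of `xbins` and `ybins` concrete, the sketch below builds a 2D frequency distribution of angular separation against magnitude difference with `np.histogram2d`. The input data are synthetic, and the use of `np.histogram2d` is an assumption for illustration; the internals of `fit_mag_ang_dists` may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-ins for real matches:
# angular separations in arcseconds and magnitude differences in mag
ang_sep = rng.uniform(0, 20, size=1000)
mag_diff = rng.normal(0, 2, size=1000)

# same bin edges as in the cell above
xbins = np.arange(0, 20.2, 0.2)
ybins = np.arange(-30, 30.2, 0.2)

# counts[i, j] is the number of matches in separation bin i and mag-difference bin j
counts, xedges, yedges = np.histogram2d(ang_sep, mag_diff, bins=[xbins, ybins])
```

Each axis has one fewer bin than it has edges, so the shape of `counts` is `(len(xbins) - 1, len(ybins) - 1)`.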

Next, we are going to model the frequency distributions for the displaced sample.

In [None]:
bayes_match.back_mod_2d_gauss('PAN_STARRS',['g','r','i','z','y'])

Finally, we can calculate the Bayesian cross-match probabilities for all stars in the external catalog.

In [None]:
from scipy.ndimage import percentile_filter

all_file = '_GAIADR2_star_pm040mas_all_matches.txt'
rank_file = '_GAIADR2_star_pm040mas_all_matches_ranks.txt'
name = 'PAN_STARRS'
mag_cols = [15,17,19,21,23]
# filter to use for smoothing the true distributions
smooth_filter = percentile_filter
# params for the filter
filter_params = {'percentile':60,'size':5}

bayes_match.calc_bayes_prob(name, smooth_filter,
                            filter_params,
                            mag_cols, all_file, rank_file)
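
To see what the smoothing filter and its parameters do, the sketch below applies `percentile_filter` directly to a synthetic noisy 2D array with the same `filter_params`. How `calc_bayes_prob` calls the filter internally is not shown in this notebook, so unpacking the dictionary as keyword arguments is an assumption.

```python
import numpy as np
from scipy.ndimage import percentile_filter

rng = np.random.default_rng(1)
# synthetic noisy 2D frequency distribution (counts per bin)
noisy = rng.poisson(lam=5.0, size=(50, 60)).astype(float)

# same parameters as in the cell above: each output bin is the 60th percentile
# of the 5x5 neighborhood around it
filter_params = {'percentile': 60, 'size': 5}
smoothed = percentile_filter(noisy, **filter_params)
```

A larger `size` smooths over more bins, and a `percentile` of 50 would make this a median filter.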

The last step is to create the files that contain only the most probable matches, at a threshold of p > 95%. Optionally, you can also create files that contain only the other possible matches in the field that are not deemed the "best" match.

In [None]:
all_file = '_GAIADR2_star_pm040mas_all_matches.txt'
rank_file = '_GAIADR2_star_pm040mas_all_matches_ranks.txt'
bayes_file = '_bayes_probs_per_mag_gaia_cut_b_cut.txt'
name = 'PAN_STARRS'
# whether or not you want to also create file for other possible matches
make_rank_2 = True

bayes_match.make_best_and_rank_2_sample(name, all_file, rank_file, bayes_file, make_rank_2)

It is important to note the structure of the resulting files so results can then be used as you like. This final section of the workflow will result in three main files for the best matches (and similar files for the other possible matches). These three files (for this example) are:

**PAN_STARRS_GAIADR2_star_pm040mas_all_matches_bayes_matches_best_match.txt**

With a table structure like (split over two tables because it is long):

| Gaia ID | RA (Gaia) | Dec (Gaia) | plx (Gaia) | plx_err (Gaia) | pmra (Gaia) | pmdec (Gaia) | Gmag (Gaia) | ID (External) | RA (External) | Dec (External) | Epoch |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ... | deg | deg | mas | mas | mas/yr | mas/yr | mag | ... | deg | deg | MJD |

| ang_sep_RA_impact | ang_sep_Dec_impact | Epoch_Impact | mag1 (External) | mag1_err (External) | ... | magN (External) | magN_err (External) | line_num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| arcseconds | arcseconds | decimal year | mag | mag | ... | mag | mag | ... |
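
The `mag_cols = [15,17,19,21,23]` used in the earlier cells is consistent with zero-based indices of the magnitude columns in this layout, assuming the all-matches file shares the same column order. The sketch below rebuilds the column list for the five Pan-STARRS bands; the column labels are illustrative names, not headers written by `bayes_match`.

```python
# illustrative labels for the 15 fixed columns that precede the magnitudes
fixed_cols = ['gaia_id', 'ra_gaia', 'dec_gaia', 'plx', 'plx_err', 'pmra', 'pmdec', 'gmag',
              'id_ext', 'ra_ext', 'dec_ext', 'epoch',
              'ang_sep_ra_impact', 'ang_sep_dec_impact', 'epoch_impact']
bands = ['g', 'r', 'i', 'z', 'y']

# magnitudes and their errors alternate, followed by the line number
columns = list(fixed_cols)
for band in bands:
    columns += [band + '_mag', band + '_mag_err']
columns.append('line_num')

# zero-based index of each magnitude column
mag_cols = [columns.index(band + '_mag') for band in bands]
# mag_cols == [15, 17, 19, 21, 23]
```

For a catalog with a different number of bands, only `bands` needs to change to recover the matching `mag_cols`.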

**PAN_STARRS_GAIADR2_star_pm040mas_all_matches_ranks_bayes_matches_best_match.txt**

With a table structure like:

| Rank | ang_sep_RA_mean | ang_sep_Dec_mean |
| --- | --- | --- |
| ... | arcseconds | arcseconds |

**PAN_STARRS_GAIADR2_star_pm040mas_all_matches_bayes_probs_per_mag_gaia_cut_b_cut_bayes_matches_best_match.txt**

With a table structure like:

| bayes_prob_mag1 | ... | bayes_prob_magN |
| --- | --- | --- |
| ... |  ... | ... |

All three of these files have the same length and each row in a file matches the corresponding Gaia ID in the first file.
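
Because the files are row-aligned, a selection made on one of them can be applied to the others with the same boolean mask. The sketch below uses tiny synthetic arrays in place of the loaded files. The selection rule (every per-band probability above 0.95) is a hypothetical example and is not necessarily the rule used by `make_best_and_rank_2_sample`.

```python
import numpy as np

# synthetic stand-ins for the three row-aligned tables (3 matches each)
# real Gaia source IDs are 64-bit integers, so keep them as int64 or strings
gaia_ids = np.array([101, 102, 103], dtype=np.int64)
ranks = np.array([[1, 0.10, -0.05],
                  [1, 0.30, 0.20],
                  [1, -0.02, 0.01]])      # Rank, ang_sep_RA_mean, ang_sep_Dec_mean
probs = np.array([[0.99, 0.97],
                  [0.96, 0.40],
                  [0.999, 0.98]])         # bayes_prob_mag1, bayes_prob_mag2

# the row alignment only holds if all tables have the same length
assert len(gaia_ids) == len(ranks) == len(probs)

# hypothetical rule: keep rows where every band has probability above 0.95
mask = (probs > 0.95).all(axis=1)
selected_ids = gaia_ids[mask]
selected_ranks = ranks[mask]
```

Keeping the tables separate and sharing one mask avoids casting the integer IDs to floating point, which would lose precision for real Gaia source IDs.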