# Banyan Sigma:
## Get Bayesian probabilities for membership in young associations

### Requirements:
A data file containing, at minimum, the columns RA, DEC, PMRA, PMDEC, EPMRA, and EPMDEC (the E prefix marks the error on that quantity). Parallax and its error (PLX, EPLX), radial velocity and its error (RV, ERV), and many other parameters are optional.
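
A quick pre-flight check along these lines can catch a missing column before the query runs. This is only a minimal sketch: `missing_required_columns` is a hypothetical helper, not part of banyan_sigma, and the required list is simply the minimum set of columns named above.

```python
# Hypothetical helper (not part of banyan_sigma): report which of the
# minimum required columns are absent from a table's header.
REQUIRED_COLUMNS = ["RA", "DEC", "PMRA", "PMDEC", "EPMRA", "EPMDEC"]


def missing_required_columns(column_names):
    """Return the required column names that are not in column_names."""
    present = set(column_names)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Made-up header that lacks the two proper-motion error columns.
example_header = ["NAME", "RA", "DEC", "PMRA", "PMDEC", "PLX"]
print(missing_required_columns(example_header))
```

Once the data are in a pandas DataFrame, passing its `.columns` to the helper performs the same check.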

I'm using Gaia data that Beck had already pulled for the distance re-estimates. The Gaia query matched ~360 of the stars in the database.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

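#import the banyan_sigma module and reload it,
#so that any local edits to banyan_sigma.py are picked up
import importlib, banyan_sigma
importlib.reload(banyan_sigma)

#import the main function from the module
#(this rebinds the name banyan_sigma from the module to the function)
from banyan_sigma import banyan_sigma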

In [None]:
# read in csv (made in a separate folder and then moved to here)
# (Gaia data, trimmed to just the columns we need)
stars = pd.read_csv('follette_banyansigma_input.csv')

#drop the leftover index column that came along with the csv
stars.drop(labels='Unnamed: 0',axis=1,inplace=True)

#rename the columns to Banyan Sigma preferred names
#(can also make a dictionary mapping your column names to the preferred ones, but this is easier)
#note: assigning by position assumes the csv columns are in exactly this order
stars.columns = ['NAME','RA','DEC','PMRA','PMDEC','EPMRA','EPMDEC','RV','ERV','PLX','EPLX']

#keep only rows with a positive parallax; negative values are unphysical,
#and the comparison is False for nan, so missing values are dropped too
#(removes 6 negative plx values and 28 with nan values)
#then reset the index, discarding the old one
stars = stars[stars['PLX']>0].reset_index(drop=True)
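
The parallax cut drops missing values as well as negative ones because any comparison with NaN is False. The sketch below illustrates this on a toy table; the star names and parallax values are made up and are not taken from the Gaia file.

```python
import numpy as np
import pandas as pd

# Toy table (made-up values): two good parallaxes, one negative, one missing.
toy = pd.DataFrame({
    "NAME": ["star_a", "star_b", "star_c", "star_d"],
    "PLX": [12.5, -0.3, np.nan, 4.1],
})

# Count each kind of bad value before filtering.
n_negative = int((toy["PLX"] < 0).sum())
n_missing = int(toy["PLX"].isna().sum())

# NaN > 0 evaluates to False, so this single comparison removes both the
# negative and the missing parallaxes.
kept = toy[toy["PLX"] > 0].reset_index(drop=True)

print(n_negative, n_missing, len(kept))
```

Counting the two cases separately before filtering is how numbers like "6 negative, 28 nan" can be checked.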

In [None]:
#preview the data: 327 stars remain after the parallax cut
stars

### Run Banyan Sigma Query

In [None]:
# run the function on the data, with use_plx and use_rv set to True
# so the membership probs take plx and rv into account when they're available
banyansigma_output = banyan_sigma(stars_data=stars, use_plx=True, use_rv=True)

Note the output has a LOT of columns and sub-structures. We're interested in:
1. The most probable association for each star (or field, if that's most probable)
2. The membership probability for that association (or for the field)

To get there, we'll use two keys of the output structure (they behave like columns, except that each one is itself a structure):

1. ALL: "A structure that contains the Bayesian probability (0 to 1) for each of the associations (as individual keys)." (from the README.md file)
2. BEST_HYP: "Most probable Bayesian hypothesis (including the field)"

In the future, we might want to look at METRICS to better understand the strength of these results. For now, this works.
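
To make the layout concrete, here is a toy stand-in for the two keys. It assumes the output behaves like a pandas DataFrame with two-level column labels, which is what the indexing used in this notebook implies; the association names and probabilities are made-up illustration values, not real output.

```python
import pandas as pd

# Two-level column labels: the top level is the key (ALL, BEST_HYP) and the
# second level names the entries inside that key.
columns = pd.MultiIndex.from_tuples([
    ("ALL", "BPMG"), ("ALL", "FIELD"), ("ALL", "TWA"),
    ("BEST_HYP", "Global"),
])

# Made-up rows: three probabilities per star plus the best hypothesis name.
toy_output = pd.DataFrame(
    [[0.90, 0.08, 0.02, "BPMG"],
     [0.01, 0.95, 0.04, "FIELD"]],
    columns=columns,
)

# Selecting a top-level key returns everything stored under it.
all_probs = toy_output["ALL"]
best_hyp = toy_output["BEST_HYP"]["Global"]

print(list(all_probs.columns))
print(list(best_hyp))
```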

In [None]:
#truncate output structure to the relevant keys
membership_probs = banyansigma_output[['ALL','BEST_HYP']]

In [None]:
#pull the membership probability for the most probable association.

#collect the best hypothesis and its probability in plain lists,
#which are easy to attach to the output csv later
BEST_HYP = []
BEST_PROB = []

#run through each row
for i in range(len(stars)):
    #name of the most probable hypothesis for star i
    best_hyp = membership_probs['BEST_HYP']['Global'][i]
    #probability of that hypothesis, looked up in the ALL structure
    best_prob = membership_probs['ALL'][best_hyp][i]

    BEST_PROB.append(best_prob)
    BEST_HYP.append(best_hyp)
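
The same per-row lookup can be done without an explicit loop. The sketch below is an alternative, not what this notebook runs: `all_probs` and `best_hyp` are toy stand-ins with made-up values for `membership_probs['ALL']` and `membership_probs['BEST_HYP']['Global']`.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (made-up values): one probability column per hypothesis,
# and the name of the best hypothesis for each star.
all_probs = pd.DataFrame({
    "BPMG": [0.90, 0.01, 0.10],
    "TWA": [0.02, 0.04, 0.70],
    "FIELD": [0.08, 0.95, 0.20],
})
best_hyp = pd.Series(["BPMG", "FIELD", "TWA"])

# Column position of each star's best hypothesis.
col_pos = all_probs.columns.get_indexer(best_hyp.to_numpy())

# Pick exactly one (row, column) entry per star with NumPy integer indexing.
best_prob = all_probs.to_numpy()[np.arange(len(all_probs)), col_pos]

print(best_prob.tolist())
```

For a few hundred stars the loop is just as practical; the indexed version mainly helps with much larger tables.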

In [None]:
stars['BEST_PROB'] = BEST_PROB
stars['BEST_HYP'] = BEST_HYP

In [None]:
#adding the columns to stars and then selecting them again is a bit redundant,
#but it's not a computation time problem so it's fine
stars_mem_probs = stars[['NAME','BEST_HYP','BEST_PROB']]
stars_mem_probs

In [None]:
#write names, best hypothesis, and the probability to a CSV
stars_mem_probs.to_csv('follette_membership_probabilities.csv')
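
By default `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. The sketch below demonstrates this with a toy table (made-up values) and an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

# Toy table with made-up values.
toy = pd.DataFrame({"NAME": ["star_a", "star_b"], "BEST_PROB": [0.90, 0.95]})

# Default behaviour: the index is written as an extra, unnamed column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False leaves the index out of the file.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

Passing `index=False` when writing avoids having to drop that column in whatever reads the file next.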