# Dataset Build

This notebook walks through my target selection step by step, so that every selection and cut is fully transparent.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from astropy.table import Table

### 1) Read in the [Yu et al. 2018](http://adsabs.harvard.edu/abs/2018arXiv180204455Y) catalogue

In [3]:
sfile = '../../Catalogues/RC_catalogues/Yu+18_table1.txt'
yu18_1 = pd.read_csv(sfile, sep='|')

sfile = '../../Catalogues/RC_catalogues/Yu+18_table2.txt'
yu18_2 = pd.read_csv(sfile, sep='|')
yu18 = pd.merge(yu18_1, yu18_2, on='KICID',how='left')
yu18.rename(columns={'EvoPhase':'stage',
                     'err_x':'numax_err',
                     'err.1_x':'dnu_err',
                     'err_y':'Teff_err',
                     'Fe/H':'[Fe/H]',
                     'err.2':'[Fe/H]_err',
                     'logg':'yu_logg',
                     'err.1_y':'yu_logg_err',
                     'err.3_y':'M_err',
                     'err.4_y':'R_err'}, inplace=True)  # For consistency
print('Targets: '+str(len(yu18)))

Targets: 16094


### 2) Read in the [Kepler x DR2](https://gaia-kepler.fun/) catalogue (thanks Megan Bedell)

In [4]:
data = Table.read('../data/KepxDR2/kepler_dr2_1arcsec.fits', format='fits')
kdf = data.to_pandas()
kdf.rename(columns={'kepid':'KICID'},inplace=True)
print('Targets: '+str(len(kdf)))

Targets: 195830


### 3) Merge the Kepler x DR2 and Yu18 catalogues on KIC ID

In [5]:
xyu18 = pd.merge(yu18, kdf, on='KICID',how='left')
print('Targets: '+str(len(xyu18)))

Targets: 16135
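A left merge can only add rows when the key on the right-hand side is duplicated, so the jump from 16094 to 16135 targets means some KIC IDs have more than one Gaia source within 1 arcsec. A minimal sketch of how to check for this, using small toy tables rather than the real catalogues; `pd.merge(..., validate='many_to_one')` is a standard pandas option that raises when the right-hand keys are not unique.

```python
import pandas as pd

# Toy stand-ins for yu18 (left) and kdf (right): KIC 2 has two Gaia sources
left = pd.DataFrame({'KICID': [1, 2, 3]})
right = pd.DataFrame({'KICID': [1, 2, 2], 'source_id': [10, 20, 21]})

# Number of crossmatch rows whose KICID appears more than once
n_dup = int(right['KICID'].duplicated(keep=False).sum())

# The left merge gains one extra row for the duplicated KICID
merged = pd.merge(left, right, on='KICID', how='left')
print(n_dup, len(merged))

# validate='many_to_one' makes pandas raise if right-hand keys are not unique
raised = False
try:
    pd.merge(left, right, on='KICID', how='left', validate='many_to_one')
except pd.errors.MergeError as err:
    raised = True
    print('MergeError:', err)
```

Running the same `duplicated` count on `kdf` shows how many KIC stars have multiple Gaia matches.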


#### 3.1) For duplicated KIC IDs, keep only the match with the smallest angular separation

NOTE: THIS SECTION IS INCOMPLETE

In [6]:
xyu18.drop_duplicates(inplace=True)
print('Targets: '+str(len(xyu18)))

Targets: 16125


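NOTE: the sample is still larger than the 16094 Yu et al. (2018) targets. `drop_duplicates()` only removes rows that are identical in every column, so a KIC star with more than one Gaia source within 1 arcsec is still listed once per match. These rows still need to be reduced to the closest match.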
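One way to finish this step is to sort by angular separation and keep the first row per KIC ID. This is a sketch under an assumption: the separation column is taken to be `kepler_gaia_ang_dist`, which should be checked against `kdf.columns` before use. The toy table below is illustrative only.

```python
import numpy as np
import pandas as pd

def keep_closest_match(df, key='KICID', sep_col='kepler_gaia_ang_dist'):
    """Keep one row per `key`: the row with the smallest `sep_col`.

    `sep_col` is ASSUMED to hold the Kepler-Gaia angular separation.
    Rows with a NaN separation (no Gaia match) sort last, so they are
    only kept for stars that have no matched row at all.
    """
    return (df.sort_values(sep_col, na_position='last')
              .drop_duplicates(subset=key, keep='first')
              .sort_index())

# Toy example: KIC 1 has two matches, KIC 3 has no Gaia match
toy = pd.DataFrame({'KICID': [1, 1, 2, 3],
                    'kepler_gaia_ang_dist': [0.5, 0.2, 0.9, np.nan]})
closest = keep_closest_match(toy)
print(closest)
```

Applied to the notebook this would read `xyu18 = keep_closest_match(xyu18)`, after which the target count should be at most 16094.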

### 4) Select Red Clump stars only

In [7]:
rcxyu18 = xyu18[xyu18.stage==2]
print('Targets: '+str(len(rcxyu18)))

Targets: 7719


#### 4.1) Remove non-finite (NaN or infinite) parallax values, including stars with no Gaia match

In [8]:
rcxyu18 = rcxyu18[np.isfinite(rcxyu18.parallax)]
print('Targets: '+str(len(rcxyu18)))

Targets: 7689


#### 4.2) Remove stars with fractional parallax uncertainties above 35%

In [29]:
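# Also require a positive parallax: a negative parallax gives a negative ratio that would otherwise pass the cut
rcxyu18 = rcxyu18[(rcxyu18.parallax > 0) & (rcxyu18.parallax_error/rcxyu18.parallax < .35)]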
print('Targets: '+str(len(rcxyu18)))

Targets: 7546


### 5) Add photometry from [2MASS](http://vizier.u-strasbg.fr/cgi-bin/VizieR?-source=B/2mass)
I'll also remove any data that have negative or otherwise unphysical magnitudes or uncertainties.

In addition to the 2MASS photometry, we also use the *Gaia* *G*-band magnitude provided in the Kepler x DR2 catalogue.

In [9]:
twomass = pd.read_csv('../data/KepxDR2/asu.tsv', sep='|', skiprows=52)
twomass.head(2)
# TODO: DO THAT MAGIC TOM DID
# TODO: RUN A SANITY CHECK ON MAG VALUES AND ERRORS
print('Targets: '+str(len(rcxyu18)))

Targets: 7689
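The sanity check described above (dropping negative or otherwise unphysical magnitudes and uncertainties) could look like the sketch below. The column names `Jmag`, `Hmag`, `Kmag` and `e_Jmag`, `e_Hmag`, `e_Kmag` are ASSUMED to follow the VizieR 2MASS naming and should be checked against `twomass.columns`; the toy values are made up.

```python
import numpy as np
import pandas as pd

# ASSUMED VizieR 2MASS column names; check against twomass.columns
MAG_COLS = ['Jmag', 'Hmag', 'Kmag']
ERR_COLS = ['e_Jmag', 'e_Hmag', 'e_Kmag']

def physical_photometry(df, mag_cols=MAG_COLS, err_cols=ERR_COLS):
    """Boolean mask: True where every magnitude and uncertainty is
    finite and strictly positive. Non-numeric entries become NaN."""
    cols = list(mag_cols) + list(err_cols)
    vals = df[cols].apply(pd.to_numeric, errors='coerce')
    return (np.isfinite(vals) & (vals > 0)).all(axis=1)

# Toy table: row 1 has a negative J magnitude, row 2 a missing J error
toy = pd.DataFrame({'Jmag': [10.1, -9.9, 11.0], 'Hmag': [9.6, 9.5, 10.5],
                    'Kmag': [9.5, 9.4, 10.4], 'e_Jmag': [0.02, 0.02, np.nan],
                    'e_Hmag': [0.02, 0.03, 0.02], 'e_Kmag': [0.02, 0.02, 0.02]})
mask = physical_photometry(toy)
print(mask)
```

The mask would be applied to the 2MASS table (or to the merged catalogue) before counting targets.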


### 6) Get reddening & extinction from [Bayestar 17](http://argonaut.skymaps.info/) ([Green et al. 2018](http://adsabs.harvard.edu/abs/2018AAS...23135002G))

We use the [Bailer-Jones estimated distance](https://arxiv.org/abs/1804.10121) (`r_est`) to query the reddening map. Any change in reddening from using this distance instead of, say, $1/\varpi$ falls within the priors on extinction in our model.
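For comparison, the naive inverse-parallax distance is $d\,[\mathrm{pc}] = 1000/\varpi\,[\mathrm{mas}]$, since *Gaia* DR2 parallaxes are given in mas. Bailer-Jones et al. instead infer distances with a prior, which behaves better when fractional parallax errors are large. A minimal sketch of the naive estimate; the function name is hypothetical.

```python
import numpy as np

def inverse_parallax_distance(parallax_mas):
    """Naive distance in pc from a parallax in mas: d = 1000 / parallax.

    Non-positive parallaxes have no meaningful inverse and return NaN."""
    plx = np.asarray(parallax_mas, dtype=float)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(plx > 0, 1000.0 / plx, np.nan)

d = inverse_parallax_distance([1.0, 0.5, -0.1, 0.0])
print(d)
```

A hypothetical column such as `rcxyu18['d_inv'] = inverse_parallax_distance(rcxyu18.parallax)` could then be compared against `r_est` to check how much the choice of distance matters.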

For the extinction coefficients, see the references in [omnitool.literature_values](https://github.com/ojhall94/omnitool/blob/master/omnitool/literature_values.py).

In [34]:
from omnitool import spyglass
from omnitool.literature_values import Av_coeffs

# Work on an explicit copy so the new columns are not set on a slice of xyu18
rcxyu18 = rcxyu18.copy()

sg = spyglass()
sg.pass_position(rcxyu18.ra, rcxyu18.dec, frame='icrs')
sg.pass_distance(rcxyu18.r_est)  # Bailer-Jones distance estimate
rcxyu18['Ebv'] = sg.get_Ebv()
# Extinction in each band: E(B-V) times that band's coefficient
rcxyu18['Aks'] = rcxyu18.Ebv * Av_coeffs['Ks'].values[0]
rcxyu18['Aj'] = rcxyu18.Ebv * Av_coeffs['J'].values[0]
rcxyu18['Ah'] = rcxyu18.Ebv * Av_coeffs['H'].values[0]
rcxyu18['Ag'] = rcxyu18.Ebv * Av_coeffs['G'].values[0]


### 7) Calculate (a *basic* value for) the asteroseismic absolute magnitude
We do this with my omnitool package, which evaluates the asteroseismic scaling relations (with no corrections applied for now, hence "basic").

Bolometric corrections are computed following the method of Casagrande et al. (??) [**SOURCE?**]

#### 7.1) Get the asteroseismic bolometric magnitude

In [24]:
from omnitool.literature_values import Rsol, Msol, Lsol, Zsol
from omnitool import scalings

sc = scalings(rcxyu18, rcxyu18.numax, rcxyu18.dnu, rcxyu18.Teff,
              _numax_err=rcxyu18.numax_err, _dnu_err=rcxyu18.dnu_err,
              _Teff_err=rcxyu18.Teff_err)
rcxyu18['R'] = sc.get_radius()/Rsol
rcxyu18['R_err'] = sc.get_radius_err()/Rsol
rcxyu18['M'] = sc.get_mass()/Msol
rcxyu18['M_err'] = sc.get_mass_err()/Msol
rcxyu18['logg'] = sc.get_logg()
rcxyu18['logg_err'] = sc.get_logg_err()
rcxyu18['L'] = sc.get_luminosity()/Lsol
rcxyu18['L_err'] = sc.get_luminosity_err()/Lsol
rcxyu18['Mbol'] = sc.get_bolmag()
rcxyu18['Mbol_err'] = sc.get_bolmag_err()
rcxyu18['Z'] = Zsol * 10 ** rcxyu18['[Fe/H]'].values
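For reference, the uncorrected scaling relations in their standard form are $R/R_\odot = (\nu_{\rm max}/\nu_{\rm max,\odot})(\Delta\nu/\Delta\nu_\odot)^{-2}(T_{\rm eff}/T_{\rm eff,\odot})^{1/2}$ and $M/M_\odot = (\nu_{\rm max}/\nu_{\rm max,\odot})^{3}(\Delta\nu/\Delta\nu_\odot)^{-4}(T_{\rm eff}/T_{\rm eff,\odot})^{3/2}$, with $L/L_\odot = (R/R_\odot)^2 (T_{\rm eff}/T_{\rm eff,\odot})^4$ and $M_{\rm bol} = M_{\rm bol,\odot} - 2.5\log_{10}(L/L_\odot)$. The sketch below assumes `scalings` follows this standard form; the solar reference values are ASSUMED and may differ from those in `omnitool.literature_values`.

```python
import numpy as np

# ASSUMED solar reference values; omnitool's literature_values may differ
NUMAX_SUN = 3090.0  # muHz
DNU_SUN = 135.1     # muHz
TEFF_SUN = 5777.0   # K
MBOL_SUN = 4.74     # mag

def basic_scalings(numax, dnu, teff):
    """Uncorrected asteroseismic scaling relations.

    Returns R, M, L in solar units and the bolometric magnitude Mbol."""
    fnu = np.asarray(numax, dtype=float) / NUMAX_SUN
    fdnu = np.asarray(dnu, dtype=float) / DNU_SUN
    ft = np.asarray(teff, dtype=float) / TEFF_SUN
    R = fnu * fdnu**-2 * ft**0.5
    M = fnu**3 * fdnu**-4 * ft**1.5
    L = R**2 * ft**4
    Mbol = MBOL_SUN - 2.5 * np.log10(L)
    return R, M, L, Mbol

# Usage on a red-clump-like star (illustrative inputs only)
R, M, L, Mbol = basic_scalings(30.0, 4.0, 4800.0)
print(R, M, L, Mbol)
```

Feeding in the solar reference values returns exactly one solar radius, mass and luminosity, which is a quick check that the exponents are consistent.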

#### 7.2) Read in the Casagrande Bolometric Corrections

In [None]:
bcdf = pd.read_csv('../data/KepxDR2/casagrande_bcs.csv')
rcxyu18 = pd.merge(rcxyu18, bcdf, on='KICID',how='left')

#### 7.3) Calculate the absolute magnitudes in J, H and Ks

We assume an intrinsic uncertainty on the bolometric corrections of order $0.02\,\rm mag$, added in quadrature to the uncertainty on $M_{\rm bol}$.
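The conversion from bolometric to band absolute magnitudes, $M_\lambda = M_{\rm bol} - BC_\lambda$, with one shared uncertainty $\sqrt{\sigma_{M_{\rm bol}}^2 + \sigma_{BC}^2}$, can be sketched as a small function. It assumes the `BC_K`, `BC_H`, `BC_J`, `BC_G` columns have already been merged onto the same frame as `Mbol` (matched on KICID), so each star is paired with its own correction; the toy numbers are made up.

```python
import numpy as np
import pandas as pd

ERR_BC = 0.02  # mag; the assumed intrinsic BC uncertainty from the text

# Output suffix -> suffix of the bolometric-correction column (BC_K, ...)
BANDS = (('Ks', 'K'), ('H', 'H'), ('J', 'J'), ('G', 'G'))

def absolute_magnitudes(df, bands=BANDS, err_bc=ERR_BC):
    """Return ast_M<band> = Mbol - BC_<band> for each band, plus a shared
    uncertainty ast_M_err = sqrt(Mbol_err**2 + err_bc**2).

    Mbol and the BC_* columns must come from the same merged DataFrame."""
    out = pd.DataFrame(index=df.index)
    for name, bc in bands:
        out['ast_M' + name] = df['Mbol'] - df['BC_' + bc]
    out['ast_M_err'] = np.sqrt(df['Mbol_err'] ** 2 + err_bc ** 2)
    return out

toy = pd.DataFrame({'Mbol': [0.5], 'Mbol_err': [0.0],
                    'BC_K': [2.0], 'BC_H': [1.8], 'BC_J': [1.3], 'BC_G': [-0.3]})
out = absolute_magnitudes(toy)
print(out)
```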

In [22]:
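err_bc = 0.02  # mag, assumed intrinsic uncertainty on the bolometric corrections
# Use the BCs merged into rcxyu18 on KICID; bcdf itself is aligned by row index, not by star
rcxyu18['ast_MKs'] = rcxyu18.Mbol - rcxyu18.BC_K
rcxyu18['ast_MH'] = rcxyu18.Mbol - rcxyu18.BC_H
rcxyu18['ast_MJ'] = rcxyu18.Mbol - rcxyu18.BC_J
rcxyu18['ast_MG'] = rcxyu18.Mbol - rcxyu18.BC_G
# Add the BC uncertainty in quadrature to the bolometric-magnitude uncertainty
rcxyu18['ast_M_err'] = np.sqrt(rcxyu18.Mbol_err**2 + err_bc**2)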

In [23]:
print('Targets: '+str(len(rcxyu18)))

Targets: 7689
