# Get the Kepler EB catalog and ask:

In [1]:
! head -n12 ../doc/170127_Kepler_CBPs

Fri 27 Jan 2017 09:26:55 AM EST

QUESTIONS
1. How many detached EBs are in the Kirk+ EB catalog (e.g., re-generate that
distribution from the Martin Fabrycky Mazeh paper)?
2. With what period distribution?
3. With what observing baseline?
--------------------
Repeat above questions for what's looking like the HATN EB catalog.



In [2]:
pwd

'/home/luke/Dropbox/proj/cbp/notebooks'

In [3]:
ls

kepler_eb_catalog.ipynb  plots/


In [4]:
! head ../data/kepler_eb_catalog_v3.csv

##
## Kepler Eclipsing Binary Catalog
## Revision Date (Y/M/D): Aug. 9, 2016, 10:12 a.m.
## Download Date (Y/M/D): 
##
## Query String: 
##
#KIC,period,period_err,bjd0,bjd0_err,morph,GLon,GLat,kmag,Teff,SC,
10417986,0.0737309,0.0000000,55000.027476,0.004231,0.99,81.0390,11.0820,9.1280,-1.0000,True,
8912468,0.0948380,0.0000000,54953.576945,0.005326,0.98,80.1095,7.8882,11.7510,6194.0000,False,


In [5]:
from astropy.io import ascii
import numpy as np, matplotlib.pyplot as plt
%matplotlib notebook

In [6]:
keb_path = '../data/kepler_eb_catalog_v3.csv'
cols = 'KIC,period,period_err,bjd0,bjd0_err,morph,GLon,GLat,kmag,Teff,SC'
cols = tuple(cols.split(','))

tab = ascii.read(keb_path)
currentcols = tab.colnames
for ix, col in enumerate(cols):
    tab.rename_column(currentcols[ix], col)
tab.remove_column('col12') # remnant of import
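
The rename loop and the `col12` removal both seem to follow from the file layout shown by `head` above: the header line starts with `#`, so it is presumably skipped as a comment (an inference from the auto-generated `col12` name), and every row ends with a trailing comma, which yields a twelfth, empty field. A minimal sketch using only the standard library `csv` module (not the astropy reader used in this notebook) shows the field count:

```python
import csv
import io

# Header and first data row, copied from the `head` output above.
sample = (
    "#KIC,period,period_err,bjd0,bjd0_err,morph,GLon,GLat,kmag,Teff,SC,\n"
    "10417986,0.0737309,0.0000000,55000.027476,0.004231,0.99,"
    "81.0390,11.0820,9.1280,-1.0000,True,\n"
)

rows = list(csv.reader(io.StringIO(sample)))
header = [name.lstrip('#') for name in rows[0]]

n_fields = len(rows[1])                       # fields in the data row
n_named = sum(1 for name in header if name)   # non-empty header names

print(n_fields, n_named)
```

The data row has one more field than there are named columns, which is consistent with the extra column being dropped as a "remnant of import".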

In [7]:
tab[:5]

KIC,period,period_err,bjd0,bjd0_err,morph,GLon,GLat,kmag,Teff,SC
int64,float64,float64,float64,float64,float64,float64,float64,float64,float64,str5
10417986,0.0737309,0.0,55000.027476,0.004231,0.99,81.039,11.082,9.128,-1.0,True
8912468,0.094838,0.0,54953.576945,0.005326,0.98,80.1095,7.8882,11.751,6194.0,False
8758716,0.1072049,0.0,54953.672989,0.006197,1.0,77.7478,11.6565,13.531,-1.0,False
10855535,0.1127824,0.0,54964.629315,0.006374,0.99,79.3949,15.9212,13.87,7555.0,False
9472174,0.1257653,1e-07,54953.643178,0.018318,0.78,79.0187,11.6745,12.264,10645.0,True


In [8]:
len(tab)

2876

This is the total number of identified _eclipsing and ellipsoidal_ binary systems in the Kepler FoV.

The catalog authors build it as follows (quoted from the catalog paper):
```
(1) EB signature detection (Section 3); 
(2) data detrending: all intrinsic variability (such as chromospheric activity, etc.) and extrinsic variability (i.e., third light contamination and instrumental artifacts) are removed by the iterative fitting of the photometric baseline (Prša et al. 2011); 
(3) the determination of the ephemeris: the time-space data are phase-folded and the dispersion minimized; 
(4) Determination of ETVs (Section 8.6);
(5) analytic approximation: every light curve is fit by a polyfit (Prša et al. 2008); 
(6) morphological classification via Locally Linear Embedding (LLE; Section 6), a nonlinear dimensionality reduction tool is used to estimate the “detachedness” of the system (Matijevič et al. 2012, hereafter Paper III); 
(7) EB characterization through geometric analysis and 
(8) diagnostic plot generation for false positive (FP) determination. Additional details on these steps can be found in Papers I, II, III, and IV. 
```
For inclusion in this Catalog we accept bonafide EBs and systems that clearly exhibit binarity through photometric analysis (heartbeats and ellipsoidals (Section 8.1)). Throughout the Catalog and online database we use a system of subjective flagging to label and identify characteristics of a given system that would otherwise be difficult to validate quantitatively or statistically. Examples of these flags and their uses can be seen in Section 8. Although best efforts have been taken to provide accurate results, we caution that not all systems marked in the Catalog are guaranteed to be EB systems. There remains the possibility that some grazing EB signals may belong to small planet candidates or are contaminated by non-target EB signals. An in-depth discussion on Catalog completeness is presented in Section 10.

Reading the appropriate sections, the "detached" parameter, `morph`, means:

* <0.1: well-detached
* 0.1-0.5: detached
* 0.5-0.7: semi-detached
* 0.7-0.8: over-contact
* \>0.8: ellipsoidal variables.

(N.b. `morph` was obtained via locally linear embedding, a nice ML technique that I should maybe consider using!)
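
As a minimal sketch, the ranges listed above can be turned into class labels with `np.digitize`. The helper name `classify_morph` and the treatment of values that fall exactly on a boundary (assigned to the higher class) are assumptions of this sketch, not something taken from the catalog paper.

```python
import numpy as np

# Boundaries between the classes listed above.
MORPH_EDGES = [0.1, 0.5, 0.7, 0.8]
MORPH_LABELS = np.array([
    'well-detached',   # morph < 0.1
    'detached',        # 0.1 - 0.5
    'semi-detached',   # 0.5 - 0.7
    'over-contact',    # 0.7 - 0.8
    'ellipsoidal',     # > 0.8
])

def classify_morph(morph):
    """Map morph values to class names (hypothetical helper)."""
    # digitize returns i such that edges[i-1] <= x < edges[i],
    # so a value exactly on a boundary goes to the higher class.
    idx = np.digitize(np.asarray(morph, dtype=float), MORPH_EDGES)
    return MORPH_LABELS[idx]

example = classify_morph([0.05, 0.3, 0.6, 0.78, 0.99])
print(example)
```

One caveat: any negative sentinel value in `morph` (the `Teff` column above uses -1 for missing values; whether `morph` does the same is not checked here) would land in 'well-detached' in this sketch, and would also pass a plain `morph < 0.5` cut.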

In [9]:
len(tab[tab['morph']<0.5])

1571

In [10]:
import pandas as pd
cbps = pd.read_csv('../data/all_kepler_cbps.csv', delimiter='|')
cbps.head()

Unnamed: 0,KIC,KOI,name,M_1,M_2,a_in,P_in,e_in,R_p,a_p,P_p,e_p,"\Delta I_p,in",a_crit,ref,reflink
0,,,,M_\odot,M_\odot,au,days,unitless,R_\oplus,au,days,unitless,deg,au,,
1,12644769.0,1611.0,16.0,0.69,0.2,0.22,40.1,0.16,8.27,0.71,228.8,0.01,0.31,0.64,Doyle et al. (2011),http://www.sciencemag.org/cgi/doi/10.1126/scie...
2,8572936.0,2459.0,34.0,1.05,1.02,0.23,28,0.52,8.38,1.09,288.8,0.18,1.86,0.84,Welsh et al. (2012),http://www.nature.com/doifinder/10.1038/nature...
3,9837578.0,2937.0,35.0,0.89,0.81,0.18,20.7,0.14,7.99,0.6,131.4,0.04,1.07,0.5,Welsh et al. (2012),http://www.nature.com/doifinder/10.1038/nature...
4,6762829.0,1740.0,38.0,0.95,0.26,0.15,18.8,0.1,4.35,0.47,106,0.07,0.18,0.39,Orosz et al. (2012b),http://stacks.iop.org/0004-637X/758/i=2/a=87


In [11]:
cbps = cbps.drop(0, axis=0)
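
Because the first data row holds units, pandas may read the numeric columns as strings (the `28` and `106` entries in the `head()` output above hint at this, but it is an assumption). A minimal sketch on a toy table, with two rows copied from the output above and the column subset chosen only for illustration, shows one way to drop the units row and coerce columns to numbers:

```python
import io
import pandas as pd

# Toy pipe-delimited table: header, units row, then two rows from the table above.
raw = io.StringIO(
    "name|P_in|P_p|e_p\n"
    "|days|days|unitless\n"
    "16|40.1|228.8|0.01\n"
    "47c|7.4|303.1|<.41\n"
)

toy = pd.read_csv(raw, delimiter='|')
toy = toy.drop(0, axis=0)  # drop the units row, as in the cell above

# Coerce to float; entries that are not plain numbers (e.g. '<.41') become NaN.
for col in ['P_in', 'P_p', 'e_p']:
    toy[col] = pd.to_numeric(toy[col], errors='coerce')

print(toy.dtypes)
```

Note that `errors='coerce'` silently turns upper limits such as `<.41` into NaN, so those entries need separate handling if they matter.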

In [12]:
## UPON RETURN: PLOT THE BINARY PERIODS ON THE PLOT BELOW. N.B. THERE'S BEEN _NO_ DETECTED CBP W/ P_p<49d (SHORTEST IS 47b, P_p=49.5d)

In [13]:
cbps

Unnamed: 0,KIC,KOI,name,M_1,M_2,a_in,P_in,e_in,R_p,a_p,P_p,e_p,"\Delta I_p,in",a_crit,ref,reflink
1,12644769.0,1611,16,0.69,0.2,0.22,40.1,0.16,8.27,0.71,228.8,0.01,0.31,0.64,Doyle et al. (2011),http://www.sciencemag.org/cgi/doi/10.1126/scie...
2,8572936.0,2459,34,1.05,1.02,0.23,28.0,0.52,8.38,1.09,288.8,0.18,1.86,0.84,Welsh et al. (2012),http://www.nature.com/doifinder/10.1038/nature...
3,9837578.0,2937,35,0.89,0.81,0.18,20.7,0.14,7.99,0.6,131.4,0.04,1.07,0.5,Welsh et al. (2012),http://www.nature.com/doifinder/10.1038/nature...
4,6762829.0,1740,38,0.95,0.26,0.15,18.8,0.1,4.35,0.47,106.0,0.07,0.18,0.39,Orosz et al. (2012b),http://stacks.iop.org/0004-637X/758/i=2/a=87
5,10020423.0,"3154, 7273",47b,1.04,0.36,0.08,7.4,0.02,2.98,0.3,49.5,0.04,0.27,0.2,Orosz et al. (2012a),http://www.sciencemag.org/cgi/doi/10.1126/scie...
6,10020423.0,"3154, 7273",47d,1.04,0.36,0.08,7.4,0.02,,0.72,187.3,,,0.2,Orosz et al. (2015),http://adsabs.harvard.edu/abs/2015ESS.....340201W
7,10020423.0,"3154, 7273",47c,1.04,0.36,0.08,7.4,0.02,4.61,0.99,303.1,<.41,1.16,0.2,Orosz et al. (2012a),http://www.sciencemag.org/cgi/doi/10.1126/scie...
8,4862625.0,6464,PH-1/64,1.5,0.4,0.18,20.0,0.21,6.18,0.65,138.5,0.07,2.81,0.54,"Schwamb et al. (2013), Kostov et al. (2013)",http://iopscience.iop.org/article/10.1088/0004...
9,12351927.0,7522,413,0.82,0.54,0.1,10.1,0.04,4.34,0.36,66.3,0.12,4.02,0.26,Kostov et al. (2014),http://iopscience.iop.org/article/10.1088/0004...
10,9632895.0,1451 (FP),3151,0.93,0.19,0.18,27.3,0.05,6.17,0.79,240.5,0.04,2.9,0.44,Welsh et al. (2015),http://iopscience.iop.org/article/10.1088/0004...


In [14]:
bins = np.logspace(-1,3,9)

allvals, binedge = np.histogram(tab['period'], bins=bins)
allvals = np.insert(allvals, 0, allvals[0])

detachedvals, binedge = np.histogram(tab[tab['morph']<0.5]['period'], bins=bins)
detachedvals = np.insert(detachedvals, 0, detachedvals[0])

f, ax = plt.subplots(figsize=(8,5.5))

# 'steps-' style linestyles were removed from matplotlib; use drawstyle plus a plain linestyle
ax.semilogx(bins, allvals, drawstyle='steps-pre', ls='-', lw=2, c='k', label='all morphologies')
ax.semilogx(bins, detachedvals, drawstyle='steps-pre', ls='--', lw=2, c='k', label='morph<0.5\n(detached or well-detached)')
ylims = ax.get_ylim()
# cast to float in case the units row made pandas read these columns as strings
ax.vlines(cbps['P_in'].astype(float), ylims[0], ylims[1], color='k', linestyles=':', label='Binary period (CBP systems)')
ax.vlines(cbps['P_p'].astype(float), ylims[0], ylims[1], color='b', linestyles=':', label='CBP periods')

leg = ax.legend(loc='best', fontsize='small')
leg.get_frame().set_linewidth(0.)

ax.set(xlabel='period [days]', ylabel='number', title='Kepler EB catalog')
f.tight_layout()
f.savefig('plots/kepler_eb_period_histogram_bigbins.pdf')
f.show()

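[Figure: period histogram of the Kepler EB catalog (log-scaled period axis, 8 logarithmic bins from 0.1 to 1000 d), all morphologies vs. morph<0.5, with CBP binary and planet periods as dotted vertical lines; saved as plots/kepler_eb_period_histogram_bigbins.pdf]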

In [15]:
len(tab[tab['morph']<0.5]['period'])

1571

In [16]:
bins = np.logspace(-1,3,27)

allvals, binedge = np.histogram(tab['period'], bins=bins)
allvals = np.insert(allvals, 0, allvals[0])

detachedvals, binedge = np.histogram(tab[tab['morph']<0.5]['period'], bins=bins)
detachedvals = np.insert(detachedvals, 0, detachedvals[0])

f, ax = plt.subplots(figsize=(8,5.5))

# 'steps-' style linestyles were removed from matplotlib; use drawstyle plus a plain linestyle
ax.semilogx(bins, allvals, drawstyle='steps-pre', ls='-', lw=2, c='k', label='all morphologies')
ax.semilogx(bins, detachedvals, drawstyle='steps-pre', ls='--', lw=2, c='k', label='morph<0.5\n(detached or well-detached)')
ylims = ax.get_ylim()
# cast to float in case the units row made pandas read these columns as strings
ax.vlines(cbps['P_in'].astype(float), ylims[0], ylims[1], color='k', linestyles=':', label='Binary period (CBP systems)')
ax.vlines(cbps['P_p'].astype(float), ylims[0], ylims[1], color='b', linestyles=':', label='CBP periods')

leg = ax.legend(loc='best', fontsize='small')
leg.get_frame().set_linewidth(0.)

ax.set(xlabel='period [days]', ylabel='number', title='Kepler EB catalog')
f.tight_layout()  # lay out before saving, as in the big-bins cell
f.savefig('plots/kepler_eb_period_histogram_smallbins.pdf')
f.show()

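[Figure: period histogram of the Kepler EB catalog (log-scaled period axis, 26 logarithmic bins from 0.1 to 1000 d), all morphologies vs. morph<0.5, with CBP binary and planet periods as dotted vertical lines; saved as plots/kepler_eb_period_histogram_smallbins.pdf]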
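
A note on the `np.insert` step in the two plotting cells: `np.histogram` returns N counts for N+1 bin edges, and a "steps-pre" line needs one y value per edge, where each value is drawn over the interval ending at its edge. Repeating the first count makes the arrays line up. A minimal sketch with synthetic periods (invented for illustration, not catalog values):

```python
import numpy as np

# Synthetic periods in days, for illustration only.
periods = np.array([0.3, 0.5, 2.0, 4.0, 5.0, 50.0, 200.0])
bins = np.logspace(-1, 3, 5)  # edges near 0.1, 1, 10, 100, 1000 days

counts, edges = np.histogram(periods, bins=bins)

# N bins have N+1 edges. With "steps-pre", step_y[i] is drawn over
# (edges[i-1], edges[i]], so step_y[i] must equal counts[i-1]; the
# duplicated first element is only a placeholder at the leftmost edge.
step_y = np.insert(counts, 0, counts[0])

print(counts, step_y)
```

The resulting pair can be passed to `ax.step(edges, step_y, where='pre')`, which draws the same kind of outline as the `semilogx` calls above on a linear axis.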

We want to overplot where the known Kepler CBPs are (the dotted vertical lines in the plots above).

Then, an order-of-magnitude estimate: what is the total "size" of the detached EB dataset?

Note that the data for these targets are mostly (I think) at ~30 minute ("long") cadence.
Where they are not, they are at ~1 minute ("short") cadence.

In [17]:
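points_per_hr = 2  # assumes 30 minute ("long") cadence
points_per_d = points_per_hr * 24.
points_per_yr = points_per_d * 365.25
points_tot = points_per_yr * 4  # points per target, assuming a 4 yr baseline
days_tot_obsd = 4*365.25  # assumed baseline in days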

In [18]:
points_tot

70128.0

In [19]:
days_tot_obsd

1461.0

Kepler dataset: ~1461 days (the 4 yr baseline assumed above) of nearly continuous observation, for 1571 detached EBs (`morph` < 0.5).

In [20]:
print('{:.4g} detached EB days of data'.format(1461*1571))

2.295e+06 detached EB days of data
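
The same estimate can be pushed one step further, to a total number of photometric points. This is a rough sketch: the helper `n_points` is hypothetical, the 29.4 minute value is the nominal Kepler long cadence (not taken from this notebook), and data gaps are ignored.

```python
MINUTES_PER_DAY = 24 * 60

def n_points(baseline_days, cadence_min):
    """Number of samples over a gap-free baseline (hypothetical helper)."""
    return baseline_days * MINUTES_PER_DAY / cadence_min

baseline_days = 4 * 365.25   # same 4 yr baseline as assumed above
n_detached = 1571            # detached EBs (morph < 0.5), from above

# Round 30 minute cadence reproduces points_tot from the cell above.
per_target_30min = n_points(baseline_days, 30.0)

# Assumed nominal long cadence of 29.4 minutes gives a slightly larger number.
per_target_lc = n_points(baseline_days, 29.4)

total_points_30min = per_target_30min * n_detached

print('{:.0f} points per target'.format(per_target_30min))
print('{:.3g} points over all detached EBs'.format(total_points_30min))
```

So the detached EB sample amounts to of order 1e8 long-cadence points, before accounting for gaps or for targets with less than the full baseline.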
