# Kepler Lightcurve Data

### 2021-05-25

<!-- Do not edit this file locally. -->

## Setup


In [None]:
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 22})

<!--setupplotcode{import seaborn as sns
sns.set_style('darkgrid')
sns.set_context('paper')
sns.set_palette('colorblind')}-->

## pods


In Sheffield we created a suite of software tools for ‘Open Data
Science.’ Open data science is an approach to sharing code, models and
data that should make it easier for companies, health professionals and
scientists to gain access to data science techniques.

You can also check this blog post on [Open Data
Science](http://inverseprobability.com/2014/07/01/open-data-science).

The software can be installed using

In [None]:
%pip install --upgrade git+https://github.com/lawrennd/ods

from a notebook cell. From a command prompt where you can access your
Python installation, run the same command without the leading `%`.

The code is also available on GitHub: <https://github.com/lawrennd/ods>

Once `pods` is installed, it can be imported in the usual manner.

In [None]:
import pods

## mlai


The `mlai` software is a suite of helper functions for teaching and
demonstrating machine learning algorithms. It was first used in the
Machine Learning and Adaptive Intelligence course in Sheffield in 2013.

The software can be installed using

In [None]:
%pip install --upgrade git+https://github.com/lawrennd/mlai.git

from a notebook cell. From a command prompt where you can access your
Python installation, run the same command without the leading `%`.

The code is also available on GitHub: <https://github.com/lawrennd/mlai>

Once `mlai` is installed, it can be imported in the usual manner.

In [None]:
import mlai

## Kepler Lightcurve Data


This data set is from the Kepler Telescope. It was used by David W. Hogg
and Kate Storey-Fisher in their NeurIPS Tutorial “Machine Learning for
Astrophysics and Astrophysics Problems for Machine Learning.”

Their notebook associated with the tutorial can be found here:
<https://colab.research.google.com/drive/1TimsiQhhcK6qX_lD951H-WJDHd92my61>.

From their tutorial:

-   This is an introduction to working with time-series data.
-   Here we obtain a set of stellar photometry (light curves) from the
    NASA Kepler Mission.
-   This is just a tiny teaser: There are more than a hundred thousand
    stars observed, and most light curves span 4 years!

In [None]:
import pandas as pd
import pods

In [None]:
data = pods.datasets.kepler_lightcurves()

In `pods` the data is returned with the usual additional information,
along with a `"datasets"` field that lists the Kepler IDs contained in
each data set.

In [None]:
print(data["datasets"])
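
The indexing pattern used in the rest of this notebook can be sketched without downloading anything. The example below is a minimal sketch on a synthetic stand-in: the nested dictionary is an assumption and not the container `pods` actually returns, and the data set label `example_quarter`, the helper `make_light_curve` and all flux values are hypothetical. Only the `"datasets"` and `"Y"` keys, the `TIME` and `SAP_FLUX` columns and the two Kepler IDs are taken from this notebook.

```python
import numpy as np
import pandas as pd


def make_light_curve(n_points, seed):
    """Return a synthetic light curve (hypothetical helper, not part of pods)."""
    rng = np.random.default_rng(seed)
    time = np.linspace(0.0, 30.0, n_points)  # days
    flux = 1000.0 + rng.normal(scale=5.0, size=n_points)  # arbitrary units
    return pd.DataFrame({"TIME": time, "SAP_FLUX": flux})


dataset = "example_quarter"  # hypothetical data set label
kepler_ids = ["001720554", "002696955"]  # IDs quoted in this notebook

# Stand-in mirroring only the access pattern data["Y"][dataset][kepler_id].
toy = {
    "datasets": {dataset: kepler_ids},
    "Y": {dataset: {kid: make_light_curve(50, seed)
                    for seed, kid in enumerate(kepler_ids)}},
}

# Summarise each light curve listed under the "datasets" field.
rows = []
for kepler_id in toy["datasets"][dataset]:
    X = toy["Y"][dataset][kepler_id]
    rows.append({
        "kepler_id": kepler_id,
        "n_cadences": len(X),
        "time_span_d": X["TIME"].max() - X["TIME"].min(),
        "median_flux": X["SAP_FLUX"].median(),
    })
summary = pd.DataFrame(rows).set_index("kepler_id")
print(summary)
```

A per-star summary like this is a quick way to check that each light curve has the expected length and time coverage before plotting.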

We can plot the first few stars for visualization.

In [None]:
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai.mlai as ma

In [None]:
num_stars_plot = 3
count = 0
dataset = data["Y"].columns[0]
for kepler_id, X in data["Y"][dataset].items():
    count += 1
    fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
    ax.plot(X["TIME"], X["SAP_FLUX"])
    ax.set_xlabel("Barycentric Julian Date (d)")
    ax.set_ylabel("SAP Flux (instrumental units)")
    ax.set_title("Kepler ID {kepler_id}".format(kepler_id=kepler_id))
    ma.write_figure("kepler-lightcurve-data-{kepler_id}.svg".format(kepler_id=kepler_id), directory='./datasets')
    if count >= num_stars_plot:  # stop once num_stars_plot stars are plotted
        break

<img src="https://inverseprobability.com/talks/slides/diagrams//datasets/kepler-lightcurve-data-001720554.svg" class="" width="60%" style="vertical-align:middle;">

Figure: <i>Light curve from Kepler ID 001720554.</i>

In the notebook associated with their tutorial, Storey-Fisher and Hogg
note that barycentric time is different from Earth-centric time. To
illustrate, they plot the difference between the time values for two
different stars in the same data set. Although the Earth-centric
observation times are the same for both stars, the barycentric times
differ, and the size of that difference varies over time.
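
One way to see why the difference drifts is the light travel time (Rømer) delay: the barycentric timestamp is the observer's timestamp plus the projection of the observer's position onto the direction of the star, divided by the speed of light. The sketch below is a toy model and not the correction applied in the Kepler pipeline. It assumes a circular orbit of radius 1 au with a 365.25 day period, and the two star directions are hypothetical values chosen only to be a few degrees apart.

```python
import numpy as np

# Light travel time across 1 au, converted from seconds to days.
AU_LIGHT_DAYS = 499.004784 / 86400.0


def unit_vector(lon_deg, lat_deg):
    """Unit vector for a direction given as longitude/latitude in degrees,
    measured relative to the (assumed) orbital plane."""
    lon, lat = np.deg2rad(lon_deg), np.deg2rad(lat_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])


def roemer_delay(t_days, star_direction, period_days=365.25):
    """Delay (days) to add to observer time to obtain barycentric time,
    for an observer on a circular 1 au orbit (toy assumption)."""
    phase = 2.0 * np.pi * np.asarray(t_days, dtype=float) / period_days
    position_au = np.stack(
        [np.cos(phase), np.sin(phase), np.zeros_like(phase)], axis=-1)
    return position_au @ star_direction * AU_LIGHT_DAYS


t = np.linspace(0.0, 90.0, 200)  # observer timestamps in days
star0 = unit_vector(290.0, 60.0)  # hypothetical direction of the first star
star1 = unit_vector(295.0, 65.0)  # hypothetical direction of the second star

delay0 = roemer_delay(t, star0)
delay1 = roemer_delay(t, star1)

# Same observer timestamps, different barycentric timestamps.
difference = (t + delay1) - (t + delay0)
print("range of difference (s):",
      difference.min() * 86400.0, difference.max() * 86400.0)
```

In this toy model the difference is a slow sinusoid with the orbital period, so over a span of a few months it appears as a smooth drift.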

In [None]:
kepler_id0 = data["datasets"][dataset][0]
kepler_id1 = data["datasets"][dataset][1]

X0 = data["Y"][dataset][kepler_id0]
X1 = data["Y"][dataset][kepler_id1]

In [None]:
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
ax.plot(X0["TIME"], X1["TIME"] - X0["TIME"], linewidth=3)
ax.set_xlabel("Barycentric Julian Date (d)")
ax.set_ylabel("Time differences (d)")
_ = ax.set_title("Barycentric time is freaky")
ma.write_figure("barycentric-time-difference.svg", directory='./datasets')

<img src="https://inverseprobability.com/talks/slides/diagrams//datasets/barycentric-time-difference.svg" class="" width="60%" style="vertical-align:middle;">

Figure: <i>Difference between barycentric time values for Kepler ID
001720554 and Kepler ID 002696955.</i>
