# Quickstart

The latest release of TAPE is installable via pip, using the following command:

In [None]:

%pip install lf-tape --quiet


For more detailed installation instructions, see the [Installation Guide](installation.html).

TAPE provides a scalable framework for analyzing astronomical time series data. Let's walk through a brief example where we calculate the Structure Function for a set of spectroscopically confirmed QSOs. First, we grab the available TAPE Stripe 82 QSO dataset:

In [None]:
from tape import Ensemble

ens = Ensemble()  # Initialize a TAPE Ensemble
ens.from_dataset("s82_qso", sorted=True)

This dataset contains 9,258 QSOs. We can view the first 5 entries in the "object" table to get a sense of the available object-level information:

In [None]:
ens.head("object", 5)

The Ensemble stores data in two `dask` dataframes: object-level information lives in the "object" table shown above, and individual time series measurements live in the "source" table. As a result, many operations on the Ensemble closely follow operations on `dask` (and by extension `pandas`) dataframes. Let's filter our large QSO set down to the objects whose total number of observations falls within a given range:

In [None]:
ens.calc_nobs()  # calculates number of observations, produces "nobs_total" column
ens = ens.query("nobs_total >= 95 & nobs_total <= 105", "object")
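
To make the connection to `pandas` concrete, the same kind of filter can be sketched on two plain dataframes. This is only an illustration of the two-table layout, not TAPE's implementation: the toy tables, the `id`, `mjd`, `mag`, and `ra` column names, and the thresholds are all hypothetical.

```python
import pandas as pd

# Toy stand-ins for the "source" and "object" tables (hypothetical data).
source = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 3],
        "mjd": [50000.0, 50010.0, 50020.0, 50000.0, 50005.0, 50000.0],
        "mag": [19.1, 19.3, 19.2, 20.0, 20.1, 18.5],
    }
)
obj = pd.DataFrame({"ra": [10.0, 20.0, 30.0]}, index=pd.Index([1, 2, 3], name="id"))

# Count the measurements per object and store them as an object-level column.
obj["nobs_total"] = source.groupby("id").size()

# Keep only the objects whose observation count falls within a range.
kept = obj.query("nobs_total >= 2 and nobs_total <= 3")

# In plain pandas, the matching measurements can then be selected by ID.
kept_source = source[source["id"].isin(kept.index)]
```

Here `kept` holds objects 1 and 2, and `kept_source` holds their five measurements.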

We can now view the entirety of our remaining QSO set:

In [None]:
ens.compute("object")

Next, we can calculate the Structure Function for each of these QSOs, using the available TAPE Structure Function Module:

In [None]:
from tape.analysis import calc_sf2

# batch applies the provided function to each individual lightcurve in the Ensemble
result = ens.batch(calc_sf2, sf_method="macleod_2012")
result.compute()

The result is a table of time lags (`dt`) and structure function values (`sf2`) for each unique lightcurve (labeled by `lc_id`). We can now plot the time lags against the computed structure function for each unique object:

In [None]:
import matplotlib.pyplot as plt
from matplotlib import rcParams

%matplotlib inline
%config InlineBackend.figure_format = "retina"
rcParams["savefig.dpi"] = 550
rcParams["font.size"] = 20
plt.rc("font", family="serif")

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5, 4))
plt.scatter(result["dt"], result["sf2"], s=20, alpha=1, color="#353935")
plt.yscale("log")
plt.ylabel("Log(SF) (mag)")
plt.xlabel("Time Lag (days)")
plt.ylim(1e-3, 1e1)
plt.xlim(0, 2e3)
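
For intuition about what the plotted quantity measures, a structure function can be sketched in a few lines of NumPy as the mean squared magnitude difference between all pairs of observations, binned by time lag. This is a basic estimator written for illustration; it is not the `macleod_2012` method selected above, and the `simple_sf2` helper, its arguments, and the toy light curve are all hypothetical.

```python
import numpy as np


def simple_sf2(time, mag, bin_edges):
    """Mean squared magnitude difference of all observation pairs, binned by time lag."""
    time = np.asarray(time, dtype=float)
    mag = np.asarray(mag, dtype=float)

    # Indices of every unique pair of observations (i < j).
    i, j = np.triu_indices(len(time), k=1)
    dt = np.abs(time[j] - time[i])
    dmag2 = (mag[j] - mag[i]) ** 2

    # Assign each pair to a time-lag bin; pairs outside the edges are ignored.
    which = np.digitize(dt, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    dts = np.full(n_bins, np.nan)
    sf2 = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = which == b
        if in_bin.any():
            dts[b] = dt[in_bin].mean()
            sf2[b] = dmag2[in_bin].mean()
    return dts, sf2


# Hypothetical light curve with four observations.
toy_time = np.array([0.0, 1.0, 2.0, 10.0])
toy_mag = np.array([19.0, 19.1, 19.0, 19.5])
toy_dts, toy_sf2 = simple_sf2(toy_time, toy_mag, bin_edges=np.array([0.0, 5.0, 20.0]))
```

Each bin reports the average lag of the pairs it contains together with their mean squared difference, so larger values at long lags indicate stronger variability on long timescales.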

Finally, suppose we want to select the object with the largest `sf2` value in the computed result. `ens.to_timeseries()` creates a TimeSeries object for a given ID, which gives us access to the light curve of that object:

In [None]:
max_id = result.compute()["sf2"].idxmax()[0]
lc = ens.to_timeseries(max_id)
lc
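
The `[0]` in the cell above can be understood with a small `pandas` sketch. On a Series with a multi-level index, `idxmax()` returns the full index label as a tuple, and taking its first element picks out one level. The sketch assumes, purely for illustration, a two-level index whose first level is the object ID; the toy table, the level names, and the IDs are hypothetical.

```python
import pandas as pd

# Hypothetical result table; the (object ID, row number) index is an assumption.
index = pd.MultiIndex.from_tuples(
    [(101, 0), (101, 1), (202, 0), (202, 1)], names=["id", "row"]
)
toy_result = pd.DataFrame(
    {"dt": [10.0, 100.0, 10.0, 100.0], "sf2": [0.01, 0.05, 0.02, 0.30]},
    index=index,
)

label = toy_result["sf2"].idxmax()  # full index label of the row with the largest sf2
toy_max_id = label[0]  # first index level, here the object ID
```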

In [None]:
filter_r = lc.band == "r"  # boolean mask selecting the r-band measurements

plt.figure(figsize=(8, 5))
plt.errorbar(
    lc.time[filter_r], lc.flux[filter_r], lc.flux_err[filter_r], fmt="o", color="red", alpha=0.8, label="r"
)
plt.minorticks_on()
plt.ylabel("Flux (mJy)")
plt.xlabel("Time (MJD)")
plt.legend(title="Band", loc="upper left")