# Introduction

LightCurveLynx is a package for large-scale, time-domain forward-modeling of astronomical light curve data. Simulations incorporate realistic effects, including survey cadence, dust extinction, and instrument noise models. LightCurveLynx is designed to enable user extensibility, such as adding new models, effects, and instruments, while ensuring scalability.

In this tutorial, we discuss the overall flow of LightCurveLynx and how to use it to run simulations. The goal is to get a new user started and allow them to explore the package.

Later tutorials cover topics in more depth, including:
  - Sampling Parameters (sampling.ipynb) - Provides an introduction to parameters and how they are sampled within a simulation run.
  - Adding new model types (adding_models.ipynb) - Provides a more in-depth discussion of ``BasePhysicalModel``, ``BandfluxModel``, and ``SEDModel`` subclasses and how to add new models.
  - Adding new effect types (adding_effects.ipynb) - Provides a discussion of the ``EffectModel`` class, how it is used, and how to create new subclasses.
  - Working directly with passbands (passband-demo.ipynb)
  - Working directly with ObsTables / Rubin OpSims (opsim_notebook.ipynb)

## Program Flow

LightCurveLynx generates synthetic light curves using the flow shown in the illustration below. A `BasePhysicalModel` and information about the parameter distributions are used to sample the models. These samples are combined with information from an `ObsTable`, such as a Rubin `OpSim`, to generate sample flux densities at a given set of times and wavelengths (or passbands), accounting for effects such as redshift. The simulator also applies other relevant effects to the rest frame flux densities (e.g. dust extinction) and the observer frame flux densities (e.g. detector noise). At the end, the code outputs the resulting light curves, one per sample.

![The simulation flow](../_static/lightcurvelynx-intro.png "The simulation flow")
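The flow in the diagram can be sketched in plain Python. This is a toy illustration only: `sample_parameters`, `evaluate_model`, `apply_extinction`, `apply_detector_noise`, and `simulate` are hypothetical helpers, not LightCurveLynx functions, and the extinction and noise values are arbitrary assumptions. Each sample draws fresh parameters, evaluates the model, applies a rest frame effect (here simplified to a wavelength-independent dimming by `10**(-0.4 * A)`), and then adds observer frame noise.

```python
import math
import random


def sample_parameters(rng):
    """Stage 1: draw one consistent set of model parameters."""
    return {"t0": rng.uniform(0.0, 10.0), "frequency": rng.uniform(0.01, 0.1)}


def evaluate_model(params, times):
    """Stage 2: evaluate a toy sine-wave model at the requested times."""
    return [
        1000.0 + 100.0 * math.sin(2.0 * math.pi * params["frequency"] * (t - params["t0"]))
        for t in times
    ]


def apply_extinction(fluxes, extinction_mag):
    """Rest frame effect (simplified): dim every flux by extinction_mag magnitudes."""
    factor = 10.0 ** (-0.4 * extinction_mag)
    return [f * factor for f in fluxes]


def apply_detector_noise(fluxes, sigma, rng):
    """Observer frame effect: add Gaussian noise with standard deviation sigma."""
    return [f + rng.gauss(0.0, sigma) for f in fluxes]


def simulate(num_samples, times, extinction_mag=0.1, sigma=5.0, seed=0):
    """Run the sample -> evaluate -> effects -> noise loop once per sample."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        params = sample_parameters(rng)
        flux = evaluate_model(params, times)
        flux = apply_extinction(flux, extinction_mag)
        flux = apply_detector_noise(flux, sigma, rng)
        results.append({"params": params, "flux": flux})
    return results


samples = simulate(num_samples=3, times=[0.0, 1.0, 2.0])
print(len(samples), len(samples[0]["flux"]))  # 3 3
```

Seeding a single `random.Random` instance makes the whole toy run reproducible, which mirrors the idea that each sample is one consistent draw followed by deterministic model evaluation plus noise.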

## Models

All light curves are generated from model objects that are a subclass of the `BasePhysicalModel` class. These model objects provide mechanisms for:
  - Sampling their parameters from given distributions,
  - Generating flux densities at given times and wavelengths (or passbands), and
  - Applying noise and other effects to the observations.

A major goal of LightCurveLynx is to be easily extensible so that users can create and analyze their own models. See the `adding_models.ipynb` notebook for examples of how to add new types of models.

Each "sample" of the data consists of a new sampling of the model's parameters and a generation of flux densities from those parameters. Thus, when a user generates a hundred samples, they are generating 100 light curves from 100 sample objects. For a detailed description of how sampling works, see the `sampling.ipynb` notebook.

We can demonstrate this simulation flow using `SinWaveModel`, a toy model that generates fluxes using a sine wave. The `SinWaveModel` object uses multiple parameters to generate its flux, so we need to specify how to set them. Some parameters may have a fixed value, such as a brightness of 2000.0. But in most simulations we will want the values of the parameters themselves to vary. We can set these from other nodes (any object that generates or uses parameters). Below we draw two of the model's parameters (`frequency` and `t0`) from uniform distributions and two (`ra` and `dec`) from Gaussians that match the toy survey information we will load later in this notebook.

LightCurveLynx provides tools for generating parameters from a range of models and distributions. For example, we can sample (RA, dec) directly from the survey data itself. For more information on how to define the parameter settings, see the `sampling.ipynb` notebook.

In [None]:
from lightcurvelynx.math_nodes.np_random import NumpyRandomFunc
from lightcurvelynx.models.basic_models import SinWaveModel

model = SinWaveModel(
    brightness=2000.0,
    amplitude=200.0,
    frequency=NumpyRandomFunc("uniform", low=0.01, high=0.1),
    t0=NumpyRandomFunc("uniform", low=0.0, high=10.0),
    ra=NumpyRandomFunc("normal", loc=200.5, scale=0.01),
    dec=NumpyRandomFunc("normal", loc=-50.0, scale=0.01),
    node_label="sin_wave_model",
)
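The notebook does not spell out the formula `SinWaveModel` uses. A common form for such a toy model, which we assume here purely for illustration, is a constant baseline plus a sinusoid, `flux(t) = brightness + amplitude * sin(2 * pi * frequency * (t - t0))`, independent of wavelength. The function below is a hypothetical re-implementation of that assumption, not the library's code; it shows how the fixed and sampled parameters shape the curve.

```python
import math


def sin_wave_flux(t, brightness, amplitude, frequency, t0):
    """Assumed sine-wave light curve (illustrative; not taken from LightCurveLynx)."""
    return brightness + amplitude * math.sin(2.0 * math.pi * frequency * (t - t0))


# At t == t0 the sine term vanishes, so the flux equals the baseline brightness.
print(sin_wave_flux(5.0, brightness=2000.0, amplitude=200.0, frequency=0.05, t0=5.0))  # 2000.0

# A quarter period after t0 (1 / (4 * 0.05) = 5 days later) the flux reaches its peak,
# brightness + amplitude.
print(sin_wave_flux(10.0, brightness=2000.0, amplitude=200.0, frequency=0.05, t0=5.0))
```

Under this assumption, `brightness` sets the mean level, `amplitude` the swing around it, `frequency` how many cycles occur per unit time, and `t0` the phase offset.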

We use models such as `SinWaveModel` to generate flux densities from the sampled input parameters. We can manually evaluate a model using the `evaluate_sed()` method, providing the times and wavelengths at which to sample:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

times = np.arange(100.0)
wavelengths = np.array([7000.0])
fluxes = model.evaluate_sed(times, wavelengths)

plt.plot(times, fluxes)
plt.xlabel("Time")
plt.ylabel("Flux")
plt.show()

The power of the simulation software is that we can generate a large number of light curves from a distribution of models. We start by using a `BasePhysicalModel` object's `sample_parameters()` method to draw the parameters for each object in this distribution.

Let's start by generating 3 sample objects. The samples are stored in a `GraphState` object, which we can print to peek at the underlying parameters.

In [None]:
state = model.sample_parameters(num_samples=3)
print(state)

Most users will not need to interact directly with the `GraphState` object, but at a very high level it can be viewed as a nested dictionary where parameters are indexed by two levels. First, a node label tells the code which Python object is storing the parameter. This level of identification is necessary to allow different stages to use parameters with the same name. Second, the parameter name maps to its stored values.

Each (node name, parameter name) combination corresponds to a list of sample values for that parameter. Parameters are sampled together so that the i-th entries of each parameter represent a single, mutually consistent sampling of parameter space. For example, you may want to generate all the parameters for a Type Ia supernova given information about its host galaxy. For much more detail, see the `GraphState` section in the `sampling.ipynb` notebook. For now, it is sufficient to know that `state` tracks the sampled parameters.
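The two-level (node label, parameter name) indexing can be pictured with plain nested dictionaries. This is only an analogy, not the `GraphState` class or its API: the hypothetical helper `get_sample` gathers the i-th entry of every parameter, which is the mutually consistent set of values that makes up one sample. Note how two nodes can each hold a parameter called `frequency` without clashing.

```python
# A plain-dict picture of the (node label, parameter name) layout; NOT the GraphState class.
state_like = {
    "sin_wave_model": {
        "frequency": [0.02, 0.07, 0.05],
        "t0": [1.3, 8.9, 4.2],
    },
    "other_node": {
        # Same parameter name on a different node: no clash.
        "frequency": [10.0, 11.0, 12.0],
    },
}


def get_sample(state, index):
    """Collect the index-th value of every parameter into one consistent sample."""
    return {
        (node, name): values[index]
        for node, params in state.items()
        for name, values in params.items()
    }


sample_1 = get_sample(state_like, 1)
print(sample_1[("sin_wave_model", "frequency")])  # 0.07
print(sample_1[("other_node", "frequency")])  # 11.0
```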

By passing the sampled state into `evaluate_sed()` we can generate multiple light curves (one for each sample) at once:

In [None]:
fluxes = model.evaluate_sed(times, wavelengths, state)

plt.plot(times, fluxes[0, :], color="blue")
plt.plot(times, fluxes[1, :], color="green")
plt.plot(times, fluxes[2, :], color="red")
plt.xlabel("Time")
plt.ylabel("Flux")
plt.show()

## Effects

Users can add effects to their physical model objects to account for real world aspects such as noise and dust extinction. For more detail on effects, including how to define your own, see the `adding_effects.ipynb` notebook.

Note: Detector noise and redshift are not added as effects; they are applied automatically. Redshift is applied only to SED-type models, based on the object's `redshift` parameter. Detector noise is applied to all model types using the `ObsTable` information (see the "ObsTable and Passbands" section below for more details).
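For reference, the textbook relations behind redshift are that observed wavelengths are stretched by a factor of (1 + z) and that time intervals measured from a reference epoch are dilated by the same factor. The hypothetical helper `observer_to_rest` below maps observer frame times and wavelengths to the rest frame under those relations, assuming `t0` is given in the observer frame. It deliberately leaves out the flux density normalization and is not meant to describe exactly how LightCurveLynx applies redshift internally.

```python
def observer_to_rest(times, wavelengths, redshift, t0):
    """Map observer frame times/wavelengths to the rest frame (textbook relations).

    Wavelengths shrink by (1 + z) and time offsets from t0 shrink by (1 + z).
    Flux density normalization is intentionally omitted from this sketch.
    """
    scale = 1.0 + redshift
    rest_times = [t0 + (t - t0) / scale for t in times]
    rest_waves = [w / scale for w in wavelengths]
    return rest_times, rest_waves


rest_t, rest_w = observer_to_rest([60800.0, 60810.0], [7000.0], redshift=0.25, t0=60800.0)
print(rest_t)  # [60800.0, 60808.0]
print(rest_w)  # [5600.0]
```

So an observer who samples at 7000 Angstroms sees light emitted at 5600 Angstroms by an object at z = 0.25, and 10 observed days correspond to 8 rest frame days.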

For this demo, we add a simple white noise effect to the model (rest frame). For real simulations we would want to add a range of effects, such as dust extinction.

In [None]:
from lightcurvelynx.effects.white_noise import WhiteNoise

# Create the white noise effect.
white_noise = WhiteNoise(white_noise_sigma=10.0)
model.add_effect(white_noise)

# Evaluate the model with white noise applied (a single sample).
flux = model.evaluate_sed(times, wavelengths)
plt.plot(times, flux)
plt.xlabel("Time")
plt.ylabel("Flux")
plt.show()
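Conceptually, a white noise effect perturbs every flux value by an independent draw from a zero-mean Gaussian. The snippet below assumes that `white_noise_sigma` is the standard deviation of that Gaussian (an assumption based on the parameter name, not a statement about the internals of `WhiteNoise`) and checks empirically that the added noise has roughly that spread. `add_white_noise` is a hypothetical helper.

```python
import random
import statistics


def add_white_noise(fluxes, white_noise_sigma, rng):
    """Add independent zero-mean Gaussian noise to each flux value."""
    return [f + rng.gauss(0.0, white_noise_sigma) for f in fluxes]


rng = random.Random(42)
clean = [2000.0] * 10_000
noisy = add_white_noise(clean, white_noise_sigma=10.0, rng=rng)

# The residuals should have a mean near 0 and a standard deviation near 10.
residuals = [n - c for n, c in zip(noisy, clean)]
print(round(statistics.mean(residuals), 2), round(statistics.pstdev(residuals), 2))
```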

## ObsTable and Passbands

To generate a reasonable simulation, we need to provide instrument and survey information. We use two classes, `ObsTable` and `PassbandGroup`, to load and work with this information.

### OpSim

The `ObsTable` object is used to store survey information, including pointings and weather conditions. In this notebook we use a specific subclass, `OpSim`, which models Rubin's simulated operations database. For more detail on the `OpSim` class, its capabilities, and how to work with it, see the `opsim_notebook.ipynb` notebook.

The `OpSim` class is also used to extract information about the detector for modeling detector noise.

For this demo we load a small example database included with the code.

In [None]:
from lightcurvelynx.obstable.opsim import OpSim

opsim_file = "../../tests/lightcurvelynx/data/opsim_shorten.db"
ops_data = OpSim.from_db(opsim_file)

print(f"Loaded an opsim database with {len(ops_data)} entries.")
print(f"Columns: {ops_data.columns}")
print(f"Time range: [{ops_data['time'].min()}, {ops_data['time'].max()}]")
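Part of matching a simulated object to the survey is deciding which pointings could have observed it, which is why the example draws `ra` and `dec` to match the toy survey. One simple way to do this, sketched below as an assumption rather than a description of the internals of `OpSim`, is to keep the pointings whose center lies within a field-of-view radius of the object, measuring distance on the sphere with the haversine formula. The pointing centers and the 1.75 degree radius are illustrative values, not read from the example database.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2.0) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2
    )
    # Clamp to guard against tiny floating-point overshoots above 1.
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h))))


def covering_pointings(pointings, ra, dec, radius_deg):
    """Indices of pointings whose center lies within radius_deg of (ra, dec)."""
    return [
        i
        for i, (p_ra, p_dec) in enumerate(pointings)
        if angular_separation_deg(ra, dec, p_ra, p_dec) <= radius_deg
    ]


# Hypothetical pointing centers (ra, dec) in degrees; not taken from opsim_shorten.db.
example_pointings = [(200.5, -50.0), (201.0, -50.5), (150.0, 10.0)]
print(covering_pointings(example_pointings, ra=200.5, dec=-50.0, radius_deg=1.75))  # [0, 1]
```

The haversine form is used instead of the plain spherical law of cosines because it stays numerically stable for the very small separations typical of objects inside a single field.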

### PassbandGroup

The `PassbandGroup` object provides a mechanism for loading and applying the instrument’s passband information. Users can manually specify the passband values, load from given files, or load from a preset (which will download the files if needed). For more detail on the `PassbandGroup` class, see the `passband-demo.ipynb` notebook.

For this demo, we load in the preset LSST filters. When loading from a preset, we provide the option to specify the directory in which the cached passbands are stored. We use a test data directory in this notebook, but in many cases you will want to use `data/passbands/` from the root directory.

In [None]:
from lightcurvelynx.astro_utils.passbands import PassbandGroup

# Use a (possibly older) cached version of the passbands to avoid downloading them.
table_dir = "../../tests/lightcurvelynx/data/passbands"
passband_group = PassbandGroup.from_preset(preset="LSST", table_dir=table_dir)
print(passband_group)
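A passband turns an SED into a single band flux. A common convention, assumed here for illustration, is the transmission-weighted mean flux density: the integral of f(λ)T(λ) divided by the integral of T(λ). Other conventions (for example, photon-counting weights that include an extra factor of λ) exist, and this sketch does not claim to match how `PassbandGroup` normalizes its tables. The triangular filter is hypothetical, and the integrals use the trapezoidal rule.

```python
def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def band_flux(wavelengths, sed, transmission):
    """Transmission-weighted mean flux density over a passband."""
    weighted = [f * t for f, t in zip(sed, transmission)]
    return trapezoid(weighted, wavelengths) / trapezoid(transmission, wavelengths)


# Hypothetical triangular filter between 6000 and 8000 Angstroms.
waves = [6000.0, 6500.0, 7000.0, 7500.0, 8000.0]
trans = [0.0, 0.5, 1.0, 0.5, 0.0]

print(band_flux(waves, [2000.0] * 5, trans))  # 2000.0 for a flat SED
```

With this normalization, a flat SED passes through unchanged, which makes it easy to sanity-check a passband table.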

## Generate the simulations

The simulation itself is run using a call to the `simulate_lightcurves()` function. This function performs the parameter sampling, queries the model, and applies the model's rest frame and observer frame effects (see the "Effects" section) as well as detector noise derived from the survey information (see the "ObsTable and Passbands" section).

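We redefine the model to use a `t0` that is consistent with the MJDs in the survey. Note that this new model does not include the white noise effect added earlier; call `model.add_effect(white_noise)` on it if you want that effect in the simulation.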

The data from `simulate_lightcurves()` is returned as a [nested-pandas dataframe](https://github.com/lincc-frameworks/nested-pandas) for easy analysis. Each row corresponds to a single sampled object. The nested columns include the time series information for the light curves.

In [None]:
from lightcurvelynx.simulate import simulate_lightcurves

model = SinWaveModel(
    brightness=2000.0,
    amplitude=200.0,
    frequency=NumpyRandomFunc("uniform", low=0.01, high=0.1),
    t0=60796.0,
    ra=NumpyRandomFunc("normal", loc=200.5, scale=0.01),
    dec=NumpyRandomFunc("normal", loc=-50.0, scale=0.01),
    node_label="sin_wave_model",
)

lightcurves = simulate_lightcurves(
    model,  # The model to simulate (including effects).
    1_000,  # The number of light curves to simulate.
    ops_data,  # The survey information.
    passband_group,  # The passband information.
)
print(lightcurves)

We can drill down into a single row of the results (e.g. sample number 0)

In [None]:
print(lightcurves.iloc[0])

and view the light curve for that sample

In [None]:
print(lightcurves.iloc[0].lightcurve)

As shown, each row in the `lightcurves` table includes all the information for that sample and an embedded table containing the object's light curve according to the survey strategy.

We can use this information to plot the samples.

In [None]:
from lightcurvelynx.utils.plotting import plot_lightcurves

lc = lightcurves["lightcurve"][0]
plot_lightcurves(
    lc["flux"],
    lc["mjd"],
    fluxerrs=lc["fluxerr"],
    filters=lc["filter"],
)
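Each result row pairs per-object values with a nested table of observations. The plain-dictionary version below is only a picture of that layout: the nested column names (`mjd`, `filter`, `flux`, `fluxerr`) are the ones used in the plotting call above, but the top-level keys and all values are made up, and the `select_band` helper is hypothetical.

```python
# A made-up result row: per-object values plus a nested light curve table.
row = {
    "ra": 200.49,
    "dec": -50.01,
    "lightcurve": {
        "mjd": [60800.1, 60801.2, 60803.0],
        "filter": ["g", "r", "g"],
        "flux": [2010.0, 1985.5, 2102.3],
        "fluxerr": [12.0, 11.5, 12.4],
    },
}


def select_band(lightcurve, band):
    """Return only the nested observations taken in the given filter."""
    keep = [i for i, f in enumerate(lightcurve["filter"]) if f == band]
    return {column: [values[i] for i in keep] for column, values in lightcurve.items()}


g_band = select_band(row["lightcurve"], "g")
print(g_band["mjd"])  # [60800.1, 60803.0]
```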

## Reconstructing the Underlying Model

All of the information needed to reconstruct each sample's model is included as a (flattened) dictionary in the results' `params` column:

In [None]:
print(lightcurves["params"][0])

We can convert those flattened dictionaries back to the `GraphState` objects (which allows us to replay the simulation) using `from_dict` for a single state or `from_list` for multiple states:

In [None]:
from lightcurvelynx.graph_state import GraphState

state_0 = GraphState.from_dict(lightcurves["params"][0])
print("First sample: ", state_0)
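To see why a flattened dictionary is enough to rebuild the state, consider a round trip between the two-level (node, parameter) layout and single-level keys. The `.` separator and the `flatten`/`unflatten` helpers are hypothetical; the key format LightCurveLynx actually uses is whatever `print(lightcurves["params"][0])` shows, and `GraphState.from_dict` handles the real conversion.

```python
def flatten(nested, sep="."):
    """Turn {node: {param: value}} into {"node<sep>param": value}."""
    return {
        f"{node}{sep}{name}": value
        for node, params in nested.items()
        for name, value in params.items()
    }


def unflatten(flat, sep="."):
    """Invert flatten(), splitting at the last separator.

    Assumes parameter names never contain the separator.
    """
    nested = {}
    for key, value in flat.items():
        node, name = key.rsplit(sep, 1)
        nested.setdefault(node, {})[name] = value
    return nested


params = {"sin_wave_model": {"frequency": 0.05, "t0": 60796.0}}
flat = flatten(params)
print(flat)  # {'sin_wave_model.frequency': 0.05, 'sin_wave_model.t0': 60796.0}
print(unflatten(flat) == params)  # True
```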

The simulation tools also provide a function that generates the noise-free light curves in each band over a given set of times. It returns a dictionary mapping each filter name to the band fluxes at each time. We extend the light curve beyond the observed points to give a better idea of its shape.

In [None]:
from lightcurvelynx.simulate import compute_single_noise_free_lightcurve

noise_free_lcs = compute_single_noise_free_lightcurve(
    model,
    state_0,
    passband_group,
    rest_frame_phase_min=-10.0,  # 10 days before t0
    rest_frame_phase_max=40.0,  # 40 days after t0
    rest_frame_phase_step=0.5,  # 2 samples per day
)
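The three `rest_frame_phase_*` arguments describe a grid of rest frame phases relative to `t0`. The sketch below shows one plausible way to turn them into observer frame times, assuming the grid includes both endpoints and that a rest frame phase p maps to `t0 + p * (1 + z)`. Both are assumptions: `phase_grid_to_times` is a hypothetical helper, and `compute_single_noise_free_lightcurve` may build its grid differently.

```python
def phase_grid_to_times(t0, redshift, phase_min, phase_max, phase_step):
    """Observer frame times for an (assumed inclusive) rest frame phase grid."""
    # Round to avoid losing the final point to floating-point error.
    num_steps = int(round((phase_max - phase_min) / phase_step))
    phases = [phase_min + i * phase_step for i in range(num_steps + 1)]
    # Rest frame phase intervals are stretched by (1 + z) in the observer frame.
    return [t0 + p * (1.0 + redshift) for p in phases]


grid_times = phase_grid_to_times(60796.0, 0.0, -10.0, 40.0, 0.5)
print(len(grid_times), grid_times[0], grid_times[-1])  # 101 60786.0 60836.0
```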

We can plot the noise-free curves as a background line when plotting the light curves by passing them as the `underlying_model` parameter:

In [None]:
lc_0 = lightcurves["lightcurve"][0]
plot_lightcurves(
    lc_0["flux"],
    lc_0["mjd"],
    fluxerrs=lc_0["fluxerr"],
    filters=lc_0["filter"],
    underlying_model=noise_free_lcs,
)

## Saving Data

We can save the results of a simulation using the nested-pandas `to_parquet` method. This saves the entire result set in a single file.

In [None]:
from pathlib import Path

scratch_dir = Path("./scratch")
scratch_dir.mkdir(exist_ok=True)

lightcurves.to_parquet(scratch_dir / "simulated_lightcurves.parquet")

Since individual light curves, such as `lc_0` above, are stored as pandas DataFrames, we can also save them individually using any of pandas' built-in I/O functions.

We can also output the simulation results as an [LSDB](https://docs.lsdb.io/en/latest/index.html) `Catalog`. These catalogs can be read in and analyzed with the LSDB tools.

In [None]:
from lightcurvelynx.utils.io_utils import write_results_as_hats

write_results_as_hats(scratch_dir / "lsdb_dir", lightcurves, overwrite=True)

## Conclusion

This tutorial barely scratches the surface of what LightCurveLynx can do and how it operates. The goal is to provide an overview. Interested users are encouraged to explore the other tutorial notebooks or reach out directly to the team.