# Using Rubin OpSims in Simulations

A critical aspect of producing realistic simulations of time-varying phenomena is correctly modeling the cadence at which the objects will be observed. TDAstro provides an `OpSim` module that can load a Rubin opsim file and use it to filter observations. This notebook provides an introduction to using opsim data in simulations.

## OpSim Class

The `OpSim` class provides a wrapper for loading and querying survey information, including the pointings. The required information is:
  * the pointing data (RA, dec, and time for each pointing), and
  * the zero point information for each pointing (a zero point column or the information needed to derive it).

Internally, the `OpSim` class uses simple column names such as "ra" and "time". To allow importing from databases, the constructor accepts a column-mapping dictionary that maps each short column name to the name used within the database. The default column mapping corresponds to the Rubin opsim format:
  * "airmass" -> "airmass"
  * "dec" -> "fieldDec"
  * "exptime" -> "visitExposureTime"
  * "filter" -> "filter"
  * "ra" -> "fieldRA"
  * "time" -> "observationStartMJD"

If no column mapping is provided, the `OpSim` class looks for these Rubin column names, such as "observationStartMJD" and "fieldRA".
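
To make the direction of the mapping concrete, the sketch below inverts a short-name to database-name dictionary and applies it to a list of column names. It is only an illustration of the idea, not TDAstro's internal code: `to_short_names` is a hypothetical helper, and the dictionary is copied from the list above.

```python
# Short name -> Rubin opsim column name, copied from the list above.
rubin_colmap = {
    "airmass": "airmass",
    "dec": "fieldDec",
    "exptime": "visitExposureTime",
    "filter": "filter",
    "ra": "fieldRA",
    "time": "observationStartMJD",
}


def to_short_names(columns, colmap):
    """Translate database column names to short names (hypothetical helper).

    Columns that do not appear in the mapping keep their original name.
    """
    # Invert the mapping so we can look up by database column name.
    inverse = {db_name: short for short, db_name in colmap.items()}
    return [inverse.get(name, name) for name in columns]


db_columns = ["observationStartMJD", "fieldRA", "fieldDec", "zp_nJy"]
short_columns = to_short_names(db_columns, rubin_colmap)
print(short_columns)  # ['time', 'ra', 'dec', 'zp_nJy']
```

A database with different column names would only need a different dictionary; the short names used for querying would stay the same.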

We can create a simple `OpSim` by manually specifying the data as a pandas `DataFrame`.

In [None]:
import numpy as np
import pandas as pd

from tdastro.astro_utils.opsim import OpSim

input_data = {
    "observationStartMJD": np.array([0.0, 1.0, 2.0, 3.0, 4.0]),
    "fieldRA": np.array([15.0, 30.0, 15.0, 0.0, 60.0]),
    "fieldDec": np.array([-10.0, -5.0, 0.0, 5.0, 10.0]),
    "zp_nJy": np.ones(5),
}
pd_df = pd.DataFrame(input_data)

ops_data = OpSim(pd_df)
print(ops_data.table)

Alternatively, we can load an `OpSim` from a database file using the `from_db()` function.

In [None]:
from pathlib import Path

from tdastro import _TDASTRO_TEST_DATA_DIR

opsim_file = Path(_TDASTRO_TEST_DATA_DIR) / "opsim_shorten.db"
ops_data = OpSim.from_db(opsim_file)

print(f"Loaded an opsim database with {len(ops_data)} entries.")
print(f"Columns: {ops_data.columns}")

## Spatial Matching

The `OpSim` class provides a framework for efficiently determining when an object was observed given its (RA, dec). We use the `range_search()` function to find the indices of all pointings that are within a given radius of the query point.
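
To illustrate what "within a given radius" means on the sky, here is a brute-force sketch that computes the angular separation between a query point and every pointing with the haversine formula. This is an assumption-laden illustration, not how `range_search()` is implemented: the function names are hypothetical, all angles are assumed to be in degrees, and a real implementation would likely use a spatial index rather than checking every pointing.

```python
import numpy as np


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    # Clip to guard against round-off pushing the value outside [0, 1].
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def brute_force_range_search(table_ra, table_dec, query_ra, query_dec, radius_deg):
    """Return the indices of all pointings within radius_deg of the query point."""
    sep = angular_separation_deg(table_ra, table_dec, query_ra, query_dec)
    return np.flatnonzero(sep <= radius_deg)


# Toy pointings (same values as the manually built table earlier in the notebook).
table_ra = np.array([15.0, 30.0, 15.0, 0.0, 60.0])
table_dec = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])

# Query a position 0.2 degrees north of the first pointing.
toy_matches = brute_force_range_search(table_ra, table_dec, 15.0, -9.8, 0.5)
print(toy_matches)  # [0]
```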

We start by taking the (RA, dec) of the first observation in the table and using that to determine all times this position was observed.

In [None]:
query_ra = ops_data["ra"][0]
query_dec = ops_data["dec"][0]
print(f"Searching for ({query_ra}, {query_dec}).")

# Find everything within 0.5 degrees of the query point.
matches = ops_data.range_search(query_ra, query_dec, 0.5)
print(f"Found {len(matches)} matches at {matches}")

Once we have the indices, we can use those to find the other information about the pointings.

In [None]:
ops_data["time"][matches]

We can run the spatial search in batch mode by providing lists of RA and dec values. In this case `range_search()` returns a list of numpy arrays, where each element of the top-level list holds the matches for a single query (RA, dec).

In [None]:
num_queries = 10
query_ra = ops_data["ra"][0:num_queries]
query_dec = ops_data["dec"][0:num_queries]

matches = ops_data.range_search(query_ra, query_dec, 0.5)
for idx, m_ids in enumerate(matches):
    print(f"{idx}: ({query_ra[idx]}, {query_dec[idx]}) matched {m_ids}")

Batching the spatial queries can be significantly more efficient because the operations can be vectorized.

In [None]:
# Use the first 100 pointings as the query positions for the timing comparison.
num_queries = 100
query_ra = ops_data["ra"][0:num_queries]
query_dec = ops_data["dec"][0:num_queries]

In [None]:
%%timeit
for i in range(num_queries):
    _ = ops_data.range_search(query_ra[i], query_dec[i], 0.5)

In [None]:
%%timeit
_ = ops_data.range_search(query_ra, query_dec, 0.5)
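
The reason batching helps can be sketched with plain numpy. Instead of calling a function once per query, broadcasting computes every query-to-pointing separation in a single array expression. The code below is a minimal sketch under stated assumptions, not the `OpSim` implementation: `batch_range_search` and `separation_matrix_deg` are hypothetical names, angles are in degrees, and the full separation matrix it builds would be too large for a real survey table.

```python
import numpy as np


def separation_matrix_deg(table_ra, table_dec, query_ra, query_dec):
    """Pairwise separations in degrees, shape (num_queries, num_pointings)."""
    # Pointings vary along axis 1 and queries along axis 0, so the
    # arithmetic below broadcasts to a full 2D matrix.
    t_ra = np.radians(table_ra)[np.newaxis, :]
    t_dec = np.radians(table_dec)[np.newaxis, :]
    q_ra = np.radians(query_ra)[:, np.newaxis]
    q_dec = np.radians(query_dec)[:, np.newaxis]
    a = (
        np.sin((t_dec - q_dec) / 2.0) ** 2
        + np.cos(t_dec) * np.cos(q_dec) * np.sin((t_ra - q_ra) / 2.0) ** 2
    )
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def batch_range_search(table_ra, table_dec, query_ra, query_dec, radius_deg):
    """Return one array of matching pointing indices per query."""
    sep = separation_matrix_deg(table_ra, table_dec, query_ra, query_dec)
    return [np.flatnonzero(row <= radius_deg) for row in sep]


# Toy table in which pointings 0 and 2 share the same position.
toy_ra = np.array([15.0, 30.0, 15.0, 0.0, 60.0])
toy_dec = np.array([-10.0, -5.0, -10.0, 5.0, 10.0])

batch_matches = batch_range_search(toy_ra, toy_dec, toy_ra[:2], toy_dec[:2], 0.5)
for idx, m_ids in enumerate(batch_matches):
    print(f"{idx}: matched {m_ids}")
```

The result has the same shape as the batched output described earlier: a list with one array of indices per query.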