# Your First TrackStar Program

In this tutorial, we'll set up a simple mock data set and compute the likelihood function for the input model and a few simple variations.
We'll start by importing NumPy and TrackStar.

In [1]:
import numpy as np
import trackstar



## A Mock Data Sample

Now let's set up a mock data sample.
The workhorse for storing data in TrackStar is ``sample``.
Let's make one:

In [7]:
sample = trackstar.sample()
print(sample)

sample([
        N = 0
])


Right now, the sample is empty.
Let's add some data vectors to it!
For illustrative purposes, our mock sample will be straightforward.

We'll simply take $N = 100$ points along the line $x = y = z$ in 3-dimensional space.
The sample will have an intrinsic distribution that follows a Gaussian centered on zero (i.e. $\langle x\rangle = \langle y\rangle = \langle z\rangle = 0$) with a standard deviation of $\sigma = 1$.
To mimic the effects of measurement uncertainty, we'll perturb each "measurement" of $x$, $y$, and $z$ by random numbers drawn from a Gaussian distribution with a width $\sigma_x = \sigma_y = \sigma_z = 0.1$.

To demonstrate TrackStar's user-friendliness for fitting non-uniform samples, we'll let only *half* of our data vectors have measurements of $z$ available.
Such cases can arise, e.g., in astrophysics, when not every star has a reliable age measurement.

In [8]:
for i in range(100):
    # draw a vector based on the underlying model x = y = z
    true_x = true_y = true_z = np.random.normal(loc = 0, scale = 1)

    # perturb the value by measurement uncertainties
    x = true_x + np.random.normal(loc = 0, scale = 0.1)
    y = true_y + np.random.normal(loc = 0, scale = 0.1)
    z = true_z + np.random.normal(loc = 0, scale = 0.1)
    
    # now let's make a data vector for these measurements and
    # enter the measurement uncertainties into the covariance matrix
    vector = {"x": x, "y": y}
    if i % 2: vector["z"] = z # only half of the sample
    datum = trackstar.datum(**vector)
    datum.cov[0, 0] = 0.1**2 # x
    datum.cov[1, 1] = 0.1**2 # y
    if i % 2: datum.cov[2, 2] = 0.1**2 # z

    # now we can add the data vector to the sample
    sample.add_datum(datum)
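
As a cross-check that does not depend on TrackStar, the same kind of mock data can be drawn in a vectorized way with NumPy alone. The sketch below is an illustration only: the seed, the array names, and the use of `np.nan` as a stand-in for missing $z$ values are assumptions made here, not part of TrackStar's interface.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
n = 100

# intrinsic distribution: x = y = z, drawn from a unit Gaussian centered on zero
truth = rng.normal(loc=0, scale=1, size=n)

# perturb each coordinate independently by the measurement uncertainty (0.1)
x = truth + rng.normal(loc=0, scale=0.1, size=n)
y = truth + rng.normal(loc=0, scale=0.1, size=n)
z = truth + rng.normal(loc=0, scale=0.1, size=n)

# mimic the loop above: only odd-indexed vectors keep their z measurement
z[np.arange(n) % 2 == 0] = np.nan

print(np.isnan(z).sum())  # number of vectors without a z measurement
```

Plain arrays like these need an explicit `nan` placeholder for every missing value, which is the bookkeeping that ``sample`` handles for us.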

Now that we've added our mock data vectors to our sample, let's inspect it!

In [9]:
print(sample)

sample([
        N = 100
        x --------------> [-3.1499e-01,  7.7927e-01,  4.9084e-03, ...,  6.3932e-01,  1.7028e+00,  1.6719e+00]
        y --------------> [-3.2047e-01,  8.6081e-01,  7.4616e-02, ...,  6.7807e-01,  1.6721e+00,  1.5787e+00]
        z --------------> [ nan       ,  8.2899e-01,  nan       , ...,  5.8805e-01,  nan       ,  1.7248e+00]
])


This output shows a summary of the full sample in a format similar to NumPy arrays.
Note in particular that every other measurement of $z$ is recorded as ``nan``, because we only provided a value of $z$ for every other data vector.
There are a few caveats associated with this behavior, which we recommend new users familiarize themselves with (see the [note on NaNs in TrackStar](#an-important-note-on-nans-in-trackstar) below).
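
Downstream NumPy code usually needs to mask out the ``nan`` entries before computing anything. The sketch below uses a short hand-written array as a stand-in for the $z$ values (a few numbers copied from the printed summary above); the variable `z` here is a plain NumPy array assumed for illustration, not a TrackStar object.

```python
import numpy as np

# stand-in for a z column with missing measurements
z = np.array([np.nan, 8.2899e-01, np.nan, 5.8805e-01, np.nan, 1.7248e+00])

# nan never compares equal to itself, so use np.isnan rather than ==
has_z = ~np.isnan(z)

print(has_z.sum())    # count of vectors with a z measurement
print(z[has_z])       # only the measured values
print(np.nanmean(z))  # mean that ignores the nan entries
```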

## An Important Note on NaNs in TrackStar

While TrackStar returns ``nan`` values when it does not have a measurement for a particular quantity, these values are not stored in the backend. As a consequence, **they do not correspond to actual blocks of memory on your system**.
Instead, TrackStar simply tracks which measurements are available for which quantities, so it knows when it should return a ``nan``.

Based on this behavior, TrackStar does not allow users to change ``nan`` values to real numbers or vice versa. Doing so would change the dimensionality of the stored data and result in memory errors.
If new measurements are to be added to a data vector that already exists as a ``datum`` variable, one must simply make a new ``datum``.

### Beware when Indexing Non-Uniform Samples

When working with data in which some vectors do not have measurements for every quantity, as is the case in our [mock sample](#a-mock-data-sample) in the $z$-direction, a ``KeyError`` may arise if one is not careful.
The notion that *no memory is allocated for the missing quantities* is central to this behavior.
To demonstrate, consider the block of code below, where we ask TrackStar for the measurement of $z$ associated with our $0$th datum, which has no such measurement:

In [11]:
print(sample["z", 0])
print(sample[0, "z"])
print(sample["z"][0])
print(sample[0]["z"])

nan
nan
nan


KeyError: 'Unrecognized datum label: z'

The first three times we indexed our sample, we got the expected ``nan`` value back.
However, a ``KeyError`` was raised the fourth time.
The difference between the fourth line and the preceding lines is that we *first* asked TrackStar for the $0$th datum, and *then* asked that datum for its measurement of $z$, for which there is none.
The datum is unaware that it is embedded in a sample in which other data vectors have measurements of $z$, so it raises an error.
In the third line, we first asked the sample as a whole for every $z$ measurement, so TrackStar knew to include a ``nan`` for the $0$th datum.
In the first and second lines, we gave the sample both ``"z"`` and ``0`` simultaneously, so it immediately knew that we were asking for a measurement that does not exist.
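
The difference between the four lookups can be mimicked with a small toy model in pure Python. This is a hypothetical sketch, not TrackStar's implementation: the classes `ToyDatum` and `ToySample`, their attributes, and the sample-level error message are invented here for illustration. The point is only that the sample knows every label in use, while a bare datum knows only its own.

```python
import math


class ToyDatum:
    """Hypothetical stand-in: stores only the quantities that were measured."""

    def __init__(self, **measurements):
        self._values = dict(measurements)

    def __getitem__(self, label):
        if label not in self._values:
            # a bare datum knows nothing about labels used elsewhere
            raise KeyError("Unrecognized datum label: %s" % label)
        return self._values[label]


class ToySample:
    """Hypothetical stand-in: knows every label used by any of its vectors."""

    def __init__(self):
        self._data = []
        self._labels = set()

    def add_datum(self, datum):
        self._data.append(datum)
        self._labels.update(datum._values)

    def _lookup(self, label, index):
        if label not in self._labels:
            raise KeyError("Unrecognized sample label: %s" % label)
        # return nan on request without ever having stored one
        return self._data[index]._values.get(label, math.nan)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._data[key]  # hands back the bare datum
        if isinstance(key, str):
            return [self._lookup(key, i) for i in range(len(self._data))]
        first, second = key  # accept both ("z", 0) and (0, "z")
        if isinstance(first, str):
            return self._lookup(first, second)
        return self._lookup(second, first)


toy = ToySample()
toy.add_datum(ToyDatum(x=-0.31, y=-0.32))        # no z measurement
toy.add_datum(ToyDatum(x=0.78, y=0.86, z=0.83))  # z available

print(toy["z", 0], toy[0, "z"], toy["z"][0])  # all three go through the sample
try:
    toy[0]["z"]  # the sample returns the datum, which is then asked for z
except KeyError as err:
    print("KeyError:", err)
```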

As discussed above, it is both faster and more memory efficient to index TrackStar data with all values simultaneously.
Consequently, ``sample[0, "z"]`` and ``sample["z", 0]`` are recommended over ``sample["z"][0]`` anyway.