# Providing Data - Level 1
## How to tell Hyrax what data to give to a model.

Every model needs data. It learns from data during training, 
and it makes predictions from data during inference.
In Hyrax, that flow of information happens through two main pieces: 
a ``HyraxDataset`` and a ``DataProvider``.
A ``HyraxDataset`` is the code that knows how to read specific data from disk.
A ``DataProvider`` is the part we actually ask for data — it calls on one or more datasets,
retrieves the fields we need, and hands everything back as a clean, well-structured Python dictionary.
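
To make that division of labor concrete, here is a minimal sketch of the idea in plain Python. The classes `ToyDataset` and `ToyProvider` are hypothetical stand-ins invented for this illustration; they are not Hyrax's classes and do not reflect how Hyrax is implemented.

```python
# Illustrative stand-ins only; these are NOT Hyrax's real classes.
class ToyDataset:
    """Knows how to read one kind of data (here: values held in memory)."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def get_value(self, index):
        return self._values[index]


class ToyProvider:
    """Asks one or more datasets for data and returns a nested dictionary."""

    def __init__(self, datasets):
        # Maps a friendly name to the dataset that supplies its data.
        self._datasets = datasets

    def __len__(self):
        # Use the shortest dataset so every index is valid for all of them.
        return min(len(ds) for ds in self._datasets.values())

    def __getitem__(self, index):
        return {
            name: {"value": ds.get_value(index)}
            for name, ds in self._datasets.items()
        }


provider = ToyProvider({"data": ToyDataset([10, 20, 30])})
sample = provider[1]
print(sample)  # {'data': {'value': 20}}
```

The point of the sketch is the shape of the result: one top-level key per friendly name, with that dataset's fields nested underneath.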

In this guide, we’ll take our very first steps with data in Hyrax. Here’s what we’ll do:
- Learn how to use a ``DataProvider`` to tell Hyrax what data a model should see.
- Look inside the ``DataProvider`` to understand what the data looks like once it’s ready.

To keep things simple, we’ll practice with a built-in Dataset called ``HyraxRandomDataset``.
Think of it as “practice data” that stands in for the real thing.

As always, our first move will be to create an instance of the Hyrax class.

In [None]:
from hyrax import Hyrax

h = Hyrax()

Next, we'll tell Hyrax to use the ``HyraxRandomDataset`` as the source for our data provider.

In [None]:
model_inputs_definition = {"data": {"dataset_class": "HyraxRandomDataset"}}

h.set_config("model_inputs", model_inputs_definition)

# Prepare "model_inputs"
d = h.prepare()

# Print a sample of the data
d.sample_data()

Hooray — success! 🎉 We just told Hyrax which Dataset to use, called ``h.prepare()`` to set it up,
and printed out a sample of the data.
The configuration we used is called ``model_inputs``, because it defines what will be sent into the model.
(And yes, the name is plural — models can take more than one input, but we’ll save that for the next notebook.)

What we’ve created here is the simplest possible setup: it tells Hyrax to use ``HyraxRandomDataset``
as the source of data for both training and inference. That’s all we need to get started!

Before we move on, there are two details worth highlighting:
- The dictionary key ("data") is up to you. You can name it whatever you like, as long as each key is unique.
- The value of "dataset_class" is the name of the Dataset class you want Hyrax to use. Here we picked ``HyraxRandomDataset``, but you could swap in the name of another Dataset class.
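
As a small illustration of the first point, the same definition can be written with a different friendly name. The key "random_images" below is an arbitrary label chosen for this example; only "dataset_class" and ``HyraxRandomDataset`` come from the cell above.

```python
# Same dataset as before, but with a different friendly name.
# "random_images" is an arbitrary label chosen for this example.
model_inputs_definition = {
    "random_images": {"dataset_class": "HyraxRandomDataset"},
}

# Each top-level key is a friendly name; each value names the dataset class.
for friendly_name, settings in model_inputs_definition.items():
    print(f"{friendly_name} -> {settings['dataset_class']}")
```

Note that a Python dictionary cannot hold the same key twice: if a key is repeated in a dictionary literal, only the last value is kept, so it is worth double-checking friendly names when a definition grows.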

## Examine some of the data
Now that our ``DataProvider`` is set up,
let’s take a closer look at the data itself.

We’ve already seen the ``DataProvider.sample_data()`` function, which returns the first sample it can find.
Because Hyrax retrieves data by index, it’s also easy to explore different parts
of the dataset by sampling at random indices.
This helps you get a feel for the form and structure of the data before feeding it into a model.

In [None]:
d[12]

No big surprise — the output here looks very similar to what we saw with ``d.sample_data()``.
Remember, the data is returned as a nested dictionary, with the top-level key matching the friendly name, "data".
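
When a nested dictionary gets large, it can help to print only its structure. The helper below is a generic sketch, not part of Hyrax; `describe` is a name made up here, and the field names "image" and "label" in the stand-in sample are hypothetical rather than the actual fields of ``HyraxRandomDataset``.

```python
def describe(sample, indent=0):
    """Return lines listing each key in a nested dict and the type of its value."""
    lines = []
    for key, value in sample.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries, indenting their keys.
            lines.append(" " * indent + f"{key}:")
            lines.extend(describe(value, indent + 2))
        else:
            lines.append(" " * indent + f"{key}: {type(value).__name__}")
    return lines


# Stand-in sample; the field names "image" and "label" are hypothetical.
fake_sample = {"data": {"image": [0.1, 0.2, 0.3], "label": 7}}
print("\n".join(describe(fake_sample)))
```

If the samples are plain nested dictionaries, as described above, calling `describe(d[12])` in this notebook should list the real field names under "data".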

You can use any integer index below the size of the dataset, but how do we figure out that size?
Simple: just use Python’s ``len(...)`` function!
This lets us see exactly how many samples are available.

In [None]:
len(d)
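
Combining ``len(...)`` with indexing gives a simple way to look at a few randomly chosen samples. The sketch below uses a plain list as a stand-in for the provider so that it runs on its own; `random_indices` is a hypothetical helper, not a Hyrax function, and it assumes indices start at 0.

```python
import random


def random_indices(n_items, k, seed=0):
    """Pick k distinct valid indices for a container of length n_items."""
    rng = random.Random(seed)  # fixed seed so the picks are repeatable
    return rng.sample(range(n_items), k)


# Stand-in for the provider: any object supporting len() and indexing works.
stand_in = [f"sample-{i}" for i in range(100)]

for idx in random_indices(len(stand_in), k=3):
    print(idx, stand_in[idx])
```

In the notebook, replacing `stand_in` with `d` should behave the same way, since `d` supports both `len(d)` and `d[index]`.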

## The primary id field
One more important option is `primary_id_field` — it tells Hyrax which value to include with each sample so you can trace inference outputs back to the original input.

Each dataset can provide a unique identifier per sample (a name, an index, or any value). In this dataset the identifier is returned by `get_object_id`, so request it by setting "primary_id_field" to "object_id":

In [None]:
model_inputs_definition = {
    "data": {
        "dataset_class": "HyraxRandomDataset",
        "primary_id_field": "object_id",
    },
}

h.set_config("model_inputs", model_inputs_definition)

# Prepare "model_inputs"
d = h.prepare()

# Print a sample of the data
d.sample_data()

With that change the model input is fully specified; training or inference can now be run using the built-in Hyrax models (for instance `h.train()` or `h.infer()`). We'll leave that as an exercise — training and inference are covered in more detail in a later notebook.

## Persisting the configuration

In Hyrax, all configurations — including ``model_inputs`` — are saved to the configuration .toml file,
along with any results. This makes it easy to reuse or share your setup later.

For our example, the saved configuration would look like this:

    [model_inputs]
    [model_inputs.data]
    dataset_class = 'HyraxRandomDataset'
    primary_id_field = 'object_id'

This ensures that Hyrax remembers exactly which Dataset and fields you want for future runs.

## Recap

Great job! We covered a lot of ground in this notebook and learned the basics of providing data to models in Hyrax. 
Here’s a quick summary of what you accomplished:

- Learned how to use `DataProvider` to supply data for your models.
- Configured which dataset to use by updating the "model_inputs" configuration.
- Previewed sample data and checked the dataset size to understand your data better.
- Specified a primary ID field for traceability in your results.
- Saw how Hyrax saves your configuration for easy reuse and sharing.

You’re now ready to set up data for training or inference with Hyrax!
If you want more control over the datasets used for training and inference,
"model_inputs" accepts additional configuration options.
Check out Providing Data - Level 2 for more.