# Hyrax Getting Started

In this getting started notebook we'll create an instance of a Hyrax object, train a built-in model on a `HyraxRandomDataset` training dataset, and then use that trained model to run inference on a `HyraxRandomDataset` inference dataset.

## Create a Hyrax instance

In [1]:
import hyrax

h = hyrax.Hyrax()

## Update the configuration

In [2]:
h.set_config("model.name", "HyraxAutoencoderV2")
h.set_config("train.epochs", 1)

data_definition = {
    "train": {
        "data": {
            "dataset_class": "HyraxRandomDataset",
            "data_location": "./data",
            "primary_id_field": "object_id",
        },
    },
    "infer": {
        "data": {
            "dataset_class": "HyraxRandomDataset",
            "data_location": "./data",
            "primary_id_field": "object_id",
        },
    },
}

h.set_config("model_inputs", data_definition)



For this demo, we make a few adjustments to the default configuration that the `hyrax` object was instantiated with.
Configuration values can be modified by calling `set_config` on the Hyrax instance, as above, or through its `.config` attribute.
There are many configuration values that can be set, but here we update only the model to train, the number of training epochs, and the `model_inputs` data definition for the `train` and `infer` datasets.
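
The two access styles used in this notebook, `h.set_config("model.name", ...)` and `h.config["data_set"]["test_size"]`, suggest that a dotted key names a value inside nested tables. The sketch below illustrates that mapping on a plain dictionary. The helper `set_dotted` and the starting values are hypothetical and are not part of Hyrax; this is not a description of how `set_config` is implemented.

```python
from typing import Any


def set_dotted(config: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"] = value given the dotted key "a.b".

    Hypothetical helper for illustration only; it is not part of Hyrax.
    """
    *parents, leaf = dotted_key.split(".")
    table = config
    for name in parents:
        # Create intermediate tables on demand.
        table = table.setdefault(name, {})
    table[leaf] = value


# Made-up starting values, not Hyrax defaults.
config: dict[str, Any] = {"model": {"name": "placeholder"}, "train": {"epochs": 10}}

set_dotted(config, "model.name", "HyraxAutoencoderV2")
set_dotted(config, "train.epochs", 1)
print(config)
```

Read this way, `"model.name"` and `config["model"]["name"]` address the same entry, which matches the table layout of a TOML configuration file.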

## Train a model

In [3]:
m = h.train()

[2025-10-28 15:46:37,525 hyrax.verbs.train:INFO] Training model: HyraxAutoencoderV2
[2025-10-28 15:46:37,526 hyrax.verbs.train:INFO] Training dataset(s):
{'train': Name: data (primary dataset)
  Dataset class: HyraxRandomDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image, label, meta_field_1, meta_field_2, object_id
, 'infer': Name: data (primary dataset)
  Dataset class: HyraxRandomDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image, label, meta_field_1, meta_field_2, object_id
}
2025-10-28 15:46:37,537 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': <hyrax.pytorch_ignite.SubsetSequentialSampler object at 0x30aab0650>, 'batch_size': 512, 'shuffle': False, 'collate_fn': None, 'pin_memory': False}
2025-10-28 15:46:37,537 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: dat

100%|##########| 1/1 [00:00<?, ?it/s]

[2025-10-28 15:46:38,169 hyrax.pytorch_ignite:INFO] Total training time: 0.52[s]
2025/10/28 15:46:38 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/10/28 15:46:38 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!
[2025-10-28 15:46:38,185 hyrax.verbs.train:INFO] Finished Training


The output of training is stored in a time-stamped directory under `./results/`.
By default, a copy of the final configuration used in training is persisted there as `runtime_config.toml`.
To train again with the same configuration, you can reference this `runtime_config.toml` file.

If running in another notebook, instantiate a hyrax object like so:
```
new_hyrax_instance = hyrax.Hyrax(config_file='./results/<timestamped_directory>/runtime_config.toml')
```

Or from the command line:
```
>> hyrax train --runtime-config ./results/<timestamped_directory>/runtime_config.toml
```
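
Finding the most recent `runtime_config.toml` by hand can be tedious once several runs have accumulated. The sketch below searches `./results/` using only the standard library. It assumes that run directories are named `<timestamp>-<verb>-<suffix>`, as in the inference directory `20251028-154638-infer-wXcW` shown in the log further down, and that training runs follow the same pattern; the function name `latest_runtime_config` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_runtime_config(results_root: str = "./results", verb: str = "train") -> Optional[Path]:
    """Return the newest runtime_config.toml for the given verb, or None.

    Assumes directory names of the form <YYYYMMDD-HHMMSS>-<verb>-<suffix>,
    so that sorting by name also sorts by time.
    """
    root = Path(results_root)
    candidates = sorted(
        run_dir / "runtime_config.toml"
        for run_dir in root.glob(f"*-{verb}-*")
        if (run_dir / "runtime_config.toml").is_file()
    )
    return candidates[-1] if candidates else None


config_path = latest_runtime_config()
if config_path is None:
    print("No runtime_config.toml found under ./results")
else:
    print(f"Most recent training config: {config_path}")
```

The returned path can then be passed as the `config_file` argument shown above.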

Note that here we train on only a small `HyraxRandomDataset` sample, but Hyrax has demonstrated that it can scale up to training sets with more than 1M samples.

## Run inference

In [4]:
h.config["data_set"]["test_size"] = 1.0
h.config["data_set"]["train_size"] = 0.0
h.config["data_set"]["validate_size"] = 0.0
h.config["data_loader"]["batch_size"] = 128

output = h.infer()

[2025-10-28 15:46:38,314 hyrax.verbs.infer:INFO] Inference model: HyraxAutoencoderV2
[2025-10-28 15:46:38,314 hyrax.verbs.infer:INFO] Inference dataset(s):
{'train': Name: data (primary dataset)
  Dataset class: HyraxRandomDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image, label, meta_field_1, meta_field_2, object_id
, 'infer': Name: data (primary dataset)
  Dataset class: HyraxRandomDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image, label, meta_field_1, meta_field_2, object_id
}
2025-10-28 15:46:38,315 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': None, 'batch_size': 128, 'shuffle': False, 'collate_fn': None, 'pin_memory': False}
[2025-10-28 15:46:38,326 hyrax.verbs.infer:INFO] Saving inference results at: /Users/drew/code/hyrax/docs/pre_executed/results/20251028-154638-infer-wXcW


100%|##########| 1/1 [00:00<?, ?it/s]

[2025-10-28 15:46:38,497 hyrax.pytorch_ignite:INFO] Total evaluation time: 0.04[s]
[2025-10-28 15:46:38,580 hyrax.verbs.infer:INFO] Inference Complete.


Once a model has been trained, we can use the model weights file to run inference.
By default, running `infer` looks for the latest available model weights file.
A specific model weights file can be specified with `h.config['infer']['model_weights_file'] = <path_to_model_weights_file>`.

Here we use the most recently trained model weights file and update the dataset splits so that 100% of the data is used for inference.

With the configuration updated, we can run inference by calling `h.infer()`.

The results of running inference are saved in the output directory noted in the `Saving inference results at:` log line.
The default output format is batched `.npy` files.
Additionally, a ChromaDB vector database is populated with the inference results to enable efficient similarity search.
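
To get a first look at the batched `.npy` output, the files can be listed and opened with NumPy. The sketch below is a minimal inspection aid, not part of Hyrax: the function names are hypothetical, and because the file names and array layout are not described here, it only reports each file's shape and dtype. It assumes NumPy is installed and that the arrays load with `np.load` defaults (object arrays would need `allow_pickle=True`).

```python
from pathlib import Path


def list_npy_files(result_dir: str) -> list[Path]:
    """Return the .npy files directly inside result_dir, sorted by name."""
    return sorted(Path(result_dir).glob("*.npy"))


def describe_npy_files(result_dir: str) -> None:
    """Print the shape and dtype of each .npy file (requires NumPy)."""
    import numpy as np  # imported lazily so that listing works without NumPy

    for path in list_npy_files(result_dir):
        array = np.load(path)
        print(f"{path.name}: shape={array.shape}, dtype={array.dtype}")
```

Calling `describe_npy_files` with the directory from the `Saving inference results at:` log line prints one summary line per file.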