# Hyrax Getting Started

In this getting-started notebook we'll create an instance of a Hyrax object, train a built-in model on the CIFAR training dataset, and then use that trained model to run inference on the CIFAR test dataset.

## Create a Hyrax instance

In [1]:
import hyrax

h = hyrax.Hyrax()

## Update the configuration

In [2]:
h.config["model"]["name"] = "HyraxAutoencoder"

data_request = {
    "train": {
        "data": {
            "dataset_class": "HyraxCifarDataset",
            "data_location": "./data",
            "fields": ["image"],
            "primary_id_field": "object_id",
        },
    },
}
h.set_config("data_request", data_request)

For this demo, we make a few adjustments to the default configuration that the `hyrax` object was instantiated with.
Any configuration value can be modified through the `.config` attribute of the Hyrax instance, or through `set_config`, as done here for `data_request`.
There are many configuration values that can be set, but here we change only two: the model to train, and the data request that tells Hyrax which dataset to train on.
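The `data_request` is a nested dictionary: each top-level key names the stage that uses the data (`train` here, and `infer` later in this notebook), and each stage maps a dataset name (`data`, which appears in the logs as `Name: data`) to its settings. As a minimal sketch, the hypothetical helper `cifar_request` below builds the same structure for either stage; the helper and its parameters are illustrative only and are not part of Hyrax.

```python
def cifar_request(fields, use_training_data=None):
    """Hypothetical helper: one data-request entry for HyraxCifarDataset.

    The keys mirror the dictionaries written by hand in this notebook.
    """
    entry = {
        "dataset_class": "HyraxCifarDataset",
        "data_location": "./data",
        "fields": list(fields),
        "primary_id_field": "object_id",
    }
    if use_training_data is not None:
        # Only add dataset_config when a split is requested explicitly.
        entry["dataset_config"] = {"use_training_data": use_training_data}
    # "data" is the dataset name that appears in the logs as "Name: data".
    return {"data": entry}


# Same structure as the data_request defined in the cell above.
data_request = {"train": cifar_request(["image"])}

# A new dict with an extra "infer" entry; data_request itself is unchanged.
full_request = {
    **data_request,
    "infer": cifar_request(["image", "object_id"], use_training_data=False),
}
```

Building each entry from one function keeps the `dataset_class`, `data_location`, and `primary_id_field` values consistent across stages, so only the fields and split differ.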

## Train a model

In [3]:
h.train()

[2026-02-20 12:23:27,086 hyrax.models.model_registry:INFO] Setting model's self.optimizer from config: torch.optim.SGD with arguments: {'lr': 0.01, 'momentum': 0.9}.


[2026-02-20 12:23:27,087 hyrax.models.model_registry:INFO] Setting model's self.criterion from config: torch.nn.CrossEntropyLoss with default arguments.


[2026-02-20 12:23:27,087 hyrax.models.model_registry:INFO] Setting model's self.scheduler from config: torch.optim.lr_scheduler.ExponentialLR
with arguments: {'gamma': 1}.


[2026-02-20 12:23:27,088 hyrax.verbs.train:INFO] [1m[30m[42mTraining model:[0m HyraxAutoencoder


[2026-02-20 12:23:27,088 hyrax.verbs.train:INFO] [1m[30m[42mTraining dataset(s):[0m
{'train': Name: data (primary dataset)
  Dataset class: HyraxCifarDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image
}


2026-02-20 12:23:27,108 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': <hyrax.pytorch_ignite.SubsetSequentialSampler object at 0x137b9d7f0>, 'batch_size': 512, 'shuffle': False, 'collate_fn': <bound method DataProvider.collate of Name: data (primary dataset)
  Dataset class: HyraxCifarDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image
>, 'pin_memory': False}


2026-02-20 12:23:27,109 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': <hyrax.pytorch_ignite.SubsetSequentialSampler object at 0x137b4a710>, 'batch_size': 512, 'shuffle': False, 'collate_fn': <bound method DataProvider.collate of Name: data (primary dataset)
  Dataset class: HyraxCifarDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image
>, 'pin_memory': False}


2026/02/20 12:23:27 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.


2026/02/20 12:23:27 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.


  2%|1         | 1/59 [00:00<?, ?it/s]

[2026-02-20 12:24:35,961 hyrax.pytorch_ignite:INFO] Total training time: 68.73[s]


2026/02/20 12:24:35 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...


2026/02/20 12:24:35 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!


[2026-02-20 12:24:35,975 hyrax.verbs.train:INFO] Finished Training


HyraxAutoencoder(
  (encoder): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): GELU(approximate='none')
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): GELU(approximate='none')
    (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (5): GELU(approximate='none')
    (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): GELU(approximate='none')
    (8): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (9): GELU(approximate='none')
    (10): Flatten(start_dim=1, end_dim=-1)
    (11): Linear(in_features=1024, out_features=64, bias=True)
  )
  (dec_linear): Sequential(
    (0): Linear(in_features=64, out_features=1024, bias=True)
    (1): GELU(approximate='none')
  )
  (decoder): Sequential(
    (0): ConvTranspose2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (1): GELU(approximate='none')
    (2): 
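
The `in_features=1024` of the encoder's final `Linear` layer follows from CIFAR's 3 × 32 × 32 images. PyTorch documents the output size of `Conv2d` as floor((n + 2p - k) / s) + 1 and of `ConvTranspose2d` as (n - 1)s - 2p + k + output_padding (both with dilation 1). The short sketch below applies those formulas to the layers printed above; the layer list is transcribed by hand from the printout, not read from the model.

```python
def conv2d_out(n, kernel=3, stride=1, padding=1):
    """Spatial output size of torch.nn.Conv2d (dilation=1)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(n, kernel=3, stride=2, padding=1, output_padding=1):
    """Spatial output size of torch.nn.ConvTranspose2d (dilation=1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# (out_channels, stride) for the five Conv2d layers in the encoder printout.
encoder_convs = [(32, 2), (32, 1), (64, 2), (64, 1), (64, 2)]

size, channels = 32, 3  # CIFAR images are 3 x 32 x 32
for out_channels, stride in encoder_convs:
    size = conv2d_out(size, stride=stride)  # 32 -> 16 -> 16 -> 8 -> 8 -> 4
    channels = out_channels

flattened = channels * size * size  # what Flatten hands to the Linear layer
first_upsample = conv_transpose2d_out(size)  # first decoder ConvTranspose2d
```

Each stride-2 convolution halves the spatial size, so the encoder ends at 64 × 4 × 4 = 1024 values. `dec_linear` maps back to 1024 values, which is consistent with the decoder's first `ConvTranspose2d(64, ...)` receiving a 64 × 4 × 4 tensor and doubling it to 8 × 8.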

The output of training is stored in a time-stamped directory under `./results/`.
By default, a copy of the final configuration used for training is saved there as `runtime_config.toml`.
To train again with the same configuration, point Hyrax at this `runtime_config.toml` file.

If running in another notebook, instantiate a hyrax object like so:
```python
new_hyrax_instance = hyrax.Hyrax(config_file='./results/<timestamped_directory>/runtime_config.toml')
```

Or from the command line:
```bash
hyrax train --runtime-config ./results/<timestamped_directory>/runtime_config.toml
```
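Judging from the log lines in this notebook, run directories appear to be named `<YYYYMMDD>-<HHMMSS>-<verb>-<suffix>` (for example `20260220-122323-train-LV_w`). Assuming that pattern holds, the following sketch uses only the standard library to find the newest training run, whose `runtime_config.toml` can then be reused. `parse_run_name` and `latest_run_dir` are hypothetical helpers, not Hyrax functions.

```python
from datetime import datetime
from pathlib import Path


def parse_run_name(name):
    """Split '<YYYYMMDD>-<HHMMSS>-<verb>-<suffix>' into (timestamp, verb).

    Returns None for names that do not follow this (assumed) pattern.
    """
    parts = name.split("-", 3)  # maxsplit keeps any '-' inside the suffix
    if len(parts) != 4:
        return None
    try:
        stamp = datetime.strptime(f"{parts[0]}-{parts[1]}", "%Y%m%d-%H%M%S")
    except ValueError:
        return None
    return stamp, parts[2]


def latest_run_dir(results_dir="./results", verb="train"):
    """Return the most recent run directory for `verb`, or None if there is none."""
    runs = []
    for path in Path(results_dir).iterdir():
        parsed = parse_run_name(path.name)
        if path.is_dir() and parsed is not None and parsed[1] == verb:
            runs.append((parsed[0], path))
    # Tuples compare by timestamp first, so max() picks the newest run.
    return max(runs)[1] if runs else None
```

For instance, `latest_run_dir() / "runtime_config.toml"` gives a path that could be passed as `config_file=` when creating a new `Hyrax` instance. Parsing the timestamp from the name, rather than relying on file modification times, keeps the result stable if directories are copied or synced.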

Note that CIFAR is a relatively small dataset, but Hyrax has been demonstrated to scale to training sets with more than 1M samples.

## Run inference

In [4]:
data_request["infer"] = {
    "data": {
        "dataset_class": "HyraxCifarDataset",
        "data_location": "./data",
        "fields": ["image", "object_id"],
        "primary_id_field": "object_id",
        "dataset_config": {
            "use_training_data": False,
        },
    },
}
h.config["data_request"] = data_request
h.config["data_loader"]["batch_size"] = 128

h.infer()

[2026-02-20 12:24:41,388 hyrax.models.model_registry:INFO] Setting model's self.optimizer from config: torch.optim.SGD with arguments: {'lr': 0.01, 'momentum': 0.9}.


[2026-02-20 12:24:41,389 hyrax.models.model_registry:INFO] Setting model's self.criterion from config: torch.nn.CrossEntropyLoss with default arguments.


[2026-02-20 12:24:41,389 hyrax.models.model_registry:INFO] Setting model's self.scheduler from config: torch.optim.lr_scheduler.ExponentialLR
with arguments: {'gamma': 1}.


[2026-02-20 12:24:41,390 hyrax.verbs.infer:INFO] [1m[30m[42mInference model:[0m HyraxAutoencoder


[2026-02-20 12:24:41,390 hyrax.verbs.infer:INFO] [1m[30m[42mInference dataset(s):[0m
{'train': Name: data (primary dataset)
  Dataset class: HyraxCifarDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image
, 'infer': Name: data (primary dataset)
  Dataset class: HyraxCifarDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image, object_id
  Dataset config:
    use_training_data: False
}


2026-02-20 12:24:41,390 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': None, 'batch_size': 128, 'shuffle': False, 'collate_fn': <bound method DataProvider.collate of Name: data (primary dataset)
  Dataset class: HyraxCifarDataset
  Data location: ./data
  Primary ID field: object_id
  Requested fields: image, object_id
  Dataset config:
    use_training_data: False
>, 'pin_memory': False}


[2026-02-20 12:24:41,405 hyrax.models.model_utils:INFO] Updated config['infer']['model_weights_file'] to: /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122323-train-LV_w/example_model.pth


[2026-02-20 12:24:41,406 hyrax.verbs.infer:INFO] Saving inference results at: /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122435-infer-efEs


[90m[[0m2026-02-20T20:24:41Z [33mWARN [0m lance::dataset::write::insert[90m][0m No existing dataset at /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122435-infer-efEs/lance_db/results.lance, it will be created


  1%|1         | 1/79 [00:00<?, ?it/s]

[2026-02-20 12:24:44,628 hyrax.pytorch_ignite:INFO] Total evaluation time: 3.21[s]


[2026-02-20 12:24:44,629 hyrax.data_sets.result_dataset:INFO] Optimizing Lance table after 79 batches


[2026-02-20 12:24:44,646 hyrax.data_sets.result_dataset:INFO] Lance table optimization complete


[2026-02-20 12:24:44,647 hyrax.verbs.infer:INFO] Inference Complete.


<hyrax.data_sets.result_dataset.ResultDataset at 0x137a97620>

Once a model has been trained, its model weights file can be used to run inference.
By default, `infer` uses the latest available model weights file.
To use a specific model weights file instead, set `h.config['infer']['model_weights_file'] = <path_to_model_weights_file>`.

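Here we add an `infer` section to the data request that points at the CIFAR test data and requests both the `image` and `object_id` fields.
Setting `use_training_data: False` ensures that inference runs on the test split rather than the training split. We also lower the data loader batch size to 128.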
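The inference log reports 79 batches. The CIFAR test split contains 10,000 images, and with a batch size of 128 the final batch is only partially filled, so the batch count is a ceiling division. This quick check reproduces the number; it assumes the data loader keeps the partial last batch, which matches the 79 batches in the log.

```python
import math

n_test_images = 10_000  # size of the CIFAR test split
batch_size = 128        # h.config["data_loader"]["batch_size"] set in the cell above

n_batches = math.ceil(n_test_images / batch_size)
# Images left over for the final, partially filled batch.
last_batch = n_test_images - (n_batches - 1) * batch_size
```

Here `n_batches` is 79 and the last batch holds 16 images.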

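The inference results are saved in the directory reported by the `Saving inference results at:` log line, and `h.infer()` returns a `ResultDataset` object, as shown at the end of the cell output above.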