# Hyrax Getting Started

In this getting-started notebook, we'll create an instance of a Hyrax object, train a built-in model on the CIFAR training dataset, and then use that trained model to run inference on the CIFAR dataset.

## Create a Hyrax instance

In [1]:
import hyrax

h = hyrax.Hyrax()

[2025-03-26 14:46:50,459 hyrax:INFO] Runtime Config read from: /home/drew/code/fibad/src/hyrax/hyrax_default_config.toml


## Update the configuration

In [2]:
h.config["model"]["name"] = "HyraxAutoencoder"

For this demo, we adjust the default configuration that the Hyrax instance `h` was created with.
By accessing the `.config` attribute of the Hyrax instance, we can modify any configuration value.
Many configuration values can be set, but here we update only the name of the model to train.
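As a rough illustration of what such an update does, the sketch below treats the configuration as a nested mapping of tables and keys. The `example_config` dictionary, its placeholder model name, and the `set_config_value` helper are assumptions made for this example; they are not part of Hyrax, and the real default configuration contains many more tables and keys.

```python
# Hypothetical stand-in for a small slice of a nested configuration.
example_config = {
    "model": {"name": "SomeDefaultModel"},  # placeholder value, not a real default
    "data_loader": {"batch_size": 512},
}


def set_config_value(config, table, key, value):
    """Set config[table][key], raising KeyError if the table or key is unknown."""
    if key not in config[table]:
        raise KeyError(f"unknown key {key!r} in table {table!r}")
    config[table][key] = value


# Mirrors the update made in the cell above.
set_config_value(example_config, "model", "name", "HyraxAutoencoder")
print(example_config["model"]["name"])
```

Rejecting unknown keys is a design choice of this sketch: it turns a typo in a key name into an immediate error rather than a silently ignored setting.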

## Train a model

In [3]:
h.train()



Files already downloaded and verified


[2025-03-26 14:46:54,584 hyrax.models.model_registry:INFO] Using criterion: torch.nn.CrossEntropyLoss with default arguments.
2025-03-26 14:46:54,653 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Dataset HyraxCifarDa': 
	{'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x7f2095f60920>, 'batch_size': 512, 'pin_memory': True}
2025-03-26 14:46:54,655 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Dataset HyraxCifarDa': 
	{'sampler': <torch.utils.data.sampler.SubsetRandomSampler object at 0x7f2095f60980>, 'batch_size': 512, 'pin_memory': True}
2025/03/26 14:46:54 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.
[2025-03-26 14:46:54,902 hyrax.pytorch_ignite:INFO] Training model on device: cuda


  2%|1         | 1/59 [00:00<?, ?it/s]


[2025-03-26 14:48:23,891 hyrax.pytorch_ignite:INFO] Total training time: 88.99[s]
[2025-03-26 14:48:23,891 hyrax.pytorch_ignite:INFO] Latest checkpoint saved as: /home/drew/code/fibad/docs/pre_executed/results/20250326-144653-train-OgeH/checkpoint_epoch_10.pt
[2025-03-26 14:48:23,892 hyrax.pytorch_ignite:INFO] Best metric checkpoint saved as: /home/drew/code/fibad/docs/pre_executed/results/20250326-144653-train-OgeH/checkpoint_10_loss=-127.4221.pt
2025/03/26 14:48:23 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...
2025/03/26 14:48:23 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!
[2025-03-26 14:48:23,903 hyrax.train:INFO] Finished Training
[2025-03-26 14:48:24,250 hyrax.model_exporters:INFO] Exported model to ONNX format: /home/drew/code/fibad/docs/pre_executed/results/20250326-144653-train-OgeH/example_model_opset_20.onnx


The output of training is stored in a time-stamped directory under `./results/`.
By default, a copy of the final configuration used in training is persisted as `runtime_config.toml`.
To train again with the same configuration, reference this `runtime_config.toml` file.

If running in another notebook, instantiate a Hyrax object like so:
```
new_hyrax_instance = hyrax.Hyrax(config_file='./results/<timestamped_directory>/runtime_config.toml')
```

Or from the command line:
```
>> hyrax train --runtime-config ./results/<timestamped_directory>/runtime_config.toml
```
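To avoid copying the time-stamped directory name by hand, the most recent training run can be located programmatically. The sketch below is one possible approach, not a Hyrax feature: the `latest_train_config` helper is hypothetical, and it assumes run directories follow the `<YYYYMMDD-HHMMSS>-train-<suffix>` naming seen in the log output above, so that sorting by name also sorts by time.

```python
from pathlib import Path


def latest_train_config(results_dir):
    """Return the runtime_config.toml of the newest '*-train-*' run, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    # Keep only training-run directories that actually contain a saved config.
    run_dirs = sorted(
        d
        for d in root.glob("*-train-*")
        if d.is_dir() and (d / "runtime_config.toml").is_file()
    )
    # Names start with a timestamp, so the last entry is the most recent run.
    return run_dirs[-1] / "runtime_config.toml" if run_dirs else None
```

The returned path, if any, could then be passed as the `config_file` argument shown above.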

Note that here we train on only a small amount of CIFAR data, but Hyrax has been shown to scale to training sets with more than 1M samples.

## Run inference

In [4]:
h.config["data_set"]["test_size"] = 1.0
h.config["data_set"]["train_size"] = 0.0
h.config["data_set"]["validate_size"] = 0.0
h.config["data_loader"]["batch_size"] = 128

h.infer()

Files already downloaded and verified


[2025-03-26 14:48:25,186 hyrax.models.model_registry:INFO] Using criterion: torch.nn.CrossEntropyLoss with default arguments.
[2025-03-26 14:48:25,187 hyrax.infer:INFO] data set has length 50000
2025-03-26 14:48:25,188 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Dataset HyraxCifarDa': 
	{'sampler': None, 'batch_size': 128, 'pin_memory': True}
[2025-03-26 14:48:25,715 hyrax.pytorch_ignite:INFO] Evaluating model on device: cuda
[2025-03-26 14:48:25,717 hyrax.pytorch_ignite:INFO] Total epochs: 1
[2025-03-26 14:48:51,778 hyrax.pytorch_ignite:INFO] Total evaluation time: 26.06[s]
[2025-03-26 14:48:51,848 hyrax.infer:INFO] Inference results saved in: /home/drew/code/fibad/docs/pre_executed/results/20250326-144824-infer-qr2F


Once a model has been trained, we can use the model weights file to run inference.
By default, running `infer` looks for the latest available model weights file.
A specific model weights file can be specified with `h.config['infer']['model_weights_file'] = <path_to_model_weights_file>`.
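If a particular run's weights are wanted, the checkpoint path can be chosen with a few lines of Python. The sketch below is illustrative only: the `latest_epoch_checkpoint` helper is hypothetical, and it assumes checkpoints are named `checkpoint_epoch_<N>.pt`, as in the training log above. Other file names in the directory are ignored.

```python
import re
from pathlib import Path

# Naming pattern taken from the training log; other runs may differ.
_EPOCH_PATTERN = re.compile(r"checkpoint_epoch_(\d+)\.pt$")


def latest_epoch_checkpoint(run_dir):
    """Return the checkpoint_epoch_<N>.pt file with the largest N, or None."""
    best_path = None
    best_epoch = -1
    for path in Path(run_dir).iterdir():
        match = _EPOCH_PATTERN.match(path.name)
        # Compare epochs numerically so that epoch 10 ranks above epoch 9.
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = path
    return best_path
```

The result, converted with `str(...)`, is the kind of value that could be assigned to `h.config['infer']['model_weights_file']`.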

Here we use the most recently trained model weights file and update the data set splits so that 100% of the data is used for inference.

With the configuration updated, we can run inference by calling `h.infer()`.

The results of running inference are saved in the output directory noted in the last log line.
The default output format is batched `.npy` files.
Additionally, a ChromaDB vector database is populated with the inference results to enable efficient similarity search.
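A quick way to see what was written is to list the `.npy` files and inspect each array. The sketch below is a generic NumPy example, not Hyrax's own reader: the `summarize_npy_files` helper is hypothetical, and it assumes the files sit directly in the output directory and can be read with `numpy.load` without pickling. The file names and array layout Hyrax uses are not documented here, so the code only reports them.

```python
from pathlib import Path

import numpy as np


def summarize_npy_files(results_dir):
    """Return {file name: (dtype, shape)} for each .npy file in results_dir."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.npy")):
        array = np.load(path)  # allow_pickle defaults to False
        summary[path.name] = (array.dtype, array.shape)
    return summary
```

Calling `summarize_npy_files(...)` on the directory named in the last log line would show how many batch files exist and how each one is laid out before any further processing is written.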