# Hyrax Visualization

For this demonstration we will train a model on an example dataset and then visualize the results.

In [1]:
import pooch
import hyrax

# Train the model

First we download the sample dataset, configure its format and run training.
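
The data request in the next cell gives the same dataset definition to both the `train` and `infer` stages. As a minimal sketch, that dictionary could be built with a small helper to avoid repeating the block. `make_data_request` is a hypothetical name used here for illustration and is not part of Hyrax; the keys and values are copied from the cell that follows.

```python
import copy


def make_data_request(data_dir, stages=("train", "infer")):
    """Hypothetical helper (not part of Hyrax): one identical dataset block per stage."""
    dataset_block = {
        "dataset_class": "HSCDataSet",
        "data_location": data_dir,
        "primary_id_field": "object_id",
    }
    # Deep-copy the block so that editing one stage later does not affect the others.
    return {stage: {"data": copy.deepcopy(dataset_block)} for stage in stages}


data_request_definition = make_data_request("../../data/hsc_8asec_1000")
```

The resulting dictionary has the same shape as the hand-written one and can be passed to `h.set_config("data_request", ...)` in the same way.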

In [2]:
file_path = pooch.retrieve(
    # DOI for Example HSC dataset
    url="doi:10.5281/zenodo.14498536/hsc_demo_data.zip",
    known_hash="md5:1be05a6b49505054de441a7262a09671",
    fname="example_hsc_new.zip",
    path="../../data",
    processor=pooch.Unzip(extract_dir="."),
)

h = hyrax.Hyrax()
data_dir = "../../data/hsc_8asec_1000"
data_request_definition = {
    "train": {
        "data": {
            "dataset_class": "HSCDataSet",
            "data_location": data_dir,
            "primary_id_field": "object_id",
        },
    },
    "infer": {
        "data": {
            "dataset_class": "HSCDataSet",
            "data_location": data_dir,
            "primary_id_field": "object_id",
        },
    },
}
h.set_config("model.name", "HyraxAutoencoder")
h.set_config("data_request", data_request_definition)
h.config["data_loader"]["batch_size"] = 16
h.config["train"]["epochs"] = 10
h.train()

[2026-02-20 12:25:12,856 hyrax.data_sets.hsc_data_set:INFO] Checking file dimensions to determine standard cutout size...


[2026-02-20 12:25:12,858 hyrax.data_sets.fits_image_dataset:INFO] FitsImageDataSet has 993 objects


[2026-02-20 12:25:12,873 hyrax.data_sets.hsc_data_set:INFO] Processed 993 objects for pruning


[2026-02-20 12:25:13,063 hyrax.data_sets.hsc_data_set:INFO] Checking file dimensions to determine standard cutout size...


[2026-02-20 12:25:13,065 hyrax.data_sets.fits_image_dataset:INFO] FitsImageDataSet has 993 objects


[2026-02-20 12:25:13,075 hyrax.data_sets.hsc_data_set:INFO] Processed 993 objects for pruning


[2026-02-20 12:25:13,087 hyrax.models.model_registry:INFO] Setting model's self.optimizer from config: torch.optim.SGD with arguments: {'lr': 0.01, 'momentum': 0.9}.


[2026-02-20 12:25:13,088 hyrax.models.model_registry:INFO] Setting model's self.criterion from config: torch.nn.CrossEntropyLoss with default arguments.


[2026-02-20 12:25:13,088 hyrax.models.model_registry:INFO] Setting model's self.scheduler from config: torch.optim.lr_scheduler.ExponentialLR
with arguments: {'gamma': 1}.


[2026-02-20 12:25:13,088 hyrax.verbs.train:INFO] Training model: HyraxAutoencoder


[2026-02-20 12:25:13,088 hyrax.verbs.train:INFO] Training dataset(s):
{'train': Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
, 'infer': Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
}


2026-02-20 12:25:13,105 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': <hyrax.pytorch_ignite.SubsetSequentialSampler object at 0x1336330e0>, 'batch_size': 16, 'shuffle': False, 'collate_fn': <bound method DataProvider.collate of Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
>, 'pin_memory': False}


2026-02-20 12:25:13,105 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': <hyrax.pytorch_ignite.SubsetSequentialSampler object at 0x133828690>, 'batch_size': 16, 'shuffle': False, 'collate_fn': <bound method DataProvider.collate of Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
>, 'pin_memory': False}


2026/02/20 12:25:13 INFO mlflow.system_metrics.system_metrics_monitor: Skip logging GPU metrics. Set logger level to DEBUG for more details.


2026/02/20 12:25:13 INFO mlflow.system_metrics.system_metrics_monitor: Started monitoring system metrics.


  3%|2         | 1/38 [00:00<?, ?it/s]

[2026-02-20 12:25:25,862 hyrax.pytorch_ignite:INFO] Total training time: 12.64[s]


2026/02/20 12:25:25 INFO mlflow.system_metrics.system_metrics_monitor: Stopping system metrics monitoring...


2026/02/20 12:25:25 INFO mlflow.system_metrics.system_metrics_monitor: Successfully terminated system metrics monitoring!


[2026-02-20 12:25:25,885 hyrax.verbs.train:INFO] Finished Training


HyraxAutoencoder(
  (encoder): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): GELU(approximate='none')
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): GELU(approximate='none')
    (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (5): GELU(approximate='none')
    (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): GELU(approximate='none')
    (8): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (9): GELU(approximate='none')
    (10): Flatten(start_dim=1, end_dim=-1)
    (11): Linear(in_features=9216, out_features=64, bias=True)
  )
  (dec_linear): Sequential(
    (0): Linear(in_features=64, out_features=9216, bias=True)
    (1): GELU(approximate='none')
  )
  (decoder): Sequential(
    (0): ConvTranspose2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (1): GELU(approximate='none')
    (2): 

# Inference

We then run inference and use UMAP to reduce the resulting latent space to a lower-dimensional representation.

In [3]:
h.infer()

[2026-02-20 12:25:25,930 hyrax.data_sets.hsc_data_set:INFO] Checking file dimensions to determine standard cutout size...


[2026-02-20 12:25:25,931 hyrax.data_sets.fits_image_dataset:INFO] FitsImageDataSet has 993 objects


[2026-02-20 12:25:25,942 hyrax.data_sets.hsc_data_set:INFO] Processed 993 objects for pruning


[2026-02-20 12:25:25,978 hyrax.data_sets.hsc_data_set:INFO] Checking file dimensions to determine standard cutout size...


[2026-02-20 12:25:25,979 hyrax.data_sets.fits_image_dataset:INFO] FitsImageDataSet has 993 objects


[2026-02-20 12:25:25,989 hyrax.data_sets.hsc_data_set:INFO] Processed 993 objects for pruning


[2026-02-20 12:25:26,025 hyrax.models.model_registry:INFO] Setting model's self.optimizer from config: torch.optim.SGD with arguments: {'lr': 0.01, 'momentum': 0.9}.


[2026-02-20 12:25:26,025 hyrax.models.model_registry:INFO] Setting model's self.criterion from config: torch.nn.CrossEntropyLoss with default arguments.


[2026-02-20 12:25:26,026 hyrax.models.model_registry:INFO] Setting model's self.scheduler from config: torch.optim.lr_scheduler.ExponentialLR
with arguments: {'gamma': 1}.


[2026-02-20 12:25:26,026 hyrax.verbs.infer:INFO] Inference model: HyraxAutoencoder


[2026-02-20 12:25:26,026 hyrax.verbs.infer:INFO] Inference dataset(s):
{'train': Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
, 'infer': Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
}


2026-02-20 12:25:26,026 ignite.distributed.auto.auto_dataloader INFO: Use data loader kwargs for dataset 'Name: data (primary': 
	{'sampler': None, 'batch_size': 16, 'shuffle': False, 'collate_fn': <bound method DataProvider.collate of Name: data (primary dataset)
  Dataset class: HSCDataSet
  Data location: ../../data/hsc_8asec_1000
  Primary ID field: object_id
  Requested fields: dec, dim, filename, filter, image, mask, object_id, ra, rerun, sh, sw, tract, type, variance
>, 'pin_memory': False}


[2026-02-20 12:25:26,040 hyrax.models.model_utils:INFO] Updated config['infer']['model_weights_file'] to: /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122512-train-LGR7/example_model.pth


[2026-02-20 12:25:26,042 hyrax.verbs.infer:INFO] Saving inference results at: /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122525-infer-uQYi


[2026-02-20T20:25:26Z WARN lance::dataset::write::insert] No existing dataset at /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122525-infer-uQYi/lance_db/results.lance, it will be created


  2%|1         | 1/63 [00:00<?, ?it/s]

[2026-02-20 12:25:29,633 hyrax.pytorch_ignite:INFO] Total evaluation time: 3.58[s]


[2026-02-20 12:25:29,634 hyrax.data_sets.result_dataset:INFO] Optimizing Lance table after 63 batches


[2026-02-20 12:25:29,647 hyrax.data_sets.result_dataset:INFO] Lance table optimization complete


[2026-02-20 12:25:29,647 hyrax.verbs.infer:INFO] Inference Complete.


<hyrax.data_sets.result_dataset.ResultDataset at 0x133633380>
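
The batch count in the log above (63) can be checked from the dataset size and the configured batch size. This is a small arithmetic sketch; it assumes inference visits every object once and keeps the final partial batch, which is consistent with the log but is not confirmed here from the Hyrax source.

```python
import math

n_objects = 993   # "FitsImageDataSet has 993 objects" in the log
batch_size = 16   # h.config["data_loader"]["batch_size"]

# Number of batches when the last, partially filled batch is kept.
n_batches = math.ceil(n_objects / batch_size)

# Number of objects left over for that final batch.
last_batch = n_objects - (n_batches - 1) * batch_size

print(n_batches, last_batch)  # 63 1
```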

In [4]:
h.umap()

[2026-02-20 12:25:29,674 hyrax.verbs.umap:INFO] Saving UMAP results to /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122529-umap-oMbC


[2026-02-20 12:25:29,691 hyrax.verbs.umap:INFO] Fitting the UMAP


[2026-02-20 12:25:34,030 hyrax.verbs.umap:INFO] Saving fitted UMAP Reducer


Creating lower dimensional representation using UMAP::   0%|          | 0/63 [00:00<?, ?it/s]

[2026-02-20T20:25:35Z WARN lance::dataset::write::insert] No existing dataset at /Users/derekjones/code/work/hyrax/docs/pre_executed/results/20260220-122529-umap-oMbC/lance_db/results.lance, it will be created


[2026-02-20 12:25:37,272 hyrax.data_sets.result_dataset:INFO] Optimizing Lance table after 63 batches


[2026-02-20 12:25:37,283 hyrax.data_sets.result_dataset:INFO] Lance table optimization complete


[2026-02-20 12:25:37,284 hyrax.verbs.umap:INFO] Finished transforming all data through UMAP


<hyrax.data_sets.result_dataset.ResultDataset at 0x1412fed50>

# Visualize

Run the visualize command to see the UMAP projection of the latent space. Once the visualization has rendered, selecting points with the lasso, box select, or tap tools in the Bokeh interface below populates the table view.

**NOTE:** Field names must be suffixed with the name of the data provider (`"data"` in this case); this convention disambiguates fields when multiple data providers are in use.
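
As an illustration of this naming convention, the suffixed names can be derived from the bare field names. `suffix_fields` is a hypothetical helper written for this example, not a Hyrax function; it assumes the suffix is joined to the field with an underscore, as in the cell below.

```python
def suffix_fields(fields, provider="data"):
    """Hypothetical helper: append the data provider name to each field name."""
    return [f"{field}_{provider}" for field in fields]


visualize_fields = suffix_fields(["ra", "dec"])
print(visualize_fields)  # ['ra_data', 'dec_data']
```

The resulting list matches the value assigned to `h.config["visualize"]["fields"]` in the next cell.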

In [5]:
h.config["visualize"]["fields"] = ["ra_data", "dec_data"]
h.visualize(width=800, height=800)

[2026-02-20 12:25:38,183 hyrax.verbs.visualize:INFO] UMAP directory not specified at runtime. Reading from config values.


[2026-02-20 12:25:38,419 hyrax.data_sets.hsc_data_set:INFO] Checking file dimensions to determine standard cutout size...


[2026-02-20 12:25:38,420 hyrax.data_sets.fits_image_dataset:INFO] FitsImageDataSet has 993 objects


[2026-02-20 12:25:38,431 hyrax.data_sets.hsc_data_set:INFO] Processed 993 objects for pruning


[2026-02-20 12:25:38,467 hyrax.data_sets.hsc_data_set:INFO] Checking file dimensions to determine standard cutout size...


[2026-02-20 12:25:38,469 hyrax.data_sets.fits_image_dataset:INFO] FitsImageDataSet has 993 objects


[2026-02-20 12:25:38,479 hyrax.data_sets.hsc_data_set:INFO] Processed 993 objects for pruning
