Bringing Unsupervised Models Back in Line With Main #647

Merged
aritraghsh09 merged 6 commits into main from unsupervised-model-updates
Jan 29, 2026

@aritraghsh09
Collaborator

A collection of changes to bring our unsupervised models in line with:

  • HyraxQL Changes
  • Config. name change conventions

@aritraghsh09 aritraghsh09 self-assigned this Jan 28, 2026
@codecov

codecov Bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.86%. Comparing base (a0eb385) to head (9e81d64).
⚠️ Report is 225 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/hyrax/models/hyrax_autoencoderv2.py | 25.00% | 3 Missing ⚠️ |
| src/hyrax/models/image_dcae.py | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #647      +/-   ##
==========================================
- Coverage   63.90%   63.86%   -0.05%     
==========================================
  Files          55       55              
  Lines        5550     5552       +2     
==========================================
- Hits         3547     3546       -1     
- Misses       2003     2006       +3     

Contributor

Copilot AI left a comment

Pull request overview

This PR aligns the ImageDCAE and HyraxAutoencoderV2 models with recent HyraxQL changes and config naming conventions. The changes standardize configuration access patterns to match the conventions used in other models like HyraxAutoencoder and SimCLR.

Changes:

  • Converted ImageDCAE config access from .get() with defaults to nested dictionary structure under [model.ImageDCAE]
  • Added corresponding [model.ImageDCAE] configuration section in default config file
  • Updated HyraxAutoencoderV2's to_tensor method to extract data from nested structure
  • Removed unused Tensor import and improved type hints in HyraxAutoencoderV2

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/hyrax/models/image_dcae.py | Updated config access to use model-specific nested section [model.ImageDCAE], removing fallback defaults |
| src/hyrax/models/hyrax_autoencoderv2.py | Modified to_tensor to handle nested data dict structure, removed unused import, improved type hints |
| src/hyrax/hyrax_default_config.toml | Added [model.ImageDCAE] configuration section with default values |

data_dict : dict
The dictionary returned from our data source
"""
data_dict = data_dict["data"]

Copilot AI Jan 28, 2026

The direct access data_dict = data_dict["data"] will raise a KeyError if the "data" key is missing, which is less user-friendly than the RuntimeError raised by other models. This is inconsistent with the pattern used in HyraxAutoencoder (which uses data_dict.get("data", {}) for safe access) and HyraxCNN (which checks for the key and raises a descriptive RuntimeError).

Consider using one of these safer patterns:

  1. Use .get() for safe access: data = data_dict.get("data", {}) (like HyraxAutoencoder)
  2. Check for key existence and raise a descriptive error: if "data" not in data_dict: raise RuntimeError("Unable to find 'data' key in data_dict") (like HyraxCNN)
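
A minimal sketch of pattern 2 follows, assuming the batch is a plain dictionary. The helper name extract_data and the object_id key are hypothetical and used only for illustration; this is not the code that was eventually merged.

```python
def extract_data(data_dict: dict) -> dict:
    """Return the nested "data" entry, or fail with a descriptive error."""
    # Check for the key explicitly so the caller sees a clear message
    # instead of a bare KeyError.
    if "data" not in data_dict:
        raise RuntimeError("Unable to find 'data' key in data_dict")
    return data_dict["data"]


# Hypothetical batch; "object_id" is an illustrative sibling key.
batch = {"data": {"image": [0.1, 0.2]}, "object_id": 7}
print(extract_data(batch))

try:
    extract_data({"object_id": 7})
except RuntimeError as err:
    print(f"RuntimeError: {err}")
```

Compared with pattern 1, this fails immediately when the key is absent rather than passing an empty dictionary further down the pipeline.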

Collaborator Author

@copilot Implement safer pattern 2 you suggested

Comment on lines +47 to +48
self.latent_dim = config["model"]["ImageDCAE"]["latent_dim"]
self.base_channel_size = config["model"]["ImageDCAE"]["base_channel_size"]

Copilot AI Jan 28, 2026

Changing from .get() with default values to direct dictionary access removes the fallback behavior. While this aligns with the convention used in other models (HyraxAutoencoder, HyraxAutoencoderV2, SimCLR), it's a breaking change for users who have existing configurations that don't include the [model.ImageDCAE] section.

Consider documenting this breaking change in the PR description or release notes, or adding a migration note for users upgrading from previous versions.
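
To make the breaking change concrete, here is a small sketch of what happens when a configuration lacks the [model.ImageDCAE] section, together with a check that a migration note could point users to. The helper missing_image_dcae_keys and the sample configuration are hypothetical; only the two key names are taken from the diff.

```python
# Key names taken from the two lines under review.
REQUIRED_KEYS = ("latent_dim", "base_channel_size")


def missing_image_dcae_keys(config: dict) -> list:
    """List the ImageDCAE keys that a configuration does not define."""
    section = config.get("model", {}).get("ImageDCAE", {})
    return [key for key in REQUIRED_KEYS if key not in section]


# Hypothetical older configuration with no [model.ImageDCAE] section.
old_style_config = {"model": {}}

# Direct access now raises KeyError instead of falling back to a default.
try:
    latent_dim = old_style_config["model"]["ImageDCAE"]["latent_dim"]
except KeyError as err:
    print(f"KeyError: {err}")

# A pre-flight check reports what needs to be added.
print(missing_image_dcae_keys(old_style_config))
```

Running a check like this before constructing the model would let users see every missing key at once, rather than discovering them one KeyError at a time.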

Collaborator Author

Will ignore this to have consistency across all the models.

@github-actions

github-actions Bot commented Jan 28, 2026

| Before [a0eb385] | After [76a599d] | Ratio | Benchmark (Parameter) |
| --- | --- | --- | --- |
| failed | failed | n/a | data_cache_benchmarks.DataCacheBenchmarks.time_preload_cache_cifar |
| failed | failed | n/a | data_cache_benchmarks.DataCacheBenchmarks.track_cache_cifar_hyrax_size_undercount |
| 35.5±0.2ms | 36.3±0.4ms | 1.02 | benchmarks.time_nb_obj_construct |
| 66.3±2ms | 67.0±0.7ms | 1.01 | benchmarks.time_import |
| 1000M | 1.01G | 1.01 | vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(2048, 'chromadb') |
| 9.93±0.01s | 9.99±0.1s | 1.01 | vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(16384, 'chromadb') |
| 535±10ms | 540±6ms | 1.01 | vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(256, 'chromadb') |
| 470±5ms | 476±0.3ms | 1.01 | vector_db_benchmarks.VectorDBSearchBenchmarks.time_search_by_vector_many_shards(64, 'qdrant') |
| 1.86±0.08s | 1.86±0.02s | 1.00 | benchmarks.time_lookup_help |
| 1.88±0.05s | 1.88±0.05s | 1.00 | benchmarks.time_rebuild_manifest_help |

Contributor

@gitosaurus gitosaurus left a comment

Looks OK to me, but I don't know much!

Contributor

Copilot AI commented Jan 29, 2026

@aritraghsh09 I've opened a new pull request, #650, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits January 29, 2026 18:35
Co-authored-by: aritraghsh09 <6251857+aritraghsh09@users.noreply.github.com>
Implement safer pattern for data key access in HyraxAutoencoderV2
@aritraghsh09 aritraghsh09 merged commit 1a2b1f7 into main Jan 29, 2026
6 of 9 checks passed
@aritraghsh09 aritraghsh09 deleted the unsupervised-model-updates branch January 29, 2026 19:31
4 participants