Test updates: stabler dedup hash and CLI integration coverage #462
    test_file = tmp_path / "test_file.txt"
    test_file.write_text("test file")
    aws_handler.upload_file(str(test_file))
A bit out of scope, but a simple change: the tests now write via `tmp_path`, so they leave no trace (such as `test_file.txt`) in the repo root.
        pack(recipe=recipe_data, config_path=str(config_path))


    def test_pack_with_default_config(tmp_path, monkeypatch):
Similar to `tmp_path` in the AWS tests: we don't want test files written inside the repo, so the test changes the working directory to `tmp_path` with `monkeypatch.chdir()`.
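As a minimal sketch of that pattern (not the actual test in this PR), the snippet below shows how `monkeypatch.chdir()` keeps relative writes inside `tmp_path`. The helper `write_default_output` is hypothetical and only stands in for code that writes relative to the current working directory.

```python
from pathlib import Path


def write_default_output(name="result.txt"):
    # Hypothetical stand-in for code that writes relative to the working directory
    Path(name).write_text("done")
    return Path(name).resolve()


def test_output_stays_in_tmp_path(tmp_path, monkeypatch):
    # pytest restores the original working directory when the test finishes
    monkeypatch.chdir(tmp_path)
    written = write_default_output()
    # The file lands under tmp_path rather than in the repo root
    assert written.parent == tmp_path.resolve()
```

Because pytest undoes the `chdir` at teardown, later tests still start from the original working directory.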
Pull request overview
This PR strengthens recipe deduplication by normalizing recipe JSON prior to hashing (so semantically equivalent recipes hash the same), and adds/updates tests to cover both the hashing behavior and the CLI workflow.
Changes:
- Add `DataDoc._normalize_for_hashing()` and update `generate_hash()` to canonicalize order-insensitive lists while preserving positional numeric lists.
- Add integration tests for `cellpack/bin/pack.py` covering recipe path input, recipe dict input, and default config behavior.
- Refactor AWS handler tests to write temporary files under `tmp_path` instead of the working directory.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `cellpack/autopack/DBRecipeHandler.py` | Implements normalization logic to produce stable dedup hashes across semantically equivalent JSON. |
| `cellpack/tests/test_data_doc.py` | Adds test cases that assert hash stability across dict key order and order-insensitive list permutations. |
| `cellpack/tests/test_pack_cli.py` | Introduces CLI integration tests covering recipe-as-path, recipe-as-dict, and default config flows. |
| `cellpack/tests/test_aws_handler.py` | Updates tests to use `tmp_path` for safer cross-platform temp file handling. |
    def test_pack_with_recipe_path(tmp_path):
        recipe_path = tmp_path / "recipe.json"
        recipe_path.write_text(json.dumps(recipe_data))
        config_path = _write_config(tmp_path)
        pack(recipe=str(recipe_path), config_path=str(config_path))
`recipe_data` is a module-level dict that gets passed into `pack()` (directly or via `json.dumps`). `RecipeLoader` mutates the input dict when `recipe` is a dict: it reuses nested references and later replaces `objects[*]["representations"]`/`partners` with class instances. Test results can therefore depend on execution order (e.g., `json.dumps(recipe_data)` can fail after another test runs). Use a fresh recipe dict per test (e.g., a pytest fixture returning a new dict, or `copy.deepcopy(recipe_data)` at call sites) before writing/packing.
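A minimal sketch of the suggested fix, under stated assumptions: `RECIPE_TEMPLATE` is a hypothetical stand-in for the module-level `recipe_data`, and `mutating_consumer` only imitates a loader that replaces nested values in place. Neither is taken from the cellPACK code base.

```python
import copy
import json

# Hypothetical template; the real recipe_data lives in the test module
RECIPE_TEMPLATE = {
    "name": "example",
    "objects": {"sphere": {"representations": {"kind": "placeholder"}}},
}


def fresh_recipe():
    """Return an independent copy so one test cannot mutate another's input."""
    return copy.deepcopy(RECIPE_TEMPLATE)


def mutating_consumer(recipe):
    # Imitates a loader that swaps a nested dict for a non-serializable instance
    recipe["objects"]["sphere"]["representations"] = object()


recipe = fresh_recipe()
mutating_consumer(recipe)

# The template is untouched, so it still serializes for the next test
serialized = json.dumps(RECIPE_TEMPLATE)
```

The same helper could be wrapped in a pytest fixture so that every test receives its own copy without calling `copy.deepcopy` at each call site.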


Problem
follow-up to #419, closes #460
Solution
- Add `_normalize_for_hashing` to sort lists whose element order is not meaningful (e.g. `"interior": ["A", "B", "C"]`, `"interior": [{"object": "green_sphere", "count": 5}, "outer_sphere"]`), while leaving order-sensitive lists (vectors, colors, etc.) unchanged.
- Add `test_pack_cli.py` with CLI integration tests.
- Write AWS handler test files under `tmp_path` instead of the working dir.

One note: this change can alter hash values for recipes that contain lists, so re-running the same recipe after this merge may produce a different hash. This shouldn't cause issues, but it is worth noting so we can keep an eye on database tidiness.
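The normalization idea can be sketched as follows. This is an illustration under assumptions, not the code in `DBRecipeHandler.py`: the function names, the rule "a list of only numbers (or nested numeric lists) is positional", and the use of SHA-256 are all choices made for this example.

```python
import hashlib
import json


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_positional(value):
    # Assumption: lists holding only numbers or nested numeric lists keep their order
    return isinstance(value, list) and all(
        _is_number(v) or _is_positional(v) for v in value
    )


def normalize_for_hashing(value):
    if isinstance(value, dict):
        return {k: normalize_for_hashing(v) for k, v in value.items()}
    if isinstance(value, list):
        if _is_positional(value):
            return value  # vectors, colors, etc. stay as written
        items = [normalize_for_hashing(v) for v in value]
        # Sort by canonical JSON text so mixed str/dict elements are comparable
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    return value


def stable_hash(doc):
    canonical = json.dumps(normalize_for_hashing(doc), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this sketch, two recipes that differ only in dict key order or in the order of an `"interior"` list hash identically, while swapping components of a numeric vector changes the hash.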
Type of change