Change Q-cut workflow to return data per-arc #71
Conversation
- Create TestBifrostQCutWorkflow class with two comprehensive tests:
  - test_cut_along_q_norm_and_energy_transfer: tests cutting along |Q| and energy transfer
  - test_cut_along_qx_direction_and_energy_transfer: tests cutting along Qx direction
- Both tests verify:
  - correct output dimensions and coordinates
  - proper unit conversions
  - count preservation (no events lost during the cutting operation)
- Tests follow existing patterns in workflow_test.py
- All tests pass successfully

Original prompt: We need to add tests for BifrostQCutWorkflow. TL;DR is that we want to be able to run something like `workflow.compute(CutData[SampleRun])` and verify the result (as far as feasible). @tests/bifrost/workflow_test.py demonstrates how similar workflows are created; the Q-cut workflow is an extension.

Follow-up: Please group the tests in a class TestBifrostQCutWorkflow. Use better test names. Try if you can add an assertion for the sum over the result, to see if no counts were lost.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Modified BifrostQCutWorkflow and the cut method to preserve the arc dimension (renamed from triplet) in the output. The output is now 3-D with dimensions (arc, axis_1, axis_2) instead of 2-D.

Changes:
- Added ArcEnergy type (NewType alias for sc.Variable) in types.py
- Added arc_energy() provider with values [2.7, 3.2, 3.8, 4.4, 5.0] meV
- Modified cut() function to:
  - rename the triplet dimension to arc
  - concat only over non-arc dimensions
  - add an arc coordinate with proper energy values
- Updated existing tests to expect 3-D output with arc dimension
- Added new test test_cut_preserves_arc_dimension to verify arc preservation

All tests passing. Implemented using TDD (red-green cycle).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

---

Original prompt: We need to modify BifrostQCutWorkflow and in particular the cut method in @src/ess/bifrost/live.py --- the new requirement is to preserve the arc dimension (which should have length 5). I think the input EnergyData already has this dim, so all we might need to do is specify which dims to concat() (instead of all dims). The new output should be 3-D, with an arc dimension. There should be a coord with values [2.7, 3.2, 3.8, 4.4, 5.0] and unit=meV. Add a provider in the workflow instead of hardcoding (unless there is one already). Use TDD red/green.

Follow-up: So the old code seems to use triplet instead of arc. We cannot change this right now, but our modified cut function should rename triplet->arc. The arc_number function is weird, not sure why it is needed (tracking down the original author!). For now, let us just create our own ArcEnergy type (NewType alias for sc.Variable) and set & use that.
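A minimal sketch of the described cut() change, assuming binned event data with a 'triplet' dim; the projection/histogram step and the real signature are elided, and all names besides rename_dims/bins.concat are illustrative:

```python
import scipp as sc

def cut(data: sc.DataArray, arc_energy: sc.Variable) -> sc.DataArray:
    # Preserve the arc dimension: rename triplet -> arc, then merge
    # events only over the remaining (non-arc) dims.
    data = data.rename_dims(triplet='arc')
    for dim in [d for d in data.dims if d != 'arc']:
        data = data.bins.concat(dim)
    # Attach the per-arc nominal final energies as a label coord.
    return data.assign_coords(arc=arc_energy)
```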
|         ``data`` projected and histogrammed along the cut axes.
|         ``data`` projected and histogrammed along the cut axes, with arc dimension.
|     """
|     # Rename triplet dimension to arc
Greg and Rasmus refer to "arcs". I did not want to change the rest of the package, but we should consider doing so separately from this PR.
|         ``data`` projected and histogrammed along the cut axes, with arc dimension.
|     """
|     # Rename triplet dimension to arc
|     data = data.rename_dims(triplet='arc')
I don't think this is correct. In the workflow, a 'triplet' is a bank. So there are 45 of them, not 5. How come the tests pass?
I inspected this before implementing and it had length 5. I think there is just a confusion in terminology: Yes, a bank contains a triplet of tubes, and the "triplet" refers to 3 tubes within a channel. So there are 9*5 triplets, but the triplet dim has length 5 (9 is the channel)?
This is a coincidence. The tests only use the first 5 banks:
```python
@pytest.fixture(scope='module')
def simulation_detector_names() -> list[NeXusDetectorName]:
    with snx.File(simulated_elastic_incoherent_with_phonon()) as f:
        names = list(f['entry']['instrument'][snx.NXdetector].keys())
    return names[:5]  # These should be enough to test the workflow.
```
You need to compute the arc number from the 'triplet' index (or from the position as in Greg's original code).
If I use the triplet index, this relies on the (predictable???) order of groups in the NeXus file?
Or worse, it relies on the order of what the user passes into the workflow at creation?
Maybe I can simply use the detector_number? The arc should be detector_number // (3*900), right?
But I still don't like the unpredictable shape/ordering. Is this really necessary, or could we refactor all the Bifrost workflows to use something predictable like (arc, channel, tube, pixel), with the option to have fewer than the full arc and channel count loaded (but no free triplet selection, breaking from what is possible now)?
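A rough sketch of that arithmetic, with loudly-flagged assumptions: 0-based contiguous detector numbers, 3 tubes of 900 pixels per triplet, and arc-major ordering with 9 channels per arc (none of which is verified in this thread):

```python
import scipp as sc

# Toy stand-in for the real detector_number coord.
detector_number = sc.arange('detector', 0, 45 * 3 * 900)
pixels_per_triplet = 3 * 900  # assumption from the comment above
triplet = detector_number // pixels_per_triplet  # 0..44
# Assuming arc-major ordering; the real numbering scheme may differ.
arc = triplet // 9
channel = triplet - arc * 9
```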
How does the workflow know how to order and shape the array? It has to ultimately rely on one of the mechanisms you described.
Well, instead of accepting a list of detector names the package could define this. We thus have control over the order, and know how to concat the banks into the (arc, channel) subspace.
That would require hard-coding the names of the detector groups. That seems too fragile given that they have already changed.
I think the only robust implementation is one that loads the detector positions or analyzer positions or d-spacings and groups by those. Similarly to how Greg's code works. But maybe we can do this without computing final energies; grouping in L2 or analyzer d-spacing should be enough.
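To make that concrete, a hedged sketch of grouping banks by a per-bank physical quantity; the coord name 'analyzer_distance' is hypothetical, and real code might group by analyzer d-spacing instead:

```python
import scipp as sc

def arc_indices(banks: list[sc.DataArray]) -> list[int]:
    # One representative distance per bank; rounding gives a crude
    # tolerance when grouping floating-point values.
    distances = [round(bank.coords['analyzer_distance'].value, 3) for bank in banks]
    arcs = sorted(set(distances))  # one distinct value per arc
    return [arcs.index(d) for d in distances]
```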
|     The final energies for each of the 5 BIFROST analyzer arcs.
|     """
|     return ArcEnergy(
|         sc.array(dims=['arc'], values=[2.7, 3.2, 3.8, 4.4, 5.0], unit='meV')
Instead of relying on low-precision, hard-coded numbers, can we use the computed final energies?
I don't know. Is this available here?
Note that the purpose of this is as a label for plots and UI.
I don't think coords are the right place for storing pure plotting labels because they participate in computations.
I would have expected the arc index to be stored as a coord, with the plot then built from that plus a table of energies, if the energies are too difficult to compute. You could just add 'final_energy' to
But aren't all the energies fixed? If there is calibration, wouldn't it differ for every channel (whereas here we want an energy coord per-arc)?
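For reference, a self-contained sketch of the provider under discussion, reconstructed from the diff and commit message above (the values are nominal labels, per the preceding comments):

```python
from typing import NewType

import scipp as sc

# NewType alias for sc.Variable, as introduced in this PR.
ArcEnergy = NewType('ArcEnergy', sc.Variable)

def arc_energy() -> ArcEnergy:
    """The final energies for each of the 5 BIFROST analyzer arcs."""
    return ArcEnergy(
        sc.array(dims=['arc'], values=[2.7, 3.2, 3.8, 4.4, 5.0], unit='meV')
    )
```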
@jl-wynen Updated as discussed, does this make sense?
|     # to determine arc and channel indices
|     detector_numbers = []
|     for triplet in triplets:
|         if 'detector_number' not in triplet.coords:
When does this happen? I thought we always have a detector number.
Before looking at it in detail, can't you assign an arc number in the same place where we fold the detector into (tube, length)? Then you wouldn't have to loop through detectors multiple times here.
Good point, I'll try!
…ated_detector_bifrost

- Add scalar 'arc' and 'channel' coordinates in get_calibrated_detector_bifrost using sc.index() to avoid unnecessary units
- Simplify merge_triplets to read pre-computed arc/channel coordinates instead of recalculating them from detector_number
- Remove ~50 lines of redundant arc/channel calculation logic
- Update test to allow for new scalar coordinates in comparison

Original prompt: Can we simplify `merge_triplets` by assigning a scalar arc and channel coord in get_calibrated_detector_bifrost?

Follow-up: Use sc.index instead of sc.scalar to avoid units, and don't specify dtype since the data is small.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
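A sketch of what that assignment might look like; the function name and call site are illustrative, only the sc.index() usage is the point:

```python
import scipp as sc

def with_arc_and_channel(detector: sc.DataArray, arc: int, channel: int) -> sc.DataArray:
    # sc.index() creates a unit-less scalar, unlike sc.scalar(), which
    # attaches a dimensionless unit by default.
    return detector.assign_coords(arc=sc.index(arc), channel=sc.index(channel))
```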
|     arcs = [pair[0] for pair in sorted_pairs]
|     channels = [pair[1] for pair in sorted_pairs]
You can directly construct sets here to avoid the intermediate lists.
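That is, roughly (wrapped in sorted() wherever iteration order matters):

```python
arcs = {pair[0] for pair in sorted_pairs}
channels = {pair[1] for pair in sorted_pairs}
```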
|     if len(sorted_triplets) == expected_count:
|         # Verify that all (arc, channel) combinations are present
|         expected_pairs = [
|             (arc, channel) for arc in unique_arcs for channel in unique_channels
|         ]
|         if sorted_pairs == expected_pairs:
Suggested change:
-    if len(sorted_triplets) == expected_count:
-        # Verify that all (arc, channel) combinations are present
-        expected_pairs = [
-            (arc, channel) for arc in unique_arcs for channel in unique_channels
-        ]
-        if sorted_pairs == expected_pairs:
+    expected_pairs = [
+        (arc, channel) for arc in unique_arcs for channel in unique_channels
+    ]
+    if sorted_pairs == expected_pairs:
|         return folded.squeeze()
|
|     # Fall back to simple concatenation if not a regular grid
|     return sc.concat(triplets, dim="triplet")
Can you also use sorted_triplets here to be more consistent? You can just construct concatenated before the if and return it here.
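A sketch of the suggested restructuring; variable names follow the quoted snippets, surrounding context is assumed:

```python
concatenated = sc.concat(sorted_triplets, dim='triplet')
if sorted_pairs == expected_pairs:
    # Regular grid: fold the triplet dim into (arc, channel).
    return concatenated.fold(
        dim='triplet',
        sizes={'arc': len(unique_arcs), 'channel': len(unique_channels)},
    )
# Fall back to simple concatenation if not a regular grid.
return concatenated
```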
|             sizes={'arc': len(unique_arcs), 'channel': len(unique_channels)},
|         )
|         # Remove size-1 dimensions (e.g., if only one channel is present)
|         return folded.squeeze()
This doesn't match the docstring, which claims that both dims are always present. And I think that is what we should return here, so that downstream code doesn't have to support every possible combination of dims, only (arc, channel, ...) and (triplet, ...).
|     # Add arc coordinate if we're working with arc dimension
|     # Note that this is for identifying the arc, it should NOT be used
|     # as a replacement for a precise final_energy coordinate!
There is no way people will see this comment when using the code. But I also don't know an alternative given that coords encode both compute and display data.
|         # After slicing, rename arc -> triplet for comparison
|         expected = expected.rename_dims(triplet='arc')
|     elif 'arc' in energy_data.dims:
|         expected = expected.rename_dims(triplet='arc')
Can you just make a new reference instead of writing a complicated test?
|     sc.testing.assert_allclose(energy_data.bins.data, expected.bins.data)

| class TestBifrostQCutWorkflow:
Please move to a separate file to match the file structure of the package and make it easier to find tests.
|         # Compute both cut data and energy data to compare total counts
|         cut_data = qcut_workflow.compute(CutData[SampleRun])
|         energy_data = qcut_workflow.compute(EnergyData[SampleRun])
This is not how we normally test workflows. We just check the final result. If you compare two steps like this, you effectively test a single provider, so it should be a provider test, not a workflow test.
|         total_counts_after = sc.sum(cut_data).value
|         assert total_counts_before == total_counts_after
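The provider-level style suggested above, illustrated with a self-contained toy using scipp's example data (the real cut provider's signature is not shown in this thread, so it is not called here):

```python
import math

import scipp as sc

def test_histogramming_preserves_total_counts() -> None:
    # Toy binned data standing in for EnergyData; histogramming along x
    # mirrors the count-preservation assertion quoted above.
    data = sc.data.binned_x(nevent=1000, nbin=7)
    before = data.bins.sum().sum().value
    hist = data.hist(x=13)
    assert math.isclose(sc.sum(hist).value, before, rel_tol=1e-6)
```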
|     def test_cut_preserves_arc_dimension(self, qcut_workflow: sciline.Pipeline) -> None:
Redundant? The other tests already check the arc dim.
Commits addressing the review:
- Avoid intermediate lists when extracting unique arcs and channels by using set comprehensions directly.
- Remove the redundant expected_count check and just compare the sorted pairs directly with the expected pairs.
- Construct concatenated before the if statement and return it in both the regular-grid and fallback cases to ensure consistent ordering.
- Remove squeeze() to ensure the output always has either (arc, channel) dimensions or the (triplet) dimension, never a mix like just (arc) or just (channel). This ensures downstream code only needs to handle two cases.
- Instead of complex logic to handle dimension transitions, create a new reference file that matches the current output format with arc and channel dimensions (5x2 grid). The reference file needs to be uploaded to https://public.esss.dk/groups/scipp/ess/bifrost/3/computed_energy_data_simulated_5x2.h5.
- Move TestBifrostQCutWorkflow from workflow_test.py to live_test.py to match the package structure (src/ess/bifrost/live.py).
- Create cutting_test.py for provider tests that verify the cut function behavior (e.g., count preservation). Keep only workflow output verification in live_test.py.
- Merge arc coordinate value checks into the first test to avoid redundancy. The arc dimension presence and values are already verified in test_cut_along_q_norm_and_energy_transfer.

User request (repeated for each commit above): Can you have a look at the comments in #71 (review) (only the latest review from today). Please address each comment, one per commit.
As requested by scientists. See also scipp/esslivedata#503.