Support 2d sources and sources away from the origin by nvaytet · Pull Request #106 · scipp/tof

nvaytet · 2025-11-18T15:10:08Z

In this PR, we add support for sources that have a 2d probability distribution.
Previously, a source was defined by a 1d time distribution and a 1d wavelength distribution.
This meant that intrinsically the distribution was square in the (time, wavelength) parameter space.

We now generalize this to a 2d p distribution that can represent sources such as the one for Odin, taken some meters away from the moderator:

This last parameter of the source (2.5m away from the origin) meant that we also needed to modify slightly how the model works to handle sources that are not located at 0m.

A library of sources was created and the data is saved in https://github.com/scipp/tof-sources.

Note that the old 1D p_time and p_wav still work, a 2d broadcast of the profiles is performed internally.
We also removed a behind-the-scenes linear interpolation between data points given by the source profiles, which was getting expensive when working in 2d. We now require users to supply a profile with enough resolution to satisfy their needs (note changes in docs where some linspace was added instead of using only 3 data points in the profiles).

nvaytet · 2025-11-18T15:14:55Z

We keep the pulse_profile.py around for backward compatibility (the ess_pulse submoduleneeds it and this is needed in some places inessreduce`). But in principle, we don't need it anymore.

nvaytet · 2025-11-18T15:16:07Z

-        mask = ~one_mask(self.data.masks)
-        mask.unit = ""
-        return (self.data.coords[self.dim] * mask).min()
+        da = self.data.copy(deep=False)


Fixed a bug here: the min value was always returning 0 because something multiplied by a False would give 0 and that is the min. Instead, the masked values need to be ignored.

nvaytet · 2025-11-18T17:14:43Z

    if wmax is None:
-        wmax = p_wav.coords[w_dim].max()
-
-    time_interpolator = interp1d(p_time, dim=t_dim, fill_value="extrapolate")


Removed the internal linear interpolation and the somewhat unusual sampling arg.
We now require users to provide a distribution with enough data points.

SimonHeybrock · 2025-11-19T05:04:22Z

Reviewed with Claude:

Review of PR #106: Support 2D Sources and Sources Away from Origin

Reviewer: Claude
Date: 2025-11-19
Branch: 2d-sources
Test Results: 95/97 tests passing (2 failures unrelated - missing ipywidgets)

Summary

This PR adds important scientific capability: 2D probability distributions for neutron sources and support for sources positioned away from the origin. The implementation is well-designed with solid mathematics and good test coverage for the main use cases.

Key Changes:

2D distributions in (time, wavelength) space
Sources can be positioned at non-zero distances
Removes scipy dependency, adds pooch for remote data fetching
Removes automatic interpolation (breaking change, well-documented)

Overall Assessment: The core feature is sound and ready, but there are several edge cases that should be addressed before merging.

Issues to Address

High Priority

1. Division by Zero with Single-Point Distributions

Location: src/tof/source.py:99-100

dt = 0.5 * (tmax - tmin).value / (p.sizes[t_dim] - 1)
dw = 0.5 * (wmax - wmin).value / (p.sizes[w_dim] - 1)

If a distribution has only 1 point in a dimension, this divides by zero, producing NaN. The rejection sampling loop then runs indefinitely since rng.normal(scale=nan) produces invalid values.

Test case:

p_time = sc.DataArray(
    data=sc.array(dims=['birth_time'], values=[1.0]),
    coords={'birth_time': sc.array(dims=['birth_time'], values=[1000.0], unit='us')}
)
# Result: dt = nan, infinite loop

Suggested fix:

if p.sizes[t_dim] < 2 or p.sizes[w_dim] < 2:
    raise ValueError(
        f"Distribution must have at least 2 points in each dimension. "
        f"Got {p.sizes[t_dim]} in {t_dim}, {p.sizes[w_dim]} in {w_dim}"
    )

2. IndexError with Empty Components List

Location: src/tof/model.py:276

components = sorted(...)

if components[0].distance < self.source.distance:  # IndexError if empty
    raise ValueError(...)

Creating a model with no choppers or detectors causes an IndexError.

Test case:

model = tof.Model(source=source, choppers=[], detectors=[])
result = model.run()  # IndexError: list index out of range

Suggested fix:

if components and components[0].distance < self.source.distance:
    raise ValueError(...)

Or validate in __init__:

if not self.choppers and not self.detectors:
    raise ValueError("Model must have at least one chopper or detector")

3. Uninitialized Distance Attribute

Location: src/tof/source.py:__init__

The __init__ method only sets self._distance when facility is not None. If someone tries to create Source(facility=None, ...) directly (bypassing the class methods), accessing source.distance raises AttributeError.

Suggested fix:
Add at end of __init__:

if self._facility is None:
    # Should only be created via class methods
    if not hasattr(self, '_distance'):
        raise ValueError(
            "Cannot create Source with facility=None directly. "
            "Use Source.from_neutrons() or Source.from_distribution()"
        )

Medium Priority

4. Zero or Negative Wavelengths

Location: src/tof/source.py (in from_neutrons and via distributions)

No validation prevents wavelength ≤ 0, which produces infinite speed via the physics formula v = h/(m·λ). This doesn't crash but gives physically incorrect results.

Test case:

wavelengths = sc.array(dims=['event'], values=[0.0, 5.0], unit='angstrom')
source = Source.from_neutrons(birth_times=times, wavelengths=wavelengths)
# Result: speed = [inf, 791.207] m/s

Suggested fix: Add validation in both from_neutrons and after sampling in _make_pulses:

if sc.any(wavelengths <= sc.scalar(0.0, unit=wavelengths.unit)):
    raise ValueError("All wavelengths must be positive")

5. Zero Distribution Values

Location: src/tof/source.py:112

If a user provides an all-zero distribution (or negative values), the normalization p_flat /= p_flat.data.sum() fails with a cryptic numpy error.

Current error: ValueError: Probabilities contain NaN

Suggested fix: Add helpful error before normalization:

prob_sum = p_flat.data.sum()
if prob_sum <= 0:
    raise ValueError(
        "Distribution must have at least one positive probability value. "
        f"Sum of probabilities is {prob_sum.value}"
    )

6. Error Message Clarity

Location: src/tof/model.py:277-279

The error message says "At least one component is located before the source" but the check only looks at components[0] (the closest component). More specific wording would help:

if components and components[0].distance < self.source.distance:
    raise ValueError(
        f"The first component ({components[0].name} at {components[0].distance}) "
        f"is located before the source (at {self.source.distance})"
    )

Low Priority / Polish

7. Network Dependency

Location: src/tof/facilities/__init__.py:12-17

The new pooch-based data fetching introduces a network dependency:

No explicit timeout (could hang on slow connections)
No offline fallback
Could fail behind corporate firewalls

Suggestions:

Document the network requirement in the docstring
Consider adding a timeout parameter if pooch supports it
Provide clearer error messages when downloads fail

8. Better Error for Unknown Source

Location: src/tof/facilities/__init__.py:20-21

def get_source_path(name: str) -> str:
    return _source_registry.fetch(_source_library[name]["path"])  # KeyError

Suggested improvement:

def get_source_path(name: str) -> str:
    if name not in _source_library:
        available = ', '.join(_source_library.keys())
        raise ValueError(
            f"Unknown source '{name}'. Available sources: {available}"
        )
    return _source_registry.fetch(_source_library[name]["path"])

9. Fully Masked Data in Reading

Location: src/tof/reading.py:40-56

If all neutrons are blocked by choppers (fully masked data), the min() and max() methods may give undefined results. This affects the __repr__ output.

Suggested fix:

def min(self):
    if self.data.sum().value == 0:
        return sc.scalar(float('nan'), unit=self.data.coords[self.dim].unit)
    da = self.data.copy(deep=False)
    da.data = da.coords[self.dim]
    return da.min().data

Verified Correct ✓

I did deep mathematical verification on the core algorithms:

Distance calculations: Manually verified with source at 2.5m, detector at 10m. Time-of-arrival matches hand calculation to 0.01 µs precision.
2D distribution normalization: The broadcast operation p = (p_wav / sum) * (p_time / sum) correctly produces an outer product that sums to 1.0.
Flattening and coordinate handling: Verified that flattened 2D arrays maintain proper coordinate correspondence.
Sampling mathematics: The rejection sampling approach with added noise is sound. The algorithm expects uniform point spacing (as produced by linspace), which is the standard practice in all examples and facility sources.

The physics is correct, the math is solid, and the implementation is clean.

Code Quality Observations

Strengths:

Clean abstraction for handling both 1D and 2D distributions
Backward compatible (old API still works)
Well-documented breaking changes with migration examples
Good test coverage for main use cases

Suggestions:

Add validation earlier in the pipeline (distribution points, wavelength values)
More helpful error messages for edge cases

Test Coverage Gaps

Missing tests for edge cases:

Single-point distributions
Empty component lists
Zero/negative wavelengths
All-zero distributions
Fully masked data (all neutrons blocked)

Well covered:

2D distributions (parametrized test)
Source not at origin
Multiple pulses
Backward compatibility

Documentation

The documentation updates are excellent:

Clear warning boxes about the interpolation change
Updated examples with proper sampling
New 2D distribution example

Minor suggestion: Add explanation of the difference between facility="ess" and facility="ess-odin" in the docs so users know which to choose.

Recommendation

APPROVE after addressing high-priority issues

The 2D source feature is well-designed and the core implementation is mathematically sound. The main concerns are edge cases that could cause crashes or confusing errors. I'd recommend:

Before merge:

Fix the division-by-zero issue (Add CI setup #1)
Fix the empty components issue (Add docs skeleton #2)
Fix the uninitialized distance issue (Add type hints #3)
Add wavelength validation (Add api ref page to docs #4)
Improve zero-distribution error message (Handle counter-rotating choppers #5)

Can be follow-up:

Network timeout and error handling (Use Scipp as internal data structure with physical units #7)
Improved error messages (Use the Nexus standard specification to define the parameters of the Chopper class #6, Make detector plots as a function of wavelength also #8)
Fully masked data handling (Optionally draw rays that are blocked as well #9)

These are all straightforward fixes that will make the feature more robust without changing the core design.

Summary of Changes Needed

# In src/tof/source.py:_make_pulses, after line 78:
if p.sizes[t_dim] < 2 or p.sizes[w_dim] < 2:
    raise ValueError(...)

# In src/tof/source.py:_make_pulses, after line 111:
if p_flat.data.sum() <= 0:
    raise ValueError(...)

# In src/tof/source.py:from_neutrons and _make_pulses result:
if sc.any(wavelength <= 0):
    raise ValueError("All wavelengths must be positive")

# In src/tof/model.py:276:
if components and components[0].distance < self.source.distance:
    raise ValueError(...)

# In src/tof/source.py:__init__, at end:
if self._facility is None and self._data is None:
    raise ValueError(...)

All of these are quick validation additions that will prevent confusing errors.

SimonHeybrock · 2025-11-19T08:06:58Z

-        unit=WAV_UNIT,
-    ).fold(dim=dim, sizes={"pulse": pulses, dim: neutrons})
+    wavelengths = np.concatenate(wavs)
+    if np.any(wavelengths < 0):


Should it be <=?

nvaytet · 2025-11-19T09:19:18Z

Uninitialized Distance Attribute

I won't fix this one. The same goes for the frequency.
The default source constructor is designed to use parameters from a facility, where distance and frequency will always be defined.

nvaytet added 11 commits November 18, 2025 11:06

add ess source as 2d h5 data array

8eb2caa

fix tests

6276840

update sources notebook

6034d79

handle source not at the origin

02abff4

add 2d example in docs

730916c

use odin source in odin notebook

45fe2e2

use pooch to get files

33a70a3

remove ess source file

6903f2d

remove scipy dependency and add pooch

a97a82c

update deps

cc99701

formatting

99bcb0a

nvaytet requested a review from SimonHeybrock November 18, 2025 15:10

nvaytet commented Nov 18, 2025

View reviewed changes

fix min max bug in repr also

709bebe

nvaytet commented Nov 18, 2025

View reviewed changes

nvaytet added 2 commits November 18, 2025 18:28

remove unused import

f52b9df

update dashboard

948590f

address review comments - still need to add tests

2d3245c

SimonHeybrock reviewed Nov 19, 2025

View reviewed changes

nvaytet added 3 commits November 19, 2025 10:20

less or equal to 0 wavelength

0ccd72e

add tests

4b984b2

formatting

840469e

SimonHeybrock approved these changes Nov 19, 2025

View reviewed changes

nvaytet merged commit e0062d7 into main Nov 19, 2025
4 checks passed

nvaytet deleted the 2d-sources branch November 19, 2025 09:50

nvaytet mentioned this pull request Dec 8, 2025

Use a spread width consistent with a potentially cropped wavelength range #109

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support 2d sources and sources away from the origin#106

Support 2d sources and sources away from the origin#106
nvaytet merged 18 commits intomainfrom
2d-sources

nvaytet commented Nov 18, 2025

Uh oh!

nvaytet Nov 18, 2025

Uh oh!

nvaytet Nov 18, 2025

Uh oh!

nvaytet Nov 18, 2025

Uh oh!

SimonHeybrock commented Nov 19, 2025

Uh oh!

SimonHeybrock Nov 19, 2025

Uh oh!

nvaytet commented Nov 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nvaytet commented Nov 18, 2025

Uh oh!

nvaytet Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

nvaytet Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

nvaytet Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

SimonHeybrock commented Nov 19, 2025

Review of PR #106: Support 2D Sources and Sources Away from Origin

Summary

Issues to Address

High Priority

1. Division by Zero with Single-Point Distributions

2. IndexError with Empty Components List

3. Uninitialized Distance Attribute

Medium Priority

4. Zero or Negative Wavelengths

5. Zero Distribution Values

6. Error Message Clarity

Low Priority / Polish

7. Network Dependency

8. Better Error for Unknown Source

9. Fully Masked Data in Reading

Verified Correct ✓

Code Quality Observations

Test Coverage Gaps

Documentation

Recommendation

Summary of Changes Needed

Uh oh!

SimonHeybrock Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

nvaytet commented Nov 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants