Skip to content

Support 2d sources and sources away from the origin#106

Merged
nvaytet merged 18 commits intomainfrom
2d-sources
Nov 19, 2025
Merged

Support 2d sources and sources away from the origin#106
nvaytet merged 18 commits intomainfrom
2d-sources

Conversation

@nvaytet
Copy link
Copy Markdown
Member

@nvaytet nvaytet commented Nov 18, 2025

In this PR, we add support for sources that have a 2d probability distribution.
Previously, a source was defined by a 1d time distribution and a 1d wavelength distribution.
This meant that intrinsically the distribution was square in the (time, wavelength) parameter space.
Screenshot_20251118_160359

We now generalize this to a 2d p distribution that can represent sources such as the one for Odin, taken some meters away from the moderator:
Screenshot_20251118_160439

This last parameter of the source (2.5m away from the origin) meant that we also needed to modify slightly how the model works to handle sources that are not located at 0m.

A library of sources was created and the data is saved in https://github.com/scipp/tof-sources.

Note that the old 1D p_time and p_wav still work, a 2d broadcast of the profiles is performed internally.
We also removed a behind-the-scenes linear interpolation between data points given by the source profiles, which was getting expensive when working in 2d. We now require users to supply a profile with enough resolution to satisfy their needs (note changes in docs where some linspace was added instead of using only 3 data points in the profiles).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We keep the pulse_profile.py around for backward compatibility (the ess_pulse submoduleneeds it and this is needed in some places inessreduce`). But in principle, we don't need it anymore.

Comment thread src/tof/reading.py
mask = ~one_mask(self.data.masks)
mask.unit = ""
return (self.data.coords[self.dim] * mask).min()
da = self.data.copy(deep=False)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed a bug here: the min value was always returning 0 because something multiplied by a False would give 0 and that is the min. Instead, the masked values need to be ignored.

Comment thread src/tof/source.py
if wmax is None:
wmax = p_wav.coords[w_dim].max()

time_interpolator = interp1d(p_time, dim=t_dim, fill_value="extrapolate")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the internal linear interpolation and the somewhat unusual sampling arg.
We now require users to provide a distribution with enough data points.

@SimonHeybrock
Copy link
Copy Markdown
Member

Reviewed with Claude:

Review of PR #106: Support 2D Sources and Sources Away from Origin

Reviewer: Claude
Date: 2025-11-19
Branch: 2d-sources
Test Results: 95/97 tests passing (2 failures unrelated - missing ipywidgets)


Summary

This PR adds important scientific capability: 2D probability distributions for neutron sources and support for sources positioned away from the origin. The implementation is well-designed with solid mathematics and good test coverage for the main use cases.

Key Changes:

  • 2D distributions in (time, wavelength) space
  • Sources can be positioned at non-zero distances
  • Removes scipy dependency, adds pooch for remote data fetching
  • Removes automatic interpolation (breaking change, well-documented)

Overall Assessment: The core feature is sound and ready, but there are several edge cases that should be addressed before merging.


Issues to Address

High Priority

1. Division by Zero with Single-Point Distributions

Location: src/tof/source.py:99-100

dt = 0.5 * (tmax - tmin).value / (p.sizes[t_dim] - 1)
dw = 0.5 * (wmax - wmin).value / (p.sizes[w_dim] - 1)

If a distribution has only 1 point in a dimension, this divides by zero, producing NaN. The rejection sampling loop then runs indefinitely since rng.normal(scale=nan) produces invalid values.

Test case:

p_time = sc.DataArray(
    data=sc.array(dims=['birth_time'], values=[1.0]),
    coords={'birth_time': sc.array(dims=['birth_time'], values=[1000.0], unit='us')}
)
# Result: dt = nan, infinite loop

Suggested fix:

if p.sizes[t_dim] < 2 or p.sizes[w_dim] < 2:
    raise ValueError(
        f"Distribution must have at least 2 points in each dimension. "
        f"Got {p.sizes[t_dim]} in {t_dim}, {p.sizes[w_dim]} in {w_dim}"
    )

2. IndexError with Empty Components List

Location: src/tof/model.py:276

components = sorted(...)

if components[0].distance < self.source.distance:  # IndexError if empty
    raise ValueError(...)

Creating a model with no choppers or detectors causes an IndexError.

Test case:

model = tof.Model(source=source, choppers=[], detectors=[])
result = model.run()  # IndexError: list index out of range

Suggested fix:

if components and components[0].distance < self.source.distance:
    raise ValueError(...)

Or validate in __init__:

if not self.choppers and not self.detectors:
    raise ValueError("Model must have at least one chopper or detector")

3. Uninitialized Distance Attribute

Location: src/tof/source.py:__init__

The __init__ method only sets self._distance when facility is not None. If someone tries to create Source(facility=None, ...) directly (bypassing the class methods), accessing source.distance raises AttributeError.

Suggested fix:
Add at end of __init__:

if self._facility is None:
    # Should only be created via class methods
    if not hasattr(self, '_distance'):
        raise ValueError(
            "Cannot create Source with facility=None directly. "
            "Use Source.from_neutrons() or Source.from_distribution()"
        )

Medium Priority

4. Zero or Negative Wavelengths

Location: src/tof/source.py (in from_neutrons and via distributions)

No validation prevents wavelength ≤ 0, which produces infinite speed via the physics formula v = h/(m·λ). This doesn't crash but gives physically incorrect results.

Test case:

wavelengths = sc.array(dims=['event'], values=[0.0, 5.0], unit='angstrom')
source = Source.from_neutrons(birth_times=times, wavelengths=wavelengths)
# Result: speed = [inf, 791.207] m/s

Suggested fix: Add validation in both from_neutrons and after sampling in _make_pulses:

if sc.any(wavelengths <= sc.scalar(0.0, unit=wavelengths.unit)):
    raise ValueError("All wavelengths must be positive")

5. Zero Distribution Values

Location: src/tof/source.py:112

If a user provides an all-zero distribution (or negative values), the normalization p_flat /= p_flat.data.sum() fails with a cryptic numpy error.

Current error: ValueError: Probabilities contain NaN

Suggested fix: Add helpful error before normalization:

prob_sum = p_flat.data.sum()
if prob_sum <= 0:
    raise ValueError(
        "Distribution must have at least one positive probability value. "
        f"Sum of probabilities is {prob_sum.value}"
    )

6. Error Message Clarity

Location: src/tof/model.py:277-279

The error message says "At least one component is located before the source" but the check only looks at components[0] (the closest component). More specific wording would help:

if components and components[0].distance < self.source.distance:
    raise ValueError(
        f"The first component ({components[0].name} at {components[0].distance}) "
        f"is located before the source (at {self.source.distance})"
    )

Low Priority / Polish

7. Network Dependency

Location: src/tof/facilities/__init__.py:12-17

The new pooch-based data fetching introduces a network dependency:

  • No explicit timeout (could hang on slow connections)
  • No offline fallback
  • Could fail behind corporate firewalls

Suggestions:

  • Document the network requirement in the docstring
  • Consider adding a timeout parameter if pooch supports it
  • Provide clearer error messages when downloads fail

8. Better Error for Unknown Source

Location: src/tof/facilities/__init__.py:20-21

def get_source_path(name: str) -> str:
    return _source_registry.fetch(_source_library[name]["path"])  # KeyError

Suggested improvement:

def get_source_path(name: str) -> str:
    if name not in _source_library:
        available = ', '.join(_source_library.keys())
        raise ValueError(
            f"Unknown source '{name}'. Available sources: {available}"
        )
    return _source_registry.fetch(_source_library[name]["path"])

9. Fully Masked Data in Reading

Location: src/tof/reading.py:40-56

If all neutrons are blocked by choppers (fully masked data), the min() and max() methods may give undefined results. This affects the __repr__ output.

Suggested fix:

def min(self):
    if self.data.sum().value == 0:
        return sc.scalar(float('nan'), unit=self.data.coords[self.dim].unit)
    da = self.data.copy(deep=False)
    da.data = da.coords[self.dim]
    return da.min().data

Verified Correct ✓

I did deep mathematical verification on the core algorithms:

  1. Distance calculations: Manually verified with source at 2.5m, detector at 10m. Time-of-arrival matches hand calculation to 0.01 µs precision.

  2. 2D distribution normalization: The broadcast operation p = (p_wav / sum) * (p_time / sum) correctly produces an outer product that sums to 1.0.

  3. Flattening and coordinate handling: Verified that flattened 2D arrays maintain proper coordinate correspondence.

  4. Sampling mathematics: The rejection sampling approach with added noise is sound. The algorithm expects uniform point spacing (as produced by linspace), which is the standard practice in all examples and facility sources.

The physics is correct, the math is solid, and the implementation is clean.


Code Quality Observations

Strengths:

  • Clean abstraction for handling both 1D and 2D distributions
  • Backward compatible (old API still works)
  • Well-documented breaking changes with migration examples
  • Good test coverage for main use cases

Suggestions:

  • Add validation earlier in the pipeline (distribution points, wavelength values)
  • More helpful error messages for edge cases

Test Coverage Gaps

Missing tests for edge cases:

  • Single-point distributions
  • Empty component lists
  • Zero/negative wavelengths
  • All-zero distributions
  • Fully masked data (all neutrons blocked)

Well covered:

  • 2D distributions (parametrized test)
  • Source not at origin
  • Multiple pulses
  • Backward compatibility

Documentation

The documentation updates are excellent:

  • Clear warning boxes about the interpolation change
  • Updated examples with proper sampling
  • New 2D distribution example

Minor suggestion: Add explanation of the difference between facility="ess" and facility="ess-odin" in the docs so users know which to choose.


Recommendation

APPROVE after addressing high-priority issues

The 2D source feature is well-designed and the core implementation is mathematically sound. The main concerns are edge cases that could cause crashes or confusing errors. I'd recommend:

Before merge:

  1. Fix the division-by-zero issue (Add CI setup #1)
  2. Fix the empty components issue (Add docs skeleton #2)
  3. Fix the uninitialized distance issue (Add type hints #3)
  4. Add wavelength validation (Add api ref page to docs #4)
  5. Improve zero-distribution error message (Handle counter-rotating choppers #5)

Can be follow-up:

These are all straightforward fixes that will make the feature more robust without changing the core design.


Summary of Changes Needed

# In src/tof/source.py:_make_pulses, after line 78:
if p.sizes[t_dim] < 2 or p.sizes[w_dim] < 2:
    raise ValueError(...)

# In src/tof/source.py:_make_pulses, after line 111:
if p_flat.data.sum() <= 0:
    raise ValueError(...)

# In src/tof/source.py:from_neutrons and _make_pulses result:
if sc.any(wavelength <= 0):
    raise ValueError("All wavelengths must be positive")

# In src/tof/model.py:276:
if components and components[0].distance < self.source.distance:
    raise ValueError(...)

# In src/tof/source.py:__init__, at end:
if self._facility is None and self._data is None:
    raise ValueError(...)

All of these are quick validation additions that will prevent confusing errors.

Comment thread src/tof/source.py Outdated
unit=WAV_UNIT,
).fold(dim=dim, sizes={"pulse": pulses, dim: neutrons})
wavelengths = np.concatenate(wavs)
if np.any(wavelengths < 0):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be <=?

@nvaytet
Copy link
Copy Markdown
Member Author

nvaytet commented Nov 19, 2025

  1. Uninitialized Distance Attribute

I won't fix this one. The same goes for the frequency.
The default source constructor is designed to use parameters from a facility, where distance and frequency will always be defined.

@nvaytet nvaytet merged commit e0062d7 into main Nov 19, 2025
4 checks passed
@nvaytet nvaytet deleted the 2d-sources branch November 19, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants