Conversation

@SimonHeybrock (Member) commented Jun 25, 2024

Changes

  • Bypass the same single-threading problem encountered in SANS. I think we either have to put a helper into ESSreduce or see whether the underlying problem should be fixed in Scipp (see the sketch after this list).
  • Avoid storing position for every event. This will only affect Geant4 data, so the improvement will not be seen for NeXus data. Note that this can lead to NaN positions if a voxel has no events. Is this a problem (it is not in the tests)? If so, we either need to revert or find a better solution.
  • Remove the tof and wavelength event coords that are not needed any more.
  • Fixes Check workflow performance #41 (but I guess it has to be revisited sooner or later). I did not spot any other "obvious" problems in the functions taking the most time.
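
For illustration, a minimal sketch of the content-buffer workaround mentioned in the first bullet (placeholder names, not the actual helper added in this PR), assuming the events have already been concatenated into a single bin and carry event coords for the requested edges:

import scipp as sc


def _bin_via_content_buffer(events: sc.DataArray, edges: dict) -> sc.DataArray:
    """Bin a 0-D binned data array by operating on its event buffer.

    Binning the 0-D binned object directly can hit a single-threaded code
    path, so we bin the contiguous event table instead.
    """
    if events.ndim == 0:
        content = events.value  # the event table backing the single bin
        return content.bin(**edges)
    return events.bin(**edges)

A hypothetical call would look like _bin_via_content_buffer(masked.bins.concat(), {'dspacing': dspacing_edges}).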

Baseline

Baseline for Geant4 workflow:

2.47 seconds load_geant4_csv
4.75 seconds load_geant4_csv
3.00 seconds extract_geant4_detector_data
6.11 seconds extract_geant4_detector_data
0.62 seconds normalize_by_proton_charge
4.38 seconds add_scattering_coordinates_from_positions
0.35 seconds apply_masks
0.99 seconds normalize_by_proton_charge
4.36 seconds convert_to_dspacing
5.20 seconds add_scattering_coordinates_from_positions
0.23 seconds apply_masks
2.94 seconds convert_to_dspacing
9.75 seconds focus_data_dspacing
10.12 seconds focus_data_dspacing
1.19 seconds normalize_by_vanadium_dspacing

I artificially increased the data size by modifying the data after loading:

def extract_geant4_detector_data(
    detector: RawDetector[RunType],
) -> RawDetectorData[RunType]:
    """Extract the histogram or event data from a loaded GEANT4 detector."""
    data = extract_detector_data(detector)
    # Each concatenation doubles the event count, so the seven calls below
    # inflate the data by a factor of 2**7 = 128.
    data = data.bins.concatenate(data)
    data = data.bins.concatenate(data)
    data = data.bins.concatenate(data)
    data = data.bins.concatenate(data)
    data = data.bins.concatenate(data)
    data = data.bins.concatenate(data)
    data = data.bins.concatenate(data)  # ~ 10 GByte (sample run)
    return RawDetectorData[RunType](data)

The timings reported for this function are therefore not representative of real data.

This branch

RawDetectorData[SampleRun] is now 6 GByte instead of 10 GByte:

2.64 seconds load_geant4_csv
5.61 seconds load_geant4_csv
2.01 seconds extract_geant4_detector_data
3.84 seconds extract_geant4_detector_data
0.34 seconds normalize_by_proton_charge
0.19 seconds apply_masks
0.18 seconds convert_to_dspacing
1.38 seconds focus_data_dspacing
0.47 seconds normalize_by_proton_charge
0.22 seconds apply_masks
0.20 seconds convert_to_dspacing
1.48 seconds focus_data_dspacing
0.83 seconds normalize_by_vanadium_dspacing

@SimonHeybrock SimonHeybrock requested a review from jl-wynen June 25, 2024 14:37
@SimonHeybrock SimonHeybrock marked this pull request as draft June 25, 2024 14:46
@SimonHeybrock (Member Author):

Converted to draft, since it looks like I messed up memory consumption (at least CI seems to be having trouble).

@SimonHeybrock (Member Author):

Looking at https://github.com/scipp/essdiffraction/actions/workflows/ci.yml, it seems the CI timings were just a normal outlier. Ready for review once more!

@SimonHeybrock SimonHeybrock marked this pull request as ready for review June 25, 2024 15:46
"intermediates[MaskedData[SampleRun]].bins.concat().hist(\n",
" two_theta=300, wavelength=300\n",
").plot(norm=\"log\")"
"two_theta = sc.linspace(\"two_theta\", 0.8, 2.4, 301, unit=\"rad\")\n",
Member:

Why the explicit limits? Does this really matter so much for performance?

Member Author:

Because we have some voxels with NaN positions and would get an exception otherwise.
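
For context, a minimal synthetic sketch (made-up data, not from this workflow) of why explicit limits matter here: with an integer bin count, hist has to derive the limits from the coordinate values, which (as noted above) raises an exception when the coordinate contains NaN, whereas explicit edges sidestep that.

import numpy as np
import scipp as sc

# One event carries a NaN two_theta, standing in for a voxel without events.
events = sc.DataArray(
    sc.array(dims=['event'], values=[1.0, 1.0]),
    coords={'two_theta': sc.array(dims=['event'], values=[1.2, np.nan], unit='rad')},
)

# events.hist(two_theta=300) would compute min/max of the coord, which
# contains NaN; explicit edges avoid inspecting the coord values.
two_theta = sc.linspace('two_theta', 0.8, 2.4, 301, unit='rad')
hist = events.hist(two_theta=two_theta)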

    res = da.group("sector", *elements)
else:
    res = da.group(*elements)
res.coords['position'] = res.bins.coords.pop('position').bins.mean()
Member:

Did you check whether this changes the final result?

Member Author:

I was hoping there were tests for that...?

@jl-wynen (Member) commented Jun 26, 2024:

There are no tests for the complete $I(d)$. We only test a subset of properties of the output.

The problem is that the workflow is incomplete. Most updates in the near future will break regression tests.
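
For reference, a minimal synthetic sketch (made-up data, not from the workflow or its tests) of what the line under discussion does: the per-voxel position becomes the mean of the event positions, and a voxel without events ends up with a NaN position.

import scipp as sc

# Two voxels: voxel 0 holds one event, voxel 1 holds none.
table = sc.DataArray(
    sc.array(dims=['event'], values=[1.0]),
    coords={
        'voxel': sc.array(dims=['event'], values=[0]),
        'position': sc.vectors(dims=['event'], values=[[1.0, 0.0, 0.0]], unit='m'),
    },
)
grouped = table.group(sc.array(dims=['voxel'], values=[0, 1]))
# Same pattern as in the diff above; the empty voxel gets a NaN mean position.
grouped.coords['position'] = grouped.bins.coords.pop('position').bins.mean()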

Comment on lines +17 to +20
def _drop_grouping_and_bin(
    data: sc.DataArray, *, dims_to_reduce: tuple[str, ...] | None = None, edges: dict
) -> sc.DataArray:
    all_pixels = data if dims_to_reduce == () else data.bins.concat(dims_to_reduce)
Member:

Suggested change
def _drop_grouping_and_bin(
    data: sc.DataArray, *, dims_to_reduce: tuple[str, ...] | None = None, edges: dict
) -> sc.DataArray:
    all_pixels = data if dims_to_reduce == () else data.bins.concat(dims_to_reduce)
def _drop_grouping_and_bin(
    data: sc.DataArray, *, edges: dict
) -> sc.DataArray:
    all_pixels = data.bins.concat()

because dims_to_reduce is never used.

Member Author:

I was using it for the two-theta case but then refactored. Since I wanted to consider moving this to ESSreduce, I thought I'd keep it.

# inferior performance when binning (no/bad multi-threading?).
# We operate on the content buffer for better multi-threaded performance.
if all_pixels.ndim == 0:
    content = all_pixels.bins.constituents['data']
Member:

Is it safer to use this? Because it excludes constituents that are out of range?

Suggested change
content = all_pixels.bins.constituents['data']
content = all_pixels.value

Member Author:

Hmm, good point, that looks simpler. I will try.
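
For context, a minimal synthetic sketch (made-up data, not PR code) of the difference discussed here: the constituents buffer is the full underlying event table, whereas .value of a 0-D binned object is only the slice selected by that bin's begin/end indices.

import scipp as sc

table = sc.DataArray(
    sc.array(dims=['event'], values=[1.0, 2.0, 3.0]),
    coords={'x': sc.array(dims=['event'], values=[0.1, 0.2, 0.9], unit='m')},
)
binned = table.bin(x=sc.array(dims=['x'], values=[0.0, 0.5, 1.0], unit='m'))
bin0 = binned['x', 0]  # 0-D binned data array referencing the first two events

full_buffer = bin0.bins.constituents['data']  # all three events in the buffer
in_range = bin0.value                         # only the two events of this bin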

# inferior performance when binning (no/bad multi-threading?).
# We operate on the content buffer for better multi-threaded performance.
if all_pixels.ndim == 0:
    content = all_pixels.bins.constituents['data']
Member:

It seems like this should be done by bin automatically. Is this possible?

Member Author:

See what I wrote in the PR:

Bypass the same single-threading problem encountered in SANS. I think we either have to put a helper into ESSreduce or see whether the underlying problem should be fixed in Scipp.

Member:

Ok. Can this be done in Scipp now instead of adding extra code here and potentially in other projects? If not, please open an issue!

@SimonHeybrock (Member Author) commented Jun 26, 2024:

Well, the larger issue is that we have to concat first, which is itself a workaround for other performance issues. Binning with a many-to-many mapping is a long-standing issue and has seen a couple of improvements in the past, but is apparently not fully solved (see scipp/scipp#1846).

What could probably be done in Scipp is to bypass the single-threading issue, but that also requires some thought (I am not sure there are no subtleties), and it is slightly odd to add an optimization for a solution that itself bypasses another problem.

@SimonHeybrock SimonHeybrock merged commit f4d0421 into main Jun 28, 2024
@SimonHeybrock SimonHeybrock deleted the performance branch June 28, 2024 03:24