New implementation of LoadEventNexus with compression #36594

Open
peterfpeterson opened this issue Dec 29, 2023 · 0 comments
Labels
ORNL Team Issues and pull requests managed by the ORNL development team

peterfpeterson commented Dec 29, 2023

Motivation

It has been frequently observed that LoadEventNexus is one of the slowest steps in any workflow. With the recent addition of disk read/write speed to mantid-profiler, it can be more clearly seen that LoadEventNexus is spending much of its time doing things other than reading from disk (see screenshot in the pull request). Some observations:

  • dd for a ~10GB file can copy to /dev/null at approximately 9Gbps
  • according to mantid-profiler, peak disk read for this file is closer to 15Gbps
  • the loadEvents portion of LoadEventNexus has an overall average throughput of 4.7Gbps
  • profilers (perf and Intel's VTune) show that ~50% of the time in this portion of the code is spent in std::vector::emplace_back creating events and allocating memory
  • many workflows do not require event filtering and would work perfectly well with compressed events from the loaded data
  • other software (in Python and IDL) histograms as it reads the file and demonstrates significantly better performance

The current method when CompressTolerance > 0 is to create all the EventLists with EventType::TOF, then call EventList::compressEvents(). EventList::compressEvents() sorts the events by time-of-flight and then compresses them. While this creates a relatively small EventWorkspace, it still allocates memory for all of the EventType::TOF events (a large temporary allocation), then sorts each EventList serially (which is slow). These shortcomings were the main motivation for creating the LoadEventAndCompress algorithm, which loads the file in chunks that are compressed and accumulated.

Based on conversations with various CIS at ORNL, more than 2/3 of measurements with "large" files could use this method.

Suggested solution

Rather than allocating all of the events and then re-using the existing code for sorting and compressing, create a histogram while reading through the events, then convert the data to an EventList of EventType::WEIGHTED_NOTIME. This can be done by making a separate implementation of the ProcessBankData class which would be selected when a compression tolerance is specified and the file has no periods (see note below). The new ProcessBankCompressed class will:

  • introspect the time-of-flights to determine the full range (taking into account a user-specified reduced range)
  • configure an object that can calculate the bin index for a given time-of-flight - look at EventList::generateCountsHistogram() for details on the calculation. The method may need to be refactored to aid code re-use.
  • configure an object that stores the temporary histogram for each detector-id
    • has a std::vector<float> to store the sum of the time-of-flight values in each bin; the effective time-of-flight is this sum divided by the number of events in the bin
    • has a std::vector<int> to store the number of events in each bin
  • once all of the events have been processed, this temporary histogram will create weighted events via a method that is supplied the EventList to append them to. This should generate events in a similar manner to how the CompressEvents algorithm does (a sketch follows this list):
    • [tof] = [sum of contributing tof] / [number of events] <- convert to double
    • [weight] = [number of events]
    • [errorSquared] = [number of events]
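
To make the accumulation above concrete, here is a minimal sketch of the two phases, using hypothetical stand-in types rather than Mantid classes (SpectrumAccumulator and WeightedEventNoTime below are illustrative names only) and assuming constant-width linear bins whose width equals the compression tolerance:

```cpp
// Illustrative sketch, not Mantid API: accumulate raw events into a fine
// histogram for one spectrum, then emit weighted no-time events following
// the rules in the list above.
#include <cmath>
#include <cstddef>
#include <vector>

struct WeightedEventNoTime {
  double tof;         // effective time-of-flight of the compressed event
  float weight;       // number of raw events in the bin
  float errorSquared; // same as weight, per the rules above
};

class SpectrumAccumulator {
public:
  SpectrumAccumulator(double tofMin, double tofMax, double tolerance)
      : m_tofMin(tofMin), m_binWidth(tolerance),
        m_numBins(static_cast<std::size_t>(std::ceil((tofMax - tofMin) / tolerance))),
        m_tofSum(m_numBins, 0.0f), m_counts(m_numBins, 0) {}

  // Phase 1: add a raw event; the pulse time is discarded (no-time events).
  void addEvent(double tof) {
    if (tof < m_tofMin)
      return;
    const auto bin = static_cast<std::size_t>((tof - m_tofMin) / m_binWidth);
    if (bin >= m_numBins)
      return;
    m_tofSum[bin] += static_cast<float>(tof);
    m_counts[bin] += 1;
  }

  // Phase 2: convert the non-empty bins into weighted events and append them
  // to the supplied container (an EventList in the real code).
  void appendWeightedEvents(std::vector<WeightedEventNoTime> &out) const {
    for (std::size_t i = 0; i < m_numBins; ++i) {
      if (m_counts[i] == 0)
        continue;
      const auto n = static_cast<float>(m_counts[i]);
      out.push_back({static_cast<double>(m_tofSum[i]) / n, n, n});
    }
  }

private:
  double m_tofMin;
  double m_binWidth;
  std::size_t m_numBins;
  std::vector<float> m_tofSum; // sum of contributing time-of-flight per bin
  std::vector<int> m_counts;   // number of events per bin
};
```

In the real implementation the bin lookup would come from the refactored EventList::generateCountsHistogram()/FindLinearBin() logic rather than the hard-coded linear formula shown here.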

By adding a single class, the changes can be localized to that new class plus an option in LoadBankFromDiskTask that selects the correct bank processor. The downside is that event_index and pulse information will be read from disk even though they will not be used.

There will be cases for which this method will not be used:

  • "Small" event files this will be significantly more memory intensive than loading the events and compressing them. It is up to the user to use LoadEventNexus + CompressEvents instead. The cross-over point will be related to when the number of events is equal to the number of bins in the temporary histogram.
  • Files with period data would require a temporary histogram for each period. This will default to the current method.
  • Files with weighted events will default to the current method.

This could be used by even more workflows if FilterBadPulses (default off) were included as an additional parameter in LoadEventNexus. Similarly, it would be more useful if handling of veto pulses were included. Both can be added in later versions by creating a TimeROI in LoadEventNexus and supplying it to the underlying code.

New classes

There will be a number of classes created in an effort to make this more organized and maintainable.

  • ProcessBankCompressed is mostly described above. Additionally, it will provide whatever functionality is necessary to filter events (e.g. based on time-of-flight-range or wall-clock time). It is also responsible for handling the life-cycle of classes it uses to process the individual events.
  • CompressedEventsAccumulator is the concept of something that takes events and updates the fine-bin histogram. It is unlikely to be an actual concrete or abstract class. The idea is that, after being configured, it is used in two phases: (1) events are added and stored in a form that can be used later, and (2) the stored information is converted into actual events that are appended to an EventList.
  • CompressedEventBankAccumulator will contain information about the fine histogram parameters and a collection of CompressedEventSpectrumAccumulators. There will be a single CompressedEventBankAccumulator in each ProcessBankCompressed task.
  • CompressedEventSpectrumAccumulator will accumulate the supplied events for a single spectrum into a fine histogram. Above, this is described as two parallel vectors, a std::vector<int> and a std::vector<float>, together with the information needed by EventList::FindLinearBin(). Measurements should be made to determine whether the fine histogram could instead be stored as a std::map<int, std::pair<int, float>> and still perform well enough across a variety of cases (a sketch of the sparse layout follows this list). If the performance is equal, storing the temporary fine histogram as a map is preferred since it will only contain bins where events exist.
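
For the measurement suggested in the last bullet, here is a sketch of the sparse alternative (again with an illustrative name only): only bins that actually receive events occupy memory, at the cost of a logarithmic lookup per event.

```cpp
// Illustrative sketch, not Mantid API: the per-spectrum fine histogram kept
// as a sparse map of bin index -> (event count, sum of time-of-flight)
// instead of two dense parallel vectors.
#include <map>
#include <utility>

class SparseSpectrumAccumulator {
public:
  SparseSpectrumAccumulator(double tofMin, double binWidth)
      : m_tofMin(tofMin), m_binWidth(binWidth) {}

  void addEvent(double tof) {
    if (tof < m_tofMin)
      return;
    const auto bin = static_cast<int>((tof - m_tofMin) / m_binWidth);
    auto &entry = m_bins[bin];               // value-initialized to {0, 0.f} on first use
    entry.first += 1;                        // number of events in this bin
    entry.second += static_cast<float>(tof); // running sum of time-of-flight
  }

private:
  double m_tofMin;
  double m_binWidth;
  std::map<int, std::pair<int, float>> m_bins;
};
```

Whether the extra indirection per event is acceptable is exactly what the proposed measurement would decide.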

Describe alternatives you've considered

If the individual Event objects had accessors and mutators, then rather than reserving the correct amount of space in the EventList and using std::vector::emplace_back, it would be possible to create a vector of the correct size with the std::vector constructor and then modify the time-of-flight and pulse-time information in place. This technique has not been explored and its benefits are unknown. Creating a fine histogram during load has been demonstrated by other software.
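
A sketch of that allocation pattern with a hypothetical stand-in Event type (the real event classes currently lack the mutators this relies on): the vector is constructed once at its final size and the elements are mutated in place, avoiding the per-event growth of repeated emplace_back calls.

```cpp
// Illustrative sketch, not Mantid API: construct the event vector at its
// final size in one allocation, then fill in the per-event fields.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Event {
  double tof = 0.0;      // time-of-flight
  int64_t pulseTime = 0; // pulse time, e.g. as a nanosecond offset
  void setTof(double t) { tof = t; }
  void setPulseTime(int64_t p) { pulseTime = p; }
};

void fillEvents(std::vector<Event> &events, const std::vector<double> &tofs,
                const std::vector<int64_t> &pulseTimes) {
  events.assign(tofs.size(), Event{}); // single allocation for the whole bank
  for (std::size_t i = 0; i < tofs.size(); ++i) {
    events[i].setTof(tofs[i]);
    events[i].setPulseTime(pulseTimes[i]);
  }
}
```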

Additional context

While much of this information is anecdotal, the performance for a particular "large" event file is easily measured. The numbers at the top of this issue were observed on a laptop loading VULCAN_218092, which is 10 GiB in size.
