Data reduction wrapper interface #118
Conversation
@jokasimr we can try the comparison tomorrow...!
```python
import argparse
import logging
import pathlib
import sys

import scipp as sc

# McStasWorkflow, CrystalRotation and reduction are provided elsewhere in this package.


def main() -> None:
    parser = argparse.ArgumentParser(description="McStas Data Reduction.")
    parser.add_argument("--input_file", type=str, help="Path to the input file")
    parser.add_argument(
        "--output_file",
        type=str,
        default="scipp_output.h5",
        help="Path to the output file",
    )
    parser.add_argument(
        "--verbose", action="store_true", help="Increase output verbosity"
    )
    parser.add_argument(
        "--chunk_size",
        type=int,
        default=10_000_000,
        help="Chunk size for processing",
    )
    parser.add_argument(
        "--detector_ids",
        type=int,
        nargs="+",
        default=[0, 1, 2],
        help="Detector indices to process",
    )

    args = parser.parse_args()

    input_file = pathlib.Path(args.input_file).resolve()
    output_file = pathlib.Path(args.output_file).resolve()

    logger = logging.getLogger(__name__)
    if args.verbose:
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.StreamHandler(sys.stdout))

    wf = McStasWorkflow()
    # Set the crystal rotation manually for now ...
    wf[CrystalRotation] = sc.vector([0, 0, 0.0], unit='deg')
    reduction(
        input_file=input_file,
        output_file=output_file,
        chunk_size=args.chunk_size,
        detector_ids=args.detector_ids,
        logger=logger,
        wf=wf,
    )
```
Hi @aaronfinke...! Sorry for the delay...
This part should do the same job as the process.py you showed me before.
It doesn't have a performance counter yet, but we can add one if you want.
For 17 GB of data it took 40 s with a chunk size of 10_000_000.
Here are the simple benchmark results I have collected so far (~1e8 events per detector):

| chunk size | time [s] |
|---|---|
| 1_000_000 | 128 |
| 10_000_000 | 40 |
| 50_000_000 | 34 |
| 100_000_000 | 48 |
Do we know which part is slow? Is it the McStas loading? I am used to somewhat better performance (at least when the data is on an SSD).
I haven't done thorough profiling yet.
This benchmark includes loading the data into memory as well as writing the output to a file.
```diff
 email-validator==2.2.0
     # via scippneutron
-essreduce==25.2.4
+essreduce==25.2.5
```
The newer version of essreduce has a breaking change in the accumulator interface.
I should also add it to the nightly dependencies.
Co-authored-by: Simon Heybrock <12912489+SimonHeybrock@users.noreply.github.com>
What would also be useful is a way to distinguish between McStas-simulated processed data and (what will be) processed real data. I suggest either putting "McStas" in the name entry
```python
export_metadata_as_nxlauetof(
    *detector_metas,
    experiment_metadata=experiment_metadata,
    output_file=output_file,
    # Arbitrary metadata falls into ``entry`` group as a variable.
    mcstas_weight2count_scale_factor=scale_factor,
)
```
@aaronfinke
About the McStas indicator: we also need to export McStas-specific parameters like this.
Currently, with this particular interface, you can freely add arbitrary metadata datasets.
But maybe attributes are more convenient and easier to read... I'll work on it: #121
Continued in #122
This interface was requested by an IDS member, since it is expected to be run on a cluster rather than in a notebook.