Data reduction wrapper interface #118
Conversation
@jokasimr we can try the comparison tomorrow...!
```python
import argparse
import logging
import pathlib
import sys

import scipp as sc

# McStasWorkflow, CrystalRotation and reduction are provided elsewhere in this package.


def main() -> None:
    parser = argparse.ArgumentParser(description="McStas Data Reduction.")
    parser.add_argument("--input_file", type=str, help="Path to the input file")
    parser.add_argument(
        "--output_file",
        type=str,
        default="scipp_output.h5",
        help="Path to the output file",
    )
    parser.add_argument(
        "--verbose", action="store_true", help="Increase output verbosity"
    )
    parser.add_argument(
        "--chunk_size",
        type=int,
        default=10_000_000,
        help="Chunk size for processing",
    )
    parser.add_argument(
        "--detector_ids",
        type=int,
        nargs="+",
        default=[0, 1, 2],
        help="Detector indices to process",
    )

    args = parser.parse_args()

    input_file = pathlib.Path(args.input_file).resolve()
    output_file = pathlib.Path(args.output_file).resolve()

    logger = logging.getLogger(__name__)
    if args.verbose:
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.StreamHandler(sys.stdout))

    wf = McStasWorkflow()
    # Set the crystal rotation manually for now ...
    wf[CrystalRotation] = sc.vector([0, 0, 0.0], unit='deg')
    reduction(
        input_file=input_file,
        output_file=output_file,
        chunk_size=args.chunk_size,
        detector_ids=args.detector_ids,
        logger=logger,
        wf=wf,
    )
```
Hi @aaronfinke...! Sorry for the delay...
This part should do the same job as the process.py you showed me before.
It doesn't have a performance counter yet, but we can add one if you want.
For 17 GB of data it took 40 s with a chunk size of 10_000_000.
Here are the simple benchmark results I have collected so far (~1e8 events per detector):

| chunk size | time [s] |
|---|---|
| 1_000_000 | 128 |
| 10_000_000 | 40 |
| 50_000_000 | 34 |
| 100_000_000 | 48 |
Do we know which part is slow? Is it the McStas loading? I am used to somewhat better performance (at least when the data is on an SSD).
I haven't done thorough profiling yet.
This benchmark includes loading the data into memory as well as writing the output to a file.
```diff
 email-validator==2.2.0
     # via scippneutron
-essreduce==25.2.4
+essreduce==25.2.5
```
The newer version of essreduce has a breaking change in the accumulator interface.
I should also add it to the nightly dependencies.
Co-authored-by: Simon Heybrock <12912489+SimonHeybrock@users.noreply.github.com>
What would also be useful is a way to distinguish between McStas-simulated processed data and (what will be) processed real data. I suggest either putting "McStas" in the name entry
```python
export_metadata_as_nxlauetof(
    *detector_metas,
    experiment_metadata=experiment_metadata,
    output_file=output_file,
    # Arbitrary metadata falls into ``entry`` group as a variable.
    mcstas_weight2count_scale_factor=scale_factor,
)
```
@aaronfinke
About the McStas indicator: we also need to export McStas-specific parameters like this.
Currently, with this particular interface, you can freely add arbitrary metadata datasets.
But maybe attributes are more convenient and easier to read... I'll work on it: #121
Continued in #122
This interface was requested by an IDS member, since it is expected to be run on a cluster rather than in a notebook.