Skip to content

Conversation

@YooSunYoung
Copy link
Member

@YooSunYoung YooSunYoung commented Feb 27, 2025

This interface was requested by IDS member since it is expected to be run on a cluster, not in a notebook.

@YooSunYoung
Copy link
Member Author

@jokasimr we can try the comparison tomorrow...!

Comment on lines +222 to +268
def main() -> None:
parser = argparse.ArgumentParser(description="McStas Data Reduction.")
parser.add_argument("--input_file", type=str, help="Path to the input file")
parser.add_argument(
"--output_file",
type=str,
default="scipp_output.h5",
help="Path to the output file",
)
parser.add_argument(
"--verbose", action="store_true", help="Increase output verbosity"
)
parser.add_argument(
"--chunk_size",
type=int,
default=10_000_000,
help="Chunk size for processing",
)
parser.add_argument(
"--detector_ids",
type=int,
nargs="+",
default=[0, 1, 2],
help="Detector indices to process",
)

args = parser.parse_args()

input_file = pathlib.Path(args.input_file).resolve()
output_file = pathlib.Path(args.output_file).resolve()

logger = logging.getLogger(__name__)
if args.verbose:
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))

wf = McStasWorkflow()
# Set the crystal rotation manually for now ...
wf[CrystalRotation] = sc.vector([0, 0, 0.0], unit='deg')
reduction(
input_file=input_file,
output_file=output_file,
chunk_size=args.chunk_size,
detector_ids=args.detector_ids,
logger=logger,
wf=wf,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aaronfinke ...! Sorry for such a delay...
This part should do the same job as the process.py you showed me before.

It doesn't have performance counter, but we can add performance counting if you want.
For 17GB of data, it took 40 seconds with chunk size of 10_000_000

Here is a simple benchmark results I collected so far:

~1e8 events per detector performance

chunk size time [s]
1_000_000 128
10_000_000 40
50_000_000 34
100_000_000 48

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know what is the slow part? Is it the McStas load part? I am used to a bit better performance (at least when data is on an SSD).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a thorough profiling.
This benchmark includes loading the data on the memory and also writing the output into a file.

email-validator==2.2.0
# via scippneutron
essreduce==25.2.4
essreduce==25.2.5
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newer version of essreduce had a breaking change in the accumulator interface.

I should also add it in the nightly dependency.

@aaronfinke
Copy link
Collaborator

What would also be useful is a way to distinguish between McStas simulated processed data and (what will be) processed real data. I suggest either putting "McStas" in the name entry /entry/sample/name or perhaps as an attribute in /entry/instrument.attrs('source') = "McStas" or both.

Comment on lines +168 to +173
export_metadata_as_nxlauetof(
*detector_metas,
experiment_metadata=experiment_metadata,
output_file=output_file,
# Arbitrary metadata falls into ``entry`` group as a variable.
mcstas_weight2count_scale_factor=scale_factor,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronfinke
About the mcstas indicator, we also need to export mcstas specific parameters like this.
Currently if you use this particular interface, you can freely add arbitrary metadata dataset.
But maybe attributes are more convenient and easy to read... I'll work on it: #121

Base automatically changed from validity-check to lauetof March 7, 2025 08:38
@YooSunYoung
Copy link
Member Author

Continued in #122

@github-project-automation github-project-automation bot moved this from Selected to Done in Development Board Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants