<div class="alert alert-block alert-info">paraprobe-toolbox.</div>

# A comprehensive example with a small "silicon post"

**Markus Kühbach** (Department of Physics, Humboldt-Universität zu Berlin)<br>
Jesse Smith, Marcus Young (both, at the time, with the University of North Texas, Denton)

## Contextualization / Problem statement
***

This is a tutorial on how to use the paraprobe-toolbox for processing a dataset useful for<br>
testing purposes and development.<br>
The dataset is a "silicon post" which is used in atom probe as the base onto which the<br>
actual samples are placed. <a href="https://gitlab.com/jesseds/apav/-/tree/JOSS/apav/tests">The dataset is part of the APAV Python library</a>.<br>

## Get the toolbox ready
***

### Specify the location of the paraprobe-toolbox.

In [1]:
# specify where the toolbox is installed
# MYTOOLBOXPATH="<<YOUR PREFIX>>/paraprobe-toolbox"
MYTOOLBOXPATH="../../../"
from jupyterlab_h5web import H5Web
from IPython.display import Image
import sys, os, h5py, numpy as np
sys.path.append(f"{MYTOOLBOXPATH}/code")
print(f"Current working directory is\n{os.getcwd()}")

Current working directory is
/mnt/c/Users/menon/Documents/repos/projects-iuc09/paraprobe-workflow


### Load the tools of the toolbox.

In [2]:
from paraprobe_utils.numerics import EPSILON, get_file_size, get_std
from paraprobe_utils.primscontinuum import RoiRotatedCuboid, RoiRotatedCylinder, RoiSphere

from paraprobe_parmsetup.transcoder_config import ParmsetupTranscoder, TranscodingTask
from paraprobe_parmsetup.ranger_config import ParmsetupRanger, ApplyExistentRanging
from paraprobe_parmsetup.selector_config import ParmsetupSelector, RoiSelectionTask
from paraprobe_parmsetup.surfacer_config import ParmsetupSurfacer, SurfaceMeshingTask
from paraprobe_parmsetup.distancer_config import ParmsetupDistancer, PointToTriangleSetDistancing
from paraprobe_parmsetup.tessellator_config import ParmsetupTessellator, TessellationTask
from paraprobe_parmsetup.spatstat_config import ParmsetupSpatstat, SpatstatTask
from paraprobe_parmsetup.nanochem_config import ParmsetupNanochem, Delocalization, InterfaceMeshing, OnedProfiles
from paraprobe_parmsetup.intersector_config import ParmsetupIntersector, VolumeFeatureSubSet, VolumeFeatureSet, VolumeVolumeTask
from paraprobe_parmsetup.clusterer_config import ParmsetupClusterer, ClustererTask
# Python transcoder utility tool which imports file formats from the atom probe community
from paraprobe_transcoder import ParaprobeTranscoder
from paraprobe_clusterer.paraprobe_clusterer import ParaprobeClusterer

In [6]:
# Python parmsetup utility tool which creates NeXus/HDF5 configuration files



# C/C++ tools of the toolbox
# you can use the path in the respective paraprobe-<<toolname>>/build/paraprobe_<<toolname>>
# Python reporter utility tool for reproducible Python post-processing and visualization
from reporter.src.python.ranger_report import ReporterRanger
from reporter.src.python.selector_report import ReporterSelector
from reporter.src.python.surfacer_report import ReporterSurfacer
from reporter.src.python.distancer_report import ReporterDistancer
from reporter.src.python.tessellator_report import ReporterTessellator
from reporter.src.python.spatstat_report import ReporterSpatstat
from reporter.src.python.nanochem_report import ReporterNanochem
from reporter.src.python.intersector_report import ReporterIntersector
from reporter.src.python.clusterer_report import ReporterClusterer
# comment in or out the relevant H5Web lines in those cells where you would like to inspect
# configuration file or results file pieces of information
# by default the H5Web lines have been commented out to run the notebook through
# with a single click of a button and not getting stopped with having to
# inspect H5Web interactive widgets
MYOMP = max(1, (os.cpu_count() or 2) // 2)  # assuming that most CPUs are built from hyperthreading-capable core pairs, but always use at least one thread
print(f"Multithreaded processing with {MYOMP} OpenMP threads.")

ModuleNotFoundError: No module named 'utils'

To learn how to handle and work with iontypes in paraprobe please inspect the specific tutorials.

# 1. Pre-processing
***

### Specify the location(s) of your dataset(s).

In [3]:
import os
import numpy as np

In [4]:
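# specify the location where you have your data on the system
MY_MEASURED_DATA_PATH = os.getcwd()
# specify the location from which config and results files of the tools are read in the cells below
# (assumed here to be the current working directory, which is where the tools write their files)
MY_PROCESSED_DATA_PATH = os.getcwd()
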
# specify disjoint identifier with which all config and results files for this analysis will be tagged.
jobids = [1]
for jobid in jobids:
    assert isinstance(jobid, int), "identifier needs to be an unsigned integer !"
    assert jobid > 0, "identifier must be positive, i.e. neither 0 nor negative !"
    assert jobid <= np.iinfo(np.uint32).max, "identifier needs to be on interval [1, 4294967295]"
print(jobids)

RECONSTRUCTION_AND_RANGING = {}
RECONSTRUCTION_AND_RANGING[jobids[0]] = ("Si.apt", "Si.RNG")

[1]
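The identifier checks above can be wrapped into a small reusable function. The following is a minimal sketch; `validate_jobid` is a hypothetical helper name and not part of the paraprobe-toolbox.

```python
import numpy as np

UINT32_MAX = int(np.iinfo(np.uint32).max)  # 4294967295


def validate_jobid(jobid):
    """Return jobid as int if it lies on [1, UINT32_MAX], raise otherwise."""
    # bool is a subclass of int in Python, so it is rejected explicitly
    if isinstance(jobid, bool) or not isinstance(jobid, (int, np.integer)):
        raise TypeError("identifier needs to be an unsigned integer")
    if not 1 <= int(jobid) <= UINT32_MAX:
        raise ValueError(f"identifier needs to be on interval [1, {UINT32_MAX}]")
    return int(jobid)


checked = [validate_jobid(j) for j in [1, 2, 42]]
print(checked)
```

Raising exceptions instead of using `assert` keeps the checks active even when Python runs with optimizations (`-O`), which strips assertions.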


## Import your data (from e.g. IVAS/APSuite, community tool) into the paraprobe-toolbox.

In [5]:
# configure the paraprobe-transcoder tool
TRANSCODER_CONFIG = {}
for jobid in jobids:
    transcoder = ParmsetupTranscoder()
    TRANSCODER_CONFIG[jobid] = transcoder.load_reconstruction_and_ranging(
        recon_fpath=f"{MY_MEASURED_DATA_PATH}/{RECONSTRUCTION_AND_RANGING[jobid][0]}",
        range_fpath=f"{MY_MEASURED_DATA_PATH}/{RECONSTRUCTION_AND_RANGING[jobid][1]}",
        jobid=jobid)
print(TRANSCODER_CONFIG)

Computing SHA256 hash for file named /mnt/c/Users/menon/Documents/repos/projects-iuc09/paraprobe-workflow/Si.apt
Computing SHA256 hash for file named /mnt/c/Users/menon/Documents/repos/projects-iuc09/paraprobe-workflow/Si.RNG
Inspecting whether NeXus/HDF5 is used...
The reconstruction and ranging come from files of technology
partners but the paraprobe-toolbox uses NeXus/HDF5.
Hence, paraprobe-transcoder will transcode to NeXus/HDF5.
Writing configuration file ...
PARAPROBE.Transcoder.Config.SimID.1.nxs was written successfully
{1: 'PARAPROBE.Transcoder.Config.SimID.1.nxs'}
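The log above reports that SHA256 hashes are computed for the input files. How the toolbox implements this internally is not shown in this notebook; the following minimal sketch uses Python's standard `hashlib` module, and `sha256_of_file` is a hypothetical helper name. A temporary file stands in for `Si.apt` so that the sketch runs anywhere.

```python
import hashlib
import os
import tempfile


def sha256_of_file(fpath, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file by reading it in chunks."""
    digest = hashlib.sha256()
    with open(fpath, "rb") as fp:
        # read fixed-size chunks so that large files do not need to fit in memory
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# temporary stand-in file, not the actual dataset
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    tmp_path = tmp.name
checksum = sha256_of_file(tmp_path)
os.remove(tmp_path)
print(checksum)
```

Storing such a checksum next to each file name makes it possible to detect later whether a configuration or results file still refers to the same input.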


In [6]:
# optional inspect configuration
get_file_size(TRANSCODER_CONFIG[jobid])
# H5Web(TRANSCODER_CONFIG[jobid])

0.019 MiB
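`get_file_size` reports the size in MiB. A comparable stand-in can be sketched with the standard library; `file_size_mib` is a hypothetical name and the code is an assumption about the behaviour, not the toolbox implementation.

```python
import os
import tempfile


def file_size_mib(fpath):
    """Return the size of a file in MiB (1 MiB = 1024 * 1024 B)."""
    return os.path.getsize(fpath) / (1024 * 1024)


# temporary stand-in file of exactly 2 MiB
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * 1024 * 1024))
    tmp_path = tmp.name
size_mib = file_size_mib(tmp_path)
os.remove(tmp_path)
print(f"{size_mib:.3f} MiB")
```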


In [7]:
# execute paraprobe-transcoder Python tool
TRANSCODER_RESULTS = {}
for jobid in jobids:
    transcoder = ParaprobeTranscoder(TRANSCODER_CONFIG[jobid])
    TRANSCODER_RESULTS[jobid] = transcoder.execute()
print(TRANSCODER_RESULTS)

/entry1/transcode/reconstruction/path, /entry1/transcode/ranging/path
True
Processing configuration file: PARAPROBE.Transcoder.Config.SimID.1.nxs
Processing reconstruction: /mnt/c/Users/menon/Documents/repos/projects-iuc09/paraprobe-workflow/Si.apt
Processing ranging: /mnt/c/Users/menon/Documents/repos/projects-iuc09/paraprobe-workflow/Si.RNG
Results file: PARAPROBE.Transcoder.Results.SimID.1.nxs
Input reconstruction and ranging definitions use files from
technology partners (POS, ePOS, APT, RRNG, RNG) or other
file formats from the community. These will be transcoded to NeXus...
Computing SHA256 hash for file named PARAPROBE.Transcoder.Config.SimID.1.nxs
Reading /mnt/c/Users/menon/Documents/repos/projects-iuc09/paraprobe-workflow/Si.apt which is 37810040 B
File describes 945211 ions
Currently at byte_offset 540 B
keyword: tofc, found_section: [([83, 69, 67,  0], 148, 2, [116, 111, 102,  99,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 

In [8]:
# inspect paraprobe-transcoder results
get_file_size(TRANSCODER_RESULTS[jobid])
# H5Web(TRANSCODER_RESULTS[jobid])

16.239 MiB


In [None]:
# perform additional customized data analyses here if desired

## Apply existent ranging definitions.

In [10]:
# configure paraprobe-ranger
RANGER_CONFIG = {}
for jobid in jobids:
    ranger = ParmsetupRanger()
    RANGER_CONFIG[jobid] = ranger.apply_existent_ranging_definitions(
        recon_fpath=f"PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs",
        range_fpath="", jobid=jobid)
print(RANGER_CONFIG)

Computing SHA256 hash for file named PARAPROBE.Transcoder.Results.SimID.1.nxs
Writing configuration file ...
PARAPROBE.Ranger.Config.SimID.1.nxs was written successfully
{1: 'PARAPROBE.Ranger.Config.SimID.1.nxs'}


In [11]:
# inspect config file if desired
get_file_size(RANGER_CONFIG[jobid])
# H5Web(RANGER_CONFIG[jobid])

0.02 MiB


In [21]:
# execute paraprobe-ranger C/C++ tool
RANGER_RESULTS = {}
for jobid in jobids:
    CONFIG = RANGER_CONFIG[jobid]
    STDOUT, STDERR = get_std("ranger", jobid)
    ! export OMP_NUM_THREADS=$MYOMP && mpiexec -n 1 paraprobe_ranger $jobid $CONFIG
    RANGER_RESULTS[jobid] = f"PARAPROBE.Ranger.Results.SimID.{jobid}.nxs"
print(RANGER_RESULTS)


libgomp: Invalid value for environment variable OMP_NUM_THREADS: 
paraprobe-ranger
A tool of the paraprobe-toolbox supporting FAIR materials science research
Supporting the community with strong-scaling and open tools for robust and automated uncertainty quantification...

The compiled code of this tool uses the source code with the following GitSha:
v0.4
"v0.5.1"
Preprocessor run
Jun 20 2024
10:51:13
Paraprobe can be cited via the following papers...
Collecting 5 publications for the tool to cite:
Article
M. K"uhbach and P. Bajaj and A. Breen and E. A. J"agle and B. Gault
On Strong Scaling Open Source Tools for Mining Atom Probe Tomography Data
Microscopy and Microanalysis, 2019, Volume 25, Supplement S2, pp298-299
https://doi.org/10.1017/S1431927619002228


Article
M. K"uhbach and P. Bajaj and H. Zhao and M. H. C"{c}elik E. A. J"agle and B. Gault
On strong-scaling and open-source tools for analyzing atom probe tomography data
npj Computational Materials, 2021, Volume 7, ppArticle nu
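The `libgomp` message at the top of the output above is consistent with `OMP_NUM_THREADS` having been set to an empty string, which happens when `MYOMP` is not defined in the notebook session at the time the shell command runs. A minimal sketch of a more defensive way to pass the thread count to a child process is given below. The child command here is a stand-in that only echoes the variable; using `subprocess` instead of the `!` shell escape is an assumption of this sketch, not the approach used in this tutorial.

```python
import os
import subprocess
import sys

# assumption: half of the logical CPUs, but never fewer than one thread
n_threads = max(1, (os.cpu_count() or 1) // 2)

# copy the current environment and set OMP_NUM_THREADS explicitly as a string
env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))

# stand-in command that only prints the variable; for a real run one would pass
# e.g. ["mpiexec", "-n", "1", "paraprobe_ranger", str(jobid), config_file_name]
cmd = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
completed = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
seen_by_child = completed.stdout.strip()
print(f"child process sees OMP_NUM_THREADS={seen_by_child}")
```

With `check=True` a non-zero exit code of the child raises `subprocess.CalledProcessError`, so a failed tool run does not go unnoticed.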

In [18]:
! paraprobe_ranger 1 PARAPROBE.Ranger.Config.SimID.1.nxs

paraprobe-ranger
A tool of the paraprobe-toolbox supporting FAIR materials science research
Supporting the community with strong-scaling and open tools for robust and automated uncertainty quantification...

The compiled code of this tool uses the source code with the following GitSha:
v0.4
"v0.5.1"
Preprocessor run
Jun 20 2024
10:51:13
Paraprobe can be cited via the following papers...
Collecting 5 publications for the tool to cite:
Article
M. K"uhbach and P. Bajaj and A. Breen and E. A. J"agle and B. Gault
On Strong Scaling Open Source Tools for Mining Atom Probe Tomography Data
Microscopy and Microanalysis, 2019, Volume 25, Supplement S2, pp298-299
https://doi.org/10.1017/S1431927619002228


Article
M. K"uhbach and P. Bajaj and H. Zhao and M. H. C"{c}elik E. A. J"agle and B. Gault
On strong-scaling and open-source tools for analyzing atom probe tomography data
npj Computational Materials, 2021, Volume 7, ppArticle number 21
https://doi.org/10.1038/s41524-020-00486-1


Article
M. K"u

In [13]:
# inspect paraprobe-ranger results
get_file_size(RANGER_RESULTS[jobid])
# H5Web(RANGER_RESULTS[jobid])

FileNotFoundError: [Errno 2] No such file or directory: 'PARAPROBE.Ranger.Results.SimID.1.nxs'
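The `FileNotFoundError` above indicates that the ranger run did not write its results file. Before inspecting results it can help to check whether the expected file exists. The following is a minimal sketch; both function names are hypothetical, and only the file name pattern is taken from the outputs in this notebook.

```python
import os
import tempfile


def expected_results_file(tool_name, jobid, directory="."):
    """Build the results file path following the naming pattern used in this notebook."""
    fname = f"PARAPROBE.{tool_name}.Results.SimID.{jobid}.nxs"
    return os.path.join(directory, fname)


def results_file_or_none(tool_name, jobid, directory="."):
    """Return the results file path if the file exists, otherwise None."""
    fpath = expected_results_file(tool_name, jobid, directory)
    if os.path.isfile(fpath):
        return fpath
    print(f"{fpath} does not exist, inspect the stdout/stderr of the {tool_name} run")
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    # nothing has been written yet, so the lookup returns None
    missing = results_file_or_none("Ranger", 1, directory=tmpdir)
    # create an empty placeholder file to show the positive case
    open(expected_results_file("Ranger", 1, tmpdir), "wb").close()
    found = results_file_or_none("Ranger", 1, directory=tmpdir)
    found_name = os.path.basename(found)
```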

In [None]:
# perform additional customized data analyses here if desired, e.g. by writing analysis code directly in the notebook

See how results can be post-processed specifically for each tool using the convenience reporting<br>
and visualization Python functions offered through paraprobe-reporter.

In [None]:
for jobid in jobids:
    ranger_report = ReporterRanger(RANGER_RESULTS[jobid])
    ranger_report.get_summary()

## Create a triangle surface mesh model for the edge of the dataset.

In [None]:
# configure paraprobe-surfacer
SURFACER_CONFIG = {}
for jobid in jobids:
    surfacer = ParmsetupSurfacer()
    SURFACER_CONFIG[jobid] = surfacer.compute_convex_hull_edge_model(
        recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs",
        range_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs",
        jobid=jobid)
print(SURFACER_CONFIG)

In [None]:
# inspect config if necessary
get_file_size(SURFACER_CONFIG[jobid])
# H5Web(SURFACER_CONFIG[jobid])

In [None]:
# execute paraprobe-surfacer C/C++ tool
SURFACER_RESULTS = {}
for jobid in jobids:
    CONFIG = SURFACER_CONFIG[jobid]
    STDOUT, STDERR = get_std("surfacer", jobid)
    ! export OMP_NUM_THREADS=1 && mpiexec -n 1 paraprobe_surfacer $jobid $CONFIG 1>$STDOUT 2>$STDERR
    SURFACER_RESULTS[jobid] = f"PARAPROBE.Surfacer.Results.SimID.{jobid}.nxs"
print(SURFACER_RESULTS)

In [None]:
# inspect paraprobe-surfacer results
get_file_size(SURFACER_RESULTS[jobid])
# H5Web(SURFACER_RESULTS[jobid])

In [None]:
# perform additional customized data analyses here if desired, e.g. by writing analysis code directly in the notebook

## Compute Euclidean distances of all ions to the edge of the dataset.

In [None]:
# configure paraprobe-distancer
DISTANCER_CONFIG = {}
for jobid in jobids:
    distancer = ParmsetupDistancer()
    DISTANCER_CONFIG[jobid] = distancer.compute_ion_to_edge_model_distances(
        recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs",
        range_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs",
        edge_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Surfacer.Results.SimID.{jobid}.nxs",
        jobid=jobid)
print(DISTANCER_CONFIG)

In [None]:
# inspect config if desired
get_file_size(DISTANCER_CONFIG[jobid])
# H5Web(DISTANCER_CONFIG[jobid])

In [None]:
# execute paraprobe-distancer C/C++ tool
DISTANCER_RESULTS = {}
for jobid in jobids:
    CONFIG = DISTANCER_CONFIG[jobid]
    STDOUT, STDERR = get_std("distancer", jobid)
    ! export OMP_NUM_THREADS=$MYOMP && mpiexec -n 1 paraprobe_distancer $jobid $CONFIG 1>$STDOUT 2>$STDERR
    DISTANCER_RESULTS[jobid] = f"PARAPROBE.Distancer.Results.SimID.{jobid}.nxs"
print(DISTANCER_RESULTS)

In [None]:
# inspect config if desired
get_file_size(DISTANCER_RESULTS[jobid])
# H5Web(DISTANCER_RESULTS[jobid])

See how to post-process results from paraprobe-distancer using paraprobe-reporter.

In [None]:
# additional corporate design/preconfigured analyses from paraprobe-reporter
DISTANCER_PLOT = {}
for jobid in jobids:
    distancer_report = ReporterDistancer(DISTANCER_RESULTS[jobid], entry_id=1)
    distancer_report.get_summary(quantiles=[0.01, 0.50, 0.99], threshold=1.)
    DISTANCER_PLOT[jobid] = distancer_report.get_ion2mesh_distance_cdf(quantile_based=True)
    # set quantile_based=False if you would like to generate the complete curve which might take much longer computationally though
print(DISTANCER_PLOT)

In [None]:
Image(filename=DISTANCER_PLOT[jobid], width=500, height=500)
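The kind of summary requested above (quantiles and a cumulative distribution of ion-to-edge distances) can also be sketched directly with NumPy. The data below are synthetic and not from the silicon dataset. How `get_summary` interprets its `threshold` argument is not shown in this notebook; the sketch assumes a simple distance cut.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
# synthetic stand-in for ion-to-edge distances in nm
distances = rng.uniform(low=0.0, high=20.0, size=10_000)

# distance values at the 1%, 50%, and 99% quantile
q01, q50, q99 = np.quantile(distances, [0.01, 0.50, 0.99])

# fraction of ions closer to the edge than an assumed threshold of 1 nm
threshold = 1.0
fraction_near_edge = np.count_nonzero(distances < threshold) / distances.size

# empirical CDF: sorted distances versus cumulative fraction of ions
sorted_distances = np.sort(distances)
cdf = np.arange(1, distances.size + 1) / distances.size

print(q01, q50, q99, fraction_near_edge)
```

Evaluating only a few quantiles is cheap, whereas the complete curve requires sorting all distances, which matches the remark on `quantile_based` above.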

The idea of paraprobe-reporter is that you can use it to register frequently used scripts and allow for sharing of Python code.

## Tessellate the ion point cloud.

In [None]:
# configure paraprobe-tessellator
TESSELLATOR_CONFIG = {}
for jobid in jobids:
    tessellator = ParmsetupTessellator()
    TESSELLATOR_CONFIG[jobid] = tessellator.compute_complete_voronoi_tessellation(
        recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs",
        range_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs",
        jobid=jobid)
print(TESSELLATOR_CONFIG)

In [None]:
# inspect config if desired
get_file_size(TESSELLATOR_CONFIG[jobid])
# H5Web(TESSELLATOR_CONFIG[jobid])

In [None]:
# execute paraprobe-tessellator C/C++ tool
TESSELLATOR_RESULTS = {}
for jobid in jobids:
    CONFIG = TESSELLATOR_CONFIG[jobid]
    STDOUT, STDERR = get_std("tessellator", jobid)
    ! export OMP_NUM_THREADS=$MYOMP && mpiexec -n 1 paraprobe_tessellator $jobid $CONFIG 1>$STDOUT 2>$STDERR
    TESSELLATOR_RESULTS[jobid] = f"PARAPROBE.Tessellator.Results.SimID.{jobid}.nxs"
print(TESSELLATOR_RESULTS)

In [None]:
# inspect paraprobe-tessellator results
get_file_size(TESSELLATOR_RESULTS[jobid])
# H5Web(TESSELLATOR_RESULTS[jobid])

See how to post-process paraprobe-tessellator results with paraprobe-reporter.

In [None]:
# additional corporate design/preconfigured analyses from paraprobe-reporter
TESSELLATOR_PLOT = {}
for jobid in jobids:
    tessellator_report = ReporterTessellator(TESSELLATOR_RESULTS[jobid], entry_id=1)
    tessellator_report.get_summary()
    TESSELLATOR_PLOT[jobid] = tessellator_report.get_cell_volume_cdf(task_id=1, quantile_based=True)
print(TESSELLATOR_PLOT)

In [None]:
Image(filename=TESSELLATOR_PLOT[jobid], width=500, height=500)

In [None]:
# alternatively you can of course also post-process the results yourself and inspect beyond what paraprobe-reporter offers

## Optionally select a specific ROI and find which ions are inside using paraprobe-selector.

Paraprobe-selector is a tool which can be used to identify which ions are located inside the volume<br>
(or on the surface) of a complex set of geometric primitives. The tool yields a mask which can be<br>
used for many applications.

Here is an example of how to compute an axis-aligned bounding box about the dataset and the center of this box.

In [None]:
# get axis-aligned bounding box of the reconstructed volume
with h5py.File("PARAPROBE.Transcoder.Results.SimID.1.nxs", "r") as h5r:
    xyz = h5r["/entry1/atom_probe/reconstruction/reconstructed_positions"][:, :]
    aabb = np.zeros([3, 2], np.float32)
    center = [0., 0., 0.]  # np.zeros([3], np.float32)
    for i in np.arange(0, 3):
        aabb[i, 0] = np.min(xyz[:, i])
        aabb[i, 1] = np.max(xyz[:, i])
        center[i] = 0.5 * (aabb[i, 0] + aabb[i, 1])
    del xyz
    print("aabb bounding box")
    print(aabb)
    print("aabb center")
    print(center)
    print(np.shape(center))
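The loop above can also be written without an explicit loop over the three axes. A minimal sketch on synthetic positions (not the silicon dataset) follows; it also computes the mean ion position to show that this is, in general, not the same point as the center of the bounding box.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
xyz = rng.normal(size=(1000, 3)).astype(np.float32)  # synthetic positions

# column 0 holds the minimum, column 1 the maximum along x, y, z
aabb = np.stack([xyz.min(axis=0), xyz.max(axis=0)], axis=1)  # shape (3, 2)

# center of the bounding box as a plain Python list [cx, cy, cz]
center = aabb.mean(axis=1).tolist()

# mean position of all points (equal weight per point), for comparison
mean_position = xyz.mean(axis=0)

print(aabb)
print(center)
print(mean_position)
```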

In [None]:
# configure paraprobe-selector
SELECTOR_CONFIG = {}
for jobid in jobids:
    selector = ParmsetupSelector()
    # define a task, first instantiate a task object
    task = RoiSelectionTask()
    task.load_reconstruction(recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs")
    task.load_ranging(iontypes_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs")
    task.flt.add_spatial_filter(primitive_list=[RoiRotatedCuboid(center=center, boxdims=[10., 10., 40.])])
    task.flt.add_evaporation_id_filter(lival=(0, 2, np.iinfo(np.uint32).max))  # every second ion only
    # task.flt.add_iontyp_filter()
    # task.flt.add_hit_multiplicity_filter()
    selector.add_task(task)
    SELECTOR_CONFIG[jobid] = selector.configure(jobid)  #, verbose=True)
print(SELECTOR_CONFIG)
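To illustrate what the two filters in the configuration above express, here is a minimal NumPy sketch on synthetic data. It rests on assumptions that are not confirmed by this notebook: the cuboid is treated as unrotated, `boxdims` is taken as full edge lengths, and `lival` is read as (minimum, increment, maximum). The actual selection is performed by paraprobe-selector.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
xyz = rng.uniform(-20.0, 20.0, size=(5000, 3))  # synthetic positions in nm
evaporation_id = np.arange(xyz.shape[0], dtype=np.int64)  # hypothetical ids

center = np.array([0.0, 0.0, 0.0])
boxdims = np.array([10.0, 10.0, 40.0])  # assumed to be full edge lengths

# spatial filter: inside if |x - c| <= half the edge length along each axis
inside_box = np.all(np.abs(xyz - center) <= 0.5 * boxdims, axis=1)

# evaporation id filter: assumed (minimum, increment, maximum) semantics
id_min, id_incr, id_max = 0, 2, int(np.iinfo(np.uint32).max)
every_second = (
    (evaporation_id >= id_min)
    & (evaporation_id <= id_max)
    & ((evaporation_id - id_min) % id_incr == 0)
)

# an ion is selected only if it passes both filters
mask = inside_box & every_second
print(np.count_nonzero(mask), "of", mask.size, "ions selected")
```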

In [None]:
# inspect paraprobe-selector config, if desired
get_file_size(SELECTOR_CONFIG[jobid])
# H5Web(SELECTOR_CONFIG[jobid])

In [None]:
# execute paraprobe-selector C/C++ tool
SELECTOR_RESULTS = {}
for jobid in jobids:
    CONFIG = SELECTOR_CONFIG[jobid]
    STDOUT, STDERR = get_std("selector", jobid)
    ! export OMP_NUM_THREADS=$MYOMP && mpiexec -n 1 paraprobe_selector $jobid $CONFIG 1>$STDOUT 2>$STDERR
    SELECTOR_RESULTS[jobid] = f"PARAPROBE.Selector.Results.SimID.{jobid}.nxs"
print(SELECTOR_RESULTS)

In [None]:
# inspect paraprobe-selector results
get_file_size(SELECTOR_RESULTS[jobid])
# H5Web(SELECTOR_RESULTS[jobid])

An example of how to access and decode the bitmask using paraprobe-reporter.

In [None]:
for jobid in jobids:
    print(f"{jobid}")
    selector_report = ReporterSelector(SELECTOR_RESULTS[jobid])
    selector_report.get_summary()
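The storage layout of the mask in the results file is not shown in this notebook. As a hedged illustration of what decoding a bitmask involves, the sketch below assumes one bit per ion packed into `uint8` values with the least-significant bit first, and performs a round trip with NumPy.

```python
import numpy as np

n_ions = 13
selected = np.zeros(n_ions, dtype=bool)
selected[[0, 2, 4, 12]] = True  # hypothetical selection

# pack eight boolean flags into one byte each (assumed bit order: little)
packed = np.packbits(selected, bitorder="little")

# decode again; count removes the padding bits of the last byte
decoded = np.unpackbits(packed, count=n_ions, bitorder="little").astype(bool)

print(packed)
print(np.flatnonzero(decoded))
```

Packing reduces the memory needed for the mask by a factor of eight compared to one byte per ion, at the cost of needing the number of ions to strip the padding when decoding.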

## Optionally compute spatial statistics.

In [None]:
# configure paraprobe-spatstat
SPATSTAT_CONFIG = {}
for jobid in jobids:
    spatstat = ParmsetupSpatstat()   
    # define a task, first instantiate a task object
    task = SpatstatTask()
    task.load_reconstruction(
        recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs")
    task.load_ranging(
        iontypes_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs")
    # add filter as is exemplified for paraprobe-selector
    task.load_ion_to_edge_distances(
        fpath=f"PARAPROBE.Distancer.Results.SimID.{jobid}.nxs",
        dset_name=f"/entry1/point_to_triangle/distance",
        d_edge=0.160)
    task.load_ion_to_feature_distances(
        fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Distancer.Results.SimID.{jobid}.nxs",
        dset_name=f"/entry1/point_to_triangle/distance",
        d_feature=0.678)
    # the two distances above span from the 1% to the 5% of the signed distance distribution
    # ROIs are placed only at those (source) ions which lie at least d_edge away from the edge
    # ROIs are not placed at those (source) ions which lie farther away than d_feature from the feature
    # here the feature is the edge of the dataset (for demonstration purposes, one could also use
    # something else, e.g. an interface mesh or an isosurface)
    # in effect, spatial statistics are computed with ions within a customized shell that is closer
    # to the edge than d_feature, without placing ROIs at the subset of these ions which are closer
    # to the edge than d_edge
    # in addition, we do not need to analyze the entire dataset but can restrict the analysis
    # to a particular region of interest using a window filter
    # task.flt.add_spatial_filter(primitive_list=[RoiRotatedCuboid(center=center, boxdims=[10., 10., 40.])])
    # task.flt.add_evaporation_id_filter(lival=(0, 1, np.iinfo(np.uint32).max))
    # task.flt.add_iontyp_filter()
    # task.flt.add_hit_multiplicity_filter()
    task.ion_types_source(method="resolve_all")
    task.ion_types_target(method="resolve_all")
    # either or
    task.set_knn(kth=1, binwidth=0.01, rmax=2.)
    # task.set_rdf(binwidth=0.01, rmax=2.)
    spatstat.add_task(task)
    
    SPATSTAT_CONFIG[jobid] = spatstat.configure(jobid)  # , verbose=True)
print(SPATSTAT_CONFIG)
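The combined effect of `d_edge` and `d_feature` in the configuration above is a shell of ions at which ROIs are placed. A minimal NumPy sketch on synthetic distances follows; whether the tool treats the interval bounds as inclusive is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
# synthetic stand-in for ion-to-edge distances in nm
distance = rng.uniform(0.0, 2.0, size=10_000)

d_edge, d_feature = 0.160, 0.678

# ROIs only at ions at least d_edge away from the edge
far_enough_from_edge = distance >= d_edge
# and not farther away than d_feature from the feature (here also the edge)
close_enough_to_feature = distance <= d_feature

roi_mask = far_enough_from_edge & close_enough_to_feature
print(np.count_nonzero(roi_mask), "of", roi_mask.size, "ions would carry a ROI")
```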

In [None]:
# inspect paraprobe-spatstat config, if desired
get_file_size(SPATSTAT_CONFIG[jobid])
# H5Web(SPATSTAT_CONFIG[jobid])

In [None]:
# execute paraprobe-spatstat C/C++ tool
SPATSTAT_RESULTS = {}
for jobid in jobids:
    CONFIG = SPATSTAT_CONFIG[jobid]
    STDOUT, STDERR = get_std("spatstat", jobid)
    ! export OMP_NUM_THREADS=$MYOMP && mpiexec -n 1 paraprobe_spatstat $jobid $CONFIG 1>$STDOUT 2>$STDERR
    SPATSTAT_RESULTS[jobid] = f"PARAPROBE.Spatstat.Results.SimID.{jobid}.nxs"
print(SPATSTAT_RESULTS)

In [None]:
# inspect paraprobe-spatstat results
get_file_size(SPATSTAT_RESULTS[jobid])
# H5Web(SPATSTAT_RESULTS[jobid])

See how to post-process paraprobe-spatstat results with paraprobe-reporter.

<div class="alert alert-block alert-warning">
For computing RDFs according to B. Gault et al. (https://dx.doi.org/10.1007/978-1-4614-3436-8)<br>
do not forget to define a proper value for $\frac{1}{\rho}$ the scaling density of the RDF.</div>

In [None]:
# additional corporate design/preconfigured analyses from paraprobe-reporter
SPATSTAT_PLOT = {}
for jobid in jobids:
    spatstat_report = ReporterSpatstat(SPATSTAT_RESULTS[jobid], entry_id=1)
    spatstat_report.get_knn(task_id=1)
    # spatstat_report.get_rdf(1, normalizer=1./1.)  # don't forget to make a proper estimate for rho!

In [None]:
Image(filename="PARAPROBE.Spatstat.Results.SimID.1.nxs.EntryId.1.TaskId.1.Knn.Pdf.png", width=500, height=500)
# Image(filename="PARAPROBE.Spatstat.Results.SimID.1.nxs.EntryId.1.TaskId.1.Rdf.png", width=500, height=500)
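For orientation, the quantity behind `set_knn(kth=1, binwidth=0.01, rmax=2.)` can be sketched as a histogram of distances to the k-th nearest neighbour. The brute-force version below uses synthetic points and ignores ion types, ROIs, and edge effects; it is an illustration under these assumptions, not the algorithm used by paraprobe-spatstat.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
points = rng.uniform(0.0, 5.0, size=(500, 3))  # synthetic positions in nm

kth, binwidth, rmax = 1, 0.01, 2.0

# brute-force pairwise distances, feasible for a few hundred points only
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt(np.sum(diff * diff, axis=2))
np.fill_diagonal(dist, np.inf)  # exclude each point as its own neighbour

# distance from each point to its kth nearest neighbour
kth_dist = np.partition(dist, kth - 1, axis=1)[:, kth - 1]

# histogram with fixed bin width up to rmax; larger distances are dropped
edges = np.linspace(0.0, rmax, int(round(rmax / binwidth)) + 1)
counts, _ = np.histogram(kth_dist, bins=edges)

print(counts.sum(), "of", points.shape[0], "points have their neighbour within rmax")
```

For datasets with millions of ions a spatial index (for example a k-d tree) would replace the quadratic distance matrix.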

# 2. High-throughput characterization of Cr iso-composition surfaces
***

## Characterize Cr rich objects if existent

We use paraprobe-nanochem, the same tool that can also be used for high-throughput iso-surface based analyses, composition profiling, and iso-surface-based edge modelling.

In [None]:
# configure paraprobe-nanochem
# high-throughput scanning of iso-surfaces and analyses of closed objects represented by patches of iso-surfaces, use this also as the first step for coprecipitation analyses
NANOCHEM_CONFIG = {}
for jobid in jobids:    
    nanochem = ParmsetupNanochem()
    
    task = Delocalization()
    task.load_reconstruction(
        recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs")
    task.load_ranging(
        iontypes_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs")
    task.load_edge_model(
        fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Surfacer.Results.SimID.{jobid}.nxs",
        vertices_dset_name="/entry1/point_set_wrapping/alpha_complex1/triangle_set/triangles/vertices",
        facet_indices_dset_name="/entry1/point_set_wrapping/alpha_complex1/triangle_set/triangles/faces")
    task.load_ion_to_edge_distances(
        fpath=f"PARAPROBE.Distancer.Results.SimID.{jobid}.nxs",
        dset_name=f"/entry1/point_to_triangle/distance")
    # no optional filters, we want to analyze the entire dataset
    # define high-throughput job of e.g. multiple delocalization tasks with each creating multiple iso-surfaces
    task.set_delocalization_input(method="compute")
    task.set_delocalization_normalization(method="composition")  # normalize to atomic fraction (at.-%)
    task.set_delocalization_whitelist(method="resolve_element", nuclide_hash=["Cr"], charge_state=[])  # iso-surface defined by all atoms of (molecular) ions with Cr in it ...
    task.set_delocalization_gridresolutions(length=[1.])  # nm, list of voxel edge length, for each length one analysis
    task.set_delocalization_kernel(sigma=[1.], size=2)  # nm and pixel respectively
    task.set_delocalization_isosurfaces(phi=np.linspace(start=0.01, stop=0.05, num=5, endpoint=True))  # isosurfaces starting at 1 at.-% in steps of 1 at.-% until 5 at.-%
    # task.set_delocalization_isosurfaces(phi=[0.04]) # isosurface only for 4 at.-% 
    task.set_delocalization_edge_handling(method="keep_edge_triangles")
    task.set_delocalization_edge_threshold(1.)
    task.report_fields_and_gradients(True)
    task.report_triangle_soup(True)
    task.report_objects(True)
    task.report_objects_properties(True)
    task.report_objects_geometry(True)
    task.report_objects_optimal_bounding_box(True)
    task.report_objects_ions(True)
    task.report_objects_edge_contact(True)
    # combinatorial closure of objects that are not watertight
    task.report_proxies(False)
    task.report_proxies_properties(False)
    task.report_proxies_geometry(False)
    task.report_proxies_optimal_bounding_box(False)
    task.report_proxies_ions(False)
    task.report_proxies_edge_contact(False)
    
    nanochem.add_task(task)
    NANOCHEM_CONFIG[jobid] = nanochem.configure(jobid)  # , verbose=True)
print(NANOCHEM_CONFIG)
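To double-check which iso-values the `np.linspace` call in the configuration above requests, the values can be printed on their own. This small sketch converts the atomic fractions to at.-% and shows that five iso-values from 1 at.-% to 5 at.-% are generated.

```python
import numpy as np

# same arguments as in the call to set_delocalization_isosurfaces
phi = np.linspace(start=0.01, stop=0.05, num=5, endpoint=True)

# atomic fraction to at.-%, rounded to suppress floating point noise
phi_percent = np.round(100.0 * phi, decimals=6)

print(phi)
print(phi_percent)
```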

In [None]:
# inspect config if desired
get_file_size(NANOCHEM_CONFIG[jobid])
# H5Web(NANOCHEM_CONFIG[jobid])

In [None]:
# execute paraprobe-nanochem C/C++ tool
NANOCHEM_RESULTS = {}
for jobid in jobids:
    CONFIG = NANOCHEM_CONFIG[jobid]
    STDOUT, STDERR = get_std("nanochem", jobid)
    ! export OMP_NUM_THREADS=$MYOMP && mpiexec -n 1 paraprobe_nanochem $jobid $CONFIG 1>$STDOUT 2>$STDERR
    NANOCHEM_RESULTS[jobid] = f"PARAPROBE.Nanochem.Results.SimID.{jobid}.nxs"
print(NANOCHEM_RESULTS)

In [None]:
# inspect paraprobe-nanochem results
get_file_size(NANOCHEM_RESULTS[jobid])
# H5Web(NANOCHEM_RESULTS[jobid])

See how to post-process paraprobe-nanochem results with paraprobe-reporter.

In [None]:
# directly with e.g. Python
with h5py.File(NANOCHEM_RESULTS[jobid], "r") as h5r:
    print(np.sum(h5r["/entry1/delocalization1/grid/scalar_field_magn_total/xdmf_intensity"][:]))

In [None]:
# or additional corporate design/preconfigured analyses from paraprobe-reporter
for jobid in jobids:
    nanochem_report = ReporterNanochem(NANOCHEM_RESULTS[jobid])
    nanochem_report.get_delocalization(deloc_task_id=1)
    print(nanochem_report.delocalization.isosurface_tasks.keys())
    for isrf_task_id in nanochem_report.delocalization.isosurface_tasks.keys():
        print(f"isosurface_task_id {isrf_task_id}")
        print(nanochem_report.delocalization.isosurface_tasks[isrf_task_id].isovalue)
    nanochem_report.get_isosurface_objects_volume_and_number_over_isovalue(deloc_task_id=1)

In [None]:
# choose one of the two plots, only the uncommented assignment is displayed
# NANOCHEM_PLOT = "PARAPROBE.Nanochem.Results.SimID.1.nxs.EntryId.1.DelocTaskId.1.VolOverIsoComposition.png"
NANOCHEM_PLOT = "PARAPROBE.Nanochem.Results.SimID.1.nxs.EntryId.1.DelocTaskId.1.NumberOverIsoComposition.png"
Image(filename=NANOCHEM_PLOT, width=500, height=500)

When probing from 1 at.-% to 5 at.-% Cr iso-composition, the number of objects and their accumulated volume<br>
change as a function of iso-value. Shown here as an example are objects which are located sufficiently inside the<br>
dataset, i.e. $\geq d_{edge}$.<br>

# 3. High-throughput clustering analyses
***

In [None]:
# configure paraprobe-clusterer
CLUSTERER_CONFIG = {}
for jobid in jobids:
    clusterer = ParmsetupClusterer()
    
    task = ClustererTask()
    task.load_reconstruction(recon_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Transcoder.Results.SimID.{jobid}.nxs")
    task.load_ranging(iontypes_fpath=f"{MY_PROCESSED_DATA_PATH}/PARAPROBE.Ranger.Results.SimID.{jobid}.nxs")
    # add filter as is exemplified for paraprobe-selector
    # no edge, no feature loaded    
    task.set_ion_types_filter(
        method='resolve_element', symbol_lst=[['O']])
    
    task.set_dbscan_task(
        high_throughput_method="combinatorics",
        eps=[0.2, 0.8, 5.0],
        min_pts=[1, 5])  # alternatively, e.g. eps=np.linspace(0.2, 5.0, num=49, endpoint=True)
    task.set_hdbscan_task(
        high_throughput_method="combinatorics",
        min_cluster_size=[1, 2, 3, 4, 5],
        min_samples=[1, 5],
        cluster_selection_epsilon=[0.5, 1.0],
        alpha=[1.0])
    
    clusterer.add_task(task)
    CLUSTERER_CONFIG[jobid] = clusterer.configure(simid=jobid)  # verbose=True)
print(CLUSTERER_CONFIG)
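The size of such a parameter sweep can be estimated beforehand. The sketch below assumes that `high_throughput_method="combinatorics"` means that every combination of the listed parameter values is run (a Cartesian product); this reading is an assumption and is not confirmed by this notebook.

```python
from itertools import product

# DBSCAN parameter lists as configured above
eps = [0.2, 0.8, 5.0]
min_pts = [1, 5]
dbscan_runs = [{"eps": e, "min_pts": m} for e, m in product(eps, min_pts)]

# HDBSCAN parameter lists as configured above
min_cluster_size = [1, 2, 3, 4, 5]
min_samples = [1, 5]
cluster_selection_epsilon = [0.5, 1.0]
alpha = [1.0]
hdbscan_runs = list(
    product(min_cluster_size, min_samples, cluster_selection_epsilon, alpha)
)

print(len(dbscan_runs), "DBSCAN runs,", len(hdbscan_runs), "HDBSCAN runs")
```

Because the number of runs is the product of the list lengths, adding values to several lists at once quickly increases the computing time.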

In [None]:
# optional inspect configuration
get_file_size(CLUSTERER_CONFIG[jobid])
# H5Web(CLUSTERER_CONFIG[jobid])

In [None]:
# execute paraprobe-clusterer
CLUSTERER_RESULTS = {}
for jobid in jobids:
    CONFIG = CLUSTERER_CONFIG[jobid]
    clusterer = ParaprobeClusterer(CLUSTERER_CONFIG[jobid])
    CLUSTERER_RESULTS[jobid] = clusterer.execute()
print(CLUSTERER_RESULTS)

In [None]:
# inspect paraprobe-clusterer results
get_file_size(CLUSTERER_RESULTS[jobid])
# H5Web(CLUSTERER_RESULTS[jobid])

# Conclusions
***

A typical workflow with the paraprobe-toolbox was exemplified for a silicon test specimen.<br>
Apart from offering guidance on how the tools can be used, the tutorial shows that:<br>
* Care has to be taken, especially for small datasets, when it comes to characterizing the number and properties of precipitates.<br>
* The tools presented here enable users to perform these analyses with automated provenance tracking to align better with the<br>
  aims of the FAIR data stewardship principles. Specifically formatted HDF5 files use NeXus application definitions to document<br>
  each result and keep track of versioned configuration files and versioned datasets to support numerical reproducibility.<br>
* Each tool automatically keeps track of the software tools called and stores timestamps of when this happened.<br>
* Furthermore, this tutorial shows how one can run parameter sweeping studies for clustering analyses on a dataset.<br>
* This workflow can be applied directly to other datasets by changing foremost the name and path to the datasets and specific settings.<br>

## Questions?
***

If you run into problems or have suggestions on how we can improve these tools, if you feel you can contribute a dataset<br>
to support us with further developing these tools, or if you would like to get support with specific data analyses:<br>
Feel free to contact me directly or members of the FAIRmat team: <a href="https://www.fairmat-nfdi.eu/fairmat/about-fairmat/team-fairmat">M. Kühbach et al.</a>

## References, acknowledgements, funding
***

<a href="https://www.github.com/FAIRmat-NFDI/nexus_definitions">Used NeXus/HDF5 data schemes can be found here.</a><br>
(c) Markus Kühbach, 2024/04<br>

<a href="https://www.fairmat-nfdi.eu/fairmat/">FAIRmat</a> is a consortium on research data management which is part of the German NFDI.<br>
The project is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project 460197019.