# How to convert atom probe (meta)data to NeXus/HDF5

This tutorial guides users through creating a NeXus/HDF5 file by parsing and normalizing pieces of information<br>
from typical file formats of the atom probe community into a common form. The tool ensures that this NeXus file complies<br>
with the NXapm application definition. Documented in this conceptually consistent way, the file can be used for sharing atom probe research<br>
with others (colleagues, project partners, the public) and for uploading a summary of the (meta)data to public repositories<br>
or a research data management system like NOMAD Oasis. This avoids the additional work that is typically involved in<br>
writing documentation of metadata manually in such repositories.<br>

The benefit of the data normalization that pynxtools-apm performs is that all pieces of information are represented in the<br>
same conceptual way. As a result, most of the format conversions that have so far been required when interfacing with software<br>
from technology partners or the scientific community are no longer necessary.<br>

### **Step 1:** Check that packages are installed and working in your local Python environment.

Check the result of the query below, specifically that `jupyterlab_h5web` and `pynxtools` are installed in your environment.<br>
Note that next to the name pynxtools you should see the directory in which it is installed. Otherwise, make sure that you follow<br>
the instructions in the `README` files:  
- How to set up a development environment as in the main README  
- Launch JupyterLab from this environment as in the README of folder `examples`

In [None]:
! pip list | grep "h5py\|nexus\|jupyter\|jupyterlab_h5web\|pynxtools\|pynxtools-apm"
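
The query above relies on `grep`, which is not available in every shell. As a minimal sketch, the same check can be done with `importlib.metadata` from the Python standard library. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the names used in the query above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            versions[name] = None
    return versions


# Package names taken from the pip query in this step.
for name, version in installed_versions(
    ["h5py", "jupyterlab_h5web", "pynxtools", "pynxtools-apm"]
).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any entry reported as `NOT INSTALLED` indicates that the environment should be set up again following the README files.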

Set the pynxtools directory and import H5Web for interactive exploration of HDF5 files.

In [None]:
import os
import zipfile as zp
from jupyterlab_h5web import H5Web
print(f"Current working directory: {os.getcwd()}")
print(f"So-called base, home, or root directory of the pynxtools: {os.getcwd().replace('/examples/apm', '')}")

### **Step 2:** Use your own data or download an example

Example data can be found on Zenodo https://www.zenodo.org/record/7986279.

In [None]:
! curl --output usa_denton_smith_apav_si.zip "https://zenodo.org/records/7986279/files/usa_denton_smith_apav_si.zip?download=1"

In [None]:
# Extract into the current working directory and close the archive afterwards.
with zp.ZipFile("usa_denton_smith_apav_si.zip") as archive:
    archive.extractall()
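
Before or after extracting, it can help to see which files the archive actually contains, for example to confirm the names of the YAML, reconstruction, and ranging files. The following is a minimal sketch using only the standard library. The helper name `list_archive` is hypothetical.

```python
import os
import zipfile


def list_archive(zip_file):
    """Return (member name, uncompressed size in bytes) for every archive member."""
    with zipfile.ZipFile(zip_file) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


archive_name = "usa_denton_smith_apav_si.zip"
if os.path.exists(archive_name):
    for name, size in list_archive(archive_name):
        print(f"{size:>12d} bytes  {name}")
else:
    print(f"{archive_name} not found, download it first")
```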

<div class="alert alert-block alert-danger">
Please note that the metadata inside the provided apm.oasis.specific.yaml and eln_data_apm.yaml files<br>
are exemplar values. These do not necessarily reflect the conditions under which the raw data of the<br>
above-mentioned example were collected by the scientists. Instead, these files are meant to be edited by you,<br>
preferably programmatically, e.g. using output from an electronic lab notebook, or otherwise manually.</div>

This example shows the types of files from which the parser collects and normalizes pieces of information:<br>
* **eln_data_apm.yaml** metadata collected with an electronic lab notebook (ELN) such as a NOMAD Oasis custom schema<br>
* **apm.oasis.specific.yaml** frequently used metadata that are often the same for many datasets, to avoid having to<br>
  type them every time in ELN templates. This file can be considered a configuration file through which e.g. coordinate system<br>
  conventions can be injected or details about the atom probe instrument can be communicated if that instrument is part of frequently used<br>
  lab equipment. The benefit of such an approach is that eventually all metadata relevant to an instrument can be read from<br>
  this configuration file, with the ELN guiding the user e.g. through an option to select the instrument.<br>
* **reconstructed ion positions** in a community or technology partner format with<br>
  the ion positions and mass-to-charge-state-ratio values for the tomographic reconstruction.<br>
* **ranging definitions** in a community or technology partner format with<br>
  the definitions of how mass-to-charge-state-ratio values map onto ion species.<br>

The tool supports the most commonly used information exchange formats of the atom probe community.<br>
Consult the reference part of the documentation to get a detailed view on how specific formats are supported.<br>
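
When preparing your own data, it can be useful to sort the files of a dataset into the four roles described above. The sketch below is only an illustration and not how the parser itself identifies files. The helper `group_by_role` and the suffix table are assumptions based solely on the file names used in this tutorial.

```python
from pathlib import Path

# Assumed mapping, derived only from the example file names in this tutorial.
ROLE_BY_SUFFIX = {
    ".yaml": "metadata",
    ".apt": "reconstruction",
    ".epos": "reconstruction",
    ".pos": "reconstruction",
    ".rrng": "ranging",
    ".rng": "ranging",
}


def group_by_role(file_names):
    """Group file names by their assumed role, comparing suffixes case-insensitively."""
    groups = {}
    for name in file_names:
        role = ROLE_BY_SUFFIX.get(Path(name).suffix.lower(), "unknown")
        groups.setdefault(role, []).append(name)
    return groups


example = ["eln_data.yaml", "apm.oasis.specific.yaml", "Si.apt", "Si.RRNG"]
for role, names in group_by_role(example).items():
    print(f"{role}: {names}")
```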

<div class="alert alert-block alert-info">
Please note that the proprietary file formats RRAW, STR, ROOT, RHIT, and HITS from AMETEK/Cameca are currently not processable<br>
with pynxtools-apm, although we have investigated the situation and were able to confirm that a substantial number of metadata have been<br>
documented by Cameca and are technically extractable and interpretable using Python. This would enable automated mapping and<br>
normalizing of these metadata into NeXus via a simpler route than the current one, where an additional ELN or supplementary file like YAML has to be<br>
used and users eventually have to enter the same information more than once. AMETEK/Cameca is currently working on<br>
the implementation of features in AP Suite to make some of these metadata available through the open-source APT file format.<br>
When this is available, we will work on an update of pynxtools-apm to support this functionality.</div>

### **Step 3:** Run the parser

In [None]:
eln_data_file_name = ["eln_data.yaml"]
deployment_specific = ["apm.oasis.specific.yaml"]
input_recon_file_name = ["Si.apt",
                         "Si.epos",
                         "Si.pos"]
input_range_file_name = ["Si.RRNG",
                         "Si.RNG",
                         "Si.RNG"]
output_file_name = ["apm.case1.nxs",
                    "apm.case2.nxs",
                    "apm.case3.nxs"]
for case_id in range(0, 3):
    ELN = eln_data_file_name[0]
    CFG = deployment_specific[0]
    RECON = input_recon_file_name[case_id]
    RANGE = input_range_file_name[case_id]
    OUTPUT = output_file_name[case_id]

    ! dataconverter $ELN $CFG $RECON $RANGE --reader apm --nxdl NXapm --output $OUTPUT
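
The cell above relies on the `!` shell escape and `$VARIABLE` expansion, which only work inside IPython or Jupyter. As a minimal sketch for plain Python scripts, the same command line can be assembled as an argument list and passed to `subprocess.run`. The helper name `build_command` is hypothetical. The arguments and flags are copied from the cell above.

```python
import shlex


def build_command(eln, cfg, recon, ranging, output):
    """Assemble the dataconverter call used in this tutorial as an argument list."""
    return [
        "dataconverter", eln, cfg, recon, ranging,
        "--reader", "apm",
        "--nxdl", "NXapm",
        "--output", output,
    ]


# The three cases from the cell above: (reconstruction, ranging, output).
cases = [
    ("Si.apt", "Si.RRNG", "apm.case1.nxs"),
    ("Si.epos", "Si.RNG", "apm.case2.nxs"),
    ("Si.pos", "Si.RNG", "apm.case3.nxs"),
]
for recon, ranging, output in cases:
    cmd = build_command("eln_data.yaml", "apm.oasis.specific.yaml", recon, ranging, output)
    print(shlex.join(cmd))
    # To actually run the conversion, import subprocess and uncomment:
    # subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids quoting problems when file names contain spaces.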

### **Step 4:** Inspect the NeXus/HDF5 file using H5Web.

In [None]:
H5Web(OUTPUT)

The NeXus file can also be viewed with H5Web by opening it via the file explorer panel on the left side of this JupyterLab window.

# Conclusions:
***

This tutorial showed how you can call pynxtools-apm from a Jupyter notebook.<br>
This opens many possibilities, such as processing the results further with Python, e.g. in an environment managed with<br>
<a href="https://conda.io/projects/conda/en/latest/user-guide/install/index.html">conda</a> on your local computer or in <a href="https://docs.python.org/3/tutorial/venv.html">a virtual environment</a>, interfacing with AMETEK/Cameca's AP Suite via its<br>
<a href="https://github.com/CamecaAPT/cameca-customanalysis-interface/wiki">extension interface</a>, or processing the data with scientific software from the atom probe community<br>
such as <a href="https://github.com/FAIRmat-NFDI/AreaB-software-tools">open-source tools</a> (paraprobe-toolbox and others) or IVAS / AP Suite.<br>

### Contact person for pynxtools-apm and related examples in FAIRmat:
Dr.-Ing. Markus Kühbach, 2024/09/12<br>

### Funding
<a href="https://www.fairmat-nfdi.eu/fairmat">FAIRmat</a> is a consortium on research data management which is part of the German NFDI.<br>
The project is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project 460197019.