# SPMpy Quickstart v0.1

SPMpy is an open-source collection of Python tools for analyzing multi-dimensional scanning probe microscopy (SPM) data,
including STM/S and AFM. It uses **`xarray`** as the primary data container to preserve both data and metadata.

**Author:** Dr. Jewook Park (CNMS, ORNL)  
**Contact:** parkj1@ornl.gov

### Stages in this notebook
- **Stage 0:** Environment check + repository bootstrap
- **Stage 1:** Data loading (Nanonis `.sxm`, `.3ds`) into `xarray.Dataset`
- **Stage 2 (planned):** Visualization and analysis utilities

### License note
This repository is provided for internal and collaborative review. Licensing terms will be finalized according to ORNL/DOE policies.


## Notebook Navigation

- **Stage 0** — Environment check & bootstrap
  - Step 1: Set `REPO_ROOT` and import `spmpy`
  - Step 2: Run structured environment diagnostics
  - Step 3: Decision & next action
- **Stage 1** — Data loading (STM/SPM files)
  - Stage 1.1: 2D image data (`.sxm`) → `xarray.Dataset`
  - Stage 1.2: GridSpectroscopy (`.3ds`) → `xarray.Dataset`
- **Stage 2 (planned)** — Visualization & analysis

**Tip:** Run cells from top to bottom. Markdown cells describe what to do and what to expect.


## Stage 0 — Step 1: Bootstrap the local repository

Set `REPO_ROOT` to your local SPMpy clone folder, add it to `sys.path`, then import `spmpy`.


In [1]:
import sys
from pathlib import Path

# IMPORTANT: set this to your local SPMpy repository root
REPO_ROOT = Path(r"C:\Users\gkp\Documents\GitHub\SPMpy")

if not REPO_ROOT.exists():
    raise RuntimeError(
        f"[SPMpy] Repo root does not exist: {REPO_ROOT}\n"
        "[Action] Edit REPO_ROOT to match your local clone location."
    )

if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

import spmpy
print('[SPMpy] Imported from:', spmpy.__file__)


[SPMpy] Imported from: C:\Users\gkp\Documents\GitHub\SPMpy\spmpy\__init__.py
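
A hard-coded `REPO_ROOT` ties the notebook to one machine. As a minimal sketch of a more portable variant, the helper below reads the location from an environment variable and falls back to a hard-coded path. The variable name `SPMPY_REPO_ROOT` and both function names are assumptions made for this example; SPMpy does not define them.

```python
import os
import sys
from pathlib import Path


def resolve_repo_root(fallback, env_var="SPMPY_REPO_ROOT"):
    """Return the repo root from an environment variable, else the fallback.

    `SPMPY_REPO_ROOT` is a hypothetical name chosen for this sketch.
    """
    candidate = Path(os.environ.get(env_var) or fallback).expanduser()
    if not candidate.is_dir():
        raise RuntimeError(f"[SPMpy] Repo root does not exist: {candidate}")
    return candidate


def add_to_sys_path(root):
    """Prepend `root` to sys.path once; return True if it was added."""
    root_str = str(root)
    if root_str in sys.path:
        return False
    sys.path.insert(0, root_str)
    return True
```

With these helpers, the bootstrap cell would reduce to `REPO_ROOT = resolve_repo_root(r"C:\Users\gkp\Documents\GitHub\SPMpy")` followed by `add_to_sys_path(REPO_ROOT)` and `import spmpy`.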


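## Stage 0 — Step 2: Environment diagnostic (read-only)

Importing the diagnostic module checks whether the required packages are importable and prints a log of the result. The cell after it builds the `status` object used by the decision step.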

In [2]:
# Safe import of env_check module (explicit module path)
try:
    import spmpy.utils.env_check_v2025Dec_30_revised as env_check
except ImportError as e:
    raise RuntimeError(
        '[SPMpy] Failed to import env_check module.\n'
        'Reason: module file not found or misnamed.\n'
        'Action: verify file name and restart kernel.'
    ) from e


[Info] Package manager priority: mamba -> conda
[Info] Detected JupyterLab major version: 4
[Step 1] Widgets ...
[OK] ipywidgets already importable.
[OK] jupyterlab_widgets already importable.
[Info] JupyterLab >=4 detected: no lab build required.
[Step 2] Plotly ...
[OK] plotly already importable.
[Step 3] HoloViz stack ...
[OK] panel already importable.
[OK] holoviews already importable.
[OK] jupyter_bokeh already importable.
[Step 4] Stage-1 minimum scientific stack ...
[OK] numpy already importable.
[OK] xarray already importable.
[OK] matplotlib already importable.
[OK] scipy already importable.
[OK] pandas already importable.
[OK] skimage already importable.
[OK] xrft already importable.
[OK] hvplot already importable.
[OK] gwyfile already importable.
[OK] netCDF4 already importable.
[OK] h5netcdf already importable.
[OK] pptx already importable.
[OK] PyQt5 already importable.
[Step 5] Best-effort visualization backend config ...


[Config] Bokeh and HoloViews activated.

[Done] Base environment setup cell finished.
[OK] torch: 2.9.1+cpu
[OK] scikit-learn: 1.7.2

[SPMpy] ENV OK ✅  (Required deps satisfied)

[Action] You may proceed to Quickstart.
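
The `[OK] ... already importable.` lines above can be reproduced in spirit without importing anything. The following is a sketch of such a read-only check built on the standard library's `importlib.util.find_spec`; it is not the code in `env_check`, and the function names and the `[Missing]` tag are assumptions made for this example.

```python
import importlib.util


def check_importable(module_names):
    """Map each top-level module name to True if it can be found, without importing it."""
    results = {}
    for name in module_names:
        try:
            results[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for malformed or partially broken packages.
            results[name] = False
    return results


def report(results):
    """Format results in the style of the log above and list the missing modules."""
    lines = [
        f"[OK] {name} already importable." if ok else f"[Missing] {name}"
        for name, ok in results.items()
    ]
    missing = [name for name, ok in results.items() if not ok]
    return lines, missing
```

Calling `report(check_importable(["numpy", "xarray", "xrft"]))` would return the log lines together with a list of missing names, comparable in role to the `MISSING_REQUIRED` attribute read in the next cell.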



In [3]:
from dataclasses import dataclass

@dataclass
class EnvStatus:
    ok: bool = False
    needs_restart: bool = False
    inconclusive: bool = False
    missing: list | None = None

def interpret_env_check(env):
    status = EnvStatus()

    if hasattr(env, 'ENV_OK'):
        status.ok = bool(env.ENV_OK)
        status.needs_restart = bool(getattr(env, 'INSTALLED_NOW', False))
        status.missing = getattr(env, 'MISSING_REQUIRED', None)
        return status

    status.inconclusive = True
    return status

status = interpret_env_check(env_check)
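
The mapping from module attributes to `EnvStatus` can be exercised without the real diagnostic module. The sketch below mirrors the two definitions from the cell above so that it runs on its own (with `Optional[list]` in place of `list | None` for older Python versions) and uses `types.SimpleNamespace` objects as stand-ins for `env_check`. The stand-in values are made up for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class EnvStatus:
    ok: bool = False
    needs_restart: bool = False
    inconclusive: bool = False
    missing: Optional[list] = None


def interpret_env_check(env):
    status = EnvStatus()
    if hasattr(env, "ENV_OK"):
        status.ok = bool(env.ENV_OK)
        status.needs_restart = bool(getattr(env, "INSTALLED_NOW", False))
        status.missing = getattr(env, "MISSING_REQUIRED", None)
        return status
    # No ENV_OK attribute: the module did not report a verdict.
    status.inconclusive = True
    return status


# Stand-ins for the env_check module (values are illustrative only).
ready = SimpleNamespace(ENV_OK=True, INSTALLED_NOW=False, MISSING_REQUIRED=[])
updated = SimpleNamespace(ENV_OK=True, INSTALLED_NOW=True, MISSING_REQUIRED=[])
broken = SimpleNamespace(ENV_OK=False, MISSING_REQUIRED=["xrft"])
unknown = SimpleNamespace()  # no ENV_OK attribute at all

for label, env in [("ready", ready), ("updated", updated),
                   ("broken", broken), ("unknown", unknown)]:
    print(label, interpret_env_check(env))
```

The four stand-ins correspond one-to-one to the four branches of the decision cell in Step 3.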


## Stage 0 — Step 3: Decision & next action

The diagnostics above report whether required packages are installed and whether a kernel restart is needed; the result is stored in `status`. Follow the instruction printed by the next cell.


In [4]:
if status.ok and not status.needs_restart:
    print('[SPMpy] ✅ Environment ready.')
    print('[Next] Continue to Stage-1 below (Data Loading).')

elif status.ok and status.needs_restart:
    print('[SPMpy] ✅ Environment updated.')
    print('[Action] Restart the kernel, then re-run Stage-0 in this notebook.')

elif status.inconclusive:
    print('[SPMpy] ⚠ Environment status inconclusive.')
    print('[Action] Run the diagnostic notebook:')
    print('        notebooks/env_check_v_2025Dec_30_revised.ipynb')
    print('[Then] Return here, restart kernel if needed, and re-run Stage-0.')

else:
    print('[SPMpy] ❌ Environment not ready.')
    if status.missing:
        print('Missing packages:')
        for m in status.missing:
            print('  -', m)
    print('[Action] Fix the environment, restart kernel, then re-run Stage-0.')


[SPMpy] ✅ Environment ready.
[Next] Continue to Stage-1 below (Data Loading).


# Stage 1 — File Loading (SXM, 3DS)

Stage 1 loads **Nanonis files** and standardizes them into **`xarray.Dataset`** objects.

- **Stage 1.1:** 2D image data (`.sxm`) → `xarray.Dataset`
- **Stage 1.2:** grid data (`.3ds`) → `xarray.Dataset`

**Important:** Stage 1 performs *loading only* (no plane fit, no flattening, no filtering).
Processing functions will be organized separately under a data-processing module.


## Stage 1.1 — 2D Image Data Loading (`.sxm`)

### What you will do in this section
1. Select a working folder (GUI folder picker).
2. List files in the folder as a DataFrame (for reproducible selection).
3. Choose an `.sxm` file name from the table.
4. Load the file into an **`xarray.Dataset`** using `img2xr`.
5. Add experiment metadata to `ds_sxm.attrs`.

### Why this workflow
This is intentionally designed to support future workflows where you load **multiple files** and build a dataset collection in a consistent way.


### Imports for Stage 1

In this Quickstart, the I/O logic is **not** defined inline.
Instead, we import the legacy-compatible I/O functions from the package:

- `select_folder()` — GUI folder picker
- `files_in_folder()` — folder inventory → DataFrame (**no `os.chdir()`**)
- `img2xr()` — `.sxm` → `xarray.Dataset`
- `grid2xr()` — `.3ds` → `xarray.Dataset` (used in Stage 1.2)

This keeps the Quickstart focused on workflow, while the implementation lives in `spmpy/io/`.


In [5]:
# I/O function set (paired .py lives in: spmpy/io/spmpy_io_library_v0_1.py)
from spmpy.io import spmpy_io_library_v0_1 as io

select_folder = io.select_folder
files_in_folder = io.files_in_folder
img2xr = io.img2xr
grid2xr = io.grid2xr


### Step 1 — Select a working folder

Run the next cell to pick a folder that contains your `.sxm` / `.3ds` files.


In [7]:
selected_folder = select_folder()
if selected_folder:
    print(f"Selected folder: {selected_folder}")
else:
    print("No folder selected.")


Selected folder: C:/Users/gkp/OneDrive - Oak Ridge National Laboratory/0_mK STM DATA/2025/20251016 FeGeTe512_PtIr32_LHet_Bfield_Jewook


### Step 2 — Inventory the folder as a DataFrame

This creates a DataFrame inventory so you can reproducibly select files by name.

**Note:** Because we do not use `os.chdir()`, the DataFrame includes a full `file_path` column.
Use `file_path` when loading files, and define an explicit `output_dir` when saving results later.


In [8]:
folder_path = selected_folder
print(f"Selected folder: {folder_path}")

files_df = files_in_folder(folder_path)
files_df

Selected folder: C:/Users/gkp/OneDrive - Oak Ridge National Laboratory/0_mK STM DATA/2025/20251016 FeGeTe512_PtIr32_LHet_Bfield_Jewook
Current Path = C:\Users\gkp\OneDrive - Oak Ridge National Laboratory\Research\Data Analysis (python)\SPMpy_ORNL
Target Folder = C:\Users\gkp\OneDrive - Oak Ridge National Laboratory\0_mK STM DATA\2025\20251016 FeGeTe512_PtIr32_LHet_Bfield_Jewook
sxm file groups: Fe5GeTe1_PtIr32_LHeT_5_400mT_x1_a20251016_10 : # of files = 3
sxm file groups: Fe5GeTe1_PtIr32_LHeT_7_POS100mT_x1_20251020_40 : # of files = 9
sxm file groups: Fe5GeTe1_PtIr32_LHeT_4_x1_a20251016_20 : # of files = 1
sxm file groups: Fe5GeTe1_PtIr32_LHeT_6_200mT_x1_20251019_30 : # of files = 1
sxm file groups: Fe5GeTe1_PtIr32_LHeT_6_200mT_x1_20251017_20 : # of files = 7
sxm file groups: Fe5GeTe1_PtIr32_LHeT_6_200mT_x1_20251018_20 : # of files = 8
sxm file groups: Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20 : # of files = 7
sxm file groups: Grid Spectroscopy_400mT_001_t : # of files = 1
sxm file g

| | group | num | file_name | type | folder_path | file_path |
|---|---|---|---|---|---|---|
| 0 | Cu(111)_PtIr32_LHeT_2_x1_a20251016_20 | 001 | Cu(111)_PtIr32_LHeT_2_x1_a20251016_20001.sxm | sxm | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 1 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40 | 001 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40001.sxm | sxm | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 2 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40 | 002 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40002.sxm | sxm | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 3 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40 | 003 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40003.sxm | sxm | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 4 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40 | 004 | Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40004.sxm | sxm | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| ... | ... | ... | ... | ... | ... | ... |
| 113 | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20003 | | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20003... | gwy | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 114 | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20004 | | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20004... | gwy | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 115 | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20005 | | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20005... | gwy | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
| 116 | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20006 | | Fe5GeTe1_PtIr32_LHeT_6_400mT_x1_20251017_20006... | gwy | C:\Users\gkp\OneDrive - Oak Ridge National Lab... | C:\Users\gkp\OneDrive - Oak Ridge National Lab... |
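
To illustrate why a full `file_path` column removes the need for `os.chdir()`, here is a minimal standard-library sketch of a folder inventory. It is not the implementation of `files_in_folder()`: the function name `inventory_folder`, the default extension list, and the omission of the `group` and `num` columns are all choices made for this example.

```python
from pathlib import Path


def inventory_folder(folder, extensions=(".sxm", ".3ds", ".gwy")):
    """List matching files in `folder` as records with absolute paths.

    The working directory is never changed; each record carries its own
    `file_path`, so later loads do not depend on the current directory.
    """
    folder = Path(folder).resolve()
    records = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            records.append(
                {
                    "file_name": path.name,
                    "type": path.suffix.lower().lstrip("."),
                    "folder_path": str(folder),
                    "file_path": str(path),
                }
            )
    return records
```

Passing the returned list of dictionaries to `pandas.DataFrame(records)` would give a table with the same four column names as the corresponding columns shown above.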


### Step 3 — Select an `.sxm` file from the inventory

Pick a file name from the DataFrame. You can keep a list for future multi-file loading.


In [9]:
# List all SXM files
file_list = files_df[files_df.type=='sxm'].file_name
file_list

0          Cu(111)_PtIr32_LHeT_2_x1_a20251016_20001.sxm
1       Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40001.sxm
2       Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40002.sxm
3       Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40003.sxm
4       Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40004.sxm
                            ...                        
85    Fe5GeTe1_PtIr32_LHeT_7_POS100mT_x1_20251020_40...
86    Fe5GeTe1_PtIr32_LHeT_7_POS100mT_x1_20251020_40...
87    Fe5GeTe1_PtIr32_LHeT_7_POS100mT_x1_20251020_40...
88    Fe5GeTe1_PtIr32_LHeT_7_POS100mT_x1_20251020_40...
89                 Grid Spectroscopy_400mT_001_topo.sxm
Name: file_name, Length: 90, dtype: object

In [10]:
# Choose one file (edit as needed)
sxm_name = file_list.iloc[0] if len(file_list) else None
sxm_name

'Cu(111)_PtIr32_LHeT_2_x1_a20251016_20001.sxm'
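
For multi-file work, a name pattern is often more convenient than a positional index. The sketch below filters a plain list of file names with shell-style wildcards from the standard library. The helper `select_files` is hypothetical, and the names in the list are copied from the inventory output above.

```python
from fnmatch import fnmatchcase


def select_files(file_names, pattern):
    """Return the names matching a shell-style pattern, preserving order."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# File names taken from the inventory shown above.
names = [
    "Cu(111)_PtIr32_LHeT_2_x1_a20251016_20001.sxm",
    "Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40001.sxm",
    "Fe5GeTe1_PtIr32_LHeT_0_0T_x1_20251020_40002.sxm",
    "Grid Spectroscopy_400mT_001_topo.sxm",
]

# Keep only the zero-field Fe5GeTe1 images.
zero_field = select_files(names, "Fe5GeTe1_*_0T_*.sxm")
print(zero_field)
```

The same helper accepts `file_list` directly, since iterating over a pandas Series yields its values. When filtering inside pandas instead, note that `Series.str.contains` treats its argument as a regular expression by default, so names containing parentheses such as `Cu(111)` need `regex=False` or escaping.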

### Step 4 — Load the SXM file into an `xarray.Dataset`

No plotting is performed here. The returned `xarray.Dataset` is sufficient for validation.


In [11]:
from pathlib import Path

if sxm_name is None:
    raise RuntimeError('No .sxm files found in the selected folder.')

# Prefer explicit file_path if provided by files_in_folder()
if 'file_path' in files_df.columns:
    sxm_path = Path(files_df.loc[files_df.file_name == sxm_name, 'file_path'].iloc[0])
else:
    sxm_path = Path(folder_path) / sxm_name

print('[SPMpy] Loading:', sxm_path)

ds_sxm = img2xr(str(sxm_path), center_offset=False)
ds_sxm

[SPMpy] Loading: C:\Users\gkp\OneDrive - Oak Ridge National Laboratory\0_mK STM DATA\2025\20251016 FeGeTe512_PtIr32_LHet_Bfield_Jewook\Cu(111)_PtIr32_LHeT_2_x1_a20251016_20001.sxm
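
The single-file load above extends naturally to a batch. As a sketch, the hypothetical helper below applies any loader callable to a list of paths and keeps failures separate, so one unreadable file does not stop the run. It makes no assumption about `img2xr` beyond the call shown in the cell above.

```python
from pathlib import Path


def load_many(paths, loader):
    """Apply `loader` to each path; collect results and failures separately.

    `loader` is any callable that takes a path string, for example `img2xr`.
    Results and errors are keyed by file name.
    """
    loaded, failed = {}, {}
    for path in map(Path, paths):
        try:
            loaded[path.name] = loader(str(path))
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failed[path.name] = exc
    return loaded, failed
```

A possible call is `load_many(files_df.loc[files_df.type == 'sxm', 'file_path'], functools.partial(img2xr, center_offset=False))`, which reuses the keyword argument from the cell above. Keying by file name assumes the names are unique within the batch.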


### Step 5 — Add experiment metadata (attrs)

SPMpy keeps experiment context in `Dataset.attrs`. Edit the values below to match your experiment.
These fields are user-defined and will be used later in analysis/plotting pipelines.


In [12]:
# Edit these values for your dataset
ds_sxm.attrs['tip'] = 'PtIr'
ds_sxm.attrs['sample'] = 'Cu(111)'
ds_sxm.attrs['ref_a0_nm'] = 0.255
ds_sxm.attrs['temperature'] = '4.35K'

# Example alternative (commented):
# ds_sxm.attrs['tip'] = 'Ni'
# ds_sxm.attrs['sample'] = 'FeTeSe'
# ds_sxm.attrs['ref_a0_nm'] = 0.384
# ds_sxm.attrs['temperature'] = '40mK'

ds_sxm
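
Because these attributes are free-form, later analysis code may want the temperature as a number rather than a string. The following sketch shows one way to build the metadata dictionary in a single place and derive a kelvin value from strings such as `'4.35K'` or `'40mK'`. The helper names and the extra `temperature_K` key are assumptions for this example and are not part of SPMpy.

```python
import re

# Multipliers to kelvin for the unit suffixes used in this notebook.
_UNIT_TO_K = {"K": 1.0, "mK": 1e-3}


def parse_temperature_K(text):
    """Convert strings such as '4.35K' or '40mK' to a float in kelvin."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(mK|K)\s*", text)
    if match is None:
        raise ValueError(f"Unrecognized temperature string: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_K[unit]


def make_attrs(tip, sample, ref_a0_nm, temperature):
    """Build a metadata dict with the four keys used above, plus a kelvin value."""
    return {
        "tip": tip,
        "sample": sample,
        "ref_a0_nm": float(ref_a0_nm),
        "temperature": temperature,
        "temperature_K": parse_temperature_K(temperature),  # hypothetical extra key
    }
```

Since `Dataset.attrs` behaves like a dictionary, the result can be applied with `ds_sxm.attrs.update(make_attrs('PtIr', 'Cu(111)', 0.255, '4.35K'))`.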

## End of Stage 1.1

At this point you have a **2D SXM image** loaded as an **`xarray.Dataset`**.

Next:
- **Stage 1.2:** `.3ds` grid loading (`grid2xr`) into `xarray.Dataset`
- **Stage 2 (planned):** visualization and data-processing steps (plane fit / flattening) from a dedicated module


## Stage 1.2 — GridSpectroscopy (.3ds) loading

This section loads a Nanonis GridSpectroscopy file (`.3ds`) and converts it into an `xarray.Dataset`.

### What you will do
1. Select a `.3ds` file name from the folder inventory (`files_df`).
2. Load it with `grid2xr()` using an explicit `file_path`.
3. Add experiment metadata to `ds_grid.attrs`.

**Note:** This stage performs loading only. Processing (plane fit / flattening / filtering) belongs to a
dedicated data-processing module (Stage 2).


In [13]:
# Select one or more .3ds files from the inventory
file_list_3ds = files_df[files_df.type == '3ds'].file_name
file_list_3ds

90           Grid Spectroscopy_0T_001.3ds
91           Grid Spectroscopy_0T_002.3ds
92           Grid Spectroscopy_0T_003.3ds
93           Grid Spectroscopy_0T_004.3ds
94        Grid Spectroscopy_200mT_001.3ds
95        Grid Spectroscopy_200mT_002.3ds
96        Grid Spectroscopy_200mT_003.3ds
97        Grid Spectroscopy_400mT_001.3ds
98        Grid Spectroscopy_400mT_002.3ds
99     Grid Spectroscopy_Neg200mT_001.3ds
100    Grid Spectroscopy_Neg200mT_002.3ds
101    Grid Spectroscopy_POS100mT_001.3ds
102    Grid Spectroscopy_POS100mT_002.3ds
103    Grid Spectroscopy_POS100mT_003.3ds
104    Grid Spectroscopy_POS100mT_004.3ds
105    Grid Spectroscopy_POS100mT_005.3ds
Name: file_name, dtype: object

In [14]:
# Choose a single file for loading (edit as needed)
if len(file_list_3ds) == 0:
    raise RuntimeError('No .3ds files found in the selected folder.')

grid_name = file_list_3ds.iloc[0]
print('Selected .3ds file:', grid_name)

Selected .3ds file: Grid Spectroscopy_0T_001.3ds


In [15]:
from pathlib import Path

# Prefer explicit file_path if provided by files_in_folder()
if 'file_path' in files_df.columns:
    grid_path = Path(files_df.loc[files_df.file_name == grid_name, 'file_path'].iloc[0])
else:
    grid_path = Path(folder_path) / grid_name

print('[SPMpy] Loading:', grid_path)

ds_grid = grid2xr(str(grid_path))
ds_grid

[SPMpy] Loading: C:\Users\gkp\OneDrive - Oak Ridge National Laboratory\0_mK STM DATA\2025\20251016 FeGeTe512_PtIr32_LHet_Bfield_Jewook\Grid Spectroscopy_0T_001.3ds
C:\Users\gkp\OneDrive - Oak Ridge National Laboratory\0_mK STM DATA\2025\20251016 FeGeTe512_PtIr32_LHet_Bfield_Jewook\Grid Spectroscopy_0T_001
there is no [bwd] channel
No Segments
Grid data acquired at bias = -0.2V
start from NEG bias
Flip => start from POS bias
(160, 320, 41)
dim_px != dim_py
step_dx == step_dy
grid_xr step_dx, step_dy =  ["'Y': 160, 'X': 320, 'bias_mV': 41"]
tip material will be announced later
sample type will be announced later
temperature will be announced later
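
The log lines `start from NEG bias` and `Flip => start from POS bias` suggest that the loader reorders the bias axis so that it begins at positive bias. The sketch below shows one plausible reading of that step on plain Python lists. It is not the code in `grid2xr`. The 41 points match the last dimension in the log, but the sweep range of -200 mV to +200 mV and the V-shaped signal are made-up values, and the function name is hypothetical.

```python
def start_from_positive(bias_mV, spectra):
    """Reverse the bias axis and every spectrum so the sweep starts at positive bias.

    `bias_mV` is a list of bias values; `spectra` is a list of spectra, each
    with one value per bias point. Inputs that already start at the higher
    bias are returned unchanged (as copies).
    """
    if len(bias_mV) < 2 or bias_mV[0] > bias_mV[-1]:
        return list(bias_mV), [list(s) for s in spectra]
    return list(reversed(bias_mV)), [list(reversed(s)) for s in spectra]


# Hypothetical 41-point sweep from -200 mV to +200 mV in 10 mV steps.
bias = [-200 + 10 * i for i in range(41)]
spectrum = [abs(v) for v in bias]  # made-up V-shaped signal
flipped_bias, (flipped_spectrum,) = start_from_positive(bias, [spectrum])
print(flipped_bias[0], flipped_bias[-1])
```

On an `xarray.Dataset` with a `bias_mV` dimension, the comparable reversal is `ds.isel(bias_mV=slice(None, None, -1))`, which reorders the coordinate and the data together.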


### Step 3 — Add experiment metadata (attrs)

Edit the values below to match your experiment.
These fields are intentionally user-defined and will be used later in analysis/plotting pipelines.


In [None]:
# Edit these values for your grid dataset
ds_grid.attrs['tip'] = 'PtIr'
ds_grid.attrs['sample'] = 'Cu(111)'
ds_grid.attrs['ref_a0_nm'] = 0.255
ds_grid.attrs['temperature'] = '4.35K'

# Example alternative (commented):
# ds_grid.attrs['tip'] = 'Ni'
# ds_grid.attrs['sample'] = 'FeTeSe'
# ds_grid.attrs['ref_a0_nm'] = 0.384
# ds_grid.attrs['temperature'] = '40mK'

ds_grid

## End of Stage 1

At this point you have:

- `ds_sxm`: a 2D SXM image loaded as an `xarray.Dataset`
- `ds_grid`: a GridSpectroscopy dataset loaded as an `xarray.Dataset`

Next (planned):

- **Stage 2:** Visualization and data-processing steps (plane fit / flattening) from a dedicated module.
