# Imports

In [None]:
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.feather as feather
import zarr
import dask
from dask import delayed
import distributed
from distributed import Client, LocalCluster, progress
from dask_jobqueue import SLURMCluster
import streamz
import streamz.dataframe as sdf
import holoviews as hv
from holoviews.streams import Stream, param, Selection1D
from holoviews.operation.datashader import regrid
from bokeh.models.tools import HoverTool, TapTool
import matplotlib.pyplot as plt
import qgrid
import ipywidgets as widgets
from tqdm import tnrange, tqdm, tqdm_notebook
import warnings
from functools import partial
from cytoolz import *
from operator import getitem
import nd2reader
from importlib import reload
import traceback
import hvplot.pandas
import cachetools
from collections import namedtuple, defaultdict
from collections.abc import Mapping, Sequence
from numbers import Number
import skimage.measure  # used by _measurement_func (regionprops_table)
import skimage.morphology
import scipy
from glob import glob
import os
import asyncio
from IPython.display import Video

IDX = pd.IndexSlice

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from paulssonlab.image_analysis import *

# Loading data

In [None]:
# nd2_filenames = ["/home/jqs1/scratch/jqs1/microscopy/211117/211117_long_oscillator.nd2"]
# nd2_filenames = ["/n/standby/hms/sysbio/paulsson/collaborations/Personal_Folders/!!Jacob Quinn Shenker/Standby/180928/CapturedRFP_giant snake.nd2"]
nd2_filenames = ["/home/jqs1/scratch/jqs1/microscopy/220718/RBS_DEG_library_20x.nd2"]

In [None]:
all_frames, metadata = workflow.get_nd2_frame_list(nd2_filenames)
image_limits = workflow.get_filename_image_limits(metadata)

`all_frames` lists each exposure (keyed by filename/position/channel/timepoint). `image_limits` is a dict giving *inclusive* image bounds `((x_min, x_max), (y_min, y_max))` for each input image filename. The reason both of these outputs are keyed by filename (and why `workflow.get_nd2_frame_list` takes a list of images) is that we want to support the use case where image acquisition is stopped and restarted one or more times.
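
Because the bounds are inclusive, turning them into array shapes or NumPy slices means adding one to each upper bound. The sketch below illustrates this with made-up limits; the filename, the numbers, and the helper names `limits_to_shape` and `limits_to_slices` are assumptions for illustration and are not part of `workflow`.

```python
import numpy as np

# Hypothetical limits for one file: ((x_min, x_max), (y_min, y_max)), both ends inclusive.
example_limits = {"example.nd2": ((0, 2047), (0, 1023))}


def limits_to_shape(limits):
    """Return the (rows, columns) shape implied by inclusive limits."""
    (x_min, x_max), (y_min, y_max) = limits
    return (y_max - y_min + 1, x_max - x_min + 1)


def limits_to_slices(limits):
    """Return (row_slice, column_slice); slice stops are exclusive, hence the +1."""
    (x_min, x_max), (y_min, y_max) = limits
    return (slice(y_min, y_max + 1), slice(x_min, x_max + 1))


shape = limits_to_shape(example_limits["example.nd2"])
frame = np.zeros(shape)
cropped = frame[limits_to_slices(example_limits["example.nd2"])]
print(shape, cropped.shape)
```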

In [None]:
image_limits

In [None]:
all_frames

# Config

In [None]:
cluster = SLURMCluster(
    queue="short",
    walltime="06:00:00",
    memory="20GB",
    local_directory="/tmp",
    log_directory="/home/jqs1/log",
    cores=1,
    processes=1,
)
client = Client(cluster)

In [None]:
cluster.scale(0)

In [None]:
cluster.adapt(maximum=300)

In [None]:
cluster

# Reload

I run this code block when I make code changes and want my dask workers to pick them up. It would be nice if this could happen automatically, à la the Jupyter autoreload magic.

In [None]:
def do_reload():
    from importlib import reload
    from paulssonlab.image_analysis import util

    reload(util)
    # reload(trench_detection.hough)
    # reload(diagnostics)
    # reload(workflow)
    # reload(image)


client.run(do_reload)
do_reload()

# Trench detection

The next two cells set up a clunky way to browse through an imaging dataset. For example, we can browse to try to find frames that look weird (e.g., a speck of dust); we can then test that trench detection works correctly on them.

In [None]:
FrameStream = ui.MultiIndexStream.define("FrameStream", all_frames.index)
frame_stream = FrameStream()
box = ui.dataframe_browser(frame_stream)
frame_stream.event()
box

In [None]:
ui.image_viewer(frame_stream).opts(frame_width=500, frame_height=300)

In [None]:
%%time
key = frame_stream.contents
frame = workflow.get_nd2_frame(**key)
find_trenches_diag = diagnostics.wrap_diagnostics(
    trench_detection.find_trenches, ignore_exceptions=True, pandas=True
)
trench_points, trench_diag, trench_err = find_trenches_diag(frame)

In [None]:
#%%output size=150
ui.show_plot_browser(trench_diag["label_2"]);

# Data reduction

Here we use `pd.IndexSlice` (imported as `IDX`) to select which frames to process. The active line below keeps every frame; the commented-out line is an example of a restricted selection (positions up to 5 and timepoints 0, 50, and 100). This is not the most intuitive syntax (it's easy to forget the order of the keys in the `IDX` expression). Something like `all_frames.select(positions=slice(3), timepoints=slice(10))` might be slightly more intuitive, although that's still a little clunky.
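
To make the slicing semantics concrete, here is a small sketch on a toy dataframe. The index level names (`filename`, `position`, `channel`, `t`) are assumed from the description of `all_frames`, and `select_frames` is a hypothetical helper showing what a by-name selection could look like; neither is taken from the package.

```python
import numpy as np
import pandas as pd

IDX = pd.IndexSlice

# Toy stand-in for all_frames; level names are an assumption.
index = pd.MultiIndex.from_product(
    [["example.nd2"], range(5), ["RFP", "YFP"], range(20)],
    names=["filename", "position", "channel", "t"],
)
toy_frames = pd.DataFrame({"dummy": np.arange(len(index))}, index=index)

# Label-based slices include both endpoints: positions 0-2 and timepoints 0-9.
subset = toy_frames.loc[IDX[:, :2, :, :9], :]


def select_frames(df, **selectors):
    """Hypothetical helper: select rows by index level name instead of by order."""
    unknown = set(selectors) - set(df.index.names)
    if unknown:
        raise KeyError(f"unknown index levels: {sorted(unknown)}")
    # Levels that are not mentioned are left unrestricted.
    key = tuple(selectors.get(name, slice(None)) for name in df.index.names)
    return df.loc[key, :]


same_subset = select_frames(toy_frames, position=slice(None, 2), t=slice(None, 9))
```

Note that `.loc` slices are label-based and inclusive, so `:2` selects three positions, unlike Python's half-open `slice(3)`.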

In [None]:
all_frames.index

In [None]:
# selected_frames = all_frames.loc[IDX[:, :5, :, [0,50,100]], :]
selected_frames = all_frames.loc[IDX[:, :, :, :], :]

## New trench detection+segmentation+analysis

#### Config

Here we specify which channel we're using as the input to the segmentation function.

In [None]:
segmentation_channel = "RFP-Penta"
measure_channels = ["YFP-DUAL", "RFP-Penta"]

`filter_trenches` expects the output (dataframe) from trench detection and returns a filtered dataframe including only the trenches that are suitable for downstream processing. For every FOV, trench detection is run on the first timepoint, the result is passed through `filter_trenches`, and the resulting list of trenches is used to crop each trench for all timepoints of that FOV. A slightly better design would be to wait until trench detection is done for all FOVs and then send the dataframe of trenches for **all** FOVs to `filter_trenches`, so that it can look at parameters like pitch and rotation angle and throw out FOVs where trench detection produced a pitch/angle far from the median pitch/angle. The reason I initially didn't implement it that way is that all downstream processing (e.g., segmentation) would be stalled until every FOV has finished trench detection. In practice this is probably fine. One way to get the best of both worlds might be to start segmentation for each FOV as soon as its trench detection is done and then, once trench detection has finished for all FOVs, cancel the in-flight segmentation tasks (and discard the completed ones) for trenches that were filtered out.
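
The median-based filtering idea could look roughly like the sketch below. It runs on a toy dataframe: the index layout (`filename`, `position`, `trench`), the pitch values, and the function name `filter_by_median_pitch` are assumptions for illustration; only the pitch column name is taken from this notebook.

```python
import pandas as pd

PITCH_COL = ("diag", "find_trench_lines.hough_2.peak_func.pitch")

# Toy trench table: three positions with two trenches each (simplified index).
index = pd.MultiIndex.from_product(
    [["example.nd2"], [0, 1, 2], [0, 1]], names=["filename", "position", "trench"]
)
toy_trenches = pd.DataFrame(
    [[32.0], [32.0], [32.4], [32.4], [40.0], [40.0]],
    index=index,
    columns=pd.MultiIndex.from_tuples([PITCH_COL]),
)


def filter_by_median_pitch(trenches, pitch_col=PITCH_COL, tol=1.0):
    """Keep only FOVs whose detected pitch is within `tol` pixels of the median."""
    fov_levels = ["filename", "position"]
    # One pitch value per FOV (assumed constant across the trenches of an FOV).
    per_fov_pitch = trenches[pitch_col].groupby(level=fov_levels).first()
    median_pitch = per_fov_pitch.median()
    keep = (per_fov_pitch - median_pitch).abs() <= tol
    good_fovs = per_fov_pitch.index[keep.to_numpy()]
    # Map each trench row to its FOV and keep rows belonging to good FOVs.
    row_fovs = pd.MultiIndex.from_arrays(
        [trenches.index.get_level_values(level) for level in fov_levels]
    )
    return trenches[row_fovs.isin(good_fovs)]


good_trenches = filter_by_median_pitch(toy_trenches)
```

Using the median rather than a hard-coded pitch means the filter adapts to the chip geometry, at the cost of needing trench detection results for all FOVs first.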

In [None]:
def filter_trenches(trenches):
    return trenches
    # pitch = 32 # (pixels) here we hard-code the correct pitch
    # # so throw out positions with detected pitch more than 1 pixel away from this
    # # a better way to do this is to look at the median pitch of all positions and use that
    # # as the ground truth instead
    # if trenches is None:
    #     return None
    # good_trenches = trenches[
    #     (
    #         (
    #             trenches[("diag", "find_trench_lines.hough_2.peak_func.pitch")] - pitch
    #         ).abs()
    #         <= 1
    #     )
    #     & (~trenches[("upper_left", "x")].isnull())
    # ]
    # TODO: filter based on minimum trench length
    # return good_trenches

Below is a simpler `_measurement_func`. (Either evaluate the above or the below cell.) `_measurement_func` is passed in to `_measure` (below) and lets the user customize how each channel is measured. `_measurement_func` returns a dict with keys "framewise", "trenchwise", and "labelwise", corresponding to dataframes of measurements made at three levels of granularity. Framewise measurements are computed from all pixels of an FOV; trenchwise measurements are quantities (like average fluorescence, or some measure of sharpness/focus) for unsegmented trench crops; labelwise measurements are computed per cell mask. For labelwise measurements, `_measurement_func` is also called once with `intensity_image=None`; in this case, it returns a dict with key "mask_labelwise" whose value is a dataframe of measurements that depend only on the segmentation mask and not on any intensity image (here, region properties such as `area`, the number of pixels in each label).
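
As a rough illustration of the two calling modes (mask only versus mask plus intensity image), here is a self-contained toy version that uses only NumPy and pandas. It is a simplified stand-in, not the implementation of `workflow.map_frame_over_labels` or `regionprops_table`; the name `toy_measurement_func` and the assumption that label 0 is background are mine.

```python
import numpy as np
import pandas as pd


def toy_measurement_func(label_image, intensity_image):
    """Toy labelwise measurements; label 0 is treated as background."""
    flat_labels = label_image.ravel()
    counts = np.bincount(flat_labels)[1:]  # pixels per label, skipping background
    labels = np.flatnonzero(counts) + 1  # labels that actually occur
    index = pd.Index(labels, name="label")
    if intensity_image is None:
        # Mask-only measurements: they depend on the segmentation alone.
        area = pd.DataFrame({"area": counts[labels - 1]}, index=index)
        return {"mask_labelwise": area}
    # Per-label sums via weighted bincount, then means by dividing by pixel counts.
    sums = np.bincount(flat_labels, weights=intensity_image.ravel())[1:]
    labelwise = pd.DataFrame(
        {"sum": sums[labels - 1], "mean": sums[labels - 1] / counts[labels - 1]},
        index=index,
    )
    return {"labelwise": labelwise}


toy_labels = np.array([[0, 1, 1], [2, 2, 2]])
toy_intensity = np.array([[5.0, 1.0, 3.0], [2.0, 4.0, 6.0]])
mask_result = toy_measurement_func(toy_labels, None)["mask_labelwise"]
intensity_result = toy_measurement_func(toy_labels, toy_intensity)["labelwise"]
```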

In [None]:
nd2 = nd2reader.ND2Reader(
    "/home/jqs1/scratch/jqs1/microscopy/220718/RBS_DEG_library_20x.nd2"
)
img = nd2.get_frame_2D(v=50, t=50, c=1)
img_crop = img[:500, :500]

In [None]:
plt.imshow(img_crop)

In [None]:
img_crop_labels = trench_segmentation.segment(img_crop)

In [None]:
plt.imshow(img_crop_labels)

In [None]:
_measurement_func(img_crop_labels, None)["mask_labelwise"]

In [None]:
_measurement_func(img_crop_labels, img_crop)["labelwise"]  # ["mask_labelwise"]

In [None]:
pixelwise_funcs = {"mean": np.mean, "sum": np.sum}
# trenchwise_funcs = {"sharpness": image.sharpness, **pixelwise_funcs}
# trenchwise_funcs = {}


def _measurement_func(label_image, intensity_image):
    if intensity_image is None:
        if label_image is None:
            return None  # can't measure anything
        mask_labelwise_df = pd.DataFrame(
            skimage.measure.regionprops_table(
                label_image,
                properties=(
                    "label",
                    "area",
                    "axis_major_length",
                    "axis_minor_length",
                    "orientation",
                    "centroid",
                ),
            ),
        ).set_index("label")
        return dict(mask_labelwise=mask_labelwise_df)
    # trenchwise_df = workflow.map_frame(trenchwise_funcs, intensity_image)
    # res = dict(trenchwise=trenchwise_df)
    res = {}
    if label_image is None:
        return res  # only measure trenchwise
    labelwise_df = workflow.map_frame_over_labels(
        pixelwise_funcs, label_image, intensity_image
    )
    # labelwise_df = pd.DataFrame(
    #     skimage.measure.regionprops_table(
    #         label_image,
    #         intensity_image,
    #         properties=("label", "intensity_mean"),
    #     ),
    # ).set_index("label")
    res["labelwise"] = labelwise_df
    return res  # measure trenchwise and labelwise

#### Boilerplate

By modifying the above functions, users can customize how they want to process the images (although 60% of experiments might use the same basic configuration: segment in RFP channel and measure mean fluorescences in RFP/YFP/CFP channels, something that looks like the simple `_measurement_func` directly above). Below is a bunch of boilerplate that probably doesn't need to be customized per-experiment. It should be cleaned up a lot and put in .py files in a package.

`processing._get_trench_crops` returns a dict of trench crops keyed by the trench index, plus the additional key `"_frame"` which is the full frame (used for "framewise" measurements, discussed above).

In [None]:
def _measure(
    trenches,
    frames,
    measurement_func,
    segmentation_channel=segmentation_channel,
    measure_channels=None,
    segmentation_func=trench_segmentation.watershed.segment,
    include_frame=True,
    frame_bits=8,
    frame_downsample=4,
    filename=None,
    position=None,
):
    frame_transformation = compose(
        processing.zarrify,
        partial(image.quantize, bits=frame_bits),
        partial(image.downsample, factor=frame_downsample),
    )
    trench_crops = processing._get_trench_crops(
        trenches,
        frames,
        include_frame=include_frame,
        frame_transformation=frame_transformation,
        filename=filename,
        position=position,
    )
    res = {}
    segmentation_masks = {}
    measurements = {}
    # segment
    for trench_set, crops_trench_channel_t in trench_crops.items():
        if trench_set == "_frame":
            continue
        for trench_idx, crops_channel_t in crops_trench_channel_t.items():
            for channel, crops_t in crops_channel_t.items():
                for t, crop in crops_t.items():
                    if measure_channels is not None and channel not in measure_channels:
                        continue
                    segmentation_key = (trench_set, trench_idx, segmentation_channel, t)
                    segmentation_mask = segmentation_masks.get(segmentation_key, None)
                    if segmentation_mask is None and segmentation_func is not None:
                        segmentation_mask = segmentation_func(
                            trench_crops[trench_set][trench_idx][segmentation_channel][
                                t
                            ]
                        )
                        segmentation_masks[segmentation_key] = segmentation_mask
                        # measure mask
                        if measurement_func is not None:
                            measurements[
                                ("mask", (trench_set, trench_idx, t))
                            ] = measurement_func(segmentation_mask, None)
                    # measure
                    if measurement_func is not None:
                        measurements[
                            (channel, (trench_set, trench_idx, t))
                        ] = measurement_func(segmentation_mask, crop)
    if measurement_func is not None:
        measurement_dfs = util.map_dict_levels(
            lambda k: (k[1], k[0], *k[2:]), measurements
        )
        for name, dfs in measurement_dfs.items():
            dfs = util.unflatten_dict(dfs)
            if isinstance(util.get_one(dfs, level=2), pd.Series):
                df = pd.concat(
                    {
                        channel: pd.concat(channel_dfs, axis=1).T
                        for channel, channel_dfs in dfs.items()
                    },
                    axis=1,
                )
            else:
                df = pd.concat(
                    {
                        channel: pd.concat(channel_dfs, axis=0)
                        for channel, channel_dfs in dfs.items()
                    },
                    axis=1,
                )
            df.index.names = ["trench_set", "trench", "t", *df.index.names[3:]]
            measurement_dfs[name] = df
        res["measurements"] = measurement_dfs
    images = dict(raw=trench_crops)
    if segmentation_func is not None:
        images["segmentation"] = util.unflatten_dict(segmentation_masks)
    res["images"] = images
    return res


measure = processing.iterate_over_groupby(["filename", "position"])(_measure)

This function generates filenames for pipeline output files. `kind` names the type of output, e.g. `"measurements"` (tabular output in parquet format) or `"images"` (image output in zarr format). By default, these outputs are saved in the same directory as the input ND2 file and have the same basename with additional suffixes. E.g., ignoring `extra`, the input file `211117_long_oscillator.nd2` will have outputs `211117_long_oscillator.nd2.images` (folder of zarrs), `211117_long_oscillator.nd2.measurements` (folder of parquet files), `211117_long_oscillator.nd2.trenches.parquet` (the dataframe of trench bounding boxes), etc. I added the `extra` kwarg just so it's easy to rerun the pipeline multiple times and get different output filenames: when `extra` is set (the default here is `"full"`), it is inserted before `kind`, giving e.g. `211117_long_oscillator.nd2.full.images`.

In [None]:
def filename_func(
    extension=None, kind=None, name=None, filename=None, position=None, extra="full"
):
    if kind and extra:
        kind = f"{extra}.{kind}"
    components = [s for s in ("", name, extension) if s is not None]
    if position is None:
        path = [f"{filename}.{kind}" + ".".join(components)]
    else:
        path = [f"{filename}.{kind}", "pos{:d}".format(position) + ".".join(components)]
    return os.path.join(*path)
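
To see what paths this produces, the sketch below repeats the same logic under a different name (`example_filename_func`) so that it runs standalone. The input filename is made up, and calling it with `name="labelwise"` is an assumption about how the writer invokes it, inferred from the output paths read back later in this notebook.

```python
import os


def example_filename_func(
    extension=None, kind=None, name=None, filename=None, position=None, extra="full"
):
    # Same logic as filename_func above, repeated so this block is self-contained.
    if kind and extra:
        kind = f"{extra}.{kind}"
    components = [s for s in ("", name, extension) if s is not None]
    if position is None:
        path = [f"{filename}.{kind}" + ".".join(components)]
    else:
        path = [f"{filename}.{kind}", "pos{:d}".format(position) + ".".join(components)]
    return os.path.join(*path)


# One file per input image (no position): extra is inserted before kind.
trenches_path = example_filename_func(
    kind="trenches", extension="parquet", filename="example.nd2"
)
# One file per position inside a per-kind folder.
labelwise_path = example_filename_func(
    kind="measurements",
    name="labelwise",
    extension="parquet",
    filename="example.nd2",
    position=0,
)
# Passing extra=None drops the extra suffix.
plain_path = example_filename_func(
    kind="trenches", extension="parquet", filename="example.nd2", extra=None
)
print(trenches_path, labelwise_path, plain_path, sep="\n")
```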

Below is a morass of boilerplate. Three functions need a bit more explanation:
- `_trench_diag_to_dataframe`: takes a nested dict (a `tree()` object passed to the `diagnostics=` kwarg of `trench_detection.find_trenches`), pulls out the scalar values in the leaves of the nested dict tree (dropping the holoviews plots, etc.), and turns it into a dataframe. The heavy lifting is done by `diagnostics.expand_diagnostics_by_label`, which is a truly ugly function—I owe you some documentation (once I remember myself how it works), but honestly, it should probably just be jettisoned because there has to be a better way of doing all of this. But the basic purpose of `expand_diagnostics_by_label` is to take a dataframe with columns like `label_1.X` and `label_2.X` and turn it into a dataframe with only the column `X` but with an additional level in the multiindex called `label` (and the single row is split into two rows, one for `label=1` and another for `label=2`).
- `find_trenches_diag`: this is just `trench_detection.find_trenches` wrapped by `diagnostics.wrap_diagnostics`. `wrap_diagnostics` automatically passes in a `tree()` object to `diagnostics=` and turns the resulting nested dict into a pandas series, which it returns in a tuple `(result, diag, err)`.
- `_trench_info_to_dataframe`: this unpacks the tuple returned by `find_trenches_diag`, uses `_trench_diag_to_dataframe` to `pd.concat` the dataframe of trench bounding boxes (`trench_points`, which is the return value of `trench_detection.find_trenches`) together with this additional dataframe of debugging information that was extracted from diagnostics.
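
The reshaping that `expand_diagnostics_by_label` is described as doing can be illustrated on a toy series. This is a from-scratch sketch of the idea only, not the actual implementation; the function name `expand_by_label`, the `label_` prefix handling, and the example metric names are assumptions.

```python
import re

import pandas as pd


def expand_by_label(row, prefix="label_", sep="."):
    """Turn keys like 'label_1.X' and 'label_2.X' into rows indexed by label with column X."""
    pattern = re.compile(rf"{re.escape(prefix)}(\d+){re.escape(sep)}(.+)$")
    records = {}
    for key, value in row.items():
        match = pattern.match(key)
        if match is None:
            continue  # not a label-specific entry; dropped in this sketch
        label, name = int(match.group(1)), match.group(2)
        records.setdefault(label, {})[name] = value
    expanded = pd.DataFrame.from_dict(records, orient="index")
    expanded.index.name = "label"
    return expanded.sort_index()


toy_diag = pd.Series(
    {
        "label_1.pitch": 32.0,
        "label_1.angle": 0.1,
        "label_2.pitch": 31.5,
        "label_2.angle": 0.2,
        "global_metric": 5,
    }
)
expanded = expand_by_label(toy_diag)
```

The single input row becomes one row per label, which is the shape needed to join the diagnostics onto the per-trench-set bounding boxes.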

A quick note: the purpose of the diagnostics kwarg was to enable storing additional information only when a user manually requests it, never during production operation of the pipeline. Here, just for the trench detection step (not for segmentation), we *are* storing diagnostics. The rationale is that if trench detection fails, we can pickle the diagnostics object to disk, and the user can `pickle.load` it and immediately see what's going on. One downside is that we incur extra processing and memory usage during trench detection: for positions where trench detection succeeds, we store a bunch of holoviews plots only to throw them away as soon as it completes. (Because trench detection is pretty fast compared to segmentation, this maybe isn't a huge problem.) I think a better way to do this would be to run trench detection *without* storing diagnostics, and provide an option to automatically re-run the trench detection/segmentation/tracking/etc. steps of the pipeline with diagnostics enabled if any of those steps fail. I think part of the reason I initially implemented it like this is that I wasn't sure which of the scalar diagnostics metrics were useful to filter on in `filter_trenches`, which can currently access all of them. Under this new design, any metrics we want to filter on would have to be included in the dataframe that's returned by `trench_detection.find_trenches`.
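
The "re-run with diagnostics only on failure" design could be sketched as follows. This assumes the wrapped function accepts a `diagnostics=` kwarg (as `trench_detection.find_trenches` is described as doing) and is deterministic, so the failure reproduces on the second run. The names `run_with_diagnostics_on_failure` and `toy_find_trenches` are hypothetical, and a plain dict stands in for the `tree()` object.

```python
def run_with_diagnostics_on_failure(func, *args, **kwargs):
    """Run func without diagnostics; if it raises, re-run once with diagnostics enabled.

    Returns a (result, diag, err) tuple, mirroring the tuple layout described above.
    """
    try:
        return func(*args, diagnostics=None, **kwargs), None, None
    except Exception:
        diag = {}  # stand-in for the nested diagnostics container
        try:
            # The second run may succeed if the failure was transient.
            return func(*args, diagnostics=diag, **kwargs), diag, None
        except Exception as err:
            return None, diag, err


def toy_find_trenches(values, diagnostics=None):
    """Toy stand-in for a pipeline step that records intermediate values on request."""
    total = sum(values)
    if diagnostics is not None:
        diagnostics["total"] = total
    if total < 0:
        raise ValueError("negative total")
    return total


ok_result = run_with_diagnostics_on_failure(toy_find_trenches, [1, 2, 3])
failed_result, failed_diag, failed_err = run_with_diagnostics_on_failure(
    toy_find_trenches, [1, -2]
)
```

On the success path no diagnostics are ever allocated, so the extra cost is paid only for positions that fail.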

In [None]:
def _trench_diag_to_dataframe(trench_diag, sep="."):
    df = trench_diag.to_frame().T
    expanded_df = diagnostics.expand_diagnostics_by_label(df)
    expanded_df.index = expanded_df.index.droplevel(0)
    expanded_df.index.names = [*expanded_df.index.names[:-1], "trench_set"]
    return expanded_df


#     if len(expanded_df):
#         expanded_df.index = expanded_df.index.droplevel(0)
#         expanded_df.index.names = [*expanded_df.index.names[:-1], 'trench_set']
#     else:
#         expanded_df = pd.concat([df], keys=[-1], names=['trench_set'])
#     return expanded_df


def _trench_info_to_dataframe(trench_info):
    trench_points, trench_diag, trench_err = trench_info
    if trench_err is not None:
        # TODO: write trench_err
        return None
    trench_diag = _trench_diag_to_dataframe(trench_info[1])
    # FROM: https://stackoverflow.com/questions/14744068/prepend-a-level-to-a-pandas-multiindex
    trench_diag = pd.concat([trench_diag], axis=1, keys=["diag"])
    trenches = pd.concat(
        [trench_points, util.multi_join(trench_info[0].index, trench_diag)], axis=1
    )
    return trenches


def _trenches_to_bboxes(trenches, image_limits):
    trench_bboxes = workflow.get_trench_bboxes(trenches, image_limits)
    if trench_bboxes is not None:
        trenches = pd.concat([trenches, trench_bboxes], axis=1)
    return trenches


find_trenches_diag = diagnostics.wrap_diagnostics(
    trench_detection.find_trenches, ignore_exceptions=True, pandas=True
)


def do_find_trenches(*key):
    frame = workflow.get_nd2_frame(*key)
    trench_info = find_trenches_diag(frame)
    return trench_info


def do_trenches_to_bboxes(trench_info, key=None, index_names=("filename", "position")):
    trenches = _trench_info_to_dataframe(trench_info)
    if trenches is None:
        return None
    if key is not None:
        trenches = pd.concat([trenches], names=index_names, keys=[key])
    trenches = _trenches_to_bboxes(trenches, image_limits=image_limits)
    return trenches


def do_get_trench_err(trench_info):
    trench_points, trench_diag, trench_err = trench_info
    if trench_err is None:
        return None
    if trench_points is not None:
        raise ValueError("expecting trench_points to be None")
    return trench_info


import pickle


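def do_serialize_to_disk(
    data, filename, overwrite=True, skip_nones=True, format="pickle"
):
    # validate up front so an unknown format does not leave an empty file behind
    if format not in ("arrow", "pickle"):
        raise ValueError(f"unknown format: {format!r}")
    if skip_nones:
        # drop entries whose value is None (e.g., positions without errors)
        data = {k: v for k, v in data.items() if v is not None}
    if not overwrite and os.path.exists(filename):
        raise FileExistsError(filename)
    with open(filename, "wb") as f:
        if format == "arrow":
            # NOTE: pa.serialize is deprecated since pyarrow 2.0; prefer format="pickle"
            buf = pa.serialize(data).to_buffer()
            f.write(buf)
        elif format == "pickle":
            pickle.dump(data, f)
    return data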


def do_save_trenches(trenches, filename, overwrite=True):
    trenches = pd.concat(trenches)
    processing.write_dataframe_to_parquet(
        filename, trenches, merge=False, overwrite=overwrite
    )
    return trenches


def do_measure_and_write(
    trenches,
    frames,
    return_none=True,
    write=True,
    filename_func=filename_func,
    **kwargs
):
    if trenches is None:
        return None
    trenches = filter_trenches(trenches)
    res = measure(trenches, frames, **kwargs)
    if write:
        processing.write_images_and_measurements(
            res,
            filename_func=filename_func,
            dataframe_format="parquet",
            write_images=True,
            write_measurements=True,
        )
    if return_none:
        return None
    else:
        return res

#### Execute

Here is where we actually run the pipeline. Tasks are submitted using the dask futures API. Initially I had implemented this so every trench was its own dask task; this resulted in millions of small tasks and the dask scheduler choked on them at the time. I imagine with all the recent improvements in the dask scheduler that approach would work somewhat better now. In the current implementation, tasks are run at the granularity of `(filename, position)` pairs. Each `(filename, position)` pair gets a task (`do_measure_and_write`). `do_measure_and_write` filters the trenches according to `filter_trenches` (specified above), and passes them to `measure` (described above).

`do_serialize_to_disk`: this function is designed to take the results of dask tasks as input (here, a dict keyed by `(filename, position)`). It drops entries whose value is `None`, pickles the rest, and writes them to disk. Here, it is used in combination with `do_get_trench_err` to write a tuple containing the `diagnostics` object and exception to disk for any positions where `do_find_trenches` fails (so the user can `pickle.load` those files and figure out why that position failed in the trench detection step).
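
For reference, inspecting such an error file afterwards might look like the sketch below. It writes and reads a toy dict in a temporary directory; the keys, the diagnostics contents, and the exception are made up, and the `(points, diag, err)` tuple layout is assumed from the description of `find_trenches_diag`.

```python
import os
import pickle
import tempfile

# Hypothetical per-position results; None means trench detection succeeded.
toy_trench_errs = {
    ("example.nd2", 0): None,
    ("example.nd2", 1): (None, {"note": "toy diagnostics"}, ValueError("no trenches found")),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.nd2.trench_errs.pickle")
    # Keep only failed positions, as skip_nones=True does.
    failed = {k: v for k, v in toy_trench_errs.items() if v is not None}
    with open(path, "wb") as f:
        pickle.dump(failed, f)
    # Later, in an interactive session: load the file and look at each failure.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

for (filename, position), (points, diag, err) in loaded.items():
    print(f"{filename} position {position}: {err!r}")
```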

In [None]:
save_trench_err_futures = {}
all_analysis_futures = {}
save_trenches_futures = {}

all_trench_bboxes_futures = {}  # TODO: just for debugging

for filename, filename_frames in selected_frames.groupby("filename"):
    # analysis_futures = {}
    trench_bboxes_futures = {}
    trench_err_futures = {}
    for position, frames in filename_frames.groupby("position"):
        key = (filename, position)
        frame_to_segment = frames.loc[IDX[:, :, [segmentation_channel], 0], :]
        trenches_future = client.submit(
            do_find_trenches, *frame_to_segment.index[0], priority=10
        )
        trench_err_futures[key] = client.submit(do_get_trench_err, trenches_future)
        trench_bboxes_future = client.submit(
            do_trenches_to_bboxes, trenches_future, (filename, position), priority=10
        )
        trench_bboxes_futures[key] = trench_bboxes_future
        all_trench_bboxes_futures[key] = trench_bboxes_future
        analysis_future = client.submit(
            do_measure_and_write,
            trench_bboxes_future,
            frames,
            measurement_func=_measurement_func,
            # measurement_func=None,
            # segmentation_func=None,
            measure_channels=measure_channels,
            segmentation_channel=segmentation_channel,
            return_none=True,
            write=True,
            filename_func=filename_func,
        )
        all_analysis_futures[key] = analysis_future
    # save trenches
    trenches_filename = filename_func(
        kind="trenches", extension="parquet", filename=filename
    )
    save_trenches_futures[filename] = client.submit(
        do_save_trenches,
        list(dict(sorted(trench_bboxes_futures.items())).values()),
        trenches_filename,
    )
    trench_errs_filename = filename_func(
        kind="trench_errs", extension="pickle", filename=filename
    )
    save_trench_err_futures[filename] = client.submit(
        do_serialize_to_disk,
        trench_err_futures,
        trench_errs_filename,
    )

In [None]:
save_trench_err_futures

I use the utility function `util.apply_map_futures` to filter a (potentially nested) dict of dask futures and pull out only those that have errored, and gather them. This essentially reraises exceptions for failed dask tasks so that you know what went wrong when debugging interactively.
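
The filtering idea can be sketched without a cluster. The version below is a guess at the general pattern, not the actual `util.apply_map_futures`: it walks a nested dict, applies a function to the leaves that match a predicate, and drops everything else. `SimpleNamespace` objects stand in for dask futures, and all names here are hypothetical.

```python
from collections.abc import Mapping
from types import SimpleNamespace


def apply_to_matching(func, nested, predicate):
    """Apply func to leaves of a nested dict that satisfy predicate; drop the rest."""
    out = {}
    for key, value in nested.items():
        if isinstance(value, Mapping):
            sub = apply_to_matching(func, value, predicate)
            if sub:  # omit sub-dicts that end up empty
                out[key] = sub
        elif predicate(value):
            out[key] = func(value)
    return out


# Stand-ins for futures: only the .status attribute is modeled after dask.
toy_futures = {
    ("example.nd2", 0): SimpleNamespace(status="finished", value=1),
    ("example.nd2", 1): SimpleNamespace(status="error", value="boom"),
    "nested": {
        "a": SimpleNamespace(status="error", value="bang"),
        "b": SimpleNamespace(status="finished", value=2),
    },
}
errored = apply_to_matching(
    lambda fut: fut.value, toy_futures, lambda fut: fut.status == "error"
)
```

With real futures, `func` would be `client.gather`, which re-raises the remote exception for an errored future.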

In [None]:
util.apply_map_futures(
    client.gather, all_analysis_futures, predicate=lambda x: x.status == "error"
)

In [None]:
client.restart()

# Analysis

`211117_long_oscillator.nd2.trenches.parquet` is the dataframe written by `do_save_trenches`. It lists all the trenches for each position, along with a bunch of intermediate metrics from `trench_detection.find_trenches` that `find_trenches_diag` collects.

In [None]:
trench_data = pq.read_pandas(
    "/home/jqs1/scratch/jqs1/microscopy/211117/211117_long_oscillator.nd2.trenches.parquet"
).to_pandas()

In [None]:
trench_data

In [None]:
trench_data[("diag", "find_trench_lines.hough_2.peak_func.pitch")]

Because they have different columns in the index, we store framewise, trenchwise, and labelwise measurements in separate parquet files. Originally I tried having the main node (running the dask scheduler and Jupyter) gather dask tasks and write them to parquet files (one parquet file for all positions), but even trying to be clever and use streamz and asyncio (an amateurish effort on my part, to be sure), a single node doing all the parquet writing just couldn't keep up.

In the current architecture, each position gets its own set of three parquet files, and a filesystem lockfile is used to ensure that only one dask task is writing to any given position at a time. When the pipeline is running in batch mode (all timepoints are available for processing immediately), all timepoints are measured simultaneously inside `do_measure_and_write`, so there cannot be any lock conflicts. When the pipeline is running in real-time mode, processing each position's timepoint as it comes off the microscope, writes to a given position are ~7min apart (the spacing of the timepoints in a typical experiment), so again the chance of a lock conflict is essentially zero.

One notable downside of the current design in real-time mode concerns writing image data timepoint-by-timepoint: for each write, the current zarr chunk is read entirely into memory, the new timepoint is added, and the resulting chunk is recompressed and written to disk. Because the trench image cubes are so small, each trench's array is currently stored as a single chunk. As such, you're essentially reading and rewriting all cropped image data dozens or hundreds of times over the course of the experiment, which is extremely wasteful. You are amortizing this over the entire 24hr+ length of an experiment, so maybe it's not that bad. I have been trying to come up with a more elegant way to do this and have added some notes to the Google Doc.
(To be clear, I never fully implemented real-time mode, but when designing how output was stored and written, I was trying to anticipate what would work in real-time mode.) When `processing.write_images_and_measurements` 
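
A back-of-the-envelope estimate of the rewrite overhead described above: if every append rewrites the whole single-chunk array, then appending timepoint k writes k timepoints' worth of data, so N appends write N(N+1)/2 in total. The sketch below plugs in a 24-hour experiment at 7-minute spacing; it ignores compression and the matching reads, and the function name is hypothetical.

```python
def timepoints_written_with_full_rewrite(n_timepoints):
    """Total timepoints written when each append rewrites the entire array so far."""
    # 1 + 2 + ... + n
    return n_timepoints * (n_timepoints + 1) // 2


experiment_minutes = 24 * 60
interval_minutes = 7
n_timepoints = experiment_minutes // interval_minutes
total_written = timepoints_written_with_full_rewrite(n_timepoints)
# Ratio relative to writing each timepoint exactly once.
overhead = total_written / n_timepoints
print(n_timepoints, total_written, overhead)
```

Under these assumptions the write volume is on the order of a hundred times larger than writing each timepoint once, spread over the length of the experiment.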

In [None]:
%%time
labelwise_df = pq.read_pandas(
    "/home/jqs1/scratch/jqs1/microscopy/220704/220704rbs_library_fish.nd2.test4.measurements/pos0.labelwise.parquet"
).to_pandas()

In [None]:
labelwise_df.columns = ["/".join(col).strip() for col in labelwise_df.columns.values]

In [None]:
labelwise_df