![imaging2/4](https://img.shields.io/badge/imaging2/4-lightgrey)
[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/imaging2.ipynb)

# Generate single-cell images

Here, we process the previously ingested microscopy images with the [scPortrait](https://github.com/MannLabs/scPortrait) pipeline to generate single-cell images that we can use to assess autophagosome formation at the single-cell level.

In [None]:
import lamindb as ln
from collections.abc import Iterable

from pathlib import Path
from scportrait.pipeline.extraction import HDF5CellExtraction
from scportrait.pipeline.project import Project
from scportrait.pipeline.segmentation.workflows import CytosolSegmentationCellpose

ln.track()

First, we query for the raw and annotated microscopy images.

In [None]:
input_images = (
    ln.Artifact.filter(ulabels__name="autophagy imaging")
    .filter(description__icontains="raw image")
    .filter(suffix=".tif")
)

The experiment contains different genotypes (`WT` and `EI24KO`) that were treated differently (`unstimulated` vs `14h Torin-1`).
For each condition, multiple clonal cell lines were imaged across multiple fields of view in all imaging channels.
We need to extract single-cell images from each FOV individually and tag them with the metadata that identifies genotype, treatment condition, clonal cell line, and imaging experiment.

In [None]:
select_artifacts_df = (
    ln.Artifact.filter(ulabels__name="autophagy imaging")
    .filter(description__icontains="raw image")
    .df(features=True)
)
display(select_artifacts_df.head())

conditions = list(set.union(*select_artifacts_df["stimulation"].values))
cell_line_clones = list(set.union(*select_artifacts_df["cell_line_clone"].values))
FOVs = list(set.union(*select_artifacts_df["FOV"].values))
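
The `set.union(*...)` idiom above assumes that each cell of the feature column holds a set of labels. As a minimal sketch with plain Python sets (the column layout here is a hypothetical stand-in for the dataframe column, and only the condition names come from the experiment description):

```python
# Hypothetical per-artifact annotations: one set of labels per image artifact.
# The real values live in the dataframe column; this list only mimics its shape.
stimulation_column = [
    {"unstimulated"},
    {"14h Torin-1"},
    {"unstimulated"},
    {"14h Torin-1"},
]

# set.union(*sets) merges all per-artifact sets into one set of unique labels;
# sorting makes the resulting order deterministic.
unique_conditions = sorted(set.union(*stimulation_column))
print(unique_conditions)
```

Note that `list(set(...))` has no guaranteed order, so sorting is worth considering whenever the first element is later picked as "the example".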

Alternatively, query for the {class}`~lamindb.ULabel` directly:

In [None]:
conditions = ln.ULabel.filter(
    links_artifact__feature__name="stimulation", artifacts__in=input_images
).distinct()
cell_line_clones = ln.ULabel.filter(
    links_artifact__feature__name="cell_line_clone", artifacts__in=input_images
).distinct()
FOVs = ln.ULabel.filter(
    links_artifact__feature__name="FOV", artifacts__in=input_images
).distinct()

By iterating through conditions, cell lines, and FOVs, each query should return only the 3 images (one per channel) that show a single FOV, which is the unit we process with scPortrait.

In [None]:
# load config file for processing all datasets
config_file_af = ln.Artifact.using("scportrait/examples").get(
    key="input_data_imaging_usecase/config.yml"
)
config_file_af.description = (
    "config for scportrait for processing of cells stained for autophagy markers"
)
config_file_af.save()

# annotate the config file with the metadata relevant to the study
config_file_af.features.add_values(
    {"study": "autophagy imaging", "artifact type": "scportrait config"}
)

Let's take a look at the processing of one example FOV.

In [None]:
# get input images for one example FOV
condition = conditions[0]
cellline = cell_line_clones[0]
FOV = FOVs[0]

images = (
    input_images.filter(ulabels=condition)
    .filter(ulabels=cellline)
    .filter(ulabels=FOV)
    .distinct()
)

# Perform quick sanity check that we only have images which share all of their attributes except channel and imaged structure
_features = []
values_to_ignore = ["channel", "imaged structure"]

for af in images:
    features = af.features.get_values()
    features = {
        key: features[key] for key in features.keys() if key not in values_to_ignore
    }
    _features.append(features)
assert all(_features[0] == f for f in _features)
shared_features = _features[0]

# bring image paths into the correct order for processing
input_image_paths = [
    images.filter(ulabels__name=channel_name).one().cache()
    for channel_name in ["DAPI", "Alexa488", "mCherry"]
]
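
The sanity check and the channel ordering in the cell above can be sketched without a database. The example below uses plain dictionaries in place of artifact annotations; the file names, the genotype and FOV values, and the pairing of channels with imaged structures are assumptions made for illustration only.

```python
# Hypothetical annotations for the three channel images of one FOV.
annotations = [
    {"path": "fov1_mcherry.tif", "channel": "mCherry",
     "imaged structure": "mCherry-LC3B", "genotype": "WT", "FOV": "FOV1"},
    {"path": "fov1_dapi.tif", "channel": "DAPI",
     "imaged structure": "DNA", "genotype": "WT", "FOV": "FOV1"},
    {"path": "fov1_alexa488.tif", "channel": "Alexa488",
     "imaged structure": "LckLip-mNeon", "genotype": "WT", "FOV": "FOV1"},
]


def shared_features_of(records, ignored_keys):
    """Return the features common to all records, or raise if they disagree."""
    reduced = [
        {key: value for key, value in record.items() if key not in ignored_keys}
        for record in records
    ]
    if not reduced:
        raise ValueError("no records to compare")
    if any(features != reduced[0] for features in reduced):
        raise ValueError("records differ in features that should be shared")
    return reduced[0]


# Everything except the per-channel fields must agree across the three images.
shared = shared_features_of(annotations, {"path", "channel", "imaged structure"})

# Bring the image paths into the channel order expected for processing.
channel_order = ["DAPI", "Alexa488", "mCherry"]
path_by_channel = {record["channel"]: record["path"] for record in annotations}
ordered_paths = [path_by_channel[name] for name in channel_order]

print(shared)
print(ordered_paths)
```

Raising a `ValueError` instead of using `assert` keeps the check active when Python runs with optimizations enabled, which strips `assert` statements.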

In [None]:
# define and create an output location for the processed data
output_directory = "processed_data"
Path(output_directory).mkdir(parents=True, exist_ok=True)

# initialize our scportrait project with a unique ID
unique_project_id = f"{shared_features['cell_line_clone']}/{shared_features['stimulation']}/{shared_features['FOV']}".replace(
    " ", "_"
)

# create the project location
project_location = f"{output_directory}/{unique_project_id}/scportrait_project"
Path(project_location).mkdir(parents=True, exist_ok=True)

# initialize the project
project = Project(
    project_location=project_location,
    config_path=config_file_af.cache(),
    segmentation_f=CytosolSegmentationCellpose,
    extraction_f=HDF5CellExtraction,
    overwrite=True,
)

# load our input images
project.load_input_from_tif_files(
    input_image_paths, overwrite=True, channel_names=["DAPI", "Alexa488", "mCherry"]
)

# process the project
project.segment()
project.extract()
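
The project ID used above doubles as a folder path and as part of the artifact key, so it helps to see what it looks like. A minimal sketch, assuming each shared feature value is a plain string (the clone name and FOV label below are hypothetical, and `make_project_id` is a helper introduced only for this example):

```python
from pathlib import PurePosixPath


def make_project_id(shared_features: dict) -> str:
    """Join clone, stimulation, and FOV into a slash-separated ID without spaces."""
    parts = (shared_features[key] for key in ("cell_line_clone", "stimulation", "FOV"))
    return "/".join(str(part) for part in parts).replace(" ", "_")


# Hypothetical feature values; only "14h Torin-1" is taken from the experiment.
example_features = {
    "cell_line_clone": "clone 1",
    "stimulation": "14h Torin-1",
    "FOV": "FOV1",
}

project_id = make_project_id(example_features)
# PurePosixPath keeps forward slashes on every operating system.
project_location = PurePosixPath("processed_data") / project_id / "scportrait_project"

print(project_id)
print(project_location)
```

Because the ID contains slashes, each FOV ends up in its own nested folder, and the same ID can be reused as a prefix for the storage keys of the results.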

First, let's look at the input images we processed.

In [None]:
project.plot_input_image()

Now we can look at the results generated by scPortrait. First the segmentation masks.

In [None]:
project.plot_segmentation_masks()

And then extraction results consisting of individual single-cell images over all of the channels.

In [None]:
project.plot_single_cell_images()

Now we also want to save these results to our instance.

In [None]:
ln.Artifact.from_spatialdata(
    sdata=project.filehandler.get_sdata(),
    description="scportrait spatialdata object containing results of cells stained for autophagy markers",
    key=f"processed_data_imaging_use_case/{unique_project_id}/spatialdata.zarr",
).save()

In [None]:
# define var schema
var_schema = ln.Schema(
    name="single-cell image dataset schema var",
    description="column schema for data measured in obsm[single_cell_images]",
    itype=ln.Feature,
    dtype=float,
).save()

# define obs schema
obs_schema = ln.Schema(
    name="single-cell image dataset schema obs",
    features=[
        ln.Feature(name="scportrait_cell_id", dtype="int", coerce_dtype=True).save(),
    ],
).save()

# define uns schema
uns_schema = ln.Schema(
    name="single-cell image dataset schema uns",
    itype=ln.Feature,
    dtype=dict,
).save()

# define composite schema
h5sc_schema = ln.Schema(
    name="single-cell image dataset",
    otype="AnnData",
    slots={"var": var_schema, "obs": obs_schema, "uns": uns_schema},
).save()

# curate an AnnData
curator = ln.curators.AnnDataCurator(project.h5sc, h5sc_schema)
curator.validate()
artifact = curator.save_artifact(
    key=f"processed_data_imaging_use_case/{unique_project_id}/single_cell_data.h5ad"
)

# add shared annotation
annotation = shared_features.copy()
annotation["imaged structure"] = [
    ln.ULabel.using("scportrait/examples").get(name=structure_name)
    for structure_name in ["LckLip-mNeon", "DNA", "mCherry-LC3B"]
]

artifact.features.add_values(annotation)
artifact.labels.add(ln.ULabel(name="scportrait single-cell images").save())

To process all of the files in our dataset, we write a custom image processing function.
We decorate this function with {func}`~lamindb.tracked` to track data lineage of the input and output artifacts. To save processing time, we only recompute datasets that have not previously been processed and uploaded to our instance.

In [None]:
@ln.tracked()
def _process_images(
    config_file_af: ln.Artifact,
    input_artifacts: Iterable[ln.Artifact],
    h5sc_schema: ln.Schema,
    output_directory: str,
) -> None:
    # Perform quick sanity check that we only have images which share all of their attributes except channel and imaged structure
    _features = []
    values_to_ignore = ["channel", "imaged structure"]

    for af in input_artifacts:
        features = af.features.get_values()
        features = {
            key: features[key] for key in features.keys() if key not in values_to_ignore
        }
        _features.append(features)
    assert all(_features[0] == f for f in _features)
    shared_features = _features[0]

    # create a unique identifier for the project based on the annotated features
    unique_project_id = f"{shared_features['cell_line_clone']}/{shared_features['stimulation']}/{shared_features['FOV']}".replace(
        " ", "_"
    )

    # check if processed results already exist and if so skip processing
    try:
        # check for single-cell images
        (
            ln.Artifact.using("scportrait/examples").get(
                key=f"processed_data_imaging_use_case/{unique_project_id}/single_cell_data.h5ad"
            )
        )

        # check for SpatialData object
        (
            ln.Artifact.using("scportrait/examples").get(
                key=f"processed_data_imaging_use_case/{unique_project_id}/spatialdata.zarr"
            )
        )
        print(
            "Dataset already processed and results uploaded to instance. Skipping processing."
        )
    except ln.Artifact.DoesNotExist:
        input_image_paths = [
            input_artifacts.filter(ulabels__name=channel_name).one().cache()
            for channel_name in ["DAPI", "Alexa488", "mCherry"]
        ]

        # create the project location
        project_location = f"{output_directory}/{unique_project_id}/scportrait_project"
        Path(project_location).mkdir(parents=True, exist_ok=True)

        project = Project(
            project_location=project_location,
            config_path=config_file_af.cache(),
            segmentation_f=CytosolSegmentationCellpose,
            extraction_f=HDF5CellExtraction,
            overwrite=True,
        )

        # process the project
        project.load_input_from_tif_files(
            input_image_paths,
            overwrite=True,
            channel_names=["DAPI", "Alexa488", "mCherry"],
        )
        project.segment()
        project.extract()

        # ingest results to instance
        # single-cell images
        curator = ln.curators.AnnDataCurator(project.h5sc, h5sc_schema)
        artifact = curator.save_artifact(
            key=f"processed_data_imaging_use_case/{unique_project_id}/single_cell_data.h5ad"
        )
        annotation = shared_features.copy()
        annotation["imaged structure"] = [
            ln.ULabel.using("scportrait/examples").get(name=structure_name)
            for structure_name in ["LckLip-mNeon", "DNA", "mCherry-LC3B"]
        ]
        artifact.features.add_values(annotation)
        artifact.labels.add(ln.ULabel.get(name="scportrait single-cell images"))

        # SpatialData object
        ln.Artifact.from_spatialdata(
            sdata=project.filehandler.get_sdata(),
            description="scportrait spatialdata object containing results of cells stained for autophagy markers",
            key=f"processed_data_imaging_use_case/{unique_project_id}/spatialdata.zarr",
        ).save()

    return None
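
The skip-if-already-processed logic in the function above follows a common pattern: look up every expected output, and only compute when a lookup fails. The sketch below reproduces that control flow with an in-memory stand-in; `InMemoryRegistry`, `DoesNotExist`, and `process_once` are hypothetical names for this illustration and are not part of lamindb.

```python
class DoesNotExist(Exception):
    """Stand-in for a registry's 'record not found' error."""


class InMemoryRegistry:
    """Hypothetical key-value registry that mimics get-or-raise lookups."""

    def __init__(self):
        self._records = {}

    def get(self, key):
        try:
            return self._records[key]
        except KeyError:
            raise DoesNotExist(key) from None

    def save(self, key, value):
        self._records[key] = value


def process_once(registry, project_id, compute):
    """Run compute only if at least one expected output is missing."""
    keys = [
        f"{project_id}/single_cell_data.h5ad",
        f"{project_id}/spatialdata.zarr",
    ]
    try:
        for key in keys:
            registry.get(key)  # raises DoesNotExist on the first missing output
    except DoesNotExist:
        result = compute(project_id)
        for key in keys:
            registry.save(key, result)
        return True  # processing happened
    return False  # everything was already there


calls = []


def fake_compute(project_id):
    calls.append(project_id)
    return f"results for {project_id}"


registry = InMemoryRegistry()
first_run = process_once(registry, "clone_1/unstimulated/FOV1", fake_compute)
second_run = process_once(registry, "clone_1/unstimulated/FOV1", fake_compute)
print(first_run, second_run, calls)
```

One consequence of this design is that a single missing output triggers a full recompute of both outputs, which keeps the two results consistent with each other.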

In [None]:
ln.Param(name="output_directory", dtype="str").save()

Now we are ready to process all of our input images and upload the generated single-cell image datasets back to our instance.

In [None]:
for condition in conditions:
    for cellline in cell_line_clones:
        for FOV in FOVs:
            images = (
                input_images.filter(ulabels=condition)
                .filter(ulabels=cellline)
                .filter(ulabels=FOV)
                .distinct()
            )

            if images:
                _process_images(
                    config_file_af,
                    input_artifacts=images,
                    h5sc_schema=h5sc_schema,
                    output_directory=output_directory,
                )
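
Not every combination of condition, cell line, and FOV necessarily has images, which is why the loop above checks `if images` before processing. The same iteration can be sketched with `itertools.product` over plain records; the clone name, FOV labels, and record layout below are hypothetical.

```python
from itertools import product

conditions = ["unstimulated", "14h Torin-1"]
clones = ["clone 1"]
fovs = ["FOV1", "FOV2"]
channels = ["DAPI", "Alexa488", "mCherry"]

# Hypothetical image records: only two of the four possible combinations exist.
records = [
    {"stimulation": stimulation, "cell_line_clone": "clone 1", "FOV": fov, "channel": channel}
    for stimulation, fov in [("unstimulated", "FOV1"), ("14h Torin-1", "FOV2")]
    for channel in channels
]

processed = []
for condition, clone, fov in product(conditions, clones, fovs):
    group = [
        record
        for record in records
        if record["stimulation"] == condition
        and record["cell_line_clone"] == clone
        and record["FOV"] == fov
    ]
    if not group:
        continue  # no images for this combination, nothing to process
    processed.append((condition, clone, fov, len(group)))

print(processed)
```

`product` flattens the three nested loops into one without changing the order in which combinations are visited.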

In [None]:
example_artifact = ln.Artifact.filter(
    ulabels=ln.ULabel.get(name="scportrait single-cell images")
)[0]
example_artifact.view_lineage()

In [None]:
ln.finish()