# Process Images with scPortrait to generate single-cell image datasets

In this notebook, we process the previously uploaded images with the [scPortrait](https://github.com/MannLabs/scPortrait) library to generate single-cell images that we can use to assess autophagosome formation at the single-cell level.

In [None]:
import lamindb as ln
import os
import h5py
from scportrait.pipeline.extraction import HDF5CellExtraction
from scportrait.pipeline.project import Project
from scportrait.pipeline.segmentation.workflows import CytosolSegmentationCellpose

ln.track()

First, let's find all of the images we want to process in our lamindb instance. To do this, we query for all artifacts that belong to our study and are tagged with the label "input images". Restricting the query to this label and to the `.tif` suffix ensures that the code keeps working once intermediate results have also been saved to the instance: only the input tif files are processed.

In [None]:
# get all input data that is to be processed
study = ln.ULabel.get(name="autophagy imaging")
input_images_label = ln.ULabel.get(name="input images")

# use a separate name for the label so that it is not shadowed by the query result
input_images = ln.Artifact.filter(ulabels = study).filter(ulabels = input_images_label).filter(suffix = ".tif")
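
Each chained `.filter()` call narrows the previous result, so an artifact is only returned if it carries both labels and has the `.tif` suffix. The following minimal sketch mimics that logic in plain Python on invented records. The `records` list and the `has_label` helper are hypothetical illustrations and not part of lamindb.

```python
# Toy stand-ins for artifacts: each record has a suffix and a set of labels.
# These records are invented for illustration only.
records = [
    {"key": "img_001.tif", "suffix": ".tif", "labels": {"autophagy imaging", "input images"}},
    {"key": "cells.h5", "suffix": ".h5", "labels": {"autophagy imaging", "scportrait single-cell images"}},
    {"key": "config.yml", "suffix": ".yml", "labels": {"autophagy imaging", "scportrait config"}},
    {"key": "other.tif", "suffix": ".tif", "labels": {"some other study", "input images"}},
]


def has_label(items, label):
    """Keep only the records that carry the given label."""
    return [r for r in items if label in r["labels"]]


# Each step narrows the previous result, so the conditions combine with AND.
selected = has_label(records, "autophagy imaging")
selected = has_label(selected, "input images")
selected = [r for r in selected if r["suffix"] == ".tif"]

print([r["key"] for r in selected])
```

Only the tif file that belongs to the study and is tagged as an input image survives all three steps, which is why later results stored under the same study label do not end up in the processing queue.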

We know that in our experiment we imaged different genotypes (WT and EI24KO) under different treatments (unstimulated vs. 14h Torin-1). For each condition we had multiple clonal cell lines and imaged multiple fields of view (FOVs) in all of the imaging channels. To properly process this dataset, we need to get single-cell images from each FOV individually and tag them with the appropriate metadata so that we can identify genotype, treatment condition, clonal cell line and imaging experiment.

In [None]:
conditions = [ln.ULabel.get(name=x) for x in set(a.features.get_values()['stimulation'] for a in input_images)]
cell_line_clones = [ln.ULabel.get(name=x) for x in set(a.features.get_values()['cell_line_clone'] for a in input_images)]
FOVs = [ln.ULabel.get(name=x) for x in set(a.features.get_values()['FOV'] for a in input_images)]

In [None]:
input_images[0].features.get_values()

#would be nice to be able to do something equivalent to
#input_images.features.get_values()

In [None]:
#this is not working as expected -> should have a lot more features
ln.Artifact.filter(ulabels = study).df(features = True)

In [None]:
# `images` is only defined further below, so inspect the full query result here
input_images.df()

If we now iterate through conditions, cell lines and FOVs and collect the input images assigned to each unique combination of these keys, each set should contain only the images showing a single FOV in the 3 imaged channels.

In [None]:
number_of_channels = 3
for condition in conditions:
    for cellline in cell_line_clones:
        for FOV in FOVs:
            images = input_images.filter(ulabels = condition).filter(ulabels = cellline).filter(ulabels = FOV).distinct()
            assert len(images) == number_of_channels
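
The same check can be pictured without a database: group the files by (stimulation, clone, FOV) and confirm that each group holds one file per channel. The sketch below uses invented file records. `toy_images` and `group_by_fov` are hypothetical names, and the notebook itself performs this grouping through lamindb queries.

```python
from collections import defaultdict
from itertools import product

# Invented metadata for illustration: 2 conditions x 1 clone x 2 FOVs x 3 channels.
channels = ["DAPI", "Alexa488", "mCherry"]
toy_images = [
    {"stimulation": s, "cell_line_clone": c, "FOV": f, "channel": ch}
    for s, c, f, ch in product(
        ["unstimulated", "14h Torin-1"], ["WT clone 1"], ["FOV1", "FOV2"], channels
    )
]


def group_by_fov(images):
    """Collect the images that share stimulation, clone and FOV."""
    groups = defaultdict(list)
    for image in images:
        key = (image["stimulation"], image["cell_line_clone"], image["FOV"])
        groups[key].append(image)
    return dict(groups)


groups = group_by_fov(toy_images)

# Every group should contain exactly one image per channel.
for key, members in groups.items():
    assert sorted(m["channel"] for m in members) == sorted(channels), key

print(len(groups), "FOV groups with", len(channels), "channels each")
```

Grouping by key only visits combinations that actually occur in the data, whereas the nested loops above assume that every combination of condition, clone and FOV was imaged.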

Now that we know which image files belong together we want to process them. To be able to gain biological insight from this data we want to process all of these individual FOVs in a consistent manner.

To do that, we first upload our common config file containing the processing parameters to our lamindb instance and then define a transform function to process each FOV individually.

In [None]:
#load config file for processing all datasets
config_path = ln.Artifact.get(key = "input_data/config.yml").load().path
config_file = ln.Artifact(config_path, 
                         description="config for scportrait for processing of cells stained for autophagy markers",
                        ).save()

#annotate the config file with the metadata relevant to the study
config_file.features.add_values(
    {
        "study": "autophagy imaging",
        "artefact type": "scportrait config"
    }
)

If we need to find this config later in our lamindb instance, we can query for it again. Using `.one()` ensures that exactly one artifact matches our filter criteria. If this is not the case, an error is raised, which lets us know that we need to refine our search.

In [None]:
#get the config file we want to use for processing
config = ln.ULabel.get(name="scportrait config")
config_file = ln.Artifact.filter(ulabels = study).filter(ulabels = config).one()
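
The "exactly one match" contract can be sketched in a few lines of plain Python. This is only an illustration of the behaviour described above, not lamindb's implementation: the `one` helper is hypothetical and raises a generic `ValueError`, whereas lamindb raises its own exception types.

```python
def one(matches):
    """Return the single element of `matches`, or raise if there is not exactly one."""
    matches = list(matches)
    if len(matches) != 1:
        raise ValueError(f"expected exactly 1 match, found {len(matches)}")
    return matches[0]


# A query that matches a single config returns it directly.
print(one(["config.yml"]))

# An ambiguous query fails loudly instead of silently picking the first hit.
try:
    one(["config.yml", "config_v2.yml"])
except ValueError as error:
    print("refine the query:", error)
```

Failing on ambiguity is the safer default here, because processing all FOVs with the wrong config would only be noticed much later.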

To properly track our inputs and outputs in lamindb, so that we can later trace data lineage, we define our custom processing function as a transform and track it.

In [None]:
@ln.tracked()
def _process_images(config_file:ln.Artifact, 
                    input_artefacts:ln._query_set.QuerySet, 
                    output_directory:str) -> None:
    
    #perform a quick sanity check that all images share all of their attributes except channel and imaged structure
    _features = []
    values_to_ignore = ["channel", "imaged structure"]
    
    for i in input_artefacts:
        features = i.features.get_values()
        features = {key: features[key] for key in features.keys() if key not in values_to_ignore}
        _features.append(features)
    assert all([_features[0] == f for f in _features])
    shared_features = _features[0]

    #get the paths to the input images
    mcherry = input_artefacts.filter(ulabels = ln.ULabel.get(name="mCherry"))
    DAPI = input_artefacts.filter(ulabels = ln.ULabel.get(name="DAPI"))
    Alexa488 = input_artefacts.filter(ulabels = ln.ULabel.get(name="Alexa488"))

    paths = [DAPI.one().load().path, 
             Alexa488.one().load().path, 
             mcherry.one().load().path,
             ]
    
    #create a unique identifier for the project based on the annotated features
    unique_project_id = f"{shared_features['cell_line_clone']}/{shared_features['stimulation']}/{shared_features['FOV']}".replace(" ", "_")
    
    #create the project location
    project_location = f"{output_directory}/{unique_project_id}/scportrait_project"
    os.makedirs(project_location, exist_ok=True)
    
    project = Project(project_location=project_location,
                        config_path= config_file.load().path, 
                        segmentation_f= CytosolSegmentationCellpose,
                        extraction_f= HDF5CellExtraction, 
                        overwrite=True
                        )

    #process the project
    project.load_input_from_tif_files(paths, overwrite=True, channel_names=["DAPI","Alexa488",  "mCherry"])
    project.segment()
    project.extract()

    #potentially we also want to save additional output here (i.e. generated sdata project structure)
    
    #save the generated results back to lamindb
    single_cell_images = f"{project_location}/extraction/data/single_cells.h5"
    
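    #update the annotation to reflect the new data modality
    annotation = shared_features.copy() #copy so that the shared features are not modified
    annotation["filetype"] = "h5" #update filetype to h5
    #count the extracted cells; the context manager closes the file handle again
    with h5py.File(single_cell_images, "r") as hf:
        annotation["number of single-cells"] = hf["single_cell_index"].shape[0]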
    annotation["channel"] = [ln.ULabel.get(name=x) for x in ["DAPI", "mCherry", "Alexa488"]]
    annotation["imaged structure"] = [ln.ULabel.get(name=x) for x in ['LckLip-mNeon', 'DNA', 'mCherry-LC3B']]
    
    artifact = ln.Artifact(single_cell_images, 
                            description="single-cell image dataset of cells stained for autophagy markers",
                            )
    artifact.save()
    artifact.features.add_values(
        annotation
    )
    artifact.labels.add(ln.ULabel.get(name = "scportrait single-cell images")) 

ln.Param(name='output_directory', dtype='str').save()
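
The sanity check and the project identifier inside `_process_images` can be tried out in isolation on plain dictionaries. The sketch below is a standalone illustration under invented feature values. `shared_metadata` and `project_id` are hypothetical helper names that do not exist in scPortrait or lamindb.

```python
def shared_metadata(feature_dicts, ignore=("channel", "imaged structure")):
    """Return the metadata common to all dicts, ignoring channel-specific keys."""
    trimmed = [{k: v for k, v in f.items() if k not in ignore} for f in feature_dicts]
    if not trimmed:
        raise ValueError("no feature dictionaries given")
    if any(t != trimmed[0] for t in trimmed):
        raise ValueError("images do not share the same metadata")
    return trimmed[0]


def project_id(shared):
    """Build a path-like identifier: clone/stimulation/FOV with spaces replaced."""
    parts = (shared["cell_line_clone"], shared["stimulation"], shared["FOV"])
    return "/".join(parts).replace(" ", "_")


# Invented feature values for one FOV imaged in two channels.
features = [
    {"cell_line_clone": "WT clone 1", "stimulation": "14h Torin-1", "FOV": "FOV1", "channel": "DAPI"},
    {"cell_line_clone": "WT clone 1", "stimulation": "14h Torin-1", "FOV": "FOV1", "channel": "mCherry"},
]
shared = shared_metadata(features)
print(project_id(shared))  # WT_clone_1/14h_Torin-1/FOV1
```

Raising an explicit exception instead of using `assert` gives a clearer message and keeps the check active even when Python runs with optimizations (`-O`), which strips assert statements.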

We still need to create an output directory in which to store our results locally before they are uploaded to the lamindb instance.

In [None]:
# define and create an output location
output_directory = "processed_data"
os.makedirs(output_directory, exist_ok=True)
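
For orientation, the following sketch reconstructs where each project and its single-cell file end up below the output directory, using the path pieces that appear in `_process_images`. The `project_paths` helper and the example identifier are hypothetical. Nothing is written to disk.

```python
from pathlib import PurePosixPath


def project_paths(output_directory, unique_project_id):
    """Return the project location and the expected single-cell HDF5 file."""
    project_location = PurePosixPath(output_directory) / unique_project_id / "scportrait_project"
    single_cells = project_location / "extraction" / "data" / "single_cells.h5"
    return project_location, single_cells


# Example identifier in the clone/stimulation/FOV form (values are invented).
location, single_cells = project_paths("processed_data", "WT_clone_1/14h_Torin-1/FOV1")
print(location)
print(single_cells)
```

Because the identifier contains one folder level per clone, stimulation and FOV, the results of different FOVs cannot overwrite each other.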

Now we are ready to process all of our input images and upload the generated single-cell image datasets back to lamindb.

In [None]:
for condition in conditions:
    for cellline in cell_line_clones:
        for FOV in FOVs:
            images = input_images.filter(ulabels = condition).filter(ulabels = cellline).filter(ulabels = FOV).distinct()
            _process_images(config_file, input_artefacts = images, output_directory=output_directory)  

In [None]:
example_artifact = ln.Artifact.filter(ulabels = ln.ULabel.get(name="scportrait single-cell images"))[0]

In [None]:
example_artifact.view_lineage()

In [None]:
ln.finish()