![imaging3/4](https://img.shields.io/badge/imaging3/4-lightgrey)
[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/imaging3.ipynb)

# Featurize single-cell images

Here, we use [scPortrait](https://github.com/MannLabs/scPortrait) to extract cell features that characterize both morphological and intensity-based properties of individual cells:

- Area of the masks in pixels
- Mean intensity of the chosen channel in the regions labelled by each of the masks
- Median intensity of the chosen channel in the regions labelled by each of the masks
- 75% quantile of the chosen channel in the regions labelled by each of the masks
- 25% quantile of the chosen channel in the regions labelled by each of the masks
- Summed intensity of the chosen channel in the regions labelled by each of the masks
- Summed intensity of the chosen channel in the regions labelled by each of the masks, normalized by mask area

Together, these features give a compact per-cell profile that can later be used to train machine learning models that distinguish cell types and states.
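To make the feature definitions above concrete, here is a minimal NumPy sketch that computes them for one mask and one channel. It is an illustration under stated assumptions, not scPortrait's implementation: the function name `mask_intensity_features` and the dictionary keys are hypothetical, and the exact quantile and normalization conventions used by `CellFeaturizer` may differ.

```python
import numpy as np


def mask_intensity_features(channel: np.ndarray, mask: np.ndarray) -> dict[str, float]:
    """Summarize the intensities of `channel` inside a boolean `mask`.

    Hypothetical helper for illustration only; not part of scPortrait.
    """
    pixels = channel[mask.astype(bool)]
    area = int(pixels.size)  # mask area in pixels
    if area == 0:
        raise ValueError("mask is empty")
    summed = float(pixels.sum())
    return {
        "area": area,
        "mean": float(pixels.mean()),
        "median": float(np.median(pixels)),
        "quant75": float(np.quantile(pixels, 0.75)),
        "quant25": float(np.quantile(pixels, 0.25)),
        "summed_intensity": summed,
        "summed_intensity_area_normalized": summed / area,
    }


# toy 4x4 "image" with a 2x2 mask in the center
channel = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
toy_features = mask_intensity_features(channel, mask)
```

In this single-mask sketch the area-normalized sum is numerically identical to the mean, because both divide the same sum by the same pixel count. The two would differ only if the area came from a different mask than the one used for the sum.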

In [1]:
import lamindb as ln
import bionty as bt
import pandas as pd

from scportrait.pipeline.featurization import CellFeaturizer

ln.track()

→ connected lamindb: sophiamaedler/test-imaging
→ created Transform('0BSejbephVeu0000'), started new Run('gSKgBwi2...') at 2025-03-21 14:44:28 UTC
→ notebook imports: bionty==1.1.2 lamindb==1.3.0 pandas==2.2.3 scportrait==1.3.2


We compute these features from the previously generated single-cell image datasets.

In [2]:
# Get single-cell images and config
sc_datasets = (
    ln.Artifact.using("scportrait/examples")
    .filter(ulabels__name="autophagy imaging")
    .filter(ulabels__name="scportrait single-cell images")
)
config = (
    ln.Artifact.filter(ulabels__name="autophagy imaging")
    .filter(ulabels__name="scportrait config")
    .distinct()
    .one()
)

In [3]:
# Process single-cell images with scPortrait's featurizer
featurizer = CellFeaturizer(directory=".", config=config.cache(), project_location=None)

def featurize_datasets(artifact_list) -> pd.DataFrame:
    """Featurize the given single-cell image artifacts and tag each row with its source dataset uid."""
    paths = [dataset.cache() for dataset in artifact_list]
    # the featurizer identifies each dataset by an integer label; remember which uid each label stands for
    dataset_lookup = {idx: dataset.uid for idx, dataset in enumerate(artifact_list)}
    labels = list(dataset_lookup.keys())
    results = featurizer.process(
        dataset_paths=paths, dataset_labels=labels, return_results=True
    )

    # store the original dataset uid so that featurization results can be traced back to their source dataset
    results["dataset"] = results["label"].map(dataset_lookup)
    del results["label"]
    return results


# Featurize wildtype (WT) cells, which will later be used for training
wt_cells_afs = sc_datasets.filter(ulabels__name="WT")

# the two stimulation conditions are the two classes that our classifier should be able to tell apart
condition_uls = [
    ln.ULabel.using("scportrait/examples").get(name=stim_name)
    for stim_name in {af.features.get_values()["stimulation"] for af in wt_cells_afs}
]

# map condition names to class labels
class_lookup = {"untreated": 0, "14h Torin-1": 1}

features = None
for condition_ul in condition_uls:
    cells = wt_cells_afs.filter(ulabels=condition_ul)
    results = featurize_datasets(cells)

    # save condition as a class label
    results["class"] = class_lookup[condition_ul.name]

    # concatenate results together
    if features is None:
        features = results
    else:
        features = pd.concat([features, results])

→ completing transfer to track Artifact('Ug6oysO8') as input
! returning artifact with same hash: Artifact(uid='jryk1izEoNnZgzVR0000', is_latest=True, key='processed_data_imaging_use_case/U2OS_lcklip-mNeon_mCherryLC3B_clone_1/14h_Torin-1/FOV1/single_cell_data.h5ad', suffix='.h5ad', kind='dataset', otype='AnnData', size=11496192, hash='xAoThmgmywkVWC2M64xDVw', n_observations=70, space_id=1, storage_id=1, run_id=3, schema_id=5, created_by_id=1, created_at=2025-03-21 14:43:50 UTC)
→ returning existing schema with same hash: Schema(uid='PUwajeoyOFo5TGdoSlPw', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='1j9UbOOFnijksoThNUnoeg', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-21 14:43:49 UTC)
→ mapped records: ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG
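The `featurize_datasets` function above hands integer labels to the featurizer and then translates them back into dataset uids. The sketch below isolates that translation step with plain pandas. The uids and the `results` frame are made-up stand-ins; the real featurizer output has more columns.

```python
import pandas as pd

# hypothetical uids standing in for `artifact.uid` values
dataset_uids = ["uidAAA", "uidBBB"]
dataset_lookup = dict(enumerate(dataset_uids))  # integer label -> uid

# stand-in for the featurizer output: one row per cell plus an integer dataset label
results = pd.DataFrame({"area": [120, 95, 143], "label": [0, 1, 1]})

# replace the integer label with the uid of the dataset each cell came from
results["dataset"] = results["label"].map(dataset_lookup)
results = results.drop(columns="label")
```

`Series.map` with a dictionary returns `NaN` for any label missing from the lookup, so a quick `results["dataset"].isna().any()` check can catch mismatched labels.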

We upload the generated features to our instance.

In [4]:
artifact = ln.Artifact.from_df(
    features,
    description="featurized single-cell images",
    key="featurization_results/WT.parquet",
).save()
artifact.cell_lines.add(bt.CellLine.get(name="U2OS"))

artifact.features.add_values(
    {
        "study": "autophagy imaging",
        "genotype": "WT",
    }
)

We repeat this process for the EI24 knockout (KO) cells:

In [5]:
# Process KO cells to see if they behave differently
ko_cells_afs = sc_datasets.filter(ulabels__name="EI24KO")

# we have the same two conditions as before
condition_uls = [
    ln.ULabel.using("scportrait/examples").get(name=stimulation_name)
    for stimulation_name in {
        af.features.get_values()["stimulation"] for af in ko_cells_afs
    }
]

features_ko = None
for condition_ul in condition_uls:
    cells = ko_cells_afs.filter(ulabels=condition_ul)
    results = featurize_datasets(cells)

    # save condition as a class label
    results["class"] = class_lookup[condition_ul.name]

    if features_ko is None:
        features_ko = results
    else:
        features_ko = pd.concat([features_ko, results])

→ completing transfer to track Artifact('d4cvdSJa') as input
→ returning existing schema with same hash: Schema(uid='PUwajeoyOFo5TGdoSlPw', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='1j9UbOOFnijksoThNUnoeg', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-21 14:43:49 UTC)
→ mapped records: ULabel(uid='Aj8KGwbh'), ULabel(uid='A2945i5P'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
→ transferred records: Artifact(uid='d4cvdSJa6rc6Fd9T0000'), ULabel(uid='joRCMMWX')
→ completing transfer to track Artifact('JHkm31GA') as input
→ returning existing schema with same hash: Schema(uid='PUwajeoyOFo5TGdoSlPw', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False,

In [6]:
artifact = ln.Artifact.from_df(
    features_ko,
    description="featurized single-cell images",
    key="featurization_results/EI24KO.parquet",
).save()
artifact.cell_lines.add(bt.CellLine.get(name="U2OS"))

# annotate with required metadata
artifact.features.add_values(
    {
        "study": "autophagy imaging",
        "genotype": "EI24KO",
    }
)
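The WT and KO cells above run through the same per-condition loop: featurize, attach a class label, concatenate. As a sketch of how that loop could be factored out, the hypothetical helper `featurize_by_condition` below takes the featurization step as a callable. Here conditions are plain strings and `fake_featurize` is a stand-in; in the notebook they would be `ULabel` records and `featurize_datasets` applied to the filtered artifacts.

```python
from typing import Callable, Iterable

import pandas as pd


def featurize_by_condition(
    conditions: Iterable[str],
    featurize: Callable[[str], pd.DataFrame],
    class_lookup: dict[str, int],
) -> pd.DataFrame:
    """Featurize each condition, attach its class label, and stack the results.

    Hypothetical helper for illustration; not part of lamindb or scPortrait.
    """
    frames = []
    for condition in conditions:
        results = featurize(condition).copy()
        results["class"] = class_lookup[condition]  # condition name -> class label
        frames.append(results)
    return pd.concat(frames)


def fake_featurize(condition: str) -> pd.DataFrame:
    # stand-in for the real featurization step: two cells per condition
    return pd.DataFrame({"area": [100, 110]})


class_lookup = {"untreated": 0, "14h Torin-1": 1}
toy = featurize_by_condition(["untreated", "14h Torin-1"], fake_featurize, class_lookup)
```

As in the notebook's loops, `pd.concat` keeps each frame's original index, so index values repeat across conditions. Passing `ignore_index=True` would renumber the rows instead.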

In [7]:
ln.finish()

• please hit CMD + s to save the notebook in your editor .. ✓
→ finished Run('gSKgBwi2') after 33s at 2025-03-21 14:45:02 UTC
