# Featurize single-cell images

Here, we use [scPortrait](https://github.com/MannLabs/scPortrait) to extract cell features that characterize both morphological and intensity-based properties of individual cells:

- Area of the masks in pixels
- Mean intensity of the chosen channel in the regions labelled by each of the masks
- Median intensity of the chosen channel in the regions labelled by each of the masks
- 75% quantile of the chosen channel in the regions labelled by each of the masks
- 25% quantile of the chosen channel in the regions labelled by each of the masks
- Summed intensity of the chosen channel in the regions labelled by each of the masks
- Summed intensity of the chosen channel in the regions labelled by each of the masks, normalized by mask area

These features give a compact per-cell profile that can later be used to train machine learning models to identify cell types and states.
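To make the feature definitions above concrete, here is a minimal NumPy sketch that computes the same seven quantities for one binary mask and one channel. It is an illustration only and not scPortrait's implementation: the helper name `mask_intensity_features`, the dictionary keys, and the toy image are assumptions made for this example.

```python
import numpy as np


def mask_intensity_features(mask: np.ndarray, channel: np.ndarray) -> dict:
    """Summarize one channel inside one binary mask (hypothetical helper)."""
    inside = mask.astype(bool)
    values = channel[inside]  # pixel intensities that fall inside the mask
    area = int(inside.sum())  # mask area in pixels
    if area == 0:
        raise ValueError("mask is empty")
    summed = float(values.sum())
    return {
        "area": area,
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "quant75": float(np.quantile(values, 0.75)),
        "quant25": float(np.quantile(values, 0.25)),
        "summed_intensity": summed,
        "summed_intensity_area_normalized": summed / area,
    }


# Toy data: a 4x4 image with a 2x2 mask in the centre
channel = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

features_demo = mask_intensity_features(mask, channel)
```

In this sketch the area-normalized sum equals the mean, because the mask is binary and every pixel inside it has equal weight.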

In [1]:
import lamindb as ln
import bionty as bt
import pandas as pd

from scportrait.pipeline.featurization import CellFeaturizer

ln.track()

[92m→[0m connected lamindb: scportrait/examples
[92m→[0m loaded Transform('ujI2BcWc8AA60000'), re-started Run('aWmCDdv4...') at 2025-02-24 15:03:20 UTC
[92m→[0m notebook imports: bionty==1.1.0 lamindb==1.1.0 pandas==2.2.3 scportrait==1.1.1.dev0


In [2]:
# Get single-cell images and config
sc_datasets = ln.Artifact.filter(ulabels__name="autophagy imaging").filter(
    ulabels__name="scportrait single-cell images"
)
config = (
    ln.Artifact.filter(ulabels__name="autophagy imaging")
    .filter(ulabels__name="scportrait config")
    .distinct()
    .one()
)

In [None]:
# Process single-cell images with scPortrait's featurizer
featurizer = CellFeaturizer(directory=".", config=config.cache(), project_location=None)

# Featurize wildtype (WT) cells
wt_cells_afs = sc_datasets.filter(ulabels__name="WT")

# The two stimulation conditions are the two classes that our classifier should be able to tell apart.
# Sorting the condition names keeps the class indices reproducible across runs.
condition_uls = [
    ln.ULabel.get(name=stim_name)
    for stim_name in sorted(
        {af.features.get_values()["stimulation"] for af in wt_cells_afs}
    )
]

# Map each condition name to its class index and accumulate the features of all conditions
condition_lookup = {}
features = None
for idx, condition_ul in enumerate(condition_uls):
    cells = wt_cells_afs.filter(ulabels=condition_ul)
    paths = [dataset.load().path for dataset in cells]
    dataset_lookup = {cell.uid: i for i, cell in enumerate(cells)}
    labels = list(dataset_lookup.values())
    # Note: only the first dataset of each condition is passed to the featurizer
    results = featurizer.process(
        extraction_dir=paths[0], labels=labels[0], return_results=True
    )
    results["class"] = idx
    condition_lookup[condition_ul.name] = idx
    if features is None:
        features = results
    else:
        features = pd.concat([features, results])

... synchronizing LsozDWhSsP9ajLJL0000.h5: 100.0%
... synchronizing DkbGSFdQT3AXjqQt0000.h5: 100.0%
... synchronizing Ytx4E35Xfe3ubE190000.h5: 100.0%
... synchronizing e1KSk8g7Y1GuxX6S0000.h5: 100.0%
... synchronizing hCw1GXdk9tonZ7DG0000.h5: 100.0%
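Because the same per-condition loop runs once per genotype, it could be factored into a small helper. The sketch below is one possible shape for it, under stated assumptions: `featurize_conditions`, `process_fn`, and the placeholder condition names are hypothetical, and `fake_process` stands in for the real featurizer call so that the example runs without lamindb or scPortrait.

```python
import pandas as pd


def featurize_conditions(condition_names, process_fn):
    """Featurize each condition and tag its rows with a class index.

    `process_fn` is a hypothetical callable that takes a condition name
    and returns a DataFrame of per-cell features.
    """
    # Sort the names so that class indices are the same on every run
    lookup = {name: idx for idx, name in enumerate(sorted(condition_names))}
    frames = []
    for name, idx in lookup.items():
        results = process_fn(name).copy()
        results["class"] = idx
        frames.append(results)
    # Concatenate once at the end instead of growing a DataFrame in the loop
    features = pd.concat(frames, ignore_index=True)
    return features, lookup


# Stand-in for the featurizer: returns two fake cells per condition
def fake_process(name):
    return pd.DataFrame({"area": [10, 12], "mean": [0.5, 0.7]})


features_demo, lookup_demo = featurize_conditions(
    {"condition_b", "condition_a"}, fake_process
)
```

Returning the lookup alongside the features keeps the mapping from condition name to class index available for interpreting classifier output later.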


In [14]:
artifact = ln.Artifact.from_df(
    features, description="featurized single-cell images"
).save()
artifact.cell_lines.add(bt.CellLine.get(name="U2OS"))

# annotate metadata
artifact.features.add_values(
    {
        "study": "autophagy imaging",
        "artifact type": "single-cell image featurization results",
        "genotype": "WT",
    }
)

[92m→[0m found artifact with same hash: Artifact(uid='OZUxfqXUi6YtFrVW0000', is_latest=True, description='featurized single-cell images', suffix='.parquet', kind='dataset', otype='DataFrame', size=67768, hash='Xm_6eKDzVstPMARccKPHzg', space_id=1, storage_id=1, run_id=127, created_by_id=3, created_at=2025-02-24 15:23:27 UTC); to track this artifact as an input, use: ln.Artifact.get()


We repeat this process for the EI24KO (knockout) cells:

In [15]:
# Featurize EI24 knockout (EI24KO) cells
ko_cells_afs = sc_datasets.filter(ulabels__name="EI24KO")

# The two stimulation conditions are the two classes that our classifier should be able to tell apart.
# Sorting the condition names keeps the class indices reproducible across runs.
condition_uls = [
    ln.ULabel.get(name=stimulation_name)
    for stimulation_name in sorted(
        {af.features.get_values()["stimulation"] for af in ko_cells_afs}
    )
]

# Map each condition name to its class index and accumulate the features of all conditions
condition_lookup = {}
features_ko = None
for idx, condition_ul in enumerate(condition_uls):
    cells = ko_cells_afs.filter(ulabels=condition_ul)
    paths = [dataset.load().path for dataset in cells]
    dataset_lookup = {cell.uid: i for i, cell in enumerate(cells)}
    labels = list(dataset_lookup.values())
    # Note: only the first dataset of each condition is passed to the featurizer
    results = featurizer.process(
        extraction_dir=paths[0], labels=labels[0], return_results=True
    )
    results["class"] = idx
    condition_lookup[condition_ul.name] = idx
    if features_ko is None:
        features_ko = results
    else:
        features_ko = pd.concat([features_ko, results])

... synchronizing ws65rSiY0he9SSve0000.h5: 100.0%
... synchronizing zJZEvnfApxBWCJzZ0000.h5: 100.0%
... synchronizing jP8M4lTmCX4A3xCs0000.h5: 100.0%
... synchronizing DQQX6jIYqsxZv9ec0000.h5: 100.0%
... synchronizing cYLJWBZoUyQxFwii0000.h5: 100.0%
... synchronizing 9CMfEX06h1RJrjPP0000.h5: 100.0%
... synchronizing Kt9FXnHBQeZ8F0mj0000.h5: 100.0%
... synchronizing P6r2aqhFZomH93uJ0000.h5: 100.0%


In [17]:
artifact = ln.Artifact.from_df(
    features_ko, description="featurized single-cell images"
).save()
artifact.cell_lines.add(bt.CellLine.get(name="U2OS"))

# annotate with required metadata
artifact.features.add_values(
    {
        "study": "autophagy imaging",
        "artifact type": "single-cell image featurization results",
        "genotype": "EI24KO",
    }
)

... uploading vqgUwzZxmlVMNRj00000.parquet: 100.0%


In [18]:
ln.finish()

[94m•[0m please hit CTRL + s to save the notebook in your editor  [92m✓[0m
[93m![0m cells [(2, None), (None, 14), (15, 17)] were not run consecutively
[92m→[0m finished Run('aWmCDdv4') after 44m at 2025-02-24 15:47:26 UTC
[92m→[0m go to: https://lamin.ai/scportrait/examples/transform/ujI2BcWc8AA60000
[92m→[0m to update your notebook from the CLI, run: lamin save /home/lukas/code/lamin-usecases/docs/imaging3.ipynb
