# Featurize single-cell images

Here, we use [scPortrait](https://github.com/MannLabs/scPortrait) to extract cell features that characterize both morphological and intensity-based properties of individual cells:

- Area of the masks in pixels
- Mean intensity of the chosen channel in the regions labelled by each of the masks
- Median intensity of the chosen channel in the regions labelled by each of the masks
- 75% quantile of the chosen channel in the regions labelled by each of the masks  
- 25% quantile of the chosen channel in the regions labelled by each of the masks
- Summed intensity of the chosen channel in the regions labelled by each of the masks
- Summed intensity of the chosen channel in the regions labelled by each of the masks, normalized by mask area

Together, these features give a per-cell profile that can later be used to train machine learning models that identify cell types and states.
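To make the list above concrete, here is a minimal NumPy sketch of how such per-mask statistics could be computed for one channel. It is an illustration under stated assumptions, not scPortrait's implementation: the helper `mask_features` and the dictionary keys are hypothetical names chosen for this example.

```python
import numpy as np


def mask_features(image: np.ndarray, mask: np.ndarray) -> dict:
    """Hypothetical helper: summary statistics of `image` inside `mask`.

    `image` is a single 2D channel, `mask` marks the pixels of one cell.
    Key names are illustrative and not taken from scPortrait.
    """
    mask = mask.astype(bool)
    area = int(mask.sum())  # area of the mask in pixels
    if area == 0:
        raise ValueError("mask is empty")
    pixels = image[mask]  # intensities in the region labelled by the mask
    summed = float(pixels.sum())
    return {
        "area": area,
        "mean": float(pixels.mean()),
        "median": float(np.median(pixels)),
        "quant75": float(np.quantile(pixels, 0.75)),
        "quant25": float(np.quantile(pixels, 0.25)),
        "summed_intensity": summed,
        "summed_intensity_area_normalized": summed / area,
    }


# Toy data: a 4x4 intensity ramp with a 2x2 mask in the centre
image = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
feats = mask_features(image, mask)
```

In this sketch the area-normalized sum equals the mean by construction, because both are computed over exactly the masked pixels; a real featurizer may define the two quantities over different regions.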

In [1]:
import lamindb as ln
import bionty as bt
import pandas as pd

from scportrait.pipeline.featurization import CellFeaturizer

ln.track()

→ connected lamindb: scportrait/examples
→ loaded Transform('ujI2BcWc8AA60007'), re-started Run('sHU2epGT...') at 2025-03-06 15:09:08 UTC
→ notebook imports: bionty==1.1.0 lamindb==1.1.1 pandas==2.2.3 scportrait==1.1.1.dev0


We generate these features from the single-cell image datasets created in the previous step.

In [2]:
# Get single-cell images and config
sc_datasets = ln.Artifact.filter(ulabels__name="autophagy imaging").filter(
    ulabels__name="scportrait single-cell images"
)
config = (
    ln.Artifact.filter(ulabels__name="autophagy imaging")
    .filter(ulabels__name="scportrait config")
    .distinct()
    .one()
)

In [None]:
# Process single-cell images with scPortrait's featurizer
featurizer = CellFeaturizer(directory=".", config=config.cache(), project_location=None)

def featurize_datasets(artifact_list) -> pd.DataFrame:
    """Featurize the given single-cell image datasets and tag each row with its source artifact uid."""
    paths = [dataset.load().path for dataset in artifact_list]
    # integer labels are passed to the featurizer and mapped back to artifact uids afterwards
    dataset_lookup = {idx: dataset.uid for idx, dataset in enumerate(artifact_list)}
    labels = list(dataset_lookup.keys())
    results = featurizer.process(
        dataset_paths=paths,
        dataset_labels=labels,
        return_results=True
    )

    # store the original dataset uid so that featurization results can be traced back to their source dataset
    results["dataset"] = results["label"].map(dataset_lookup)
    del results["label"]
    return results

# Featurize wildtype (WT) cells
wt_cells_afs = sc_datasets.filter(ulabels__name="WT")

# the two stimulation conditions are the two classes that a downstream classifier should tell apart
condition_uls = [
    ln.ULabel.get(name=stim_name)
    for stim_name in set(af.features.get_values()["stimulation"] for af in wt_cells_afs)
]

# map condition names to class labels
class_lookup = {"untreated": 0, "14h Torin-1": 1}

features = None
for condition_ul in condition_uls:
    cells = wt_cells_afs.filter(ulabels=condition_ul)
    results = featurize_datasets(cells)

    # save condition as a class label
    results["class"] = class_lookup[condition_ul.name]

    # concatenate results together
    if features is None:
        features = results
    else:
        features = pd.concat([features, results])

We upload the generated features to our instance.

In [4]:
artifact = ln.Artifact.from_df(
    features,
    description="featurized single-cell images",
    key="featurization_results/WT.parquet",
).save()
artifact.cell_lines.add(bt.CellLine.get(name="U2OS"))

artifact.features.add_values(
    {
        "study": "autophagy imaging",
        "artifact type": "single-cell image featurization results",
        "genotype": "WT",
    }
)

→ found artifact with same hash: Artifact(uid='dSQkaFNDfoyEsbG70002', is_latest=True, key='featurization_results/WT.parquet', description='featurized single-cell images', suffix='.parquet', kind='dataset', otype='DataFrame', size=144333, hash='T2VEWFxCEh9LcUO2nchY1Q', n_observations=566, space_id=1, storage_id=1, run_id=215, created_by_id=4, created_at=2025-03-06 15:08:44 UTC); to track this artifact as an input, use: ln.Artifact.get()


We repeat this process for the EI24KO (knockout) cells:

In [5]:
# Process KO cells to see if they behave differently
ko_cells_afs = sc_datasets.filter(ulabels__name="EI24KO")

# we have the same two conditions as before
condition_uls = [
    ln.ULabel.get(name=stimulation_name)
    for stimulation_name in set(
        af.features.get_values()["stimulation"] for af in ko_cells_afs
    )
]

features_ko = None
for condition_ul in condition_uls:
    cells = ko_cells_afs.filter(ulabels=condition_ul)
    results = featurize_datasets(cells)

    # save condition as a class label
    results["class"] = class_lookup[condition_ul.name]

    if features_ko is None:
        features_ko = results
    else:
        features_ko = pd.concat([features_ko, results])

In [6]:
artifact = ln.Artifact.from_df(
    features_ko,
    description="featurized single-cell images",
    key="featurization_results/EI24KO.parquet",
).save()
artifact.cell_lines.add(bt.CellLine.filter(name="U2OS").one())

# annotate with required metadata
artifact.features.add_values(
    {
        "study": "autophagy imaging",
        "artifact type": "single-cell image featurization results",
        "genotype": "EI24KO",
    }
)

→ found artifact with same hash: Artifact(uid='YCsZhphWpiECev3N0002', is_latest=True, key='featurization_results/EI24KO.parquet', description='featurized single-cell images', suffix='.parquet', kind='dataset', otype='DataFrame', size=198112, hash='1Yzh3gRMyKxZOM--oVVYCg', n_observations=848, space_id=1, storage_id=1, run_id=215, created_by_id=4, created_at=2025-03-06 15:08:53 UTC); to track this artifact as an input, use: ln.Artifact.get()


In [7]:
ln.finish()

• please hit CMD + s to save the notebook in your editor . ✓
! cells [(2, None), (None, 4)] were not run consecutively
→ finished Run('sHU2epGT') after 22s at 2025-03-06 15:09:30 UTC
→ go to: https://lamin.ai/scportrait/examples/transform/ujI2BcWc8AA60007
→ to update your notebook from the CLI, run: lamin save /Users/sophia/Documents/GitHub/lamin-usecases/docs/imaging3.ipynb
