[![](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-spatial/blob/main/docs/rxrx.ipynb)

# RxRx: cell imaging

[rxrx.ai](https://rxrx.ai/) hosts high-throughput cell imaging datasets generated by [Recursion](https://www.recursion.com/).

Large numbers of fluorescence microscopy images characterize cellular phenotypes in vitro, based on morphology and protein expression (5-10 stains), across a range of conditions.

- In this guide, you'll see how to query some of these data using LaminDB.
- If you'd like to transfer data into your own LaminDB instance, see the [transfer guide](inv:docs#transfer).

In [None]:
# !pip install 'lamindb[bionty,jupyter,gcp]' wetlab
!lamin load laminlabs/lamindata

In [None]:
import lamindb as ln
import bionty as bt
import wetlab as wl

## Search & look up metadata

We'll find all genetic treatments in the `GeneticPerturbation` registry:

In [None]:
df = wl.GeneticPerturbation.df()
df.shape

Let's create a lookup object for siRNAs so that we can auto-complete queries involving them:

In [None]:
sirnas = wl.GeneticPerturbation.filter(system="siRNA").lookup(return_field="name")

We're also interested in cell lines & wells:

In [None]:
cell_lines = bt.CellLine.lookup(return_field="abbr")
wells = wl.Well.lookup(return_field="name")

## Load the collection

This is [RxRx1](https://www.rxrx.ai/rxrx1): 125k images for 1138 siRNA perturbations across 4 cell lines, reading out 5 stains, with image dimensions of 512x512x6.

Let us get the corresponding object and some information about it:

In [None]:
collection = ln.Collection.get("Br2Z1lVSQBAkkbbt7ILu")
collection.view_lineage()
collection.describe()

The dataset consists of a metadata file and a folder path pointing to the image files:

In [None]:
collection.meta_artifact.load().head()

## Query image files

Because we chose not to register each image as a record in the {class}`~lamindb.Artifact` registry, we have to query the images through the metadata file of the dataset:

In [None]:
df = collection.meta_artifact.load()

We can query a subset of images using metadata registries & pandas query syntax:

In [None]:
query = df[
    (df.cell_line == cell_lines.hep_g2_cell)
    & (df.sirna == sirnas.s15652)
    & (df.well == wells.m15)
    & (df.plate == 1)
    & (df.site == 2)
]
query
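The same filtering pattern can be tried without access to the instance. The sketch below uses a small made-up table in place of the RxRx1 metadata (the column values and file names are hypothetical) and shows that the boolean-mask form and the string-based `DataFrame.query` form select the same rows:

```python
import pandas as pd

# Toy stand-in for the metadata table; all values are made up for illustration.
toy = pd.DataFrame(
    {
        "cell_line": ["HEPG2", "HEPG2", "HUVEC"],
        "sirna": ["s15652", "s15652", "s15652"],
        "well": ["M15", "M15", "M15"],
        "plate": [1, 1, 1],
        "site": [1, 2, 2],
        "path": ["a.png", "b.png", "c.png"],
    }
)

# Boolean-mask form: combine per-column comparisons with `&`.
mask = (toy.cell_line == "HEPG2") & (toy.site == 2)
by_mask = toy[mask]

# String form with DataFrame.query; `@name` refers to a local Python variable.
cell_line = "HEPG2"
by_query = toy.query("cell_line == @cell_line and site == 2")

# Both forms select the same rows.
assert by_mask.equals(by_query)
```

The mask form works well with lookup objects because the comparison values are plain Python attributes; the string form is shorter when the values are literals.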

To access the individual images based on this query result:

In [None]:
collection.data_artifact.storage.root

In [None]:
images = [f"{collection.data_artifact.storage.root}/{key}" for key in query.path]
images
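Joining the root and key with an f-string assumes that the root has no trailing slash and the keys have no leading slash. A minimal sketch of a more defensive join, using a hypothetical helper `join_storage_key` and a made-up bucket name:

```python
def join_storage_key(root: str, key: str) -> str:
    """Join a storage root and a relative key with exactly one slash."""
    return f"{root.rstrip('/')}/{key.lstrip('/')}"


# Hypothetical root and keys, for illustration only.
root = "gs://example-bucket/rxrx1/"
keys = ["images/a.png", "/images/b.png"]

urls = [join_storage_key(root, key) for key in keys]
```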

Download an image to disk:

In [None]:
path = ln.UPath(images[1])
path.download_to(".")

In [None]:
from IPython.display import Image

Image(f"./{path.name}")

## Use DuckDB to query metadata

As an alternative to pandas, we can use DuckDB to query the image metadata directly from the parquet file:

In [None]:
import duckdb  # pip install duckdb

features = ln.Feature.lookup(return_field="name")

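# Build the SQL filter from registry lookups; the variable is named
# `filter_expr` to avoid shadowing Python's built-in `filter`
filter_expr = (
    f"{features.cell_line} == '{cell_lines.hep_g2_cell}' and {features.sirna} =="
    f" '{sirnas.s15652}' and {features.well} == '{wells.m15}' and "
    f"{features.plate} == '1' and {features.site} == '2'"
)

# Read the metadata parquet file directly from cloud storage
region = ln.setup.settings.storage.region
parquet_data = duckdb.from_parquet(
    collection.meta_artifact.path.as_posix() + f"?s3_region={region}"
)

parquet_data.filter(filter_expr)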
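Assembling the filter string by hand gets error-prone as the number of conditions grows. One possible approach, sketched here with a hypothetical helper `build_filter` that is not part of LaminDB or DuckDB, is to generate the expression from a dictionary of column/value pairs. String values are single-quoted with embedded quotes doubled; column names are assumed to be plain identifiers and are not quoted:

```python
def build_filter(conditions: dict[str, object]) -> str:
    """Build a SQL filter expression that ANDs together equality conditions."""
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, str):
            # Escape single quotes by doubling them, as in standard SQL.
            escaped = value.replace("'", "''")
            clauses.append(f"{column} = '{escaped}'")
        else:
            # Non-string values (e.g. integers) are inserted unquoted.
            clauses.append(f"{column} = {value}")
    return " AND ".join(clauses)


# Hypothetical values, for illustration only.
example_filter = build_filter({"cell_line": "HEPG2", "plate": 1, "site": 2})
```

The resulting string could then be passed to the relation's `filter` method in place of the hand-written expression.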