[![](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/rxrx-lamin/blob/main/docs/rxrx.ipynb)
[![lamindata](https://img.shields.io/badge/Source%20%26%20report%20on%20LaminHub-mediumseagreen)](https://lamin.ai/laminlabs/rxrx/record/core/Transform?uid=sx3wFSwnhCYYz8)

# RxRx: cell imaging

[rxrx.ai](https://rxrx.ai/) hosts high-throughput cell imaging datasets generated by [Recursion](https://www.recursion.com/).

Large numbers of fluorescence microscopy images characterize cellular phenotypes in vitro, based on morphology and protein expression (5-10 stains), across a range of conditions.

- In this guide, you'll see how to query some of these data using LaminDB: [laminlabs/rxrx](https://lamin.ai/laminlabs/rxrx).
- If you'd like to transfer data into your own LaminDB instance, see the [transfer guide](docs:transfer).
- If you'd like to understand how the `laminlabs/rxrx` instance was curated, see this [repository](https://github.com/laminlabs/rxrx-lamin).

## Setup

In [None]:
!lamin load laminlabs/rxrx

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import lnschema_lamin1 as ln1

## Search & look up metadata

We'll find all treatments in the `Treatment` registry:

In [None]:
df = ln1.Treatment.filter().df()
df.shape

Let's create a lookup object for siRNAs so that we can easily auto-complete queries involving them:

In [None]:
sirnas = ln1.Treatment.filter(system="siRNA").lookup(return_field="name")

We're also interested in features, cell lines & wells:

In [None]:
features = ln.Feature.lookup(return_field="name")
cell_lines = lb.CellLine.lookup(return_field="abbr")
wells = ln1.Well.lookup(return_field="name")
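
To make the idea of a lookup object concrete, here is a minimal stdlib-only sketch of how record names can be exposed as auto-completable attributes. It is not LaminDB's implementation: `make_lookup`, its normalization rule, and the example names are assumptions for illustration only.

```python
import re
from types import SimpleNamespace


def make_lookup(names):
    """Map each name to an attribute whose value is the original name.

    Hypothetical helper: the normalization rule (lowercase, runs of
    non-alphanumeric characters become underscores) is an assumption,
    not LaminDB's exact rule.
    """
    attrs = {}
    for name in names:
        key = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()
        if key and key[0].isdigit():
            key = f"n{key}"  # attribute names cannot start with a digit
        attrs[key] = name
    return SimpleNamespace(**attrs)


# Illustrative names only; the real registries hold many more records.
toy_wells = make_lookup(["L20", "B02"])
print(toy_wells.l20)  # L20
```

Because the attribute returns the stored name, an expression such as `toy_wells.l20` can be dropped directly into a comparison against a metadata column.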

## Load the dataset

In this instance, there is only a single dataset:

In [None]:
ln.Dataset.filter().df()

This is [RxRx1](https://www.rxrx.ai/rxrx1): 125k images for 1138 siRNA perturbations across 4 cell lines, reading out 5 stains; the image dimension is 512x512x6.

Let us get the corresponding object and some information about it:

In [None]:
dataset = ln.Dataset.filter(uid="flLeukogmLRzleFCpCRD").one()
dataset.view_flow()
dataset.describe()

The dataset consists of a metadata file and a folder path pointing to the image files:

In [None]:
dataset.file.load().head()

In [None]:
dataset.path

We can get an idea of the folder structure like so:

In [None]:
dataset.path.view_tree(level=2)
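
For readers unfamiliar with depth-limited tree views, the following stdlib sketch shows roughly what such a listing does on a local folder. The helper `tree_lines` and the folder names are hypothetical; this is not how `view_tree` is implemented, and its output format will differ.

```python
import tempfile
from pathlib import Path


def tree_lines(root, level=2, _depth=1, _indent=""):
    """Return an indented listing of root, descending at most `level` levels."""
    lines = []
    for child in sorted(Path(root).iterdir()):
        lines.append(f"{_indent}{child.name}{'/' if child.is_dir() else ''}")
        # Only recurse while we are above the requested depth limit.
        if child.is_dir() and _depth < level:
            lines += tree_lines(child, level, _depth + 1, _indent + "    ")
    return lines


# Build a small throwaway folder to demonstrate; the names are made up.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "images" / "plate1" / "deep").mkdir(parents=True)
    (Path(tmp) / "images" / "plate1" / "a.png").touch()
    (Path(tmp) / "metadata.parquet").touch()
    listing = tree_lines(tmp, level=2)

print("\n".join(listing))
```

With `level=2`, the contents of `plate1/` are not listed, which keeps the output short for folders holding many image files.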

Get an idea of all image files like so:

In [None]:
# dataset.path.view_tree()

## Query image files

Because we chose not to register each image as a record in the {class}`~lamindb.File` registry, we have to query the images through the metadata file of the dataset:

In [None]:
df = dataset.file.load()

We can query a subset of images using metadata registries & pandas query syntax:

In [None]:
query = df[
    (df.cell_line == cell_lines.hep_g2_cell)
    & (df.sirna == sirnas.s19486)
    & (df.well == wells.l20)
    & (df.plate == 3)
    & (df.site == 2)
]

query
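
The boolean-mask pattern above can also be written with `DataFrame.query`. The sketch below compares both forms on a toy table; the column names follow the query above, but the rows and values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the metadata table; values are invented.
toy = pd.DataFrame(
    {
        "cell_line": ["HEPG2", "HEPG2", "U2OS"],
        "sirna": ["s19486", "s19486", "s19486"],
        "well": ["L20", "L20", "L20"],
        "plate": [3, 3, 3],
        "site": [1, 2, 2],
        "path": ["a.png", "b.png", "c.png"],
    }
)

# Boolean-mask form: each comparison yields a Series of booleans, combined with &.
mask = (toy["cell_line"] == "HEPG2") & (toy["site"] == 2)
by_mask = toy[mask]

# Equivalent DataFrame.query form; @name refers to a local Python variable.
cell_line, site = "HEPG2", 2
by_query = toy.query("cell_line == @cell_line and site == @site")

assert by_mask.equals(by_query)
print(by_mask["path"].tolist())  # ['b.png']
```

The parentheses around each comparison in the mask form are required because `&` binds more tightly than `==` in Python.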

To access the individual images based on this query result:

In [None]:
images = [dataset.path.parent / key for key in query.path]

images
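
Joining onto `dataset.path.parent` suggests that the keys in the `path` column are relative to the parent of the dataset folder. The sketch below illustrates that join with `pathlib`; the bucket, folder, and file names are hypothetical and not taken from the actual dataset.

```python
from pathlib import PurePosixPath

# Hypothetical locations: a dataset folder and keys relative to its parent.
dataset_path = PurePosixPath("bucket/images")
keys = [
    "images/HEPG2-01/Plate3/L20_s2_w1.png",
    "images/HEPG2-01/Plate3/L20_s2_w2.png",
]

# The keys already start with the folder name, so they are joined onto the parent.
toy_images = [dataset_path.parent / key for key in keys]
print(toy_images[0])  # bucket/images/HEPG2-01/Plate3/L20_s2_w1.png
```

Joining onto `dataset_path` itself would instead duplicate the `images` segment in the result.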

:::{dropdown} Use DuckDB to query metadata

As an alternative to pandas, we could use DuckDB to query image metadata.

```
import duckdb

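# Build a SQL filter expression from the lookup objects.
# Named `filter_expr` so that it does not shadow the built-in `filter`.
# Column names mirror the pandas query above.
filter_expr = (
    f"{features.cell_line} == '{cell_lines.hep_g2_cell}' and {features.sirna} =="
    f" '{sirnas.s19486}' and {features.well} == '{wells.l20}' and "
    f"{features.plate} == '3' and {features.site} == '2'"
)

# Read the dataset's metadata file as a DuckDB relation.
parquet_data = duckdb.from_parquet(dataset.file.path.as_posix())

parquet_data.filter(filter_expr)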
```

:::