[![lamindata](https://img.shields.io/badge/laminlabs/rxrx-mediumseagreen)](https://lamin.ai/laminlabs/rxrx/record/core/Transform?id=sx3wFSwnhCYYz8)

# RxRx

[rxrx.ai](https://rxrx.ai/) hosts datasets produced with [Recursion](https://www.recursion.com/)'s platform.

Here, we make these data accessible through a LaminDB instance: [laminlabs/rxrx](https://lamin.ai/laminlabs/rxrx).

Specifically, we explore [RxRx1](https://www.rxrx.ai/rxrx1): 125k images for 1138 siRNA perturbations across 4 cell lines, reading out 5 stains, with image dimensions of 512x512x6.

## Setup

In [None]:
!lamin load laminlabs/rxrx

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import lnschema_lamin1 as ln1

## Search & look up metadata

We'll find all treatments in the `Treatment` registry:

In [None]:
df = ln1.Treatment.filter().df()

There are 1572 treatments in total:

In [None]:
df.shape

And two types of treatments:

In [None]:
df.system.unique()

Let us create a lookup object for siRNAs so that we can easily auto-complete queries involving them:

In [None]:
sirnas = ln1.Treatment.filter(system="siRNA").lookup(return_field="name")

We're also interested in measured features, cell lines & wells:

In [None]:
features = ln.Feature.lookup(return_field="name")
cell_lines = lb.CellLine.lookup(return_field="abbr")
wells = ln1.Well.lookup(return_field="name")
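
To make the idea of a lookup object concrete, here is a minimal sketch of the pattern: record names are normalized into attribute names, and each attribute returns the chosen field. This is not LaminDB's implementation; `make_lookup` and the toy records are hypothetical and only illustrate why attribute access plays well with auto-completion.

```python
import re
from types import SimpleNamespace


def make_lookup(records, key_field, return_field):
    """Build an attribute-access lookup from dict records (illustrative only)."""
    entries = {}
    for record in records:
        # Turn a display name such as "Hep G2 cell" into the identifier "hep_g2_cell".
        attr = re.sub(r"\W+", "_", record[key_field]).strip("_").lower()
        # Identifiers cannot start with a digit, so prefix those with an underscore.
        if attr and attr[0].isdigit():
            attr = "_" + attr
        entries[attr] = record[return_field]
    return SimpleNamespace(**entries)


# Hypothetical records, not taken from the laminlabs/rxrx instance.
toy_cell_lines = [
    {"name": "Hep G2 cell", "abbr": "HEPG2"},
    {"name": "U-2 OS cell", "abbr": "U2OS"},
]

lookup = make_lookup(toy_cell_lines, key_field="name", return_field="abbr")
print(lookup.hep_g2_cell)
```

Typing `lookup.` in an interactive session then offers the normalized names, which is the convenience the lookup objects above provide for registries.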

## Load the dataset

In this instance, there is only a single dataset:

In [None]:
ln.Dataset.filter().df()

Let us get the corresponding object:

In [None]:
dataset = ln.Dataset.filter(uid="vPl0AmfOaiPEsceri2Dt").one()

In [None]:
dataset

The dataset consists of a metadata file:

In [None]:
dataset.file

And a universal path object that gives access to the individual images in a folder on GCP:

In [None]:
dataset.path

We can explore the individual image files like so:

In [None]:
ln.File.view_tree(dataset.path, level=2)

## Query image files

Because we chose not to register each image as a record in the {class}`~lamindb.File` registry, we have to query the images through the metadata file of the dataset:

In [None]:
df = dataset.file.load()

We can query a subset of images using metadata registries & pandas query syntax:

In [None]:
query = df[
    (df.cell_type == cell_lines.hep_g2_cell)
    & (df.sirna == sirnas.s19486)
    & (df.well == wells.l20)
    & (df.plate == "3")
    & (df.site == "2")
]

query
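
The filtering pattern above can be tried on a small, made-up table. The sketch below assumes only pandas; the rows and values in `toy` are hypothetical and merely mimic a few of the metadata columns. Each comparison needs its own parentheses because `&` binds more tightly than `==`, and `plate` and `site` are stored as strings here to match the string comparisons in the query above.

```python
import pandas as pd

# Hypothetical rows that mimic some metadata columns; not real RxRx1 data.
toy = pd.DataFrame(
    {
        "cell_type": ["HEPG2", "HEPG2", "U2OS"],
        "sirna": ["s19486", "s19486", "s19486"],
        "plate": ["3", "3", "3"],
        "site": ["1", "2", "2"],
    }
)

# Combine one boolean mask per condition with `&`.
mask = (
    (toy.cell_type == "HEPG2")
    & (toy.sirna == "s19486")
    & (toy.plate == "3")
    & (toy.site == "2")
)
subset = toy[mask]
print(subset)

# The same filter written with DataFrame.query.
same = toy.query(
    "cell_type == 'HEPG2' and sirna == 's19486' and plate == '3' and site == '2'"
)
print(subset.equals(same))
```

Both forms select the same rows; the mask form is the one that works directly with values returned by lookup objects.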

To access the individual images based on this query result, we join each file key onto the parent of the dataset path:

In [None]:
images = [dataset.path.parent / key for key in query.file_keys]

images
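
The path arithmetic can be illustrated with the standard library alone. In this sketch the folder and file keys are hypothetical, and it assumes that each key already starts with the name of the dataset folder, which would explain why the keys are joined onto `path.parent` rather than onto the path itself.

```python
from pathlib import PurePosixPath

# Hypothetical stand-ins for dataset.path and query.file_keys.
dataset_path = PurePosixPath("bucket/rxrx1/images")
file_keys = [
    "images/HEPG2-01/Plate3/L20_s2_w1.png",
    "images/HEPG2-01/Plate3/L20_s2_w2.png",
]

# Joining onto the parent avoids repeating the "images" folder in the result.
image_paths = [dataset_path.parent / key for key in file_keys]

for path in image_paths:
    print(path)
```

Joining onto `dataset_path` directly would yield `bucket/rxrx1/images/images/...` under the same assumption.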

### DuckDB

As an alternative to pandas, we could also use DuckDB to query the image metadata.

:::{dropdown}

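```
import duckdb

# Build a SQL condition from the lookup objects defined above
condition = (
    f"{features.cell_type} == '{cell_lines.hep_g2_cell}' and {features.sirna} =="
    f" '{sirnas.s19486}' and {features.well} == '{wells.l20}' and "
    f"{features.plate} == '3' and {features.site} == '2'"
)

# Read the dataset's metadata file as a DuckDB relation
parquet_data = duckdb.from_parquet(str(dataset.file.path))

# Keep only the rows that match the condition
parquet_data.filter(condition)
```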

:::