# Query & load data: `select` & `load`

- LaminDB allows querying data based on SQL-derived `select` statements.
- Once data is queried, you can load it into memory using `load`.

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.nb.header()

## Select

LaminDB's {func}`~lamindb.select` statements are based on [SQLModel](https://sqlmodel.tiangolo.com) but offer further simplifications targeted at data scientists.[^sql]

[^sql]: SQLModel is itself a thin wrapper around SQLAlchemy, which provides the canonical way of creating SQL statements from Python.

In [None]:
stmt = (
    ln.select(lns.DObject)
    .join(lns.Run)
    .join(lns.Jupynb)
    .join(lns.User, handle="testuser1")
)

To access the query results encoded in `stmt` (a `SelectStmt`), execute the statement with one of:

- `.all()`: A list of records.
- `.df()`: A dataframe with each record stored as a row.
- `.one()`: Exactly one record. Will raise an error if there is none.
- `.one_or_none()`: Either one record or `None` if there is no query result.

For example:

In [None]:
stmt.all()

It's often most convenient to use the built-in conversion to a DataFrame.

In [None]:
stmt.df()
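The difference between `.all()`, `.one()`, and `.one_or_none()` can be tried out without a LaminDB instance. The following is a minimal sketch, assuming these methods follow the SQLAlchemy result semantics they are named after. The `dobject` table and the helper functions `all_`, `one`, `one_or_none`, and `query` are hypothetical stand-ins built on the standard-library `sqlite3` module, not LaminDB's implementation.

```python
import sqlite3

# Hypothetical stand-in table; this is not the LaminDB schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dobject (id TEXT PRIMARY KEY, name TEXT, suffix TEXT)")
conn.executemany(
    "INSERT INTO dobject VALUES (?, ?, ?)",
    [("a1", "iris", ".parquet"), ("b2", "images", ".jpg"), ("c3", "labels", ".jpg")],
)


def all_(rows):
    """Return every record as a list (possibly empty)."""
    return list(rows)


def one_or_none(rows):
    """Return the single record, or None if there is none; raise if there are several."""
    rows = list(rows)
    if len(rows) > 1:
        raise ValueError("multiple records found")
    return rows[0] if rows else None


def one(rows):
    """Return the single record; raise if there is none or more than one."""
    record = one_or_none(rows)
    if record is None:
        raise ValueError("no record found")
    return record


def query(suffix):
    """Fetch the names of all records with the given suffix."""
    sql = "SELECT name FROM dobject WHERE suffix = ? ORDER BY name"
    return conn.execute(sql, (suffix,)).fetchall()


parquet = one(query(".parquet"))      # exactly one match
jpgs = all_(query(".jpg"))            # two matches, returned as a list
missing = one_or_none(query(".csv"))  # no match, so None
```

In this sketch, `one(query(".jpg"))` raises because two records match, which is why `.one()` is only appropriate when you know the result is unique.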

## Arbitrary exploded views

Say we want all user information in this table.

In [None]:
stmt = (
    ln.select(lns.DObject, lns.User)
    .join(lns.Run, lns.DObject.run_id == lns.Run.id)
    .join(lns.Jupynb)
    .join(lns.User)
)

In [None]:
stmt.df()

Say we only want the user handle.

In [None]:
stmt = (
    ln.select(lns.DObject, lns.User.handle)
    .join(lns.Run, lns.DObject.run_id == lns.Run.id)
    .join(lns.Jupynb)
    .join(lns.User)
)

In [None]:
stmt.df()

Say we only want selected information from all tables.

In [None]:
stmt = (
    ln.select(lns.DObject.name, lns.DObject.suffix, lns.DObject.size, lns.User.handle)
    .join(lns.Run, lns.DObject.run_id == lns.Run.id)
    .join(lns.Jupynb)
    .join(lns.User)
)

In [None]:
df = stmt.df()

In [None]:
df
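Selecting individual columns across joined tables corresponds to a column projection over a SQL join. The following sketch shows the idea in plain SQL using the standard-library `sqlite3` module. The table names (`dobject`, `run`, `jupynb`, `usr`), their foreign keys (`run_id`, `jupynb_id`, `created_by`), and the sample rows are assumptions made for illustration and are not the actual LaminDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us convert rows to dicts by column name

# Hypothetical, simplified tables; the real schema may differ.
conn.executescript(
    """
    CREATE TABLE usr (id TEXT PRIMARY KEY, handle TEXT);
    CREATE TABLE jupynb (id TEXT PRIMARY KEY, name TEXT, created_by TEXT);
    CREATE TABLE run (id TEXT PRIMARY KEY, jupynb_id TEXT);
    CREATE TABLE dobject (
        id TEXT PRIMARY KEY, name TEXT, suffix TEXT, size INTEGER, run_id TEXT
    );
    INSERT INTO usr VALUES ('u1', 'testuser1'), ('u2', 'testuser2');
    INSERT INTO jupynb VALUES ('n1', 'Ingest data', 'u1'), ('n2', 'Query data', 'u2');
    INSERT INTO run VALUES ('r1', 'n1'), ('r2', 'n2');
    INSERT INTO dobject VALUES
        ('d1', 'iris', '.parquet', 5000, 'r1'),
        ('d2', 'images', '.jpg', 20000, 'r1'),
        ('d3', 'labels', '.csv', 300, 'r2');
    """
)

# Project four columns out of the four-table join.
rows = conn.execute(
    """
    SELECT dobject.name, dobject.suffix, dobject.size, usr.handle
    FROM dobject
    JOIN run ON dobject.run_id = run.id
    JOIN jupynb ON run.jupynb_id = jupynb.id
    JOIN usr ON jupynb.created_by = usr.id
    ORDER BY dobject.name
    """
).fetchall()

records = [dict(row) for row in rows]
```

Each entry in `records` is one row of the exploded view, with only the projected columns as keys.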

## More subsetting

Let us subset to the parquet files. We know there is exactly one, so we can get the record using `.one()`.

In [None]:
(
    ln.select(lns.DObject, suffix=".parquet")
    .join(lns.Run)
    .join(lns.Jupynb)
    .join(lns.User, handle="testuser1")
    .one()
)

Or subset to files larger than 10 kB. Keyword arguments only express equality, so this comparison needs an explicit `.where()` clause.

In [None]:
(
    ln.select(lns.DObject)
    .where(lns.DObject.size > 1e4)
    .join(lns.Run)
    .join(lns.Jupynb)
    .join(lns.User)
    .where(lns.User.handle == "testuser1")
    .df()
)

Or select a notebook based on a substring in the name:

In [None]:
ln.select(lns.Jupynb).where(lns.Jupynb.name.contains("Ingest")).df()

Or select datasets based on a gene symbol.

In [None]:
(
    ln.select(lns.DObject)
    .join(lns.wetlab.DObjectBiometa)
    .join(lns.wetlab.Biometa)
    .join(lns.bionty.Featureset)
    .join(lns.bionty.FeaturesetGene)
    .join(lns.bionty.Gene)
    .where(lns.bionty.Gene.symbol == "Actg1")
    .df()
)
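The three kinds of filters used in this section (keyword arguments, `.where()` comparisons, and `.contains()`) map onto ordinary SQL predicates. The sketch below shows the rough SQL equivalents using the standard-library `sqlite3` module. The tables and rows are hypothetical, and the exact SQL that LaminDB emits may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in tables and rows for illustration only.
conn.execute("CREATE TABLE dobject (name TEXT, suffix TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO dobject VALUES (?, ?, ?)",
    [("iris", ".parquet", 5000), ("images", ".jpg", 20000), ("scan", ".jpg", 900)],
)
conn.execute("CREATE TABLE jupynb (name TEXT)")
conn.executemany(
    "INSERT INTO jupynb VALUES (?)",
    [("Ingest data",), ("Query data",)],
)

# A keyword argument such as suffix=".parquet" is an equality predicate.
by_suffix = conn.execute(
    "SELECT name FROM dobject WHERE suffix = ?", (".parquet",)
).fetchall()

# A comparison such as size > 1e4 cannot be written as a keyword argument.
large = conn.execute(
    "SELECT name FROM dobject WHERE size > ?", (10_000,)
).fetchall()

# A substring match such as name.contains("Ingest") becomes a LIKE pattern.
# Note: SQLite's LIKE is case-insensitive for ASCII by default; other
# database backends may behave differently.
ingest = conn.execute(
    "SELECT name FROM jupynb WHERE name LIKE '%' || ? || '%'", ("Ingest",)
).fetchall()
```

Passing the values as `?` parameters rather than formatting them into the string keeps the sketch safe against SQL injection, which is also what SQLAlchemy-based query builders do under the hood.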

```{note}

Write an example involving `lookup.readout.single_cell_RNA_sequencing`.

```

## Load

Load data objects into memory via {func}`~lamindb.load`:

In [None]:
dobject = ln.select(lns.DObject, name="iris").first()

df = ln.load(dobject)

In [None]:
df.head()

If no in-memory format can be found, `load` returns the filepath:

In [None]:
dobject = ln.select(lns.DObject).where(lns.DObject.name.contains("paradisi05")).one()

In [None]:
ln.load(dobject)
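The fallback to a filepath can be pictured as a dispatch on the file suffix. The following is a minimal sketch of that idea, not LaminDB's implementation: the `LOADERS` mapping, the `load_sketch` function, and the file names are hypothetical, and only standard-library readers are used.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_json(path):
    """Parse a JSON file into Python objects."""
    with open(path) as f:
        return json.load(f)


def read_csv(path):
    """Parse a CSV file into a list of dicts, one per row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical registry mapping file suffixes to in-memory loaders.
LOADERS = {".json": read_json, ".csv": read_csv}


def load_sketch(path):
    """Load a file into memory if a loader is registered, else return its path."""
    path = Path(path)
    loader = LOADERS.get(path.suffix)
    if loader is None:
        return path  # no in-memory format known: hand back the filepath
    return loader(path)


with tempfile.TemporaryDirectory() as tmp:
    table_path = Path(tmp) / "table.csv"
    table_path.write_text("sepal_length,species\n5.1,setosa\n")
    image_path = Path(tmp) / "image.jpg"
    image_path.write_bytes(b"\xff\xd8")

    loaded_table = load_sketch(table_path)  # parsed rows
    loaded_image = load_sketch(image_path)  # the path itself
```

Returning the path instead of raising lets the caller decide how to open formats the loader does not know about.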

## Using `SQLModel` sessions

If needed, you can also use the lower-level [SQLModel](https://sqlmodel.tiangolo.com) API via `ln.session()`.