# Query & load data: `select` & `load`

- LaminDB allows querying data based on SQL-derived `select` statements.
- Once data is queried, you can load it into memory using `load`.

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.nb.header()

## Query a single table with AND constraints

LaminDB's {func}`~lamindb.select` statements are based on [SQLModel](https://sqlmodel.tiangolo.com) but offer further simplifications targeted at data scientists.

To select data from a single table based on fields of that table, provide constraints directly as keyword arguments:

In [None]:
dtransform = ln.select(lns.DTransform, jupynb_id="W9sKZ8VKmLhY", jupynb_v="1").one()
dobjects = ln.select(  # query all parquet files ingested from a given notebook
    lns.DObject, suffix=".parquet", dtransform_id=dtransform.id
)
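Conceptually, keyword constraints like these are combined with AND, much like equality conditions in a SQL `WHERE` clause. The sketch below illustrates the idea with Python's built-in `sqlite3` and a simplified stand-in for the `dobject` table. The helper `select_where`, the table layout, and the sample rows are assumptions made for illustration, not LaminDB's implementation.

```python
import sqlite3

# Hypothetical, simplified stand-in for the `dobject` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dobject (id TEXT PRIMARY KEY, name TEXT, suffix TEXT, dtransform_id TEXT)"
)
conn.executemany(
    "INSERT INTO dobject VALUES (?, ?, ?, ?)",
    [
        ("a1", "iris", ".parquet", "t1"),
        ("a2", "nuclei", ".jpg", "t1"),
        ("a3", "pbmc", ".parquet", "t2"),
    ],
)


def select_where(table, **constraints):
    """Return rows of `table` that match all keyword constraints (joined by AND)."""
    # Table and column names are interpolated, so pass only trusted identifiers;
    # the values themselves are sent as bound parameters.
    sql = f"SELECT * FROM {table}"
    if constraints:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in constraints)
    return conn.execute(sql, tuple(constraints.values())).fetchall()


# Both conditions must hold: suffix AND dtransform_id.
rows = select_where("dobject", suffix=".parquet", dtransform_id="t1")
print(rows)
```

Only the row that satisfies both conditions is returned; calling `select_where("dobject")` without constraints returns every row.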

To access the query results encoded in `dobjects` (a `SelectStmt`), execute the statement with one of:

- `.all()`: A list of records.
- `.df()`: A dataframe with each record stored as a row.
- `.one()`: Exactly one record. Raises an error if there is no result or more than one.
- `.one_or_none()`: Either one record or `None` if there is no query result.

For example:

In [None]:
ln.select(lns.DObject, suffix=".parquet", dtransform_id=dtransform.id).all()

You can call `.df()` to return a `DataFrame` instead.

In [None]:
ln.select(lns.DObject, suffix=".parquet", dtransform_id=dtransform.id).df()
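The difference between `.one()` and `.one_or_none()` can be sketched in plain Python. This is a minimal illustration that assumes the methods behave like SQLAlchemy's result methods of the same names; the functions `one` and `one_or_none` below are hypothetical stand-ins that operate on a list, and `ValueError` stands in for the library's own exception types.

```python
def one(records):
    """Return the single record; raise if there are zero or several."""
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, got {len(records)}")
    return records[0]


def one_or_none(records):
    """Return the single record, None if there is no record; raise if several."""
    if not records:
        return None
    return one(records)


single = one(["a1"])          # exactly one record: returned as is
missing = one_or_none([])     # no record: None instead of an error
print(single, missing)
```

Use the strict variant when a missing record indicates a bug, and the lenient one when an empty result is an expected outcome that the calling code handles.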

If no constraints are passed, the select returns all rows in the table.

In [None]:
ln.select(lns.DObject).df()

## Query a single table with arbitrary constraints

For more general queries, LaminDB exposes the full range of SQL expressions through SQLAlchemy.

Use them via expressions, for example:

In [None]:
ln.select(lns.DObject).where(
    lns.DObject.created_at > "2022-08"
).df()  # data objects more recent than August 2022
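To see what such a comparison amounts to in plain SQL, here is a small sketch using Python's built-in `sqlite3`. It assumes timestamps are stored as ISO 8601 text, which sorts chronologically under string comparison; the table layout and sample rows are hypothetical and not taken from a LaminDB instance.

```python
import sqlite3

# Hypothetical, simplified `dobject` table with text timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dobject (id TEXT PRIMARY KEY, name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO dobject VALUES (?, ?, ?)",
    [
        ("a1", "old", "2022-07-15 10:00:00"),
        ("a2", "new", "2022-08-02 09:30:00"),
        ("a3", "newer", "2022-09-20 14:45:00"),
    ],
)

# ISO 8601 strings compare lexicographically in chronological order,
# so a partial date such as "2022-08" works as a lower bound.
recent = conn.execute(
    "SELECT name FROM dobject WHERE created_at > ? ORDER BY created_at",
    ("2022-08",),
).fetchall()
print(recent)
```

In this sketch the bound `"2022-08"` also matches rows created during August 2022 itself, because any full timestamp in that month compares greater than the bare prefix.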

## Load data objects

Load data objects into memory via {func}`~lamindb.load`:

In [None]:
dobject = ln.select(lns.DObject, name="iris").first()

df = ln.load(dobject)

In [None]:
df.head()

If no in-memory format can be found, `load` returns the filepath:

In [None]:
dobject = ln.select(lns.DObject, name="paradisi05_laminopathic_nuclei").one()

ln.load(dobject)
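The fallback behavior can be pictured as a lookup from file suffix to reader function. The sketch below is an assumption about how such a dispatch might look, not LaminDB's actual mechanism; `READERS`, `load_or_path`, and the file names are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_json(path):
    with open(path) as f:
        return json.load(f)


def read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical registry of suffixes that have an in-memory representation.
READERS = {".json": read_json, ".csv": read_csv}


def load_or_path(path):
    """Load the file into memory if a reader is known, else return its path."""
    path = Path(path)
    reader = READERS.get(path.suffix)
    if reader is None:
        return path  # no in-memory format available
    return reader(path)


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "iris.json"
    table.write_text(json.dumps({"sepal_length": [5.1, 4.9]}))
    image = Path(tmp) / "nuclei.jpg"
    image.write_bytes(b"")

    loaded = load_or_path(table)    # parsed into a dict
    fallback = load_or_path(image)  # unknown suffix: path returned unchanged

print(loaded)
print(fallback.name)
```

Returning the path rather than raising lets the caller hand the file to a specialized reader, for example an image library.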

## Query data by linked entities

You can select data objects by fields that are not present in the `dobject` table via linked entities.

You can do this by providing a `where` dictionary.

In [None]:
# The next version will bring this back with a canonical API!

In [None]:
# ln.select.dobject(where=dict(jupynb=dict(name="Ingest data: `Ingest`"))).df()

In [None]:
# ln.select.dobject(suffix=".h5ad", where=dict(gene=dict(symbol="Actg1"))).df()

`where` can combine conditions from multiple entities.

In [None]:
# from bioreadout import lookup

# ln.select.dobject(
#     where=dict(
#         gene=dict(ncbi_gene_id=66722),
#         readout=dict(efo_id=lookup.readout.single_cell_RNA_sequencing),
#     )
# ).df()

Query data objects by user:

In [None]:
# ln.select.dobject(where=dict(user=dict(name="Test User1"))).df()
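While these calls are commented out, the underlying idea can still be illustrated in plain SQL: filtering by a linked entity amounts to joining the linked tables and constraining a field of the joined table. The sketch below uses Python's built-in `sqlite3`; the schema is a hypothetical simplification (a `dobject` linked to a `dtransform`, which is linked to a user) and does not reproduce the real LaminDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified schema for illustration only.
    CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE dtransform (
        id TEXT PRIMARY KEY,
        created_by TEXT REFERENCES users(id)
    );
    CREATE TABLE dobject (
        id TEXT PRIMARY KEY,
        name TEXT,
        dtransform_id TEXT REFERENCES dtransform(id)
    );
    INSERT INTO users VALUES ('u1', 'Test User1'), ('u2', 'Test User2');
    INSERT INTO dtransform VALUES ('t1', 'u1'), ('t2', 'u2');
    INSERT INTO dobject VALUES ('a1', 'iris', 't1'), ('a2', 'pbmc', 't2');
    """
)

# Select data objects by a field of a linked entity (the user's name).
rows = conn.execute(
    """
    SELECT dobject.id, dobject.name
    FROM dobject
    JOIN dtransform ON dobject.dtransform_id = dtransform.id
    JOIN users ON dtransform.created_by = users.id
    WHERE users.name = ?
    """,
    ("Test User1",),
).fetchall()
print(rows)
```

The `dobject` table has no user name column; the constraint is applied to the joined `users` table, and only the matching data objects are returned.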

## Using `SQLModel` sessions

If needed, you can also use the lower level [SQLModel](https://sqlmodel.tiangolo.com) API via `ln.session()`.

For instance, let's trace the source of a data object via its linked `dtransform`:

In [None]:
with ln.session() as session:
    dtransform = session.get(lns.DTransform, dobject.dtransform_id)

Inspecting the result, we see that the data object originates from a Jupyter notebook.

In [None]:
dtransform

In [None]:
with ln.session() as session:
    jupynb = session.get(
        lns.Jupynb,
        (dtransform.jupynb_id, dtransform.jupynb_v),  # it's version "1" see jupynb_v
    )

In [None]:
jupynb

Finally, we can look up the user recorded in the notebook's `created_by` field:

In [None]:
with ln.session() as session:
    user = session.get(lns.User, jupynb.created_by)

In [None]:
user