# Query & load data

- LaminDB lets you query data with SQL-derived `select` statements.
- Once data is queried, you can load it into memory using `load`.

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.nb.header()

## Basic select statements

LaminDB's {func}`~lamindb.select` statements are based on [SQLModel](https://sqlmodel.tiangolo.com) but offer further simplifications targeted at data scientists.[^sql]

[^sql]: Recall that SQLModel is a shallow wrapper around SQLAlchemy.

In [None]:
stmt = (
    ln.select(ln.DObject)
    .join(lns.Run)
    .join(lns.Notebook)
    .join(lns.User, handle="testuser1")
)

To access the query results encoded in `stmt` (a {class}`~lamindb.dev.db.SelectStmt`), execute the statement with one of:

- `.all()`: A list of records.
- `.df()`: A dataframe with each record stored as a row.
- `.one()`: Exactly one record. Raises an error if the query does not return exactly one record.
- `.one_or_none()`: Either one record or `None` if there is no query result.

For example:

In [None]:
stmt.all()
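
The difference between `.one()` and `.one_or_none()` listed above can be sketched in plain Python. This is only an illustration, not LaminDB's implementation: the helpers `one` and `one_or_none` are hypothetical, and the sketch assumes the behavior mirrors SQLAlchemy's methods of the same name, where more than one result is also an error.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def one(records: Sequence[T]) -> T:
    """Hypothetical helper: return the single record, raise otherwise."""
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, got {len(records)}")
    return records[0]


def one_or_none(records: Sequence[T]) -> Optional[T]:
    """Hypothetical helper: return None for an empty result, else behave like one()."""
    if not records:
        return None
    # Still raises if there are several records (assumed SQLAlchemy-like behavior).
    return one(records)


print(one(["record_a"]))  # record_a
print(one_or_none([]))    # None
```

Reach for `.one()` when a missing record indicates a bug, and for `.one_or_none()` when an empty result is a legitimate outcome that you want to handle yourself.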

It's often most convenient to use the built-in converter to DataFrames.

In [None]:
stmt.df()

## Arbitrary exploded views

Say we want all user information in this table.

In [None]:
stmt = (
    ln.select(ln.DObject, lns.User)
    .join(lns.Run, ln.DObject.run_id == lns.Run.id)
    .join(lns.Notebook)
    .join(lns.User)
)

In [None]:
stmt.df()

Say we want only the user handle.

In [None]:
stmt = (
    ln.select(ln.DObject, lns.User.handle)
    .join(lns.Run, ln.DObject.run_id == lns.Run.id)
    .join(lns.Notebook)
    .join(lns.User)
)

In [None]:
stmt.df()

Say we want only selected columns from the joined tables.

In [None]:
stmt = (
    ln.select(ln.DObject.name, ln.DObject.suffix, ln.DObject.size, lns.User.handle)
    .join(lns.Run, ln.DObject.run_id == lns.Run.id)
    .join(lns.Notebook)
    .join(lns.User)
)

In [None]:
df = stmt.df()

In [None]:
df
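
To build intuition for what such a statement amounts to in SQL, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `dobjects`, `runs`, and `users`, their columns, and all rows are made up for illustration. They are a simplified stand-in that omits the notebook table, not LaminDB's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id TEXT PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id TEXT PRIMARY KEY, user_id TEXT REFERENCES users(id));
    CREATE TABLE dobjects (
        id TEXT PRIMARY KEY, name TEXT, suffix TEXT, size INTEGER,
        run_id TEXT REFERENCES runs(id)
    );
    INSERT INTO users VALUES ('u1', 'testuser1');
    INSERT INTO runs VALUES ('r1', 'u1');
    INSERT INTO dobjects VALUES ('d1', 'table_a', '.parquet', 5000, 'r1');
    INSERT INTO dobjects VALUES ('d2', 'image_b', '.jpg', 29000, 'r1');
    """
)

# Pick individual columns from several joined tables.
rows = con.execute(
    """
    SELECT dobjects.name, dobjects.suffix, dobjects.size, users.handle
    FROM dobjects
    JOIN runs ON dobjects.run_id = runs.id
    JOIN users ON runs.user_id = users.id
    ORDER BY dobjects.name
    """
).fetchall()

for row in rows:
    print(row)
```

Each entity or column passed to `select` corresponds to an entry in the `SELECT` list, and each `.join(...)` corresponds to a `JOIN ... ON ...` clause.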

## More filtering

Let's subset to just the parquet files. We know there is exactly one, so we can get the record using `.one()`.

In [None]:
stmt = (
    ln.select(ln.DObject, suffix=".parquet")
    .join(lns.Run)
    .join(lns.Notebook)
    .join(lns.User, handle="testuser1")
)
stmt.one()

Or subset to files larger than 10 kB. Keyword arguments only express equality, so here we need an explicit `where` clause.

In [None]:
stmt = (
    ln.select(ln.DObject)
    .where(ln.DObject.size > 1e4)
    .join(lns.Run)
    .join(lns.Notebook)
    .join(lns.User)
    .where(lns.User.handle == "testuser1")
)
stmt.df()

Or select a notebook based on a substring in the name:

In [None]:
ln.select(lns.Notebook).where(lns.Notebook.name.contains("Ingest")).df()
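
The three kinds of filters used in this section (equality, comparison, substring match) map onto ordinary SQL `WHERE` clauses. The following sketch uses Python's built-in `sqlite3` module with a hypothetical `dobjects` table and made-up rows. It shows rough SQL equivalents and is not the SQL that LaminDB actually emits.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
con.execute("CREATE TABLE dobjects (name TEXT, suffix TEXT, size INTEGER)")
con.executemany(
    "INSERT INTO dobjects VALUES (?, ?, ?)",
    [("table_a", ".parquet", 5000), ("image_b", ".jpg", 29000)],
)

# Equality filter, comparable to a keyword argument such as suffix=".parquet".
by_suffix = con.execute(
    "SELECT name FROM dobjects WHERE suffix = ?", (".parquet",)
).fetchall()

# Comparison filter, comparable to .where(... .size > 1e4).
large = con.execute(
    "SELECT name FROM dobjects WHERE size > ?", (1e4,)
).fetchall()

# Substring filter, comparable to .where(... .name.contains("image")).
matches = con.execute(
    "SELECT name FROM dobjects WHERE name LIKE '%' || ? || '%'", ("image",)
).fetchall()

print(by_suffix, large, matches)
```

Note that SQLite's `LIKE` is case-insensitive for ASCII characters by default, so substring matches may behave differently on other database backends.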

## Load

Load data objects into the work environment via {meth}`~lamindb.DObject.load`:

In [None]:
dobject = ln.select(ln.DObject, name="iris").first()

df = dobject.load()

If there is a canonical in-memory representation (like a dataframe), data is loaded directly into memory.

In [None]:
df.head()

If no in-memory format can be found, `load` returns the filepath:

In [None]:
dobject = ln.select(ln.DObject).where(ln.DObject.name.contains("paradisi05")).one()

In [None]:
dobject.load()
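
The two outcomes of `load` described above (an in-memory object if a suitable format is known, otherwise the filepath) can be sketched as a dispatch on the file suffix. This is a hypothetical illustration, not LaminDB's implementation: the `LOADERS` registry, the `load` function, and the file names are assumptions made for the example, which uses only the standard library.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_csv(path: Path):
    # Read a CSV file into a list of row dictionaries.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def load_json(path: Path):
    return json.loads(path.read_text())


# Hypothetical registry: suffix -> function producing an in-memory object.
LOADERS = {".csv": load_csv, ".json": load_json}


def load(path: Path):
    """Return an in-memory object if a loader is registered, else the path itself."""
    loader = LOADERS.get(path.suffix)
    if loader is None:
        return path
    return loader(path)


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "table.csv"
    table.write_text("a,b\n1,2\n")
    image = Path(tmp) / "image.jpg"
    image.write_bytes(b"")

    loaded_table = load(table)  # parsed rows
    loaded_image = load(image)  # no loader registered, so the path comes back

print(loaded_table)
print(loaded_image.name)
```

Returning the path as a fallback lets the caller still open the file with a tool of their choice when no in-memory representation is available.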

## Using `SQLModel` sessions

If needed, you can also use the lower-level [SQLModel](https://sqlmodel.tiangolo.com) API via `ln.session()`.