# Query data

Querying is based on [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy) `select` statements.

LaminDB comes with useful default entities to query for, centered around data lineage: {mod}`lamindb.schema`.

You can also readily get started with querying for biological entities: {doc}`/guide/features`.

```{toctree}
:hidden:

query-book
```

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.track()

## Basic select statements

LaminDB's {func}`~lamindb.select` statements expose everything [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy) `select` statements offer. For the convenience of ML and data scientists, they can also be executed in a single line with `.all()` or `.df()`.

In [None]:
stmt = (
    ln.select(ln.File)
    .join(ln.Run)
    .join(ln.Transform)
    .join(lns.User, handle="testuser1")
)

To access the query results encoded in `stmt` (a {class}`~lamindb.dev.db.SelectStmt`), execute the statement with one of the following:

- `.all()`: A list of records.
- `.df()`: A dataframe with each record stored as a row.
- `.one()`: Exactly one record. Raises an error if there is no record or more than one.
- `.one_or_none()`: Either one record or `None` if there is no query result.

For example:

In [None]:
stmt.all()

It's often most convenient to use the built-in conversion to a DataFrame.

In [None]:
stmt.df()
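
The difference between `.all()`, `.one()`, and `.one_or_none()` can be sketched in plain Python. This is only an illustration of the behavior described above, not LaminDB's implementation: the helper names and the `LookupError` are hypothetical, and the handling of multiple matches assumes the usual SQLAlchemy semantics.

```python
# Hypothetical stand-ins for the execution methods, operating on a plain list
# of records instead of a database query.

def all_records(records):
    """Like `.all()`: every matching record as a list (possibly empty)."""
    return list(records)


def one(records):
    """Like `.one()`: exactly one record, otherwise an error."""
    records = list(records)
    if len(records) != 1:
        raise LookupError(f"expected exactly one record, got {len(records)}")
    return records[0]


def one_or_none(records):
    """Like `.one_or_none()`: one record, or None if nothing matched."""
    records = list(records)
    if len(records) > 1:
        raise LookupError(f"expected at most one record, got {len(records)}")
    return records[0] if records else None


single_match = ["record_a"]  # toy data
no_match = []

print(all_records(no_match))     # an empty list, no error
print(one(single_match))         # the single record
print(one_or_none(no_match))     # None instead of an error
```

Calling `one(no_match)` in this sketch raises, which mirrors why `.one_or_none()` is the safer choice when a query may legitimately return nothing.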

## Arbitrary exploded views

Say we also want all user information in this table.

In [None]:
stmt = (
    ln.select(ln.File, lns.User)
    .join(ln.Run, ln.File.source_id == ln.Run.id)
    .join(ln.Transform)
    .join(lns.User)
)

In [None]:
stmt.df()

Say we only want the user handle.

In [None]:
stmt = (
    ln.select(ln.File, lns.User.handle)
    .join(ln.Run, ln.File.source_id == ln.Run.id)
    .join(ln.Transform)
    .join(lns.User)
)

In [None]:
stmt.df()

Say we only want selected information from all tables.

In [None]:
stmt = (
    ln.select(ln.File.name, ln.File.suffix, ln.File.size, lns.User.handle)
    .join(ln.Run, ln.File.source_id == ln.Run.id)
    .join(ln.Transform)
    .join(lns.User)
)

In [None]:
df = stmt.df()

In [None]:
df
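
The statements in this section differ only in which columns they select; the join chain stays the same. As a rough sketch of what such a statement corresponds to in SQL, here is the last query against an in-memory SQLite database using Python's built-in `sqlite3`. The table names, the toy rows, and every join key except `source_id` are assumptions made for illustration and do not reflect the actual LaminDB schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical, simplified tables; only files.source_id -> runs.id is taken
# from the statements above, the other foreign keys are assumed.
con.executescript(
    """
    CREATE TABLE users (id TEXT PRIMARY KEY, handle TEXT);
    CREATE TABLE transforms (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE runs (id TEXT PRIMARY KEY, transform_id TEXT, created_by TEXT);
    CREATE TABLE files (
        id TEXT PRIMARY KEY, name TEXT, suffix TEXT, size INTEGER, source_id TEXT
    );
    INSERT INTO users VALUES ('u1', 'testuser1');
    INSERT INTO transforms VALUES ('t1', 'toy notebook');
    INSERT INTO runs VALUES ('r1', 't1', 'u1');
    INSERT INTO files VALUES ('f1', 'table_a', '.parquet', 5000, 'r1');
    INSERT INTO files VALUES ('f2', 'image_b', '.jpg', 29000, 'r1');
    """
)

# Select individual columns from several tables, one output row per file.
rows = con.execute(
    """
    SELECT files.name, files.suffix, files.size, users.handle
    FROM files
    JOIN runs ON files.source_id = runs.id
    JOIN transforms ON runs.transform_id = transforms.id
    JOIN users ON runs.created_by = users.id
    ORDER BY files.name
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row combines file columns with the handle of the associated user, which is the "exploded" shape that `.df()` returns as a dataframe.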

## More filtering

Let us subset to just the parquet files. We know there is exactly one, so we can retrieve the record with `.one()`.

In [None]:
stmt = (
    ln.select(ln.File, suffix=".parquet")
    .join(ln.Run)
    .join(ln.Transform)
    .join(lns.User, handle="testuser1")
)
stmt.one()

Or subset to files larger than 10 kB. Keyword arguments can't express this comparison, so we need an explicit `.where()` clause.

In [None]:
stmt = (
    ln.select(ln.File)
    .where(ln.File.size > 1e4)
    .join(ln.Run)
    .join(ln.Transform)
    .join(lns.User)
    .where(lns.User.handle == "testuser1")
)
stmt.df()

Or select a notebook based on a substring in the name:

In [None]:
ln.select(ln.Transform).where(ln.Transform.name.contains("Track")).df()
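
To make the three kinds of filters in this section concrete, the following sketch runs their approximate SQL equivalents with Python's built-in `sqlite3`: an equality filter (the keyword-argument form), a comparison, and a substring match. The single `files` table and its rows are hypothetical toy data, not the LaminDB schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (name TEXT, suffix TEXT, size INTEGER)")
con.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("small_table", ".parquet", 5000),   # toy rows
        ("large_image", ".jpg", 29000),
        ("Track data notes", ".txt", 12000),
    ],
)

# Equality filter, comparable to passing suffix=".parquet" as a keyword.
parquet = con.execute(
    "SELECT name FROM files WHERE suffix = ?", (".parquet",)
).fetchall()

# Comparison filter, comparable to .where(File.size > 1e4).
large = con.execute(
    "SELECT name FROM files WHERE size > ?", (10_000,)
).fetchall()

# Substring filter, comparable to .where(name.contains("Track")).
tracked = con.execute(
    "SELECT name FROM files WHERE name LIKE '%' || ? || '%'", ("Track",)
).fetchall()

print(parquet)
print(large)
print(tracked)
```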

## Load

Load data objects into the work environment via {meth}`~lamindb.File.load`:

In [None]:
file = ln.select(ln.File, name="iris").first()

df = file.load()

If there is a canonical in-memory representation (like a dataframe), data is loaded directly into memory.

In [None]:
df.head()

If no in-memory format can be found, `load` returns the filepath:

In [None]:
file = ln.select(ln.File).where(ln.File.name.contains("paradisi05")).one()

In [None]:
file.load()
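
The fallback behavior described above can be sketched as a lookup from file suffix to loader function. This is a minimal illustration under stated assumptions: `LOADERS` and `load_or_path` are hypothetical names, and the sketch does not show how {meth}`~lamindb.File.load` is actually implemented.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical registry mapping file suffixes to in-memory loaders.
LOADERS = {
    ".json": lambda path: json.loads(path.read_text()),
    ".txt": lambda path: path.read_text(),
}


def load_or_path(path):
    """Return an in-memory object if a loader is known, else the path itself."""
    path = Path(path)
    loader = LOADERS.get(path.suffix)
    return loader(path) if loader is not None else path


with tempfile.TemporaryDirectory() as tmp:
    known = Path(tmp) / "params.json"
    known.write_text('{"n": 3}')
    unknown = Path(tmp) / "image.xyz"
    unknown.write_bytes(b"\x00")

    loaded = load_or_path(known)      # parsed into a dict
    fallback = load_or_path(unknown)  # no loader registered: path returned

print(type(loaded).__name__)
print(fallback.name)
```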