# Query & lookup data

Query data using [Django](https://github.com/django/django)-style `select` statements.

Use lookups to auto-complete categorical query conditions.
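As a rough picture of what a lookup provides, the sketch below builds an object whose attributes are the distinct values of a field, so that an editor or notebook can tab-complete them. It uses only the standard library, the handles are made-up values, and it illustrates the idea rather than how `lookup()` is implemented in lamindb.

```python
from collections import namedtuple

# Hypothetical values standing in for the contents of a `handle` column.
handles = ["testuser1", "testuser2"]

# A namedtuple exposes each value as an attribute, which is what
# makes tab-completion possible in an interactive session.
Lookup = namedtuple("Lookup", handles)
user_handles = Lookup(*handles)

print(user_handles.testuser1)  # testuser1
```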

In [None]:
# initialize a test instance for this notebook
!lamin delete myobjects
!lamin init --storage ./myobjects

In [None]:
import lamindb as ln

In [None]:
ln.track()

In [None]:
# save some dummy files
ln.save(ln.File("index.md"))
ln.save(ln.File(ln.dev.datasets.df_iris(), name="iris"))
ln.save(ln.File(ln.dev.datasets.file_fastq()));

## Basic select statements

In [None]:
user_handles = ln.User.lookup(field="handle")

With auto-complete, we find a user:

In [None]:
user_handles.testuser1

Use it to filter on the `handle` field:

In [None]:
user = ln.select(ln.User, handle=user_handles.testuser1).one()

In [None]:
user

Query all files created by that user:

In [None]:
ln.select(ln.File, created_by=user).df()

To access the results of a query (a {class}`~lamindb.dev.db.SelectStmt`), execute it with one of:

- `.df()`: A dataframe with each record stored as a row.
- `.all()`: All matching records.
- `.one()`: Exactly one record. Raises an error if there is none.
- `.one_or_none()`: Either one record or `None` if there is no query result.

For example:

In [None]:
ln.select(ln.File, created_by=user).all()[:3]
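The difference between `.one()` and `.one_or_none()` can be pictured with plain Python lists. This is a minimal sketch of the semantics described above, not lamindb's implementation: the helper names, the `ValueError`, and the choice to also raise on several matches are assumptions made for illustration.

```python
def one(records):
    """Return the single record; raise if there are zero or several."""
    if len(records) != 1:
        raise ValueError(f"expected exactly one record, got {len(records)}")
    return records[0]


def one_or_none(records):
    """Return the single record, or None when the result is empty."""
    if not records:
        return None
    return one(records)


print(one(["iris"]))    # iris
print(one_or_none([]))  # None
```

Prefer the strict variant when an empty result indicates a mistake, and the lenient one when "no match" is an expected outcome.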

## More filtering

Let us subset to just the parquet files. We know there is exactly one, so we can get the record using `.one()`.

In [None]:
ln.select(ln.File, suffix=".parquet", created_by=user).one()

Or subset to files larger than 10 kB. The comparison is expressed through the `__gt` suffix on the keyword argument.

In [None]:
ln.select(ln.File, created_by=user, size__gt=1e4).df()

Or select a notebook based on a substring in the title:

In [None]:
ln.select(ln.Transform, type="notebook", title__contains="Query").df()
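The double-underscore suffix in keywords such as `size__lt` or `title__contains` names the comparison to apply to the field. The sketch below mimics that convention on plain dictionaries. The `matches` helper, the `LOOKUPS` table, and the file sizes are hypothetical, and a real ORM translates these keywords to SQL rather than filtering in Python.

```python
import operator

# A few Django-style lookup suffixes mapped to Python comparisons.
LOOKUPS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def matches(record, **conditions):
    """Return True if the dict `record` satisfies every keyword condition."""
    for key, expected in conditions.items():
        # "size__gt" -> ("size", "gt"); a bare "suffix" means exact match.
        field, _, lookup = key.partition("__")
        compare = LOOKUPS[lookup or "exact"]
        if not compare(record[field], expected):
            return False
    return True


# Made-up records standing in for rows of a File table.
files = [
    {"name": "index", "suffix": ".md", "size": 2_000},
    {"name": "iris", "suffix": ".parquet", "size": 50_000},
]

large = [f["name"] for f in files if matches(f, size__gt=1e4)]
print(large)  # ['iris']
```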

## Reference

### and

In [None]:
ln.select(ln.File, name="iris", suffix=".parquet").first()

### or

In [None]:
from django.db.models import Q

ln.select(ln.File).filter(Q(suffix=".md") | Q(suffix=".fastq.gz")).df()
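Django's `Q` objects combine with `|`, `&`, and `~` through operator overloading. The following stand-alone sketch shows that mechanism on in-memory dictionaries. `Cond` and `suffix` are hypothetical names invented for this illustration, and the real `Q` builds a SQL expression tree instead of evaluating Python predicates.

```python
class Cond:
    """Hypothetical stand-in for a composable query condition."""

    def __init__(self, test):
        self.test = test

    def __or__(self, other):
        return Cond(lambda rec: self.test(rec) or other.test(rec))

    def __and__(self, other):
        return Cond(lambda rec: self.test(rec) and other.test(rec))

    def __invert__(self):
        return Cond(lambda rec: not self.test(rec))

    def __call__(self, rec):
        return self.test(rec)


def suffix(value):
    """Condition that is true when a record has the given suffix."""
    return Cond(lambda rec: rec["suffix"] == value)


either = suffix(".md") | suffix(".fastq.gz")
suffixes = [".md", ".parquet", ".fastq.gz"]
kept = [s for s in suffixes if either({"suffix": s})]
print(kept)  # ['.md', '.fastq.gz']
```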

### in

In [None]:
ln.select(ln.File, suffix__in=[".md", ".fastq.gz"]).df()

### order by

In [None]:
ln.select(ln.File).order_by("-created_at").df()
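In Django-style `order_by`, a leading `-` on the field name requests descending order. A minimal sketch of that convention on a list of dictionaries follows. The local `order_by` function and the timestamps are made up for illustration, and a database would perform the sort itself.

```python
def order_by(records, key):
    """Sort dicts by a field name; a leading '-' means descending."""
    descending = key.startswith("-")
    field = key[1:] if descending else key
    return sorted(records, key=lambda rec: rec[field], reverse=descending)


# Hypothetical creation dates as ISO strings, which sort chronologically.
records = [
    {"name": "index.md", "created_at": "2023-06-01"},
    {"name": "iris", "created_at": "2023-06-03"},
    {"name": "fastq", "created_at": "2023-06-02"},
]

newest_first = [r["name"] for r in order_by(records, "-created_at")]
print(newest_first)  # ['iris', 'fastq', 'index.md']
```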

### contains

In [None]:
ln.select(ln.Transform, title__contains="lookup").df()

### startswith

In [None]:
ln.select(ln.Transform, title__startswith="Query").df()

In [None]:
!lamin delete myobjects