# Query & search registries

Find & access data using registries.

## Setup

In [None]:
!lamin init --storage ./mydata

In [None]:
import lamindb as ln

In [None]:
ln.settings.verbosity = "info"

In [None]:
ln.track()

We'll need some toy data:

In [None]:
ln.File(ln.dev.datasets.file_jpg_paradisi05(), description="My image").save()
ln.File(ln.dev.datasets.df_iris(), description="The iris dataset").save()
ln.File(ln.dev.datasets.file_fastq(), description="My fastq").save()

## Look up metadata

For entities with no more than 100k records, a lookup object is a convenient way to select a record.

Consider the `User` registry:

In [None]:
users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

In [None]:
user = users.testuser1

In [None]:
user

:::{note}

You can also auto-complete in a dictionary:

```python
users_dict = ln.User.lookup().dict()
```

:::
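
To make the idea concrete, here is a minimal sketch of how a lookup object can offer auto-completion, using only the standard library. It is an illustration and not necessarily how LaminDB implements `lookup()`. The handles and the `Lookup` and `users_sketch` names are hypothetical, and the approach assumes every field value is a valid Python identifier.

```python
from collections import namedtuple

# Hypothetical handles standing in for the values of the `handle` field.
handles = ["testuser1", "testuser2"]

# A namedtuple exposes each handle as an attribute, which is what lets an
# interactive session offer tab completion on the lookup object.
Lookup = namedtuple("Lookup", handles)
users_sketch = Lookup(*[{"handle": h} for h in handles])

# The dictionary form reaches the same records by key instead of attribute.
users_sketch_dict = users_sketch._asdict()

print(users_sketch.testuser1)
print(users_sketch_dict["testuser1"])
```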

## Filter by metadata

Filter for all files created by a user:

In [None]:
ln.File.filter(created_by=user).df()

A `filter()` call returns an extended Django `QuerySet` object that encodes a select statement. To access the query results, execute it with one of:

- `.df()`: A pandas `DataFrame` with each record stored as a row.
- `.all()`: An indexable Django `QuerySet`.
- `.list()`: A list of records.
- `.one()`: Exactly one record. Raises an error if there is none.
- `.one_or_none()`: Either one record or `None` if there is no query result.
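
The difference between `.one()` and `.one_or_none()` can be sketched on a plain Python list. This assumes SQLAlchemy-like semantics, in which more than one result is also an error. The functions below are hypothetical stand-ins, and `ValueError` is a placeholder for whatever exception the library actually raises.

```python
def one(records):
    """Return the single record; raise if there are zero or several."""
    if len(records) != 1:
        # Placeholder exception type, chosen only for this sketch.
        raise ValueError(f"expected exactly one record, got {len(records)}")
    return records[0]


def one_or_none(records):
    """Return the single record, or None if the query result is empty."""
    if not records:
        return None
    return one(records)


print(one(["My image"]))
print(one_or_none([]))
```

Under these assumptions, `.one_or_none()` fits lookups where an empty result is expected and handled, while `.one()` fits cases where a missing record indicates a problem.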

```{note}

The ORMs in LaminDB are Django Models and any [Django query](https://docs.djangoproject.com/en/stable/topics/db/queries/) works. LaminDB extends Django's API for data scientists.

Under the hood, any `filter()` call translates into a SQL select statement.

In SQLAlchemy's & SQLModel's queries, this is more evident as they revolve around `select` statements, which are analogous to the `QuerySet` returned by `filter()`. `.one()` and `.one_or_none()` are two parts of LaminDB's API that are borrowed from SQLAlchemy.

```

## Search for metadata

In [None]:
ln.File.search("iris")

In [None]:
ln.File.search("iris", return_queryset=True).first()

Let us create 500 notebook objects with fake titles and save them:

In [None]:
ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.dev.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

In [None]:
ln.Transform.search("intestine").head()

## Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

In [None]:
ln.File.filter(run__created_by__handle__startswith="testuse").df()

This filter selects all files whose generating run was created by a user whose handle starts with `testuse`.

(Under the hood, in the SQL database, it's joining the file table with the run and the user table.)
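
The join can be sketched with the standard library's `sqlite3` module. The table and column names below (`files`, `runs`, `users`, `run_id`, `created_by_id`) are hypothetical and simplified. They are not LaminDB's actual schema, and the SQL that Django generates will look different in detail.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, created_by_id INTEGER);
    CREATE TABLE files (id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER);
    INSERT INTO users VALUES (1, 'testuser1'), (2, 'someone_else');
    INSERT INTO runs VALUES (10, 1), (20, 2);
    INSERT INTO files VALUES (100, 'My image', 10), (200, 'Other file', 20);
    """
)

# Each double underscore in run__created_by__handle corresponds to one JOIN;
# the trailing __startswith becomes a prefix match on the last column.
rows = con.execute(
    """
    SELECT files.id, files.description
    FROM files
    JOIN runs ON files.run_id = runs.id
    JOIN users ON runs.created_by_id = users.id
    WHERE users.handle LIKE 'testuse%'
    ORDER BY files.id
    """
).fetchall()

print(rows)
```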



Beyond `__startswith`, Django supports about [two dozen field comparators](https://docs.djangoproject.com/en/stable/ref/models/querysets/#field-lookups), used as `field__comparator=value`.

Here are some of them.

### and

In [None]:
ln.File.filter(suffix=".jpg", created_by=user).df()

### less than / greater than

Or subset to files smaller than 10 kB with the `__lt` comparator (`__gt` selects larger files instead):

In [None]:
ln.File.filter(created_by=user, size__lt=1e4).df()

### or

In [None]:
from django.db.models import Q

ln.File.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()

### in

In [None]:
ln.File.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
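
For a single field, the `Q(...) | Q(...)` filter and the `__in` filter describe the same selection. A small `sqlite3` sketch shows the equivalent SQL conditions returning identical rows. The `files` table and its contents are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE files (id INTEGER PRIMARY KEY, suffix TEXT);
    INSERT INTO files (suffix) VALUES ('.jpg'), ('.parquet'), ('.fastq.gz');
    """
)

# Roughly what Q(suffix=".jpg") | Q(suffix=".fastq.gz") expresses.
with_or = con.execute(
    "SELECT suffix FROM files "
    "WHERE suffix = '.jpg' OR suffix = '.fastq.gz' ORDER BY id"
).fetchall()

# Roughly what suffix__in=[".jpg", ".fastq.gz"] expresses.
with_in = con.execute(
    "SELECT suffix FROM files "
    "WHERE suffix IN ('.jpg', '.fastq.gz') ORDER BY id"
).fetchall()

print(with_or == with_in)
```

`Q` objects remain the more general tool, since they can combine conditions on different fields.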

### order by

In [None]:
ln.File.filter().order_by("-updated_at").df()

### contains

In [None]:
ln.Transform.filter(name__contains="search").df().head(10)

And case-insensitive:

In [None]:
ln.Transform.filter(name__icontains="Search").df().head(10)
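
One caveat, assuming the instance is backed by SQLite: Django's documentation notes that `contains` behaves like `icontains` on SQLite, because SQLite's `LIKE` is case-insensitive for ASCII characters by default. The sketch below shows this with the standard library's `sqlite3` module. The example string is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# LIKE ignores ASCII case by default in SQLite, so a lowercase pattern
# still matches the capitalized text (1 means "matches").
like_matches = con.execute(
    "SELECT 'Search results' LIKE '%search%'"
).fetchone()[0]

# instr() compares characters exactly, so it is case-sensitive
# (0 means the substring was not found).
instr_position = con.execute(
    "SELECT instr('Search results', 'search')"
).fetchone()[0]

print(like_matches, instr_position)
```

On other backends such as PostgreSQL, `contains` is case-sensitive, so the two filters above can return different rows there.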

### startswith

In [None]:
ln.Transform.filter(name__startswith="Query").df()

In [None]:
!lamin delete --force mydata