# Linked select

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.nb.header()

Querying a complex schema like LaminDB's default bio schema can be hard.

Even conceptually simple queries ("Give me all dobjects ingested by user X.", "Give me all dobjects characterizing cell type Y.", etc.) require SQL that joins across many mediating tables.
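To make the mediating tables concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`dobject`, `run`, `notebook`, `user`, and their foreign keys) are hypothetical simplifications chosen for illustration, not LaminDB's actual schema.

```python
import sqlite3

# Hypothetical, simplified tables; NOT LaminDB's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE notebook (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES user(id));
    CREATE TABLE run (id INTEGER PRIMARY KEY, notebook_id INTEGER REFERENCES notebook(id));
    CREATE TABLE dobject (id INTEGER PRIMARY KEY, name TEXT, run_id INTEGER REFERENCES run(id));
    INSERT INTO user VALUES (1, 'testuser1'), (2, 'testuser2');
    INSERT INTO notebook VALUES (1, 1), (2, 2);
    INSERT INTO run VALUES (1, 1), (2, 2);
    INSERT INTO dobject VALUES (1, 'a.parquet', 1), (2, 'b.csv', 2);
    """
)

# "All dobjects ingested by user X" needs three joins through mediating tables.
query = """
    SELECT dobject.name
    FROM dobject
    JOIN run ON dobject.run_id = run.id
    JOIN notebook ON run.notebook_id = notebook.id
    JOIN user ON notebook.user_id = user.id
    WHERE user.handle = ?
"""
rows = conn.execute(query, ("testuser1",)).fetchall()
names = [name for (name,) in rows]
print(names)
```

Even in this toy setup, the question only mentions dobjects and users, yet the query has to name `run` and `notebook` as well.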

Scenarios like the following then become predictable (from a [short story](https://erikbern.com/2021/07/07/the-data-team-a-short-story.html) unrelated to biology or R&D):

> It's super stressed. “The table in the database changed, and suddenly our SQL query we use to populate the spreadsheet generates nonsense output”.

> When you look at the SQL query, you almost spit out your coffee. It's a 500 lines long query. The author of the query seems apologetic but at the same time a bit annoyed. “We kept coming to you several times asking for help with these questions”, he says, “and you told us you didn't have resources, so we built it ourselves”.

> The data scientist in your team who gets assigned the monster SQL query isn't happy.

At the same time, we don't want to come up with another ["garbage query language"](https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-garbage-query-language.html). We merely want to exploit what we think of as canonical R&D schema constraints to offer declarative queries that are essentially SQL but let you skip all the intermediate steps.

This lets you write

```
ln.select(lns.DObject).where(lns.User.handle == "testuser1")
```

instead of writing

```
stmt = ln.select(lns.DObject).join(lns.Run).join(lns.Notebook).join(lns.User).where(lns.User.handle == "testuser1")
```
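How the intermediate joins get filled in is not shown here. One way to picture the idea is as a shortest-path search over a graph of table links. The sketch below is an assumption made for illustration only: the `LINKS` dictionary and the `find_join_path` helper are hypothetical and not part of the LaminDB API.

```python
from collections import deque

# Hypothetical link graph: each table maps to the tables it links to directly.
# These edges mirror the join chain above; they are NOT read from LaminDB.
LINKS = {
    "DObject": ["Run"],
    "Run": ["DObject", "Notebook"],
    "Notebook": ["Run", "User"],
    "User": ["Notebook"],
}


def find_join_path(links, start, target):
    """Return the shortest list of tables connecting start to target, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = find_join_path(LINKS, "DObject", "User")
print(" -> ".join(path))
```

A breadth-first search only yields an unambiguous answer when the schema admits one sensible path between two tables, which is the kind of constraint the text above relies on.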

In [None]:
# Explicit version: join through every mediating table to reach User
stmt = (
    ln.select(lns.DObject)
    .join(lns.Run)
    .join(lns.Notebook)
    .join(lns.User)
    .where(lns.User.handle == "testuser1")
)

In [None]:
stmt.df()

In [None]:
# Same join chain, additionally restricted to dobjects with a .parquet suffix
stmt = (
    ln.select(lns.DObject)
    .where(lns.DObject.suffix == ".parquet")
    .join(lns.Run)
    .join(lns.Notebook)
    .join(lns.User)
    .where(lns.User.handle == "testuser1")
)

In [None]:
stmt.df()

In [None]:
# Same join chain, additionally restricted to dobjects with size greater than 1e4
stmt = (
    ln.select(lns.DObject)
    .where(lns.DObject.size > 1e4)
    .join(lns.Run)
    .join(lns.Notebook)
    .join(lns.User)
    .where(lns.User.handle == "testuser1")
)

In [None]:
stmt.df()