# Track files

In [None]:
import lamindb as ln

## Usage

Let us first track the data source. Here, it's a Jupyter notebook, so we can run:

In [None]:
ln.track()

If we were to run this in a script, we'd need to pass a `Transform` object:

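```python
import lamindb as ln

# a script has no notebook metadata, so describe the data source explicitly
transform = ln.Transform("My exploration script")
ln.track(transform=transform)
```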

Here's a file on local storage:

In [None]:
filepath = ln.dev.datasets.file_jpg_paradisi05().resolve().as_posix()

In [None]:
filepath

In LaminDB, you track it in two steps.

First, create a {class}`~lamindb.File` object:

In [None]:
file = ln.File(filepath)

In [None]:
file

:::{dropdown} A `file` record stores basic metadata.

- `id`: a unique persistent ID that also serves as a primary key in the SQL table
- `name`: the file name
- `suffix`: the file suffix
- `size`: the file size in bytes
- `hash`: an MD5 checksum useful to check for integrity and collisions (is this file already stored?)
- `created_at`: time of creation
- `updated_at`: time of last update
- `storage_id`: the location of the storage root (say, an S3 bucket)

:::

:::{dropdown} And provenance-related metadata.

- `created_by`: a reference to the {class}`~lamindb.User` who created the file
- `source`: a reference to the {class}`~lamindb.Run` that generated the file

:::

For instance, you see that the `file` record links to the current notebook run:

In [None]:
file.source

In [None]:
# a few checks
assert file.hash == "r4tnqmKI_SjrkdLzpuWp4g"
assert file.source == ln.context.run
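
The checksum asserted above is 22 characters long, which matches an MD5 digest (16 bytes) encoded as URL-safe base64 with the padding stripped. The following is a minimal sketch of that encoding using only the standard library. The helper name `md5_b64url` is hypothetical, and that LaminDB derives `hash` exactly this way is an assumption rather than something this guide confirms.

```python
import base64
import hashlib


def md5_b64url(data: bytes) -> str:
    """Return the MD5 digest of `data` as unpadded URL-safe base64 (hypothetical helper)."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes
    # 16 bytes encode to 24 base64 characters, the last two being "=" padding
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# checksum of empty content, used here only as a well-known reference value
checksum = md5_b64url(b"")
print(checksum, len(checksum))
```

Comparing such a checksum against the ones already stored is one way to answer the question "is this file already stored?" before ingesting it again.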

Second, add the `file` object to the LaminDB instance. Metadata is written to the database and data to storage in a single transaction:

In [None]:
file = ln.add(file)

## What happens under the hood?

### In the SQL database

Adding the file creates three records:
1. a `File` record
2. a `Transform` record
3. a `Run` record

All three records are linked so that you can find the file using any of the metadata fields.

In [None]:
ln.select(ln.File, name=file.name).one()

In [None]:
ln.select(ln.Transform, id=ln.context.transform.id).one()

In [None]:
ln.select(ln.Run, id=ln.context.run.id).one()
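
To illustrate what "linked" means at the SQL level, here is a minimal sketch using the standard library's `sqlite3`. The table layout, column names, and IDs are hypothetical and heavily simplified. They are not LaminDB's actual schema and only mirror the idea that a file references a run, which in turn references a transform.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# hypothetical, simplified tables: file -> run -> transform
con.executescript(
    """
    CREATE TABLE transform (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE run (
        id TEXT PRIMARY KEY,
        transform_id TEXT REFERENCES transform(id)
    );
    CREATE TABLE file (
        id TEXT PRIMARY KEY,
        name TEXT,
        suffix TEXT,
        source_id TEXT REFERENCES run(id)
    );
    INSERT INTO transform VALUES ('t1', 'Track files');
    INSERT INTO run VALUES ('r1', 't1');
    INSERT INTO file VALUES ('f1', 'paradisi05_laminopathic_nuclei', '.jpg', 'r1');
    """
)

# find files via the name of the transform whose run generated them
rows = con.execute(
    """
    SELECT file.name, file.suffix
    FROM file
    JOIN run ON file.source_id = run.id
    JOIN transform ON run.transform_id = transform.id
    WHERE transform.name = ?
    """,
    ("Track files",),
).fetchall()
print(rows)
```

Because each record carries a reference to the next one, a query can start from any of the three tables and reach the file through joins.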

### In storage

```{note}

Storage here means your configured storage location (in this instance, `./mydata`), which you pass to `ln.setup.init(storage=...)` when initializing the instance.

If a cloud storage location is configured, the file is uploaded.
```

Storage now contains a JPEG file with a cryptic name that equals the `id` of the `File` record:

In [None]:
!ls ./mydata

```{tip}

If you prefer semantic names, track existing data in place rather than ingesting it into a storage location: {doc}`/guide/existing`.

Naming data objects in storage by the primary key of the `File` record is typically preferred when name clashes become likely at scale or when working with in-memory views.
```
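
The trade-off between semantic and ID-based names can be sketched in a few lines of plain Python. The helpers `new_id` and `storage_key`, the ID length, and the example paths are all hypothetical. This shows why ID-based keys avoid clashes and is not LaminDB's implementation.

```python
import secrets
import string
from pathlib import PurePosixPath

ALPHABET = string.ascii_letters + string.digits


def new_id(n_chars: int = 20) -> str:
    """Generate a random identifier (hypothetical stand-in for a record ID)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(n_chars))


def storage_key(record_id: str, original_path: str) -> str:
    """Name the object in storage by record ID, keeping only the file suffix."""
    return f"{record_id}{PurePosixPath(original_path).suffix}"


# two files that share a semantic name but come from different folders
paths = ["experiment_a/image.jpg", "experiment_b/image.jpg"]
semantic_names = [PurePosixPath(p).name for p in paths]
keys = [storage_key(new_id(), p) for p in paths]

print(semantic_names)  # both entries are "image.jpg", so they would clash
print(keys)  # two distinct ID-based keys, each ending in ".jpg"
```

The semantic name stays available as metadata on the record, so nothing is lost by storing the object under its ID.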

## Retrieve a file

Retrieve the data with `.stage()`, which here returns a local filepath:

In [None]:
file.stage()

## Query a file

You can also query the `File` record by its metadata. One of the simplest ways is by name:

In [None]:
file = ln.select(ln.File, name="paradisi05_laminopathic_nuclei").one()

file

Learn more: {doc}`/guide/select`.