# Track notebooks, scripts & functions

For tracking pipelines, see: {doc}`docs:pipelines`.

In [None]:
# pip install 'lamindb[jupyter]'
!lamin init --storage ./test-track

## Track a notebook or script

Call {meth}`~lamindb.track` to register your notebook or script as a `transform` and start capturing inputs & outputs of a run.

```{eval-rst}
.. literalinclude:: scripts/run_track_and_finish.py
   :language: python
```

<br>

:::{dropdown} Here is how a notebook with a run report looks on the hub.

Explore it [here](https://lamin.ai/laminlabs/lamindata/transform/PtTXoc0RbOIq).

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/RGXj5wcAf7EAc6J80003.png" width="900px">

:::

You can find your notebooks and scripts in the {class}`~lamindb.Transform` registry (along with pipelines & functions). The {class}`~lamindb.Run` registry stores their executions.
You can use any of the usual query methods to retrieve one or more transform records, e.g.:

```python
transform = ln.Transform.get(key="my_analyses/my_notebook.ipynb")
transform.source_code  # source code
transform.runs  # all runs
transform.latest_run.report  # report of latest run
transform.latest_run.environment  # environment of latest run
```

To load a notebook or script from the hub, search or filter the `transform` page and use the CLI.

```bash
lamin load https://lamin.ai/laminlabs/lamindata/transform/13VINnFk89PE
```

## Use projects

You can link the entities created during a run to a project.

In [None]:
import lamindb as ln

my_project = ln.Project(name="My project").save()  # create a project

ln.track(project="My project")  # auto-link entities to "My project"

ln.Artifact(ln.core.datasets.file_fcs(), key="my_file.fcs").save()  # save an artifact

Filter entities by project, e.g., artifacts:

In [None]:
ln.Artifact.filter(projects=my_project).df()

Access entities linked to a project.

In [None]:
display(my_project.artifacts.df())
display(my_project.transforms.df())
display(my_project.runs.df())

## Use spaces

You can write the entities created during a run into a space that you configure on LaminHub. This is particularly useful if you want to restrict access to a space. Note that this doesn't affect bionty entities, which should typically be commonly accessible.

```python
ln.track(space="Our team space")
```

## Track parameters

In addition to tracking source code, run reports & environments, you can track run parameters.

(track-run-parameters)=

### Track run parameters

First, define valid parameters, e.g.:

In [None]:
ln.Feature(name="input_dir", dtype=str).save()
ln.Feature(name="learning_rate", dtype=float).save()
ln.Feature(name="preprocess_params", dtype="dict").save()

If you hadn't defined these parameters, you'd get a `ValidationError` in the following script.

```{eval-rst}
.. literalinclude:: scripts/run_track_with_params.py
   :language: python
   :caption: run_track_with_params.py
```

Run the script.

In [None]:
!python scripts/run_track_with_params.py  --input-dir ./mydataset --learning-rate 0.01 --downsample
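
The included script is not reproduced on this page. As a rough sketch of how the three flags in the command above could be collected into a parameter dictionary, consider the following. The function name `build_params` and the exact structure are assumptions for illustration, not the contents of `run_track_with_params.py`. A real script would then hand the resulting dictionary to `ln.track()`.

```python
import argparse


def build_params(argv=None):
    """Parse CLI flags and assemble a parameter dictionary for a run.

    Hypothetical helper: the real script may be organized differently.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", type=str, required=True)
    parser.add_argument("--learning-rate", type=float, required=True)
    parser.add_argument("--downsample", action="store_true")
    args = parser.parse_args(argv)
    # Top-level keys match the Feature names defined earlier; the nested
    # "normalization" value is hard-coded purely for illustration.
    return {
        "input_dir": args.input_dir,
        "learning_rate": args.learning_rate,
        "preprocess_params": {
            "downsample": args.downsample,
            "normalization": "the_good_one",
        },
    }


# Same flags as in the command above, passed explicitly instead of via sys.argv
params = build_params(
    ["--input-dir", "./mydataset", "--learning-rate", "0.01", "--downsample"]
)
print(params)
```

Only the top-level keys (`input_dir`, `learning_rate`, `preprocess_params`) need a matching `Feature`.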

(query-by-run-parameters)=

### Query by run parameters

Query for all runs that match certain parameters:

In [None]:
ln.Run.filter(
    learning_rate=0.01, input_dir="./mydataset", preprocess_params__downsample=True
).df()

Note that:

* `preprocess_params__downsample=True` traverses the dictionary `preprocess_params` to find the key `"downsample"` and match it to `True`
* nested keys like `"downsample"` in a dictionary do not appear in `Feature` and hence, do not get validated
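
To make the double-underscore traversal concrete, here is a minimal plain-Python sketch of the idea. It is not how lamindb implements the lookup; the helpers `lookup` and `matches` are hypothetical and only mimic the behavior described in the notes above on an ordinary dictionary.

```python
from typing import Any


def lookup(params: dict, path: str) -> Any:
    """Follow a double-underscore path such as 'a__b' through nested dicts."""
    value: Any = params
    for key in path.split("__"):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(path)
        value = value[key]
    return value


def matches(params: dict, **conditions: Any) -> bool:
    """Return True if every condition equals the value found at its path."""
    for path, expected in conditions.items():
        try:
            if lookup(params, path) != expected:
                return False
        except KeyError:
            # A missing key simply means the run does not match
            return False
    return True


run_params = {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}

is_match = matches(
    run_params, learning_rate=0.01, preprocess_params__downsample=True
)
print(is_match)
```

Single underscores, as in `input_dir`, are left alone; only `__` separates a parameter name from a nested key.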

### Access parameters of a run

Below is how you get the parameter values that were used for a given run.

In [None]:
run = ln.Run.filter(learning_rate=0.01).order_by("-started_at").first()
run.features.get_values()

:::{dropdown} Here is how it looks [on the hub](https://lamin.ai/laminlabs/lamindata/transform/JjRF4mACd9m00001).

<img width="500" alt="image" src="https://github.com/user-attachments/assets/d8a5df37-d585-4940-b6f0-91f99b6c436c">

:::

### Explore parameter values

If you want to query all parameter values together with other feature values, use {class}`~lamindb.models.FeatureValue`.

In [None]:
ln.models.FeatureValue.df(include=["feature__name", "created_by__handle"])

## Track functions

If you want more fine-grained data lineage tracking, use the `tracked()` decorator.

### In a notebook

In [None]:
ln.Feature(name="subset_rows", dtype="int").save()  # define parameters
ln.Feature(name="subset_cols", dtype="int").save()
ln.Feature(name="input_artifact_key", dtype="str").save()
ln.Feature(name="output_artifact_key", dtype="str").save()

Define a function and decorate it with `tracked()`:

In [None]:
@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    dataset = artifact.load()
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_df(new_data, key=output_artifact_key).save()
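
For intuition about what a decorator of this kind has to do, the following toy sketch records the resolved arguments of each call, including defaults. It is a plain-Python illustration under stated assumptions, not lamindb's implementation: `toy_tracked`, `call_log`, and `subset` are hypothetical names, and the in-memory list stands in for a run registry.

```python
import functools
import inspect

call_log = []  # hypothetical in-memory stand-in for a run registry


def toy_tracked(func):
    """Record the function name and its fully resolved arguments per call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # include parameters left at their defaults
        call_log.append(
            {"function": func.__name__, "params": dict(bound.arguments)}
        )
        return func(*args, **kwargs)

    return wrapper


@toy_tracked
def subset(rows, subset_rows=2, subset_cols=2):
    """Keep the first `subset_rows` rows and `subset_cols` columns."""
    return [row[:subset_cols] for row in rows[:subset_rows]]


result = subset([[1, 2, 3], [4, 5, 6], [7, 8, 9]], subset_cols=3)
print(call_log[-1]["params"])
```

Binding the call against the signature is what lets default values such as `subset_rows=2` show up in the record even though the caller never passed them.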

Prepare a test dataset:

In [None]:
df = ln.core.datasets.small_dataset1(otype="DataFrame")
input_artifact_key = "my_analysis/dataset.parquet"
artifact = ln.Artifact.from_df(df, key=input_artifact_key).save()

Run the function with default params:

In [None]:
ouput_artifact_key = input_artifact_key.replace(".parquet", "_subsetted.parquet")
subset_dataframe(input_artifact_key, ouput_artifact_key)

Query for the output:

In [None]:
subsetted_artifact = ln.Artifact.get(key=ouput_artifact_key)
subsetted_artifact.view_lineage()

This is the run that created `subsetted_artifact`:

In [None]:
subsetted_artifact.run

This is the function that created it:

In [None]:
subsetted_artifact.run.transform

This is the source code of this function:

In [None]:
subsetted_artifact.run.transform.source_code

These are all versions of this function:

In [None]:
subsetted_artifact.run.transform.versions.df()

This is the initiating run that triggered the function call:

In [None]:
subsetted_artifact.run.initiated_by_run

This is the `transform` of the initiating run:

In [None]:
subsetted_artifact.run.initiated_by_run.transform

These are the parameters of the run:

In [None]:
subsetted_artifact.run.features.get_values()

These are the input artifacts:

In [None]:
subsetted_artifact.run.input_artifacts.df()

These are the output artifacts:

In [None]:
subsetted_artifact.run.output_artifacts.df()

Re-run the function with a different parameter:

In [None]:
# subset_dataframe() returns None, so its result is not assigned
subset_dataframe(input_artifact_key, ouput_artifact_key, subset_cols=3)
subsetted_artifact = ln.Artifact.get(key=ouput_artifact_key)
subsetted_artifact.view_lineage()

We created a new run:

In [None]:
subsetted_artifact.run

With new parameters:

In [None]:
subsetted_artifact.run.features.get_values()

And a new version of the output artifact:

In [None]:
subsetted_artifact.run.output_artifacts.df()

See the state of the database:

In [None]:
ln.view()

### In a script

```{eval-rst}
.. literalinclude:: scripts/run_workflow.py
   :language: python
   :caption: run_workflow.py
```

In [None]:
!python scripts/run_workflow.py --subset

In [None]:
ln.view()

## Sync scripts with git

To sync with your git commit, add the following line to your script:

```python
ln.settings.sync_git_repo = "<YOUR-GIT-REPO-URL>"
```

```{eval-rst}
.. literalinclude:: scripts/synced_with_git.py
   :language: python
   :caption: synced_with_git.py
```

:::{dropdown} You'll now see the GitHub emoji clickable on the hub.

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/IpV8Kiq4xUbgXhzl0000.png" width="900px">
:::

## Manage notebook templates

A notebook acts like a template when you load it with `lamin load`. Suppose you run:

```bash
lamin load https://lamin.ai/account/instance/transform/Akd7gx7Y9oVO0000
```

When you run the loaded notebook, you automatically create a new version, which you can browse via the version dropdown in the UI.

Additionally, you can:

- label the transform using `ULabel`, e.g., `transform.ulabels.add(template_label)`
- tag it with an indicative `version` string, e.g., `transform.version = "T1"; transform.save()`

:::{dropdown} Saving a notebook as an artifact

Sometimes you might want to save a notebook as an artifact. This is how you can do it:

```bash
lamin save template1.ipynb --key templates/template1.ipynb --description "Template for analysis type 1" --registry artifact
```

:::

In [None]:
assert run.features.get_values() == {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}

assert my_project.artifacts.exists()
assert my_project.transforms.exists()
assert my_project.runs.exists()

# clean up test instance
!rm -r ./test-track
!lamin delete --force test-track