# Track notebooks, scripts & functions

For tracking pipelines, see: {doc}`docs:pipelines`.

In [None]:
# pip install lamindb
!lamin init --storage ./test-track

## Track a notebook or script

Call {meth}`~lamindb.track` to register your notebook or script as a `transform` and start capturing inputs & outputs of a run.

```{eval-rst}
.. literalinclude:: scripts/run_track_and_finish.py
   :language: python
```

<br>

:::{dropdown} Here is how a notebook with a run report looks on the hub.

Explore it [here](https://lamin.ai/laminlabs/lamindata/transform/PtTXoc0RbOIq).

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/RGXj5wcAf7EAc6J80003.png" width="900px">

:::

You can find your notebooks and scripts in the {class}`~lamindb.Transform` registry (along with pipelines & functions). {class}`~lamindb.Run` stores executions.
You can use all the usual query methods to obtain one or multiple transform records, e.g.:

```python
import lamindb as ln

transform = ln.Transform.get(key="my_analyses/my_notebook.ipynb")
transform.source_code  # source code
transform.runs  # all runs
transform.latest_run.report  # report of latest run
transform.latest_run.environment  # environment of latest run
```

To load a notebook or script from the hub, search or filter the `transform` page and use the CLI.

```bash
lamin load https://lamin.ai/laminlabs/lamindata/transform/13VINnFk89PE
```

## Organize local development

If no development directory is set, script & notebook keys equal their filenames.
Otherwise, script & notebook keys equal the path relative to the development directory.
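
As a rough illustration of the rule above, the following sketch derives a key from a file path with and without a development directory. The helper `derive_key` and the example paths are hypothetical; they only mimic the described behavior and are not LaminDB's implementation.

```python
from pathlib import PurePosixPath
from typing import Optional


def derive_key(script_path: str, dev_dir: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the key rule described above."""
    path = PurePosixPath(script_path)
    if dev_dir is None:
        # no development directory: the key is just the filename
        return path.name
    # development directory set: the key is the path relative to it
    return path.relative_to(PurePosixPath(dev_dir)).as_posix()


# example paths are made up for illustration
print(derive_key("/home/me/work/my_analyses/my_notebook.ipynb"))
# → my_notebook.ipynb
print(derive_key("/home/me/work/my_analyses/my_notebook.ipynb", dev_dir="/home/me/work"))
# → my_analyses/my_notebook.ipynb
```

The sketch does not cover files that live outside the development directory; `relative_to` raises a `ValueError` in that case.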

To set the development directory to your shell's current working directory, run:

```bash
lamin settings set dev-dir .
```

You can see the current status by running:

```bash
lamin info
```


## Sync scripts with git

To sync scripts with a git repo, either export an environment variable:

```shell
export LAMINDB_SYNC_GIT_REPO=<YOUR-GIT-REPO-URL>
```

Or set the following setting:

```python
ln.settings.sync_git_repo = "<YOUR-GIT-REPO-URL>"
```

If you work on a single project in your lamindb instance, it makes sense to set LaminDB's `dev-dir` to the root of the local git repo clone.
If you work on multiple projects in your lamindb instance, you can use the `dev-dir` as the local root and nest git repositories in it.

## Use projects

You can link the entities created during a run to a project.

In [None]:
import lamindb as ln

my_project = ln.Project(name="My project").save()  # create a project

ln.track(project="My project")  # auto-link entities to "My project"

ln.Artifact(
    ln.examples.datasets.file_fcs(), key="my_file.fcs"
).save()  # save an artifact

Filter entities by project, e.g., artifacts:

In [None]:
ln.Artifact.filter(projects=my_project).to_dataframe()

Access entities linked to a project.

In [None]:
display(my_project.artifacts.to_dataframe())
display(my_project.transforms.to_dataframe())
display(my_project.runs.to_dataframe())

## Use spaces

You can write the entities created during a run into a space that you configure on LaminHub. This is particularly useful if you want to restrict access to a space. Note that this doesn't affect bionty entities, which should typically remain commonly accessible.

```python
ln.track(space="Our team space")
```

(track-run-parameters)=

## Track parameters & features

In addition to tracking source code, run reports & environments, you can track run parameters & features.

Let's look at the following script, which has a few parameters.

```{eval-rst}
.. literalinclude:: scripts/run_track_with_params.py
   :language: python
   :caption: run_track_with_params.py
```
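
The included script is not reproduced here. As a minimal sketch of how such a script might assemble its parameters, the following parses the CLI arguments used below into a nested dictionary. The function name `build_params` is hypothetical, the parameter names and the `"the_good_one"` value are taken from the checks at the end of this notebook, and the closing comment assumes that `ln.track` accepts a `params` argument.

```python
import argparse
from typing import Any, Dict, List, Optional


def build_params(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: parse CLI arguments into a nested params dictionary."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", type=str, required=True)
    parser.add_argument("--learning-rate", type=float, required=True)
    parser.add_argument("--downsample", action="store_true")
    args = parser.parse_args(argv)
    return {
        "input_dir": args.input_dir,
        "learning_rate": args.learning_rate,
        # parameters can be grouped in a nested dictionary
        "preprocess_params": {
            "downsample": args.downsample,
            "normalization": "the_good_one",  # value from the checks at the end of this notebook
        },
    }


params = build_params(
    ["--input-dir", "./mydataset", "--learning-rate", "0.01", "--downsample"]
)
print(params)
# in a tracked script, the dictionary would then be handed to the run,
# e.g. via ln.track(params=params) (assumed call, not executed here)
```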

Run the script.

In [None]:
!python scripts/run_track_with_params.py  --input-dir ./mydataset --learning-rate 0.01 --downsample

Query for all runs that match certain parameters:

In [None]:
ln.Run.filter(
    params__learning_rate=0.01,
    params__preprocess_params__downsample=True,
).to_dataframe()
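
The double underscores in the filter above walk into the nested parameter dictionary. The sketch below mimics that matching in plain Python, assuming the lookups follow a Django-style key-path convention. The helper `matches` is hypothetical and operates on a plain dictionary rather than on the database, so the leading `params__` prefix is dropped.

```python
from typing import Any, Dict


def matches(params: Dict[str, Any], **lookups: Any) -> bool:
    """Hypothetical helper: check `a__b=value` lookups against nested params."""
    for lookup, expected in lookups.items():
        value: Any = params
        # each "__"-separated segment descends one level into the dictionary
        for key in lookup.split("__"):
            if not isinstance(value, dict) or key not in value:
                return False
            value = value[key]
        if value != expected:
            return False
    return True


run_params = {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}
print(matches(run_params, learning_rate=0.01, preprocess_params__downsample=True))
# → True
print(matches(run_params, learning_rate=0.1))
# → False
```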

Describe & get parameters:

In [None]:
run = ln.Run.filter(params__learning_rate=0.01).order_by("-started_at").first()
run.describe()
run.params

You can also access the CLI arguments used to start the run directly:

In [None]:
run.cli_args

You can also track run features, analogous to artifact features.

In contrast to params, features are validated against the `Feature` registry and allow you to express relationships with entities in your registries.

Let's first define labels & features.

In [None]:
experiment_type = ln.Record(name="Experiment", is_type=True).save()
experiment_label = ln.Record(name="Experiment1", type=experiment_type).save()
ln.Feature(name="s3_folder", dtype=str).save()
ln.Feature(name="experiment", dtype=experiment_type).save()

In [None]:
!python scripts/run_track_with_features_and_params.py  --s3-folder s3://my-bucket/my-folder --experiment Experiment1

In [None]:
ln.Run.filter(s3_folder="s3://my-bucket/my-folder").to_dataframe()

Describe & get feature values.

In [None]:
run2 = ln.Run.filter(
    s3_folder="s3://my-bucket/my-folder", experiment="Experiment1"
).last()
run2.describe()
run2.features.get_values()

## Track functions

If you want more fine-grained data lineage tracking, use the `tracked()` decorator.

In [None]:
@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    dataset = artifact.load()
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_dataframe(new_data, key=output_artifact_key).save()
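
To give an intuition for how a decorator can record the parameters of each function call, here is a minimal plain-Python sketch. The names `capture_params`, `CAPTURED_RUNS`, and `subset` are hypothetical, and the in-memory list merely stands in for a run registry; this is not how `ln.tracked()` is implemented.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List, Tuple

# hypothetical in-memory stand-in for a registry of runs
CAPTURED_RUNS: List[Dict[str, Any]] = []


def capture_params(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that records the arguments of every call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()  # also record parameters left at their defaults
        CAPTURED_RUNS.append(
            {"function": func.__name__, "params": dict(bound.arguments)}
        )
        return func(*args, **kwargs)

    return wrapper


@capture_params
def subset(n_rows: int = 2, n_cols: int = 2) -> Tuple[int, int]:
    return (n_rows, n_cols)


subset()  # first call, all defaults
subset(n_cols=3)  # second call, one parameter overridden
print(CAPTURED_RUNS[0]["params"])
# → {'n_rows': 2, 'n_cols': 2}
print(CAPTURED_RUNS[1]["params"])
# → {'n_rows': 2, 'n_cols': 3}
```

Each call appends a separate record, which parallels the one-run-per-call behavior shown in the cells that follow.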

Prepare a test dataset:

In [None]:
df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
input_artifact_key = "my_analysis/dataset.parquet"
artifact = ln.Artifact.from_dataframe(df, key=input_artifact_key).save()

Run the function with default params:

In [None]:
ouput_artifact_key = input_artifact_key.replace(".parquet", "_subsetted.parquet")
subset_dataframe(input_artifact_key, ouput_artifact_key)

Query for the output:

In [None]:
subsetted_artifact = ln.Artifact.get(key=ouput_artifact_key)
subsetted_artifact.view_lineage()

This is the run that created `subsetted_artifact`:

In [None]:
subsetted_artifact.run

This is the function that created it:

In [None]:
subsetted_artifact.run.transform

This is the source code of this function:

In [None]:
subsetted_artifact.run.transform.source_code

These are all versions of this function:

In [None]:
subsetted_artifact.run.transform.versions.to_dataframe()

This is the initiating run that triggered the function call:

In [None]:
subsetted_artifact.run.initiated_by_run

This is the `transform` of the initiating run:

In [None]:
subsetted_artifact.run.initiated_by_run.transform

These are the parameters of the run:

In [None]:
subsetted_artifact.run.params

These are the input artifacts:

In [None]:
subsetted_artifact.run.input_artifacts.to_dataframe()

These are the output artifacts:

In [None]:
subsetted_artifact.run.output_artifacts.to_dataframe()

Re-run the function with a different parameter:

In [None]:
# the function returns None, so its result is not assigned
subset_dataframe(
    input_artifact_key, ouput_artifact_key, subset_cols=3
)
subsetted_artifact = ln.Artifact.get(key=ouput_artifact_key)
subsetted_artifact.view_lineage()

We created a new run:

In [None]:
subsetted_artifact.run

With new parameters:

In [None]:
subsetted_artifact.run.params

And a new version of the output artifact:

In [None]:
subsetted_artifact.run.output_artifacts.to_dataframe()

See the state of the database:

In [None]:
ln.view()

### In a script

```{eval-rst}
.. literalinclude:: scripts/run_workflow.py
   :language: python
   :caption: run_workflow.py
```

In [None]:
!python scripts/run_workflow.py --subset

In [None]:
ln.view()

## Manage notebook templates

A notebook acts like a template when you load it with `lamin load`. Suppose you run:

```bash
lamin load https://lamin.ai/account/instance/transform/Akd7gx7Y9oVO0000
```

Upon running the returned notebook, you'll automatically create a new version and be able to browse it via the version dropdown on the UI.

Additionally, you can:

- label using `Record`, e.g., `transform.records.add(template_label)`
- tag with an indicative `version` string, e.g., `transform.version = "T1"; transform.save()`

:::{dropdown} Saving a notebook as an artifact

Sometimes you might want to save a notebook as an artifact. This is how you can do it:

```bash
lamin save template1.ipynb --key templates/template1.ipynb --description "Template for analysis type 1" --registry artifact
```

:::

A few checks at the end of this notebook:

In [None]:
assert run.params == {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}, run.params
assert my_project.artifacts.exists()
assert my_project.transforms.exists()
assert my_project.runs.exists()