# Overview

ZnTrack is designed as an object-oriented mapper for [DVC](https://dvc.org/).
For an introduction we highly recommend reading the [DVC Getting Started](https://dvc.org/doc/start) guide.
Besides version-controlled data management, DVC provides methods for building a dependency graph, tracking parameters, comparing metrics, reducing computational overhead and queueing multiple runs.

**Why does it need an object-oriented mapper?**

Whilst DVC provides all this functionality, it is designed to be independent of any programming language.
Using it from Python can therefore require writing custom scripts, reading and writing config files and managing dependencies by hand.

ZnTrack is designed to make these steps as easy and as well integrated with Python as possible.
In contrast to the DVC backbone, it is aimed directly at Python developers and can therefore offer a highly adapted and optimized interface.

## Structure
ZnTrack is based on two parts: a class decorator and descriptors, which are used e.g. for parameter tracking.

### Node class decorator
        
The `@Node` decorator converts a Python class into a DVC stage by wrapping its `__init__`, `__call__` and `run` methods. It handles almost all of the steps required to create a DVC stage.

### ZnTrackOptions

ZnTrackOptions are custom descriptors (implementing custom `__get__` and `__set__` methods) that are used to track parameters as well as to define dependencies, metrics or other outputs.
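
To make the descriptor idea more concrete, here is a minimal sketch in plain Python. The class `TrackedOption` and the way the tracked values are collected are assumptions made for illustration only; they are not ZnTrack's actual implementation.

```python
class TrackedOption:
    """Toy data descriptor (hypothetical, not part of ZnTrack)."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remembers the attribute name.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(f"{self.name!r} has not been set yet") from None

    def __set__(self, instance, value):
        # Store the value on the instance under the same name.
        instance.__dict__[self.name] = value


class Example:
    maximum = TrackedOption()


example = Example()
example.maximum = 512

# Collect every attribute that is backed by a TrackedOption descriptor.
tracked = {
    name: getattr(example, name)
    for name, attr in vars(Example).items()
    if isinstance(attr, TrackedOption)
}
print(tracked)
```

Because the descriptor lives on the class, a tool can discover all tracked attributes by inspecting the class, without the user writing any getter or setter code.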

## Stages
DVC organizes its pipeline in multiple stages.
When using ZnTrack we can write our code inside a Jupyter notebook.
We can make use of this functionality by setting the `nb_name` config as follows:

In [1]:
from zntrack import Node, config

config.nb_name = "01_Intro.ipynb"

In [2]:
from zntrack.utils import cwd_temp_dir
temp_dir = cwd_temp_dir()

Working with DVC requires a Git and a DVC repository, which we can set up easily:

In [3]:
!git init
!dvc init

Initialized empty Git repository in /tmp/tmphkzkixbk/.git/
Initialized DVC repository.

You can now commit the changes to git.

To define a stage, i.e. a node on the execution graph, we can start with a Python class that implements only a `run` method.
The `run` method is required, because it is the entry point for the computation executed by DVC.
To convert the class into a ZnTrack Node we apply the `@Node()` decorator to it.

There are two things we can do with the stage at this point.

1. Create a new instance of the Node.
After instantiation we can set attributes, e.g. parameters or dependencies.
Ideally no expensive calculations are performed at this point, because we only want to create a frame for our method.

2. Call the stage.
If no explicit `__call__` method is defined, ZnTrack will add one to our stage.
This method is usually the place to interface with the user by passing parameters, dependencies, outputs, etc.
After the call, the class writes the stage to the `dvc.yaml` file and we are ready to run the stage via DVC.
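
The idea of a class decorator that adds a missing `__call__` can be sketched in plain Python as follows. The decorator `node_sketch` and the `cmd` attribute are hypothetical names chosen for this illustration; the real `@Node()` decorator does considerably more, for example writing the stage to `dvc.yaml`.

```python
def node_sketch(cls):
    """Hypothetical stand-in for a Node-like class decorator (not ZnTrack's code)."""
    if "__call__" not in vars(cls):

        def __call__(self):
            # Build the command a pipeline tool could execute later on.
            name = type(self).__name__
            self.cmd = f'python3 -c "from src.{name} import {name}; {name}(load=True).run()"'
            return self.cmd

        # Only add a default __call__ if the class did not define its own.
        cls.__call__ = __call__
    return cls


@node_sketch
class Stage0:
    def run(self):
        pass


stage_0 = Stage0()
print(stage_0())
```

Note that instantiating the class and calling the instance are two separate steps here as well: the instance is only a frame, and the call is what registers the command.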

In [4]:
@Node()
class Stage0:
    def run(self):
        pass


stage_0 = Stage0()
stage_0()

Submit issues to https://github.com/zincware/ZnTrack.


[NbConvertApp] Converting notebook 01_Intro.ipynb to script




[NbConvertApp] Writing 10481 bytes to 01_Intro.py


2021-11-30 16:12:43,779 (INFO): Creating 'dvc.yaml'
Adding stage 'Stage0' in 'dvc.yaml'

To track the changes with git, run:

	git add dvc.yaml



In [5]:
!tree

.
├── 01_Intro.ipynb
├── dvc.yaml
└── src
    └── Stage0.py

1 directory, 3 files


We can see that ZnTrack has created a `dvc.yaml` file for us (using DVC in the backend).

In [6]:
from IPython.display import Pretty, display

display(Pretty("dvc.yaml"))

stages:
  Stage0:
    cmd: "python3 -c \"from src.Stage0 import Stage0; Stage0(load=True, name='Stage0').run()\"\
      \ "
    deps:
    - src/Stage0.py


We can see that DVC runs `Stage0(load=True).run()`, i.e. the `run` method must be able to run on its own.

We can now use `dvc repro` to execute our code, which does not do anything yet.

In [7]:
!dvc repro

Running stage 'Stage0':
> python3 -c "from src.Stage0 import Stage0; Stage0(load=True, name='Stage0').run()"
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.

### ZnTrack Results
We can see that the Node ran without issues.
Unfortunately, the Node we just created doesn't do anything.
In our first example we would like to create a random number and save the result.
We can do this using `zn.outs`, a special type of DVC outs file that is managed by ZnTrack.
We do this by defining a class-level attribute.
This is similar to a Python `@property`, where `__get__` and `__set__` have custom handling assigned to them.
In contrast to `@property`, we do not need to write the getter and setter ourselves.

In [8]:
from zntrack import zn
from random import randrange


@Node()
class RandomNumber:
    number = zn.outs()

    def run(self):
        self.number = randrange(10)


random_number = RandomNumber()
random_number()

2021-11-30 16:12:53,000 (INFO): Adding stage 'RandomNumber' in 'dvc.yaml'

To track the changes with git, run:

	git add nodes/RandomNumber/.gitignore dvc.yaml



We can access the results of our Node by passing `load=True`. This currently raises a `ValueError`, because we haven't executed the `run` method yet. Again, this is done via `dvc repro`.

In [9]:
try:
    print(RandomNumber(load=True).number)
except ValueError as err:
    print(err)

Can not load outs / number for <__main__.RandomNumber object at 0x1476e822c8e0>! Check, if the Node you are trying to access has been run? Check, if you are trying to access some results e.g. in the __init__, before the graph has been executed. You could consider adding `exec_=True` to your class to circumvent this behaviour.
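
The load-before-run behaviour can be imitated with a small plain-Python sketch. The class `ResultStore`, its `outs.json` file and the constant result are assumptions made for this illustration and do not reflect ZnTrack's internal storage logic.

```python
import json
import tempfile
from pathlib import Path


class ResultStore:
    """Toy node that keeps its result in a JSON file (hypothetical layout)."""

    def __init__(self, directory, load=False):
        self.file = Path(directory) / "outs.json"
        self.number = None
        if load:
            if not self.file.exists():
                raise ValueError(f"Cannot load outs from {self.file}; has the node been run?")
            self.number = json.loads(self.file.read_text())["number"]

    def run(self):
        self.number = 7  # placeholder for a real computation
        self.file.write_text(json.dumps({"number": self.number}))


with tempfile.TemporaryDirectory() as tmp:
    # Loading before the first run fails, because no result file exists yet.
    try:
        ResultStore(tmp, load=True)
        load_failed = False
    except ValueError:
        load_failed = True

    # After running once, a fresh instance can load the stored result.
    ResultStore(tmp).run()
    loaded_number = ResultStore(tmp, load=True).number

print(load_failed, loaded_number)
```

The point of the sketch is that results live on disk rather than on the instance, so any fresh instance can load them once the computation has been run.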


In [10]:
!dvc repro

Running stage 'RandomNumber':
> python3 -c "from src.RandomNumber import RandomNumber; RandomNumber(load=True, name='RandomNumber').run()"
Updating lock file 'dvc.lock'

Stage 'Stage0' didn't change, skipping

To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.

Now we can have a look at our result and work with it.

In [11]:
RandomNumber(load=True).number

5

Because we are using DVC, rerunning the graph via `dvc repro` will not result in a new computation, but instead it will use the cached value.
Changing this is explained later.
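
Why an unchanged stage is skipped can be illustrated with a strongly simplified sketch of content-based caching. The functions `fingerprint` and `repro` and the in-memory `lock` dictionary are hypothetical; DVC's real lock file format and hashing details differ.

```python
import hashlib


def fingerprint(*parts: str) -> str:
    """Hash all inputs of a stage into a single hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so that ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


lock = {}  # stands in for a lock file: stage name -> fingerprint of the last run


def repro(stage: str, source: str, params: str, run) -> bool:
    """Execute `run` only if the inputs changed; return True if it was executed."""
    current = fingerprint(source, params)
    if lock.get(stage) == current:
        return False  # nothing changed, reuse the cached result
    run()
    lock[stage] = current
    return True


calls = []
first = repro("RandomNumber", "def run(): ...", "", lambda: calls.append(1))
second = repro("RandomNumber", "def run(): ...", "", lambda: calls.append(1))
third = repro("RandomNumber", "def run(): ...  # edited", "", lambda: calls.append(1))
print(first, second, third, len(calls))
```

In this sketch the second call is skipped because neither the source nor the parameters changed, while editing the source triggers a new run.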

### ZnTrack arguments
Currently, our stage will always yield a random number in the hard-coded range 0-9.
ZnTrack Nodes become more interesting once we introduce custom parameters.
We can start by adding a maximum value to our Node.


In [12]:
from zntrack import dvc
@Node()
class MaxRandomNumber:
    number = zn.outs()
    maximum = dvc.params()

    def __call__(self, maximum):
        self.maximum = maximum

    def run(self):
        self.number = randrange(self.maximum)


max_random_number = MaxRandomNumber()
max_random_number(maximum=512)

2021-11-30 16:13:02,454 (INFO): Adding stage 'MaxRandomNumber' in 'dvc.yaml'

To track the changes with git, run:

	git add dvc.yaml nodes/MaxRandomNumber/.gitignore



In [13]:
!dvc repro

Running stage 'MaxRandomNumber':
> python3 -c "from src.MaxRandomNumber import MaxRandomNumber; MaxRandomNumber(load=True, name='MaxRandomNumber').run()"
Updating lock file 'dvc.lock'

Stage 'RandomNumber' didn't change, skipping
Stage 'Stage0' didn't change, skipping

To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.

In [14]:
MaxRandomNumber(load=True).number

249

### Custom Types and Files

When using arguments, ZnTrack can handle the most basic Python types and also some more complex types such as `pathlib.Path`.
In the following example we introduce using paths as arguments and writing data to a custom output file.
For this, we use `dvc.outs`.

In [15]:
from pathlib import Path


@Node()
class WriteToFile:
    filename: Path = dvc.outs()

    def __call__(self, filename: Path):
        self.filename = filename
        # we need to create the parent directory here, because
        #  when creating the Node dvc will add a .gitignore
        #  to this directory.
        self.filename.parent.mkdir(exist_ok=True, parents=True)

    def run(self):
        self.filename.write_text('Lorem Ipsum')

    def read_from_file(self):
        print(self.filename.read_text())


write_to_file = WriteToFile()
write_to_file(filename=Path("outs", "example.txt"))

2021-11-30 16:13:12,774 (INFO): Adding stage 'WriteToFile' in 'dvc.yaml'

To track the changes with git, run:

	git add dvc.yaml outs/.gitignore



In [16]:
!dvc repro

Stage 'MaxRandomNumber' didn't change, skipping
Running stage 'WriteToFile':
> python3 -c "from src.WriteToFile import WriteToFile; WriteToFile(load=True, name='WriteToFile').run()"
Updating lock file 'dvc.lock'

Stage 'Stage0' didn't change, skipping
Stage 'RandomNumber' didn't change, skipping

To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.

We can see that a file with our filename has been created in `outs`.
The file can be generated anywhere inside the DVC repository. For external files, DVC provides the keyword `external`, which is accessible via `@Node(external=True)`.

In [17]:
WriteToFile(load=True).filename

PosixPath('outs/example.txt')

In [18]:
WriteToFile(load=True).read_from_file()

Lorem Ipsum


At this point our `dvc.yaml` file has grown a bit and looks like the following:

In [19]:
display(Pretty("dvc.yaml"))

stages:
  Stage0:
    cmd: "python3 -c \"from src.Stage0 import Stage0; Stage0(load=True, name='Stage0').run()\"\
      \ "
    deps:
    - src/Stage0.py
  RandomNumber:
    cmd: "python3 -c \"from src.RandomNumber import RandomNumber; RandomNumber(load=True,\
      \ name='RandomNumber').run()\" "
    deps:
    - src/RandomNumber.py
    outs:
    - nodes/RandomNumber/outs.json
  MaxRandomNumber:
    cmd: "python3 -c \"from src.MaxRandomNumber import MaxRandomNumber; MaxRandomNumber(load=True,\
      \ name='MaxRandomNumber').run()\" "
    deps:
    - src/MaxRandomNumber.py
    params:
    - MaxRandomNumber
    outs:
    - nodes/MaxRandomNumber/outs.json
  WriteToFile:
    cmd: "python3 -c \"from src.WriteToFile import WriteToFile; WriteToFile(load=True,\
      \ name='WriteToFile').run()\" "
    deps:
    - src/WriteToFile.py
    outs:
    - outs/example.txt


We can also look at our `params.yaml` file to inspect the passed arguments:

In [20]:
display(Pretty("params.yaml"))

MaxRandomNumber:
  maximum: 512


### ZnTrack Init

As you may have already noticed, we have not created an `__init__` yet.
Arguments are passed to the `__call__`, and `ZnTrackOptions` (`dvc.<...>`) are defined at class level.
The following example illustrates why using the `__init__` can lead to confusing results.
To understand it, we need to keep in mind that DVC runs the following command:

    python3 -c "from src.Stage0 import Stage0; Stage0(load=True).run()"
    
which we will use to imitate `dvc repro` in the following.

In [21]:
@Node()
class InitStage:
    def __init__(self, value="Not defined"):
        self.value = value

    def run(self):
        print(self.value)


In [22]:
init_stage = InitStage(value='Lorem Ipsum')
init_stage()
print(init_stage.value)

2021-11-30 16:13:21,879 (INFO): Adding stage 'InitStage' in 'dvc.yaml'

To track the changes with git, run:

	git add dvc.yaml

Lorem Ipsum


In [23]:
InitStage(load=True).run()

Not defined


We can see that the value we passed is not available during the command that is executed by DVC.
This is important to keep in mind when using ZnTrack.
The issue can easily be solved by using `dvc.params()`.
Although it is possible to define them within the `__init__`, this should be avoided in favour of class-level definitions.
Nevertheless, the `__init__` can be used, e.g., for defining attributes or setting a `ZnTrackOption`.
We can therefore extend our `MaxRandomNumber` by a constant minimum value in the following way:

In [24]:
@Node()
class InitMaxRandomNumber:
    number = zn.outs()
    maximum = dvc.params()

    def __init__(self):
        self.minimum = 0

    def __call__(self, maximum):
        self.maximum = maximum

    def run(self):
        self.number = randrange(self.minimum, self.maximum)


init_max_random_number = InitMaxRandomNumber()
init_max_random_number(maximum=512)

2021-11-30 16:13:28,604 (INFO): Adding stage 'InitMaxRandomNumber' in 'dvc.yaml'

To track the changes with git, run:

	git add nodes/InitMaxRandomNumber/.gitignore dvc.yaml



In [25]:
!dvc repro

Running stage 'InitStage':
> python3 -c "from src.InitStage import InitStage; InitStage(load=True, name='InitStage').run()"
Not defined
Updating lock file 'dvc.lock'

Stage 'RandomNumber' didn't change, skipping
Stage 'Stage0' didn't change, skipping
Running stage 'InitMaxRandomNumber':
> python3 -c "from src.InitMaxRandomNumber import InitMaxRandomNumber; InitMaxRandomNumber(load=True, name='InitMaxRandomNumber').run()"
Updating lock file 'dvc.lock'

Stage 'MaxRandomNumber' didn't change, skipping
Stage 'WriteToFile' didn't change, skipping

To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.

In [26]:
InitMaxRandomNumber(load=True).number

270

This is an essential property of ZnTrack that differs from most other Python code.
The following example DOES NOT work, because DVC will try to
run `InitMaxRandomNumberWrong(load=True).run()` without passing a value for `maximum`, which results in an error!

```python

@Node()
class InitMaxRandomNumberWrong:
    number = zn.outs()
    maximum = dvc.params()
    
    def __init__(self, maximum):
        self.minimum = 0
        self.maximum = maximum
        
    def run(self):
        self.number = randrange(self.minimum, self.maximum)
```
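
The underlying problem can be reproduced without ZnTrack. The two classes in the following sketch are hypothetical stand-ins; calling the constructor without arguments imitates the way the stage is instantiated when no user values are passed.

```python
class RequiresMaximum:
    def __init__(self, maximum):
        self.maximum = maximum


class OptionalMaximum:
    def __init__(self, maximum=None):
        self.maximum = maximum


try:
    RequiresMaximum()  # no user arguments are available at this point
    error = None
except TypeError as err:
    error = err

print(type(error).__name__)
print(OptionalMaximum().maximum)
```

A required positional argument makes the argument-free instantiation fail with a `TypeError`, whereas an optional argument with a default does not.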

What does work is the following version.
For code clarity, however, it should be avoided if possible and the `__call__` should be used instead.
Sometimes a combined approach is unavoidable, e.g. when
a generated value has to be passed upon class instantiation and a user value later on.

In [27]:
@Node()
class InitMaxRandomNumberTrick:
    number = zn.outs()
    maximum = dvc.params()

    def __init__(self, maximum=None):
        self.minimum = 0
        if maximum is not None:
            self.maximum = maximum

    def run(self):
        self.number = randrange(self.minimum, self.maximum)


In [28]:
init_max_random_number_trick = InitMaxRandomNumberTrick(maximum=4096)
init_max_random_number_trick()
!dvc repro

2021-11-30 16:13:38,980 (INFO): Adding stage 'InitMaxRandomNumberTrick' in 'dvc.yaml'

To track the changes with git, run:

	git add nodes/InitMaxRandomNumberTrick/.gitignore dvc.yaml

Stage 'InitMaxRandomNumber' didn't change, skipping
Stage 'WriteToFile' didn't change, skipping
Stage 'InitStage' didn't change, skipping
Running stage 'InitMaxRandomNumberTrick':
> python3 -c "from src.InitMaxRandomNumberTrick import InitMaxRandomNumberTrick; InitMaxRandomNumberTrick(load=True, name='InitMaxRandomNumberTrick').run()"
Updating lock file 'dvc.lock'

Stage 'MaxRandomNumber' didn't change, skipping
Stage 'Stage0' didn't change, skipping
Stage 'RandomNumber' didn't change, skipping

To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.

In [29]:
InitMaxRandomNumberTrick(load=True).number

4088

In [30]:
temp_dir.cleanup()