# Basic netunicorn experiment example
This example shows basic client-side usage of the netunicorn API.
Prerequisites:
- overall understanding of the project
- deployed netunicorn infrastructure and director services
- known `endpoint`, `login`, and `password` for the connection

To work with the project, you need to install several packages:
- `netunicorn-base` - provides abstractions and classes to create pipelines and define experiments. If you want to just define your pipeline and write tasks, you need only this package.
- `netunicorn-client` - provides connectivity to netunicorn infrastructure. You need this package to submit and execute experiments.
- `netunicorn-library` - a library of predefined and contributed tasks for the platform. You can use tasks from this package in your pipelines and submit your own code there to share it. Please note that most of the tasks there are provided 'as-is' by other teams and developers, and the netunicorn team doesn't guarantee their correctness.

In [1]:
%pip install netunicorn-base
%pip install netunicorn-client
%pip install netunicorn-library

Note: you may need to restart the kernel to use updated packages.


Let's import the classes we need:

In [2]:
import os
import time

# for pretty printing of JSONs
from pprint import pprint

# client to connect to the infrastructure
from netunicorn.client.remote import RemoteClient, RemoteClientException

# basic abstraction for experiment creation and management
from netunicorn.base.experiment import Experiment, ExperimentStatus
from netunicorn.base.pipeline import Pipeline

# task to be executed in the pipeline
# you can write your own tasks, but for now let's use a simple predefined one
from netunicorn.library.tasks.basic import SleepTask

First, we define a pipeline to execute.

A pipeline consists of tasks arranged in stages. Every node that the pipeline is later assigned to executes the whole pipeline.

To create a pipeline, instantiate a `Pipeline` object and use the `.then()` method to define a new stage. All tasks in the same stage are started together, and the next stage starts only after all tasks of the current stage have finished successfully.

In [3]:
# we will use simple SleepTask for this example
# you can look at the source code of the SleepTaskLinuxImplementation in netunicorn-library to understand how it works

pipeline = Pipeline()

# Note that the executor will first execute `sleep 5` and `sleep 3` in parallel...
pipeline = pipeline.then([
    SleepTask(5),
    SleepTask(3)
]).then(
    # ...and once both have finished (after 5 seconds in total) it will execute `sleep 10`
    SleepTask(10)
)
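
To make the stage semantics concrete, here is a minimal pure-Python sketch that does not use netunicorn at all. It only models the rule described above: tasks within a stage run together, and the next stage starts after the whole stage has finished. The names `make_sleep` and `run_stages` are hypothetical, the durations are scaled down from seconds to fractions of a second, and failure handling is not modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Task = Callable[[], float]


def make_sleep(seconds: float) -> Task:
    """Build a toy task that sleeps and then returns its duration (hypothetical helper)."""
    def task() -> float:
        time.sleep(seconds)
        return seconds
    return task


def run_stages(stages: Sequence[Sequence[Task]]) -> List[List[float]]:
    """Run the stages one after another; tasks inside a stage run in parallel threads."""
    results: List[List[float]] = []
    for stage in stages:
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            futures = [pool.submit(task) for task in stage]
            # Collecting every result blocks until the whole stage is done.
            results.append([future.result() for future in futures])
    return results


start = time.monotonic()
stage_results = run_stages([
    [make_sleep(0.05), make_sleep(0.03)],  # stage 1: both start together
    [make_sleep(0.10)],                    # stage 2: starts after stage 1 is done
])
elapsed = time.monotonic() - start

print(stage_results)  # [[0.05, 0.03], [0.1]]
print(f"elapsed: {elapsed:.2f}s")
```

The total time is roughly the longest task of stage 1 plus the task of stage 2, which mirrors the "5 seconds, then 10 more" timeline in the comments of the cell above.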

You can combine multiple tasks and stages to create your own pipelines. Each task instance in a pipeline is serialized (together with all its variables) and sent to the executor. The results of the task (and of the pipeline as a whole) are serialized and sent back to you.

An experiment is defined as one or more assignments of pipelines to particular nodes. To define an experiment, we first need to get the nodes available in the infrastructure.

To access the infrastructure, you need several parameters provided by the netunicorn deployment administrators (e.g., your university or company). In the next cell we read these parameters from environment variables and fall back to default values if they are not set.

In [4]:
# if you have .env file locally for storing credentials
if '.env' in os.listdir():
    from dotenv import load_dotenv
    load_dotenv(".env")

# API connection endpoint
endpoint = os.environ.get('NETUNICORN_ENDPOINT') or 'http://localhost:26611'

# user login
login = os.environ.get('NETUNICORN_LOGIN') or 'test'

# user password
password = os.environ.get('NETUNICORN_PASSWORD') or 'test'
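
One detail of the cell above is worth spelling out: `os.environ.get(name) or default` falls back to the default both when the variable is unset and when it is set to an empty string. The sketch below illustrates this with a plain dictionary standing in for `os.environ`; the helper name `env_or_default` and the value `alice` are made up for the illustration.

```python
from typing import Mapping


def env_or_default(environ: Mapping[str, str], name: str, default: str) -> str:
    """Same rule as `environ.get(name) or default`: unset and empty values both fall back."""
    return environ.get(name) or default


# A stand-in for os.environ, so the example does not depend on your real environment.
fake_environ = {"NETUNICORN_LOGIN": "alice", "NETUNICORN_PASSWORD": ""}

resolved_login = env_or_default(fake_environ, "NETUNICORN_LOGIN", "test")
resolved_password = env_or_default(fake_environ, "NETUNICORN_PASSWORD", "test")
resolved_endpoint = env_or_default(fake_environ, "NETUNICORN_ENDPOINT", "http://localhost:26611")

print(resolved_login)     # alice
print(resolved_password)  # test (the empty value was replaced by the default)
print(resolved_endpoint)  # http://localhost:26611 (the variable is not set)

# For comparison, `.get(name, default)` only falls back when the key is missing:
password_via_get = fake_environ.get("NETUNICORN_PASSWORD", "test")
print(repr(password_via_get))  # ''
```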

In [5]:
# let's create a client with these parameters
client = RemoteClient(endpoint=endpoint, login=login, password=password)
client.healthcheck()

True

Using the client, you can get information about the available nodes, then filter them and take the number you need for your experiment.

In [6]:
# let's get all nodes available to us
nodes = client.get_nodes()

In [7]:
# let's explore existing nodes
for element in nodes:
    print(element)

<Uncountable node pool with next node template: [aws-fargate-A-, aws-fargate-B-, aws-fargate-ARM64-]>
[snl-server-5, atopnuc-84:47:09:17:c1:b7, atopnuc-84:47:09:17:c4:83, atopnuc-84:47:09:17:c8:0c, atopnuc-84:47:09:17:c1:df, atopnuc-84:47:09:17:c0:b6, atopnuc-84:47:09:17:c2:14, atopnuc-84:47:09:17:c1:8d, atopnuc-84:47:09:17:c0:f6, atopnuc-84:47:09:17:c0:57, atopnuc-84:47:09:16:b6:cf, raspi-e4:5f:01:72:89:99, raspi-e4:5f:01:56:d9:3a, raspi-e4:5f:01:56:d8:cd, raspi-e4:5f:01:8d:f5:95, raspi-e4:5f:01:56:d9:0a, raspi-e4:5f:01:75:ae:8d, raspi-e4:5f:01:75:6b:2c, raspi-e4:5f:01:75:6e:53, raspi-e4:5f:01:56:d9:8b, raspi-e4:5f:01:78:6f:2e, raspi-e4:5f:01:8e:27:aa, raspi-e4:5f:01:79:4a:18, raspi-e4:5f:01:56:d6:ce, raspi-e4:5f:01:a0:4a:dd, raspi-e4:5f:01:a0:4a:77, raspi-e4:5f:01:a7:ae:70, raspi-e4:5f:01:8d:ca:12, raspi-e4:5f:01:a7:b1:af, raspi-e4:5f:01:6f:ee:14, raspi-e4:5f:01:72:a4:93, raspi-e4:5f:01:a0:4f:c5, raspi-e4:5f:01:ad:c9:04, raspi-e4:5f:01:9c:20:81, raspi-e4:5f:01:75:54:04, raspi-e4:5f:0

**Attention:** if you are executing this notebook against another netunicorn instance, your available nodes will differ from the ones shown here. Feel free to skip the next several cells or modify them to fit your infrastructure.

Here we meet the first important concept of netunicorn - node pools.

Node pools are objects that represent existing nodes in the infrastructure. Generally, there are two types of node pools:
- CountableNodePool - represents a set of nodes with a fixed number of nodes. Usually used for static infrastructures with real physical nodes.
- UncountableNodePool - represents a set of nodes where nodes are created dynamically. Typical examples of such infrastructures are cloud providers, such as AWS, GCP, Azure, etc.

A CountableNodePool can contain other node pools, while an UncountableNodePool cannot. When you request nodes from the client, you always receive a CountableNodePool, which can contain zero or more other pools. Let's verify it:

In [8]:
type(nodes)

netunicorn.base.nodes.CountableNodePool

Both CountableNodePool and UncountableNodePool share a unified interface, but some methods work differently. For example, we can use `__getitem__` to get a node (or another pool) from a CountableNodePool:

In [9]:
node_pool = nodes[0]
print(type(node_pool))
print(node_pool)

<class 'netunicorn.base.nodes.UncountableNodePool'>
<Uncountable node pool with next node template: [aws-fargate-A-, aws-fargate-B-, aws-fargate-ARM64-]>


Meanwhile, an UncountableNodePool has an infinite number of nodes, so for it this method returns a _template_ of a node instead:

In [10]:
node_pool[0]

aws-fargate-A-

Given this difference, we suggest using the following methods to work with nodes from pools:
- `take` - returns a sequence of N nodes from the pool (or generates N nodes, if it is an UncountableNodePool)
- `skip` - returns a new pool without N first nodes (or pools)
- `filter` - accepts boolean function and returns a new pool with nodes (or pools) that satisfy this function

Let's see how it works:

In [11]:
for element in nodes:
    print(element)

print()
print(f"Take three first nodes: {nodes.take(3)}")
print(f"Skip first object in the `nodes` pool and take three next nodes: {nodes.skip(1).take(3)}")  # notice, that we skipped the whole node pool
print(f"Filter example: {nodes.filter(lambda node: node.name.startswith('raspi-') or node.name.startswith('aws-fargate-A-'))}")

<Uncountable node pool with next node template: [aws-fargate-A-, aws-fargate-B-, aws-fargate-ARM64-]>
[snl-server-5, atopnuc-84:47:09:17:c1:b7, atopnuc-84:47:09:17:c4:83, atopnuc-84:47:09:17:c8:0c, atopnuc-84:47:09:17:c1:df, atopnuc-84:47:09:17:c0:b6, atopnuc-84:47:09:17:c2:14, atopnuc-84:47:09:17:c1:8d, atopnuc-84:47:09:17:c0:f6, atopnuc-84:47:09:17:c0:57, atopnuc-84:47:09:16:b6:cf, raspi-e4:5f:01:72:89:99, raspi-e4:5f:01:56:d9:3a, raspi-e4:5f:01:56:d8:cd, raspi-e4:5f:01:8d:f5:95, raspi-e4:5f:01:56:d9:0a, raspi-e4:5f:01:75:ae:8d, raspi-e4:5f:01:75:6b:2c, raspi-e4:5f:01:75:6e:53, raspi-e4:5f:01:56:d9:8b, raspi-e4:5f:01:78:6f:2e, raspi-e4:5f:01:8e:27:aa, raspi-e4:5f:01:79:4a:18, raspi-e4:5f:01:56:d6:ce, raspi-e4:5f:01:a0:4a:dd, raspi-e4:5f:01:a0:4a:77, raspi-e4:5f:01:a7:ae:70, raspi-e4:5f:01:8d:ca:12, raspi-e4:5f:01:a7:b1:af, raspi-e4:5f:01:6f:ee:14, raspi-e4:5f:01:72:a4:93, raspi-e4:5f:01:a0:4f:c5, raspi-e4:5f:01:ad:c9:04, raspi-e4:5f:01:9c:20:81, raspi-e4:5f:01:75:54:04, raspi-e4:5f:0

Notice that in the `filter` example we received a new pool with two node pools inside, each containing nodes that satisfy the condition.
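
As a rough mental model of `take`, `skip`, and `filter`, the following toy sketch may help. It is not netunicorn's implementation: `ToyCountablePool` and `ToyUncountablePool` are hypothetical classes, nodes are represented by plain name strings, and the way generated names are numbered is an assumption made for the illustration.

```python
from itertools import count, cycle, islice
from typing import Callable, List, Sequence, Union


class ToyUncountablePool:
    """Toy stand-in: generates node names on demand from a cycle of templates."""

    def __init__(self, templates: Sequence[str]) -> None:
        self._templates = list(templates)
        # Assumed naming scheme: template plus a running counter.
        self._names = (
            f"{template}{index}"
            for index, template in zip(count(1), cycle(self._templates))
        )

    def take(self, n: int) -> List[str]:
        return list(islice(self._names, n))

    def filter(self, predicate: Callable[[str], bool]) -> "ToyUncountablePool":
        # Filtering applies to templates, because concrete nodes do not exist yet.
        return ToyUncountablePool([t for t in self._templates if predicate(t)])


Item = Union[str, "ToyCountablePool", ToyUncountablePool]


class ToyCountablePool:
    """Toy stand-in: a fixed list of node names and/or nested pools."""

    def __init__(self, items: Sequence[Item]) -> None:
        self._items = list(items)

    def take(self, n: int) -> List[str]:
        taken: List[str] = []
        for item in self._items:
            if len(taken) >= n:
                break
            if isinstance(item, str):
                taken.append(item)
            else:  # nested pool: ask it for the nodes still missing
                taken.extend(item.take(n - len(taken)))
        return taken

    def skip(self, n: int) -> "ToyCountablePool":
        # Skips the first n elements, where an element may be a whole nested pool.
        return ToyCountablePool(self._items[n:])

    def filter(self, predicate: Callable[[str], bool]) -> "ToyCountablePool":
        kept: List[Item] = []
        for item in self._items:
            if isinstance(item, str):
                if predicate(item):
                    kept.append(item)
            else:
                kept.append(item.filter(predicate))
        return ToyCountablePool(kept)


cloud = ToyUncountablePool(["aws-fargate-A-", "aws-fargate-B-"])
physical = ToyCountablePool(["raspi-1", "raspi-2", "atopnuc-1"])
toy_nodes = ToyCountablePool([cloud, physical])

first_three = toy_nodes.take(3)
after_skip = toy_nodes.skip(1).take(2)
raspi_only = toy_nodes.filter(lambda name: name.startswith("raspi-")).take(5)

print(first_three)  # ['aws-fargate-A-1', 'aws-fargate-B-2', 'aws-fargate-A-3']
print(after_skip)   # ['raspi-1', 'raspi-2']
print(raspi_only)   # ['raspi-1', 'raspi-2']
```

In this toy model, `take(3)` never reaches the physical nodes because the first (uncountable) pool can always generate more, which is the same effect you see in the notebook output above.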

Let's explore the node object itself:

In [12]:
example_node = nodes.take(5)[0] 
print(example_node.name)  # unique identifier of the node
print(example_node.properties)  # different properties that could be used for filtering
print(example_node.architecture)  # architecture of the node

aws-fargate-A-4
{'cpu': 256, 'memory': 512, 'netunicorn-environments': ['DockerImage'], 'connector': 'aws-fargate'}
Architecture.LINUX_AMD64


You can use any part of the node for filtering if needed. For the full Node description, please refer to the documentation.

Let's select several real-world Raspberry Pi nodes for our experiment:

In [13]:
interesting_nodes = nodes.filter(lambda node: node.name.startswith('raspi-'))  # feel free to change it if needed
working_nodes = interesting_nodes.take(3)
print(working_nodes)

[raspi-e4:5f:01:72:89:99, raspi-e4:5f:01:56:d9:3a, raspi-e4:5f:01:56:d8:cd]


In [14]:
# Cell for those who are playing with this notebook on a local infrastructure:
if os.environ.get('NETUNICORN_ENDPOINT', 'http://localhost:26611') == 'http://localhost:26611':
    working_nodes = client.get_nodes().take(1)

For simplicity, our first experiment consists of all nodes running the same pipeline. Let's create an Experiment instance and assign the pipeline to all nodes using the `map()` method. The documentation describes other ways of creating assignments (called `Deployments` in netunicorn).

In [15]:
experiment = Experiment().map(pipeline, working_nodes)

# let's explore experiment object
print(experiment)
print()

# and separate Deployments
for deployment in experiment:
    print(deployment.node)
    print(deployment.environment_definition)
    print()

 - Deployment: Node=raspi-e4:5f:01:72:89:99, executor_id=, prepared=False
 - Deployment: Node=raspi-e4:5f:01:56:d9:3a, executor_id=, prepared=False
 - Deployment: Node=raspi-e4:5f:01:56:d8:cd, executor_id=, prepared=False

raspi-e4:5f:01:72:89:99
DockerImage(commands=[], runtime_context=RuntimeContext(ports_mapping={}, environment_variables={}, additional_arguments=[]), image=None, build_context=BuildContext(python_version='3.10.9', cloudpickle_version='2.2.1'))

raspi-e4:5f:01:56:d9:3a
DockerImage(commands=[], runtime_context=RuntimeContext(ports_mapping={}, environment_variables={}, additional_arguments=[]), image=None, build_context=BuildContext(python_version='3.10.9', cloudpickle_version='2.2.1'))

raspi-e4:5f:01:56:d8:cd
DockerImage(commands=[], runtime_context=RuntimeContext(ports_mapping={}, environment_variables={}, additional_arguments=[]), image=None, build_context=BuildContext(python_version='3.10.9', cloudpickle_version='2.2.1'))



Notice that each deployment has an `environment definition`, an important part of the deployment that stores environment variables and port mappings, determines where the pipeline will be executed, etc.

To submit the experiment, we need to choose a name for it that is unique among the user's experiments and call the appropriate method of the `client` object. Note that you can submit the same experiment several times under different names to execute it more than once.

In [16]:
experiment_name = 'experiment_cool_name'
try:
    # just in case you already have this experiment
    client.delete_experiment(experiment_name)
except RemoteClientException:
    # if it does not exist, an exception is raised, which we can safely ignore
    pass

client.prepare_experiment(experiment, experiment_name)

'experiment_cool_name'

When you submit an experiment, netunicorn services automatically create or download a virtual environment for executing the pipeline, place the serialized pipeline inside, and distribute these environments to the desired nodes.

To check the status of the experiment, use the corresponding method of the `client` object.

In [17]:
# status will change from PREPARING to READY when compiled and deployed
while True:
    info = client.get_experiment_status(experiment_name)
    print(info.status)
    if info.status == ExperimentStatus.READY:
        break
    time.sleep(20)

ExperimentStatus.PREPARING
ExperimentStatus.PREPARING
ExperimentStatus.PREPARING
ExperimentStatus.PREPARING
ExperimentStatus.READY


Given that we used default settings for deployments, during the execution of the previous cell, netunicorn:
1. Extracted all requirements for tasks and analyzed nodes 
2. Compiled new Docker images for each combination of pipeline and architecture
3. Distributed these Docker images to experiment nodes

In [18]:
# One of the returned objects is the prepared experiment. It holds all the information about the compilation of the deployments
# Some nodes may fail to prepare for various reasons
prepared_experiment = info.experiment
for deployment in prepared_experiment:
    print(f"Node name: {deployment.node}")
    print(f"Deployed correctly: {deployment.prepared}")
    print(f"Error: {deployment.error}")
    print()

Node name: raspi-e4:5f:01:72:89:99
Deployed correctly: True
Error: None

Node name: raspi-e4:5f:01:56:d9:3a
Deployed correctly: True
Error: None

Node name: raspi-e4:5f:01:56:d8:cd
Deployed correctly: True
Error: None



When the status is READY, the nodes are prepared for execution and have downloaded all the needed environments and pipelines (don't forget to check the `prepared` flag of each deployment in the returned experiment to confirm).
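
If you want to act on that check rather than read the printout, a small helper can separate prepared deployments from failed ones. The sketch below is hypothetical: `split_by_prepared` is not part of netunicorn, and it relies only on the `node`, `prepared`, and `error` attributes printed in the cell above. `SimpleNamespace` objects with made-up values stand in for real deployments so that the example runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def split_by_prepared(deployments: Iterable[Any]) -> Tuple[List[Any], List[Any]]:
    """Return (prepared, failed) lists based on each deployment's `prepared` flag."""
    prepared: List[Any] = []
    failed: List[Any] = []
    for deployment in deployments:
        (prepared if deployment.prepared else failed).append(deployment)
    return prepared, failed


# Made-up stand-ins for deployments; only the attribute names come from the notebook.
fake_deployments = [
    SimpleNamespace(node="node-a", prepared=True, error=None),
    SimpleNamespace(node="node-b", prepared=False, error="hypothetical build failure"),
]

ready, failed = split_by_prepared(fake_deployments)
print([d.node for d in ready])  # ['node-a']
for d in failed:
    print(f"{d.node} failed to prepare: {d.error}")
```

With a real experiment you would pass `prepared_experiment` instead of `fake_deployments` and decide whether to proceed when `failed` is not empty.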

Now you can ask the `client` object to start the experiment. It will ask the nodes to spin up executors and will collect the execution results.

In [19]:
client.start_execution(experiment_name)

'experiment_cool_name'

In [20]:
# Again, let's check the experiment status until it changes from RUNNING to FINISHED
# (`time` was already imported at the top of the notebook)
while True:
    info = client.get_experiment_status(experiment_name)
    print(info.status)
    if info.status != ExperimentStatus.RUNNING:
        break
    time.sleep(10)

ExperimentStatus.RUNNING
ExperimentStatus.RUNNING
ExperimentStatus.RUNNING
ExperimentStatus.RUNNING
ExperimentStatus.FINISHED
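
Both polling loops in this notebook run until the status changes, so they never return if the experiment never reaches the awaited status. A generic polling helper with a timeout is one way to guard against that. The sketch below is independent of netunicorn: `wait_for_status` is a hypothetical helper, and plain strings stand in for `ExperimentStatus` values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for_status(
    fetch_status: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float,
    interval: float,
) -> T:
    """Poll `fetch_status` until `is_done` accepts the result or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Simulated status sequence in place of real calls to the client.
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
final_status = wait_for_status(
    fetch_status=lambda: next(fake_statuses),
    is_done=lambda status: status != "RUNNING",
    timeout=1.0,
    interval=0.01,
)
print(final_status)  # FINISHED
```

With the real client, `fetch_status` could be `lambda: client.get_experiment_status(experiment_name).status` and `is_done` could compare against `ExperimentStatus.RUNNING`, using the same calls as the cell above.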


During the experiment execution, the following happened:
1. netunicorn started Docker containers on each node of the experiment.
2. Each container started `netunicorn-executor`, which loaded the pipeline and executed it.
3. During the execution, the `executor` periodically sent `heartbeat` messages to the `director` services to signal that it was alive and working.
4. After the pipeline execution, the `executor` uploaded the results to netunicorn and finalized the environment.
5. After all executors finished, the `director` services collected the results and finalized the experiment.

Let's explore the results:

In [21]:
from returns.pipeline import is_successful
from returns.result import Result

for report in info.execution_result:
    print(f"Node name: {report.node.name}")
    print(f"Error: {report.error}")

    result, log = report.result  # report stores results of execution and corresponding log
    
    # result is a returns.result.Result object; it can be either Success or Failure
    print(type(result))
    if isinstance(result, Result):
        data = result.unwrap() if is_successful(result) else result
        pprint(data)

    # we can also explore the logs
    for line in log:
        print(line.strip())
    print()

Node name: raspi-e4:5f:01:72:89:99
Error: None
<class 'returns.result.Success'>
defaultdict(<class 'list'>,
            {'1f9ed7d5-3d07-4986-9510-b449372b990d': [<Success: 3>],
             '247fd6b0-2304-4b73-9c64-c14a5ff43d5c': [<Success: 10>],
             'fabb97ab-3c03-43fe-825c-7119e1538234': [<Success: 5>]})
Parsed configuration: Gateway located on https://pinot.cs.ucsb.edu/dev/netunicorn/gateway
Current directory: /
Pipeline loaded from local file, executing.
Pipeline finished, start reporting results.

Node name: raspi-e4:5f:01:56:d9:3a
Error: None
<class 'returns.result.Success'>
defaultdict(<class 'list'>,
            {'1f9ed7d5-3d07-4986-9510-b449372b990d': [<Success: 3>],
             '247fd6b0-2304-4b73-9c64-c14a5ff43d5c': [<Success: 10>],
             'fabb97ab-3c03-43fe-825c-7119e1538234': [<Success: 5>]})
Parsed configuration: Gateway located on https://pinot.cs.ucsb.edu/dev/netunicorn/gateway
Current directory: /
Pipeline loaded from local file, executing.
Pipeline fi

Congrats! 

We (and, we hope, you too ^_^) have successfully finished a basic experiment using the netunicorn platform.

As next steps, you can read the documentation on creating more complex experiments, including writing your own tasks, providing your own Docker containers, experiment synchronization, etc.

For any questions, refer to the official organization, https://github.com/netunicorn, and the netunicorn team.