Darryl Oatridge, August 2022

In [1]:
import os

In [2]:
os.environ['HADRON_PM_PATH'] = '0_hello_meta/demo/contracts'
os.environ['HADRON_DEFAULT_PATH'] = '0_hello_meta/demo/data'

## Controller
The Controller is a unique component that independently orchestrates the components registered to it. It executes each component's Domain Contract and not its code. Domain Contracts belonging to a Controller should be in the same path location as the Controller's Domain Contract. The Controller executes the Domain Contracts registered to it in accordance with the instructions given to it when `run_controller` is executed. The Controller orchestrates how those components should run, with the components being independent in their actions, which gives a separation of concerns. You do not need to give the Controller a name, as one is assumed in each folder containing the Domain Contracts for this set of components, known as a Domain Contract Cluster. This gives us the entry point to interrogate the Controller and its components.

 

In [3]:
from ds_discovery import Controller

In [4]:
controller = Controller.from_env(has_contract=False)

### Add Components

Now that we have an empty Controller, we need to register, or add, the components that make up this Controller. Note that the Domain Contract for each component must be in the same folder as the Controller's Domain Contract.

To add a component we use the intent method specific to that component type, in this case `intent_model.transition` for `hello_tr` and `intent_model.wrangle` for `hello_wr`.

In [5]:
controller.intent_model.transition(canonical=0, task_name='hello_tr', intent_level='hw_transition')

In [6]:
controller.intent_model.wrangle(canonical=0, task_name='hello_wr', intent_level='hw_wrangle')

### Report

Using the Task report we can check the components have been added.  

In [7]:
controller.report_tasks()

Unnamed: 0,level,order,component,task,parameters,creator
0,hw_transition,0,Transition,'hello_tr',[],doatridge
1,hw_wrangle,0,Wrangle,'hello_wr',[],doatridge


As with all components, the Controller executes the components in the order given. By using the Controller's special Run Book we are given considerably more flexibility in the order and behaviour of each component and how it interacts with others.

As good practice, a Run Book should always be created for each Controller, as this provides better transparency into how the components run.

In [8]:
run_book = [
    controller.runbook2dict(task='hw_transition'),
    controller.runbook2dict(task='hw_wrangle'),
]
controller.add_run_book(run_levels=run_book)
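
To make the idea of a Run Book more concrete, the following is a minimal, library-independent sketch of an ordered list of entries being executed in sequence, with each task receiving the output of the previous one. The names `run_in_order` and `registry`, the dictionary key `task`, and the two toy functions are all hypothetical and chosen for illustration; this is not how `ds_discovery` implements `runbook2dict` or `add_run_book`.

```python
from typing import Callable, Dict, List

Canonical = List[dict]  # toy stand-in for a tabular data set


def hw_transition(canonical: Canonical) -> Canonical:
    # Hypothetical transition step: drop rows that contain a missing value.
    return [row for row in canonical if None not in row.values()]


def hw_wrangle(canonical: Canonical) -> Canonical:
    # Hypothetical wrangle step: add one engineered column.
    return [{**row, "name_len": len(row["name"])} for row in canonical]


def run_in_order(run_book: List[dict],
                 registry: Dict[str, Callable[[Canonical], Canonical]],
                 canonical: Canonical) -> Canonical:
    # Execute each run book entry in list order, passing the result along.
    for entry in run_book:
        canonical = registry[entry["task"]](canonical)
    return canonical


registry = {"hw_transition": hw_transition, "hw_wrangle": hw_wrangle}
run_book = [{"task": "hw_transition"}, {"task": "hw_wrangle"}]

result = run_in_order(run_book, registry, [{"name": "hello"}, {"name": None}])
print(result)
```

The point of the sketch is that the order of execution lives in the run book, as data, rather than in the code of any one component.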

### Run Controller Pipeline
To run the Controller we execute `run_controller`. This is a special method that replaces `run_component_pipeline`, common to other components, and adds extra features to enable the control of the registered components. It is the only method you can use to run the Controller and execute its registered components. It is worth noting that it is the components, not the Controller, that produce the outcome of their collective objectives or tasks. The Controller only orchestrates how they run.

In [9]:
controller.run_controller()

The Controller is a powerful tool and should be investigated further to understand all its options. The Run Book can be used to provide a set of instructions on how each component receives its source and persists its output, be it to another component or as an external data set. The `run_controller` method has useful tools to monitor changes in incoming data and provide a run report of how all the components ran.

-------------------------------

In the section below we will demonstrate a couple of these features.

One of the most useful parameters of `run_controller` is `run_cycle_report`, which saves a run report giving the run time of the Controller and the components therein.

In [10]:
controller.run_controller(run_cycle_report='cycle_report.csv')
controller.load_canonical(connector_name='run_cycle_report')

Unnamed: 0,time,text
0,2022-12-04 11:25:46.797362,start run-cycle 0
1,2022-12-04 11:25:46.798716,start task cycle 0
2,2022-12-04 11:25:46.800535,running hw_transition
3,2022-12-04 11:25:48.871490,"canonical shape is (1309, 10)"
4,2022-12-04 11:25:48.874526,running hw_wrangle
5,2022-12-04 11:25:48.927769,"canonical shape is (1309, 13)"
6,2022-12-04 11:25:48.929326,tasks complete
7,2022-12-04 11:25:48.930743,end of report
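
Because each report row is timestamped, the time spent in each task can be derived from consecutive rows. The sketch below does this with the standard library only, using the rows shown above with the index column omitted. The function name `task_durations` is hypothetical, and the sketch assumes that a task ends at the timestamp of the row that follows its `running ...` entry.

```python
import csv
import io
from datetime import datetime

# Report rows copied from the output above (index column omitted).
REPORT = '''time,text
2022-12-04 11:25:46.797362,start run-cycle 0
2022-12-04 11:25:46.798716,start task cycle 0
2022-12-04 11:25:46.800535,running hw_transition
2022-12-04 11:25:48.871490,"canonical shape is (1309, 10)"
2022-12-04 11:25:48.874526,running hw_wrangle
2022-12-04 11:25:48.927769,"canonical shape is (1309, 13)"
2022-12-04 11:25:48.929326,tasks complete
2022-12-04 11:25:48.930743,end of report
'''


def task_durations(report_csv: str) -> dict:
    """Return seconds elapsed between each 'running <task>' row and the next row."""
    rows = list(csv.DictReader(io.StringIO(report_csv)))
    durations = {}
    for current, following in zip(rows, rows[1:]):
        if current["text"].startswith("running "):
            task = current["text"][len("running "):]
            start = datetime.fromisoformat(current["time"])
            end = datetime.fromisoformat(following["time"])
            # Accumulate, so a task that runs in several cycles is summed.
            durations[task] = durations.get(task, 0.0) + (end - start).total_seconds()
    return durations


durations = task_durations(REPORT)
for task, seconds in durations.items():
    print(f"{task}: {seconds:.3f}s")
```

For this report, almost all of the run time is spent in `hw_transition`, which is where the source data is loaded.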


Now that we have the `run_cycle_report` we can observe the effect of the other parameters. In this case we add the `run_time` parameter, which runs the Controller's components for a time period of three seconds.

In [11]:
controller.run_controller(run_time=3, run_cycle_report='cycle_report.csv')
controller.load_canonical(connector_name='run_cycle_report')

Unnamed: 0,time,text
0,2022-12-04 11:25:48.954783,start run-cycle 0
1,2022-12-04 11:25:48.955975,start task cycle 0
2,2022-12-04 11:25:48.957836,running hw_transition
3,2022-12-04 11:25:51.724604,"canonical shape is (1309, 10)"
4,2022-12-04 11:25:51.726417,running hw_wrangle
5,2022-12-04 11:25:51.763047,"canonical shape is (1309, 13)"
6,2022-12-04 11:25:51.764218,tasks complete
7,2022-12-04 11:25:51.765259,sleep for 1 seconds
8,2022-12-04 11:25:52.766775,start run-cycle 1
9,2022-12-04 11:25:52.768569,start task cycle 0
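
The pattern visible in this report, where run-cycles repeat with a pause between them until a time budget is used up, can be pictured as a simple deadline loop. The sketch below is an assumption about the general technique, not the library's implementation: `run_for`, `FakeClock` and their parameters are hypothetical, and the clock is injected so the example runs instantly and deterministically.

```python
import time
from typing import Callable


def run_for(run_time: float, cycle: Callable[[], None], sleep: float = 1.0,
            clock: Callable[[], float] = time.monotonic,
            pause: Callable[[float], None] = time.sleep) -> int:
    """Repeat `cycle` until `run_time` seconds have elapsed; return the cycle count."""
    deadline = clock() + run_time
    count = 0
    while True:
        cycle()
        count += 1
        if clock() >= deadline:
            break
        pause(sleep)  # wait before starting the next cycle
    return count


class FakeClock:
    """Simulated clock so the example does not really wait."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
# Each simulated cycle takes two seconds against a three second budget.
cycles = run_for(3, cycle=lambda: fake.advance(2.0), sleep=1.0,
                 clock=fake.time, pause=fake.advance)
print(cycles)
```

In this sketch a cycle that has already started is always allowed to finish, so the total elapsed time can exceed the budget.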


In this example we use the parameters `repeat` and `sleep`, where the first defines the number of times to repeat the component cycle and the second the number of seconds to pause between each cycle.

In [12]:
controller.run_controller(repeat=2, sleep=3, run_cycle_report='cycle_report.csv')
controller.load_canonical(connector_name='run_cycle_report')

Unnamed: 0,time,text
0,2022-12-04 11:25:54.447504,start run-cycle 0
1,2022-12-04 11:25:54.449032,start task cycle 0
2,2022-12-04 11:25:54.451115,running hw_transition
3,2022-12-04 11:25:56.220527,"canonical shape is (1309, 10)"
4,2022-12-04 11:25:56.222606,running hw_wrangle
5,2022-12-04 11:25:56.274872,"canonical shape is (1309, 13)"
6,2022-12-04 11:25:56.276414,tasks complete
7,2022-12-04 11:25:56.277962,sleep for 3 seconds
8,2022-12-04 11:25:59.282215,start task cycle 1
9,2022-12-04 11:25:59.284029,running hw_transition


Finally, we use the `source_check_uri` parameter as a pointer to an input source to watch for changes.

In [13]:
controller.run_controller(repeat=3, source_check_uri='https://www.openml.org/data/get_csv/16826755/phpMYEkMl.csv', run_cycle_report='cycle_report.csv')
controller.load_canonical(connector_name='run_cycle_report')

Unnamed: 0,time,text
0,2022-12-04 11:26:01.399082,start run-cycle 0
1,2022-12-04 11:26:01.400208,start task cycle 0
2,2022-12-04 11:26:06.909143,running hw_transition
3,2022-12-04 11:26:08.498987,"canonical shape is (1309, 10)"
4,2022-12-04 11:26:08.500951,running hw_wrangle
5,2022-12-04 11:26:08.536517,"canonical shape is (1309, 13)"
6,2022-12-04 11:26:08.538500,tasks complete
7,2022-12-04 11:26:08.539621,start task cycle 1
8,2022-12-04 11:26:12.990562,Source has not changed
9,2022-12-04 11:26:12.992190,start task cycle 2
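
One way to picture the "Source has not changed" check is to compare a fingerprint of the source between cycles and skip the tasks when the fingerprint is the same. The sketch below uses a SHA-256 digest of the content for this. It is an assumption made for illustration: the library may detect changes by a different mechanism, such as a modification time, and the class `SourceWatcher` is hypothetical.

```python
import hashlib
from typing import Optional


class SourceWatcher:
    """Remember the digest of the last content seen and report changes."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def has_changed(self, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        changed = digest != self._last_digest
        self._last_digest = digest
        return changed


watcher = SourceWatcher()
first = watcher.has_changed(b"a,b\n1,2\n")        # first sight counts as a change
second = watcher.has_changed(b"a,b\n1,2\n")       # identical content
third = watcher.has_changed(b"a,b\n1,2\n3,4\n")   # a row was appended
print(first, second, third)
```

A content digest needs the whole source to be read on every check, which is why cheaper signals such as a timestamp or size are often preferred for large sources.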
