In [1]:
# after kernel restart, always execute this line
%matplotlib inline
import sys
sys.path.insert(0, 'path/to/everest-basecamp')

Basecamp inference flow / DOSA part
----------------------------------------
This example assumes that the model has been trained and exported as shown, e.g., in notebook 01.

### ML-developer part
This time, a PyTorch model is used as input. In addition, sample data is provided to calibrate the weights for the quantized data type.

In [2]:
from ebc import basecamp


batch_size = 1

emli = basecamp.ml_inference
emli.set_constraints(app_name="EVEREST traffic prediction v1",
                     onnx_input_name='x',
                     input_shape=[batch_size, 17],
                     input_size_t=8,  # in bits 
                     quantization='int8', 
                     batch_size=batch_size,
                     target_throughput=5000,  # batch_size inferences per second
                     arch_gen_strategy='throughput'
                     )
# set output path
emli.set_output_path('build_dirs/etp_v1')

# set pytorch model as input
emli.set_model_path("torchscript", './etp_01.pt')

# we can pass calibration data
emli.set_calibration_data_path('./traffic_data_2023-04-26.npy')
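
Calibration data matters because a quantizer has to know which value range actually occurs before it can map float32 values onto an int8 grid. The following is a minimal, generic sketch of symmetric per-tensor int8 calibration in plain Python. It is not DOSA's implementation, and the names `calibrate_scale`, `quantize`, and `dequantize` as well as the sample values are made up for illustration.

```python
def calibrate_scale(samples, num_bits=8):
    """Derive a symmetric per-tensor scale from calibration samples."""
    max_abs = max(abs(v) for row in samples for v in row)
    q_max = 2 ** (num_bits - 1) - 1  # 127 for int8
    return max_abs / q_max if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the integer grid and clamp to the representable range."""
    q_max = 2 ** (num_bits - 1) - 1
    q_min = -q_max - 1
    return [max(q_min, min(q_max, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to approximate float values."""
    return [q * scale for q in q_values]


# Hypothetical calibration samples (two rows of three features each).
calibration = [[0.5, -1.27, 0.3], [1.0, 0.0, -0.64]]
scale = calibrate_scale(calibration)

# 5.0 lies outside the calibrated range and is clamped to the int8 maximum.
q = quantize([1.27, -0.64, 5.0], scale)
restored = dequantize(q, scale)
```

Values that were not covered by the calibration set saturate, which is why the calibration data should be representative of the inputs seen at inference time.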

### Performance engineer part
(just as an example)
As described in the user story, the performance engineer can modify the global or high-level configuration of DOSA to accommodate application-specific architectural trade-offs.

In [3]:
emli.dosa_config['build']['comm_message_interleaving'] = 5
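
Since the configuration is accessed like a nested dictionary, a misspelled key would silently create a new entry instead of changing an existing option. A small guard such as the sketch below can catch this. The helper `set_existing_option` is hypothetical and not part of basecamp, and the dictionary is only a stand-in for `emli.dosa_config` with a placeholder initial value.

```python
def set_existing_option(config, path, value):
    """Set a nested option, refusing keys that do not exist yet."""
    node = config
    for key in path[:-1]:
        if key not in node:
            raise KeyError(f"unknown config section: {key!r}")
        node = node[key]
    if path[-1] not in node:
        raise KeyError(f"unknown config option: {path[-1]!r}")
    node[path[-1]] = value


# Stand-in for emli.dosa_config; only the key used in this notebook is
# listed, and its initial value is a placeholder.
config = {"build": {"comm_message_interleaving": 1}}
set_existing_option(config, ["build", "comm_message_interleaving"], 5)
```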

### DOSA compile

In [4]:
# to enable the roofline graphics
# emli.enable_roofline_gui()

emli.compile()

[DOSA:config:INFO] Allowing a degredation of the throughput of 0.1 from the targeted throughput.
[DOSA:build:INFO] Not deleting existing content in output dir.
DOSA: Building OSGs, communication and device libraries...
	...done.

DOSA: Parsing constraints...
	...done.

DOSA: Importing TorchScript...
	...done.
DOSA: Starting quantization translation...
	...done.
DOSA: Starting quantization calibration (1000 steps) ...


	...done.
DOSA: Building quantized AST...
Previous exported models deleted!
step brevitas export
step tidy up


step streamline




step finn to DOSA




	...done.

DOSA: Executing TVM optimization passes...
	...done.

DOSA: Generating high-level architecture...
[DOSA:OICALC:INFO] overwriting dtypes of function (input) with DosaDtype.int8 (orig: float32).
[DOSA:OICALC:INFO] overwriting dtypes of function FN_0001 with DosaDtype.int8 (orig: float32).
[DOSA:OICALC:INFO] overwriting dtypes of function FN_0002 with DosaDtype.int8 (orig: float32).
[DOSA:OICALC:INFO] overwriting dtypes of function FN_0003 with DosaDtype.int8 (orig: float32).
#[version = "0.0.5"]
def @main(%global_in: Tensor[(1, 17), float32] /* ty=Tensor[(1, 17), float32] */) -> Tensor[(1, 4), float32] {
  %4 = fn (%p02: Tensor[(1, 17), float32] /* ty=Tensor[(1, 17), float32] */, %p12: Tensor[(16, 17), float32] /* ty=Tensor[(16, 17), float32] */, %p21: Tensor[(16, 255), float32] /* ty=Tensor[(16, 255), float32] */, %p31: Tensor[(16, 255), float32] /* ty=Tensor[(16, 255), float32] */, Primitive=1) -> Tensor[(1, 16), float32] {
    %2 = nn.dense(%p02, %p12, units=None, out_dtype

dosa_infer.cpp: In function ‘int infer_batch(char*, uint32_t, char*, uint32_t)’:
   bool last_instruction_was_recv = true;
        ^~~~~~~~~~~~~~~~~~~~~~~~~
LIB//ZRLMPI.cpp: In function ‘int receiveHeader(long unsigned int, uint8_t, uint8_t, uint32_t, int, uint8_t*, bool, uint32_t)’:
     if(!copyToCache && header.dst_rank != own_rank)
                        ~~~~~~~~~~~~~~~~^~~~~~~~~~~
   uint32_t expected_packet_cnt = 0;
            ^~~~~~~~~~~~~~~~~~~
LIB//ZRLMPI.cpp: In function ‘void send_internal(int*, int, uint8_t, int, int, uint8_t)’:
     uint8_t buffer[(max_udp_payload_bytes+400)*typewidth*sizeof(uint32_t) + MPIF_HEADER_LENGTH];
                                                                                               ^
         if(count_of_this_message > max_udp_payload_bytes)
            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
LIB//ZRLMPI.cpp: In function ‘void recv_internal(int*, int, uint8_t, int, int, uint8_t, uint8_t*)’:
   if(ret > count*typewidth*sizeof(uint

[DOSA:hls4mlOSG:INFO] Small input, using reuse_factor 1.
[DOSA:hls4mlOSG:INFO] starting hls4ml tool...
Interpreting Model
Topology:
Layer name: input_1, layer type: InputLayer, input shapes: [[1, 17]], output shape: [1, 17]
Layer name: fn_0001_nn_dense, layer type: Dense, input shapes: [[1, 17]], output shape: [1, 16]
Layer name: fn_0002_nn_dense, layer type: Dense, input shapes: [[1, 16]], output shape: [1, 8]
Layer name: fn_0003_nn_dense, layer type: Dense, input shapes: [[1, 8]], output shape: [1, 4]
Creating HLS model
Writing HLS project
Done
[DOSA:hls4mlOSG:DEBUG] determined prod_with: 17
[DOSA:hls4mlOSG:DEBUG] determined prod_with: 17

[VERBOSE] best draft found:
{
  "name": "EVEREST traffic prediction v1",
  "total_flops": 836.0,
  "total_parameter_bytes": 6552,
  "predicted_performance_iter_hz": 310882.260827914,
  "total_dse_time_s": "0.02",
  "total_nodes": 0,
  "nodes": [
    {
      "folder": "node_0",
      "ranks": [
        0
      ],
      "type": "CPU_dummy_x86-1",
   

In [5]:
! ls -l build_dirs/etp_v1

total 56
-rw-rw-r--. 1 ngl ngl  2568 Jan 30 10:53 arch_info.json
drwxrwxr-x. 5 ngl ngl    56 Jan 30 10:08 cFBuild1
-rw-rw-r--. 1 ngl ngl   394 Jan 30 10:53 cluster.json
-rwxrwxr-x. 1 ngl ngl   752 Jan 30 10:53 dosa_build.sh
-rw-rw-r--. 1 ngl ngl  9343 Jan 30 10:53 dosa_deploy.py
-rw-rw-r--. 1 ngl ngl  2383 Jan 30 10:53 dosa_report.py
-rw-rw-r--. 1 ngl ngl 20565 Jan 30 10:53 generated_architecture.json
-rw-rw-r--. 1 ngl ngl   346 Jan 30 10:53 ml_inference.section
drwxrwxr-x. 3 ngl ngl   167 Jan 30 10:53 node_0
drwxrwxr-x. 6 ngl ngl   125 Jan 30 10:53 node_1
drwxrwxr-x. 3 ngl ngl    75 Jan 30 10:53 quantized_model
drwxrwxr-x. 2 ngl ngl     6 Jan 29 18:53 tmp_rpt_dir


Please pay attention to the `ml_inference.section` file, which we will need later to introduce this "application variant". 
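
As a sanity check on the draft printed by `emli.compile()`, the reported `total_flops` of 836 can be reproduced from the three dense layers listed in the hls4ml topology. The sketch assumes that each output element costs `n_in` multiplications and `n_in - 1` additions (no bias term). This counting rule is an assumption that happens to match the logged number, not a documented DOSA formula.

```python
def dense_flops(n_in, n_out):
    """FLOPs of one dense layer: n_in multiplies + (n_in - 1) adds per output."""
    return n_out * (2 * n_in - 1)


# (inputs, outputs) of fn_0001, fn_0002 and fn_0003 from the topology output
layers = [(17, 16), (16, 8), (8, 4)]
per_layer = [dense_flops(n_in, n_out) for n_in, n_out in layers]
total_flops = sum(per_layer)

# Compute demand implied by the constraints set above
batch_size = 1
target_throughput = 5000  # batches per second
required_flops_per_s = total_flops * batch_size * target_throughput

print(per_layer, total_flops, required_flops_per_s)
# [528, 248, 60] 836 4180000
```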

Basecamp Climbs part & App emit
------------------------------------
This part assumes that the software-only app exists and carries the necessary annotations, as shown in the example below.
For `app.py` this is:
```
...
# to indicate the initialization (yes, the @basecamp 'pragma' is a comment)
# @basecamp climbs init args={"action_name" : "sys.argv[4]", "host_address" : "sys.argv[5]"}
...

# then, to mark the code to accelerate
...
x = torch.from_numpy(din).float()
# @basecamp climbs accelerate begin args={"x": "x", "y": "y_pred"}
dnn.update_coeffs(coeffs, road_id)
y = dnn(x)
y_pred = y.detach().numpy()
# @basecamp climbs accelerate end
...

```

In addition, the `Dockerfile` contains the following comment (after installing Python, but before `ENTRYPOINT`):
```
# @basecamp climbs init
```

The arguments that must be listed in the `args=` JSON annotation are explained by the `ebc climbs describe --flow` command (or the Python API).
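
To make the annotation format concrete, the sketch below splits such a comment line into its action and its JSON arguments using only the standard library. It is not the parser used by basecamp; the regular expression and the function `parse_annotation` are assumptions derived from the three annotation lines shown above.

```python
import json
import re

# "# @basecamp climbs <action> [args={...json...}]"
ANNOTATION = re.compile(
    r"^\s*#\s*@basecamp\s+climbs\s+(?P<action>.+?)"
    r"(?:\s+args=(?P<args>\{.*\}))?\s*$"
)


def parse_annotation(line):
    """Return (action, args) for an annotation line, or None otherwise."""
    match = ANNOTATION.match(line)
    if match is None:
        return None
    raw_args = match.group("args")
    args = json.loads(raw_args) if raw_args else {}
    return match.group("action"), args


init = parse_annotation(
    '# @basecamp climbs init args={"action_name" : "sys.argv[4]", '
    '"host_address" : "sys.argv[5]"}'
)
begin = parse_annotation(
    '# @basecamp climbs accelerate begin args={"x": "x", "y": "y_pred"}'
)
end = parse_annotation("# @basecamp climbs accelerate end")
plain = parse_annotation("y = dnn(x)")
```

Note that the argument values are strings naming expressions or variables in the annotated program (for example `sys.argv[4]`), so they are kept as text here rather than evaluated.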

In [6]:
basecamp.climbs.describe('ml_inference')

The ML inference flow of the EVEREST SDK requires the following arguments:
1. Setup:
    - The `action_name` of the corresponding IBM cloudFPGA service.
    - The `host_address` (host ip address) to be used to connect to the FPGAs.
    - Both are then submitted via `dosa_net.init_from_action(action_name, host_address)`
2. Execution:
    - One numpy.ndarray `x` as input, where the first axis are the batches. 
    - The output is returned in another array `y`.
    - The inference is then called via `y = dosa_net.infer_batch(x)`


### Combine the Climb
Now, we add all the necessary files of the SW-only app (i.e., the "main" app) to EVEREST basecamp climbs (or `e_climbs`). Then, we add the accelerated version compiled by DOSA as a module/variant. Finally, we emit the combined app.

In [7]:
e_climbs = basecamp.climbs
# first, create a new Climb
e_climbs.create('accelerated_tpred_simple', '/home/ngl/gitrepos/EVEREST/traffic_prediction/build_dirs/tpred_integrated_v1/')
# please note the resulting `.climb` file

Climb /home/ngl/gitrepos/EVEREST/traffic_prediction/build_dirs/tpred_integrated_v1/accelerated_tpred_simple.climb created successfully.


In [8]:
! ls -l traffic_prediction/tpred_app

total 24
-rw-rw-r--. 1 ngl ngl 7010 Jan 29 20:06 app.py
-rw-rw-r--. 1 ngl ngl  790 Jan 29 18:14 client.py
drwxrwxr-x. 2 ngl ngl   64 Jan 29 19:56 data
-rw-rw-r--. 1 ngl ngl  285 Jan 29 19:27 Dockerfile
-rw-rw-r--. 1 ngl ngl  704 Jan 29 18:42 Readme.md
-rw-rw-r--. 1 ngl ngl  180 Jan 29 18:14 requirements.txt


In [9]:
my_climb_file = 'build_dirs/tpred_integrated_v1/accelerated_tpred_simple.climb'
app_dir = 'traffic_prediction/tpred_app/'

# now, we add all the files of the SW only app
e_climbs.add_file(app_dir + 'app.py', 'python', my_climb_file)
e_climbs.add_file(app_dir + 'Dockerfile', 'docker', my_climb_file)
# 'copy' means that files are copied without modification or analysis; directories are copied recursively
e_climbs.add_file(app_dir + 'client.py', 'copy', my_climb_file)
e_climbs.add_file(app_dir + 'data/', 'copy', my_climb_file)
e_climbs.add_file(app_dir + 'Readme.md', 'copy', my_climb_file)
e_climbs.add_file(app_dir + 'requirements.txt', 'copy', my_climb_file)

In [10]:
# now, we add the DOSA compiled files as 'variant'
e_climbs.add_module('traffic_prediction/build_dirs/etp_v1/ml_inference.section', my_climb_file)

### Emit the Climb
i.e. generate the code

In [11]:
e_climbs.emit(my_climb_file)
# optionally, an output directory can be specified.
# If not, the directory where the `.climb` file is will be used: 'traffic_prediction/build_dirs/tpred_integrated_v1/'



In [12]:
! ls -l traffic_prediction/build_dirs/tpred_integrated_v1/

total 60
-rw-rw-r--. 1 ngl ngl   745 Jan 30 10:53 accelerated_tpred_simple.climb
-rw-rw-r--. 1 ngl ngl 11502 Jan 30 10:53 app.py
-rw-rw-r--. 1 ngl ngl  1234 Jan 30 10:53 basecamp_build_and_run_instructions.md
-rw-rw-r--. 1 ngl ngl   790 Jan 30 10:53 client.py
-rw-rw-r--. 1 ngl ngl   394 Jan 30 10:53 cluster.json
drwxrwxr-x. 2 ngl ngl    64 Jan 30 10:33 data
-rw-rw-r--. 1 ngl ngl   485 Jan 30 10:53 Dockerfile
-rw-rw-r--. 1 ngl ngl  9343 Jan 30 10:53 dosa_deploy.py
-rw-rw-r--. 1 ngl ngl  1723 Jan 30 10:53 margot.json
drwxrwxr-x. 4 ngl ngl   181 Jan 30 10:37 node_0
-rw-rw-r--. 1 ngl ngl   420 Jan 30 10:53 ops.json
-rw-rw-r--. 1 ngl ngl   704 Jan 30 10:53 Readme.md
-rw-rw-r--. 1 ngl ngl   180 Jan 30 10:53 requirements.txt


Now the app is emitted, and the code can switch dynamically between the SW-only version and the FPGA version, depending on the availability of the FPGAs.

For example, the `app.py` has the following code inserted:
```python
# at the place where the # @basecamp climbs init was:
# generated by EVEREST basecamp
from node_0 import dosa_root
dosa_net = dosa_root.DosaRoot(8)
dosa_net.init_from_action(sys.argv[4], sys.argv[5])
dosa_net.reset()

# at the place where the # @basecamp climbs accelerate was:
# generated by EVEREST basecamp
import time

@tune(tuner,
      knobs=[{"name": "version", "type": "int"}],
      metrics=[{"name": "time", "function": extract_time}],
      features=[{"name": "hw", "function": margot_select_variant}])
def everest_accelerate(version = 0):
    start = time.time()
    try:
        if version == 0:
            dnn.update_coeffs(coeffs, road_id)
            y = dnn(x)
            y_pred = y.detach().numpy()
        if version == 1:
            y_pred = dosa_net.infer_batch(x)
    except:  # fallback to cpu version
        dnn.update_coeffs(coeffs, road_id)
        y = dnn(x)
        y_pred = y.detach().numpy()
    end = time.time()
    return end-start

everest_accelerate()

```
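
The dispatch-with-fallback pattern in the inserted code can be studied in isolation, without the runtime tuner or an FPGA. The following sketch is a simplified stand-in: `run_variant`, `run_cpu`, and `run_fpga` are hypothetical names, and the FPGA variant simply raises an error to simulate an unreachable device.

```python
import time


def run_variant(variant, x, variants, fallback):
    """Run the selected variant; on any error, rerun with the fallback."""
    start = time.perf_counter()
    try:
        result = variants[variant](x)
        used = variant
    except Exception:
        result = fallback(x)
        used = "fallback"
    return result, used, time.perf_counter() - start


def run_cpu(x):
    # Stand-in for the software-only inference.
    return [2 * v for v in x]


def run_fpga(x):
    # Stand-in for the accelerated inference with no device available.
    raise ConnectionError("no FPGA reachable")


variants = {0: run_cpu, 1: run_fpga}
y_cpu, used_cpu, t_cpu = run_variant(0, [1, 2], variants, run_cpu)
y_fpga, used_fpga, t_fpga = run_variant(1, [1, 2], variants, run_cpu)
```

Returning the elapsed time together with the result mirrors the generated function, which reports its run time as the metric for the tuner.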


Further build and run instructions can be found in `tpred_integrated_v1/basecamp_build_and_run_instructions.md`:

In [14]:
! cat traffic_prediction/build_dirs/tpred_integrated_v1/basecamp_build_and_run_instructions.md

Build and run instructions for the EVEREST accelerated app: accelerated_tpred_simple


Build:
-------

For the runtime tuner:
```bash
sudo snap install mosquitto  # or other ways: https://mosquitto.org/download/
docker pull margotpolimi/brian:1.0
docker pull margotpolimi/stub_agora:1.0
```

Apparently, the app has it's own Dockerfile, so maybe:
```bash
docker build -f Dockerfile -t ebc_accelerated_accelerated_tpred_simple:latest .
```

Furthermore, the following modules have specific installation instructions:
- **`ml_inference`**:
Some python modules are required (will be executed by the Dockerfile, if applicable):
```
virtualenv venv -p /usr/bin/python3.8
. venv/bin/activate
pip install -r node_0/requirements.txt
```


Run:
-------

For the runtime tuner:
```bash
docker run -d --rm --network host --name brian margotpolimi/brian:1.0
docker run -d --rm --network host --name stub_agora margotpolimi/stub_agora:1.0
mosquitto_pub -t agora/traffic^0.1^block1/knowledge -f ops.json
```

To st