# Initial setup

Install Bambu and required packages:

In [None]:
!add-apt-repository -y ppa:git-core/ppa
!apt-get update
!apt-get install -y --no-install-recommends build-essential ca-certificates gcc-multilib git iverilog verilator
!wget https://release.bambuhls.eu/appimage/bambu-latest.AppImage
!chmod +x bambu-*.AppImage
!ln -sf $PWD/bambu-*.AppImage /bin/bambu
!ln -sf $PWD/bambu-*.AppImage /bin/spider
!ln -sf $PWD/bambu-*.AppImage /bin/tree-panda-gcc
!ln -sf $PWD/bambu-*.AppImage /bin/clang-12
!ln -sf $PWD/bambu-*.AppImage /bin/mlir-opt-12
!ln -sf $PWD/bambu-*.AppImage /bin/mlir-translate-12
!rm -rf PandA-bambu bambu-tutorial
!git clone --depth 1 --filter=blob:none --branch everest-school --sparse https://github.com/ferrandi/PandA-bambu.git
%cd PandA-bambu
!git sparse-checkout set documentation/everest_summer_school
%cd ..
!mv PandA-bambu/documentation/everest_summer_school/ bambu-tutorial

Check the installation:

In [None]:
!bambu -h

In [None]:
!mlir-opt-12 --help
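
The setup cell symlinks six tool names to the AppImage. As an optional extra check, a small Python sketch can confirm that each of them resolves on `PATH`. The `TOOLS` list simply mirrors the `ln -sf` lines above; the helper name `missing_tools` is made up for this example.

```python
import shutil

# Names symlinked to the AppImage by the setup cell.
TOOLS = ["bambu", "spider", "tree-panda-gcc",
         "clang-12", "mlir-opt-12", "mlir-translate-12"]

def missing_tools(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

missing = missing_tools(TOOLS)
print("all tools found" if not missing else f"missing: {missing}")
```

If anything is reported missing, re-running the setup cell is the first thing to try.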

# Bambu inputs and outputs

Source code: /content/bambu-tutorial/Exercise1/icrc.c

In [None]:
%cd /content/bambu-tutorial/Exercise1
!bambu icrc.c --top-fname=icrc1 --simulator=VERILATOR --simulate --generate-tb=test_icrc1.xml -v2 --print-dot --pretty-print=a.c 2>&1 | tee icrc1.log

Inspect the generated files in the explorer tab on the left:

*   /content/bambu-tutorial/Exercise1/icrc1.v
*   /content/bambu-tutorial/Exercise1/simulate_icrc1.sh
*   /content/bambu-tutorial/Exercise1/synthesize_Synthesis_icrc1.sh
*   /content/bambu-tutorial/Exercise1/a.c



Visualize the FSM:

In [None]:
from graphviz import Source
Source.from_file('HLS_output/dot/icrc1/HLS_STGraph.dot')
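
The `--print-dot` option writes more than the FSM graph. A minimal sketch for listing every `.dot` file that ended up under `HLS_output/dot`, so that any of them can be passed to `Source.from_file`. The helper name `list_dot_files` is hypothetical, and the directory layout is assumed from the path used in the cell above.

```python
from pathlib import Path

def list_dot_files(root="HLS_output/dot"):
    """Return all .dot files under `root` as sorted paths relative to it."""
    base = Path(root)
    if not base.is_dir():
        # Nothing generated yet (or a different working directory).
        return []
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*.dot"))

for name in list_dot_files():
    print(name)
```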

Try out:

* a different target board
* a different clock period
* VHDL instead of Verilog output
* a different verbosity level
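
One way to explore these variations is to build the command line programmatically and paste the result into a `!` cell. The sketch below is only an illustration: `--device-name`, `--clock-period`, and `-v` appear elsewhere in this notebook, while `-wH` for VHDL output is an assumption that should be checked against `bambu -h`. Verilator is a Verilog simulator, so the example leaves simulation off when VHDL is requested. The helper name `bambu_command` is hypothetical.

```python
import shlex

def bambu_command(source, top, clock_period=None, device=None,
                  vhdl=False, verbosity=2, simulate=True):
    """Build a bambu argument list for one variation of the icrc1 run."""
    cmd = ["bambu", source, f"--top-fname={top}", f"-v{verbosity}"]
    if clock_period is not None:
        cmd.append(f"--clock-period={clock_period}")
    if device is not None:
        cmd.append(f"--device-name={device}")
    if vhdl:
        cmd.append("-wH")  # assumption: writer flag for VHDL, verify with `bambu -h`
    if simulate:
        cmd += ["--simulator=VERILATOR", "--simulate"]
    return cmd

# Example: VHDL output, clock period 5, higher verbosity, no simulation.
print(shlex.join(bambu_command("icrc.c", "icrc1", clock_period=5,
                               vhdl=True, verbosity=4, simulate=False)))
```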

# HLS of an MLIR kernel

Source code: /content/bambu-tutorial/Exercise2/gemm_32.mlir and /content/bambu-tutorial/Exercise2/helmholtz.mlir

In [None]:
%cd /content/bambu-tutorial/Exercise2/
!mlir-opt-12 /content/bambu-tutorial/Exercise2/gemm_32.mlir -lower-affine -convert-scf-to-std -convert-std-to-llvm='use-bare-ptr-memref-call-conv=1' -o /content/bambu-tutorial/Exercise2/gemm_32.llvm.mlir
!mlir-translate-12 /content/bambu-tutorial/Exercise2/gemm_32.llvm.mlir --mlir-to-llvmir -o /content/bambu-tutorial/Exercise2/gemm_32.ll
!bambu gemm_32.ll --simulate --generate-tb=gemm_32_test.xml --no-clean --compiler=I386_CLANG13 --top-fname=gemm_32 --simulator=VERILATOR -v2 --print-dot |& tee log.txt

Try out:
* synthesize the Helmholtz kernel
* use the --generate-interface option
* apply mlir-opt optimizations before synthesis
* disable function proxies to allocate floating-point units in parallel
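
The cell above runs three stages: `mlir-opt-12` lowers the kernel to the LLVM dialect, `mlir-translate-12` emits LLVM IR, and `bambu` synthesizes the result. A sketch that parameterizes the same flow by kernel name may help when trying the Helmholtz kernel. All flags are copied from the cell above; the top function name and the testbench file passed in the example are assumptions, so adjust them to what `helmholtz.mlir` and the exercise folder actually contain.

```python
import shlex

def mlir_flow(kernel, top, testbench):
    """Return the three commands that take `<kernel>.mlir` through to bambu."""
    lowered = f"{kernel}.llvm.mlir"   # LLVM-dialect MLIR
    llvm_ir = f"{kernel}.ll"          # LLVM IR consumed by bambu
    return [
        ["mlir-opt-12", f"{kernel}.mlir", "-lower-affine", "-convert-scf-to-std",
         "-convert-std-to-llvm=use-bare-ptr-memref-call-conv=1", "-o", lowered],
        ["mlir-translate-12", lowered, "--mlir-to-llvmir", "-o", llvm_ir],
        ["bambu", llvm_ir, "--simulate", f"--generate-tb={testbench}", "--no-clean",
         "--compiler=I386_CLANG13", f"--top-fname={top}",
         "--simulator=VERILATOR", "-v2", "--print-dot"],
    ]

# Assumed names: top function `helmholtz`, testbench `helmholtz_test.xml`.
for cmd in mlir_flow("helmholtz", top="helmholtz", testbench="helmholtz_test.xml"):
    print(shlex.join(cmd))
```

Extra `mlir-opt` passes can be tried by inserting them into the first command before `-lower-affine`.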

# OpenMP parallel for

Source code: /content/bambu-tutorial/Exercise3/trinityq4/lubm_trinityq4.c

In [None]:
%cd /content/bambu-tutorial/Exercise3/
!bambu trinityq4/lubm_trinityq4.c --top-fname=search \
   common/atominIncrement.c common/data.c -Icommon/ \
   --compiler=I386_GCC49 --experimental-set=BAMBU -O3 --std=c99 -fno-delete-null-pointer-checks \
   --channels-type=MEM_ACC_11 --memory-allocation-policy=NO_BRAM \
   --device-name=xc7vx690t-3ffg1930-VVD --clock-period=10 \
   -DMAX_VERTEX_NUMBER=26455 -DMAX_EDGE_NUMBER=100573 -DN_THREADS=2  \
   --mem-delay-read=20 --mem-delay-write=20 \
   --generate-tb=test-1.xml --simulator=VERILATOR --simulate \
   --pragma-parse --num-accelerators=2 --memory-banks-number=4 --channels-number=2 --context_switch=4 \
   -v3 |& tee log.txt

# ap_types and ac_types support

Synthesize an accelerator with 11-bit data and FIFO ports for input and output.

In [None]:
%cd /content/bambu-tutorial/Exercise4
!bambu ap_example.cpp --simulate --no-clean --compiler=I386_CLANG13 --generate-interface=INFER --top-fname=gcd --simulator=VERILATOR

## Custom floating-point synthesis

In [None]:
%cp /content/bambu-tutorial/Exercise2/gemm_32.ll /content/bambu-tutorial/Exercise4
%cp /content/bambu-tutorial/Exercise2/gemm_32_test.xml /content/bambu-tutorial/Exercise4
!bambu gemm_32.ll --simulate --no-clean --compiler=I386_CLANG13 --top-fname=gemm_32 --simulator=VERILATOR --generate-tb=gemm_32_test.xml -v4 --disable-function-proxy --print-dot --fp-format="gemm_32*e8m7b-127tih0" --max-ulp=2000000 |& tee log.bfloat16.txt
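
To get a feel for the precision of the format requested above, the sketch below emulates it in plain Python. It reads `e8m7` as 8 exponent bits and 7 mantissa bits (the bfloat16 layout that the log file name suggests) and assumes truncation rounding; the meaning of the remaining fields in the `--fp-format` string is not modeled. How Bambu counts ULPs for `--max-ulp` is not stated here, so the distance below is expressed in float32 ULPs purely for illustration.

```python
import math
import struct

def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of `x` as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bfloat16_trunc(x):
    """Keep sign, 8 exponent bits and 7 mantissa bits; drop the low 16 bits."""
    bits = float32_bits(x) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float32_ulp_distance(a, b):
    """Distance in float32 ULPs between two finite, non-negative values."""
    return abs(float32_bits(a) - float32_bits(b))

approx = to_bfloat16_trunc(math.pi)
print(approx)                                 # 3.140625
print(float32_ulp_distance(math.pi, approx))  # 4059
```

A single truncation can already be thousands of float32 ULPs away from the original value, and the error accumulates over the multiply-add chain of the kernel, which is one plausible reason for a generous `--max-ulp` threshold in the simulation check.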