# System-Level Design (and Modeling for Embedded Systems)

## **Lecture 7 – Computation Modeling & Refinement**

Kim Grüttner kim.gruettner@dlr.de
Henning Schlender henning.schlender@dlr.de
Jörg Walter joerg.walter@offis.de

System Evolution and Operation German Aerospace Center (DLR)

&

Distributed Computation and Communication OFFIS



© 2009 Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu

## Lecture 7: Outline



## Processor layers

- Application
- Task/OS
- Firmware
- Hardware

## Processor synthesis

- Software synthesis
- Hardware synthesis

## System-On-Chip Environment (SCE)





## Multi-Processor System-On-Chip (MPSoC)



- Growing system complexities and sizes
  - Heterogeneous multi-processor systems (MPSoC)
- Increasing significance of embedded software
  - Growing software content
- System design at higher levels of abstraction
  - Validation and analysis
  - Concurrent hardware and software development
  - Implementation synthesis
- Design of embedded software and processors
  - Large influence on system performance, power, etc.
  - Actual SW on ISS is accurate but slow
  - ➤ High-level models for early and accurate feedback
  - Software synthesis

## **General Processor Micro-Architecture**



- Basic system component is a processor (PE)
  - Programmable, general-purpose software processor (CPU)
  - Programmable special-purpose processor (e.g. DSPs)
  - Application-specific instruction set processor (ASIP)
  - Custom hardware processor



## Functionality and timing

# **Processor Models (1)**



#### Structural RTL models





Software processor

Hardware processor

> Sub-cycle accurate

# **Processor Models (2)**



Behavioral RTL/IS models



> Cycle accurate

# High-Level Computation Modeling





## Application modeling

- Native process execution (C code)
- Back-annotated execution timing

## Processor modeling

- Operating system
  - Real-time multi-tasking (RTOS model)
  - Bus drivers (C code)
- Hardware abstraction layer (HAL)
  - Interrupt handlers
  - Media accesses
- Processor hardware
  - Bus interfaces (I/O state machines)
  - Interrupt suspension and timing

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Abstract, Multifaceted Modeling of Embedded Processors for System Level Design," ASPDAC07

## **Processor Model: Application Layer**



- High-level, abstract programming model
  - Hierarchical process graph
    - ANSI C leaf processes
    - Parallel-serial composition
  - Abstract, typed inter-process communication
    - Channels
    - Shared variables



- > Timed simulation of application functionality (SLDL)
  - Back-annotate timing
    - Estimation or measurement (trace, ISS)
    - Function or basic block level granularity
  - Execute natively on simulation host
    - Discrete event simulator
    - Fast, native compiled simulation
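
A minimal Python sketch of timed, native simulation, not the SLDL simulator itself: leaf processes run as ordinary generators, and each `yield` stands in for a back-annotated `waitfor` delay that advances logical time in a small discrete-event kernel. The `Simulator` class, the process names, and the delay values (300 and 200 time units) are assumptions made up for illustration.

```python
import heapq


class Simulator:
    """Tiny discrete-event kernel: processes are generators that yield delays."""

    def __init__(self):
        self.now = 0          # current logical time
        self.trace = []       # (time, process, frame) records
        self._queue = []      # heap of (wake_time, sequence_number, generator)
        self._seq = 0         # tie-breaker so generators are never compared

    def _push(self, time, gen):
        heapq.heappush(self._queue, (time, self._seq, gen))
        self._seq += 1

    def spawn(self, process):
        self._push(0, process(self))

    def run(self):
        while self._queue:
            self.now, _, gen = heapq.heappop(self._queue)
            try:
                delay = next(gen)      # run natively until the next delay
            except StopIteration:
                continue               # process finished
            self._push(self.now + delay, gen)
        return self.trace


def encoder(sim):
    for frame in range(2):
        sim.trace.append((sim.now, "enc", frame))  # functionality, zero time
        yield 300                                  # back-annotated delay (assumed)


def decoder(sim):
    for frame in range(2):
        sim.trace.append((sim.now, "dec", frame))
        yield 200                                  # back-annotated delay (assumed)


sim = Simulator()
sim.spawn(encoder)
sim.spawn(decoder)
trace = sim.run()
print(trace)
```

Both processes advance concurrently in logical time here, which is exactly the behavior the task layer later serializes with an OS model.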





Logical time

# **Processor Model: Task Layer**



- Scheduling
  - Group processes into tasks
    - Static scheduling
  - Schedule tasks
    - Dynamic scheduling, multitasking
    - Preemption, interrupt handling
    - Task communication (IPC)



- OS model on top of standard SLDL
  - Wrap around SLDL primitives, replace event handling
    - Block all but active task
    - Select and dispatch tasks
  - Target-independent, canonical API
    - Task management
    - Channel communication
    - Timing and all events



# **OS Modeling**



High-level RTOS abstraction







Specification *TLM* 

**Implementation** 

- Specification is fast but inaccurate
  - Native execution, concurrency model
- Traditional ISS-based validation infeasible
  - Accurate but slow (esp. in multi-processor context), requires full binary
- Model of operating system
  - High accuracy but small overhead at early stages
  - > Focus on key effects, abstract unnecessary implementation details
  - > Model all concepts: Multi-tasking, scheduling, preemption, interrupts, IPC

Source: A. Gerstlauer, H. Yu, D. Gajski. "RTOS Modeling for System-Level Design," DATE03.

# Simulated Dynamic Behavior





## **RTOS Model Implementation**



#### RTOS model

- OS, task, event management
  - Descriptors & queues
- Scheduling
  - Select and dispatch task based on algorithm
  - Block all but active task on SLDL level
- Preemption
  - Allow rescheduling at simulation time increases
- Event handling
  - Remove task temporarily from OS while waiting for SLDL event

#### > RTOS model library

- RTOS models for different scheduling strategies
  - Round robin, priority based
- Parametrizable
  - Task parameters (priorities)

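```
channel OS implements OSAPI {
  Task     current = 0;   // currently running task (0 = none)
  os_queue rdyq;          // queue of ready tasks

  // Select the next task and release it on the SLDL level
  void dispatch(void) {
    current = schedule();
    notify(current.event);
  }
  // Hand over the processor, then block until dispatched again
  void yield() {
    Task task = current;
    dispatch();
    wait(task.event);
  }
  // Model an execution delay; reschedule whenever time advances
  void time_wait(time t) {
    waitfor(t);
    yield();
  }
  // Before an SLDL wait: take the caller out of the OS
  Task pre_wait(void) {
    Task t = rdyq.get(current);
    dispatch(); return t;
  }
  // After an SLDL wait: re-enter the ready queue, block until dispatched
  void post_wait(Task t) {
    rdyq.put(t);
    wait(t.event);
  }
};
```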
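
The scheduling and preemption rules can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the SpecC channel: tasks are generators whose `yield` corresponds to `time_wait()`, lower `priority` numbers are more urgent, and the function `run_rtos`, the task names, and all delay values are hypothetical.

```python
def body(delays):
    """Task body: each yielded value models one time_wait(delay) call."""
    for d in delays:
        yield d


def run_rtos(tasks):
    """Priority-based scheduling of tasks on a single processor.

    Rescheduling happens only when simulated time advances, i.e. after
    each time_wait() of the running task.
    """
    now = 0
    pending = sorted(tasks, key=lambda t: t["release"])  # not yet released
    ready = []                                           # ready queue
    schedule = []                                        # (start time, task name)
    while pending or ready:
        while pending and pending[0]["release"] <= now:
            ready.append(pending.pop(0))                 # task becomes ready
        if not ready:
            now = pending[0]["release"]                  # processor idles
            continue
        task = min(ready, key=lambda t: t["priority"])   # schedule()
        try:
            delay = next(task["body"])                   # run up to time_wait()
        except StopIteration:
            ready.remove(task)                           # task terminated
            continue
        schedule.append((now, task["name"]))
        now += delay                                     # only the active task consumes time
    return schedule, now


tasks = [
    {"name": "low", "priority": 2, "release": 0, "body": body([100, 100, 100])},
    {"name": "high", "priority": 1, "release": 150, "body": body([50])},
]
schedule, end_time = run_rtos(tasks)
print(schedule, end_time)
```

In this run the high-priority task is released at time 150 but dispatched only at time 200, when the running task's current delay step ends. The granularity of the delay annotations therefore bounds the preemption error of the model.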

## **RTOS Model Interface**



Canonical, target-independent API

```
interface OSAPI
{
  /* OS management */
  void init();
  void start(int sched_alg);
  void interrupt_return();

  /* Task management */
  Task task_create(char *name, int type,
                   sim_time period);
  void task_terminate();
  void task_sleep();
  void task_activate(Task t);
  void task_endcycle();
  void task_kill(Task t);
  Task par_start();
  void par_end(Task t);

  /* Event handling */
  Task pre_wait();
  void post_wait(Task t);

  /* Delay modeling */
  void time_wait(sim_time nsec);
};
```

## **Task Refinement**





## **Processor Model: Task Layer**



## Scheduling

- Group processes into tasks
  - Static scheduling
- Schedule tasks
  - Dynamic scheduling, multitasking
  - Preemption, interrupt handling
  - Task communication (IPC)



## Scheduling refinement

- Flatten hierarchy
- Reorder behaviors

#### OS refinement

- Insert OS model
- Task refinement
- IPC refinement



## **Processor Model: Firmware Layer**



## **Hardware Abstraction Layer (HAL)**

- Interrupt handling
- External communication
  - Software Drivers
    - Presentation, Session, Packeting
    - Synchronization (e.g. Interrupts)
  - TLM Bus model
    - User transactions
  - However, interrupts are still unscheduled



## **Processor Model: Hardware Layer**



Unscheduled (HAL):

Scheduled (HW):



### **Hardware Layer**

- Hardware interrupt handling
  - Interrupt scheduling
    - Suspend user code
    - Priority, nesting
- Media Access Control (MAC) for bus interface
  - Split user transaction into bus transactions
- Arbitrated TLM bus model
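
The MAC step of splitting one user transaction into several bus transactions can be illustrated with a short Python sketch. The function name `split_transaction`, the 4-byte bus width, and the word-alignment rule are assumptions for illustration; a real bus protocol defines its own transfer sizes and alignment.

```python
def split_transaction(address, data, bus_width=4):
    """Split one user transaction into bus transactions.

    Each bus transaction carries at most bus_width bytes and never
    crosses a bus-word boundary (assumed alignment rule).
    """
    transfers = []
    offset = 0
    while offset < len(data):
        addr = address + offset
        room = bus_width - (addr % bus_width)   # bytes left in this bus word
        chunk = data[offset:offset + room]
        transfers.append((addr, chunk))
        offset += len(chunk)
    return transfers


parts = split_transaction(0x102, bytes(range(7)))
for addr, chunk in parts:
    print(hex(addr), chunk.hex())
```

A 7-byte message starting at the unaligned address `0x102` becomes three bus transactions of 2, 4, and 1 bytes. In an arbitrated TLM bus model, each of them would separately contend for the bus.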



### **Processor Model: Bus-Functional Layer**



- Processor bus-functional model (BFM)
  - Pin-accurate model of processor
    - Cycle-approximate for SW execution
  - Bus model
    - Pin-accurate
    - Cycle-accurate





# **Processor Model - Summary**



- Layered model
  - Feature levels
- Processor layers
  - Application
    - Native C
  - Task
    - OS model
  - Hardware abstraction
    - Middleware
  - Processor hardware
    - Bus I/F
    - Interrupts, suspension



| Features                                 | Introduced at level |
|------------------------------------------|---------------------|
| Target approx. computation timing        | Appl.               |
| Task mapping, dynamic scheduling         | Task                |
| Task communication, synchronization      | Task                |
| Interrupt handlers, low level SW drivers | FW                  |
| HW interrupt handling, int. scheduling   | TLM                 |
| Cycle accurate communication             | BFM                 |
| Cycle accurate computation               | ISS                 |

## Lecture 7: Outline



## ✓ Processor layers

- ✓ Application
- ✓ Task/OS
- ✓ Firmware
- ✓ Hardware

## Processor synthesis

- Software synthesis
- Hardware synthesis

# **Software Synthesis**







- Automatically generate target binaries from TLM
  - Generate code for application (tasks and IPC)
  - Synthesize firmware (drivers, interrupt handlers)
  - OS wrappers and HAL implementations from DB
  - > Compile and link against target RTOS and libraries

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Automatic Generation of Hardware dependent Software for MPSoCs from Abstract System Specifications," ASPDAC08

## **Processor Implementation Models**





#### Software C model

- Generated application C code
  - Flat standard ANSI C code
- Firmware and hardware models
  - RTOS model, HAL model
  - Low-level & hardware interrupt handling
  - External bus communication protocol/TLM



#### Software ISS model

- Reintegrated processor ISS
  - Bus-functional ISS wrapper
- Running generated binary
  - Application, RTOS, drivers, HAL

# **Single-Processor Experiments**



#### Voice encoding and decoding

- Motorola DSP 56600
  - Encoding & decoding tasks
  - custom OS
- 4 custom I/O blocks
- 1 custom HW co-processor
  - Codebook search

#### Processor models

- Perfect timing
  - Back-annotated from ISS
- Priority-based OS model
  - EDF: Decoder > Encoder
- HW interrupt scheduling
  - 4 non-preempted priority levels

#### Reference

Motorola proprietary ISS



# **Processor Modeling Results**



- Execute on Sun Fire V240 (1.5 GHz)
  - 163 speech frames
- Speed vs. accuracy
  - ➤ OS model (Appl ⇒ Task)
  - ➤ Interrupts (FW ⇒ TLM)

> 1800x speed w/ 3% error (vs. cycle-accurate ISS)



## Lecture 7: Outline



## ✓ Processor layers

- ✓ Application
- ✓ Task/OS
- ✓ Firmware
- ✓ Hardware

## Processor synthesis

- ✓ Software synthesis
- Hardware synthesis

# **High-Level Synthesis (1)**



#### Allocation



Datapath

Source: D. Shin, A. Gerstlauer, R. Dömer, D. Gajski, "An Interactive Design Environment for C-based High-level Synthesis of RTL Processors," TVLSI, April 2008.

# **High-Level Synthesis (2)**



## Scheduling





**FSMD** 

# **High-Level Synthesis (3)**



### Binding



Source: Accellera, "RTL Semantics," http://www.eda.org/alc-cwg/cwg-open.pdf
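
The three synthesis steps can be made concrete with a small Python sketch of resource-constrained list scheduling. It is a simplified illustration, not the algorithm used by any particular tool: the allocation is given as a count per unit type, operations are scheduled into control steps in insertion order, and each operation is bound to a unit instance such as `ADD0`. The function `list_schedule` and the operations `g` and `h` are hypothetical; `a` and `d` mirror the register transfers `a = b + c` and `d = Inport * e`.

```python
def list_schedule(ops, allocation):
    """Schedule and bind operations under a resource allocation.

    ops:        dict name -> (unit_type, [predecessor names]); must be acyclic
    allocation: dict unit_type -> number of allocated units (at least 1 each)
    Returns dict name -> (control_step, bound_unit).
    """
    result = {}
    step = 0
    while len(result) < len(ops):
        free = dict(allocation)              # units still free in this step
        scheduled_now = {}
        for name, (unit, preds) in ops.items():
            if name in result:
                continue
            ready = all(p in result for p in preds)   # results from earlier steps
            if ready and free[unit] > 0:
                index = allocation[unit] - free[unit]  # bind to next free instance
                free[unit] -= 1
                scheduled_now[name] = (step, f"{unit.upper()}{index}")
        result.update(scheduled_now)
        step += 1
    return result


ops = {
    "a": ("add", []),            # a = b + c
    "d": ("mul", []),            # d = Inport * e
    "g": ("add", ["a", "d"]),    # hypothetical: g = a + d
    "h": ("add", []),            # hypothetical: h = b + e
}
one_adder = list_schedule(ops, {"add": 1, "mul": 1})
two_adders = list_schedule(ops, {"add": 2, "mul": 1})
print(one_adder)
print(two_adders)
```

With one adder the four operations need three control steps. Allocating a second adder shortens the schedule to two steps, which is the area versus latency trade-off that interactive allocation exposes to the designer.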

# **SCE Interactive RTL Synthesis**







## RTL Modeling Example

```
behavior FSMD_Example(
    signal in  bool      CLK,      // system clock
    signal in  bool      RST,      // system reset
    signal in  bit[31:0] Inport,   // input ports
    signal in  bit[1]    Start,
    signal out bit[31:0] Outport,  // output ports
    signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                      // clock + sensitivity
    {
      bit[32] a, b, c, d, e;       // local variables

      { Outport = 0;               // default
        Done = 0b;                 // assignments
      }
      if (RST) { goto S0; }        // reset actions

      S0: { if (Start) goto S1;
            else       goto S0;
          }
      S1: { a = b + c;             // state actions
            d = Inport * e;        // (register transfers)
            Outport = a;
            goto S2;
          }
      ...
    }
  }
};
```
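
To see how such an FSMD behaves cycle by cycle, the following Python sketch steps through the states of the example. It rests on several assumptions: the unmapped local variables follow sequential C semantics within a state (so `Outport` sees the new value of `a`), arithmetic wraps at 32 bits, and state `S2`, which the listing elides, is replaced by a hypothetical state that asserts `Done` and returns to `S0`. The function names and the stimulus values are made up for illustration.

```python
MASK = 0xFFFFFFFF  # 32-bit datapath


def fsmd_cycle(state, regs, inputs):
    """Evaluate one clock cycle; returns (next_state, outputs)."""
    outputs = {"Outport": 0, "Done": 0}          # default assignments
    if inputs["RST"]:                            # reset actions
        return "S0", outputs
    if state == "S0":
        return ("S1" if inputs["Start"] else "S0"), outputs
    if state == "S1":                            # register transfers
        regs["a"] = (regs["b"] + regs["c"]) & MASK
        regs["d"] = (inputs["Inport"] * regs["e"]) & MASK
        outputs["Outport"] = regs["a"]
        return "S2", outputs
    # Hypothetical S2 (elided in the listing): signal completion
    outputs["Done"] = 1
    return "S0", outputs


def run(regs, stimulus):
    """Apply one input set per clock; record (next_state, Outport, Done)."""
    state = "S0"
    history = []
    for inputs in stimulus:
        state, outputs = fsmd_cycle(state, regs, inputs)
        history.append((state, outputs["Outport"], outputs["Done"]))
    return history


regs = {"a": 0, "b": 3, "c": 4, "d": 0, "e": 5}
stimulus = [
    {"RST": 1, "Start": 0, "Inport": 0},
    {"RST": 0, "Start": 1, "Inport": 0},
    {"RST": 0, "Start": 0, "Inport": 6},
    {"RST": 0, "Start": 0, "Inport": 0},
]
history = run(regs, stimulus)
print(history)
```

The default assignments drive `Outport` and `Done` to zero in every cycle unless the active state overrides them, which is why `Outport` carries a value only during `S1`.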



# Mapped RTL Example

```
behavior FSMD_Example(
    signal in  bool      CLK,      // system clock
    signal in  bool      RST,      // system reset
    signal in  bit[31:0] Inport,   // input ports
    signal in  bit[1]    Start,
    signal out bit[31:0] Outport,  // output ports
    signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                      // clock + sensitivity
    {
      bit[32] a, b, c, d, e;       // unmapped variables

      { Outport = 0;               // default
        Done = 0b;                 // assignments
      }
      if (RST) { goto S0; }        // reset actions

      S0: { if (Start) goto S1;
            else       goto S0;
          }
      S1: { a = b + c;             // Accellera style 1
            d = Inport * e;        // (unmapped)
            Outport = a;
            goto S2;
          }
      ...
    }
  }
};
```



# Mapped RTL Example

```
behavior FSMD_Example(
    signal in  bool      CLK,      // system clock
    signal in  bool      RST,      // system reset
    signal in  bit[31:0] Inport,   // input ports
    signal in  bit[1]    Start,
    signal out bit[31:0] Outport,  // output ports
    signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                      // clock + sensitivity
    {
      buffered[CLK] bit[32] RF[5]; // register file

      { Outport = 0;               // default
        Done = 0b;                 // assignments
      }
      if (RST) { goto S0; }        // reset actions

      S0: { if (Start) goto S1;
            else       goto S0;
          }
      S1: { RF[0] = RF[1] + RF[2];  // Accellera style 2
            RF[3] = Inport * RF[4]; // (storage mapped)
            Outport = RF[0];
            goto S2;
          }
      ...
    }
  }
};
```



# Mapped RTL Example

```
behavior FSMD_Example(
    signal in  bool      CLK,      // system clock
    signal in  bool      RST,      // system reset
    signal in  bit[31:0] Inport,   // input ports
    signal in  bit[1]    Start,
    signal out bit[31:0] Outport,  // output ports
    signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                      // clock + sensitivity
    {
      buffered[CLK] bit[32] RF[5]; // register file

      { Outport = 0;               // default
        Done = 0b;                 // assignments
      }
      if (RST) { goto S0; }        // reset actions

      S0: { if (Start) goto S1;
            else       goto S0;
          }
      S1: { RF[0] =                 // Accellera style 3
              ADD0(RF[1], RF[2]);   // (function mapped)
            RF[3] =
              MUL0(Inport, RF[4]);
            Outport = RF[0];
            goto S2;
          }
      ...
    }
  }
};
```



# Mapped RTL Example

```
behavior FSMD_Example(
    signal in  bool      CLK,      // system clock
    signal in  bool      RST,      // system reset
    signal in  bit[31:0] Inport,   // input ports
    signal in  bit[1]    Start,
    signal out bit[31:0] Outport,  // output ports
    signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                      // clock + sensitivity
    {
      buffered[CLK] bit[32] RF[5]; // register file
      bit[32] BUS0, BUS1, BUS2;    // busses

      { Outport = 0;               // default
        Done = 0b;                 // assignments
      }
      if (RST) { goto S0; }        // reset actions

      S0: { if (Start) goto S1;
            else       goto S0;
          }
      S1: { BUS0 = RF[1];             // Accellera style 4
            BUS1 = RF[2];             // (connection mapped)
            BUS2 = ADD0(BUS0, BUS1);  // result bus (declared above)
            RF[0] = BUS2;
            goto S2;
          }
      ...
    }
  }
};
```



# Mapped RTL Example

```
behavior FSMD_Example(
    signal in  bool      CLK,      // system clock
    signal in  bool      RST,      // system reset
    signal in  bit[31:0] Inport,   // input ports
    signal in  bit[1]    Start,
    signal out bit[31:0] Outport,  // output ports
    signal out bit[1]    Done)
{
  void main(void)
  {
    fsmd(CLK)                      // clock + sensitivity
    {
      signal bit[5:0] RF_CTRL;     // control wires
      signal bit[1:0] ADD0_CTRL, MUL0_CTRL;

      { Outport = 0;               // default
        Done = 0b;                 // assignments
      }
      if (RST) { goto S0; }        // reset actions

      S0: { if (Start) goto S1;
            else       goto S0;
          }
      S1: { RF_CTRL   = 011000b;   // Accellera style 5
            ADD0_CTRL = 01b;       // (exposed control)
            MUL0_CTRL = 11b;
            goto S2;
          }
      ...
    }
  }
};
```

# **Lecture 7: Summary**



### OS and Processor Modeling

- Model of software running in execution environment
  - Timed application, OS, bus drivers, interrupt handlers
  - Processor hardware model, suspension, bus interfaces
- Virtual platform prototype
  - > Embedded software development and validation
  - Viable complement to ISS-based validation

## Backend processor synthesis

- Software synthesis
  - Code generation, RTOS targeting, cross-compilation & linking
  - Fully automatic final target binary generation
- Hardware synthesis
  - High-level/behavioral synthesis: allocation, scheduling, binding
  - Interactive C-to-RTL synthesis flow

#### **Student Assistant Job Offer**



#### Henning.Schlender@dlr.de

#### Tasks

- Context: Autonomous Driving
- ROS 2 Development (C++)
- Implementing ROS 2 Components
- Co-Simulation with Carla Simulator
- Implementing Automated Driving Scenarios

#### Requirements

- Bachelor's Degree (desirable)
- Experience in
  - C++
  - Gitlab
  - Linux









