# Intel® Advisor - Roofline Analysis

This section demonstrates how to collect roofline data and generate a roofline report using Intel® Advisor.

##### Sections
- [What is the Roofline Model?](#What-is-the-Roofline-Model?)
- _Analysis:_ [Roofline Analysis Report](#Roofline-Analysis-Report)
- [Finding Effective Optimization Strategies](#Finding-Effective-Optimization-Strategies)
- [Command Line Options for GPU Roofline Analysis](#Command-Line-Options-for-GPU-Roofline-Analysis)
- [Using Roofline Analysis on Intel GPU](#Using-Roofline-Analysis-on-Intel-GPU)

## Learning Objectives
- Explain how Intel® Advisor performs GPU Roofline Analysis.
- Run the GPU Roofline Analysis using command line syntax.
- Use GPU Roofline Analysis to identify effective optimization strategies.


## What is the Roofline Model?

A Roofline chart is a visual representation of application performance in relation to hardware limitations, including memory bandwidth and computational peaks.
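In its simplest form, the roofline bounds a kernel's attainable performance by the lower of two ceilings. The first is the compute peak. The second is the kernel's arithmetic intensity (FLOP per byte moved) multiplied by the memory bandwidth. Below is a minimal sketch of that bound. The peak and bandwidth figures are hypothetical placeholders, not measurements of any Intel GPU.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Classic roofline bound: min(compute peak, AI * memory bandwidth).

    arithmetic_intensity is in FLOP/byte and bandwidth_gbs in GB/s, so
    AI * bandwidth is in GFLOP/s, the same unit as peak_gflops.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


# Hypothetical machine: 400 GFLOP/s compute peak, 40 GB/s memory bandwidth.
PEAK_GFLOPS = 400.0
MEM_GBS = 40.0

for ai in (0.5, 10.0, 20.0):
    bound = attainable_gflops(ai, PEAK_GFLOPS, MEM_GBS)
    print(f"AI={ai:5.1f} FLOP/byte -> at most {bound:6.1f} GFLOP/s")
```

With these numbers, an intensity of 0.5 FLOP/byte is capped at 20 GFLOP/s by bandwidth. Intensities of 10 and 20 FLOP/byte are both capped at the 400 GFLOP/s peak. The two roofs meet at the ridge point, 400 / 40 = 10 FLOP/byte.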

## Requirements for a Roofline Model on a GPU

  * The application must run at least partially on a GPU:
    * Intel Gen9 or Gen11 integrated graphics
    * Offload must be implemented with OpenMP, SYCL, DPC++, or OpenCL
  * A recent version of Intel® Advisor (Beta 4)
  * Running the Roofline analysis on a GPU produces a multi-level roofline:
    * A single loop generates several dots
    * Each dot can be compared against the roof of its own memory level (GTI, L3, DRAM, or SLM)
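Each memory level moves a different number of bytes for the same kernel. The same FLOP count therefore gives a different arithmetic intensity at each level, which is why one loop produces several dots. The sketch below shows only that arithmetic. The FLOP count, byte counts, and bandwidths are made-up values, and `roofline_dots` is a hypothetical helper, not part of Advisor.

```python
def roofline_dots(flops, bytes_per_level, bandwidth_gbs, peak_gflops):
    """Hypothetical helper: one (level, AI, roof) entry per memory level.

    flops:           total floating-point operations executed by the kernel
    bytes_per_level: bytes transferred at each level, e.g. {"L3": ..., "DRAM": ...}
    bandwidth_gbs:   bandwidth roof of each level in GB/s
    peak_gflops:     compute roof in GFLOP/s
    """
    dots = []
    for level, nbytes in bytes_per_level.items():
        ai = flops / nbytes                                  # FLOP per byte at this level
        roof = min(peak_gflops, ai * bandwidth_gbs[level])   # ceiling for this dot
        dots.append((level, ai, roof))
    return dots


# Hypothetical kernel: 2 GFLOP, with most traffic served by L3 and less reaching DRAM.
dots = roofline_dots(
    flops=2e9,
    bytes_per_level={"L3": 8e9, "DRAM": 1e9},
    bandwidth_gbs={"L3": 200.0, "DRAM": 40.0},
    peak_gflops=400.0,
)
for level, ai, roof in dots:
    print(f"{level:>4}: AI = {ai:.2f} FLOP/byte, roof = {roof:.1f} GFLOP/s")
```

In this made-up case, the L3 dot (0.25 FLOP/byte) sits well to the left of the DRAM dot (2 FLOP/byte). The kernel moves far more bytes through L3 than through DRAM, so its L3 intensity is lower.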


## Gen9 Memory Hierarchy

![image](assets/gen9.png)

## Advisor Command-Line for Collecting and Reporting "roofline"

```bash
%%writefile advisor_roofline.sh
#!/bin/bash
# Set up the oneAPI environment, including the Advisor command line
source /opt/intel/inteloneapi/setvars.sh > /dev/null 2>&1
# GPU Roofline is a technical preview and must be enabled explicitly
export ADVIXE_EXPERIMENTAL=gpu-profiling
# Collect Survey, then Trip Counts + FLOP, then generate the HTML report
advixe-cl --collect=survey --enable-gpu-profiling --project-dir=./advisor_roofline --search-dir src:r=. -- ./rtm_stencil
advixe-cl --collect=tripcounts --stacks --flop --enable-gpu-profiling --project-dir=./advisor_roofline --search-dir src:r=. -- ./rtm_stencil
advixe-cl --report=roofline --gpu --project-dir=./advisor_roofline --report-output=./advisor_roofline/roofline.html
```


## Display Advisor "roofline" Report

GPU Roofline Performance Insights:

  * Highlights poorly performing loops
  * Shows the performance ‘headroom’ for each loop
    * Which loops can be improved
    * Which loops are worth improving
  * Shows likely causes of bottlenecks
    * Memory bound vs. compute bound
  * Suggests next optimization steps


## Roofline Analysis Report

Let's view the roofline report. This is another <b>live</b> report, so you can interact with it.

[Intel Advisor Roofline report](assets/roofline.html)

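```python
import os
from IPython.display import IFrame
# Log which user is running this part of the module
os.system('/bin/echo $(whoami) is running DPCPP_Essentials Module5 -- Roofline_Analysis - 2 of 2 roofline.html')
# Embed the interactive HTML report in the notebook
IFrame(src='assets/roofline.html', width=1024, height=769)
```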

## Finding Effective Optimization Strategies

GPU Roofline Performance Insights:

  * Highlights poorly performing loops
  * Shows the performance ‘headroom’ for each loop
    * Which loops can be improved
    * Which loops are worth improving
  * Shows likely causes of bottlenecks
    * Memory bound vs. compute bound
  * Suggests next optimization steps

![alt text](assets/roofline1.png "Optimization Strategies.")
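The "memory bound vs. compute bound" and "headroom" insights depend on where a dot lies relative to the ridge point, which is the intensity at which the bandwidth roof meets the compute roof. Below is a minimal sketch for a single memory level. `classify` and `RooflineVerdict` are hypothetical names, and the roof values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class RooflineVerdict:
    bound: str          # "memory" or "compute"
    roof_gflops: float  # attainable performance at this arithmetic intensity
    headroom: float     # how many times faster the kernel could get before hitting the roof


def classify(ai, measured_gflops, peak_gflops, bandwidth_gbs):
    """Hypothetical helper: place a measured dot on a single-level roofline."""
    ridge_ai = peak_gflops / bandwidth_gbs  # FLOP/byte where the two roofs meet
    bound = "memory" if ai < ridge_ai else "compute"
    roof = min(peak_gflops, ai * bandwidth_gbs)
    return RooflineVerdict(bound, roof, roof / measured_gflops)


# Hypothetical dot: AI of 2 FLOP/byte measured at 20 GFLOP/s,
# on a 400 GFLOP/s / 40 GB/s roofline (ridge point at 10 FLOP/byte).
print(classify(ai=2.0, measured_gflops=20.0, peak_gflops=400.0, bandwidth_gbs=40.0))
# RooflineVerdict(bound='memory', roof_gflops=80.0, headroom=4.0)
```

This dot is memory bound, with 4x headroom below its 80 GFLOP/s roof. It can improve in two ways: it can climb toward the bandwidth roof, or it can move right by raising its arithmetic intensity.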


## Command Line Options for GPU Roofline Analysis

The Roofline model on GPU is a technical preview feature and is not available by default. To enable it: 
```bash
export ADVIXE_EXPERIMENTAL=gpu-profiling
```

To run the GPU Roofline analysis in the Intel® Advisor CLI, first run the Survey analysis with the <span style="color:blue">--enable-gpu-profiling</span> option:
```bash
advixe-cl --collect=survey --enable-gpu-profiling --project-dir=<my_project_directory> --search-dir src:r=<my_source_directory> -- ./myapp [app_parameters]
```
Then run the Trip Counts and FLOP analysis with the <span style="color:blue">--enable-gpu-profiling</span> option:
```bash
advixe-cl --collect=tripcounts --stacks --flop --enable-gpu-profiling --project-dir=<my_project_directory> --search-dir src:r=<my_source_directory> -- ./myapp [app_parameters]
```

Generate a GPU Roofline report:
```bash 
advixe-cl --report=roofline --gpu --project-dir=<my_project_directory> --report-output=roofline.html
```
Open the generated <span style="color:blue">roofline.html</span> in a web browser to visualize GPU performance.
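You can also drive these steps from Python, for example from a notebook cell, by wrapping the three commands with `subprocess`. This sketch assumes `advixe-cl` is already on `PATH`, for instance after sourcing `setvars.sh`. The function names are hypothetical, and the code uses only the flags shown above.

```python
import os
import subprocess


def roofline_commands(app, project_dir, src_dir, report="roofline.html", app_args=()):
    """Hypothetical helper: build the Survey, Trip Counts/FLOP, and report commands."""
    common = ["--project-dir=" + project_dir]
    search = ["--search-dir", "src:r=" + src_dir]
    target = ["--", app, *app_args]
    return [
        ["advixe-cl", "--collect=survey", "--enable-gpu-profiling", *common, *search, *target],
        ["advixe-cl", "--collect=tripcounts", "--stacks", "--flop", "--enable-gpu-profiling",
         *common, *search, *target],
        ["advixe-cl", "--report=roofline", "--gpu", *common, "--report-output=" + report],
    ]


def run_gpu_roofline(app, project_dir, src_dir, report="roofline.html", app_args=()):
    """Hypothetical helper: run each step in order; check=True stops at the first failure."""
    # GPU profiling is a technical preview and must be enabled explicitly.
    env = dict(os.environ, ADVIXE_EXPERIMENTAL="gpu-profiling")
    for cmd in roofline_commands(app, project_dir, src_dir, report, app_args):
        subprocess.run(cmd, check=True, env=env)
```

For example, `run_gpu_roofline("./rtm_stencil", "./advisor_roofline", ".", report="./advisor_roofline/roofline.html")` runs the same three commands as the `advisor_roofline.sh` script earlier in this section. Unlike that script, it stops as soon as a step fails.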




## Using Roofline Analysis on Intel GPU

![alt text](assets/roofline2.png "Roofline Analysis on Intel GPU.")


## Show Dots for All Memory Subsystems

![alt text](assets/roofline3.png "More info.")

## Add Labels

![alt text](assets/roofline4.png "Labeling.")

## Clean the View

![alt text](assets/roofline5.png "Clean View.")

## Show the Guidance

![alt text](assets/roofline6.png "Guidance.")

## Short Case Study

First Analysis on Finite Difference Kernel

  * The initial analysis shows that the kernel does not reach any of the roofs.
  * The code uses 1x1x1 work-groups, which prevents vectorization.
  * One idea is to control how work is dispatched to the compute units explicitly: define the work-groups ourselves, with sizes that are better suited to vectorization.


![alt text](assets/roofline7.png "Analysis.") 
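In SYCL/DPC++, choosing the work-group size explicitly means launching with an `nd_range`, and its global range must be divisible by the local (work-group) range in every dimension. The Python sketch below works through the arithmetic of choosing a global range for a given work-group shape, such as the 4x8x8 shape used in the second analysis. `padded_nd_range` and the grid size are hypothetical. A real kernel would also have to skip the padded work-items.

```python
import math


def padded_nd_range(grid, local):
    """Hypothetical helper: pick an nd_range global size for a given work-group shape.

    Rounds each global dimension up to a multiple of the work-group size and
    returns (global_range, work_groups_per_dim, work_items_per_group).
    Work-items outside `grid` are padding and must be masked off in the kernel.
    """
    if len(grid) != len(local):
        raise ValueError("grid and local must have the same number of dimensions")
    global_range = tuple((g + l - 1) // l * l for g, l in zip(grid, local))
    groups = tuple(gr // l for gr, l in zip(global_range, local))
    return global_range, groups, math.prod(local)


# Hypothetical 100x100x100 grid, launched with 1x1x1 and with 4x8x8 work-groups.
for local in ((1, 1, 1), (4, 8, 8)):
    print(local, padded_nd_range((100, 100, 100), local))
```

With 1x1x1 groups, every work-group holds a single work-item, which is consistent with the vectorization problem noted above. A 4x8x8 group holds 256 work-items, at the cost of padding two dimensions from 100 to 104.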
    


## Second Analysis on Finite Difference Kernel

  * With 4x8x8 work-groups, performance increases and the L3 dot now reaches the L3 bandwidth roof.
  * To optimize further, we need to increase the arithmetic intensity of the L3 dot, because a dot sitting on a bandwidth roof can only move up by moving to the right.
  * Further analysis could check L3 cache misses, and Intel® VTune™ Profiler may also help improve performance.

 ![alt text](assets/roofline8.png "Analysis.") 
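Data reuse is the usual lever for raising arithmetic intensity. To see why, consider a hypothetical 3D star stencil of radius r in single precision (4-byte values). This is an illustrative model, not the actual `rtm_stencil` kernel. Each output point reads 6r + 1 inputs and performs one multiply per input plus 6r additions, which is 12r + 1 FLOP. If every input read goes to the measured memory level, each point moves 4(6r + 2) bytes. With perfect reuse, it moves only one new input and one output, 8 bytes.

```python
def stencil_ai(radius, bytes_per_value=4):
    """Arithmetic intensity (FLOP/byte) of a hypothetical 3D star stencil.

    Returns (no_reuse_ai, perfect_reuse_ai) for one output point.
    """
    points = 6 * radius + 1            # center + r neighbors in each of 6 directions
    flops = points + (points - 1)      # one multiply per point, points - 1 additions
    no_reuse = flops / (bytes_per_value * (points + 1))  # every input loaded, plus one store
    perfect_reuse = flops / (bytes_per_value * 2)        # one load and one store per point
    return no_reuse, perfect_reuse


for r in (1, 4, 8):
    naive, ideal = stencil_ai(r)
    print(f"r={r}: AI without reuse = {naive:.3f}, with perfect reuse = {ideal:.3f} FLOP/byte")
```

Real kernels land between these two extremes. The more neighbor reuse is captured before data reaches a given memory level (for example, in registers), the farther to the right that level's dot moves.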

## Summary

  * Ran a roofline analysis and generated the report.
  * Explored the features of the roofline report and learned how to interpret it.
  * Examined the results to determine where speedup opportunities exist.