# VTune Profiling on Intel Developer Cloud

You can profile applications with VTune on Intel Developer Cloud using one of two methods:

1. [VTune Server Web UI for Collection and Analysis](#VTune-Server-Web-UI-for-Collection-and-Analysis)
2. [VTune Command Line for Collection and VTune GUI for Analysis](#VTune-Command-Line-for-Collection-and-VTune-GUI-for-Analysis)

__Option 1__ requires you to __start the VTune server on Intel Developer Cloud__ and __set up an ssh reverse proxy from your local machine__ (an `ssh -L` port forward) so that you can __access the VTune UI in a web browser__. It does not require any local installation, but it has the extra step of setting up SSH keys for Intel Developer Cloud so that you can start the reverse proxy. The VTune Web UI may be slow to load data because the data is fetched from the cloud.

__Option 2__ requires you to __install Intel VTune Profiler on your local machine__ for analysis. VTune collection is done using the __command line on Intel Developer Cloud__, and the data is then copied to the local machine for analysis. Analysis is faster with this option because both the data and the VTune application are on the local machine.

## VTune Server Web UI for Collection and Analysis

For this setup, you __start the VTune server on Intel Developer Cloud__, __set up an ssh reverse proxy from your local machine__, and then __access the VTune UI in a web browser__, as shown in the picture below.

<img src="assets/vtune_server_block.png" width="75%">

The quick setup instructions are for advanced users. For a step-by-step guide, refer to the detailed setup instructions below.

### Quick Setup Instructions:

1. Start VTune server on Intel Developer Cloud:

   `vtune-backend --allow-remote-access --enable-server-profiling`

    Make a note of the IP and PORT in the URL that is displayed.

2. In local machine terminal, create a reverse proxy:

   `ssh uXXXXXXXXXXXXXXX@idcbetabatch.eglb.intel.com -L PORT:IP:PORT`

3. In local machine browser, copy the URL from step 1 and replace IP with `127.0.0.1`

   `https://127.0.0.1:PORT/?xxxx`

    This will launch VTune Web UI

4. Select "Configure Analysis" in the VTune Web UI, enter the full path of the binary on Intel Developer Cloud in the "Application" field (use `pwd` to find it), change "Performance Snapshot" to "GPU Hotspots", and click Run.


### Detailed Setup Instructions:
Below are the steps to set up the VTune server and VTune Web UI:
1. Start VTune server on Intel Developer Cloud
2. Create a reverse proxy from local machine
3. Launch VTune Web UI
4. Start VTune Analysis

#### Start VTune server on Intel Developer Cloud
In a Jupyter terminal (or SSH terminal), start the VTune server:
```
vtune-backend --allow-remote-access --enable-server-profiling
```

The above command displays a URL to open in a browser, similar to the one shown below. Make a note of the IP_ADDRESS and PORT in the URL.
<img src="assets/vtune_server.png">


#### Create a reverse proxy from local machine

In the Intel Developer Cloud Console:
- Select "Training and Workshops"
- Click on the title of any training, and select the "Options" dropdown
- Select "Upload Key" and add an ssh key, if you have not added one before
- Click "Launch Using SSH"
- Copy the ssh command

In a local machine terminal, set up the reverse proxy by appending `-L PORT:IP_ADDRESS:PORT` to the ssh command copied above:
```
ssh xxxx@xxxx.com -L PORT:IP_ADDRESS:PORT
```
The command should look like the one below:
```
ssh uXXXXXXXXXXXXXXX@idcbetabatch.eglb.intel.com -L 45571:10.10.10.32:45571
```
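
The `-L` argument can also be derived from the server URL instead of being typed by hand. The following is a minimal sketch, assuming the URL has the form `https://IP_ADDRESS:PORT/?xxxx` as in this guide; `forward_spec` is a hypothetical helper name, not part of VTune or ssh.

```shell
# Hypothetical helper: turn the URL printed by vtune-backend into the
# PORT:IP_ADDRESS:PORT argument expected by `ssh -L`.
forward_spec() {
  url="$1"
  hostport="${url#https://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop the path and query string
  ip="${hostport%%:*}"         # text before the colon
  port="${hostport##*:}"       # text after the colon
  printf '%s:%s:%s\n' "$port" "$ip" "$port"
}

# Example with the IP address and port used in this guide:
forward_spec 'https://10.10.10.32:45571/?xxxx'
```

Using the same port on both sides of the forward means the port in the browser URL does not need to change, only the IP address.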

#### Launch VTune Web UI

Copy the URL displayed in the Jupyter terminal in the earlier step.

Open a browser on the local machine and open the URL with IP_ADDRESS replaced by `127.0.0.1`. It should look similar to the one below:
```
https://127.0.0.1:45571/?xxxx
```
This should launch VTune Web UI
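
The IP replacement can be scripted as well. This is a small sketch under the same assumption about the URL form (`https://IP_ADDRESS:PORT/?xxxx`); `local_url` is a hypothetical helper name.

```shell
# Hypothetical helper: replace the server IP in the vtune-backend URL
# with 127.0.0.1, keeping the port, path, and query string unchanged.
local_url() {
  url="$1"
  rest="${url#https://}"       # IP_ADDRESS:PORT/?xxxx
  hostport="${rest%%/*}"       # IP_ADDRESS:PORT
  tail="${rest#"$hostport"}"   # /?xxxx
  port="${hostport##*:}"
  printf 'https://127.0.0.1:%s%s\n' "$port" "$tail"
}

local_url 'https://10.10.10.32:45571/?xxxx'
```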

#### Start VTune Analysis from Web UI

- Click "Configure Analysis"
- In the "Application" field enter the full path to the binary you want to profile:
  - Compile any SYCL code to get binary like `a.out`
  - To get full path, `cd` to the location of binary in Jupyter terminal and enter `pwd`, copy the full path
  - example: `/home/uXXXXX/test/a.out`
- Change the analysis type from "Performance Snapshot" to "GPU Hotspots"
- Click the Run button
<img src="assets/vtune_web_configure.png">
- Wait for the collection to complete; the UI then loads the results for analysis
- Navigate to the "Graphics" tab and then "Platform" tab to analyze performance timeline and compute stats
- Refer to VTune Profiler documentation for more information
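
As an alternative to running `cd` and `pwd` by hand, the full path for the "Application" field can be printed directly. This is a sketch only; `abs_path` is a hypothetical helper, and `a.out` is the example binary name used in this guide.

```shell
# Hypothetical helper: print the absolute path of a file so it can be
# pasted into the "Application" field of the VTune Web UI.
abs_path() {
  dir="$(cd "$(dirname "$1")" && pwd)" || return 1   # resolve the directory
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Run from the directory that contains the binary:
abs_path ./a.out
```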


Instead of using "Configure Analysis" in the VTune Web UI, you can also collect data using the vtune command line and then open the result in the VTune Web UI:
- In a Jupyter terminal, run the VTune command line to collect data:
```
vtune -collect gpu-hotspots -result-dir ~/intel/vtune/projects/vtune_data $(pwd)/a.out
```
Make sure to set the `-result-dir` value to `~/intel/vtune/projects/<name-for-project>`.

- The `vtune_data` result will appear in the "Project Navigator" panel on the left side of the Web UI; click on it to open the analysis
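
To avoid mistyping the projects path, the collection command can be assembled from a project name and a binary path. The sketch below only prints the command so it can be reviewed before running it; `web_ui_collect_cmd` is a hypothetical helper name, and the projects location is the one given above.

```shell
# Hypothetical helper: print a collection command whose result directory
# sits under ~/intel/vtune/projects, where the Web UI looks for projects.
web_ui_collect_cmd() {
  project="$1"   # name shown in the Project Navigator
  binary="$2"    # full path to the binary to profile
  printf 'vtune -collect gpu-hotspots -result-dir %s/intel/vtune/projects/%s %s\n' \
    "$HOME" "$project" "$binary"
}

# Prints the command for review; run the printed line to start collection.
web_ui_collect_cmd vtune_data "$(pwd)/a.out"
```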

## VTune Command Line for Collection and VTune GUI for Analysis

For this setup, you start the VTune collection using the __vtune command line on Intel Developer Cloud__, copy the data to your local machine, and __install Intel VTune Profiler on your local machine__ to analyze the data, as shown in the picture below:

<img src="assets/vtune_gui_block.png" width="75%">

The quick setup instructions are for advanced users. For a step-by-step guide, refer to the detailed setup instructions below.

### Quick Setup Instructions:

1. Collect VTune data on Intel Developer Cloud:

    `vtune -collect gpu-hotspots -result-dir vtune_data $(pwd)/a.out`

2. Compress the `vtune_data` folder

   `tar -czvf vtune_data.tgz vtune_data`

3. Download `vtune_data.tgz` to local computer and uncompress

   `tar -xvf vtune_data.tgz`

4. Open results in local installation of VTune

    Install VTune from the oneAPI Base Toolkit on the local machine and launch VTune

    Select "Open Results" and point to the `*.vtune` file in `vtune_data` folder


### Detailed Setup Instructions:
Below are the steps to run the vtune command line and download the data for analysis on the local machine.
   1. Use vtune command line to collect vtune profiling data
   2. Compress vtune data
   3. Download vtune data to local computer
   4. Install VTune Profiler on local computer and open vtune data for analysis

#### Collect VTune data using command line
- Modify the module's example code and then "Build and Run". This will generate the binary `a.out`
- Then in "Terminal", go to the current module directory and run the following vtune command (change the `-result-dir` value from `vtune_data` to something that identifies your code)
```
vtune -collect gpu-hotspots -result-dir vtune_data $(pwd)/a.out
```
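
One way to pick a `-result-dir` value that identifies your code is to combine the binary name with a timestamp. This also gives each run a fresh directory, which helps because VTune may refuse to write into a result directory that already exists. The naming scheme and the helper `result_name` are assumptions for illustration, not a VTune convention.

```shell
# Hypothetical helper: build a result directory name from the binary
# name and the current time, e.g. vtune_a.out_20240101_120000.
result_name() {
  printf 'vtune_%s_%s\n' "$(basename "$1")" "$(date +%Y%m%d_%H%M%S)"
}

name="$(result_name ./a.out)"
# Print the collection command for review before running it.
echo "vtune -collect gpu-hotspots -result-dir $name $(pwd)/a.out"
```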

#### Compress vtune data
- Compress the vtune results directory so that it can be copied to your local computer (where the VTune GUI is installed)
```
tar -czvf vtune_data.tgz vtune_data
```
#### Download and Uncompress vtune data
- Download the compressed vtune results:
    -  If using Jupyter, right click on the `*.tgz` file and select "Download"
    -  If using `ssh`, use `scp` to copy the `*.tgz` to your local computer
- Uncompress the vtune results files:
```
tar -xvf vtune_data.tgz
```
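
The compress and uncompress steps can be rehearsed on a placeholder directory before working with real results. The sketch below uses a scratch directory and a stand-in `vtune_data.vtune` file (both assumptions for illustration, not real collection output) and lists the archive contents with `tar -tzf` before extracting.

```shell
# Scratch area standing in for the cloud side and the local side.
work="$(mktemp -d)"
mkdir -p "$work/cloud/vtune_data" "$work/local"

# Stand-in for real collection output.
echo "placeholder" > "$work/cloud/vtune_data/vtune_data.vtune"

# Create a gzip-compressed archive of the results directory.
tar -czf "$work/vtune_data.tgz" -C "$work/cloud" vtune_data

# List the archive contents without extracting, as a quick sanity check.
tar -tzf "$work/vtune_data.tgz"

# Extract on the "local" side and confirm the result file is there.
tar -xzf "$work/vtune_data.tgz" -C "$work/local"
ls "$work/local/vtune_data"
```

Listing the archive first is a cheap way to catch an empty or truncated download before opening it in VTune.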

#### VTune Analysis on local machine
- On your computer, install "Intel VTune Profiler" from [__Intel oneAPI Base Toolkit__](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html)
- Open __Intel VTune Profiler__, select "Open Results" in the "Welcome" tab, browse to the vtune results directory that was downloaded, and select the `*.vtune` file
- Navigate to the "Graphics" tab and then "Platform" tab to analyze performance timeline and compute stats
- Refer to VTune Profiler documentation for more information

<img src="assets/vtune_profiler.png">