# Lab: Practice the FPGA Development Flow

##### Sections
- [oneAPI with Intel® FPGAs](#oneAPI-with-Intel®-FPGAs)
- [Stage 1: Emulation](#Stage-1:-Emulation)
- [Stage 2: Optimization Report Generation](#Stage-2:-Optimization-Report-Generation)
- [How to Use the Terminal within Jupyter Lab](#How-to-Use-the-Terminal-within-Jupyter-Lab)
- [Emulation and Optimization Report Generation with Included Tutorials](#Emulation-and-Optimization-Report-Generation-with-Included-Tutorials)
- [References to Learn More](#References-to-Learn-More)

## Learning Objectives

* Understand the development flow for Intel® FPGAs with the Intel® oneAPI Toolkits
* Practice using the flow with a simple piece of code
* Practice using the flow with the FPGA tutorials included in the Intel oneAPI Base Toolkit


***
## oneAPI with Intel® FPGAs

The development flow for Intel FPGAs with oneAPI involves several stages. These stages let you do the following without having to endure the lengthy compile to a full FPGA executable each time:
* Ensure functionality of your code (you get the correct answers from your computation)
* Ensure the custom hardware built to implement your code has optimal performance

The flow is represented in the diagram below.

In this lab, we will practice the first two stages of the flow: emulating your code to make sure it is functional, and generating an optimization report to see how well optimized the hardware generated from your code is. (A subsequent lab will give you practice working with the optimization report.)

There is a separate tutorial you can work through at home if you would like to do a full FPGA compile and run on an FPGA.

<img src="Assets/fpga_flow.png">

***
## Stage 1: Emulation

The first stage of development for FPGAs with oneAPI is __emulation__. The purpose of emulation is to make sure that your code is __functional__, or in other words, that you __get the correct answers from your computations__.

The compile time for this stage will be very quick, usually seconds.

This quick compile time allows you to iterate through this stage many times, until your code is functionally correct.

__Now, let's give it a try!__

The code below implements a simple sum over an array of values.

The code is heavily commented, so please take a quick look now to get an overview of what it is doing. (Also, keep in mind this is a simple example. It would never be worth using a lookaside accelerator to sum 100 integers!)

We will use this simple piece of code to learn the steps of the development flow for FPGAs with oneAPI.

__After you are finished examining the code, click ▶ to save the code to a file.__

In [None]:
%%writefile lab/simple_for_fpga.cpp
//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <iostream>
#include <CL/sycl.hpp>
#include <CL/sycl/intel/fpga_extensions.hpp>
using namespace sycl;
class simple_sum;
static const int N = 100;
int main(){
    
  //# Create an array of 100 incrementing numbers
  //# The sum should be 5050
  int summands[100];
  for (int i=0;i<100;i++) summands[i]=i+1;
    
  //# Create a variable to hold the sum
  int sum = 0;
    
  //# A -D switch will define which device we choose
  #if defined(FPGA_EMULATOR)
    intel::fpga_emulator_selector device_selector;
  #else
    intel::fpga_selector device_selector;
  #endif

  //# Buffers are used to share data between the host and the FPGA
  buffer<int, 1> buffer_summands(summands, 100);
  buffer<int, 1> buffer_sum(&sum, 1);

  //# define queue which has default device associated for offload
  //# The queue is used by the host to kick off code on the FPGA
  queue q(device_selector);
    
  //# Send the values to the FPGA or the FPGA emulator to calculate the sum
  //# You can think of the handler as a proxy for everything behind the scenes
  //#   that needs to happen between the host and the FPGA
  q.submit([&](handler &h) {
    //# The FPGA needs to have access to the buffers set up earlier
    //# The access is defined in terms of the FPGA's access
    auto acc_summands = buffer_summands.get_access<access::mode::read>(h);
    auto acc_sum = buffer_sum.get_access<access::mode::write>(h);
      
    //# This is the code that gets executed on the FPGA
    //# This is often referred to as a kernel
    //# If you wanted to make simple_sum a function, you could,
    //#   and the FPGA Tutorials are written in this manner
    h.single_task<class simple_sum>([=]() {
      //# Kernel to add things up using FPGA or FPGA emulator
      //# Code inside here becomes hardware
      int kernel_sum = 0;
      for (int i=0;i<100;i++) kernel_sum = kernel_sum + acc_summands[i];
      acc_sum[0] = kernel_sum;
    });
  }).wait();

  //# Print Output
  std::cout << "The calculation is finished. The sum is ";
  std::cout << sum;
  std::cout << "." << std::endl;

  return 0;
}

__Now, you will compile the code to target the FPGA emulator.__

Recall from class, the command to do this is as shown below.

<img src="Assets/emulator_command.png">

The command we'll use also adds a -o switch to define the output filename.

__Now, make the code section below active (you will see a blue bar beside the section), and click ▶.__ This will compile the code into an executable targeting the FPGA emulator, and then run the emulated code. You will see output from the std::cout statements within the code.

In [None]:
! dpcpp -fintelfpga lab/simple_for_fpga.cpp -DFPGA_EMULATOR -o bin/simple_for_fpga.emu
! bin/simple_for_fpga.emu

__You should have seen output that looked like the below:__

The calculation is finished. The sum is 5050.

__It's always useful to see what happens when things don't go perfectly.__

Go back to the code you just executed and introduce a syntax error (or a few). Then, click ▶ for the section of the notebook with the code, and ▶ for the section of the notebook to compile and execute the code with the FPGA emulator.

__You can see how fast and easy emulating your code is!__

That was fast, like the software compiles most software developers are used to! This fast compile and execution are why you stay at this stage until your code is functional (i.e., you're getting the correct answers from your code).

***
## Stage 2: Optimization Report Generation

In this next section of the lab, you will compile the kernel using different command line options with dpcpp in order to create an optimization report. You will also be using the Jupyter Lab interface to browse and open a file, so that will be explained to you.

Since the last part of the exercise told you to see what happens when you introduce syntax errors into the code, let's start fresh by writing the example code to a file again. Make the code section of the notebook below active and press ▶ to re-write the simple code example to a file.

In [None]:
%%writefile lab/simple_for_fpga.cpp
//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <iostream>
#include <CL/sycl.hpp>
#include <CL/sycl/intel/fpga_extensions.hpp>
using namespace sycl;
class simple_sum;
static const int N = 100;
int main(){
    
  //# Create an array of 100 incrementing numbers
  //# The sum should be 5050
  int summands[100];
  for (int i=0;i<100;i++) summands[i]=i+1;
    
  //# Create a variable to hold the sum
  int sum = 0;
    
  //# A -D switch will define which device we choose
  #if defined(FPGA_EMULATOR)
    intel::fpga_emulator_selector device_selector;
  #else
    intel::fpga_selector device_selector;
  #endif

  //# Buffers are used to share data between the host and the FPGA
  buffer<int, 1> buffer_summands(summands, 100);
  buffer<int, 1> buffer_sum(&sum, 1);

  //# define queue which has default device associated for offload
  //# The queue is used by the host to kick off code on the FPGA
  queue q(device_selector);
    
  //# Send the values to the FPGA or the FPGA emulator to calculate the sum
  //# You can think of the handler as a proxy for everything behind the scenes
  //#   that needs to happen between the host and the FPGA
  q.submit([&](handler &h) {
    //# The FPGA needs to have access to the buffers set up earlier
    //# The access is defined in terms of the FPGA's access
    auto acc_summands = buffer_summands.get_access<access::mode::read>(h);
    auto acc_sum = buffer_sum.get_access<access::mode::write>(h);
      
    //# This is the code that gets executed on the FPGA
    //# This is often referred to as a kernel
    //# If you wanted to make simple_sum a function, you could,
    //#   and the FPGA Tutorials are written in this manner
    h.single_task<class simple_sum>([=]() {
      //# Kernel to add things up using FPGA or FPGA emulator
      //# Code inside here becomes hardware
      int kernel_sum = 0;
      for (int i=0;i<100;i++) kernel_sum = kernel_sum + acc_summands[i];
      acc_sum[0] = kernel_sum;
    });
  }).wait();

  //# Print Output
  std::cout << "The calculation is finished. The sum is ";
  std::cout << sum;
  std::cout << "." << std::endl;

  return 0;
}

__Now, you are going to compile the code to generate an optimization report.__

Recall from class, the commands to do this in two steps are shown below. (We are using the two-step method because the one-step method currently has an issue with showing the source code in the report.)

<img src="Assets/report_command.png">

The command below also includes a -o switch so that the output file can be given an explicit name.

__Make the code section below active and press ▶ to compile the code and generate an optimization report.__
__This compilation may take about 2 minutes.__

In [None]:
! dpcpp -fintelfpga lab/simple_for_fpga.cpp -c -o bin/simple_for_fpga.o
! dpcpp -fintelfpga bin/simple_for_fpga.o -fsycl-link -Xshardware -o bin/simple_for_fpga.a
! echo "The compile is finished."

__When you see "The compile is finished." above, an optimization report file will have been generated for the code.__

Now, let's examine that report file.

Within the Jupyter Lab environment, you will see files to be browsed on the left side. Browse to the directory lab2/bin/simple_for_fpga.prj/reports/ (double-click on directories to descend into them).

The left side of your screen should look similar to the screenshot below:

<img src="Assets/browse_to_report.png">

__Double-click on report.html.__ The report will open up as another tab beside the notebook tab in Jupyter Lab, as shown below.

__You will probably need to click on "Trust HTML" for the report to open fully.__

<img src="Assets/report_initial_open.png">

__Once you have clicked on "Trust HTML," it will look like the below screenshot.__

<img src="Assets/trusted_report.png">

You've now learned how to use the first two stages of the FPGA development flow with oneAPI! You will spend most of your development time in these two stages. In the next sections of the lab, you will learn how to work with the built-in FPGA tutorials and example designs.

***
## How to Use the Terminal within Jupyter Lab

For the next lab session, you will be working within the terminal that is part of Jupyter Lab. Working with the terminal when running Jupyter Lab on the DevCloud is like working in a terminal in a Linux development environment. Once it is open, the commands will be the same Linux commands you are used to.

__In order to open a terminal within the Jupyter Lab environment, first click the "+" near the top left of the Jupyter Lab environment in your browser.__ The "+" you need to click has a red box drawn around it in the screenshot below.

<img src="Assets/button_for_launcher.png">

Once you click the "+" a launcher tab will appear. __Click the terminal icon in the launcher, as shown below.__

<img src="Assets/button_for_terminal.png">

__After that, a new terminal tab will appear within the Jupyter Lab working area.__ You will be at a prompt, starting in your home directory. The terminal tab with the prompt is shown below.

<img src="Assets/terminal_with_prompt.png">

The next section will give you instructions to execute within this terminal window. Note that you can highlight the command you are going to run, hit Ctrl-C to copy it, and hit Ctrl-V to paste it at the prompt if you would like.

***
## Emulation and Optimization Report Generation with Included Tutorials

The oneAPI Base Toolkit includes many tutorials and examples for learning more about writing optimized code for FPGAs. In this lab section, you will learn how to generate and compile those examples and tutorials.

All of the examples and tutorials use the CMake build process. If you are familiar with CMake or Make build processes, the instructions in this section will be familiar to you.

To begin, go to the terminal prompt that you opened in the last section of the lab.

Browse to the working directory for the lab. If you placed the lab files within your home directory and unzipped them, the command will be as shown below.

~~~bash
$ cd labs/lab2/
~~~

The command to access the FPGA tutorials built into the toolkit is shown below. Run this command at the prompt now.

~~~bash
$ oneapi-cli
~~~

After you type this command into the terminal, the terminal should look like the screenshot below.

<img src="Assets/terminal_oneapi_cli.png">

Select __(1) Create a project__

Then, select __(1) cpp__

Your terminal screen should now look like the screenshot below.

<img src="Assets/tutorial_choices.png">

As you can see, there are many choices for FPGA Tutorials and FPGA Example Designs. The Tutorials provide example code to help you learn more about individual topics. Most topics discussed in the programming guide and the optimization guide have a corresponding tutorial.

Scroll down to __FPGA Tutorial: Loop Unroll__ and hit enter. In the field for Directory, type __tutorial__, as shown below.

<img src="Assets/tutorial_gen.png">

Then, keep pressing enter until __Create__ is selected, and press enter once more to create the project. Then, select __Quit__.

You should now be back at the terminal prompt.

At the prompt, change your directory to go inside the generated tutorial.

~~~bash
$ cd tutorial/loop_unroll
~~~

Take a look at the generated files by typing ls at the prompt.

~~~bash
$ ls
~~~

View the file README.md if you would like, by either opening it within the terminal using __vi__, or by browsing to it and clicking on it using the built-in file browser at the left in Jupyter Lab.

The next instructions are also found in the README.md file, if you would like to work from it instead. This lab will work through the "Compile and run for emulation" and "Generate HTML optimization reports" sections.

First, create a build directory and prepare it for running the CMake build process by executing the following commands. (These assume that you start from the tutorial/loop_unroll directory created when you generated the tutorial.)

~~~bash
$ mkdir build
$ cd build
$ cmake ..
~~~

Then, build the tutorial for emulation and run the resulting emulation executable by typing the following commands:

~~~bash
$ make fpga_emu
$ ./loop_unroll.fpga_emu
~~~

After running these commands, your terminal screen should look similar to the screenshot below. __Note that because the command is running as an emulated piece of code, the performance numbers do not reflect the FPGA performance. You would have to do the complete compile for the FPGA and run on the FPGA for those numbers.__

<img src="Assets/tutorial_emu_output.png">

__Next, you will compile to generate an optimization report.__

At the prompt, type the following command to generate an optimization report for the tutorial example.

~~~bash
$ make report
~~~

The output to your terminal should look like the screenshot below.

<img src="Assets/tutorial_report_build.png">

Now, locate the report.html file in the file browser to the left in the Jupyter Lab interface. The location of the file will be in:
~~~bash
<directory_where_you_unzipped_to>/labs/lab2/tutorial/loop_unroll/build/loop_unroll_report.prj/reports/
~~~

Open the report.html file within Jupyter Lab by double-clicking. __Reminder: You may need to click "Trust HTML"__

The report should look like the screenshot below.

<img src="Assets/unroll_example_report.png">

As a reminder, since compiling to a full FPGA executable involves a very long compile time and there are limited DevCloud nodes available for this, we will not do that step. There is a tutorial if you would like to try that after class.

You are now finished with the lab.

If you have extra time, you can do the following things:
* Interact with your classmates if chatting by Discord is enabled for your class session.
* Read the references below to learn more.
* Generate more tutorial examples (that is an especially good thing to do if you are also reading the Intel® oneAPI DPC++ FPGA Optimization Guide).
* Take a short rest before the next section of the class begins.

***
## References to Learn More

Please refer to the following resources to learn more. This is a great thing to do if you have extra time during the lab!

#### FPGA Specific Documentation

* [Website hub for using FPGAs with oneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/fpga.html)
* [Intel® oneAPI Programming Guide](https://software.intel.com/content/www/us/en/develop/download/intel-oneapi-programming-guide.html)
* [Intel® oneAPI DPC++ FPGA Optimization Guide](https://software.intel.com/content/www/us/en/develop/download/oneapi-fpga-optimization-guide.html)
* [FPGA Tutorials GitHub](https://github.com/intel/BaseKit-code-samples/tree/master/FPGATutorials)

#### Intel® oneAPI Toolkit documentation
* [Intel® oneAPI main page](https://software.intel.com/oneapi "oneAPI main page")
* [Intel® oneAPI programming guide](https://software.intel.com/sites/default/files/oneAPIProgrammingGuide_3.pdf "oneAPI programming guide")
* [Intel® DevCloud Signup](https://software.intel.com/en-us/devcloud/oneapi "Intel DevCloud")  Sign up here if you do not have an account.
* [Intel® DevCloud Connect](https://devcloud.intel.com/datacenter/connect)  Login to the DevCloud here.
* [Get Started with oneAPI for Linux*](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux)
* [Get Started with oneAPI for Windows*](https://software.intel.com/en-us/get-started-with-intel-oneapi-windows)
* [Intel® oneAPI Code Samples](https://software.intel.com/en-us/articles/code-samples-for-intel-oneapibeta-toolkits)
* [oneAPI Specification elements](https://www.oneapi.com/spec/)

#### SYCL 
* [SYCL* Specification (for version 1.2.1)](https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf)

#### Modern C++
* [CPPReference](https://en.cppreference.com/w/)
* [CPlusPlus](http://www.cplusplus.com/)