doc: merge labs
Signed-off-by: Yiping Peng <yibingp@synopsys.com>
BabaYB committed Dec 5, 2018
2 parents 5cbe3b6 + 4afd567 commit 805ce83
Showing 7 changed files with 87 additions and 155 deletions.
2 changes: 1 addition & 1 deletion .travis/deploy_doc.sh
@@ -68,7 +68,7 @@ tar xzf doc.tar.gz || die
rm -rf doc.tar.gz || die

git add --all || die
git commit -s -a -m "doc: Push updated generated sphinx documentation of commit ${TRAVIS_COMMIT}" || die
git commit -s -a -m "doc: Push updated generated sphinx documentation of commit ${TRAVIS_COMMIT} ${TRAVIS_COMMIT_MESSAGE}" || die
if [ $? -eq 0 ] ; then
echo 'Push changes to gh-pages branch.'
git push ${REPO_LINK} gh-pages:gh-pages > /dev/null 2>&1 || die
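
The changed ``git commit`` line appends the triggering commit's message to the message of the generated-documentation commit. A minimal sketch of how that message is assembled, assuming the Travis CI variables ``TRAVIS_COMMIT`` and ``TRAVIS_COMMIT_MESSAGE`` are exported by the CI environment. The helper name ``doc_commit_message`` and the sample values are hypothetical and are not part of ``deploy_doc.sh``:

```shell
#!/bin/sh
# Hypothetical helper (not in deploy_doc.sh): builds the message used for
# the gh-pages commit.
#   $1 = commit hash    (TRAVIS_COMMIT)
#   $2 = commit message (TRAVIS_COMMIT_MESSAGE)
doc_commit_message() {
    printf 'doc: Push updated generated sphinx documentation of commit %s %s\n' "$1" "$2"
}

# Sample values standing in for what Travis CI would export.
TRAVIS_COMMIT="805ce83"
TRAVIS_COMMIT_MESSAGE="doc: merge labs"

doc_commit_message "${TRAVIS_COMMIT}" "${TRAVIS_COMMIT_MESSAGE}"
```

Quoting both arguments keeps a commit message that contains spaces as a single string.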
171 changes: 53 additions & 118 deletions doc/documents/labs/level1/lab1.rst

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions doc/documents/labs/level1/lab5.rst
@@ -98,6 +98,9 @@ Exercises

Try to create your own application to control the peripherals of the ARC board.

.. note::
The ARC |iotdk| is powered over USB. If additional devices are connected to the extension interfaces, the ARC |iotdk| must be powered by an external power adapter. The external power supply must be 5 V DC; a 12 V power supply will most probably damage your board.

.. |figure1| image:: /img/lab5_emsk.png
:alt: lab5_emsk
:width: 400
2 changes: 1 addition & 1 deletion doc/documents/labs/level2/lab7.rst
@@ -134,7 +134,7 @@ Open ``cmd`` from the folder *embarc_osp/arc_labs/labs/lab6_ble_rn4020*, input t

.. code-block:: console
make run
make BOARD=iotdk TOOLCHAIN=gnu run
Then the output is displayed in the serial terminal.
|figure1|
26 changes: 11 additions & 15 deletions doc/documents/labs/level2/lab_dsp1.rst
@@ -61,11 +61,11 @@ To optimize code with DSP extensions, two sets of compiler options are used thro
DSP Extensions Options
^^^^^^^^^^^^^^^^^^^^^^^^^^

Use |embarc| build system to build tool. The details can be found in |embarc| document page. Here is the example command. You can pass extra compiler/linker options by ADT_COPT/ADT_LOPT.
Use the |embarc| build system to compile the code. The details can be found on the |embarc| documentation page. Here is an example command. You can pass extra compiler/linker options through ADT_COPT/ADT_LOPT.

.. code-block:: console
gmake BOARD=emsk BD_VER=23 CUR_CORE=arcem9d TOOLCHAIN=mw gui ADT_COPT="-Hfxapi -Xdsp2" OLEVEL=O2
gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw ADT_COPT="-Hfxapi -Xdsp2" OLEVEL=O2
Options that are used in the lab are:

@@ -99,11 +99,11 @@ Options that are used in the lab are:

* ``-Xagu_small, -Xagu_medium, -Xagu_large``:

Enables AGU, and specifies its size. Note, IOTDK has small AGU
Enables AGU, and specifies its size.

.. note::

Because ARC is configurable processor, different cores can contain different extensions on hardware level. Therefore, options set for compiler should match underlying hardware. On the other hand, if specific hardware feature is present in the core but compiler option is not set, it cannot be used effectively, if used at all. IOTDK Core default options are presented in Appendix A.
Because ARC is a configurable processor, different cores can contain different hardware extensions. The compiler options should therefore match the underlying hardware. Conversely, if a hardware feature is present in the core but the corresponding compiler option is not set, the feature cannot be used effectively, if at all. The IOTDK core default options are given in the TCF file.

Optimization level
^^^^^^^^^^^^^^^^^^^^
@@ -120,44 +120,40 @@ A regular code without direct usage of DSP extensions can be optimized to use DS
Steps
--------------------------

Step 1. Compiling without DSP extensions
1. Compiling with -O0; DSP extensions are specified in the TCF file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Set optimization level "-O0", and no DSP extensions (unchecking -Xdsp1, -Xdsp2).

After compilation, open disassembly window and check assembly code for function "test".

Below is the list of options used when launching gmake:

``OLEVEL=O0 ADT_COPT="-arcv2em -core1 -Xlib -Xtimer0 -Xtimer1"``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw OLEVEL=O0``

You can use the following command to generate disassembly code:
You can use the following command to generate disassembly code, and check assembly code for function "test".

``elfdump -T -S <your_working_directory>/obj_iotdk_10/mw_arcem9d/dsp_lab1_mw_arcem9d.elf``

Notice the assembly code in the disassembled output. See how many assembly instructions are used for each source line. For example, the for loop spends several instructions calculating the loop variable value and checking whether to stop.

|dsp_figure_1.1|
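
The ELF path passed to ``elfdump`` appears to mirror the build options. A small sketch, assuming the layout ``obj_<BOARD>_<BD_VER>/<TOOLCHAIN>_<CUR_CORE>/<app>_<TOOLCHAIN>_<CUR_CORE>.elf`` inferred from the single path shown above. The helper name ``elf_path`` is hypothetical, and the layout may differ for other boards or build system versions:

```shell
#!/bin/sh
# Hypothetical helper: derives the ELF path from the build options.
# The directory layout is an assumption inferred from one example path.
#   $1 = BOARD, $2 = BD_VER, $3 = TOOLCHAIN, $4 = CUR_CORE, $5 = application name
elf_path() {
    printf 'obj_%s_%s/%s_%s/%s_%s_%s.elf\n' "$1" "$2" "$3" "$4" "$5" "$3" "$4"
}

# Path for the build used in this lab.
elf_path iotdk 10 mw arcem9d dsp_lab1
```

The result can be substituted into the ``elfdump -T -S`` command, prefixed with your working directory.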

Step 2. Compiling without DSP extensions, with -O2
2. Compiling with DSP extensions, with -O2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compile with:

``OLEVEL=O2 ADT_COPT="-arcv2em -core1 -Xlib -Xtimer0 -Xtimer1"``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw OLEVEL=O2``

Raising the optimization level to -O2 eliminates many of the instructions:

|dsp_figure_1.2|

In this code it is easy to find the zero-delay loop (the "lp" instruction), which implements the for loop. Note that multiply-accumulate is done with separate "mpyw_s" and "add1_s" instructions.

Step 3. Compiling with DSP extensions
3. Compiling with DSP extensions, with -O3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compile with:

``OLEVEL=O3 ADT_COPT="-arcv2em -core1 -Xlib -Xtimer0 -Xtimer1 -Xdsp1"``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw OLEVEL=O3``

Adding -Xdsp1 (with the optimization level changed to -O3) helps the compiler optimize away the "mpyw_s" and "add1_s" instructions and replace them with the hardware dual 16-bit SIMD multiplication "vmpy2h". Notice that the loop count is now 5.
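
The three builds in the updated steps differ only in ``OLEVEL``. A dry-run sketch that prints the three command lines instead of executing them; the function name ``print_builds`` is hypothetical, and the options are copied from the commands above:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one gmake command per optimization level.
# Nothing is built; remove the echo to actually invoke gmake.
print_builds() {
    for level in O0 O2 O3; do
        echo "gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw OLEVEL=${level}"
    done
}

print_builds
```

Comparing the disassembly produced after each build shows the effect of the optimization level in isolation, since all other options stay fixed.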

18 changes: 8 additions & 10 deletions doc/documents/labs/level2/lab_dsp2.rst
@@ -104,35 +104,33 @@ The main performance check loop is shown in the following example. The outer loo
Steps
------------

To test the following example, some modification of the code is required to have two loops with and without DSP. You must re-build libraries for this particular configuration of IOTDK:
To test the following example, some modification of the code is required to have two loops with and without DSP.

``buildlib my_dsp -tcf=<IOTDK tcf file> -bd . -f``
First, build the DSP libraries for this particular configuration of IOTDK:

``buildlib my_dsp -tcf=<IOTDK tcf file> -bd ../ -f``

|iotdk| tcf file can be found in ``embarc_osp/board/iotdk/configs/10/tcf/arcem9d.tcf``

Both examples are to be compiled with DSP extensions.
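
The library build and the application build share two values: the library name and the directory it is built into (``-bd ../`` pairs with ``-Hlib=../my_dsp``). A dry-run sketch that keeps them in one place and prints both commands. The variable and function names are hypothetical, and ``EMBARC_ROOT`` assumes the |embarc| checkout is reachable at that relative path:

```shell
#!/bin/sh
# Assumption: relative path to the embARC OSP checkout.
EMBARC_ROOT="embarc_osp"
TCF_FILE="${EMBARC_ROOT}/board/iotdk/configs/10/tcf/arcem9d.tcf"
LIB_NAME="my_dsp"

# Hypothetical helpers: print the commands rather than running them.
buildlib_cmd() {
    echo "buildlib ${LIB_NAME} -tcf=${TCF_FILE} -bd ../ -f"
}

gmake_cmd() {
    echo "gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw gui ADT_COPT=\"-Hdsplib\" ADT_LOPT=\"-Hdsplib -Hlib=../${LIB_NAME}\""
}

buildlib_cmd
gmake_cmd
```

Keeping the library name in a single variable avoids a mismatch between the name passed to ``buildlib`` and the one given to ``-Hlib``.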

Step 1. Run program without FXAPI
1. Run program without FXAPI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Build with the command:

``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw ADT_COPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf``

``-Xdsp_complex" ADT_LOPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf -Hlib=./my_dsp"``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw gui ADT_COPT="-Hdsplib" ADT_LOPT="-Hdsplib -Hlib=../my_dsp"``

With a high optimization level, functions using the "short" type are compiled to use DSP MAC operations, enabling a significant speedup.

|dsp_figure_2.1|

Step 2. Run program with FXAPI
2. Run program with FXAPI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Rename main.c.fxapi to main.c, then execute the command:

``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw ADT_COPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf``

``-Xdsp_complex" ADT_LOPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf -Hlib=./my_dsp"``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw gui ADT_COPT="-Hdsplib" ADT_LOPT="-Hdsplib -Hlib=../my_dsp"``

However, using FXAPI enables the compiler to directly use the complex MAC instruction "cmachfr".

20 changes: 10 additions & 10 deletions doc/documents/labs/level2/lab_dsp3.rst
@@ -256,27 +256,27 @@ Using |iotdk| board for performance comparison
Steps
-----------------

Both examples are to be compiled with DSP extensions, with the following options set:
First, build the DSP libraries for this particular configuration of IOTDK:

``-O2 -arcv2em -core1 -Xlib -Xtimer0 -Xtimer1 -Xdsp1 -Hdsplib``
``buildlib my_dsp -tcf=<IOTDK tcf file> -bd ../ -f``

Step 1. Run program without DSP library
|iotdk| tcf file can be found in ``embarc_osp/board/iotdk/configs/10/tcf/arcem9d.tcf``

Both examples are to be compiled with DSP extensions.

1. Run program without DSP library
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Build with the command:

``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw gui ADT_COPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw ADT_COPT="-Hdsplib" ADT_LOPT="-Hdsplib -Hlib=../my_dsp"``

``-Xdsp_complex" ADT_LOPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf -Hlib=./my_dsp"``

Step 2. Run program with DSP library
2. Run program with DSP library
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Rename main.c.dsplib to main.c, then execute the command:

``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw gui ADT_COPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf``

``-Xdsp_complex" ADT_LOPT="-Hdsplib -Xdsp2 -tcf=./arcem9d.tcf -Hlib=./my_dsp"``
``gmake BOARD=iotdk BD_VER=10 CUR_CORE=arcem9d TOOLCHAIN=mw ADT_COPT="-Hdsplib" ADT_LOPT="-Hdsplib -Hlib=../my_dsp"``

Note that DSPLIB is statically linked with the project when -Hdsplib is set. Because DSPLIB itself is pre-compiled with a high level of optimization, changing the optimization option for the example program does not affect DSPLIB performance. On the other hand, even at the highest optimization level, a function using simple instructions on the "short" type (even if converted to MACs where possible) is less efficient than direct use of DSPLIB.

