VideoCore IV Kernels under Linux

hermanhermitage edited this page Dec 17, 2012 · 3 revisions

Tutorial

Starting with https://github.com/Hexxeh/rpi-firmware/commit/6b437596141c19718f9ed23ec7b51a3da21d95a5, the Raspberry Pi GPU firmware exposes memory allocation and code execution calls through the mailbox property interface. These calls allow VideoCore binary code to be uploaded and executed. There is still no documented way to reach the firmware's internal software and hardware services, but this is enough to run kernels on the vector integer unit.
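A property request is an array of 32-bit words: a header, one or more tags, and an end tag. The sketch below builds a single-tag request in the layout the firmware is understood to expect. The tag numbers (0x3000c Allocate Memory, 0x3000d Lock Memory, 0x30010 Execute Code) come from the firmware's mailbox property interface documentation as I understand it, and `build_property_msg` is a hypothetical helper name. Check both against the mailbox.c used in the demo.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag ids as assumed from the mailbox property interface documentation. */
#define TAG_ALLOCATE_MEMORY 0x0003000cu  /* values: size, alignment, flags */
#define TAG_LOCK_MEMORY     0x0003000du  /* values: handle */
#define TAG_EXECUTE_CODE    0x00030010u  /* values: code address, r0..r5 */

/* Hypothetical helper: writes a one-tag property request into buf and
 * returns the number of 32-bit words used. buf must hold nvalues + 6 words. */
size_t build_property_msg(uint32_t *buf, uint32_t tag,
                          const uint32_t *values, size_t nvalues)
{
    size_t i = 0;
    buf[i++] = 0;                          /* total size in bytes, patched below */
    buf[i++] = 0;                          /* 0 = this is a request */
    buf[i++] = tag;                        /* tag id */
    buf[i++] = (uint32_t)(nvalues * 4);    /* value buffer size in bytes */
    buf[i++] = (uint32_t)(nvalues * 4);    /* request data size in bytes */
    for (size_t k = 0; k < nvalues; k++)
        buf[i++] = values[k];              /* tag arguments */
    buf[i++] = 0;                          /* end tag */
    buf[0] = (uint32_t)(i * sizeof *buf);  /* total message size in bytes */
    return i;
}
```

For an Execute Code request you would pass seven words (the bus address of the uploaded code, then r0 to r5). The firmware writes its reply into the same value area, so in the mailbox.c versions I have seen the result (an allocation handle, or the returned r0) is read back from word 5 of the buffer.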

Running your first Kernel

Preparing Linux with Appropriate Firmware Version

  • Prepare an SD card with the latest Raspbian image (2012-10-28-wheezy-raspbian.img or later).
  • Boot and go through raspi-config to configure the system as desired.
  • Update your GPU firmware to a version that supports the required mailbox property interface:
    sudo apt-get update
    sudo wget http://goo.gl/1BOfJ -O /usr/bin/rpi-update && sudo chmod +x /usr/bin/rpi-update
    sudo apt-get install git-core
    sudo rpi-update
    sudo reboot
    sudo rpi-update d27ca   # or a later version, see https://github.com/Hexxeh/rpi-firmware/commits/next

Configure CMA settings

The newer firmware images require these settings; see http://www.raspberrypi.org/phpBB3/viewtopic.php?f=29&t=19334.

  • Append to the existing single line in /boot/cmdline.txt:
    coherent_pool=2M cma=2M smsc95xx.turbo_mode=N
  • Add to /boot/config.txt:
    gpu_mem_256=160
    gpu_mem_512=316
    cma_lwm=16
    cma_hwm=32
  • Reboot your Pi.

Run a demo

  • Create a working directory and the device node for the mailbox driver, then fetch the sources:
    mkdir videocoretest
    cd videocoretest
    sudo mknod char_dev c 100 0
    wget https://raw.github.com/raspberrypi/linux/rpi-3.6.y/arch/arm/mach-bcm2708/include/mach/vcio.h
    wget https://dl.dropbox.com/u/3669512/temp/mailbox.c
    wget https://dl.dropbox.com/u/3669512/temp/alpha.bin
  • Edit mailbox.c: change #include ".../vcio.h" to #include "vcio.h".
  • Compile and run the demo:
    gcc mailbox.c
    sudo ./a.out
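The `mknod` step creates the node (character major 100, minor 0) through which mailbox.c passes property buffers to the kernel's mailbox driver. A minimal sketch of that call follows, assuming vcio.h defines `IOCTL_MBOX_PROPERTY` as `_IOWR(100, 0, char *)` and that the driver takes a pointer to a property buffer. `mbox_property_call` and `MBOX_MAJOR` are hypothetical names; if you include vcio.h, use its definition instead of the one here.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match vcio.h: major number 100, ioctl number 0. */
#define MBOX_MAJOR 100
#define IOCTL_MBOX_PROPERTY _IOWR(MBOX_MAJOR, 0, char *)

/* Hypothetical helper: opens the device node, hands the property buffer to
 * the driver (which fills in the reply in place) and closes the node.
 * Returns 0 on success, -1 on failure. */
int mbox_property_call(const char *dev_path, uint32_t *buf)
{
    int fd = open(dev_path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    int ret = ioctl(fd, IOCTL_MBOX_PROPERTY, buf);
    if (ret < 0)
        perror("ioctl(IOCTL_MBOX_PROPERTY)");
    close(fd);
    return ret < 0 ? -1 : 0;
}
```

On the Pi you would call `mbox_property_call("char_dev", msg)` from the videocoretest directory, running as root, just as the demo does with `sudo ./a.out`.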

Inside the sample kernel

void alpha_blt_block(unsigned char *dest, unsigned char *src1, unsigned char *src2, unsigned char *alpha, int size);

00000000: 0100 0010 0000 0100: .B         :4204                     ; add r4, r0
00000002: 1011 0000 0000 0101: .. .       :b005 0020                ; mov r5, #0x0020
00000006: 1111 1000 0000 1011: ..8...@... :f80b 8438 0380 f940 0008 ; vld HX(16++,0), (r2+=r5) REP8
00000010: 1111 1000 0000 1011: ..8...@... :f80b 8838 0380 f940 000c ; vld HX(32++,0), (r3+=r5) REP8
0000001a: 1111 1000 0000 1011: ..8...@... :f80b 8038 0380 f940 0004 ; vld HX(0++,0), (r1+=r5) REP8
00000024: 1111 1101 1001 0011: .... ...>. :fd93 0401 0020 fbe0 003e ; vmul H(16++,0), H(16++,0), H(32++,0) REP8
0000002e: 1111 1101 1001 0011: ...$....>. :fd93 2409 00a0 fbe0 003e ; vmul H(16++,16), H(16++,16), H(32++,16) REP8
00000038: 1111 1101 0100 0011: C.".....?. :fd43 8822 07ff fbe0 003f ; vsub HX(32++,0), HX(32++,0), #65535 REP8
00000042: 1111 1101 1001 0011: .... ...>. :fd93 0000 0020 fbe0 003e ; vmul H(0++,0), H(0++,0), H(32++,0) REP8
0000004c: 1111 1101 1001 0011: ... ....>. :fd93 2008 00a0 fbe0 003e ; vmul H(0++,16), H(0++,16), H(32++,16) REP8 
00000056: 1111 1101 0000 0011: .. .....>. :fd03 8020 0210 fbe0 003e ; vadd HX(0++,0), HX(0++,0), HX(16++,0) REP8
00000060: 1111 1000 1000 1011: .. ....S.. :f88b e020 0380 53e0 0000 ; vst HX(0++,0), (r0+=r5) REP8
0000006a: 0101 0110 0101 0000: PV         :5650                     ; adds8 r0, r5
0000006c: 0101 0110 0101 0001: QV         :5651                     ; adds8 r1, r5
0000006e: 0101 0110 0101 0010: RV         :5652                     ; adds8 r2, r5
00000070: 0101 0110 0101 0011: SV         :5653                     ; adds8 r3, r5
00000072: 1000 1011 0000 0000: ...S       :8b00 53ca                ; blt r0, r4, 0x00000006
00000076: 0000 0000 0101 1010: Z.         :005a                     ; rts
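The loop can be cross-checked against a plain C version. This is a reading of the listing, not a published reference. It assumes the arguments arrive in r0 to r4 in prototype order (dest, src1, src2, alpha, size), that each 8-bit `vmul` keeps the high byte of the 16-bit product, i.e. (x*a)>>8, and that the `vsub` against #65535 is meant to turn every alpha byte into 255 - alpha. Under those assumptions the two partial results never carry across a byte, which is why a 16-bit `vadd` on HX rows is safe. The name `alpha_blt_block_ref` is hypothetical.

```c
#include <stddef.h>

/* Scalar sketch of alpha_blt_block under the stated assumptions.
 * size must be a positive multiple of 256 (one loop iteration = 256 bytes). */
void alpha_blt_block_ref(unsigned char *dest, const unsigned char *src1,
                         const unsigned char *src2, const unsigned char *alpha,
                         int size)
{
    const unsigned char *end = dest + size;             /* add r4, r0 */
    do {
        /* 8 rows (REP8) of 32 bytes (r5 = 0x20) per iteration. */
        for (size_t i = 0; i < 256; i++) {
            unsigned a = alpha[i];
            unsigned fg = (src2[i] * a) >> 8;           /* vmul rows 16.. by alpha */
            unsigned bg = (src1[i] * (255u - a)) >> 8;  /* vmul rows 0.. by 255 - alpha */
            dest[i] = (unsigned char)(fg + bg);         /* vadd, then vst */
        }
        dest += 256; src1 += 256; src2 += 256; alpha += 256;  /* adds8 rN, r5 */
    } while (dest < end);                                /* blt r0, r4, 0x06 */
}
```

Like the assembly, this is a do-while loop, so at least one 256-byte block is always processed. Filling src1 with 200, src2 with 100 and alpha with 128 gives 149 in every output byte under these assumptions.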

Commentary

vld HX(16++,0), (r2+=r5) REP8
; REP8 repeats the instruction 8 times, stepping the register row (the ++ suffix)
; by one and the address by r5 each time. It is equivalent to the 8 instructions:
;   vld HX(16, 0), (r2+0*r5)
;   vld HX(17, 0), (r2+1*r5)
;   ...
;   vld HX(23, 0), (r2+7*r5)
;
; vld HX(16, 0), (r2+0*r5) loads 32 bytes into row 16, alternating between
; columns 0-15 and columns 16-31:
;  unsigned char *src = r2+0*r5;
;  P(16,0) = *src++; P(16,16) = *src++;
;  P(16,1) = *src++; P(16,17) = *src++;
;  ...
;  P(16,15) = *src++; P(16,31) = *src++;
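The byte placement described above can be modelled directly. In this sketch, `vrf_t` and `vld_hx_rep` are hypothetical names, and the 64x64 byte size of the register file model is an assumption; the commentary itself only touches columns 0-31. If each byte pair is read as a little-endian halfword, column k holds its low byte and column 16+k its high byte.

```c
#include <stddef.h>

/* Simplified vector register file: P[row][col] bytes (size assumed). */
typedef struct { unsigned char P[64][64]; } vrf_t;

/* Models "vld HX(row++,0), (base+=stride) REPn" as described in the commentary. */
void vld_hx_rep(vrf_t *v, int row, const unsigned char *base, size_t stride, int reps)
{
    for (int r = 0; r < reps; r++) {
        const unsigned char *src = base + (size_t)r * stride;  /* base + r*stride */
        for (int k = 0; k < 16; k++) {
            v->P[row + r][k]      = *src++;  /* even byte: low byte of halfword k */
            v->P[row + r][16 + k] = *src++;  /* odd byte: high byte of halfword k */
        }
    }
}
```

Because each halfword is split across columns k and 16+k, H(16,0) addresses the 16 even bytes of a row and H(16,16) the 16 odd bytes. That is why the sample kernel issues each 8-bit `vmul` twice, once per half, while the `vsub`, `vadd` and `vst` on HX rows cover all 32 bytes at once.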