Western Digital Delivery #31

Open
wants to merge 44 commits into
base: master

MichaelZaidman

The delivery content:

  • Cinco on Windows (see README for installation instructions)
  • Arduino core code improvements and optimizations (clock and delay accuracy, code size reduction, interrupts, missing APIs and files, etc.)
  • GCC and OpenOCD configurations
  • Libraries (Touch screen, SW UART, SW I2C, tone, servo, NeoPixel LED, RGB LED, ...)

The code is based on the AugustUpdate branch.

dbarbi1 and others added 30 commits August 8, 2017 16:53
Prevent unmodified files from being recompiled on every code change:

Using previously compiled file: arduino_build_118542\core\WInterrupts.c.o
Using previously compiled file: arduino_build_118542\core\hooks.c.o
Using previously compiled file: arduino_build_118542\core\itoa.c.o
Using previously compiled file: arduino_build_118542\core\malloc.c.o
Using previously compiled file: arduino_build_118542\core\sbrk.c.o
Using previously compiled file: arduino_build_118542\core\wiring.c.o
Using previously compiled file: arduino_build_118542\core\wiring_analog.c.o
Using previously compiled file: arduino_build_118542\core\wiring_digital.c.o
Using previously compiled file: arduino_build_118542\core\wiring_shift.c.o
Using previously compiled file: arduino_build_118542\core\drivers\fe300prci\fe300prci_driver.c.o
Using previously compiled file: arduino_build_118542\core\drivers\plic\plic_driver.c.o
Using previously compiled file: arduino_build_118542\core\Print.cpp.o
Using previously compiled file: arduino_build_118542\core\UARTClass.cpp.o
Using previously compiled file: arduino_build_118542\core\WMath.cpp.o
Using previously compiled file: arduino_build_118542\core\WString.cpp.o
Using previously compiled file: arduino_build_118542\core\abi.cpp.o
Using previously compiled file: arduino_build_118542\core\main.cpp.o
Using previously compiled file: arduino_build_118542\core\new.cpp.o
Using previously compiled file: arduino_build_118542\core\wiring_pulse.cpp.o

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
Since the Arduino IDE lacks debugging capability, there is little
sense in compiling with debug info enabled.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
An empty Arduino sketch for HiFive1 occupies 4,200 bytes.
That is almost ten times more than the 444 bytes of an Arduino Uno build:
  Sketch uses 444 bytes (1%) of program storage space. Maximum is 32,256 bytes.

One of the major memory eaters is the 2KB stack size configured as the default
in the linker script. Since this script also serves the Freedom Studio IDE, define
a stack size of 512 bytes explicitly for the Arduino build in the platform.txt file.

Before:
  Sketch uses 4,200 bytes (0%) of program storage space. Maximum is 8,388,608 bytes.

After:
  Sketch uses 2664 bytes (0%) of program storage space. Maximum is 8,388,608 bytes.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
OpenOCD prints a lot of INFO messages to the console compared to
the Arduino UNO uploader. This patch decreases the verbosity level of OpenOCD
to only a few messages.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
The lightweight RGBL library is tailored by default to the built-in RGB LED
on the HiFive1 board. It can be easily configured for any common-anode
RGB LED on any board.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
The following examples are added:

- blink RGB LED via RGBL APIs
- fade RGB LED via RGBL APIs

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
The rdmcycle() call takes 10 cycles, or ~630ns, to complete at 16MHz CPU clk.
Together with 64-bit arithmetic, this led to low accuracy and high jitter
for delay values below 10 usec. Replacing the rdmcycle() polling with
a "nop" sequence solved the problem.

Test results (excluding GPIO overhead of about 1.9us when measured
with "GPIO_REG(GPIO_OUTPUT_VAL) ^= (1 << PIN_3_OFFSET)"):

  usec      "nop"        "rdmcycle()"
   1         0.9          1.2 - 1.4
   2         2            2.2 - 2.5
   4         4            4.1 - 4.4
   10        10.4         10.5

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
Use asm to prevent GCC from unrolling the loop, which it does even though the
flags corresponding to this behaviour explicitly tell it not to.

Remeasured the delays with the new macros and with rdmcycle overhead compensation:

     * The rdmcycle() takes 10 cycles or ~630ns to complete at 16MHz CPU clk.
     * Together with 64bit arithmetics it led to low accuracy and high jitter
     * for below 4 usec delay values. Replacing the rdmcycle() polling with
     * "nop" sequence improved the situation.
     *
     * Test results (excluding GPIO overhead of about 1.3us x 2 when measured
     * with "GPIO_REG(GPIO_OUTPUT_VAL) ^= (1 << PIN_3_OFFSET)")
     *
     * usec      "nop"        "rdmcycle()"    "rdmcycle() w/o overhead"
     *  1         1.13         1.6 - 2.2        1.6 - 2.2
     *  2         2.3          2.9 - 3.1        2.2 - 2.4
     *  3         3.1          4.07             2.9 - 3.5
     *  4         4            5.2              3.8 - 4
     *  5         4.8          5.8              5.2
     *  10        9            10.8             10.2

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
With -O2 optimization, the compiler generates the digitalPinToBitMask(pin)
calculation code twice: once for the if branch and once for the else branch.
Compute the mask once explicitly to help the compiler generate smaller code.
This saves 16 bytes.

Before:

digitalWrite(uint32_t pin, uint32_t val)
{
  if (pin >= variant_pin_map_size)
204001a8:       800007b7                lui     a5,0x80000
204001ac:       4287a283                lw      t0,1064(a5) # 80000428 <_sp+0xffffc428>
204001b0:       02557863                bleu    t0,a0,204001e0 <digitalWrite+0x38>
    return;

  if (val)
    GPIO_REG(GPIO_OUTPUT_VAL) |=  digitalPinToBitMask(pin);
204001b4:       20401337                lui     t1,0x20401
204001b8:       050a                    slli    a0,a0,0x2
204001ba:       db030393                addi    t2,t1,-592 # 20400db0 <variant_pin_map>
204001be:       00a38633                add     a2,t2,a0
  if (val)
204001c2:       e185                    bnez    a1,204001e2 <digitalWrite+0x3a>
  else
    GPIO_REG(GPIO_OUTPUT_VAL) &= ~digitalPinToBitMask(pin);
204001c4:       00164583                lbu     a1,1(a2)
204001c8:       10012737                lui     a4,0x10012
204001cc:       4754                    lw      a3,12(a4)
204001ce:       4805                    li      a6,1
204001d0:       00b818b3                sll     a7,a6,a1
204001d4:       fff8ce13                not     t3,a7
204001d8:       00de7eb3                and     t4,t3,a3
204001dc:       01d72623                sw      t4,12(a4) # 1001200c <__stack_size+0x1001180c>
204001e0:       8082                    ret
    GPIO_REG(GPIO_OUTPUT_VAL) |=  digitalPinToBitMask(pin);
204001e2:       10012f37                lui     t5,0x10012
204001e6:       00164f83                lbu     t6,1(a2)
204001ea:       00cf2283                lw      t0,12(t5) # 1001200c <__stack_size+0x1001180c>
204001ee:       4785                    li      a5,1
204001f0:       01f79533                sll     a0,a5,t6
204001f4:       00556333                or      t1,a0,t0
204001f8:       006f2623                sw      t1,12(t5)
204001fc:       8082                    ret

After:

digitalWrite(uint32_t pin, uint32_t val)
{
  if (pin >= variant_pin_map_size)
204001c6:       800007b7                lui     a5,0x80000
204001ca:       4287a283                lw      t0,1064(a5) # 80000428 <_sp+0xffffc428>
204001ce:       02557963                bleu    t0,a0,20400200 <digitalWrite+0x3a>
    return;

  uint32_t bitmask = digitalPinToBitMask(pin);
204001d2:       20401337                lui     t1,0x20401
204001d6:       050a                    slli    a0,a0,0x2
204001d8:       d3030393                addi    t2,t1,-720 # 20400d30 <variant_pin_map>
204001dc:       00a38633                add     a2,t2,a0
204001e0:       00164703                lbu     a4,1(a2)
  if (val)
    GPIO_REG(GPIO_OUTPUT_VAL) |=  bitmask;
204001e4:       100128b7                lui     a7,0x10012
204001e8:       00c8ae03                lw      t3,12(a7) # 1001200c <__stack_size+0x1001180c>
  uint32_t bitmask = digitalPinToBitMask(pin);
204001ec:       4685                    li      a3,1
204001ee:       00e69833                sll     a6,a3,a4
  if (val)
204001f2:       e981                    bnez    a1,20400202 <digitalWrite+0x3c>
  else
    GPIO_REG(GPIO_OUTPUT_VAL) &= ~bitmask;
204001f4:       fff84593                not     a1,a6
204001f8:       01c5feb3                and     t4,a1,t3
204001fc:       01d8a623                sw      t4,12(a7)
20400200:       8082                    ret
    GPIO_REG(GPIO_OUTPUT_VAL) |=  bitmask;
20400202:       01c86f33                or      t5,a6,t3
20400206:       01e8a623                sw      t5,12(a7)
2040020a:       8082                    ret

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
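
The digitalWrite() change above boils down to hoisting the mask computation out of both branches. A minimal host-side sketch of that pattern follows. `fake_gpio_output_val`, `fake_pin_offset`, and `digital_write_sketch` are hypothetical stand-ins for `GPIO_REG(GPIO_OUTPUT_VAL)`, `variant_pin_map`, and `digitalWrite`; this is not the actual core code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the memory-mapped GPIO register and the
 * variant pin map, so the pattern can be compiled and run on a host. */
static volatile uint32_t fake_gpio_output_val;
static const uint8_t fake_pin_offset[] = { 16, 17, 18, 19, 20 };
#define FAKE_PIN_COUNT (sizeof(fake_pin_offset) / sizeof(fake_pin_offset[0]))

static inline uint32_t pin_to_bitmask(uint32_t pin)
{
    assert(pin < FAKE_PIN_COUNT);
    return UINT32_C(1) << fake_pin_offset[pin];
}

void digital_write_sketch(uint32_t pin, uint32_t val)
{
    if (pin >= FAKE_PIN_COUNT)
        return;

    /* Compute the mask once, before branching, so the table lookup and
     * shift are not duplicated in the if and else paths. */
    uint32_t bitmask = pin_to_bitmask(pin);

    if (val)
        fake_gpio_output_val |= bitmask;
    else
        fake_gpio_output_val &= ~bitmask;
}

uint32_t read_fake_gpio(void)
{
    return fake_gpio_output_val;
}
```

Whether the compiler actually emits smaller code for this form depends on the target and optimization level; the disassembly above is the evidence for the HiFive1 build.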
The delayMicroseconds() code size is reduced from 206 to 162 bytes when used
with likely() hint.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
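
For readers unfamiliar with the hint: likely() is commonly defined on top of GCC's `__builtin_expect`. The exact definition and the condition it wraps in delayMicroseconds() are not shown in this commit message, so the macros and the `cycles_for_usec` helper below are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Common GCC/Clang idiom; falls back to a no-op on other compilers. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical helper: number of CPU cycles to wait for `usec`
 * microseconds, using the same usec * (F_CPU / 1000000) form that
 * appears in the delay code of this pull request. */
uint64_t cycles_for_usec(uint32_t usec, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz >= 1000000u);

    /* Tell the compiler that a non-zero delay is the common case, so
     * it can lay out the hot path without a taken branch. */
    if (likely(usec != 0))
        return (uint64_t)usec * (f_cpu_hz / 1000000u);

    return 0;
}
```

The hint only influences code layout; it never changes the result of the condition.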
It will take 1827 years for the 64-bit counter to wrap around at a 320MHz clock.
So we can safely remove the wraparound part and free another 162-68=94 bytes.

This also reduces the inaccuracy of the microsecond delay for small values at low
CPU frequencies.

The code looks much simpler and smaller now:
    rdmcycle(&current);
204000ae:       b80027f3                csrr    a5,mcycleh
204000b2:       b00026f3                csrr    a3,mcycle
204000b6:       b8002773                csrr    a4,mcycleh
204000ba:       fee79ae3                bne     a5,a4,204000ae <loop>
    later = current + usec * (F_CPU/1000000);
204000be:       65a1                    lui     a1,0x8
204000c0:       d0058293                addi    t0,a1,-768 # 7d00 <__stack_size+0x7500>
204000c4:       00568333                add     t1,a3,t0
204000c8:       00d33733                sltu    a4,t1,a3
204000cc:       00f703b3                add     t2,a4,a5
    while (later > current)
204000d0:       0077fc63                bleu    t2,a5,204000e8 <loop+0x3a>
      rdmcycle(&current);
204000d4:       b80027f3                csrr    a5,mcycleh
204000d8:       b00026f3                csrr    a3,mcycle
204000dc:       b8002673                csrr    a2,mcycleh
204000e0:       fec79ae3                bne     a5,a2,204000d4 <loop+0x26>
    while (later > current)
204000e4:       fe77e8e3                bltu    a5,t2,204000d4 <loop+0x26>
204000e8:       00f39463                bne     t2,a5,204000f0 <loop+0x42>
204000ec:       fe66e4e3                bltu    a3,t1,204000d4 <loop+0x26>
204000f0:       8082                    ret

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
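
The wraparound estimate above can be reproduced with a few lines of arithmetic, and the simplified loop can be exercised on a host with a mocked counter. In this sketch, `fake_mcycle` and `read_cycles_sketch` are hypothetical replacements for the mcycle CSR and rdmcycle(), and the 10-count step per read is only a nod to the read cost quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_YEAR 31557600.0 /* 365.25 days; an assumption */

/* Years until a 64-bit cycle counter wraps at the given clock rate. */
double years_until_wrap(double f_cpu_hz)
{
    assert(f_cpu_hz > 0.0);
    /* 2^64 counts / (counts per second) / (seconds per year) */
    return 18446744073709551616.0 / f_cpu_hz / SECONDS_PER_YEAR;
}

/* Mocked cycle counter: advances by 10 counts on every read. */
static uint64_t fake_mcycle;

static uint64_t read_cycles_sketch(void)
{
    fake_mcycle += 10;
    return fake_mcycle;
}

/* Busy-wait without any wraparound handling, mirroring the
 * "later = current + n; while (later > current)" shape shown above.
 * Returns the number of counts that actually elapsed. */
uint64_t busy_wait_cycles_sketch(uint64_t cycles)
{
    uint64_t start = read_cycles_sketch();
    uint64_t current = start;
    uint64_t later = start + cycles;

    while (later > current)
        current = read_cycles_sketch();

    return current - start;
}
```

With a 365.25-day year the estimate comes out slightly under 1827 years, and with a 365-day year slightly over; either way the counter will not wrap in practice.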
It will take 1827 years for the 64-bit counter to wrap around at a 320MHz clock.
So we can safely remove the wraparound part and free another 118 bytes.

before:
	541065458 00000210 T delay
after:
	541065458 00000092 T delay

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
Since only 3 predefined CPU frequencies are enabled, there is no point in
compiling the generic PLL configuration code that calculates the divider
ratio for the 320MHz CPU clock frequency at run time. Instead, do it
the same way as for the 256MHz CPU clock.

This decreased the code for 320MHz build by 1.8K:

before:
   text	   data	    bss	    dec	    hex
   5644	   1076	   2084	   8804	   2264

after:
   text	   data	    bss	    dec	    hex
   3792	   1076	   2084	   6952	   1b28

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
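
As a sanity check on hard-coded dividers, the PLL output can be computed from the divider fields. The formula below (reference divided by R+1, multiplied by 2(F+1), divided by 2^Q) is an assumption based on the FE310 PLL description and should be verified against the chip manual. The function name and the field values in the usage note are illustrative and are not necessarily those chosen in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: PLL output frequency from raw divider fields.
 * Assumed relationship: out = ref / (pllr + 1) * 2 * (pllf + 1) / 2^pllq */
uint32_t pll_output_hz(uint32_t ref_hz, uint32_t pllr, uint32_t pllf,
                       uint32_t pllq)
{
    assert(pllq >= 1 && pllq <= 3);

    uint64_t refr = ref_hz / (pllr + 1);   /* divided reference        */
    uint64_t vco  = refr * 2u * (pllf + 1); /* multiplied VCO frequency */

    return (uint32_t)(vco >> pllq);         /* final output divider     */
}
```

Under that assumption and with a 16 MHz reference, `pll_output_hz(16000000, 1, 39, 1)` evaluates to 320000000 and `pll_output_hz(16000000, 1, 31, 1)` to 256000000.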
It turned out that ~320MHz is actually 306MHz, since a 1000 usec delay
measures 1046 usec on the oscilloscope. This fix is a quick
workaround until a more appropriate solution is found.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
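
The 306MHz figure follows directly from the measurement: a loop that counts cycles for a nominal 320MHz clock but takes 1046 usec instead of 1000 usec implies a real clock of 320 x 1000 / 1046 MHz. A small sketch of that arithmetic, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Effective clock rate implied by a cycle-counted delay that was
 * requested as `requested_us` but measured as `measured_us`. */
uint32_t effective_clock_hz(uint32_t nominal_hz, uint32_t requested_us,
                            uint32_t measured_us)
{
    assert(measured_us != 0);
    /* Widen to 64 bits: nominal_hz * requested_us overflows 32 bits. */
    return (uint32_t)((uint64_t)nominal_hz * requested_us / measured_us);
}
```

For the numbers in this commit, `effective_clock_hz(320000000, 1000, 1046)` lands just under 306MHz.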
For 320MHz, the PLL dividers were calculated in PRCI_set_hfrosctrim_for_f_cpu()
at run time. With the calculated values, the actual F_CPU was measured as
306MHz.

The "e188c35 main: decrease pll configuration code size for 320MHz clk" commit
explicitly sets the divider values for exactly 320MHz, which makes the
"f85a104 wiring: fix delay accuracy at ~320mhz" commit unnecessary. So we revert it.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
A library supplementing software UART, intended to become
a generic SoftwareSerial library for 32-bit MCUs.

Tested on the RISC-V based HiFive1 board from SiFive.
UART is stable at 9600 and 19200 bps at 16MHz CPU frequency,
and additionally at 38400, 57600* and 115200* bps at 256/320 MHz.

*: works best on data bursts not larger than 100 bytes.

Signed-off-by: Shadi Kamal <Shadi.Kamal@wdc.com>
HiFive1 HW I2C is not operational. The libraries below implement SW I2C.

The libraries were taken from the following URLs:
https://github.com/felias-fogg/SlowSoftI2CMaster
https://github.com/felias-fogg/SlowSoftWire

Signed-off-by: Roi Weisfeld <Roi.Weisfeld@wdc.com>
The original library had a hard-coded delay.
Modified it to keep the same default delay while adding an option,
at instance creation, to change the clock rate to a custom one.

The motivation for this change was that the HiFive1 can run at higher
clock rates than the hard-coded delay allows, and originally there was no
way to change the delay without modifying the library.

Signed-off-by: Roi Weisfeld <Roi.Weisfeld@wdc.com>
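
One way to turn a requested bus clock into a bit-bang delay is sketched below. This is not the library's code: the function name is hypothetical, and the sketch assumes the delay is a half-period in whole microseconds, with no compensation for the time spent toggling pins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: half-period delay, in microseconds, for a
 * software I2C clock of `clock_hz`. Rounds up so the resulting bus
 * rate never exceeds the requested one. */
uint32_t half_period_us(uint32_t clock_hz)
{
    assert(clock_hz > 0);
    assert(clock_hz <= UINT32_MAX - 500000u); /* keep the sum in range */

    /* One clock period is 1e6 / clock_hz usec; half of it is
     * 500000 / clock_hz, rounded up by adding (divisor - 1). */
    return (500000u + clock_hz - 1u) / clock_hz;
}
```

For example, a 100 kHz request gives a 5 usec half-period, while a 400 kHz request rounds 1.25 usec up to 2 usec, so the achieved rate would be below the requested one.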
The motivation for this update is that Stream.h was outdated and was missing
APIs such as readBytes and readString, and Stream.cpp was missing from the
repository.

Signed-off-by: Roi Weisfeld <Roi.Weisfeld@wdc.com>
Added a simple touch screen library for the SPI "2.8 TFT SPI 240x320 V1.1"
LCD display panel.

Why yet another touchscreen library?
I looked for a touch screen library for the "2.8 TFT SPI 240x320 V1.1" aftermarket
panel to use with the RISC-V based HiFive1 board. I examined the URTouch and XPT2046
libraries, but nothing worked for the HiFive1 board out of the box. On top of that,
none of the libraries provided touchscreen to LCD resolution normalization.
So, I wrote the SimpleTouchscreen library with the intention to keep it simple :-)

- Calibrated for SPI "2.8 TFT SPI 240x320 V1.1" ILI9341 LCD display panel
  bundled with XPT2046 touchscreen controller.
- Returns raw (in touchscreen resolution) or normalized (in LCD resolution)
  X and Y readings.
- Tested with HiFive1 and Arduino Uno boards.
- Tested with Adafruit_ILI9341 and Adafruit_GFX.h LCD libraries.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
- The readings from the resistive touch screen are very noisy.
  Improve the low pass filter by ignoring the 2 lowest and 2 highest samples
  in the x and y calculation.
- Minor improvements in the examples.
- Updated README.md with photo of the test setup.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
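
Dropping the 2 lowest and 2 highest samples before averaging is a trimmed mean. The sketch below shows the idea on a plain array; the function name, the sample type, and the use of qsort are assumptions, and the library may implement the filter differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TRIM 2 /* samples discarded at each end, as described above */

static int cmp_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a;
    uint16_t y = *(const uint16_t *)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place and returns the mean of everything except
 * the TRIM lowest and TRIM highest values. */
uint16_t trimmed_mean(uint16_t *samples, size_t count)
{
    assert(count > 2 * TRIM);

    qsort(samples, count, sizeof samples[0], cmp_u16);

    uint32_t sum = 0;
    for (size_t i = TRIM; i < count - TRIM; i++)
        sum += samples[i];

    return (uint16_t)(sum / (count - 2 * TRIM));
}
```

With the samples {100, 4000, 102, 0, 101, 5, 3900, 99}, the outliers 0, 5, 3900 and 4000 are discarded and the result is 100. On a small MCU an insertion sort over a handful of samples may be preferable to qsort for code size.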
Implementation of Tone functions for Arduino. This is implemented as
Arduino core functions.

Limitations:
- Only works on pins 3,5,6 (PWM1) and 17,18,19 (PWM2).
- Supports pins 9,10,11 (PWM0), but there is a potential HW issue preventing
  PWM0 from working with large prescaler values, so nothing will be output
  on these pins.
- Only one active tone at a time with duration argument (subsequent calls will
  stop previous timed tone).
- Supports multiple tones, as long as they are on different PWMs (e.g. can
  play tones simultaneously on pins 3 and 17).
- Increasingly higher frequencies will lose accuracy due to rounding errors.
- Very high frequencies will lose accuracy to the point where there is no
  difference between subsequent semitones.

Testing:
- Verified audibility with piezo beepers on pins 3,5,6,17,18,19.
- Verified 50% duty square waveform with oscilloscope.
- Verified input frequencies with multimeter.
austinliou and others added 13 commits October 2, 2018 10:42
Finalized SoftwareSerial32 library code:

 - Change infrastructure to make the library ready for future multi-uart support.
 - Resolved bug preventing usage of some pins (digital pin numbers 15-19).
 - Minor optimization to ISR.
 - Added baud rate values of 250k and 500k @ 256MHz CPU frequency.
 - Resolved bug in not clearing the interrupt pending registers
 correctly when multiple external interrupt sources are present.

Signed-off-by: Shadi Kamal <Shadi.Kamal@wdc.com>
Pulled Adafruit_NeoPixel from upstream based on ec3c77e28e84b221dc159b48358929cb24ba97bd.
Added support for HiFive1 board for 256MHz and 320MHz clock speeds for WS2812B and WS2812.
Tested with the simple and strandtest sketch and verified that it is working for WS2812B.

Adafruit_NeoPixel pulled from upstream ec3c77e28e84b221dc159b48358929cb24ba97bd
    - Added support for SiFive HiFive1 board.
    - 16 MHz clock speed is not supported.
    - 400 KHz LEDs are not supported.
Renamed folders to remove "-master" suffix.

Removed 2 duplicate examples
(they can both be found in the Arduino Hardware Wire library).

Now using the setClock function, which is Wire compatible;
max rates and limitations are detailed in the README in the SlowSoftI2CMaster folder.

Also optimized redundant code.

Signed-off-by: Roi Weisfeld <Roi.Weisfeld@wdc.com>
Signed-off-by: Michael Zaidman <Michael.Zaidman@wdc.com>
- PWM0 pins 9, 10 and 11 are not supported in Servo, because timing cannot
  be generated with 8bit counter.
- All three CPU speeds are supported
- Works on the following pins: 3, 5, 6, 17, 18, 19 (16bit PWM1 & PWM2)
Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
Signed-off-by: Shadi Kamal <Shadi.Kamal@wdc.com>
Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
On a minimal sketch, the program would get stuck after consecutive calls
to clear_csr(mstatus, MSTATUS_MIE) and set_csr, without even writing to serial
or turning on an LED. The mstatus MIE bit must be set at least once for
interrupts to work properly, but toggling it does not correctly produce the
desired behavior of interrupts() and noInterrupts(), although in theory the
two are the same. Thus, we turn the ISRs off and on via the individual
interrupt enable control bits (in the mie register) and enable the global
interrupts (mstatus MIE bit) only once, at the beginning.

Signed-off-by: Shadi Kamal <Shadi.Kamal@wdc.com>
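
The approach described above can be modelled on a host with plain variables in place of the CSRs. In this sketch `fake_mstatus` and `fake_mie` are hypothetical stand-ins for the real registers (the core uses set_csr/clear_csr), the `*_sketch` names are made up, and toggling only the external-interrupt enable bit is an assumption; the actual commit may toggle a different set of mie bits.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_MIE (UINT32_C(1) << 3)  /* global machine interrupt enable  */
#define MIE_MEIE    (UINT32_C(1) << 11) /* machine external interrupt enable */

/* Plain variables stand in for the CSRs so the logic runs anywhere. */
static uint32_t fake_mstatus;
static uint32_t fake_mie;

/* Done once at start-up: set the global enable and never touch it again. */
void interrupts_init_sketch(void)
{
    fake_mstatus |= MSTATUS_MIE;
}

/* interrupts(): re-enable the individual source in mie. */
void interrupts_sketch(void)
{
    fake_mie |= MIE_MEIE;
}

/* noInterrupts(): mask the individual source, leave mstatus alone. */
void no_interrupts_sketch(void)
{
    fake_mie &= ~MIE_MEIE;
}

/* An external interrupt is taken only if both levels are enabled. */
int external_irq_deliverable(void)
{
    return (fake_mstatus & MSTATUS_MIE) != 0 && (fake_mie & MIE_MEIE) != 0;
}

int global_enable_is_set(void)
{
    return (fake_mstatus & MSTATUS_MIE) != 0;
}
```

The point of the model is that repeated no_interrupts_sketch()/interrupts_sketch() calls never clear the global bit, which is the sequence that hung the minimal sketch.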
Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>
@palmer-dabbelt
Contributor

Thanks for submitting this, I'll try to take a look. Unfortunately we're a bit oversubscribed right now so I can't promise when.

Signed-off-by: Michael Zaidman <michael.zaidman@wdc.com>

5 participants