Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

machine_bitstream: Implement for rp2 and unify all variants. #7805

Closed
wants to merge 6 commits into from

Conversation

robert-hh
Copy link
Contributor

@robert-hh robert-hh commented Sep 15, 2021

Like the variant for esp32 and mimxrt it is implemented by active polling the systick counter. The rp2 systick counter seemed to be unused before. Since it is a 24 bit counter, the longest bit pulse pair must not exceed ~67ms at 125MHz or ~34ms at 250MHz. For speed, the code uses direct register access. The timing is precise within +/- 20ns at 125 Mhz clock.

I did not find a way to implement it efficiently using the PIO. The PIO has not sufficient scratch memory registers to store the four bit times.

@robert-hh robert-hh force-pushed the rp2_bitstream branch 2 times, most recently from e2f1fa5 to 7fb087f Compare September 15, 2021 20:22
@dpgeorge
Copy link
Member

It would be nice to combine all these bitstream implementations to reduce code duplication. But maybe do that as a refactoring exercise once they are all implemented?

@jimmo
Copy link
Member

jimmo commented Sep 16, 2021

Thanks @robert-hh !

On rp2 should we could consider freezing the neopixel.py module?

I did find a way to implement it efficiently using the PIO. The PIO has not sufficient scratch memory registers to store the four bit times.

Not sure I follow -- there's examples/rp2/pio_ws2812.py that does this. Or is the issue that the timing needs to be hardcoded, so a PIO implementation cannot handle arbitrary bit widths?

@robert-hh
Copy link
Contributor Author

@dpgeorge: These implementations are short and very hardware dependent. For instance, esp32 and mimxrt use a 32bit increasing counter, rp2 a 24 bit decreasing one; and all mechanisms are different. Combining these would not reduce the variation. So either you would move the hardware-related code into other files (like mphalport.*), or you would have #ifdef sections with all variants in the same file, which does not improve readability.
@jimmo : Of course I know the pio_ws2812 example. But that implements a hard-coded timing just for that device. Bitstream uses a configurable timing. A good PIO implementation should only require, that after a set-up only the bit data stream is fed into the state machine as sequence of bytes and no interim data buffer has to be allocated. But for that the four timing values would have to be stored in the state machine. And that's not possible. Only two values can be permanently stored. OSR would be needed for receiving the data, and another register for counting the time ticks.

@robert-hh robert-hh force-pushed the rp2_bitstream branch 2 times, most recently from 4cd2be3 to 2c36d89 Compare September 16, 2021 10:46
@robert-hh
Copy link
Contributor Author

The most recent attempt generalizes the hardware related calls, but one problem exists, of how to tell the compiler/linker to move that code into RAM. When executed from flash, the timing is not good. The first bit(s) timing is wrong.

@dpgeorge
Copy link
Member

Try looking at the EXCLUDE_FILE part of memmap_mp.ld.

@robert-hh
Copy link
Contributor Author

So that works fine for the ARM based devices. Where is it defined for the ESP32?

@dpgeorge
Copy link
Member

Where is it defined for the ESP32?

I'm not sure.

But, I'm happy for this PR to be merged as-is, ie without generalising it. We can refactor later on if it makes sense/is possible.

@robert-hh
Copy link
Contributor Author

I have a version now which is identical for mimxrt and rp2. That includes changes to the mimxrt files and the rp2 files, both c and .ld files. I can push that, or keep it aside.

@hoihu
Copy link
Sponsor Contributor

hoihu commented Sep 16, 2021

On rp2 should we could consider freezing the neopixel.py module?

IMO a good addition, especially when looking at the quickref doc:
https://docs.micropython.org/en/latest/rp2/quickref.html#neopixel-and-apa106-driver

I believe the user expects to have the neopixel module present when flashing the device...

@robert-hh
Copy link
Contributor Author

The not-yet-pushed esp32 variant is now also identical to rp2 and mimxrt, besides the instruction for the linker to move the code to RAM. I found information and examples how to do that, but I expect a little bit of exercise until it works. Then the function could be moved to a common place and the code in the respective mphalport.h files can be cleaned if required. What name should it have, if moved to extmod? machine_bitstream_high_low.c?

@robert-hh
Copy link
Contributor Author

The linking challenge for ESP32 is solved. Now machine_bitstream is identical for ESP32, rp2 and mimxrt.

@robert-hh robert-hh changed the title rp2: Implement the machine.bitstream() method. machine_bitstream: Implement for rp2 and unify the variants. Sep 17, 2021
@robert-hh robert-hh changed the title machine_bitstream: Implement for rp2 and unify the variants. machine_bitstream: Implement for rp2 and unify all variants. Sep 17, 2021
@robert-hh
Copy link
Contributor Author

With the most recent commit, all of stm32, esp32, esp8266, rp2 and mimxrt use the same implementation of machine_bitstream.c, located in drivers/bitstream. Except for the esp8266, the timing is pretty good. The ESP8266 is somewhat asymmetric, possibly requiring separate compensation for the high and low pulse.

@dpgeorge
Copy link
Member

Thanks for the effort to unify the variants.

Please note that @jimmo spent a lot of time tuning the stm32 code, in particular the Cortex-M0 version with hand-tuned assembly. Both versions for stm32 should remain unchanged after any code factoring (ie the generated machine code should be the same before and after refactoring).

Also, I think the common source code should be extmod/machine_bitstream.c, to live alongside extmod/machine_pulse.c etc.

@robert-hh
Copy link
Contributor Author

I have seen this STM32 code, a very pretty one with precise timing. It is removed for now, but I can put that back for STM32. STM32 would then use it's specific code. That would allow me to roll back a change that was made for STM32 only.
extmod/machine_bitstream exists already. That is the Python API module. Therefore I went for placing it into drivers/bitstream, assuming that there may drivers for other coding than just high/low sequences.

@robert-hh
Copy link
Contributor Author

I looked again into @jimmo's code. The non-Cortex M0 version is identical to the one I have as common version. Only instead of direct register access the common code has function calls, which in turn are defined as inline ode #defines in the port specific mphalport.h, turning into direct register access again. Wouldn't the cortex M0 also also apply to the RP2? I have to check. For RP2, I use at the moment the systick counter.

@robert-hh
Copy link
Contributor Author

robert-hh commented Sep 19, 2021

So this includes now the assembler version of @jimmo for STM32 Cortex-M0. Basically the common code is still the STM32 driver with stubs for Macros from mphalport.h instead of direct calls. Obviously the difference between the versions did not disappear. It just moved from the driver to mphalport.h. The donwside of unification is, that options for port-specific timing optimization are reduced.

All HW related calls to mphalport are placed in mphalport.h

Doing this the respective mp_hal_tick_cpu_xxx() functions or defines
cover the existing ports. These are:

mp_uint_t mp_hal_get_cpu_freq(void)
    Return the CPU frequency; also used by machine.freq()

mp_uint_t mp_hal_ticks_cpu(void)
    Return fast ticks as increasing with time values.

void mp_hal_ticks_cpu_init(void)
    Initialize the ticks_cpu counter. Required so far by the mimxrt and
    rp2 port. May be empty.

mp_uint_t mp_hal_ticks_cpu_reset(void)
    Resets the ticks_cpu counter and returns a start value. If the
    counter cannot be reset, it must return the value of
    mp_hal_ticks_cpu() (be identical to ..)

void mp_hal_pin_output(mp_hal_pin_obj_t pin)
    Sets pin to output mode.

void mp_hal_pin_low(mp_hal_pin_obj_t pin)
    Set the pin value to 0.

void mp_hal_pin_high(mp_hal_pin_obj_t pin)
    Set the pin value to 1.

MP_HAL_BITSTREAM_NS_OVERHEAD
    A define for the timing correction. For esp32 and mimxrt this
    value was 6, for rp2 it is 9.

Some of these may already exist in various ports. They can obviously
either be implemented as inline fuctions of #defines, where possible.
These three are now identical and use linker methods for
moving machine_bitstream to IRAM.
All hardware depedent code is in the respective mphalport.h
and the linker scripts.
- Adapt the Make files for esp32, rp2 and mimxrt and remove the local
  machine_bitstream.c copies.
- Rename mp_hal_ticks_cpu_init to mp_hal_ticks_cpu_enable to match
  the existing code of esp8266 and stm32.
- Rename mp_hal_ticks_cpu_reset to mp_hal_ticks_cpu_start. It sounds
  like a more appropriate name.
The timing error is higher than for the other ports, especially at
80 MHz. The timing is also worse than with the specific driver, but
good enough for neopixel and the like.
The STM32 mp_hal_ticks_cpu has the unfavourable habit that it checks
on each call whether cpu_ticks is enabled. That make the timing
somewhat instable. Therefore the calls were renamed
from mp_hal_ticks_cpu into mp_hal_ticks_bitstream and
mp_hal_ticks_cpu_start into mp_hal_ticks_bitstream_start. In all
ports, the respective defines are added to mphalport.h.
With that change, the STM32 implementation works pretty precise.
For the STM32 M0 MCU, the assembler version is used.

Timing varies quite a bit  with changing CPU clock frequencies.

Testing done with:
PYBV11 at 168MHz
PBYD_SF6 at 196MHz
ESP8266 at both 80 and 160MHz
ESP32 at 240MHz
RP2 Pico at 125, 180 and 250MHz
MIMXRT1050 at 600MHz
MIMXRT1020 at 500MHz

Testing was done with symmetrical 1000ns cycles.
The worst figure is for ESP8266 at 80Mhz, being off at about +62ns for
a low phase, -38ns for the high phase, and a total error of 20ns for
the full 2000ns cycle.
@robert-hh
Copy link
Contributor Author

I close that one for now. Reasons:

  • the new esp32 PR for bitstream works better than the old polling based.
  • the port-specific implementations have a more precise timing than the generalized one. That holds especially for the STM variant.

I will roll it back to the rp2-only implementation and make a PR for that later.

@robert-hh robert-hh closed this Nov 12, 2021
@dpgeorge
Copy link
Member

I will roll it back to the rp2-only implementation and make a PR for that later.

Ok, sounds good.

@robert-hh robert-hh deleted the rp2_bitstream branch November 15, 2021 15:52
tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Apr 7, 2023
…-rate

mimxrt: Fix output frequency for samples that don't divide 192kHz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants